espnet2.gan_svs.post_frontend.fused.FusedPostFrontends
class espnet2.gan_svs.post_frontend.fused.FusedPostFrontends(postfrontends=None, align_method='linear_projection', proj_dim=100, fs=16000, input_fs=24000)
Bases: AbsFrontend
FusedPostFrontends combines multiple post frontends using specified alignment methods. Currently, only the linear projection method is supported for fusing.
align_method
The method used for aligning features. Default is “linear_projection”.
- Type: str
proj_dim
The dimension of the projection done on each postfrontend. Default is 100.
- Type: int
postfrontends
A list of the postfrontends to combine.
- Type: torch.nn.ModuleList
gcd
The greatest common divisor of hop lengths of all postfrontends.
- Type: int
factors
A list of factors used for reshaping the features from each postfrontend.
- Type: list
projection_layers
A list of linear layers for projecting the output of each postfrontend.
- Type: torch.nn.ModuleList
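The relationship between `gcd` and `factors` described above can be illustrated with a minimal sketch. It assumes that each factor is the post frontend's hop length divided by the common divisor; the helper name `alignment_factors` and the hop lengths are hypothetical and not part of ESPnet.

```python
from functools import reduce
from math import gcd


def alignment_factors(hop_lengths):
    """Return (gcd, factors) for a list of per-frontend hop lengths.

    Assumption: each factor is hop_length // gcd, i.e. how many
    gcd-sized frames one frame of that post frontend covers.
    """
    common = reduce(gcd, hop_lengths)
    factors = [hop // common for hop in hop_lengths]
    return common, factors


# Hypothetical hop lengths (in samples) for two post frontends.
common, factors = alignment_factors([320, 160])
print(common, factors)  # 160 [2, 1]
```

Under this assumption, a post frontend with a larger hop length gets a larger factor, so its frames can be split to match the finest frame rate among the fused post frontends.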
Parameters:
- postfrontends (list) – A list of dictionaries, where each dictionary contains the configuration for a postfrontend. Each should specify the ‘postfrontend_type’ and relevant parameters for initialization.
- align_method (str, optional) – The method for feature alignment. Defaults to “linear_projection”.
- proj_dim (int, optional) – The dimension of the projected features. Defaults to 100.
- fs (int, optional) – Sampling frequency for the postfrontends. Defaults to 16000.
- input_fs (int, optional) – Input sampling frequency. Defaults to 24000.
Returns: A tuple containing the concatenated projected features and their lengths.
Return type: Tuple[torch.Tensor, torch.Tensor]
Raises: NotImplementedError – If an unsupported postfrontend type is specified or an unsupported alignment method is requested.
######### Examples
>>> import torch
>>> from espnet2.gan_svs.post_frontend.fused import FusedPostFrontends
>>> postfrontends_config = [
...     {
...         "postfrontend_type": "s3prl",
...         "postfrontend_conf": {},
...         "download_dir": "/path/to/dir",
...         "multilayer_feature": True,
...     }
... ]
>>> fused_frontend = FusedPostFrontends(postfrontends=postfrontends_config)
>>> input_tensor = torch.randn(5, 24000) # Batch of 5, 24000 samples
>>> input_lengths = torch.tensor([24000] * 5) # All inputs are of length 24000
>>> output_feats, output_lengths = fused_frontend(input_tensor, input_lengths)
####### NOTE This class currently supports only the S3PRL postfrontend type. Future implementations may include additional postfrontend types.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Forward pass for the FusedPostFrontends class, which processes input data through multiple post-frontends and aligns their features using a specified alignment method. Currently, only the linear projection method is supported.
- Parameters:
- input (torch.Tensor) – Input tensor containing the audio features.
- input_lengths (torch.Tensor) – Lengths of the input sequences.
- Returns: A tuple containing:
  - A tensor with the concatenated features from all post-frontends after applying the linear projection and reshaping.
  - A tensor with the lengths of the processed features.
- Return type: Tuple[torch.Tensor, torch.Tensor]
- Raises: NotImplementedError – If an unsupported alignment method is specified.
######### Examples
>>> import torch
>>> from espnet2.gan_svs.post_frontend.fused import FusedPostFrontends
>>> fused_frontend = FusedPostFrontends(postfrontends=[{"postfrontend_type": "s3prl",
...     "postfrontend_conf": {}, "download_dir": "/path/to/dir",
...     "multilayer_feature": False}])
>>> input_tensor = torch.randn(10, 16000) # Example input tensor
>>> input_lengths = torch.tensor([16000] * 10) # Example input lengths
>>> output_feats, output_lengths = fused_frontend.forward(input_tensor, input_lengths)
####### NOTE Ensure that the input tensor is correctly shaped and that the input lengths are accurate to avoid runtime errors during processing.
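To make "linear projection and reshaping" concrete, here is a minimal sketch of one way such an alignment could work. It is an illustration under stated assumptions, not the ESPnet implementation: the function `fuse_by_linear_projection`, the feature shapes, and the freshly initialised `torch.nn.Linear` layers are all hypothetical, and the factors are assumed to be hop length divided by the common divisor.

```python
import torch


def fuse_by_linear_projection(feats, factors, proj_dim):
    """Project, reshape, truncate and concatenate per-frontend features.

    feats:   list of tensors shaped (batch, frames_i, dim_i)
    factors: assumed to be hop_length_i // gcd for each frontend
    """
    aligned = []
    for x, factor in zip(feats, factors):
        # Hypothetical untrained layer; a real module would keep these
        # layers as learned parameters instead of creating them here.
        proj = torch.nn.Linear(x.shape[-1], factor * proj_dim)
        y = proj(x)  # (batch, frames, factor * proj_dim)
        batch, frames, dim = y.shape
        # Split each frame into `factor` frames of width proj_dim.
        aligned.append(y.reshape(batch, frames * factor, dim // factor))

    # Truncate to the shortest sequence so the tensors can be concatenated.
    min_frames = min(a.shape[1] for a in aligned)
    fused = torch.cat([a[:, :min_frames, :] for a in aligned], dim=-1)
    lengths = torch.full((fused.shape[0],), min_frames, dtype=torch.long)
    return fused, lengths


# Two hypothetical feature streams at different frame rates.
feats = [torch.randn(2, 50, 768), torch.randn(2, 100, 1024)]
fused, lengths = fuse_by_linear_projection(feats, factors=[2, 1], proj_dim=100)
print(fused.shape)  # torch.Size([2, 100, 200])
```

In this sketch the last dimension of the fused tensor equals the number of feature streams times `proj_dim`, which matches the output size described for this class.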
output_size() → int
Return the total output size of the fused features.
The class combines different post frontends, specifically S3PRL post frontends, and aligns their outputs using a linear projection method. It is designed to facilitate the integration of various frontend features into a single representation.
align_method
The method used for fusing the frontends. Currently supports only “linear_projection”.
- Type: str
proj_dim
The dimensionality of the projection done on each post frontend.
- Type: int
postfrontends
A list of the post frontends to combine.
- Type: ModuleList
gcd
The greatest common divisor of the hop lengths of the post frontends.
- Type: int
factors
A list of factors derived from the hop lengths of the post frontends.
- Type: list
projection_layers
A list of linear layers for projecting features from each post frontend.
- Type: ModuleList
Parameters (of the class constructor; output_size() itself takes no arguments):
- postfrontends (list) – A list of dictionaries, each containing configurations for the post frontends.
- align_method (str) – Method to align features, defaults to “linear_projection”.
- proj_dim (int) – Dimension of the projection, defaults to 100.
- fs (int) – Sampling frequency, defaults to 16000.
- input_fs (int) – Input sampling frequency, defaults to 24000.
Returns: The total output size of the fused features.
Return type: int
######### Examples
>>> from espnet2.gan_svs.post_frontend.fused import FusedPostFrontends
>>> post_frontends = [
...     {
...         "postfrontend_type": "s3prl",
...         "postfrontend_conf": {...},
...         "download_dir": "/path/to/download",
...         "multilayer_feature": True,
...     }
... ]
>>> fused_frontend = FusedPostFrontends(postfrontends=post_frontends)
>>> output_size = fused_frontend.output_size()
>>> print(output_size)
100 # Assuming proj_dim is 100 and there is 1 post frontend, as configured above; 2 post frontends would give 200
####### NOTE The class currently only supports S3PRL post frontends. Any other type will raise a NotImplementedError.
- Raises: NotImplementedError – If a post frontend type other than “s3prl” is provided.
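The output size arithmetic from the example can be sketched as follows, assuming the fused size is the number of post frontends multiplied by `proj_dim`. The helper `fused_output_size` is hypothetical and only illustrates the calculation.

```python
def fused_output_size(num_postfrontends, proj_dim=100):
    """Assumed fused feature size: one proj_dim-wide slice per post frontend."""
    return num_postfrontends * proj_dim


print(fused_output_size(1))  # 100
print(fused_output_size(2))  # 200
```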