espnet2.gan_svs.post_frontend.fused.FusedPostFrontends


class espnet2.gan_svs.post_frontend.fused.FusedPostFrontends(postfrontends=None, align_method='linear_projection', proj_dim=100, fs=16000, input_fs=24000)

Bases: AbsFrontend

FusedPostFrontends combines multiple post frontends using specified alignment methods. Currently, only the linear projection method is supported for fusing.

align_method

The method used for aligning features. Default is “linear_projection”.

Type: str

proj_dim

The dimension of the projection done on each postfrontend. Default is 100.

Type: int

postfrontends

A list of the postfrontends to combine.

Type: torch.nn.ModuleList

gcd

The greatest common divisor of hop lengths of all postfrontends.

Type: int

factors

A list of factors used for reshaping the features from each postfrontend.

Type: list

projection_layers

A list of linear layers for projecting the output of each postfrontend.

Type: torch.nn.ModuleList

Parameters:
- postfrontends (list) – A list of dictionaries, one per postfrontend. Each dictionary specifies the ‘postfrontend_type’ and the parameters needed to initialize that postfrontend.
- align_method (str, optional) – The method for feature alignment. Defaults to “linear_projection”.
- proj_dim (int, optional) – The dimension of the projected features. Defaults to 100.
- fs (int, optional) – Sampling frequency for the postfrontends. Defaults to 16000.
- input_fs (int, optional) – Input sampling frequency. Defaults to 24000.
Returns: A tuple containing the concatenated projected features and their lengths.
Return type: Tuple[torch.Tensor, torch.Tensor]
Raises: NotImplementedError – If an unsupported postfrontend type is specified or an unsupported alignment method is requested.
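
The gcd and factors attributes can be illustrated with a minimal sketch. It assumes each factor is a post frontend's hop length divided by the shared greatest common divisor, which the attribute descriptions suggest but do not state outright. The helper name hop_factors and the hop lengths 320 and 160 are hypothetical, chosen only for illustration.

```python
from functools import reduce
from math import gcd


def hop_factors(hop_lengths):
    """Return (gcd, factors) for a list of hop lengths given in samples."""
    common = reduce(gcd, hop_lengths)
    # Assumption: each factor is hop_length // gcd.
    factors = [hop // common for hop in hop_lengths]
    return common, factors


# Hypothetical hop lengths for two post frontends.
common, factors = hop_factors([320, 160])
print(common, factors)  # 160 [2, 1]
```

Under this assumption, a post frontend with a factor of 2 produces one frame for every two frames of the fastest post frontend.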

Examples

>>> import torch
>>> from espnet2.gan_svs.post_frontend.fused import FusedPostFrontends
>>> postfrontends_config = [
...     {
...         "postfrontend_type": "s3prl",
...         "postfrontend_conf": {},
...         "download_dir": "/path/to/dir",
...         "multilayer_feature": True,
...     }
... ]
>>> fused_frontend = FusedPostFrontends(postfrontends=postfrontends_config)
>>> input_tensor = torch.randn(5, 24000)  # Batch of 5, 24000 samples
>>> input_lengths = torch.tensor([24000] * 5)  # All inputs are of length 24000
>>> output_feats, output_lengths = fused_frontend(input_tensor, input_lengths)

NOTE: This class currently supports only the S3PRL postfrontend type. Future implementations may include additional postfrontend types.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]

Forward pass for the FusedPostFrontends class, which processes input data through multiple post-frontends and aligns their features using a specified alignment method. Currently, only the linear projection method is supported.

Parameters:
- input (torch.Tensor) – Input audio tensor. The examples pass a batch of waveforms shaped (batch, samples).
- input_lengths (torch.Tensor) – Lengths of the input sequences.
Returns: A tuple containing:
- A tensor with the concatenated features from all post-frontends after applying the linear projection and reshaping.
- A tensor with the lengths of the processed features.
Return type: Tuple[torch.Tensor, torch.Tensor]
Raises: NotImplementedError – If an unsupported alignment method is specified.
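
The projection, reshaping, and concatenation described above can be sketched on plain Python lists, without torch. This is an illustration under assumptions, not the library's code. It assumes that the projected width of post frontend i is factors[i] * proj_dim, that reshaping splits every projected frame into factors[i] consecutive frames, and that all streams are truncated to the shortest one before concatenation. The helper names split_frames and fuse are hypothetical.

```python
def split_frames(frames, factor):
    """Split each frame of width factor * d into `factor` frames of width d."""
    out = []
    for frame in frames:
        d = len(frame) // factor
        out.extend(frame[k * d:(k + 1) * d] for k in range(factor))
    return out


def fuse(streams, factors):
    """Align already-projected streams and concatenate them frame by frame.

    streams[i] is a list of frames (lists of floats) for post frontend i.
    """
    aligned = [split_frames(s, f) for s, f in zip(streams, factors)]
    # Assumption: drop trailing frames so every stream has the same length.
    n = min(len(a) for a in aligned)
    fused = [sum((a[t] for a in aligned), []) for t in range(n)]
    return fused, n


# Toy input with proj_dim = 2: stream 0 has factor 2 (projected width 4),
# stream 1 has factor 1 (projected width 2).
slow = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # 2 frames become 4
fast = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]  # 5 frames
fused, n = fuse([slow, fast], factors=[2, 1])
print(n, len(fused[0]))  # 4 4
print(fused[1])          # [3.0, 4.0, 0.3, 0.4]
```

Each fused frame has width len(streams) * proj_dim, here 2 * 2 = 4, and the fifth frame of the faster stream is dropped.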

Examples

>>> import torch
>>> from espnet2.gan_svs.post_frontend.fused import FusedPostFrontends
>>> fused_frontend = FusedPostFrontends(postfrontends=[{"postfrontend_type": "s3prl",
... "postfrontend_conf": {}, "download_dir": "/path/to/dir",
... "multilayer_feature": False}])
>>> input_tensor = torch.randn(10, 16000)  # Example input tensor
>>> input_lengths = torch.tensor([16000] * 10)  # Example input lengths
>>> output_feats, output_lengths = fused_frontend.forward(input_tensor, input_lengths)

NOTE: Ensure that the input tensor is correctly shaped and that the input lengths are accurate to avoid runtime errors during processing.

output_size() → int

Return the total output size of the fused features.

FusedPostFrontends combines different post frontends, specifically S3PRL post frontends, and aligns their outputs using a linear projection method. It is designed to integrate various frontend features into a single representation.

align_method

The method used for fusing the frontends. Currently supports only “linear_projection”.

Type: str

proj_dim

The dimensionality of the projection done on each post frontend.

Type: int

postfrontends

A list of the post frontends to combine.

Type: torch.nn.ModuleList

gcd

The greatest common divisor of the hop lengths of the post frontends.

Type: int

factors

A list of factors derived from the hop lengths of the post frontends.

Type: list

projection_layers

A list of linear layers for projecting features from each post frontend.

Type: torch.nn.ModuleList

Parameters (constructor):
- postfrontends (list) – A list of dictionaries, each containing configurations for the post frontends.
- align_method (str) – Method to align features, defaults to “linear_projection”.
- proj_dim (int) – Dimension of the projection, defaults to 100.
- fs (int) – Sampling frequency, defaults to 16000.
- input_fs (int) – Input sampling frequency, defaults to 24000.
Returns: The total output size of the fused features.
Return type: int
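
The returned size can be related to the constructor arguments with a small sketch. It assumes each post frontend contributes proj_dim features to the concatenated output, so the size is the number of post frontends times proj_dim. The helper fused_output_size is hypothetical and only encodes that assumed relationship.

```python
def fused_output_size(num_postfrontends, proj_dim=100):
    """Assumed feature width after concatenating the projected streams."""
    return num_postfrontends * proj_dim


print(fused_output_size(1))  # 100
print(fused_output_size(2))  # 200
```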

Examples

>>> from espnet2.gan_svs.post_frontend.fused import FusedPostFrontends
>>> post_frontends = [
...     {
...         "postfrontend_type": "s3prl",
...         "postfrontend_conf": {...},
...         "download_dir": "/path/to/download",
...         "multilayer_feature": True,
...     }
... ]
>>> fused_frontend = FusedPostFrontends(postfrontends=post_frontends)
>>> output_size = fused_frontend.output_size()
>>> print(output_size)
100  # Assuming proj_dim is 100; only 1 post frontend is configured above (2 would give 200)

NOTE: The class currently only supports S3PRL post frontends. Any other type will raise a NotImplementedError.

Raises: NotImplementedError – If a post frontend type other than “s3prl” is provided.