espnet2.gan_codec.shared.decoder.seanet_2d.SEANetDecoder2d

class espnet2.gan_codec.shared.decoder.seanet_2d.SEANetDecoder2d(channels: int = 1, dimension: int = 128, n_filters: int = 32, n_residual_layers: int = 1, ratios: List[Tuple[int, int]] = [(4, 1), (4, 1), (4, 2), (4, 1)], activation: str = 'ELU', activation_params: dict = {'alpha': 1.0}, final_activation: str | None = None, final_activation_params: dict | None = None, norm: str = 'weight_norm', norm_params: Dict[str, Any] = {}, kernel_size: int = 7, last_kernel_size: int = 7, residual_kernel_size: int = 3, dilation_base: int = 2, causal: bool = False, pad_mode: str = 'reflect', true_skip: bool = False, compress: int = 2, lstm: int = 2, trim_right_ratio: float = 1.0, res_seq=True, last_out_padding: List[int] = [(0, 1), (0, 0)], tr_conv_group_ratio: int = -1, conv_group_ratio: int = -1)

Bases: Module

SEANet decoder for audio signal processing.

This class implements the SEANet decoder architecture, which is designed to decode intermediate representations into audio signals. The decoder consists of a series of convolutional and residual layers, with optional normalization and activation functions applied throughout the network.
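To make the architecture description more concrete, the sketch below computes how the channel width could shrink across the upsampling stages. It assumes that SEANetDecoder2d follows the channel schedule of the original SEANet/EnCodec decoder: the width starts at `n_filters * 2**len(ratios)` and halves after every upsampling stage, and a final convolution then maps `n_filters` to `channels`. `decoder_channel_widths` is a hypothetical helper for illustration, not part of ESPnet.

```python
from typing import List, Tuple


def decoder_channel_widths(n_filters: int, ratios: List[Tuple[int, int]]) -> List[int]:
    """Hypothetical helper (assumption: SEANet/EnCodec channel schedule).

    Returns the channel width entering the first upsampling stage followed by
    the width after each stage.
    """
    mult = 2 ** len(ratios)          # initial multiplier of the base width
    widths = [mult * n_filters]      # width right after the initial convolution
    for _ in ratios:
        mult //= 2                   # each upsampling stage halves the width
        widths.append(mult * n_filters)
    return widths


# Defaults from the signature: n_filters=32 and four upsampling stages.
default_ratios = [(4, 1), (4, 1), (4, 2), (4, 1)]
print(decoder_channel_widths(32, default_ratios))  # [512, 256, 128, 64, 32]
```

Under this assumption, the defaults produce 512 channels after the initial convolution and end at the base width of 32 before the final convolution.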

Parameters:
- channels (int) – Audio channels.
- dimension (int) – Intermediate representation dimension.
- n_filters (int) – Base width for the model.
- n_residual_layers (int) – Number of residual layers.
- ratios (List[Tuple[int, int]]) – Kernel size and stride ratios, given as one 2D tuple per upsampling stage.
- activation (str) – Activation function.
- activation_params (dict) – Parameters to provide to the activation function.
- final_activation (str or None) – Final activation function applied after all convolutions.
- final_activation_params (dict or None) – Parameters to provide to the final activation function.
- norm (str) – Normalization method.
- norm_params (dict) – Parameters to provide to the underlying normalization used along with the convolution.
- kernel_size (int) – Kernel size for the initial convolution.
- last_kernel_size (int) – Kernel size for the last convolution.
- residual_kernel_size (int) – Kernel size for the residual layers.
- dilation_base (int) – How much to increase the dilation with each layer.
- causal (bool) – Whether to use fully causal convolution.
- pad_mode (str) – Padding mode for the convolutions.
- true_skip (bool) – Whether to use true skip connection or a simple (streamable) convolution as the skip connection in the residual network blocks.
- compress (int) – Reduced dimensionality in residual branches (from Demucs v3).
- lstm (int) – Number of LSTM layers used in the decoder.
- trim_right_ratio (float) – Ratio for trimming at the right of the transposed convolution under the causal setup. If equal to 1.0, it means that all the trimming is done at the right.
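The `ratios` and `dilation_base` parameters determine how much the decoder upsamples and how far its residual layers look. The sketch below assumes two conventions carried over from the original SEANet/EnCodec design rather than confirmed for this class: each stage upsamples by its ratio along the corresponding axis (so the overall factor is the per-axis product), and residual layer `j` within a stage uses dilation `dilation_base ** j`. Both helpers are hypothetical.

```python
import math
from typing import List, Tuple


def total_upsampling(ratios: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Hypothetical helper: overall upsampling factor for the first and second
    entries of the ratio tuples, assuming each stage multiplies the size of
    the corresponding axis by its ratio."""
    return (
        math.prod(r[0] for r in ratios),
        math.prod(r[1] for r in ratios),
    )


def residual_dilations(n_residual_layers: int, dilation_base: int) -> List[int]:
    """Hypothetical helper: dilation of each residual layer in one stage,
    assuming the SEANet convention dilation_base ** j for layer j."""
    return [dilation_base ** j for j in range(n_residual_layers)]


print(total_upsampling([(4, 1), (4, 1), (4, 2), (4, 1)]))  # (256, 2)
print(residual_dilations(3, 2))  # [1, 2, 4]
```

With the default `n_residual_layers=1`, each stage would contain a single residual layer with dilation 1. `dilation_base` only matters once more residual layers are stacked.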

Examples:

>>> import torch
>>> from espnet2.gan_codec.shared.decoder.seanet_2d import SEANetDecoder2d
>>> decoder = SEANetDecoder2d(channels=1, dimension=128, n_filters=32)
>>> input_tensor = torch.randn(10, 128, 64)  # batch of 10, 128 channels (= dimension), 64 time steps
>>> output = decoder(input_tensor)
>>> print(output.shape)  # expected torch.Size([10, 1, T]); T depends on the model configuration

model

The sequential model composed of layers defined in the constructor.

Type: nn.Sequential

NOTE

The final activation is optional and defaults to None.

forward(z)

Decode a latent representation into an audio signal.

This method converts the latent representation z back into an audio signal using the layers configured by the constructor. The parameters listed below are those constructor arguments; forward itself takes only z.

Parameters:
- channels (int) – Audio channels.
- dimension (int) – Intermediate representation dimension.
- n_filters (int) – Base width for the model.
- n_residual_layers (int) – Number of residual layers.
- ratios (List[Tuple[int, int]]) – Kernel size and stride ratios, given as one 2D tuple per upsampling stage.
- activation (str) – Activation function.
- activation_params (dict) – Parameters to provide to the activation function.
- final_activation (str or None) – Final activation function applied after all convolutions.
- final_activation_params (dict or None) – Parameters to provide to the final activation function.
- norm (str) – Normalization method.
- norm_params (dict) – Parameters to provide to the underlying normalization used along with the convolution.
- kernel_size (int) – Kernel size for the initial convolution.
- last_kernel_size (int) – Kernel size for the final convolution.
- residual_kernel_size (int) – Kernel size for the residual layers.
- dilation_base (int) – How much to increase the dilation with each layer.
- causal (bool) – Whether to use fully causal convolution.
- pad_mode (str) – Padding mode for the convolutions.
- true_skip (bool) – Whether to use true skip connection or a simple (streamable) convolution as the skip connection in the residual network blocks.
- compress (int) – Reduced dimensionality in residual branches (from Demucs v3).
- lstm (int) – Number of LSTM layers used in the decoder.
- trim_right_ratio (float) – Ratio for trimming at the right of the transposed convolution under the causal setup. If equal to 1.0, it means that all the trimming is done at the right.
- res_seq (bool) – Whether to use residual sequences.
- last_out_padding (List[Tuple[int, int]]) – Padding for the last output.
- tr_conv_group_ratio (int) – Group ratio for transposed convolution.
- conv_group_ratio (int) – Group ratio for convolution.
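The `trim_right_ratio` description ("if equal to 1.0, all the trimming is done at the right") can be illustrated with a small sketch. It assumes the EnCodec-style rule for streamable transposed convolutions: a transposed convolution yields `kernel_size - stride` extra samples along the upsampled axis, which are trimmed according to `trim_right_ratio` when `causal=True` and split roughly evenly otherwise. `transposed_conv_trim` is a hypothetical helper, and the kernel size of `2 * ratio` in the usage line is likewise an assumption borrowed from the 1D SEANet decoder.

```python
import math
from typing import Tuple


def transposed_conv_trim(
    kernel_size: int, stride: int, causal: bool, trim_right_ratio: float = 1.0
) -> Tuple[int, int]:
    """Hypothetical helper: (left, right) amounts trimmed from the output of a
    transposed convolution, following the EnCodec-style convention."""
    padding_total = kernel_size - stride
    if causal:
        # trim_right_ratio controls the share removed from the right edge.
        padding_right = math.ceil(padding_total * trim_right_ratio)
    else:
        # Non-causal: split (nearly) symmetrically, extra sample on the left.
        padding_right = padding_total // 2
    padding_left = padding_total - padding_right
    return padding_left, padding_right


# A stage with ratio 4 would use kernel 8 and stride 4 under the assumed convention.
print(transposed_conv_trim(8, 4, causal=True))                        # (0, 4)
print(transposed_conv_trim(8, 4, causal=True, trim_right_ratio=0.5))  # (2, 2)
print(transposed_conv_trim(8, 4, causal=False))                       # (2, 2)
```

Trimming everything on the right (`trim_right_ratio=1.0`) keeps the output aligned with past inputs only, which is what a fully causal, streamable decoder needs.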

Examples:

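>>> import torch
>>> from espnet2.gan_codec.shared.decoder.seanet_2d import SEANetDecoder2d
>>> decoder = SEANetDecoder2d(channels=1, dimension=128, n_filters=32)
>>> z = torch.randn(1, 128, 256)  # latent representation; the channel size must equal dimension
>>> output = decoder(z)
>>> print(output.shape)  # expected shape: (1, 1, T)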

NOTE

The decoder expects the input tensor to have shape (B, C, T), where B is the batch size, C is the number of channels (equal to `dimension`), and T is the sequence length.
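The shape contract in the note above can be checked before calling the decoder. The sketch below is a hypothetical helper that works on a plain shape tuple (for example `tuple(z.shape)`), assuming that C must equal the `dimension` constructor argument, as the examples suggest. It is not an ESPnet function, and the decoder itself may report mismatches differently.

```python
from typing import Sequence


def check_latent_shape(shape: Sequence[int], dimension: int) -> None:
    """Hypothetical helper: raise ValueError unless shape is (B, C, T)
    with C equal to the decoder's `dimension` argument (assumption)."""
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D shape (B, C, T), got {tuple(shape)}")
    if shape[1] != dimension:
        raise ValueError(f"expected C == dimension == {dimension}, got C == {shape[1]}")


check_latent_shape((1, 128, 256), dimension=128)  # passes silently
```

Running the check on `(1, 32, 256)` with `dimension=128` raises a `ValueError`. That mismatch would otherwise only surface inside the first convolution.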

Raises: AssertionError – If the dimensions of the input parameters are inconsistent.

output_size()

Returns the number of output channels of the SEANet decoder.

The value corresponds to the number of audio channels the decoder produces (the `channels` constructor argument). It can be used to check that the decoder output matches the expected audio channel layout.

Returns: The number of output channels of the decoder.
Return type: int

Examples:

>>> decoder = SEANetDecoder2d(channels=2)
>>> output_channels = decoder.output_size()
>>> print(output_channels)
2