espnet2.gan_codec.shared.decoder.seanet.SEANetDecoder
class espnet2.gan_codec.shared.decoder.seanet.SEANetDecoder(channels: int = 1, dimension: int = 128, n_filters: int = 32, n_residual_layers: int = 1, ratios: List[int] = [8, 5, 4, 2], activation: str = 'ELU', activation_params: dict = {'alpha': 1.0}, final_activation: str | None = None, final_activation_params: dict | None = None, norm: str = 'weight_norm', norm_params: Dict[str, Any] = {}, kernel_size: int = 7, last_kernel_size: int = 7, residual_kernel_size: int = 3, dilation_base: int = 2, causal: bool = False, pad_mode: str = 'reflect', true_skip: bool = False, compress: int = 2, lstm: int = 2, trim_right_ratio: float = 1.0)
Bases: Module
SEANet decoder.
This class implements a SEANet-based decoder for audio processing using transposed convolutions and optional LSTM layers. It allows for customizable activation functions, normalization methods, and residual connections.
- Parameters:
- channels (int) – Audio channels.
- dimension (int) – Intermediate representation dimension.
- n_filters (int) – Base width for the model.
- n_residual_layers (int) – Number of residual layers.
- ratios (Sequence[int]) – Kernel size and stride ratios.
- activation (str) – Activation function.
- activation_params (dict) – Parameters to provide to the activation function.
- final_activation (Optional[str]) – Final activation function after all convolutions.
- final_activation_params (Optional[dict]) – Parameters to provide to the final activation function.
- norm (str) – Normalization method.
- norm_params (dict) – Parameters to provide to the underlying normalization used along with the convolution.
- kernel_size (int) – Kernel size for the initial convolution.
- last_kernel_size (int) – Kernel size for the final convolution.
- residual_kernel_size (int) – Kernel size for the residual layers.
- dilation_base (int) – How much to increase the dilation with each layer.
- causal (bool) – Whether to use fully causal convolution.
- pad_mode (str) – Padding mode for the convolutions.
- true_skip (bool) – Whether to use true skip connection or a simple (streamable) convolution as the skip connection in the residual network blocks.
- compress (int) – Reduced dimensionality in residual branches (from Demucs v3).
- lstm (int) – Number of LSTM layers in the decoder.
- trim_right_ratio (float) – Ratio for trimming at the right of the transposed convolution under the causal setup. If equal to 1.0, it means that all the trimming is done at the right.
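The effect of trim_right_ratio in the causal setup can be illustrated with a small sketch. It assumes that each transposed convolution uses a stride equal to its ratio and a kernel size of twice the ratio (the layout of the Encodec reference design, not something this page confirms), so that kernel_size - stride samples have to be trimmed from its output. The helper name split_causal_trim is hypothetical and is not part of ESPnet.

```python
import math


def split_causal_trim(kernel_size: int, stride: int, trim_right_ratio: float = 1.0):
    """Split the samples to trim from a transposed convolution's output.

    Hypothetical helper, not ESPnet API. Assumes the total amount to trim is
    kernel_size - stride and that trim_right_ratio is the share of it taken
    from the right edge.
    """
    if not 0.0 <= trim_right_ratio <= 1.0:
        raise ValueError("trim_right_ratio must be in [0, 1]")
    total = kernel_size - stride
    right = math.ceil(total * trim_right_ratio)
    left = total - right
    return left, right


# Assumed per-stage layout for the default ratios [8, 5, 4, 2]:
# stride = ratio, kernel_size = 2 * ratio.
for ratio in [8, 5, 4, 2]:
    left, right = split_causal_trim(2 * ratio, ratio, trim_right_ratio=1.0)
    print(f"ratio={ratio}: trim left={left}, right={right}")
```

With trim_right_ratio=1.0 the left trim is zero at every stage, which matches the description of the parameter above. Smaller values move part of the trimming to the left edge.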
####### Examples
>>> import torch
>>> from espnet2.gan_codec.shared.decoder.seanet import SEANetDecoder
>>> decoder = SEANetDecoder(channels=1, dimension=128, n_filters=32)
>>> input_tensor = torch.randn(1, 128, 256)  # (batch_size, dimension, frames)
>>> output_tensor = decoder(input_tensor)
>>> print(output_tensor.shape)  # Output shape will depend on configuration
forward(z)
Decode an intermediate representation into an audio signal.
- Parameters: z (torch.Tensor) – Intermediate representation of shape (batch_size, dimension, frames).
####### Examples
>>> decoder = SEANetDecoder(channels=1, dimension=128)
>>> input_tensor = torch.randn(1, 128, 100) # Example input
>>> output_tensor = decoder(input_tensor)
>>> output_tensor.shape
torch.Size([1, 1, <output_length>]) # Output length depends on config
- Returns: The decoded audio signal, of shape (batch_size, channels, length).
- Return type: torch.Tensor
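A rough way to anticipate the output length is to multiply the number of input frames by the product of ratios. The sketch below assumes that each upsampling stage scales the time axis by exactly its ratio and that all other layers preserve length. expected_output_shape is a hypothetical helper, not part of ESPnet, and its result should be checked against the actual decoder output.

```python
import math
from typing import Sequence, Tuple


def expected_output_shape(
    z_shape: Tuple[int, int, int],
    channels: int = 1,
    ratios: Sequence[int] = (8, 5, 4, 2),
) -> Tuple[int, int, int]:
    """Estimate the decoder output shape from the input shape.

    Hypothetical helper. Assumes the time axis grows by the product of
    `ratios` and that no other layer changes the length.
    """
    batch_size, _dimension, frames = z_shape
    upsampling = math.prod(ratios)  # 8 * 5 * 4 * 2 = 320 for the defaults
    return batch_size, channels, frames * upsampling


# Input shaped like the example above: (batch_size=1, dimension=128, frames=100)
print(expected_output_shape((1, 128, 100)))  # (1, 1, 32000) under the stated assumption
```

Under this assumption the default configuration produces 320 output samples per input frame.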