espnet2.gan_tts.melgan.melgan.MelGANDiscriminator


class espnet2.gan_tts.melgan.melgan.MelGANDiscriminator(in_channels: int = 1, out_channels: int = 1, kernel_sizes: List[int] = [5, 3], channels: int = 16, max_downsample_channels: int = 1024, bias: bool = True, downsample_scales: List[int] = [4, 4, 4, 4], nonlinear_activation: str = 'LeakyReLU', nonlinear_activation_params: Dict[str, Any] = {'negative_slope': 0.2}, pad: str = 'ReflectionPad1d', pad_params: Dict[str, Any] = {})

Bases: Module

MelGAN discriminator module.

This class implements the discriminator used in the MelGAN architecture. It takes an audio waveform as input and is trained to distinguish real samples from generated ones.

layers

A list of layers that constitute the discriminator.

Type: torch.nn.ModuleList
Parameters:
- in_channels (int) – Number of input channels. Default is 1.
- out_channels (int) – Number of output channels. Default is 1.
- kernel_sizes (List[int]) – List of two kernel sizes, both of which must be odd. Their product is the kernel size of the first conv layer, and the first and second values are the kernel sizes of the last two layers. For example, if kernel_sizes = [5, 3], the first layer's kernel size is 5 * 3 = 15 and the last two layers' kernel sizes are 5 and 3, respectively. Default is [5, 3].
- channels (int) – Initial number of channels for conv layer. Default is 16.
- max_downsample_channels (int) – Maximum number of channels for downsampling layers. Default is 1024.
- bias (bool) – Whether to add bias parameter in convolution layers. Default is True.
- downsample_scales (List[int]) – List of downsampling scales. Default is [4, 4, 4, 4].
- nonlinear_activation (str) – Activation function module name. Default is "LeakyReLU".
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for activation function. Default is {"negative_slope": 0.2}.
- pad (str) – Padding function module name before dilated convolution layer. Default is "ReflectionPad1d".
- pad_params (Dict[str, Any]) – Hyperparameters for padding function. Default is {}.
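
The kernel_sizes rule above can be sketched in plain Python. This is an illustration only, not the module's code: `derive_kernel_sizes` and `same_padding` are hypothetical helper names, and the link between odd kernel sizes and symmetric padding is an assumption about why the odd-size check exists.

```python
from math import prod
from typing import List, Tuple


def derive_kernel_sizes(kernel_sizes: List[int]) -> Tuple[int, int, int]:
    """Hypothetical helper: return (first, second-to-last, last) kernel sizes."""
    if len(kernel_sizes) != 2:
        raise ValueError("kernel_sizes must contain exactly two values")
    # Mirrors the documented AssertionError for kernel sizes that are not odd.
    assert all(k % 2 == 1 for k in kernel_sizes), "kernel sizes must be odd"
    first = prod(kernel_sizes)  # a product of odd numbers is itself odd
    return first, kernel_sizes[0], kernel_sizes[1]


def same_padding(kernel_size: int) -> int:
    """Per-side padding that preserves length for an odd kernel at stride 1."""
    return (kernel_size - 1) // 2


first, second_last, last = derive_kernel_sizes([5, 3])
print(first, second_last, last)  # 15 5 3
print(same_padding(first))       # 7
```

With an odd kernel size k, padding (k - 1) // 2 samples on each side keeps the time dimension unchanged at stride 1, which an even k cannot do symmetrically.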

####### Examples

>>> import torch
>>> from espnet2.gan_tts.melgan.melgan import MelGANDiscriminator
>>> discriminator = MelGANDiscriminator()
>>> input_tensor = torch.randn(2, 1, 16000)  # (B, 1, T)
>>> output = discriminator(input_tensor)
>>> len(output)  # One tensor per layer, i.e. len(discriminator.layers)

Returns: List of output tensors of each layer.
Return type: List[Tensor]
Raises: AssertionError – If any of the kernel sizes is not odd.

Initialize MelGANDiscriminator module.

Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- kernel_sizes (List[int]) – List of two kernel sizes. Their product is the kernel size of the first conv layer, and the first and second values are the kernel sizes of the last two layers. For example, if kernel_sizes = [5, 3], the first layer's kernel size is 5 * 3 = 15 and the last two layers' kernel sizes are 5 and 3, respectively.
- channels (int) – Initial number of channels for conv layer.
- max_downsample_channels (int) – Maximum number of channels for downsampling layers.
- bias (bool) – Whether to add bias parameter in convolution layers.
- downsample_scales (List[int]) – List of downsampling scales.
- nonlinear_activation (str) – Activation function module name.
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for activation function.
- pad (str) – Padding function module name before dilated convolution layer.
- pad_params (Dict[str, Any]) – Hyperparameters for padding function.

forward(x: Tensor) → List[Tensor]

Calculate forward propagation.

This method performs the forward pass of the MelGAN discriminator. It passes the input tensor through the layers in order and returns the output of every layer, not only the final one.

Parameters: x (Tensor) – Input signal (B, in_channels, T), i.e. (B, 1, T) with the default settings.
Returns: List of output tensors of each layer.
Return type: List[Tensor]
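
Since the signature of forward is List[Tensor], the pattern of collecting one output per layer can be sketched without any framework. This is a minimal illustration that assumes the layers are applied strictly in sequence, each one consuming the previous output. `forward_collect` and `toy_layers` are hypothetical names, and plain numbers stand in for tensors.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def forward_collect(layers: Sequence[Callable[[T], T]], x: T) -> List[T]:
    """Apply the layers in order and keep the output of every layer."""
    outs: List[T] = []
    for layer in layers:
        x = layer(x)  # the output of this layer feeds the next one
        outs.append(x)
    return outs


# Stand-in "layers" that operate on a plain number instead of a tensor.
toy_layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
outs = forward_collect(toy_layers, 5)
print(outs)       # [6, 12, 9]
print(len(outs))  # 3
```

The returned list has one entry per layer, and its last entry is the final output of the stack.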

####### Examples

>>> import torch
>>> from espnet2.gan_tts.melgan.melgan import MelGANDiscriminator
>>> model = MelGANDiscriminator()
>>> input_tensor = torch.randn(16, 1, 16000)  # (B, 1, T), batch size of 16
>>> outputs = model(input_tensor)
>>> isinstance(outputs, list)
True
>>> len(outputs) == len(model.layers)  # One output tensor per layer
True