espnet2.gan_tts.hifigan.hifigan.HiFiGANScaleDiscriminator
class espnet2.gan_tts.hifigan.hifigan.HiFiGANScaleDiscriminator(in_channels: int = 1, out_channels: int = 1, kernel_sizes: List[int] = [15, 41, 5, 3], channels: int = 128, max_downsample_channels: int = 1024, max_groups: int = 16, bias: bool = True, downsample_scales: List[int] = [2, 2, 4, 4, 1], nonlinear_activation: str = 'LeakyReLU', nonlinear_activation_params: Dict[str, Any] = {'negative_slope': 0.1}, use_weight_norm: bool = True, use_spectral_norm: bool = False)
Bases: Module
HiFi-GAN scale discriminator module.
This class implements a single scale discriminator of the HiFi-GAN model. It distinguishes between real and generated audio by extracting features with a stack of strided, grouped 1D convolutional layers that progressively downsample the input waveform; HiFi-GAN's multi-scale discriminator combines several of these modules operating on the audio at different temporal resolutions.
layers
A list of sequential layers comprising the discriminator.
Type: ModuleList
Parameters:
- in_channels (int) – Number of input channels. Defaults to 1.
- out_channels (int) – Number of output channels. Defaults to 1.
- kernel_sizes (List[int]) – List of four kernel sizes. The first is used for the first conv layer, the second for the downsampling part, and the remaining two for the last two output layers. Defaults to [15, 41, 5, 3].
- channels (int) – Initial number of channels for conv layer. Defaults to 128.
- max_downsample_channels (int) – Maximum number of channels for downsampling layers. Defaults to 1024.
- max_groups (int) – Maximum number of groups for group convolution. Defaults to 16.
- bias (bool) – Whether to add bias parameter in convolution layers. Defaults to True.
- downsample_scales (List[int]) – List of downsampling scales. Defaults to [2, 2, 4, 4, 1].
- nonlinear_activation (str) – Activation function module name. Defaults to “LeakyReLU”.
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for activation function. Defaults to {"negative_slope": 0.1}.
- use_weight_norm (bool) – Whether to use weight norm. If set to True, it will be applied to all of the conv layers. Defaults to True.
- use_spectral_norm (bool) – Whether to use spectral norm. If set to True, it will be applied to all of the conv layers. Defaults to False.
Raises:ValueError – If both use_weight_norm and use_spectral_norm are True.
############### Examples
>>> discriminator = HiFiGANScaleDiscriminator()
>>> input_tensor = torch.randn(1, 1, 1024) # (B, C, T)
>>> outputs = discriminator(input_tensor)
>>> len(outputs)  # one tensor per layer: first conv + 5 downsample + 2 output convs
8
>>> outputs[0].shape
torch.Size([1, 128, 1024])
>>> outputs[-1].shape
torch.Size([1, 1, 16])
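The downsampling stack follows a simple schedule: the channel count roughly doubles after each strided convolution, capped at max_downsample_channels, while the group count of the grouped convolutions grows up to max_groups. Below is a minimal sketch of that schedule, assuming the doubling rule described above; the starting group count is an assumption and the snippet is illustrative rather than the actual ESPnet source.

import torch

# Hedged sketch of the downsampling schedule implied by the parameters
# above; illustrative only, not the ESPnet source.
channels = 128
max_downsample_channels = 1024
max_groups = 16
downsample_scales = [2, 2, 4, 4, 1]
kernel_size = 41  # kernel_sizes[1] is used for the downsampling part

in_chs, out_chs, groups = channels, channels, 4  # starting groups=4 is an assumption
layers = []
for scale in downsample_scales:
    layers.append(
        torch.nn.Conv1d(
            in_chs, out_chs, kernel_size,
            stride=scale, padding=(kernel_size - 1) // 2, groups=groups,
        )
    )
    print(f"Conv1d({in_chs} -> {out_chs}, stride={scale}, groups={groups})")
    in_chs = out_chs
    out_chs = min(in_chs * 2, max_downsample_channels)  # channel growth is capped
    groups = min(groups * 4, max_groups)  # group growth is capped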
Initialize HiFiGAN scale discriminator module.
- Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- kernel_sizes (List[int]) – List of four kernel sizes. The first is used for the first conv layer, the second for the downsampling part, and the remaining two for the last two output layers.
- channels (int) – Initial number of channels for conv layer.
- max_downsample_channels (int) – Maximum number of channels for downsampling layers.
- max_groups (int) – Maximum number of groups for group convolution.
- bias (bool) – Whether to add bias parameter in convolution layers.
- downsample_scales (List[int]) – List of downsampling scales.
- nonlinear_activation (str) – Activation function module name.
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for activation function.
- use_weight_norm (bool) – Whether to use weight norm. If set to True, it will be applied to all of the conv layers.
- use_spectral_norm (bool) – Whether to use spectral norm. If set to True, it will be applied to all of the conv layers.
apply_spectral_norm()
Apply spectral normalization to all of the layers.
This method applies spectral normalization to all Conv1d layers within the HiFiGANScaleDiscriminator module. Spectral normalization is a technique used to stabilize the training of generative adversarial networks (GANs) by controlling the Lipschitz constant of the network.
######### NOTE This method modifies the layers in place. It is called automatically during initialization when use_spectral_norm=True, so it only needs to be called manually on a module constructed without normalization.
############### Examples
>>> discriminator = HiFiGANScaleDiscriminator(
...     use_weight_norm=False, use_spectral_norm=False)
>>> discriminator.apply_spectral_norm()
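Internally, this kind of recursive application is typically implemented with Module.apply. A minimal sketch, assuming spectral norm should wrap every Conv1d submodule; the helper below is illustrative, not the ESPnet source.

import torch

def apply_spectral_norm_to(module: torch.nn.Module) -> None:
    # Wrap every Conv1d submodule in spectral normalization.
    def _apply(m: torch.nn.Module) -> None:
        if isinstance(m, torch.nn.Conv1d):
            torch.nn.utils.spectral_norm(m)
    module.apply(_apply)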
apply_weight_norm()
Apply weight normalization to all of the layers.
This method applies weight normalization to all convolutional layers in the HiFiGAN scale discriminator. Weight normalization reparameterizes each layer's weights into a magnitude and a direction, which can improve training speed and stability. It is applied automatically during initialization when the use_weight_norm parameter is set to True.
############### Examples
>>> discriminator = HiFiGANScaleDiscriminator(use_weight_norm=False)
>>> discriminator.apply_weight_norm()
######### NOTE Weight normalization is particularly useful when training deep neural networks as it can lead to faster convergence and improved performance.
forward(x: Tensor) → List[Tensor]
Calculate forward propagation.
This method computes the forward pass of the scale discriminator by passing the input waveform through each layer in turn and collecting every intermediate feature map. The per-layer outputs are returned so that, in addition to the final discriminator score, they can be used for feature-matching losses.
- Parameters: x (torch.Tensor) – Input tensor of shape (B, in_channels, T), where B is the batch size, in_channels is the number of input channels, and T is the length of the input waveform.
- Returns: List of output tensors from each layer. The last entry is the final discriminator output of shape (B, out_channels, T'), where T' is the length after downsampling.
- Return type: List[torch.Tensor]
############### Examples
>>> discriminator = HiFiGANScaleDiscriminator()
>>> x = torch.randn(1, 1, 1024)  # (B, in_channels, T)
>>> outs = discriminator(x)
>>> isinstance(outs, list)  # one tensor per layer
True
######### NOTE The input tensor must have the number of channels specified by in_channels during the initialization of the HiFiGANScaleDiscriminator.
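Conceptually, the forward pass threads the input through self.layers and keeps every intermediate result. A minimal standalone sketch of that loop, with illustrative names rather than the ESPnet source:

import torch
from typing import List

def collect_layer_outputs(
    layers: torch.nn.ModuleList, x: torch.Tensor
) -> List[torch.Tensor]:
    # Run x through each layer in turn, keeping every intermediate
    # feature map so it can feed a feature-matching loss.
    outs: List[torch.Tensor] = []
    for layer in layers:
        x = layer(x)
        outs.append(x)
    return outs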
remove_spectral_norm()
Remove spectral normalization module from all of the layers.
This method iterates through all layers of the model and removes spectral normalization where it has been applied. It is useful when you want to revert the model to its original parameterization. Layers without spectral normalization are skipped: the internal ValueError from torch.nn.utils.remove_spectral_norm is caught, so this method does not raise.
############### Examples
>>> discriminator = HiFiGANScaleDiscriminator(
...     use_weight_norm=False, use_spectral_norm=True)
>>> discriminator.remove_spectral_norm()
######### NOTE Call this method when you need to switch between weight and spectral normalization during model training or inference.
remove_weight_norm()
Remove weight normalization module from all of the layers.
This method traverses all layers of the HiFiGANScaleDiscriminator and removes weight normalization where it is applied, logging a message for each layer from which it is removed. Layers without weight normalization raise a ValueError internally, which is caught so that iteration continues without error.
######### NOTE This method is useful for models that were previously using weight normalization and need to revert to the standard weight parameters for compatibility or performance reasons.
############### Examples
>>> discriminator = HiFiGANScaleDiscriminator()  # weight norm applied in __init__
>>> discriminator.remove_weight_norm()
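The catch-and-continue behavior described above can be sketched as follows; this is a hedged illustration of the pattern, not the ESPnet source.

import logging
import torch

def remove_weight_norm_from(module: torch.nn.Module) -> None:
    def _remove(m: torch.nn.Module) -> None:
        try:
            torch.nn.utils.remove_weight_norm(m)
            logging.debug(f"Weight norm is removed from {m}.")
        except ValueError:
            # This layer never had weight norm applied; skip it.
            pass
    module.apply(_remove)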