espnet2.gan_tts.hifigan.hifigan.HiFiGANMultiScaleMultiPeriodDiscriminator

class espnet2.gan_tts.hifigan.hifigan.HiFiGANMultiScaleMultiPeriodDiscriminator(scales: int = 3, scale_downsample_pooling: str = 'AvgPool1d', scale_downsample_pooling_params: Dict[str, Any] = {'kernel_size': 4, 'padding': 2, 'stride': 2}, scale_discriminator_params: Dict[str, Any] = {'bias': True, 'channels': 128, 'downsample_scales': [2, 2, 4, 4, 1], 'in_channels': 1, 'kernel_sizes': [15, 41, 5, 3], 'max_downsample_channels': 1024, 'max_groups': 16, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.1}, 'out_channels': 1}, follow_official_norm: bool = True, periods: List[int] = [2, 3, 5, 7, 11], period_discriminator_params: Dict[str, Any] = {'bias': True, 'channels': 32, 'downsample_scales': [3, 3, 3, 3, 1], 'in_channels': 1, 'kernel_sizes': [5, 3], 'max_downsample_channels': 1024, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.1}, 'out_channels': 1, 'use_spectral_norm': False, 'use_weight_norm': True})

Bases: Module

HiFi-GAN multi-scale + multi-period discriminator module.

This module combines multiple scale and period discriminators to evaluate generated audio signals across different scales and periods, enhancing the ability to distinguish real audio from generated audio.

Parameters:
- scales (int) – Number of multi-scale discriminators.
- scale_downsample_pooling (str) – Name of the torch.nn pooling module used to downsample the input between successive scales.
- scale_downsample_pooling_params (Dict[str, Any]) – Parameters for the above pooling module.
- scale_discriminator_params (Dict[str, Any]) – Parameters for the HiFi-GAN scale discriminator module.
- follow_official_norm (bool) – Whether to follow the norm setting of the official implementation. If True, the first scale discriminator uses spectral norm and the other scale discriminators use weight norm.
- periods (List[int]) – List of periods, one period discriminator per entry.
- period_discriminator_params (Dict[str, Any]) – Parameters for the HiFi-GAN period discriminator module. The period parameter is overwritten with each entry of periods.
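The default parameters determine how long the signal is at each scale and how each period discriminator reshapes it. The sketch below works this out in plain Python for a 16000-sample input. It assumes `torch.nn.AvgPool1d` semantics (`ceil_mode=False`) for the default `scale_downsample_pooling_params`, and assumes that each period discriminator right-pads the signal to a multiple of its period before viewing it as a 2-D (frames, period) map. The helper names `pool1d_out_len`, `scale_input_lengths`, and `period_fold_shape` are hypothetical and are not part of ESPnet.

```python
from typing import List, Tuple


def pool1d_out_len(length: int, kernel_size: int = 4, stride: int = 2, padding: int = 2) -> int:
    # Output-length formula for 1-D average pooling with ceil_mode=False,
    # as documented for torch.nn.AvgPool1d; defaults mirror
    # scale_downsample_pooling_params.
    return (length + 2 * padding - kernel_size) // stride + 1


def scale_input_lengths(length: int, scales: int = 3) -> List[int]:
    # The first scale discriminator sees the raw signal; each later one
    # sees the signal after one more pooling step.
    lengths = [length]
    for _ in range(scales - 1):
        lengths.append(pool1d_out_len(lengths[-1]))
    return lengths


def period_fold_shape(length: int, period: int) -> Tuple[int, int]:
    # Assumed folding (hypothetical helper): right-pad to a multiple of
    # `period`, then view the 1-D signal as a (frames, period) 2-D map.
    padded = length + (-length) % period
    return padded // period, period


if __name__ == "__main__":
    print(scale_input_lengths(16000))  # [16000, 8001, 4001]
    for p in [2, 3, 5, 7, 11]:
        print(p, period_fold_shape(16000, p))
```

Under this folding, column j of the 2-D map holds samples j, j + p, j + 2p, and so on. This arrangement lets each period discriminator look at structure that repeats every p samples.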

####### Examples

>>> import torch
>>> from espnet2.gan_tts.hifigan.hifigan import HiFiGANMultiScaleMultiPeriodDiscriminator
>>> discriminator = HiFiGANMultiScaleMultiPeriodDiscriminator()
>>> noise_signal = torch.randn(1, 1, 16000)  # (B, 1, T) input signal
>>> outputs = discriminator(noise_signal)
>>> print(len(outputs))  # 3 scale + 5 period discriminators with the defaults
8

Returns: List of lists of each discriminator's outputs, each of which consists of the output tensors of every layer. Multi-scale and multi-period outputs are concatenated.
Return type: List[List[Tensor]]
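The multi-scale outputs come first in the returned list, so the list can be split by index. The following is a minimal sketch that uses nested lists of strings as hypothetical stand-ins for the per-layer tensors. It assumes the default `scales=3` and `periods=[2, 3, 5, 7, 11]`, with an arbitrary three "layers" per discriminator. Adversarial losses typically score only the last entry of each inner list (the final discriminator output), while the earlier entries serve as intermediate feature maps. The helpers `split_outputs` and `final_outputs` are illustrative names, not ESPnet APIs.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_outputs(outputs: Sequence[Sequence[T]], scales: int) -> Tuple[List[Sequence[T]], List[Sequence[T]]]:
    # Multi-scale outputs occupy the first `scales` slots; the rest are multi-period.
    return list(outputs[:scales]), list(outputs[scales:])


def final_outputs(outputs: Sequence[Sequence[T]]) -> List[T]:
    # Pick the last layer output of each discriminator.
    return [layer_outs[-1] for layer_outs in outputs]


# Hypothetical stand-in for the forward() result: strings instead of tensors.
scales, periods = 3, [2, 3, 5, 7, 11]
names = [f"msd{i}" for i in range(scales)] + [f"mpd_p{p}" for p in periods]
fake_outputs = [[f"{n}_layer{j}" for j in range(3)] for n in names]

msd_outs, mpd_outs = split_outputs(fake_outputs, scales)
print(len(fake_outputs), len(msd_outs), len(mpd_outs))  # 8 3 5
print(final_outputs(mpd_outs)[0])  # mpd_p2_layer2
```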


forward(x: Tensor) → List[List[Tensor]]

Calculate forward propagation.

This method passes the input signal through the multi-scale discriminator and the multi-period discriminator, then concatenates their outputs with the multi-scale outputs first.

Parameters:
- x (Tensor) – Input signal of shape (B, 1, T), where B is the batch size and T is the number of samples.
Returns: List of lists of each discriminator's outputs, each of which consists of the output tensors of every layer. Multi-scale and multi-period outputs are concatenated.
Return type: List[List[Tensor]]

####### Examples

>>> import torch
>>> from espnet2.gan_tts.hifigan.hifigan import HiFiGANMultiScaleMultiPeriodDiscriminator
>>> discriminator = HiFiGANMultiScaleMultiPeriodDiscriminator()
>>> outputs = discriminator(torch.randn(2, 1, 16000))
>>> len(outputs)
8
>>> outputs[0][-1].shape[0]  # batch dimension is preserved
2