espnet2.gan_tts.hifigan.hifigan.HiFiGANMultiScaleDiscriminator
class espnet2.gan_tts.hifigan.hifigan.HiFiGANMultiScaleDiscriminator(scales: int = 3, downsample_pooling: str = 'AvgPool1d', downsample_pooling_params: Dict[str, Any] = {'kernel_size': 4, 'padding': 2, 'stride': 2}, discriminator_params: Dict[str, Any] = {'bias': True, 'channels': 128, 'downsample_scales': [2, 2, 4, 4, 1], 'in_channels': 1, 'kernel_sizes': [15, 41, 5, 3], 'max_downsample_channels': 1024, 'max_groups': 16, 'nonlinear_activation': 'LeakyReLU', 'nonlinear_activation_params': {'negative_slope': 0.1}, 'out_channels': 1}, follow_official_norm: bool = False)
Bases: Module
HiFi-GAN multi-scale discriminator module.
This module implements a multi-scale discriminator for HiFi-GAN, which is designed to distinguish real and generated audio signals at multiple scales. It uses a series of discriminators, each processing the input signal at a different scale, enabling the model to capture various frequency characteristics.
discriminators
A list of individual HiFiGAN scale discriminators.
- Type: torch.nn.ModuleList
pooling
A downsampling pooling layer applied to the input signal between scales. It is None when only a single scale is used.
- Type: Optional[torch.nn.Module]
Parameters:
- scales (int) – Number of scales, with one discriminator per scale.
- downsample_pooling (str) – Pooling module name for downsampling of the inputs.
- downsample_pooling_params (Dict[str, Any]) – Parameters for the above pooling module.
- discriminator_params (Dict[str, Any]) – Parameters for the HiFi-GAN scale discriminator module.
- follow_official_norm (bool) – Whether to follow the norm setting of the official implementation. If True, the first discriminator uses spectral norm and the other discriminators use weight norm.
####### Examples
>>> import torch
>>> from espnet2.gan_tts.hifigan.hifigan import HiFiGANMultiScaleDiscriminator
>>> discriminator = HiFiGANMultiScaleDiscriminator(scales=3)
>>> input_signal = torch.randn(1, 1, 2048)  # (Batch, Channels, Time)
>>> outputs = discriminator(input_signal)
>>> print(len(outputs))  # 3, one entry per scale
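To make "a different scale" concrete in terms of signal length, the sketch below applies the standard output-length formula for 1-D pooling (the one documented for torch.nn.AvgPool1d with ceil_mode=False) to the default downsample_pooling_params. It assumes the pooling is applied once between consecutive scales; the helper names pooled_length and lengths_per_scale are hypothetical and not part of ESPnet.

```python
from typing import List


def pooled_length(length: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output length of a 1-D pooling layer (floor mode, no dilation)."""
    return (length + 2 * padding - kernel_size) // stride + 1


def lengths_per_scale(length: int, scales: int, **pool_params: int) -> List[int]:
    """Signal length seen by each scale, assuming one pooling step between scales."""
    lengths = [length]
    for _ in range(scales - 1):
        lengths.append(pooled_length(lengths[-1], **pool_params))
    return lengths


# Default pooling parameters taken from the class signature.
print(lengths_per_scale(2048, scales=3, kernel_size=4, stride=2, padding=2))
# [2048, 1025, 513]
```

With the defaults, each additional scale therefore sees roughly half as many samples as the previous one.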
Initialize HiFiGAN multi-scale discriminator module.
- Parameters:
- scales (int) – Number of scales, with one discriminator per scale.
- downsample_pooling (str) – Pooling module name for downsampling of the inputs.
- downsample_pooling_params (Dict[str, Any]) – Parameters for the above pooling module.
- discriminator_params (Dict[str, Any]) – Parameters for the HiFi-GAN scale discriminator module.
- follow_official_norm (bool) – Whether to follow the norm setting of the official implementation. If True, the first discriminator uses spectral norm and the other discriminators use weight norm.
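The effect of follow_official_norm can be pictured as a per-scale override of the discriminator parameters. The sketch below is an illustration only: the key names use_spectral_norm and use_weight_norm are assumptions about the options the scale discriminator accepts, and the helper per_scale_params is hypothetical.

```python
import copy
from typing import Any, Dict, List


def per_scale_params(
    scales: int,
    discriminator_params: Dict[str, Any],
    follow_official_norm: bool = False,
) -> List[Dict[str, Any]]:
    """Build one parameter dict per scale discriminator (illustrative helper)."""
    all_params = []
    for i in range(scales):
        # Deep copy so that overrides never leak into the shared defaults.
        params = copy.deepcopy(discriminator_params)
        if follow_official_norm:
            # Assumed key names: first scale -> spectral norm, others -> weight norm.
            params["use_spectral_norm"] = i == 0
            params["use_weight_norm"] = i != 0
        all_params.append(params)
    return all_params


configs = per_scale_params(3, {"in_channels": 1, "out_channels": 1}, follow_official_norm=True)
print([c["use_spectral_norm"] for c in configs])  # [True, False, False]
```

Copying the dictionary per scale matters here because the default discriminator_params is a mutable object shared across calls.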
forward(x: Tensor) → List[List[Tensor]]
Calculate forward propagation.
The input is passed to each scale discriminator in turn. When a pooling layer is set, the signal is downsampled between scales, so each subsequent discriminator sees a coarser version of the input.
- Parameters:
- x (torch.Tensor) – Input signal of shape (B, in_channels, T), where B is the batch size and T is the number of samples. in_channels is 1 with the default discriminator_params.
- Returns: One list per scale discriminator, each holding the output tensors of that discriminator's layers.
- Return type: List[List[torch.Tensor]]
####### Examples
>>> discriminator = HiFiGANMultiScaleDiscriminator(scales=3)
>>> x = torch.randn(1, 1, 2048)  # (B, in_channels, T)
>>> outputs = discriminator(x)
>>> print(len(outputs))  # 3, one inner list per scale
NOTE
The number of input channels must match the in_channels value given in discriminator_params.
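The declared return type, List[List[Tensor]], is a list with one inner list per scale. The following schematic mimics that shape with plain Python lists standing in for tensors, assuming the module simply iterates over its discriminators and applies the pooling between them. toy_discriminator, toy_pool and multi_scale_forward are hypothetical stand-ins, not ESPnet functions; the real module operates on torch.Tensor objects.

```python
from typing import Callable, List, Optional

Signal = List[float]


def toy_discriminator(x: Signal) -> List[Signal]:
    """Stand-in for a scale discriminator: returns two 'layer outputs'."""
    hidden = [abs(v) for v in x]
    score = [sum(hidden) / len(hidden)]
    return [hidden, score]


def toy_pool(x: Signal) -> Signal:
    """Stand-in for the pooling layer: average non-overlapping pairs."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def multi_scale_forward(
    x: Signal,
    discriminators: List[Callable[[Signal], List[Signal]]],
    pooling: Optional[Callable[[Signal], Signal]],
) -> List[List[Signal]]:
    """Run every discriminator in turn, downsampling the input between scales."""
    outs = []
    for disc in discriminators:
        outs.append(disc(x))
        if pooling is not None:
            x = pooling(x)
    return outs


signal = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 4.0, -4.0]
outs = multi_scale_forward(signal, [toy_discriminator] * 3, toy_pool)
print(len(outs))                  # 3
print([len(o[0]) for o in outs])  # [8, 4, 2]
```

The outer index selects the scale and the inner index selects a layer output within that scale.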