espnet2.gan_tts.style_melgan.style_melgan.StyleMelGANGenerator
class espnet2.gan_tts.style_melgan.style_melgan.StyleMelGANGenerator(in_channels: int = 128, aux_channels: int = 80, channels: int = 64, out_channels: int = 1, kernel_size: int = 9, dilation: int = 2, bias: bool = True, noise_upsample_scales: List[int] = [11, 2, 2, 2], noise_upsample_activation: str = 'LeakyReLU', noise_upsample_activation_params: Dict[str, Any] = {'negative_slope': 0.2}, upsample_scales: List[int] = [2, 2, 2, 2, 2, 2, 2, 2, 1], upsample_mode: str = 'nearest', gated_function: str = 'softmax', use_weight_norm: bool = True)
Bases: Module
Style MelGAN generator module.
This module implements the StyleMelGAN generator, a neural network architecture designed to generate high-quality audio waveforms from an input noise sequence conditioned on auxiliary mel-spectrogram features.
in_channels
Number of input noise channels.
- Type: int
noise_upsample
Sequential model for noise upsampling.
- Type: torch.nn.Sequential
blocks
List of TADEResBlock modules for audio generation.
- Type: torch.nn.ModuleList
upsample_factor
Total upsampling factor for output generation.
- Type: int
output_conv
Final convolutional layer for output.
- Type: torch.nn.Sequential
Parameters:
- in_channels (int) – Number of input noise channels.
- aux_channels (int) – Number of auxiliary input channels.
- channels (int) – Number of channels for convolutional layers.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Kernel size of convolutional layers.
- dilation (int) – Dilation factor for convolutional layers.
- bias (bool) – Whether to add bias parameter in convolution layers.
- noise_upsample_scales (List[int]) – List of noise upsampling scales.
- noise_upsample_activation (str) – Activation function for noise upsampling.
- noise_upsample_activation_params (Dict[str, Any]) – Hyperparameters for the activation function.
- upsample_scales (List[int]) – List of upsampling scales.
- upsample_mode (str) – Upsampling mode in TADE layer.
- gated_function (str) – Gated function used in TADEResBlock (“softmax” or “sigmoid”).
- use_weight_norm (bool) – Whether to use weight normalization.
Returns: None
############### Examples
Creating a StyleMelGAN generator instance:

generator = StyleMelGANGenerator(in_channels=128, aux_channels=80)

Forward pass with auxiliary input and noise:

c = torch.randn(1, 80, 88)   # Auxiliary input (B, aux_channels, T)
z = torch.randn(1, 128, 1)   # Noise input (B, in_channels, 1)
output = generator(c, z)
######## NOTE This module is modified from the original implementation found at https://github.com/kan-bayashi/ParallelWaveGAN.
Initialize StyleMelGANGenerator module.
- Parameters:
- in_channels (int) – Number of input noise channels.
- aux_channels (int) – Number of auxiliary input channels.
- channels (int) – Number of channels for conv layer.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Kernel size of conv layers.
- dilation (int) – Dilation factor for conv layers.
- bias (bool) – Whether to add bias parameter in convolution layers.
- noise_upsample_scales (List[int]) – List of noise upsampling scales.
- noise_upsample_activation (str) – Activation function module name for noise upsampling.
- noise_upsample_activation_params (Dict[str, Any]) – Hyperparameters for the above activation function.
- upsample_scales (List[int]) – List of upsampling scales.
- upsample_mode (str) – Upsampling mode in TADE layer.
- gated_function (str) – Gated function used in TADEResBlock (“softmax” or “sigmoid”).
- use_weight_norm (bool) – Whether to use weight norm. If set to true, it will be applied to all of the conv layers.
apply_weight_norm()
Apply weight normalization to all of the layers.
This method applies weight normalization to all convolutional layers in the generator, specifically layers of type torch.nn.Conv1d and torch.nn.ConvTranspose1d. Weight normalization can help stabilize training and improve convergence.
############### Examples
>>> model = StyleMelGANGenerator()
>>> model.apply_weight_norm()
# This will apply weight normalization to all Conv1d and ConvTranspose1d layers in the model.
######## NOTE It is recommended to apply weight normalization during the initialization of the model to achieve optimal training performance.
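For reference, the application step can be sketched as a module-wise traversal (a minimal sketch of the pattern described above, not the verbatim implementation; the helper name is illustrative):

import torch

def _apply_weight_norm(module: torch.nn.Module):
    # Only 1D conv layers are targeted, as described above.
    if isinstance(module, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
        torch.nn.utils.weight_norm(module)

model.apply(_apply_weight_norm)  # apply() visits every submodule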
forward(c: Tensor, z: Tensor | None = None) → Tensor
Calculate forward propagation.
This method computes the forward pass of the StyleMelGAN generator. It takes an auxiliary input tensor and an optional noise tensor, and produces an output tensor. If the noise tensor is not provided, a random tensor is generated.
- Parameters:
- c (Tensor) – Auxiliary input tensor (B, aux_channels, T).
- z (Optional[Tensor]) – Input noise tensor (B, in_channels, 1). If not provided, a random tensor will be generated.
- Returns: Output tensor (B, out_channels, T * prod(upsample_scales)).
- Return type: Tensor
############### Examples
>>> generator = StyleMelGANGenerator()
>>> aux_input = torch.randn(4, 80, 88)  # Batch of 4, 80 aux channels, 88 frames
>>> noise_input = torch.randn(4, 128, 1)  # Batch of 4, 128 noise channels
>>> output = generator.forward(aux_input, noise_input)
>>> output.shape
torch.Size([4, 1, 22528])  # 88 frames * 256x upsampling
>>> output_random_noise = generator.forward(aux_input)
>>> output_random_noise.shape
torch.Size([4, 1, 22528])  # Same shape with internally generated noise
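The auxiliary and noise inputs must be temporally consistent: the noise is first upsampled by prod(noise_upsample_scales), and the result is expected to line up with the auxiliary features. The arithmetic below is a sketch of that relationship under the default configuration (derived from the defaults listed above, not quoted from the implementation):

import math
import torch

noise_upsample_factor = math.prod([11, 2, 2, 2])  # 88, from noise_upsample_scales
upsample_factor = math.prod([2, 2, 2, 2, 2, 2, 2, 2, 1])  # 256, from upsample_scales

z = torch.randn(4, 128, 1)                         # (B, in_channels, T_noise)
c = torch.randn(4, 80, 1 * noise_upsample_factor)  # aux length = T_noise * 88

# Output length = aux length * upsample_factor = 88 * 256 = 22528 samples.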
inference(c: Tensor) → Tensor
Perform inference.
This method takes an auxiliary input tensor and generates an output waveform using the trained StyleMelGAN generator. The input tensor is expected to be of shape (T, aux_channels), where T is the time dimension and aux_channels corresponds to the number of auxiliary (e.g., mel-spectrogram) channels.
- Parameters: c (Tensor) – Input tensor of shape (T, aux_channels).
- Returns: Output tensor of shape (T * prod(upsample_scales), out_channels).
- Return type: Tensor
############### Examples
>>> generator = StyleMelGANGenerator()
>>> input_tensor = torch.randn(100, 80)  # Example input (T, aux_channels)
>>> output_tensor = generator.inference(input_tensor)
>>> print(output_tensor.shape)
torch.Size([25600, 1])  # 100 frames * 256x upsampling
######## NOTE The input tensor is transposed and reshaped before processing. Additionally, noise is generated internally to aid in the inference process.
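Based on the note above, the amount of internally generated noise is tied to the noise upsampling factor; one plausible reading of that relationship (an assumption based on the default configuration, not an excerpt from the implementation) is:

import math

T = 100                                           # number of input frames
noise_upsample_factor = math.prod([11, 2, 2, 2])  # 88 with the defaults

# Draw just enough noise frames so that, once upsampled, they cover the
# input; the auxiliary features can then be padded up to the noise length.
noise_frames = math.ceil(T / noise_upsample_factor)  # 2 for T = 100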
remove_weight_norm()
Remove weight normalization from all of the layers.
This method iterates through all layers of the StyleMelGANGenerator and removes the weight normalization applied to the convolutional layers. It utilizes the torch.nn.utils.remove_weight_norm function to perform the operation. If a layer does not have weight normalization, a ValueError is caught and ignored.
############### Examples
>>> generator = StyleMelGANGenerator(use_weight_norm=True)
>>> generator.remove_weight_norm() # Removes weight normalization
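The per-layer removal described above can be pictured as follows (a minimal sketch of the described try/except pattern; the helper name is illustrative):

import torch

def _remove_weight_norm(module: torch.nn.Module):
    try:
        torch.nn.utils.remove_weight_norm(module)
    except ValueError:
        # Layers without weight normalization raise ValueError; skip them.
        pass

generator.apply(_remove_weight_norm)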
reset_parameters()
Reset parameters.
This method resets the weights of the convolutional layers within the model to a normal distribution with a mean of 0.0 and a standard deviation of 0.02. This is typically used to initialize the weights of the model before training or after loading a pre-trained model to ensure the model starts with a fresh set of parameters.
The function applies the reset operation to all instances of Conv1d and ConvTranspose1d layers in the model.
############### Examples
To reset parameters of a StyleMelGANGenerator instance, you can call:

generator = StyleMelGANGenerator()
generator.reset_parameters()

Similarly, for a StyleMelGANDiscriminator instance:

discriminator = StyleMelGANDiscriminator()
discriminator.reset_parameters()
######## NOTE This function is automatically called during the initialization of the generator and discriminator classes.
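Based on the description above, the reset can be sketched as a module-wise traversal (illustrative helper, assuming torch.nn.Module.apply is used):

import torch

def _reset_parameters(module: torch.nn.Module):
    # Re-initialize only the conv layers, with N(0.0, 0.02) weights.
    if isinstance(module, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
        module.weight.data.normal_(0.0, 0.02)

generator.apply(_reset_parameters)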