espnet2.gan_tts.parallel_wavegan.parallel_wavegan.ParallelWaveGANGenerator
class espnet2.gan_tts.parallel_wavegan.parallel_wavegan.ParallelWaveGANGenerator(in_channels: int = 1, out_channels: int = 1, kernel_size: int = 3, layers: int = 30, stacks: int = 3, residual_channels: int = 64, gate_channels: int = 128, skip_channels: int = 64, aux_channels: int = 80, aux_context_window: int = 2, dropout_rate: float = 0.0, bias: bool = True, use_weight_norm: bool = True, upsample_conditional_features: bool = True, upsample_net: str = 'ConvInUpsampleNetwork', upsample_params: Dict[str, Any] = {'upsample_scales': [4, 4, 4, 4]})
Bases: Module
Parallel WaveGAN Generator module.
This module implements the generator of Parallel WaveGAN, which converts an input noise signal into an audio waveform while conditioning on auxiliary features such as mel spectrograms. The generator consists of a first convolution, a stack of dilated residual blocks with skip connections, an upsampling network for the auxiliary features, and a set of output layers.
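Each residual block combines a dilated convolution with a gated activation, plus 1x1 convolutions for the residual and skip paths (the residual_channels, gate_channels, and skip_channels parameters). The following is a minimal sketch of such a WaveNet-style block, assuming the gate output is split in half for tanh/sigmoid gating, the upsampled auxiliary features enter through a 1x1 convolution, and the residual sum is scaled by sqrt(0.5). GatedResidualBlockSketch is a hypothetical name used for illustration, not ESPnet's own residual block class.

```python
import math

import torch


class GatedResidualBlockSketch(torch.nn.Module):
    """Hypothetical WaveNet-style gated residual block (not ESPnet's class)."""

    def __init__(self, kernel_size=3, residual_channels=64, gate_channels=128,
                 skip_channels=64, aux_channels=80, dilation=1):
        super().__init__()
        assert (kernel_size - 1) % 2 == 0, "odd kernel size keeps the length"
        padding = (kernel_size - 1) // 2 * dilation  # "same" padding for the dilated conv
        self.conv = torch.nn.Conv1d(residual_channels, gate_channels, kernel_size,
                                    padding=padding, dilation=dilation)
        # 1x1 conv that injects the (already upsampled) auxiliary features
        self.conv1x1_aux = torch.nn.Conv1d(aux_channels, gate_channels, 1, bias=False)
        half = gate_channels // 2
        self.conv1x1_out = torch.nn.Conv1d(half, residual_channels, 1)
        self.conv1x1_skip = torch.nn.Conv1d(half, skip_channels, 1)

    def forward(self, x, c):
        residual = x
        xa, xb = self.conv(x).chunk(2, dim=1)  # split gate channels in half
        ca, cb = self.conv1x1_aux(c).chunk(2, dim=1)
        h = torch.tanh(xa + ca) * torch.sigmoid(xb + cb)  # gated activation
        skip = self.conv1x1_skip(h)
        out = (self.conv1x1_out(h) + residual) * math.sqrt(0.5)
        return out, skip


block = GatedResidualBlockSketch(dilation=4)
x = torch.randn(2, 64, 50)  # (B, residual_channels, T_wav)
c = torch.randn(2, 80, 50)  # upsampled aux features (B, aux_channels, T_wav)
out, skip = block(x, c)
print(out.shape, skip.shape)
```

Because the padding grows with the dilation, every block preserves the time length, which is what lets the skip outputs of all blocks be summed later.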
in_channels
Number of input channels.
- Type: int
out_channels
Number of output channels.
- Type: int
aux_channels
Number of channels for auxiliary feature conv.
- Type: int
aux_context_window
Context window size for auxiliary feature.
- Type: int
layers
Number of residual block layers.
- Type: int
stacks
Number of stacks i.e., dilation cycles.
- Type: int
kernel_size
Kernel size of dilated convolution.
- Type: int
upsample_net
Upsampling network applied to the auxiliary features (None when upsample_conditional_features is False).
- Type: torch.nn.Module or None
upsample_factor
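Total upsampling factor from feature frames to waveform samples, i.e. the product of upsample_params['upsample_scales'] (256 with the default [4, 4, 4, 4]).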
- Type: int
conv_layers
List of residual blocks.
- Type: torch.nn.ModuleList
last_conv_layers
Output layers applied to the summed skip connections.
- Type: torch.nn.ModuleList
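A minimal sketch of how upsample_factor relates frame counts to sample counts, assuming forward samples noise of length T_feats * upsample_factor when z is omitted. The helper names upsample_factor and default_noise_length are hypothetical.

```python
import math


def upsample_factor(upsample_scales):
    """Total upsampling factor: the product of the per-layer scales."""
    return math.prod(upsample_scales)


def default_noise_length(t_feats, upsample_scales):
    """Waveform length of the noise sampled when z is omitted."""
    return t_feats * upsample_factor(upsample_scales)


factor = upsample_factor([4, 4, 4, 4])  # default upsample_params
print(factor)  # 256
print(default_noise_length(100, [4, 4, 4, 4]))  # 25600
```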
Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Kernel size of dilated convolution.
- layers (int) – Number of residual block layers.
- stacks (int) – Number of stacks i.e., dilation cycles.
- residual_channels (int) – Number of channels in residual conv.
- gate_channels (int) – Number of channels in gated conv.
- skip_channels (int) – Number of channels in skip conv.
- aux_channels (int) – Number of channels for auxiliary feature conv.
- aux_context_window (int) – Context window size for auxiliary feature.
- dropout_rate (float) – Dropout rate. 0.0 means no dropout applied.
- bias (bool) – Whether to use bias parameter in conv layer.
- use_weight_norm (bool) – Whether to use weight norm. If True, it is applied to all of the conv layers.
- upsample_conditional_features (bool) – Whether to upsample the auxiliary features with the upsampling network.
- upsample_net (str) – Name of the upsampling network class.
- upsample_params (Dict[str, Any]) – Parameters of the upsampling network.
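Example
>>> import torch
>>> # aux_context_window=0 keeps T_wav = T_feats * 256 with the default upsample_scales
>>> generator = ParallelWaveGANGenerator(in_channels=1, out_channels=1, aux_context_window=0)
>>> c = torch.randn(8, 80, 100)  # (B, aux_channels, T_feats)
>>> with torch.no_grad():
...     output = generator(c)  # z is sampled internally
>>> print(output.shape)  # (B, out_channels, T_wav)
torch.Size([8, 1, 25600])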
- Raises: AssertionError – If layers is not divisible by stacks.
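The divisibility check exists because the layers are split into stacks of equal-length dilation cycles. A sketch of the schedule, assuming the common WaveNet convention that the dilation doubles at every layer and restarts at 1 at the beginning of each stack; dilation_schedule is a hypothetical helper, not part of the ESPnet API.

```python
def dilation_schedule(layers=30, stacks=3):
    """Dilations 1, 2, 4, ... that restart at each stack (dilation cycle)."""
    if layers % stacks != 0:
        raise AssertionError("layers must be divisible by stacks")
    layers_per_stack = layers // stacks
    return [2 ** (i % layers_per_stack) for i in range(layers)]


d = dilation_schedule()
print(d[:11])  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1]
```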
NOTE: This code is modified from https://github.com/kan-bayashi/ParallelWaveGAN.
apply_weight_norm()
Apply weight normalization to all of the layers.
This method applies weight normalization to all convolutional layers within the generator. Weight normalization can improve the convergence of the model during training. The constructor already calls this method when use_weight_norm=True.
It checks each module in the generator and applies weight normalization if the module is an instance of torch.nn.Conv1d or torch.nn.Conv2d.
Example
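>>> generator = ParallelWaveGANGenerator(use_weight_norm=False)
>>> generator.apply_weight_norm()  # Applies weight normalization
NOTE: Weight normalization is particularly beneficial for deeper networks, where training dynamics can be unstable. Calling this method on layers that already have weight normalization, such as those of a generator built with the default use_weight_norm=True, raises a RuntimeError in PyTorch.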
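The module traversal described above can be sketched with torch.nn.Module.apply, which visits every submodule recursively, and torch.nn.utils.weight_norm. This is an illustrative helper (apply_weight_norm_sketch is a hypothetical name) run on a toy model, not the generator's own method.

```python
import torch


def apply_weight_norm_sketch(model: torch.nn.Module) -> None:
    """Wrap every Conv1d/Conv2d in `model` with weight normalization."""

    def _apply(m):
        if isinstance(m, (torch.nn.Conv1d, torch.nn.Conv2d)):
            torch.nn.utils.weight_norm(m)  # adds weight_g / weight_v parameters

    model.apply(_apply)  # visits every submodule recursively


toy = torch.nn.Sequential(
    torch.nn.Conv1d(1, 4, 3),
    torch.nn.ReLU(),  # not a conv layer, so it is left untouched
    torch.nn.Conv1d(4, 1, 1),
)
apply_weight_norm_sketch(toy)
print(hasattr(toy[0], "weight_g"))
```

Recent PyTorch releases deprecate torch.nn.utils.weight_norm in favor of torch.nn.utils.parametrizations.weight_norm, so this sketch may emit a deprecation warning.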
forward(c: Tensor, z: Tensor | None = None) → Tensor
Calculate forward propagation.
This method computes the forward pass of the Parallel WaveGAN Generator module, transforming the local conditioning auxiliary features and an optional input noise signal into the output tensor.
- Parameters:
- c (Tensor) – Local conditioning auxiliary features of shape (B, C, T_feats).
- z (Optional[Tensor]) – Input noise signal of shape (B, 1, T_wav). If None, Gaussian noise of shape (B, 1, T_feats * upsample_factor) is sampled.
- Returns: Output tensor of shape (B, out_channels, T_wav).
- Return type: Tensor
Example
>>> import torch
>>> # aux_context_window=0 keeps T_wav = T_feats * 256 with the default upsample_scales
>>> generator = ParallelWaveGANGenerator(aux_context_window=0)
>>> c = torch.randn(8, 80, 100)  # (B, aux_channels, T_feats)
>>> z = torch.randn(8, 1, 100 * 256)  # (B, 1, T_wav)
>>> with torch.no_grad():
...     output = generator.forward(c, z)
>>> print(output.shape)  # (B, out_channels, T_wav)
torch.Size([8, 1, 25600])
NOTE: When the upsampling network is used, the upsampled auxiliary features must have the same length as z (this is checked with an assertion), so T_wav equals the length of the upsampled features.
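Inside forward, the skip outputs of all residual blocks are combined before the output layers. A minimal sketch, assuming the skips are summed, scaled by sqrt(1 / num_layers), and passed through ReLU, 1x1 conv, ReLU, 1x1 conv; SkipHeadSketch is a hypothetical module used only for illustration.

```python
import math

import torch


class SkipHeadSketch(torch.nn.Module):
    """Sum skip connections, rescale, and map to out_channels."""

    def __init__(self, skip_channels=64, out_channels=1):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.ReLU(),
            torch.nn.Conv1d(skip_channels, skip_channels, kernel_size=1),
            torch.nn.ReLU(),
            torch.nn.Conv1d(skip_channels, out_channels, kernel_size=1),
        )

    def forward(self, skips):
        # Scaling by sqrt(1 / L) keeps the variance of the sum of L
        # independent unit-variance skips at roughly 1.
        total = torch.stack(skips).sum(dim=0) * math.sqrt(1.0 / len(skips))
        return self.layers(total)


skips = [torch.randn(2, 64, 256) for _ in range(30)]  # one (B, skip_channels, T_wav) per layer
y = SkipHeadSketch()(skips)
print(y.shape)  # torch.Size([2, 1, 256])
```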
inference(c: Tensor, z: Tensor | None = None) → Tensor
Perform inference.
This method is a convenience wrapper around forward for a single, unbatched utterance: it converts c from (T_feats, C) and z from (T_wav, 1) to the batched layout, runs forward, and returns the result as (T_wav, out_channels). If z is not provided, random noise is generated.
- Parameters:
- c (Tensor) – Local conditioning auxiliary features (T_feats, C).
- z (Optional[Tensor]) – Input noise signal (T_wav, 1). If provided, it is used as the input noise for inference.
- Returns: Output tensor (T_wav, out_channels), which represents the generated waveform.
- Return type: Tensor
Example
>>> import torch
>>> generator = ParallelWaveGANGenerator(aux_context_window=0)
>>> c = torch.randn(100, 80)  # (T_feats, aux_channels)
>>> output = generator.inference(c)
>>> print(output.shape)  # (T_wav, out_channels)
torch.Size([25600, 1])
>>> z = torch.randn(100 * 256, 1)  # (T_wav, 1)
>>> output_with_noise = generator.inference(c, z)
>>> print(output_with_noise.shape)
torch.Size([25600, 1])
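The layout conversion that inference performs around forward can be sketched as follows, assuming it only transposes the inputs and adds or removes the batch axis. fake_forward is a hypothetical stand-in for generator.forward that repeats the first feature channel 256 times along time, so the sketch runs without the real model.

```python
import torch


def inference_sketch(forward_fn, c, z=None):
    """Adapt unbatched (T, C) inputs to a batched (B, C, T) forward function."""
    if z is not None:
        z = z.transpose(1, 0).unsqueeze(0)  # (T_wav, 1) -> (1, 1, T_wav)
    c = c.transpose(1, 0).unsqueeze(0)  # (T_feats, C) -> (1, C, T_feats)
    y = forward_fn(c, z)  # (1, out_channels, T_wav)
    return y.squeeze(0).transpose(1, 0)  # -> (T_wav, out_channels)


def fake_forward(c, z=None):
    """Hypothetical stand-in for generator.forward (256x upsampling)."""
    return c[:, :1, :].repeat_interleave(256, dim=-1)


c_in = torch.randn(100, 80)
out = inference_sketch(fake_forward, c_in)
print(out.shape)  # torch.Size([25600, 1])
```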
property receptive_field_size
Return the receptive field size, in samples, of the stack of dilated convolutions.
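For the default configuration the receptive field can be computed by hand. A sketch, assuming the property uses the standard formula for stacked dilated convolutions, (kernel_size - 1) * sum(dilations) + 1, with the dilation doubling within each stack; receptive_field_size here is a standalone hypothetical function, not the property itself.

```python
def receptive_field_size(layers=30, stacks=3, kernel_size=3):
    """Receptive field (in samples) of the stacked dilated convolutions."""
    assert layers % stacks == 0
    layers_per_stack = layers // stacks
    dilations = [2 ** (i % layers_per_stack) for i in range(layers)]
    return (kernel_size - 1) * sum(dilations) + 1


print(receptive_field_size())  # 6139 for layers=30, stacks=3, kernel_size=3
```

With the defaults, each stack contributes dilations 1 through 512 (sum 1023), so three stacks give 2 * 3069 + 1 = 6139 samples.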
remove_weight_norm()
Remove weight normalization module from all of the layers.
This method traverses all the layers of the generator and removes the weight normalization applied to the convolutional layers, folding each normalized weight back into an ordinary weight tensor. This is typically done after training, before inference or export.
NOTE: This method logs a debug message for each layer from which weight normalization is removed. If a layer does not have weight normalization applied, the resulting ValueError is caught and the layer is skipped.
Example
>>> generator = ParallelWaveGANGenerator(use_weight_norm=True)
>>> generator.remove_weight_norm() # Removes weight normalization
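The skip-on-ValueError behavior described in the note can be sketched with torch.nn.Module.apply and torch.nn.utils.remove_weight_norm, which raises ValueError for a module without weight normalization. remove_weight_norm_sketch is a hypothetical helper that also counts how many layers were changed.

```python
import torch


def remove_weight_norm_sketch(model: torch.nn.Module) -> int:
    """Strip weight norm where present; skip modules that never had it."""
    removed = 0

    def _remove(m):
        nonlocal removed
        try:
            torch.nn.utils.remove_weight_norm(m)
            removed += 1
        except ValueError:  # raised when m has no weight norm
            pass

    model.apply(_remove)
    return removed


toy = torch.nn.Sequential(
    torch.nn.utils.weight_norm(torch.nn.Conv1d(1, 4, 3)),
    torch.nn.ReLU(),
    torch.nn.Conv1d(4, 1, 1),  # no weight norm: its ValueError is swallowed
)
n_removed = remove_weight_norm_sketch(toy)
print(n_removed)  # 1
```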