espnet2.gan_tts.parallel_wavegan.parallel_wavegan.ParallelWaveGANGenerator


class espnet2.gan_tts.parallel_wavegan.parallel_wavegan.ParallelWaveGANGenerator(in_channels: int = 1, out_channels: int = 1, kernel_size: int = 3, layers: int = 30, stacks: int = 3, residual_channels: int = 64, gate_channels: int = 128, skip_channels: int = 64, aux_channels: int = 80, aux_context_window: int = 2, dropout_rate: float = 0.0, bias: bool = True, use_weight_norm: bool = True, upsample_conditional_features: bool = True, upsample_net: str = 'ConvInUpsampleNetwork', upsample_params: Dict[str, Any] = {'upsample_scales': [4, 4, 4, 4]})

Bases: Module

Parallel WaveGAN Generator module.

This module implements the generator architecture for Parallel WaveGAN, which is designed for generating high-quality audio waveforms from mel-spectrogram features. The generator consists of a series of residual blocks and an upsampling network.

in_channels

Number of input channels.

Type: int

out_channels

Number of output channels.

Type: int

aux_channels

Number of channels for auxiliary feature conv.

Type: int

aux_context_window

Context window size for auxiliary feature.

Type: int

layers

Number of residual block layers.

Type: int

stacks

Number of stacks i.e., dilation cycles.

Type: int

kernel_size

Kernel size of dilated convolution.

Type: int

upsample_net

Upsampling network architecture.

Type: torch.nn.Module

upsample_factor

Factor by which to upsample the input.

Type: int

conv_layers

List of residual blocks.

Type: torch.nn.ModuleList

last_conv_layers

Final layers for output.

Type: torch.nn.ModuleList
Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Kernel size of dilated convolution.
- layers (int) – Number of residual block layers.
- stacks (int) – Number of stacks i.e., dilation cycles.
- residual_channels (int) – Number of channels in residual conv.
- gate_channels (int) – Number of channels in gated conv.
- skip_channels (int) – Number of channels in skip conv.
- aux_channels (int) – Number of channels for auxiliary feature conv.
- aux_context_window (int) – Context window size for auxiliary feature.
- dropout_rate (float) – Dropout rate. 0.0 means no dropout applied.
- bias (bool) – Whether to use bias parameter in conv layer.
- use_weight_norm (bool) – Whether to use weight norm. If set to true, it will be applied to all of the conv layers.
- upsample_conditional_features (bool) – Whether to use upsampling network.
- upsample_net (str) – Upsampling network architecture.
- upsample_params (Dict[str, Any]) – Upsampling network parameters.

Example

>>> import torch
>>> generator = ParallelWaveGANGenerator(in_channels=1, out_channels=1)
>>> c = torch.randn(8, 80, 100)  # Conditioning features (B, aux_channels, T_feats)
>>> output = generator(c)  # Noise z is sampled internally when omitted
>>> print(output.shape)  # (8, 1, T_wav), where T_wav = T_feats * upsample_factor

Raises: AssertionError – If the number of layers is not divisible by stacks.
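The relationship between layers and stacks can be illustrated with a short sketch. It assumes, as is usual for WaveNet-style generators, that the dilation doubles from block to block within a stack and resets to 1 at the start of the next stack. The helper name `dilation_schedule` is hypothetical and is not part of the ESPnet API.

```python
from typing import List


def dilation_schedule(layers: int = 30, stacks: int = 3) -> List[int]:
    """Return one dilation per residual block (assumed 2**i within each stack)."""
    # Mirrors the documented constraint: layers must be divisible by stacks.
    assert layers % stacks == 0, "layers must be divisible by stacks"
    layers_per_stack = layers // stacks
    # The exponent restarts from 0 at the beginning of every stack.
    return [2 ** (layer % layers_per_stack) for layer in range(layers)]


schedule = dilation_schedule()
print(schedule[:10])    # dilations of the first stack
print(schedule[10:12])  # the cycle restarts in the second stack
```

With the default values, 30 layers split into 3 stacks of 10 blocks each. A value such as `stacks=4` would trip the assertion because 30 is not divisible by 4.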

NOTE: This code is modified from https://github.com/kan-bayashi/ParallelWaveGAN.

apply_weight_norm()

Apply weight normalization module to all of the layers.

This method applies weight normalization to all convolutional layers within the generator. Weight normalization can improve the convergence of the model during training.

It checks each module in the generator and applies weight normalization if the module is an instance of torch.nn.Conv1d or torch.nn.Conv2d.

Example

>>> generator = ParallelWaveGANGenerator(use_weight_norm=False)  # Skip the automatic application
>>> generator.apply_weight_norm()  # Applies weight normalization

NOTE: Weight normalization is particularly beneficial for deeper networks where the training dynamics can be unstable.

forward(c: Tensor, z: Tensor | None = None) → Tensor

Calculate forward propagation.

This method computes the forward pass of the Parallel WaveGAN Generator module, transforming the local conditioning auxiliary features and an optional input noise signal into the output tensor.

Parameters:
- c (Tensor) – Local conditioning auxiliary features of shape (B, C, T_feats).
- z (Optional[Tensor]) – Input noise signal of shape (B, 1, T_wav). If not provided, a random tensor will be generated.
Returns: Output tensor of shape (B, out_channels, T_wav).
Return type: Tensor

Example

>>> generator = ParallelWaveGANGenerator()
>>> c = torch.randn(8, 80, 100)  # Batch of 8, 80 channels, 100 frames
>>> z = torch.randn(8, 1, 100 * generator.upsample_factor)  # Noise matching the upsampled length
>>> output = generator.forward(c, z)
>>> print(output.shape)  # Should be (8, out_channels, T_wav) with T_wav = z.shape[-1]

NOTE: The output tensor will have the same temporal dimension as the upsampled version of the local conditioning auxiliary features when the upsampling network is used.
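The length relationship in the note above can be checked with simple arithmetic. The sketch below assumes that the upsampling factor is the product of `upsample_scales` and that the upsampling network keeps every input frame, so that T_wav = T_feats * upsample_factor. The function name `expected_waveform_length` is hypothetical.

```python
import math
from typing import Sequence


def expected_waveform_length(
    t_feats: int, upsample_scales: Sequence[int] = (4, 4, 4, 4)
) -> int:
    """Estimate T_wav from T_feats, assuming each frame is expanded by every scale."""
    upsample_factor = math.prod(upsample_scales)
    return t_feats * upsample_factor


t_wav = expected_waveform_length(100)
print(t_wav)  # length the noise tensor z would need for 100 frames
```

Under these assumptions, a noise tensor passed as `z` needs this length in its last dimension to line up with the upsampled conditioning features.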

inference(c: Tensor, z: Tensor | None = None) → Tensor

Perform inference.

This method processes local conditioning auxiliary features and an optional input noise signal to produce an output tensor. The input noise can be specified, or if not provided, a random noise tensor will be generated.

Parameters:
- c (Tensor) – Local conditioning auxiliary features (T_feats, C).
- z (Optional[Tensor]) – Input noise signal (T_wav, 1). If not provided, random noise is generated.
Returns: Output tensor (T_wav, out_channels), which represents the generated waveform.
Return type: Tensor

Example

>>> generator = ParallelWaveGANGenerator()
>>> c = torch.randn(100, 80)  # Example conditioning features
>>> output = generator.inference(c)
>>> print(output.shape)
torch.Size([T_wav, out_channels])

>>> z = torch.randn(100 * generator.upsample_factor, 1)  # Input noise of shape (T_wav, 1)
>>> output_with_noise = generator.inference(c, z)
>>> print(output_with_noise.shape)
torch.Size([T_wav, out_channels])

property receptive_field_size

Return receptive field size.
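As a rough illustration of what this property measures, the sketch below applies the standard receptive-field formula for a stack of dilated convolutions, `(kernel_size - 1) * sum(dilations) + 1`. It assumes dilations of `2**i` that restart in every stack; the function name is hypothetical and the exact computation in ESPnet may differ.

```python
def receptive_field_size(layers: int = 30, stacks: int = 3, kernel_size: int = 3) -> int:
    """Receptive field, in samples, of stacked dilated convolutions (assumed schedule)."""
    assert layers % stacks == 0, "layers must be divisible by stacks"
    layers_per_stack = layers // stacks
    # Assumed dilation schedule: 1, 2, 4, ... within each stack, then restart.
    dilations = [2 ** (i % layers_per_stack) for i in range(layers)]
    # Each block widens the field by (kernel_size - 1) * dilation samples.
    return (kernel_size - 1) * sum(dilations) + 1


rf = receptive_field_size()
print(rf)  # samples of input noise that can influence one output sample
```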

remove_weight_norm()

Remove weight normalization module from all of the layers.

This method traverses all the layers of the generator and removes the weight normalization applied to the convolutional layers. Weight normalization can improve training dynamics, but it is commonly removed once training is finished, for example before inference or export, so that each layer holds a single plain weight tensor.

NOTE: This method will log a debug message for each layer from which weight normalization is removed. If a layer does not have weight normalization applied, it will catch the ValueError and continue without interruption.

Example

>>> generator = ParallelWaveGANGenerator(use_weight_norm=True)
>>> generator.remove_weight_norm()  # Removes weight normalization
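
The behaviour described for apply_weight_norm() and remove_weight_norm() can be approximated on a toy model. This is a minimal sketch, not the ESPnet implementation: the free functions and the `toy` model are hypothetical, and it assumes the classic `torch.nn.utils.weight_norm` / `torch.nn.utils.remove_weight_norm` helpers, which newer PyTorch releases mark as deprecated.

```python
import torch


def apply_weight_norm(model: torch.nn.Module) -> None:
    """Wrap every Conv1d/Conv2d weight in weight normalization."""

    def _apply(module: torch.nn.Module) -> None:
        if isinstance(module, (torch.nn.Conv1d, torch.nn.Conv2d)):
            torch.nn.utils.weight_norm(module)

    model.apply(_apply)  # visits the model and all of its submodules


def remove_weight_norm(model: torch.nn.Module) -> None:
    """Strip weight normalization wherever it is present."""

    def _remove(module: torch.nn.Module) -> None:
        try:
            torch.nn.utils.remove_weight_norm(module)
        except ValueError:
            # Raised for modules without weight norm; skip them.
            return

    model.apply(_remove)


# Toy stand-in for the generator: two convolutions around an activation.
toy = torch.nn.Sequential(
    torch.nn.Conv1d(1, 4, kernel_size=3),
    torch.nn.ReLU(),
    torch.nn.Conv1d(4, 1, kernel_size=3),
)
apply_weight_norm(toy)
has_norm_after_apply = hasattr(toy[0], "weight_g")
remove_weight_norm(toy)
has_norm_after_remove = hasattr(toy[0], "weight_g")
print(has_norm_after_apply, has_norm_after_remove)
```

With the classic helper, a normalized layer exposes `weight_g` and `weight_v` parameters, which is what the two flags inspect.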