espnet2.gan_svs.avocodo.avocodo.AvocodoGenerator
class espnet2.gan_svs.avocodo.avocodo.AvocodoGenerator(in_channels: int = 80, out_channels: int = 1, channels: int = 512, global_channels: int = -1, kernel_size: int = 7, upsample_scales: List[int] = [8, 8, 2, 2], upsample_kernel_sizes: List[int] = [16, 16, 4, 4], resblock_kernel_sizes: List[int] = [3, 7, 11], resblock_dilations: List[List[int]] = [[1, 3, 5], [1, 3, 5], [1, 3, 5]], projection_filters: List[int] = [0, 1, 1, 1], projection_kernels: List[int] = [0, 5, 7, 11], use_additional_convs: bool = True, bias: bool = True, nonlinear_activation: str = 'LeakyReLU', nonlinear_activation_params: Dict[str, Any] = {'negative_slope': 0.2}, use_weight_norm: bool = True)
Bases: Module
Avocodo generator module for generating audio signals.
This module utilizes various convolutional layers and residual blocks to generate audio signals from input features, allowing for multi-scale and multi-resolution processing.
num_upsamples
Number of upsampling layers.
- Type: int
num_blocks
Number of residual blocks.
- Type: int
input_conv
Initial convolutional layer.
- Type: Conv1d
upsamples
List of upsampling layers.
- Type: ModuleList
blocks
List of residual blocks.
- Type: ModuleList
output_conv
List of output convolutional layers.
- Type: ModuleList
global_conv
Global conditioning convolutional layer.
- Type: Optional[Conv1d]
Parameters:
- in_channels (int) – Number of input channels. Defaults to 80.
- out_channels (int) – Number of output channels. Defaults to 1.
- channels (int) – Number of hidden representation channels. Defaults to 512.
- global_channels (int) – Number of global conditioning channels. Defaults to -1.
- kernel_size (int) – Kernel size of initial and final conv layer. Defaults to 7.
- upsample_scales (List[int]) – List of upsampling scales. Defaults to [8, 8, 2, 2].
- upsample_kernel_sizes (List[int]) – List of kernel sizes for upsample layers. Defaults to [16, 16, 4, 4].
- resblock_kernel_sizes (List[int]) – List of kernel sizes for residual blocks. Defaults to [3, 7, 11].
- resblock_dilations (List[List[int]]) – List of lists of dilations for residual blocks. Defaults to [[1, 3, 5], [1, 3, 5], [1, 3, 5]].
- projection_filters (List[int]) – List of projection filters. Defaults to [0, 1, 1, 1].
- projection_kernels (List[int]) – List of projection kernels. Defaults to [0, 5, 7, 11].
- use_additional_convs (bool) – Whether to use additional conv layers in residual blocks. Defaults to True.
- bias (bool) – Whether to add bias parameter in convolution layers. Defaults to True.
- nonlinear_activation (str) – Activation function module name. Defaults to "LeakyReLU".
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for the activation function. Defaults to {"negative_slope": 0.2}.
- use_weight_norm (bool) – Whether to use weight norm. Defaults to True.
Raises: AssertionError – If the kernel size is not odd or the lengths of the upsample parameter lists do not match.
############# Examples
>>> generator = AvocodoGenerator(in_channels=80, out_channels=1)
>>> input_tensor = torch.randn(1, 80, 100) # Batch size of 1, 80 channels, length 100
>>> output = generator(input_tensor)
>>> print([o.shape for o in output]) # Output shapes for each upsampled tensor
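With the default configuration, this should print three shapes: each upsampling stage multiplies the time dimension by its scale, and a waveform is emitted at every stage whose projection_filters entry is non-zero. For an input of length 100 with upsample_scales [8, 8, 2, 2] and projection_filters [0, 1, 1, 1], the expected output lengths are 6400, 12800, and 25600.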
Initialize AvocodoGenerator module.
- Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- channels (int) – Number of hidden representation channels.
- global_channels (int) – Number of global conditioning channels.
- kernel_size (int) – Kernel size of initial and final conv layer.
- upsample_scales (List[int]) – List of upsampling scales.
- upsample_kernel_sizes (List[int]) – List of kernel sizes for upsample layers.
- resblock_kernel_sizes (List[int]) – List of kernel sizes for residual blocks.
- resblock_dilations (List[List[int]]) – List of lists of dilations for residual blocks.
- projection_filters (List[int]) – List of projection filters.
- projection_kernels (List[int]) – List of projection kernels.
- use_additional_convs (bool) – Whether to use additional conv layers in residual blocks.
- bias (bool) – Whether to add bias parameter in convolution layers.
- nonlinear_activation (str) – Activation function module name.
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for the activation function.
- use_weight_norm (bool) – Whether to use weight norm. If set to true, it will be applied to all of the conv layers.
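For example, a minimal usage sketch (assuming, per the defaults above, that any positive global_channels enables the global conditioning path):
>>> import torch
>>> from espnet2.gan_svs.avocodo.avocodo import AvocodoGenerator
>>> generator = AvocodoGenerator(in_channels=80, global_channels=256)
>>> mel = torch.randn(2, 80, 50)  # (B, in_channels, T)
>>> g = torch.randn(2, 256, 1)  # (B, global_channels, 1)
>>> waveforms = generator(mel, g)  # list of waveforms at multiple scales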
apply_weight_norm()
Apply weight normalization to all of the layers.
This method applies weight normalization to all convolutional layers (Conv1d and ConvTranspose1d) within the AvocodoGenerator module. Weight normalization helps in stabilizing the training of deep neural networks by reparameterizing the weight vectors, which can improve convergence speed and model performance.
This method is automatically called during the initialization of the AvocodoGenerator class if the use_weight_norm parameter is set to True.
############# Examples
>>> # Create an instance of AvocodoGenerator with weight normalization
>>> generator = AvocodoGenerator(use_weight_norm=True)
>>> # Create an instance without weight normalization
>>> generator_no_norm = AvocodoGenerator(use_weight_norm=False)
######## NOTE This method should be called only after the model has been fully constructed and the layers have been defined.
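A hedged sketch of the underlying pattern, assuming the standard ESPnet approach of wrapping every Conv1d and ConvTranspose1d with torch.nn.utils.weight_norm via Module.apply:
>>> import torch
>>> def _apply_weight_norm(m):
...     # Wrap conv weights with weight normalization (a sketch, not the exact code).
...     if isinstance(m, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
...         torch.nn.utils.weight_norm(m)
>>> _ = generator.apply(_apply_weight_norm)  # roughly what apply_weight_norm() does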
forward(c: Tensor, g: Tensor | None = None) → List[Tensor]
Calculate forward propagation.
This method computes the forward pass of the AvocodoGenerator. It takes an input tensor and, optionally, a global conditioning tensor. The output is a list of output tensors generated from the input.
- Parameters:
- c (Tensor) – Input tensor of shape (B, in_channels, T).
- g (Optional *[*Tensor ]) – Global conditioning tensor of shape (B, global_channels, 1). If provided, it will be added to the input tensor after the initial convolution.
- Returns: List of output tensors of shape (B, out_channels, T_i), where T_i is the input length multiplied by the cumulative upsampling factor at that output stage.
- Return type: List[Tensor]
############# Examples
>>> generator = AvocodoGenerator(global_channels=10)  # enable global conditioning
>>> input_tensor = torch.randn(2, 80, 100)  # Batch size 2, 80 channels
>>> global_conditioning = torch.randn(2, 10, 1)  # Batch size 2, 10 channels
>>> outputs = generator(input_tensor, global_conditioning)
>>> for output in outputs:
...     print(output.shape)
torch.Size([2, 1, 6400])
torch.Size([2, 1, 12800])
torch.Size([2, 1, 25600])
######## NOTE The number of outputs in the returned list corresponds to the number of non-zero entries in projection_filters defined during initialization (three with the default [0, 1, 1, 1]).
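A hypothetical helper (not part of ESPnet) that predicts the output lengths, assuming each stage multiplies the length by its upsampling scale and an output is emitted wherever projection_filters is non-zero:
>>> def expected_output_lengths(t, upsample_scales, projection_filters):
...     lengths, cur = [], t
...     for scale, filt in zip(upsample_scales, projection_filters):
...         cur *= scale  # each stage upsamples the time axis by its scale
...         if filt != 0:  # a zero filter means no waveform at this stage
...             lengths.append(cur)
...     return lengths
>>> expected_output_lengths(100, [8, 8, 2, 2], [0, 1, 1, 1])
[6400, 12800, 25600]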
remove_weight_norm()
Remove weight normalization module from all of the layers.
This method iterates through all layers of the AvocodoGenerator and removes weight normalization if it has been applied. It will log a debug message each time weight normalization is removed from a layer. If a layer does not have weight normalization, it will catch the ValueError and continue without interruption.
############# Examples
>>> generator = AvocodoGenerator(use_weight_norm=True)
>>> generator.remove_weight_norm() # Removes weight normalization
>>> # Subsequent calls to generator will not use weight normalization.
######## NOTE This method is particularly useful when fine-tuning or modifying the model’s architecture after training.
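A hedged sketch of the removal pattern described above, assuming the usual torch.nn.utils.remove_weight_norm call guarded against layers that never had weight norm:
>>> import torch
>>> def _remove_weight_norm(m):
...     try:
...         torch.nn.utils.remove_weight_norm(m)
...     except ValueError:
...         pass  # this layer never had weight norm applied
>>> _ = generator.apply(_remove_weight_norm)  # roughly what remove_weight_norm() does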
reset_parameters()
Reset parameters.
This method initializes the weights of the convolutional layers in the generator following the official HiFi-GAN implementation (https://github.com/jik876/hifi-gan/blob/master/models.py): the weights are drawn from a normal distribution with a mean of 0 and a standard deviation of 0.01.
This method is called during the initialization of the generator to ensure that all layers start with appropriate weights.
######## NOTE The logging module is used to output debug information when resetting parameters for each layer.
############# Examples
>>> generator = AvocodoGenerator()
>>> generator.reset_parameters() # Reset parameters to initial values
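A hedged sketch of this initialization scheme, assuming it targets the Conv1d and ConvTranspose1d weights as in the HiFi-GAN reference above:
>>> import torch
>>> def _reset_parameters(m):
...     if isinstance(m, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
...         m.weight.data.normal_(0.0, 0.01)  # mean 0, std 0.01
>>> _ = generator.apply(_reset_parameters)  # roughly what reset_parameters() does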