espnet2.gan_svs.avocodo.avocodo.AvocodoGenerator
class espnet2.gan_svs.avocodo.avocodo.AvocodoGenerator(in_channels: int = 80, out_channels: int = 1, channels: int = 512, global_channels: int = -1, kernel_size: int = 7, upsample_scales: List[int] = [8, 8, 2, 2], upsample_kernel_sizes: List[int] = [16, 16, 4, 4], resblock_kernel_sizes: List[int] = [3, 7, 11], resblock_dilations: List[List[int]] = [[1, 3, 5], [1, 3, 5], [1, 3, 5]], projection_filters: List[int] = [0, 1, 1, 1], projection_kernels: List[int] = [0, 5, 7, 11], use_additional_convs: bool = True, bias: bool = True, nonlinear_activation: str = 'LeakyReLU', nonlinear_activation_params: Dict[str, Any] = {'negative_slope': 0.2}, use_weight_norm: bool = True)
Bases: Module
Avocodo generator module for generating audio signals.
This module utilizes various convolutional layers and residual blocks to generate audio signals from input features, allowing for multi-scale and multi-resolution processing.
num_upsamples
Number of upsampling layers.
- Type: int
num_blocks
Number of residual blocks.
- Type: int
input_conv
Initial convolutional layer.
- Type: Conv1d
upsamples
List of upsampling layers.
- Type: ModuleList
blocks
List of residual blocks.
- Type: ModuleList
output_conv
List of output convolutional layers.
- Type: ModuleList
global_conv
Global conditioning convolutional layer.
- Type: Optional[Conv1d]
Parameters:
- in_channels (int) – Number of input channels. Defaults to 80.
- out_channels (int) – Number of output channels. Defaults to 1.
- channels (int) – Number of hidden representation channels. Defaults to 512.
- global_channels (int) – Number of global conditioning channels. Defaults to -1.
- kernel_size (int) – Kernel size of initial and final conv layer. Defaults to 7.
- upsample_scales (List[int]) – List of upsampling scales. Defaults to [8, 8, 2, 2].
- upsample_kernel_sizes (List[int]) – List of kernel sizes for upsample layers. Defaults to [16, 16, 4, 4].
- resblock_kernel_sizes (List[int]) – List of kernel sizes for residual blocks. Defaults to [3, 7, 11].
- resblock_dilations (List[List[int]]) – List of lists of dilations for residual blocks. Defaults to [[1, 3, 5], [1, 3, 5], [1, 3, 5]].
- projection_filters (List[int]) – List of projection filters. Defaults to [0, 1, 1, 1].
- projection_kernels (List[int]) – List of projection kernels. Defaults to [0, 5, 7, 11].
- use_additional_convs (bool) – Whether to use additional conv layers in residual blocks. Defaults to True.
- bias (bool) – Whether to add bias parameter in convolution layers. Defaults to True.
- nonlinear_activation (str) – Activation function module name. Defaults to "LeakyReLU".
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for the activation function. Defaults to {"negative_slope": 0.2}.
- use_weight_norm (bool) – Whether to use weight norm. Defaults to True.
Raises: AssertionError – If the kernel size is not odd or the lengths of the upsample parameter lists do not match.
############# Examples
>>> generator = AvocodoGenerator(in_channels=80, out_channels=1)
>>> input_tensor = torch.randn(1, 80, 100) # Batch size of 1, 80 channels, length 100
>>> output = generator(input_tensor)
>>> print([o.shape for o in output]) # Output shapes for each upsampled tensor
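With the default configuration, this should print three shapes: each upsampling stage multiplies the time dimension by its scale, and a waveform is emitted at every stage whose projection_filters entry is non-zero. For an input of length 100 with upsample_scales [8, 8, 2, 2] and projection_filters [0, 1, 1, 1], the expected output lengths are 6400, 12800, and 25600.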
Initialize AvocodoGenerator module.
- Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- channels (int) – Number of hidden representation channels.
- global_channels (int) – Number of global conditioning channels.
- kernel_size (int) – Kernel size of initial and final conv layer.
- upsample_scales (List[int]) – List of upsampling scales.
- upsample_kernel_sizes (List[int]) – List of kernel sizes for upsample layers.
- resblock_kernel_sizes (List[int]) – List of kernel sizes for residual blocks.
- resblock_dilations (List[List[int]]) – List of lists of dilations for residual blocks.
- projection_filters (List[int]) – List of projection filters.
- projection_kernels (List[int]) – List of projection kernels.
- use_additional_convs (bool) – Whether to use additional conv layers in residual blocks.
- bias (bool) – Whether to add bias parameter in convolution layers.
- nonlinear_activation (str) – Activation function module name.
- nonlinear_activation_params (Dict[str, Any]) – Hyperparameters for the activation function.
- use_weight_norm (bool) – Whether to use weight norm. If set to true, it will be applied to all of the conv layers.
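For example, a minimal usage sketch (assuming, per the defaults above, that any positive global_channels enables the global conditioning path):
>>> import torch
>>> from espnet2.gan_svs.avocodo.avocodo import AvocodoGenerator
>>> generator = AvocodoGenerator(in_channels=80, global_channels=256)
>>> mel = torch.randn(2, 80, 50)  # (B, in_channels, T)
>>> g = torch.randn(2, 256, 1)  # (B, global_channels, 1)
>>> waveforms = generator(mel, g)  # list of waveforms at multiple scales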
apply_weight_norm()
Apply weight normalization to all of the layers.
This method applies weight normalization to all convolutional layers (Conv1d and ConvTranspose1d) within the AvocodoGenerator module. Weight normalization helps in stabilizing the training of deep neural networks by reparameterizing the weight vectors, which can improve convergence speed and model performance.
This method is automatically called during the initialization of the AvocodoGenerator class if the use_weight_norm parameter is set to True.
############# Examples
>>> # Create an instance of AvocodoGenerator with weight normalization
>>> generator = AvocodoGenerator(use_weight_norm=True)
>>> # Create an instance without weight normalization
>>> generator_no_norm = AvocodoGenerator(use_weight_norm=False)
######## NOTE This method should be called only after the model has been fully constructed and the layers have been defined.
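A hedged sketch of the underlying pattern, assuming the standard ESPnet approach of wrapping every Conv1d and ConvTranspose1d with torch.nn.utils.weight_norm via Module.apply:
>>> import torch
>>> def _apply_weight_norm(m):
...     # Wrap conv weights with weight normalization (a sketch, not the exact code).
...     if isinstance(m, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
...         torch.nn.utils.weight_norm(m)
>>> _ = generator.apply(_apply_weight_norm)  # roughly what apply_weight_norm() does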
forward(c: Tensor, g: Tensor | None = None) → List[Tensor]
Calculate forward propagation.
This method computes the forward pass of the AvocodoGenerator. It takes an input tensor and, optionally, a global conditioning tensor. The output is a list of output tensors generated from the input.
- Parameters:
- c (Tensor) – Input tensor of shape (B, in_channels, T).
- g (Optional *[*Tensor ]) – Global conditioning tensor of shape (B, global_channels, 1). If provided, it will be added to the input tensor after the initial convolution.
- Returns: List of output tensors of shape (B, out_channels, T_i), where T_i is the input length multiplied by the cumulative upsampling factor at that output stage.
- Return type: List[Tensor]
############# Examples
>>> generator = AvocodoGenerator(global_channels=10)  # enable global conditioning
>>> input_tensor = torch.randn(2, 80, 100)  # Batch size 2, 80 channels
>>> global_conditioning = torch.randn(2, 10, 1)  # Batch size 2, 10 channels
>>> outputs = generator(input_tensor, global_conditioning)
>>> for output in outputs:
...     print(output.shape)
torch.Size([2, 1, 6400])
torch.Size([2, 1, 12800])
torch.Size([2, 1, 25600])
######## NOTE The number of outputs in the returned list corresponds to the number of non-zero entries in projection_filters defined during initialization (three with the default [0, 1, 1, 1]).
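A hypothetical helper (not part of ESPnet) that predicts the output lengths, assuming each stage multiplies the length by its upsampling scale and an output is emitted wherever projection_filters is non-zero:
>>> def expected_output_lengths(t, upsample_scales, projection_filters):
...     lengths, cur = [], t
...     for scale, filt in zip(upsample_scales, projection_filters):
...         cur *= scale  # each stage upsamples the time axis by its scale
...         if filt != 0:  # a zero filter means no waveform at this stage
...             lengths.append(cur)
...     return lengths
>>> expected_output_lengths(100, [8, 8, 2, 2], [0, 1, 1, 1])
[6400, 12800, 25600]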
remove_weight_norm()
Remove weight normalization module from all of the layers.
This method iterates through all layers of the AvocodoGenerator and removes weight normalization if it has been applied. It will log a debug message each time weight normalization is removed from a layer. If a layer does not have weight normalization, it will catch the ValueError and continue without interruption.
############# Examples
>>> generator = AvocodoGenerator(use_weight_norm=True)
>>> generator.remove_weight_norm() # Removes weight normalization
>>> # Subsequent calls to generator will not use weight normalization.
######## NOTE This method is particularly useful when fine-tuning or modifying the model’s architecture after training.
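A hedged sketch of the removal pattern described above, assuming the usual torch.nn.utils.remove_weight_norm call guarded against layers that never had weight norm:
>>> import torch
>>> def _remove_weight_norm(m):
...     try:
...         torch.nn.utils.remove_weight_norm(m)
...     except ValueError:
...         pass  # this layer never had weight norm applied
>>> _ = generator.apply(_remove_weight_norm)  # roughly what remove_weight_norm() does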
reset_parameters()
Reset parameters.
This method initializes the weights of the convolutional layers in the generator following the official HiFi-GAN implementation (https://github.com/jik876/hifi-gan/blob/master/models.py): the weights are drawn from a normal distribution with a mean of 0 and a standard deviation of 0.01.
This method is called during the initialization of the generator to ensure that all layers start with appropriate weights.
######## NOTE The logging module is used to output debug information when resetting parameters for each layer.
############# Examples
>>> generator = AvocodoGenerator()
>>> generator.reset_parameters() # Reset parameters to initial values
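A hedged sketch of this initialization scheme, assuming it targets the Conv1d and ConvTranspose1d weights as in the HiFi-GAN reference above:
>>> import torch
>>> def _reset_parameters(m):
...     if isinstance(m, (torch.nn.Conv1d, torch.nn.ConvTranspose1d)):
...         m.weight.data.normal_(0.0, 0.01)  # mean 0, std 0.01
>>> _ = generator.apply(_reset_parameters)  # roughly what reset_parameters() does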