espnet2.gan_svs.visinger2.visinger2_vocoder.ConvReluNorm
class espnet2.gan_svs.visinger2.visinger2_vocoder.ConvReluNorm(in_channels, hidden_channels, out_channels, kernel_size, n_layers, dropout_rate)
Bases: Module
Convolutional Layer with ReLU Activation and Layer Normalization.
This module stacks multiple convolutional layers, each followed by layer normalization, ReLU activation, and dropout. The normalization keeps activations stable in deep networks, while the dropout provides regularization.
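For orientation, the constructor can be sketched as below. This is a minimal illustrative re-implementation, not the ESPnet source: the class name ConvReluNormSketch is hypothetical, and same-length padding plus layer normalization over the channel dimension are assumptions.

import torch
import torch.nn as nn

class ConvReluNormSketch(nn.Module):
    """Illustrative layer stack: n_layers x (Conv1d -> LayerNorm),
    a shared ReLU + Dropout block, and a final 1x1 projection."""

    def __init__(self, in_channels, hidden_channels, out_channels,
                 kernel_size, n_layers, dropout_rate):
        super().__init__()
        assert n_layers > 1, "n_layers should be larger than 1"
        pad = kernel_size // 2  # assumed same-length padding (odd kernel)
        self.conv_layers = nn.ModuleList(
            [nn.Conv1d(in_channels, hidden_channels, kernel_size,
                       padding=pad)]
            + [
                nn.Conv1d(hidden_channels, hidden_channels, kernel_size,
                          padding=pad)
                for _ in range(n_layers - 1)
            ]
        )
        self.norm_layers = nn.ModuleList(
            [nn.LayerNorm(hidden_channels) for _ in range(n_layers)]
        )
        self.relu_drop = nn.Sequential(nn.ReLU(), nn.Dropout(dropout_rate))
        self.proj = nn.Conv1d(hidden_channels, out_channels, 1)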
in_channels
Number of input channels.
- Type: int
hidden_channels
Number of hidden channels.
- Type: int
out_channels
Number of output channels.
- Type: int
kernel_size
Size of the convolutional kernel.
- Type: int
n_layers
Number of convolutional layers.
- Type: int
dropout_rate
Dropout rate for regularization.
- Type: float
conv_layers
List of convolutional layers.
- Type: ModuleList
norm_layers
List of layer normalization layers.
- Type: ModuleList
relu_drop
Sequential container for ReLU and dropout.
- Type: Sequential
proj
Final projection layer.
- Type: Conv1d
Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden representation channels.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Size of the convolutional kernel.
- n_layers (int) – Number of convolutional layers (must be > 1).
- dropout_rate (float) – Dropout rate for the layers.
Raises: AssertionError – If n_layers is less than or equal to 1.
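For instance, constructing the module with n_layers=1 fails immediately (assuming the assertion fires in the constructor, as documented):

>>> ConvReluNorm(in_channels=64, hidden_channels=128, out_channels=32,
...              kernel_size=3, n_layers=1, dropout_rate=0.1)
Traceback (most recent call last):
    ...
AssertionError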
Examples
>>> model = ConvReluNorm(in_channels=64, hidden_channels=128,
... out_channels=32, kernel_size=3,
... n_layers=4, dropout_rate=0.1)
>>> input_tensor = torch.randn(10, 64, 100) # (batch_size, channels, length)
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([10, 32, 100])
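Because the block contains dropout, its output is stochastic while the module is in training mode. For deterministic inference, switch to evaluation mode first:

>>> _ = model.eval()  # eval() returns the module; dropout is now disabled
>>> with torch.no_grad():
...     output_tensor = model(input_tensor)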
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Calculate forward propagation.
This method passes the input tensor through the stacked convolutional layers, applying layer normalization, ReLU activation, and dropout after each convolution, then projects the result to the output channels with a final 1x1 convolution.
- Parameters: x (Tensor) – Input tensor (B, in_channels, T).
- Returns: Output tensor (B, out_channels, T).
- Return type: Tensor
Examples
>>> model = ConvReluNorm(in_channels=80, hidden_channels=128,
...                      out_channels=192, kernel_size=5,
...                      n_layers=3, dropout_rate=0.1)
>>> x = torch.randn(1, 80, 100)  # (B, in_channels, T)
>>> output = model(x)
>>> output.shape
torch.Size([1, 192, 100])
NOTE
Each convolutional layer is followed by layer normalization, ReLU activation, and dropout, and a final 1x1 convolution projects the hidden representation to out_channels. The convolutions preserve the time dimension, so the output length T matches the input length.
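Put together, the forward pass described in the note can be sketched as a standalone function over the constructor sketch above (same assumptions; the transposes are needed because nn.LayerNorm normalizes the last dimension):

def forward_sketch(module, x):
    """x: (B, in_channels, T) -> (B, out_channels, T)."""
    for conv, norm in zip(module.conv_layers, module.norm_layers):
        x = conv(x)                                    # (B, hidden, T)
        x = norm(x.transpose(1, -1)).transpose(1, -1)  # LayerNorm over C
        x = module.relu_drop(x)                        # ReLU + Dropout
    return module.proj(x)                              # 1x1 projection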