espnet2.gan_svs.visinger2.visinger2_vocoder.ConvReluNorm
class espnet2.gan_svs.visinger2.visinger2_vocoder.ConvReluNorm(in_channels, hidden_channels, out_channels, kernel_size, n_layers, dropout_rate)
Bases: Module
Convolutional Layer with ReLU Activation and Layer Normalization.
This module stacks multiple convolutional layers, each followed by layer normalization, ReLU activation, and dropout. The normalization keeps activations stable in deep networks, while the dropout provides regularization.
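For orientation, the constructor can be sketched as below. This is a minimal illustrative re-implementation, not the ESPnet source: the class name ConvReluNormSketch is hypothetical, and same-length padding plus layer normalization over the channel dimension are assumptions.

import torch
import torch.nn as nn

class ConvReluNormSketch(nn.Module):
    """Illustrative layer stack: n_layers x (Conv1d -> LayerNorm),
    a shared ReLU + Dropout block, and a final 1x1 projection."""

    def __init__(self, in_channels, hidden_channels, out_channels,
                 kernel_size, n_layers, dropout_rate):
        super().__init__()
        assert n_layers > 1, "n_layers should be larger than 1"
        pad = kernel_size // 2  # assumed same-length padding (odd kernel)
        self.conv_layers = nn.ModuleList(
            [nn.Conv1d(in_channels, hidden_channels, kernel_size,
                       padding=pad)]
            + [
                nn.Conv1d(hidden_channels, hidden_channels, kernel_size,
                          padding=pad)
                for _ in range(n_layers - 1)
            ]
        )
        self.norm_layers = nn.ModuleList(
            [nn.LayerNorm(hidden_channels) for _ in range(n_layers)]
        )
        self.relu_drop = nn.Sequential(nn.ReLU(), nn.Dropout(dropout_rate))
        self.proj = nn.Conv1d(hidden_channels, out_channels, 1)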
in_channels
Number of input channels.
- Type: int
hidden_channels
Number of hidden channels.
- Type: int
out_channels
Number of output channels.
- Type: int
kernel_size
Size of the convolutional kernel.
- Type: int
n_layers
Number of convolutional layers.
- Type: int
dropout_rate
Dropout rate for regularization.
- Type: float
conv_layers
List of convolutional layers.
- Type: ModuleList
norm_layers
List of layer normalization layers.
- Type: ModuleList
relu_drop
Sequential container for ReLU and dropout.
- Type: Sequential
proj
Final projection layer.
- Type: Conv1d
Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden representation channels.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Size of the convolutional kernel.
- n_layers (int) – Number of convolutional layers (must be > 1).
- dropout_rate (float) – Dropout rate for the layers.
Raises: AssertionError – If n_layers is less than or equal to 1.
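For instance, constructing the module with n_layers=1 fails immediately (assuming the assertion fires in the constructor, as documented):

>>> ConvReluNorm(in_channels=64, hidden_channels=128, out_channels=32,
...              kernel_size=3, n_layers=1, dropout_rate=0.1)
Traceback (most recent call last):
    ...
AssertionError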
Examples
>>> model = ConvReluNorm(in_channels=64, hidden_channels=128,
... out_channels=32, kernel_size=3,
... n_layers=4, dropout_rate=0.1)
>>> input_tensor = torch.randn(10, 64, 100) # (batch_size, channels, length)
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([10, 32, 100])
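Because the block contains dropout, its output is stochastic while the module is in training mode. For deterministic inference, switch to evaluation mode first:

>>> _ = model.eval()  # eval() returns the module; dropout is now disabled
>>> with torch.no_grad():
...     output_tensor = model(input_tensor)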
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Calculate forward propagation.
This method passes the input tensor through the stacked convolutional layers, applying layer normalization, ReLU activation, and dropout after each convolution, then projects the result to the output channels with a final 1x1 convolution.
- Parameters: x (Tensor) – Input tensor (B, in_channels, T).
- Returns: Output tensor (B, out_channels, T).
- Return type: Tensor
Examples
>>> model = ConvReluNorm(in_channels=80, hidden_channels=128,
...                      out_channels=192, kernel_size=5,
...                      n_layers=3, dropout_rate=0.1)
>>> x = torch.randn(1, 80, 100)  # (B, in_channels, T)
>>> output = model(x)
>>> output.shape
torch.Size([1, 192, 100])
NOTE
Each convolutional layer is followed by layer normalization, ReLU activation, and dropout, and a final 1x1 convolution projects the hidden representation to out_channels. The convolutions preserve the time dimension, so the output length T matches the input length.
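Put together, the forward pass described in the note can be sketched as a standalone function over the constructor sketch above (same assumptions; the transposes are needed because nn.LayerNorm normalizes the last dimension):

def forward_sketch(module, x):
    """x: (B, in_channels, T) -> (B, out_channels, T)."""
    for conv, norm in zip(module.conv_layers, module.norm_layers):
        x = conv(x)                                    # (B, hidden, T)
        x = norm(x.transpose(1, -1)).transpose(1, -1)  # LayerNorm over C
        x = module.relu_drop(x)                        # ReLU + Dropout
    return module.proj(x)                              # 1x1 projection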