espnet2.gan_svs.avocodo.avocodo.MDC
class espnet2.gan_svs.avocodo.avocodo.MDC(in_channels, out_channels, strides, kernel_size, dilations, use_spectral_norm=False)
Bases: Module
Multiscale Dilated Convolution module.
This class implements a multiscale dilated convolution block as described in the paper: https://arxiv.org/pdf/1609.07093.pdf. The block holds one dilated convolution per kernel size and dilation rate (d_convs), followed by a final convolution (post_conv). Dilation widens the receptive field of a kernel without adding parameters, which is useful for long one-dimensional signals such as audio.
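The claim that dilation widens the receptive field at no parameter cost can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `effective_kernel_span` and `conv1d_param_count` are hypothetical and not part of ESPnet, and the channel counts are arbitrary.

```python
def effective_kernel_span(kernel_size: int, dilation: int) -> int:
    """Number of input samples spanned by one dilated 1-D kernel."""
    return dilation * (kernel_size - 1) + 1


def conv1d_param_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a standard 1-D convolution; dilation does not appear."""
    return in_channels * out_channels * kernel_size + out_channels


# Hypothetical settings chosen only for illustration.
for dilation in (1, 2, 4):
    span = effective_kernel_span(kernel_size=3, dilation=dilation)
    params = conv1d_param_count(in_channels=64, out_channels=128, kernel_size=3)
    print(f"dilation={dilation}: span={span}, params={params}")
```

The span grows with the dilation rate while the parameter count stays fixed, which is why combining several dilation rates gives a multiscale view of the input cheaply.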
d_convs
A list of dilated convolution layers.
- Type: ModuleList
post_conv
A convolution layer applied after the dilated convolutions.
- Type: Conv1d
softmax
A softmax layer applied to the output.
- Type: Softmax
Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- strides (int) – Stride value for the post-convolution layer.
- kernel_size (List[int]) – List of kernel sizes for the dilated convolutions.
- dilations (List[int]) – List of dilation rates for the dilated convolutions.
- use_spectral_norm (bool) – Whether to use spectral normalization for the convolution layers.
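To make the role of each parameter concrete, here is a minimal sketch of how a block with these constructor arguments could be laid out in PyTorch. It is not the ESPnet implementation: `TinyMDC` is a hypothetical stand-in, and the summation of branches, the LeakyReLU slope, the padding scheme, the fixed kernel size of 3 in the post-convolution, and the absence of normalization when `use_spectral_norm` is false are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyMDC(nn.Module):
    """Hypothetical stand-in for illustration; not the ESPnet MDC class."""

    def __init__(self, in_channels, out_channels, strides, kernel_size,
                 dilations, use_spectral_norm=False):
        super().__init__()
        # Assumption: spectral norm when requested, otherwise no normalization.
        norm = nn.utils.spectral_norm if use_spectral_norm else (lambda m: m)
        # One dilated branch per (kernel size, dilation) pair. The padding
        # keeps the sequence length unchanged for odd kernel sizes.
        self.d_convs = nn.ModuleList([
            norm(nn.Conv1d(in_channels, out_channels, k, dilation=d,
                           padding=d * (k - 1) // 2))
            for k, d in zip(kernel_size, dilations)
        ])
        # Strided convolution applied after the branches are combined.
        self.post_conv = norm(
            nn.Conv1d(out_channels, out_channels, 3, stride=strides, padding=1)
        )

    def forward(self, x):
        # Assumption: the branch outputs are combined by summation.
        y = sum(F.leaky_relu(conv(x), 0.2) for conv in self.d_convs)
        return F.leaky_relu(self.post_conv(y), 0.2)


block = TinyMDC(64, 128, strides=2, kernel_size=[3, 5], dilations=[1, 2])
out = block(torch.randn(10, 64, 100))  # (batch_size, channels, length)
print(out.shape)
```

In this sketch `kernel_size` and `dilations` are zipped pairwise, so the two lists should have the same length, and `strides` alone controls the downsampling factor.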
####### Examples
>>> import torch
>>> from espnet2.gan_svs.avocodo.avocodo import MDC
>>> mdc = MDC(in_channels=64, out_channels=128, strides=1,
... kernel_size=[3, 5], dilations=[1, 2])
>>> input_tensor = torch.randn(10, 64, 100) # (batch_size, channels, length)
>>> output = mdc(input_tensor)
>>> print(output.shape)
torch.Size([10, 128, new_length]) # new_length depends on strides and kernel_size
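The output length of any single 1-D convolution follows the formula given in the PyTorch `Conv1d` documentation. The helper below (`conv1d_out_length`, a hypothetical name) is a small sketch for reasoning about `new_length`; the padding values passed in are illustrative and are not taken from the MDC source.

```python
def conv1d_out_length(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution, per the PyTorch Conv1d formula."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# A dilated convolution padded to preserve length (illustrative values).
same = conv1d_out_length(100, kernel_size=5, padding=4, dilation=2)
# A strided convolution that roughly halves the length.
halved = conv1d_out_length(100, kernel_size=3, stride=2, padding=1)
print(same, halved)
```

Chaining the helper over each layer in a block gives the final sequence length without running the model.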
forward(x)
Calculate forward propagation.
This method passes the input tensor x through the dilated convolutions in d_convs, combines their outputs, and applies post_conv.
- Parameters:
- x (Tensor) – Input tensor of shape (B, in_channels, T), where B is the batch size, in_channels is the number of input channels, and T is the length of the input sequence.
- Returns: Output tensor of shape (B, out_channels, T'), where out_channels is the number of output channels and T' depends on strides and the convolution padding.
- Return type: Tensor
####### Examples
>>> mdc = MDC(in_channels=32, out_channels=64, strides=2,
... kernel_size=[3, 5], dilations=[1, 2])
>>> x = torch.randn(2, 32, 100) # Batch size 2, 32 channels, 100 time steps
>>> output = mdc(x)
>>> print(output.shape[:2]) # batch and channel dimensions
torch.Size([2, 64])
NOTE
Ensure that the input tensor x has shape (B, in_channels, T), with in_channels matching the value passed to the constructor.