espnet2.gan_svs.avocodo.avocodo.MDC

class espnet2.gan_svs.avocodo.avocodo.MDC(in_channels, out_channels, strides, kernel_size, dilations, use_spectral_norm=False)

Bases: Module

Multiscale Dilated Convolution module.

This class implements a multiscale dilated convolution block as described in the paper: https://arxiv.org/pdf/1609.07093.pdf. It uses dilated convolutions, which enlarge the receptive field without adding parameters compared with an undilated kernel of the same size. This makes the block suitable for sequence tasks such as time series prediction and audio processing.
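The receptive-field claim above can be checked with a little arithmetic. The following is a minimal sketch in plain Python; the helper names `dilated_receptive_field` and `conv1d_param_count` are hypothetical and are not part of ESPnet. It assumes a standard 1-D convolution with bias.

```python
def dilated_receptive_field(kernel_size: int, dilation: int) -> int:
    """Number of input steps spanned by a single dilated 1-D kernel."""
    return dilation * (kernel_size - 1) + 1


def conv1d_param_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights plus biases of a 1-D convolution; dilation does not appear here."""
    return in_channels * out_channels * kernel_size + out_channels


# Hypothetical channel counts, chosen to match the example on this page.
for dilation in (1, 2, 4):
    span = dilated_receptive_field(kernel_size=5, dilation=dilation)
    params = conv1d_param_count(64, 128, kernel_size=5)
    print(f"dilation={dilation}: spans {span} input steps, {params} parameters")
```

The span grows with the dilation rate while the parameter count stays fixed, which is the trade-off the description refers to.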

d_convs

A list of dilated convolution layers.

Type: ModuleList

post_conv

A convolution layer applied after the dilated convolutions.

Type: Conv1d

softmax

A softmax layer applied to the output.

Type: Softmax

Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- strides (int) – Stride value for the post-convolution layer.
- kernel_size (List[int]) – List of kernel sizes for the dilated convolutions.
- dilations (List[int]) – List of dilation rates for the dilated convolutions.
- use_spectral_norm (bool) – Whether to use spectral normalization for the convolution layers.
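To illustrate how these parameters could fit together, here is a minimal PyTorch sketch of a block with parallel dilated branches followed by a strided post-convolution. It is not the ESPnet implementation: the class name `MultiDilatedBlockSketch`, the summation of branch outputs, the "same"-style padding, the LeakyReLU slope, and leaving the layers unnormalized when `use_spectral_norm` is False are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn


class MultiDilatedBlockSketch(nn.Module):
    """Illustrative block only; see the ESPnet source for the real MDC."""

    def __init__(self, in_channels, out_channels, strides, kernel_size,
                 dilations, use_spectral_norm=False):
        super().__init__()
        # Assumption: without spectral norm, the layers are left as they are.
        norm = nn.utils.spectral_norm if use_spectral_norm else (lambda m: m)
        # One dilated convolution per (kernel size, dilation) pair.
        self.d_convs = nn.ModuleList([
            norm(nn.Conv1d(in_channels, out_channels, k, dilation=d,
                           padding=d * (k - 1) // 2))  # keeps length for odd k
            for k, d in zip(kernel_size, dilations)
        ])
        # Strided convolution applied after the branches are combined.
        self.post_conv = norm(
            nn.Conv1d(out_channels, out_channels, 3, stride=strides, padding=1)
        )

    def forward(self, x):
        # Assumption: the branch outputs are combined by summation.
        y = sum(F.leaky_relu(conv(x), 0.2) for conv in self.d_convs)
        return F.leaky_relu(self.post_conv(y), 0.2)


block = MultiDilatedBlockSketch(64, 128, strides=1,
                                kernel_size=[3, 5], dilations=[1, 2])
out = block(torch.randn(10, 64, 100))  # (batch_size, channels, length)
print(out.shape)
```

With the padding chosen in this sketch and `strides=1`, the sequence length is preserved; the actual MDC class may pad differently and therefore return a different length.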

Examples:

>>> import torch
>>> from espnet2.gan_svs.avocodo.avocodo import MDC
>>> mdc = MDC(in_channels=64, out_channels=128, strides=1,
...            kernel_size=[3, 5], dilations=[1, 2])
>>> input_tensor = torch.randn(10, 64, 100)  # (batch_size, channels, length)
>>> output = mdc(input_tensor)
>>> # The output length depends on kernel_size, so only the first two dimensions are shown.
>>> print(output.shape[:2])  # (batch_size, out_channels)
torch.Size([10, 128])
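The output length of a 1-D convolution can be estimated with the length formula given in the PyTorch `Conv1d` documentation. The sketch below is plain Python; `conv1d_output_length` is a hypothetical helper, and the padding values are assumptions, since this page does not state how MDC pads its layers.

```python
def conv1d_output_length(length: int, kernel_size: int, stride: int = 1,
                         padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (PyTorch Conv1d formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Length-preserving case: kernel 3, stride 1, padding 1.
print(conv1d_output_length(100, kernel_size=3, stride=1, padding=1))
# The same layer with stride 2 roughly halves the length.
print(conv1d_output_length(100, kernel_size=3, stride=2, padding=1))
# Kernel 5 with dilation 2 needs padding 4 to preserve the length.
print(conv1d_output_length(100, kernel_size=5, padding=4, dilation=2))
```

Applying the formula layer by layer gives the final length for a stack of convolutions.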

forward(x)

Calculate forward propagation.

This method performs the forward pass through the MDC block. The input tensor x is passed through the dilated convolutions in d_convs, and their combined output is passed through post_conv.

Parameters:
- x (Tensor) – Input tensor of shape (B, in_channels, T), where B is the batch size, in_channels is the number of input channels, and T is the length of the input sequence.
Returns: Output tensor of shape (B, out_channels, T'), where out_channels is the number of output channels and T' is the output length, which depends on strides.
Return type: Tensor

Examples:

>>> import torch
>>> from espnet2.gan_svs.avocodo.avocodo import MDC
>>> mdc = MDC(in_channels=64, out_channels=128, strides=1,
...            kernel_size=[3, 5], dilations=[1, 2])
>>> x = torch.randn(2, 64, 100)  # Batch size 2, 64 channels, 100 time steps
>>> y = mdc(x)
>>> print(y.shape[:2])  # (batch_size, out_channels)
torch.Size([2, 128])

NOTE

Ensure that the input tensor x has shape (B, in_channels, T), with in_channels matching the value passed to the constructor.