espnet2.gan_svs.avocodo.avocodo.SBDBlock
class espnet2.gan_svs.avocodo.avocodo.SBDBlock(segment_dim, strides, filters, kernel_size, dilations, use_spectral_norm=False)
Bases: Module
SBD (Sub-band Discriminator) Block.
This class implements a sub-band discriminator block, which applies multiple dilated convolutions to input audio segments to capture multi-resolution features. It is part of a larger framework designed for audio synthesis and evaluation.
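The sketch below is a minimal, hypothetical approximation of the layer pattern described above: stacked dilated 1D convolutions followed by a single-channel projection. It is an illustration only, not the ESPnet implementation; for brevity it uses just the first kernel size and dilation of each layer's list, whereas the `List[List[int]]` parameters suggest the real block combines multiple dilations per layer.

```python
import torch
import torch.nn as nn

class SimplifiedSBDBlock(nn.Module):
    """Hypothetical, simplified stand-in for SBDBlock (illustration only)."""

    def __init__(self, segment_dim, strides, filters, kernel_size, dilations):
        super().__init__()
        in_channels = [segment_dim] + list(filters[:-1])
        self.convs = nn.ModuleList(
            nn.Conv1d(
                c_in, c_out,
                kernel_size=k[0], stride=s, dilation=d[0],
                padding=(k[0] - 1) * d[0] // 2,  # "same"-style padding for odd kernels
            )
            for c_in, c_out, k, s, d in zip(
                in_channels, filters, kernel_size, strides, dilations
            )
        )
        # Final projection to a single channel, as in the documented post_conv.
        self.post_conv = nn.Conv1d(filters[-1], 1, kernel_size=3, padding=1)

    def forward(self, x):
        fmap = []
        for conv in self.convs:
            x = torch.nn.functional.leaky_relu(conv(x), 0.1)
            fmap.append(x)  # collect per-layer feature maps
        return self.post_conv(x), fmap
```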
convs
List of convolutional layers for processing input features.
- Type: torch.nn.ModuleList
post_conv
Final convolutional layer for output projection.
- Type: torch.nn.Module
Parameters:
- segment_dim (int) – Dimension of the input segments.
- strides (List[int]) – List of strides for each convolutional layer.
- filters (List[int]) – List of output filter sizes for each layer.
- kernel_size (List[List[int]]) – List of kernel sizes for each layer.
- dilations (List[List[int]]) – List of dilation rates for each layer.
- use_spectral_norm (bool) – Whether to apply spectral normalization to the convolutional layers.
Returns: A tuple containing the output tensor of shape (B, C_out, T_out) and a list of feature maps of shape (B, C, T) at each Conv1d layer.
Return type: Tuple[Tensor, List[Tensor]]
####### Examples
>>> sbd_block = SBDBlock(segment_dim=64, strides=[1, 1],
... filters=[32, 64],
... kernel_size=[[3, 3], [3, 3]],
... dilations=[[1, 2], [1, 2]])
>>> input_tensor = torch.randn(10, 64, 128) # (B, C_in, T_in)
>>> output, feature_maps = sbd_block(input_tensor)
>>> print(output.shape) # Should output (10, 1, T_out)
>>> print(len(feature_maps)) # Should match number of conv layers
NOTE
The output shape (B, C_out, T_out) will depend on the input dimensions, the strides, and the kernel sizes used in the convolutional layers.
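For reference, T_out after each layer follows the standard PyTorch Conv1d output-length formula; a quick sketch (standard torch.nn.Conv1d semantics, not specific to SBDBlock):

```python
import math

def conv1d_out_len(t_in, kernel, stride=1, padding=0, dilation=1):
    """Output length of torch.nn.Conv1d for a given input length."""
    return math.floor((t_in + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# e.g. kernel=3, dilation=2, stride=1, no padding: 128 -> 124
print(conv1d_out_len(128, kernel=3, dilation=2))  # 124
```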
- Raises: ValueError – If the input dimensions do not match the expected shapes for the convolutional layers.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Calculate forward propagation.
This method performs the forward pass of the SBD block. The input segment is processed through each convolutional layer in turn, collecting the intermediate feature maps, and the final output is projected by the post convolution layer.
- Parameters: x (Tensor) – Input tensor of shape (B, segment_dim, T), where B is the batch size, segment_dim is the number of input channels, and T is the length of the input segment.
- Returns: A tuple containing the output tensor of shape (B, C_out, T_out) and a list of feature maps of shape (B, C, T) from each convolutional layer.
- Return type: Tuple[Tensor, List[Tensor]]
####### Examples
>>> sbd_block = SBDBlock(segment_dim=64, strides=[1, 1],
...                      filters=[32, 64],
...                      kernel_size=[[3, 3], [3, 3]],
...                      dilations=[[1, 2], [1, 2]])
>>> x = torch.randn(8, 64, 128)  # (B, segment_dim, T)
>>> output, feature_maps = sbd_block(x)
>>> len(feature_maps)  # One feature map per convolutional layer
2
NOTE
The intermediate feature maps are typically used for feature-matching losses during adversarial training (see the sketch below).
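As one illustrative (not ESPnet-specific) use of the returned feature maps, a feature-matching loss between real and generated audio segments could be computed as follows; `sbd_block`, `real_segment`, and `fake_segment` are hypothetical names for this sketch:

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(fmap_real, fmap_fake):
    """Mean L1 distance between discriminator feature maps of real and fake inputs."""
    loss = 0.0
    for real, fake in zip(fmap_real, fmap_fake):
        loss = loss + F.l1_loss(fake, real.detach())
    return loss / len(fmap_real)

# Hypothetical usage with an SBDBlock instance `sbd_block`:
# _, fmap_real = sbd_block(real_segment)
# _, fmap_fake = sbd_block(fake_segment)
# fm_loss = feature_matching_loss(fmap_real, fmap_fake)
```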