espnet2.gan_svs.avocodo.avocodo.SBDBlock
class espnet2.gan_svs.avocodo.avocodo.SBDBlock(segment_dim, strides, filters, kernel_size, dilations, use_spectral_norm=False)
Bases: Module
SBD (Sub-band Discriminator) Block.
This class implements a sub-band discriminator block, which applies multiple dilated convolutions to input audio segments to capture multi-resolution features. It is part of a larger framework designed for audio synthesis and evaluation.
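The sketch below is a minimal, hypothetical approximation of the layer pattern described above: stacked dilated 1D convolutions followed by a single-channel projection. It is an illustration only, not the ESPnet implementation; for brevity it uses just the first kernel size and dilation of each layer's list, whereas the `List[List[int]]` parameters suggest the real block combines multiple dilations per layer.

```python
import torch
import torch.nn as nn

class SimplifiedSBDBlock(nn.Module):
    """Hypothetical, simplified stand-in for SBDBlock (illustration only)."""

    def __init__(self, segment_dim, strides, filters, kernel_size, dilations):
        super().__init__()
        in_channels = [segment_dim] + list(filters[:-1])
        self.convs = nn.ModuleList(
            nn.Conv1d(
                c_in, c_out,
                kernel_size=k[0], stride=s, dilation=d[0],
                padding=(k[0] - 1) * d[0] // 2,  # "same"-style padding for odd kernels
            )
            for c_in, c_out, k, s, d in zip(
                in_channels, filters, kernel_size, strides, dilations
            )
        )
        # Final projection to a single channel, as in the documented post_conv.
        self.post_conv = nn.Conv1d(filters[-1], 1, kernel_size=3, padding=1)

    def forward(self, x):
        fmap = []
        for conv in self.convs:
            x = torch.nn.functional.leaky_relu(conv(x), 0.1)
            fmap.append(x)  # collect per-layer feature maps
        return self.post_conv(x), fmap
```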
convs
List of convolutional layers for processing input features.
- Type: torch.nn.ModuleList
post_conv
Final convolutional layer for output projection.
- Type: torch.nn.Module
Parameters:
- segment_dim (int) – Dimension of the input segments.
- strides (List[int]) – List of strides for each convolutional layer.
- filters (List[int]) – List of output filter sizes for each layer.
- kernel_size (List[List[int]]) – List of kernel sizes for each layer.
- dilations (List[List[int]]) – List of dilation rates for each layer.
- use_spectral_norm (bool) – Whether to apply spectral normalization to the convolutional layers.
Returns: A tuple containing the output tensor of shape (B, C_out, T_out) and a list of feature maps of shape (B, C, T) at each Conv1d layer.
Return type: Tuple[Tensor, List[Tensor]]
####### Examples
>>> sbd_block = SBDBlock(segment_dim=64, strides=[1, 1],
... filters=[32, 64],
... kernel_size=[[3, 3], [3, 3]],
... dilations=[[1, 2], [1, 2]])
>>> input_tensor = torch.randn(10, 64, 128) # (B, C_in, T_in)
>>> output, feature_maps = sbd_block(input_tensor)
>>> print(output.shape) # Should output (10, 1, T_out)
>>> print(len(feature_maps)) # Should match number of conv layers
NOTE
The output shape (B, C_out, T_out) will depend on the input dimensions, the strides, and the kernel sizes used in the convolutional layers.
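For reference, T_out after each layer follows the standard PyTorch Conv1d output-length formula; a quick sketch (standard torch.nn.Conv1d semantics, not specific to SBDBlock):

```python
import math

def conv1d_out_len(t_in, kernel, stride=1, padding=0, dilation=1):
    """Output length of torch.nn.Conv1d for a given input length."""
    return math.floor((t_in + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)

# e.g. kernel=3, dilation=2, stride=1, no padding: 128 -> 124
print(conv1d_out_len(128, kernel=3, dilation=2))  # 124
```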
- Raises: ValueError – If the input dimensions do not match the expected shapes for the convolutional layers.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Calculate forward propagation.
This method performs the forward pass of the SBD block. The input segment is processed through each convolutional layer in turn, collecting the intermediate feature maps, and the final output is projected by the post convolution layer.
- Parameters: x (Tensor) – Input tensor of shape (B, segment_dim, T), where B is the batch size, segment_dim is the number of input channels, and T is the length of the input segment.
- Returns: A tuple containing the output tensor of shape (B, C_out, T_out) and a list of feature maps of shape (B, C, T) from each convolutional layer.
- Return type: Tuple[Tensor, List[Tensor]]
####### Examples
>>> sbd_block = SBDBlock(segment_dim=64, strides=[1, 1],
...                      filters=[32, 64],
...                      kernel_size=[[3, 3], [3, 3]],
...                      dilations=[[1, 2], [1, 2]])
>>> x = torch.randn(8, 64, 128)  # (B, segment_dim, T)
>>> output, feature_maps = sbd_block(x)
>>> len(feature_maps)  # One feature map per convolutional layer
2
NOTE
The intermediate feature maps are typically used for feature-matching losses during adversarial training (see the sketch below).
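As one illustrative (not ESPnet-specific) use of the returned feature maps, a feature-matching loss between real and generated audio segments could be computed as follows; `sbd_block`, `real_segment`, and `fake_segment` are hypothetical names for this sketch:

```python
import torch
import torch.nn.functional as F

def feature_matching_loss(fmap_real, fmap_fake):
    """Mean L1 distance between discriminator feature maps of real and fake inputs."""
    loss = 0.0
    for real, fake in zip(fmap_real, fmap_fake):
        loss = loss + F.l1_loss(fake, real.detach())
    return loss / len(fmap_real)

# Hypothetical usage with an SBDBlock instance `sbd_block`:
# _, fmap_real = sbd_block(real_segment)
# _, fmap_fake = sbd_block(fake_segment)
# fm_loss = feature_matching_loss(fmap_real, fmap_fake)
```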