espnet2.gan_svs.avocodo.avocodo.CoMBDBlock

class espnet2.gan_svs.avocodo.avocodo.CoMBDBlock(h_u: List[int], d_k: List[int], d_s: List[int], d_d: List[int], d_g: List[int], d_p: List[int], op_f: int, op_k: int, op_g: int, use_spectral_norm=False)

Bases: Module

CoMBD (Collaborative Multi-band Discriminator) block module.

This module implements a collaborative multi-band discriminator block that processes input signals through a series of convolutional layers. The design allows for different configurations of kernel sizes, strides, dilations, and groups for each convolutional layer, enabling flexibility in the architecture.

convs

A list of convolutional layers defined by the input parameters.

Type: torch.nn.ModuleList

projection_conv

A final convolutional layer for output projection.

Type: torch.nn.Module
Parameters:
- h_u (List[int]) – List of hidden units for each layer.
- d_k (List[int]) – List of kernel sizes for each convolutional layer.
- d_s (List[int]) – List of strides for each convolutional layer.
- d_d (List[int]) – List of dilations for each convolutional layer.
- d_g (List[int]) – List of groups for each convolutional layer.
- d_p (List[int]) – List of paddings for each convolutional layer.
- op_f (int) – Number of output filters for the final projection layer.
- op_k (int) – Kernel size for the final projection layer.
- op_g (int) – Number of groups for the final projection layer.
- use_spectral_norm (bool) – Whether to apply spectral normalization to the convolutional layers.
Returns: None
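
One way to read the per-layer lists is to zip them into one configuration per convolution. The sketch below is illustrative only and is not the ESPnet implementation: `layer_configs` is a hypothetical helper, the equal-length check is a choice made for this sketch, and the `in_channels` argument (the channel count fed to the first layer) is an assumption, since the parameter list above does not state it.

```python
from typing import Dict, List


def layer_configs(
    h_u: List[int],
    d_k: List[int],
    d_s: List[int],
    d_d: List[int],
    d_g: List[int],
    d_p: List[int],
    in_channels: int = 1,  # assumption: not documented in the parameter list
) -> List[Dict[str, int]]:
    """Zip the per-layer lists into one settings dict per convolution."""
    lists = [h_u, d_k, d_s, d_d, d_g, d_p]
    if len({len(values) for values in lists}) != 1:
        raise ValueError("all per-layer lists must have the same length")

    configs = []
    prev_channels = in_channels
    for out_channels, k, s, d, g, p in zip(*lists):
        # Conv1d requires both channel counts to be divisible by groups.
        if prev_channels % g or out_channels % g:
            raise ValueError("channel counts must be divisible by groups")
        configs.append(
            dict(
                in_channels=prev_channels,
                out_channels=out_channels,
                kernel_size=k,
                stride=s,
                dilation=d,
                groups=g,
                padding=p,
            )
        )
        # The next layer consumes this layer's output channels.
        prev_channels = out_channels
    return configs


configs = layer_configs(
    h_u=[16, 64, 256],
    d_k=[3, 5, 7],
    d_s=[1, 2, 1],
    d_d=[1, 1, 1],
    d_g=[1, 1, 1],
    d_p=[1, 2, 3],
)
for cfg in configs:
    print(cfg)
```

Each dict holds keyword arguments in the form `torch.nn.Conv1d` accepts, which makes it easy to spot a mismatched list length or an invalid groups value before building the block.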

####### Examples

>>> import torch
>>> from espnet2.gan_svs.avocodo.avocodo import CoMBDBlock
>>> block = CoMBDBlock(
...     h_u=[16, 64, 256],
...     d_k=[3, 5, 7],
...     d_s=[1, 2, 1],
...     d_d=[1, 1, 1],
...     d_g=[1, 1, 1],
...     d_p=[1, 2, 3],
...     op_f=1,
...     op_k=3,
...     op_g=1,
...     use_spectral_norm=True
... )
>>> input_tensor = torch.randn(1, 1, 1024)  # (B, 1, T); assumes a single-channel waveform input
>>> output, feature_maps = block(input_tensor)

Returns: Tuple of the output tensor, with shape (B, C_out, T_out), and a list of feature maps, one per Conv1d layer, each with shape (B, C, T).
Return type: Tuple[Tensor, List[Tensor]]
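
The time dimension of each feature map and of the output depends on the kernel size, stride, dilation, and padding of every layer. As a rough sketch, the standard Conv1d length formula, floor((T + 2p - d(k - 1) - 1) / s) + 1, can be chained over the layers. The helper names below are hypothetical, and treating the projection layer as stride 1, dilation 1, and padding 0 is an assumption, not something this page states.

```python
from typing import List


def conv1d_out_len(t: int, k: int, s: int = 1, d: int = 1, p: int = 0) -> int:
    """Output length of a 1-D convolution (standard Conv1d formula)."""
    return (t + 2 * p - d * (k - 1) - 1) // s + 1


def feature_lengths(
    t: int, d_k: List[int], d_s: List[int], d_d: List[int], d_p: List[int]
) -> List[int]:
    """Chain the formula over the layers, recording each layer's length."""
    lengths = []
    for k, s, d, p in zip(d_k, d_s, d_d, d_p):
        t = conv1d_out_len(t, k, s, d, p)
        lengths.append(t)
    return lengths


# Same layer settings as the example above, with T = 1024 input samples.
lengths = feature_lengths(
    1024, d_k=[3, 5, 7], d_s=[1, 2, 1], d_d=[1, 1, 1], d_p=[1, 2, 3]
)
print(lengths)  # [1024, 512, 512]

# Assumption: projection layer with op_k=3, stride 1, dilation 1, padding 0.
t_out = conv1d_out_len(lengths[-1], k=3)
print(t_out)  # 510 under that assumption
```

Only the stride-2 layer changes the length here, because each padding value equals (k - 1) / 2 for its kernel size.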

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Calculate forward propagation.

This method passes the input through each convolutional layer in convs, collecting the feature map produced at every layer, and then applies projection_conv to produce the output.

Parameters:
- x (Tensor) – Input tensor of shape (B, C_in, T).
Returns: Tuple containing the output tensor of shape (B, C_out, T_out) and a list of feature maps of shape (B, C, T) at each Conv1d layer.
Return type: Tuple[Tensor, List[Tensor]]

####### Examples

>>> # block and input_tensor as constructed in the class example above
>>> output, feature_maps = block(input_tensor)
>>> print(output.shape)
>>> print([fmap.shape for fmap in feature_maps])