espnet2.gan_svs.avocodo.avocodo.CoMBD

class espnet2.gan_svs.avocodo.avocodo.CoMBD(h, pqmf_list=None, use_spectral_norm=False)

Bases: Module

CoMBD (Collaborative Multi-band Discriminator) module.

This module implements the Collaborative Multi-band Discriminator described in the paper https://arxiv.org/abs/2206.13404. It processes input signals with a series of convolutional blocks, which can optionally apply spectral normalization.

h

Configuration parameters for the CoMBD module.

Type: dict

pqmf

List of PQMF instances for signal analysis.

Type: List[PQMF]

Parameters:
- h (dict) – A dictionary containing configuration parameters such as “combd_h_u”, “combd_d_k”, “combd_d_s”, “combd_d_d”, “combd_d_g”, “combd_d_p”, “combd_op_f”, “combd_op_k”, and “combd_op_g”.
- pqmf_list (Optional[List[PQMF]]) – List of PQMF instances. If None, default PQMF instances will be created based on h.
- use_spectral_norm (bool) – Flag to determine whether to use spectral normalization in convolutional layers.
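
The keys of h can be checked before the module is constructed. The following is a minimal sketch, assuming each key holds one entry per discriminator block (as the equal-length lists in the example on this page suggest). The helper `check_combd_config` is hypothetical and is not part of ESPnet.

```python
# Keys listed in the parameter description for `h`.
COMBD_KEYS = (
    "combd_h_u", "combd_d_k", "combd_d_s", "combd_d_d", "combd_d_g",
    "combd_d_p", "combd_op_f", "combd_op_k", "combd_op_g",
)


def check_combd_config(h):
    """Hypothetical helper: return the number of per-block entries in `h`.

    Raises KeyError if a key is missing and ValueError if the
    top-level lists disagree in length.
    """
    missing = [key for key in COMBD_KEYS if key not in h]
    if missing:
        raise KeyError(f"missing CoMBD config keys: {missing}")
    lengths = {key: len(h[key]) for key in COMBD_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-block lists differ in length: {lengths}")
    return lengths[COMBD_KEYS[0]]


# Same values as the example configuration on this page.
combd_params = {
    "combd_h_u": [[16, 64, 256], [16, 64, 256]],
    "combd_d_k": [[7, 11], [11, 21]],
    "combd_d_s": [[1, 1], [1, 1]],
    "combd_d_d": [[1, 1], [1, 1]],
    "combd_d_g": [[1, 4], [1, 4]],
    "combd_d_p": [[3, 5], [5, 10]],
    "combd_op_f": [1, 1],
    "combd_op_k": [3, 3],
    "combd_op_g": [1, 1],
}

num_blocks = check_combd_config(combd_params)
print(num_blocks)  # prints 2
```

The check only compares top-level list lengths. It does not validate the per-layer values inside each entry.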

####### Examples

>>> combd_params = {
...     "combd_h_u": [[16, 64, 256], [16, 64, 256]],
...     "combd_d_k": [[7, 11], [11, 21]],
...     "combd_d_s": [[1, 1], [1, 1]],
...     "combd_d_d": [[1, 1], [1, 1]],
...     "combd_d_g": [[1, 4], [1, 4]],
...     "combd_d_p": [[3, 5], [5, 10]],
...     "combd_op_f": [1, 1],
...     "combd_op_k": [3, 3],
...     "combd_op_g": [1, 1],
... }
>>> combd = CoMBD(combd_params)
>>> output_real, output_fake, fmaps_real, fmaps_fake = combd(ys, ys_hat)

Returns: A tuple containing the output tensors for real and fake signals, along with the feature maps for each Conv1d layer for both real and fake signals.
Return type: Tuple[List[Tensor], List[Tensor], List[List[Tensor]], List[List[Tensor]]]
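
The nested return type can be easier to follow with a small traversal. The sketch below walks two List[List[...]] structures in parallel and averages their absolute differences, a pattern often used for feature-matching losses in GAN vocoder training. It is illustrative only: plain lists of floats stand in for tensors, and the function `feature_match_l1` is a hypothetical name, not an ESPnet API.

```python
def feature_match_l1(fmaps_real, fmaps_fake):
    """Mean absolute difference over nested feature maps (illustrative).

    Each argument mirrors the List[List[Tensor]] layout: an outer list over
    discriminator outputs and an inner list over layers. Plain lists of
    floats stand in for tensors here.
    """
    total, count = 0.0, 0
    for layers_real, layers_fake in zip(fmaps_real, fmaps_fake):
        for feat_real, feat_fake in zip(layers_real, layers_fake):
            for a, b in zip(feat_real, feat_fake):
                total += abs(a - b)
                count += 1
    return total / count if count else 0.0


# Two discriminator outputs: the first has two layers, the second has one.
fmaps_real = [[[0.0, 1.0], [2.0]], [[3.0]]]
fmaps_fake = [[[0.5, 1.0], [1.0]], [[3.0]]]

loss = feature_match_l1(fmaps_real, fmaps_fake)
print(loss)  # prints 0.375
```

With real tensors, the innermost loop would normally be replaced by a vectorized operation on each pair of feature maps.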

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(ys, ys_hat)

Calculate forward propagation.

This method performs the forward pass of the CoMBD discriminator. It takes the real signals and the generated signals, passes both through the discriminator's convolutional blocks, and returns the discriminator outputs together with the intermediate feature maps.

Parameters:
- ys (List[Tensor]) – Real (ground-truth) signals.
- ys_hat (List[Tensor]) – Generated (fake) signals.
Returns: A tuple containing the output tensors for real and fake signals, along with the feature maps for each Conv1d layer for both real and fake signals.
Return type: Tuple[List[Tensor], List[Tensor], List[List[Tensor]], List[List[Tensor]]]

####### Examples

>>> # combd, ys, and ys_hat as in the class-level example
>>> outs_real, outs_fake, fmaps_real, fmaps_fake = combd(ys, ys_hat)
>>> for out in outs_real:
...     print(out.shape)  # One discriminator output per entry

NOTE

The real and generated signals are scored by the same discriminator blocks, so the real and fake outputs, and their feature maps, are returned as parallel lists.