espnet2.gan_svs.avocodo.avocodo.CoMBD
class espnet2.gan_svs.avocodo.avocodo.CoMBD(h, pqmf_list=None, use_spectral_norm=False)
Bases: Module
CoMBD (Collaborative Multi-band Discriminator) module.
This module implements the Collaborative Multi-band Discriminator described in the paper https://arxiv.org/abs/2206.13404. It processes input signals with a series of convolutional blocks, which can optionally apply spectral normalization (see use_spectral_norm).
h
Configuration parameters for the CoMBD module.
- Type: dict
pqmf
List of PQMF instances for signal analysis.
- Type: List[PQMF]
Parameters:
- h (dict) – A dictionary containing configuration parameters such as “combd_h_u”, “combd_d_k”, “combd_d_s”, “combd_d_d”, “combd_d_g”, “combd_d_p”, “combd_op_f”, “combd_op_k”, and “combd_op_g”.
- pqmf_list (Optional[List[PQMF]]) – List of PQMF instances. If None, default PQMF instances will be created based on h.
- use_spectral_norm (bool) – Flag to determine whether to use spectral normalization in convolutional layers.
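The configuration keys listed above can be sanity-checked before the module is built. The following is a minimal sketch, not part of ESPnet: `check_combd_config` and `COMBD_KEYS` are hypothetical names, and the check assumes that every key holds one entry per discriminator block, as the two-entry lists in the example configuration suggest.

```python
# Hypothetical helper, not part of ESPnet: sanity-check a CoMBD config dict.
COMBD_KEYS = (
    "combd_h_u", "combd_d_k", "combd_d_s", "combd_d_d", "combd_d_g",
    "combd_d_p", "combd_op_f", "combd_op_k", "combd_op_g",
)


def check_combd_config(h):
    """Return the number of entries per key; raise if the config is inconsistent."""
    missing = [key for key in COMBD_KEYS if key not in h]
    if missing:
        raise KeyError(f"missing CoMBD keys: {missing}")
    # Assumption: each key carries one entry per discriminator block.
    lengths = {key: len(h[key]) for key in COMBD_KEYS}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"per-key lengths differ: {lengths}")
    return lengths[COMBD_KEYS[0]]


combd_params = {
    "combd_h_u": [[16, 64, 256], [16, 64, 256]],
    "combd_d_k": [[7, 11], [11, 21]],
    "combd_d_s": [[1, 1], [1, 1]],
    "combd_d_d": [[1, 1], [1, 1]],
    "combd_d_g": [[1, 4], [1, 4]],
    "combd_d_p": [[3, 5], [5, 10]],
    "combd_op_f": [1, 1],
    "combd_op_k": [3, 3],
    "combd_op_g": [1, 1],
}
num_blocks = check_combd_config(combd_params)
```

Running such a check first turns a mistyped or truncated key into an immediate, readable error instead of a failure deep inside layer construction.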
####### Examples
>>> from espnet2.gan_svs.avocodo.avocodo import CoMBD
>>> combd_params = {
... "combd_h_u": [[16, 64, 256], [16, 64, 256]],
... "combd_d_k": [[7, 11], [11, 21]],
... "combd_d_s": [[1, 1], [1, 1]],
... "combd_d_d": [[1, 1], [1, 1]],
... "combd_d_g": [[1, 4], [1, 4]],
... "combd_d_p": [[3, 5], [5, 10]],
... "combd_op_f": [1, 1],
... "combd_op_k": [3, 3],
... "combd_op_g": [1, 1],
... }
>>> combd = CoMBD(combd_params)
>>> # ys and ys_hat are the real and generated signals, prepared by the caller.
>>> output_real, output_fake, fmaps_real, fmaps_fake = combd(ys, ys_hat)
- Returns: A tuple containing the output tensors for real and fake signals, along with the feature maps for each Conv1d layer for both real and fake signals.
- Return type: Tuple[List[Tensor], List[Tensor], List[List[Tensor]], List[List[Tensor]]]
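The nested return structure can be walked with two levels of iteration: the outer list has one entry per discriminator block and the inner list one entry per layer. The sketch below illustrates this with a feature-matching-style distance between real and fake feature maps. It is only an illustration under stated assumptions: `feature_matching_distance` is a hypothetical helper, not an ESPnet function, and flat lists of floats stand in for tensors so that the snippet runs without PyTorch.

```python
def feature_matching_distance(fmaps_real, fmaps_fake):
    """Average, over all layers, of the mean absolute real/fake difference.

    Both arguments mimic List[List[Tensor]]; each "tensor" here is a flat
    list of floats used as a stand-in (an assumption for illustration).
    """
    total = 0.0
    count = 0
    # Outer loop: one entry per discriminator block.
    for layers_real, layers_fake in zip(fmaps_real, fmaps_fake):
        # Inner loop: one feature map per layer of that block.
        for feat_real, feat_fake in zip(layers_real, layers_fake):
            diffs = [abs(r - f) for r, f in zip(feat_real, feat_fake)]
            total += sum(diffs) / len(diffs)
            count += 1
    return total / count


# Toy stand-in data: two blocks, with two layers and one layer respectively.
fmaps_real = [[[1.0, 2.0], [0.0, 4.0]], [[3.0, 3.0]]]
fmaps_fake = [[[1.0, 0.0], [2.0, 4.0]], [[3.0, 1.0]]]
distance = feature_matching_distance(fmaps_real, fmaps_fake)
```

With real tensors the same two-level traversal applies; only the per-layer distance would be computed with tensor operations instead of list comprehensions.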
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(ys, ys_hat)
Calculate forward propagation.
This method performs the forward pass of the CoMBD discriminator on the real signals ys and the generated signals ys_hat. It returns the discriminator outputs together with the intermediate feature maps for both inputs.
- Parameters:
- ys (List[Tensor]) – Real (ground-truth) signals.
- ys_hat (List[Tensor]) – Generated signals, for example the outputs of AvocodoGenerator.
- Returns: A tuple containing the output tensors for real and fake signals, along with the feature maps for each Conv1d layer for both real and fake signals.
- Return type: Tuple[List[Tensor], List[Tensor], List[List[Tensor]], List[List[Tensor]]]
####### Examples
>>> import torch
>>> generator = AvocodoGenerator()
>>> input_tensor = torch.randn(8, 80, 100)  # Batch size 8, 80 channels, T=100
>>> global_tensor = torch.randn(8, 256, 1)  # Global conditioning
>>> outputs = generator(input_tensor, global_tensor)
>>> for output in outputs:
...     print(output.shape)  # Shape of each generator output
NOTE
The number of output tensors depends on the upsampling configuration of the generator. Each output tensor corresponds to a different stage of the upsampling process.