espnet2.gan_svs.avocodo.avocodo.SBD
class espnet2.gan_svs.avocodo.avocodo.SBD(h, use_spectral_norm=False)
Bases: Module
SBD (Sub-band Discriminator) from https://arxiv.org/pdf/2206.13404.pdf
This module implements a Sub-band Discriminator designed for audio processing tasks, utilizing multi-band analysis to discriminate between real and generated audio signals. It processes the input signals using a series of convolutional blocks, which are structured to capture the characteristics of different frequency bands.
config
Configuration parameters for the SBD.
- Type: MDCDConfig
pqmf
Perfect Reconstruction Filter Bank for signal analysis.
- Type: PQMF
f_pqmf
Secondary PQMF for frequency analysis, if needed.
- Type: PQMF or None
discriminators
List of sub-band discriminators.
- Type: ModuleList
Parameters:
- h (Dict[str, Any]) – Configuration dictionary containing filters, kernel sizes, dilations, strides, band ranges, and transpose flags for the sub-band discriminators.
- use_spectral_norm (bool) – Flag to indicate whether to use spectral normalization for the convolutional layers.
Returns: None
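To make the multi-band structure concrete, the following is a minimal, illustrative sketch of the idea behind SBD: the PQMF analysis output (B, bands, T') is sliced into band ranges, and each slice is scored by its own small convolutional discriminator that also collects intermediate feature maps. The class `TinyBandDiscriminator` and the band ranges below are hypothetical stand-ins, not ESPnet's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBandDiscriminator(nn.Module):
    """Illustrative per-band discriminator: a short stack of 1D convolutions
    mapping a slice of PQMF sub-bands to a score sequence, while collecting
    intermediate feature maps (as SBD does for feature matching)."""

    def __init__(self, in_bands: int, channels: int = 32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(in_bands, channels, kernel_size=5, padding=2),
            nn.Conv1d(channels, channels, kernel_size=5, padding=4, dilation=2),
        ])
        self.out = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):
        fmap = []
        for conv in self.convs:
            x = F.leaky_relu(conv(x), 0.1)
            fmap.append(x)  # keep activations for feature-matching losses
        return self.out(x), fmap


# Stand-in for pqmf.analysis(audio): (B, bands, T') sub-band representation.
bands = torch.randn(2, 16, 512)
ranges = [(0, 6), (6, 11), (11, 16)]  # hypothetical band ranges
discs = [TinyBandDiscriminator(hi - lo) for lo, hi in ranges]
outs = [d(bands[:, lo:hi]) for d, (lo, hi) in zip(discs, ranges)]
scores = [score for score, _ in outs]
```

The real SBD follows the configured `sbd_band_ranges`, `sbd_filters`, etc., and uses considerably deeper blocks; this sketch only shows the band-slicing pattern.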
####### Examples
>>> sbd_config = {
... "sbd_filters": [[64, 128], [128, 256]],
... "sbd_kernel_sizes": [[[3, 3], [3, 3]]],
... "sbd_dilations": [[[1, 2], [1, 2]]],
... "sbd_strides": [[1, 1]],
... "sbd_band_ranges": [[0, 1]],
... "sbd_transpose": [False],
... "pqmf_config": {"sbd": [16, 256, 0.03, 10.0]}
... }
>>> sbd = SBD(sbd_config)
>>> real_audio = torch.randn(1, 1, 8192)
>>> fake_audio = torch.randn(1, 1, 8192)
>>> real_outputs, fake_outputs, real_fmaps, fake_fmaps = sbd(real_audio, fake_audio)
NOTE
The SBD is part of a larger framework designed for GAN-based audio synthesis, and it is specifically tailored for evaluating the quality of generated audio signals.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(y, y_hat)
Calculate forward propagation.
This method performs the forward pass of the SBD module. Both the ground-truth and the generated waveforms are decomposed into sub-bands by the PQMF and passed through each sub-band discriminator, which returns its outputs together with its intermediate feature maps.
- Parameters:
- y (Tensor) – Ground-truth audio tensor of shape (B, 1, T).
- y_hat (Tensor) – Generated audio tensor of shape (B, 1, T).
- Returns: A tuple of four lists: the discriminator outputs for the real signal, the discriminator outputs for the generated signal, and the corresponding lists of feature maps for each.
- Return type: Tuple[List[Tensor], List[Tensor], List[List[Tensor]], List[List[Tensor]]]
####### Examples
>>> sbd = SBD(sbd_config)
>>> y = torch.randn(1, 1, 8192)      # real audio (B, 1, T)
>>> y_hat = torch.randn(1, 1, 8192)  # generated audio (B, 1, T)
>>> real_outputs, fake_outputs, real_fmaps, fake_fmaps = sbd(y, y_hat)
>>> len(real_outputs) == len(fake_outputs)
True
NOTE
The real and generated waveforms should have the same shape (B, 1, T); both are analyzed by the PQMF before being passed to the sub-band discriminators.
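In training, the four lists returned by forward() typically feed adversarial and feature-matching losses. The sketch below uses toy stand-in tensors and a least-squares GAN objective with an L1 feature-matching term; the exact loss variant is an assumption here and is determined by the surrounding trainer, not by SBD itself.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the four outputs of sbd(y, y_hat):
# real scores, fake scores, and the two feature-map lists.
real_outputs = [torch.ones(1, 1, 100), torch.ones(1, 1, 50)]
fake_outputs = [torch.zeros(1, 1, 100), torch.zeros(1, 1, 50)]
real_fmaps = [[torch.randn(1, 32, 100)], [torch.randn(1, 32, 50)]]
fake_fmaps = [[f + 0.1 for f in fm] for fm in real_fmaps]

# Least-squares GAN losses: real scores pushed toward 1, fake toward 0.
disc_loss = sum(
    torch.mean((1 - r) ** 2) + torch.mean(g ** 2)
    for r, g in zip(real_outputs, fake_outputs)
)
# Generator adversarial loss: fake scores pushed toward 1.
gen_adv_loss = sum(torch.mean((1 - g) ** 2) for g in fake_outputs)

# Feature matching: L1 distance between real and fake activations.
fm_loss = sum(
    F.l1_loss(fr, fg)
    for fmr, fmg in zip(real_fmaps, fake_fmaps)
    for fr, fg in zip(fmr, fmg)
)
```

With real code the stand-ins would be replaced by `real_outputs, fake_outputs, real_fmaps, fake_fmaps = sbd(y, y_hat)`, summing the same terms across all sub-band discriminators.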