espnet2.gan_svs.avocodo.avocodo.SBD
class espnet2.gan_svs.avocodo.avocodo.SBD(h, use_spectral_norm=False)
Bases: Module
SBD (Sub-band Discriminator) from https://arxiv.org/pdf/2206.13404.pdf
This module implements a Sub-band Discriminator designed for audio processing tasks, utilizing multi-band analysis to discriminate between real and generated audio signals. It processes the input signals using a series of convolutional blocks, which are structured to capture the characteristics of different frequency bands.
config
Configuration parameters for the SBD.
- Type: MDCDConfig
pqmf
Perfect Reconstruction Filter Bank for signal analysis.
- Type: PQMF
f_pqmf
Secondary PQMF for frequency analysis, if needed.
- Type: PQMF or None
discriminators
List of sub-band discriminators.
- Type: ModuleList
Parameters:
- h (Dict[str, Any]) – Configuration dictionary containing filters, kernel sizes, dilations, strides, band ranges, and transpose flags for the sub-band discriminators.
- use_spectral_norm (bool) – Flag to indicate whether to use spectral normalization for the convolutional layers.
Returns: None
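To make the multi-band structure concrete, the following is a minimal, illustrative sketch of the idea behind SBD: the PQMF analysis output (B, bands, T') is sliced into band ranges, and each slice is scored by its own small convolutional discriminator that also collects intermediate feature maps. The class `TinyBandDiscriminator` and the band ranges below are hypothetical stand-ins, not ESPnet's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBandDiscriminator(nn.Module):
    """Illustrative per-band discriminator: a short stack of 1D convolutions
    mapping a slice of PQMF sub-bands to a score sequence, while collecting
    intermediate feature maps (as SBD does for feature matching)."""

    def __init__(self, in_bands: int, channels: int = 32):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(in_bands, channels, kernel_size=5, padding=2),
            nn.Conv1d(channels, channels, kernel_size=5, padding=4, dilation=2),
        ])
        self.out = nn.Conv1d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):
        fmap = []
        for conv in self.convs:
            x = F.leaky_relu(conv(x), 0.1)
            fmap.append(x)  # keep activations for feature-matching losses
        return self.out(x), fmap


# Stand-in for pqmf.analysis(audio): (B, bands, T') sub-band representation.
bands = torch.randn(2, 16, 512)
ranges = [(0, 6), (6, 11), (11, 16)]  # hypothetical band ranges
discs = [TinyBandDiscriminator(hi - lo) for lo, hi in ranges]
outs = [d(bands[:, lo:hi]) for d, (lo, hi) in zip(discs, ranges)]
scores = [score for score, _ in outs]
```

The real SBD follows the configured `sbd_band_ranges`, `sbd_filters`, etc., and uses considerably deeper blocks; this sketch only shows the band-slicing pattern.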
####### Examples
>>> sbd_config = {
... "sbd_filters": [[64, 128], [128, 256]],
... "sbd_kernel_sizes": [[[3, 3], [3, 3]]],
... "sbd_dilations": [[[1, 2], [1, 2]]],
... "sbd_strides": [[1, 1]],
... "sbd_band_ranges": [[0, 1]],
... "sbd_transpose": [False],
... "pqmf_config": {"sbd": [16, 256, 0.03, 10.0]}
... }
>>> sbd = SBD(sbd_config)
>>> real_audio = torch.randn(1, 1, 8192)
>>> fake_audio = torch.randn(1, 1, 8192)
>>> real_outputs, fake_outputs, real_fmaps, fake_fmaps = sbd(real_audio, fake_audio)
NOTE
The SBD is part of a larger framework designed for GAN-based audio synthesis, and it is specifically tailored for evaluating the quality of generated audio signals.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(y, y_hat)
Calculate forward propagation.
This method performs the forward pass of the SBD module. Both the ground-truth and the generated waveforms are decomposed into sub-bands by the PQMF and passed through each sub-band discriminator, which returns its outputs together with its intermediate feature maps.
- Parameters:
- y (Tensor) – Ground-truth audio tensor of shape (B, 1, T).
- y_hat (Tensor) – Generated audio tensor of shape (B, 1, T).
- Returns: A tuple of four lists: the discriminator outputs for the real signal, the discriminator outputs for the generated signal, and the corresponding lists of feature maps for each.
- Return type: Tuple[List[Tensor], List[Tensor], List[List[Tensor]], List[List[Tensor]]]
####### Examples
>>> sbd = SBD(sbd_config)
>>> y = torch.randn(1, 1, 8192)      # real audio (B, 1, T)
>>> y_hat = torch.randn(1, 1, 8192)  # generated audio (B, 1, T)
>>> real_outputs, fake_outputs, real_fmaps, fake_fmaps = sbd(y, y_hat)
>>> len(real_outputs) == len(fake_outputs)
True
NOTE
The real and generated waveforms should have the same shape (B, 1, T); both are analyzed by the PQMF before being passed to the sub-band discriminators.
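In training, the four lists returned by forward() typically feed adversarial and feature-matching losses. The sketch below uses toy stand-in tensors and a least-squares GAN objective with an L1 feature-matching term; the exact loss variant is an assumption here and is determined by the surrounding trainer, not by SBD itself.

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for the four outputs of sbd(y, y_hat):
# real scores, fake scores, and the two feature-map lists.
real_outputs = [torch.ones(1, 1, 100), torch.ones(1, 1, 50)]
fake_outputs = [torch.zeros(1, 1, 100), torch.zeros(1, 1, 50)]
real_fmaps = [[torch.randn(1, 32, 100)], [torch.randn(1, 32, 50)]]
fake_fmaps = [[f + 0.1 for f in fm] for fm in real_fmaps]

# Least-squares GAN losses: real scores pushed toward 1, fake toward 0.
disc_loss = sum(
    torch.mean((1 - r) ** 2) + torch.mean(g ** 2)
    for r, g in zip(real_outputs, fake_outputs)
)
# Generator adversarial loss: fake scores pushed toward 1.
gen_adv_loss = sum(torch.mean((1 - g) ** 2) for g in fake_outputs)

# Feature matching: L1 distance between real and fake activations.
fm_loss = sum(
    F.l1_loss(fr, fg)
    for fmr, fmg in zip(real_fmaps, fake_fmaps)
    for fr, fg in zip(fmr, fmg)
)
```

With real code the stand-ins would be replaced by `real_outputs, fake_outputs, real_fmaps, fake_fmaps = sbd(y, y_hat)`, summing the same terms across all sub-band discriminators.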