espnet2.gan_svs.visinger2.visinger2_vocoder.MultiFrequencyDiscriminator
class espnet2.gan_svs.visinger2.visinger2_vocoder.MultiFrequencyDiscriminator(sample_rate: int = 22050, hop_lengths=[128, 256, 512], hidden_channels=[256, 512, 512], domain='double', mel_scale=True, divisors=[32, 16, 8, 4, 2, 1, 1], strides=[1, 2, 1, 2, 1, 2, 1])
Bases: Module
Multi-Frequency Discriminator module in UnivNet.
This module implements a multi-frequency discriminator that processes input signals through multiple STFT (Short-Time Fourier Transform) layers, enabling the extraction of frequency features across different time scales.
stfts
List of STFT layers for different hop lengths.
- Type: ModuleList
domain
Domain of input signal, can be “double” or “single”.
- Type: str
discriminators
List of frequency discriminators for each hop length.
- Type: ModuleList
Parameters:
- sample_rate (int) – Sample rate of the input audio. Default is 22050.
- hop_lengths (list) – List of hop lengths used for STFT. Default is [128, 256, 512].
- hidden_channels (list) – List of number of channels in hidden layers. Default is [256, 512, 512].
- domain (str) – Domain of input signal. Default is “double”.
- mel_scale (bool) – Whether to use mel-scale frequency. Default is True.
- divisors (list) – List of divisors for each layer in the discriminator. Default is [32, 16, 8, 4, 2, 1, 1].
- strides (list) – List of strides for each layer in the discriminator. Default is [1, 2, 1, 2, 1, 2, 1].
####### Examples
>>> discriminator = MultiFrequencyDiscriminator()
>>> input_tensor = torch.randn(1, 1, 22050) # Example input
>>> output_features = discriminator(input_tensor)
>>> len(output_features) # Should return the number of discriminators
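The multi-resolution idea described above can be illustrated without ESPnet. The following is a minimal sketch, not the actual implementation (the window sizes are assumed; the real module derives its own STFT configuration and optionally applies a mel scale): it computes one magnitude spectrogram per hop length, which is the kind of input each sub-discriminator receives.

```python
import torch


def multi_resolution_magnitudes(x, hop_lengths=(128, 256, 512)):
    """Return one magnitude spectrogram per hop length for a mono signal x."""
    mags = []
    for hop in hop_lengths:
        n_fft = hop * 4  # assumed window size, for illustration only
        spec = torch.stft(
            x,
            n_fft=n_fft,
            hop_length=hop,
            window=torch.hann_window(n_fft),
            return_complex=True,
        )
        mags.append(spec.abs())  # (n_fft // 2 + 1, frames)
    return mags


x = torch.randn(22050)  # 1 second of audio at 22.05 kHz
mags = multi_resolution_magnitudes(x)
print([tuple(m.shape) for m in mags])
```

Shorter hop lengths give finer time resolution but coarser frequency resolution, so the three spectrograms expose complementary views of the same signal.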
Initialize Multi-Frequency Discriminator module.
- Parameters:
- sample_rate (int) – Sample rate of the input audio. Default is 22050.
- hop_lengths (list) – List of hop lengths used for STFT. Default is [128, 256, 512].
- hidden_channels (list) – List of number of channels in hidden layers. Default is [256, 512, 512].
- domain (str) – Domain of input signal. Default is “double”.
- mel_scale (bool) – Whether to use mel-scale frequency. Default is True.
- divisors (list) – List of divisors for each layer in the discriminator. Default is [32, 16, 8, 4, 2, 1, 1].
- strides (list) – List of strides for each layer in the discriminator. Default is [1, 2, 1, 2, 1, 2, 1].
forward(x)
Calculate forward propagation.
This method passes the input signal through each internal STFT layer to obtain a spectral representation at the corresponding hop length, then feeds each representation to its matching frequency discriminator.
- Parameters:
- x (Tensor) – Input signal tensor (B, 1, T).
- Returns: List of output tensors, one from each frequency discriminator.
- Return type: List[Tensor]
####### Examples
>>> discriminator = MultiFrequencyDiscriminator()
>>> x = torch.randn(1, 1, 22050)  # Example input signal
>>> outputs = discriminator(x)
>>> len(outputs)  # Should equal the number of discriminators
NOTE
The input is expected to be a raw waveform; the STFT layers inside the module convert it to the frequency domain before discrimination.
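In GAN training, the per-resolution discriminator outputs are each scored against real/fake targets and the losses are summed. This page does not state which adversarial loss VISinger2 uses, so the following is a hypothetical sketch with a least-squares (LSGAN) discriminator loss, using dummy tensors in place of the module's actual outputs:

```python
import torch


def lsgan_disc_loss(real_outs, fake_outs):
    """Least-squares GAN discriminator loss summed over all resolutions."""
    loss = torch.tensor(0.0)
    for real, fake in zip(real_outs, fake_outs):
        # Real outputs are pushed toward 1, fake outputs toward 0.
        loss = loss + ((real - 1.0) ** 2).mean() + (fake**2).mean()
    return loss


# Dummy stand-ins for per-hop-length discriminator outputs.
real_outs = [torch.ones(1, 1, 64) for _ in range(3)]
fake_outs = [torch.zeros(1, 1, 64) for _ in range(3)]
print(float(lsgan_disc_loss(real_outs, fake_outs)))  # 0.0 for perfect separation
```

Summing over resolutions lets gradients from every time-frequency scale reach the generator through a single backward pass.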