espnet2.gan_svs.visinger2.visinger2_vocoder.MultiFrequencyDiscriminator
class espnet2.gan_svs.visinger2.visinger2_vocoder.MultiFrequencyDiscriminator(sample_rate: int = 22050, hop_lengths=[128, 256, 512], hidden_channels=[256, 512, 512], domain='double', mel_scale=True, divisors=[32, 16, 8, 4, 2, 1, 1], strides=[1, 2, 1, 2, 1, 2, 1])
Bases: Module
Multi-Frequency Discriminator module in UnivNet.
This module implements a multi-frequency discriminator that processes input signals through multiple STFT (Short-Time Fourier Transform) layers, enabling the extraction of frequency features across different time scales.
stfts
List of STFT layers for different hop lengths.
- Type: ModuleList
domain
Domain of input signal, can be “double” or “single”.
- Type: str
discriminators
List of frequency discriminators for each hop length.
- Type: ModuleList
Parameters:
- sample_rate (int) – Sample rate of the input audio. Default is 22050.
- hop_lengths (list) – List of hop lengths used for STFT. Default is [128, 256, 512].
- hidden_channels (list) – List of number of channels in hidden layers. Default is [256, 512, 512].
- domain (str) – Domain of input signal. Default is “double”.
- mel_scale (bool) – Whether to use mel-scale frequency. Default is True.
- divisors (list) – List of divisors for each layer in the discriminator. Default is [32, 16, 8, 4, 2, 1, 1].
- strides (list) – List of strides for each layer in the discriminator. Default is [1, 2, 1, 2, 1, 2, 1].
####### Examples
>>> discriminator = MultiFrequencyDiscriminator()
>>> input_tensor = torch.randn(1, 1, 22050) # Example input
>>> output_features = discriminator(input_tensor)
>>> len(output_features) # Should return the number of discriminators
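The multi-resolution idea described above can be illustrated without ESPnet. The following is a minimal sketch, not the actual implementation (the window sizes are assumed; the real module derives its own STFT configuration and optionally applies a mel scale): it computes one magnitude spectrogram per hop length, which is the kind of input each sub-discriminator receives.

```python
import torch


def multi_resolution_magnitudes(x, hop_lengths=(128, 256, 512)):
    """Return one magnitude spectrogram per hop length for a mono signal x."""
    mags = []
    for hop in hop_lengths:
        n_fft = hop * 4  # assumed window size, for illustration only
        spec = torch.stft(
            x,
            n_fft=n_fft,
            hop_length=hop,
            window=torch.hann_window(n_fft),
            return_complex=True,
        )
        mags.append(spec.abs())  # (n_fft // 2 + 1, frames)
    return mags


x = torch.randn(22050)  # 1 second of audio at 22.05 kHz
mags = multi_resolution_magnitudes(x)
print([tuple(m.shape) for m in mags])
```

Shorter hop lengths give finer time resolution but coarser frequency resolution, so the three spectrograms expose complementary views of the same signal.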
Initialize Multi-Frequency Discriminator module.
- Parameters:
- sample_rate (int) – Sample rate of the input audio. Default is 22050.
- hop_lengths (list) – List of hop lengths used for STFT. Default is [128, 256, 512].
- hidden_channels (list) – List of number of channels in hidden layers. Default is [256, 512, 512].
- domain (str) – Domain of input signal. Default is “double”.
- mel_scale (bool) – Whether to use mel-scale frequency. Default is True.
- divisors (list) – List of divisors for each layer in the discriminator. Default is [32, 16, 8, 4, 2, 1, 1].
- strides (list) – List of strides for each layer in the discriminator. Default is [1, 2, 1, 2, 1, 2, 1].
forward(x)
Calculate forward propagation.
This method passes the input signal through each internal STFT layer to obtain a spectral representation at the corresponding hop length, then feeds each representation to its matching frequency discriminator.
- Parameters:
- x (Tensor) – Input signal tensor (B, 1, T).
- Returns: List of output tensors, one from each frequency discriminator.
- Return type: List[Tensor]
####### Examples
>>> discriminator = MultiFrequencyDiscriminator()
>>> x = torch.randn(1, 1, 22050)  # Example input signal
>>> outputs = discriminator(x)
>>> len(outputs)  # Should equal the number of discriminators
NOTE
The input is expected to be a raw waveform; the STFT layers inside the module convert it to the frequency domain before discrimination.
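In GAN training, the per-resolution discriminator outputs are each scored against real/fake targets and the losses are summed. This page does not state which adversarial loss VISinger2 uses, so the following is a hypothetical sketch with a least-squares (LSGAN) discriminator loss, using dummy tensors in place of the module's actual outputs:

```python
import torch


def lsgan_disc_loss(real_outs, fake_outs):
    """Least-squares GAN discriminator loss summed over all resolutions."""
    loss = torch.tensor(0.0)
    for real, fake in zip(real_outs, fake_outs):
        # Real outputs are pushed toward 1, fake outputs toward 0.
        loss = loss + ((real - 1.0) ** 2).mean() + (fake**2).mean()
    return loss


# Dummy stand-ins for per-hop-length discriminator outputs.
real_outs = [torch.ones(1, 1, 64) for _ in range(3)]
fake_outs = [torch.zeros(1, 1, 64) for _ in range(3)]
print(float(lsgan_disc_loss(real_outs, fake_outs)))  # 0.0 for perfect separation
```

Summing over resolutions lets gradients from every time-frequency scale reach the generator through a single backward pass.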