espnet2.gan_codec.shared.discriminator.msmpmb_discriminator.MultiScaleDiscriminator
class espnet2.gan_codec.shared.discriminator.msmpmb_discriminator.MultiScaleDiscriminator(rate: int = 1, sample_rate: int = 44100)
Bases: Module
MultiScaleDiscriminator is a neural network module that implements a multi-scale discriminator for evaluating the quality of audio signals. It uses a series of weight-normalized 1D convolutional layers to extract features from audio inputs at different scales.
convs
A list of convolutional layers for feature extraction.
- Type: nn.ModuleList
conv_post
A final convolutional layer to process the output.
- Type: nn.Module
sample_rate
The sampling rate of the audio input.
- Type: int
rate
The downsampling rate applied to the input audio.
- Type: int
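To make the structure concrete, the sketch below builds a comparable stack of weight-normalized 1D convolutions followed by a single-channel post convolution. The channel widths, kernel sizes, and strides here are illustrative assumptions, not the exact layer configuration used by this class.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Illustrative weight-normalized conv stack (channel widths, kernel sizes,
# and strides are assumptions for demonstration only).
convs = nn.ModuleList(
    [
        weight_norm(nn.Conv1d(1, 16, kernel_size=15, stride=1, padding=7)),
        weight_norm(nn.Conv1d(16, 64, kernel_size=41, stride=4, groups=4, padding=20)),
        weight_norm(nn.Conv1d(64, 256, kernel_size=41, stride=4, groups=16, padding=20)),
    ]
)
# Final layer projecting the features down to a single channel.
conv_post = weight_norm(nn.Conv1d(256, 1, kernel_size=3, stride=1, padding=1))

x = torch.randn(1, 1, 44100)  # (batch_size, channels, num_samples)
for conv in convs:
    x = conv(x)
print(conv_post(x).shape)  # (1, 1, T') with T' determined by the strides
```

Weight normalization decouples the magnitude of each convolution kernel from its direction, which tends to stabilize discriminator training in GAN setups.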
Parameters:
- rate (int) – The downsampling rate for the input audio. Default is 1.
- sample_rate (int) – The original sampling rate of the audio. Default is 44100.
Returns: A list of feature maps extracted from the audio input at different layers of the network.
Return type: List[Tensor]
Raises: ValueError – If the input tensor x does not have the correct shape.
####### Examples
>>> discriminator = MultiScaleDiscriminator(rate=2, sample_rate=48000)
>>> audio_input = torch.randn(1, 1, 48000) # Batch size 1, 1 channel, 1 sec
>>> feature_maps = discriminator(audio_input)
>>> len(feature_maps) # Should return the number of layers + 1
7
NOTE
The input tensor should have the shape (batch_size, channels, length).
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Forward pass through the MultiScaleDiscriminator.
This method processes the input tensor x through a series of convolutional layers and returns the feature map produced at each stage of the network. The input audio is first resampled according to the configured sample rate and downsampling rate; the output is a list containing the feature maps from all convolutional layers in the MultiScaleDiscriminator (a simplified sketch follows below).
- Parameters: x (torch.Tensor) – Input tensor representing the audio signal with shape (batch_size, 1, num_samples).
- Returns: A list of feature maps, one from each convolutional layer, including the output of the final conv_post layer.
- Return type: List[torch.Tensor]
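The following is a minimal sketch of this flow, assuming torchaudio resampling as a stand-in for the module's internal resampling step and reusing the illustrative convs / conv_post stack from the earlier sketch; it is not the exact implementation.

```python
import torch
import torchaudio


def msd_forward_sketch(convs, conv_post, x, sample_rate=44100, rate=2):
    """Collect one feature map per conv layer, plus the conv_post output."""
    # Downsample the input to sample_rate // rate (illustrative stand-in
    # for the module's internal resampling step).
    x = torchaudio.functional.resample(x, sample_rate, sample_rate // rate)
    fmap = []
    for conv in convs:
        x = conv(x)            # non-linear activations omitted for brevity
        fmap.append(x)
    fmap.append(conv_post(x))  # final single-channel output
    return fmap
```

With the stack above, `msd_forward_sketch(convs, conv_post, torch.randn(1, 1, 44100))` returns `len(convs) + 1` tensors (one per conv layer plus the conv_post output); the real module behaves analogously with its own layer count.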
####### Examples
>>> discriminator = MultiScaleDiscriminator(rate=2, sample_rate=44100)
>>> audio_input = torch.randn(1, 1, 44100) # Example audio input
>>> feature_maps = discriminator(audio_input)
>>> print(len(feature_maps)) # Output: Number of feature maps
>>> print(feature_maps[0].shape) # Shape of the first feature map
NOTE
Ensure that the input tensor x is on the same device as the model (CPU or GPU) for the processing to work correctly.
- Raises: RuntimeError – If the input tensor does not have the expected shape or is not on the correct device.
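For example, the input can be moved to the model's device before the call (continuing the instantiation from the Examples block above):
>>> device = next(discriminator.parameters()).device
>>> feature_maps = discriminator(audio_input.to(device))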