espnet2.gan_codec.shared.discriminator.msmpmb_discriminator.MultiScaleDiscriminator
class espnet2.gan_codec.shared.discriminator.msmpmb_discriminator.MultiScaleDiscriminator(rate: int = 1, sample_rate: int = 44100)
Bases: Module
MultiScaleDiscriminator is a neural network module that implements a multi-scale discriminator for evaluating the quality of audio signals. It uses a series of weight-normalized 1D convolutional layers to extract features from audio inputs at different scales.
convs
A list of convolutional layers for feature extraction.
- Type: nn.ModuleList
conv_post
A final convolutional layer to process the output.
- Type: nn.Module
sample_rate
The sampling rate of the audio input.
- Type: int
rate
The downsampling rate applied to the input audio.
- Type: int
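To make the structure concrete, the sketch below builds a comparable stack of weight-normalized 1D convolutions followed by a single-channel post convolution. The channel widths, kernel sizes, and strides here are illustrative assumptions, not the exact layer configuration used by this class.

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Illustrative weight-normalized conv stack (channel widths, kernel sizes,
# and strides are assumptions for demonstration only).
convs = nn.ModuleList(
    [
        weight_norm(nn.Conv1d(1, 16, kernel_size=15, stride=1, padding=7)),
        weight_norm(nn.Conv1d(16, 64, kernel_size=41, stride=4, groups=4, padding=20)),
        weight_norm(nn.Conv1d(64, 256, kernel_size=41, stride=4, groups=16, padding=20)),
    ]
)
# Final layer projecting the features down to a single channel.
conv_post = weight_norm(nn.Conv1d(256, 1, kernel_size=3, stride=1, padding=1))

x = torch.randn(1, 1, 44100)  # (batch_size, channels, num_samples)
for conv in convs:
    x = conv(x)
print(conv_post(x).shape)  # (1, 1, T') with T' determined by the strides
```

Weight normalization decouples the magnitude of each convolution kernel from its direction, which tends to stabilize discriminator training in GAN setups.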
Parameters:
- rate (int) – The downsampling rate for the input audio. Default is 1.
- sample_rate (int) – The original sampling rate of the audio. Default is 44100.
Returns: A list of feature maps extracted from the audio input at different layers of the network.
Return type: List[Tensor]
Raises: ValueError – If the input tensor x does not have the correct shape.
####### Examples
>>> discriminator = MultiScaleDiscriminator(rate=2, sample_rate=48000)
>>> audio_input = torch.randn(1, 1, 48000) # Batch size 1, 1 channel, 1 sec
>>> feature_maps = discriminator(audio_input)
>>> len(feature_maps) # Should return the number of layers + 1
7
NOTE
The input tensor should have the shape (batch_size, channels, length).
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Forward pass through the MultiScaleDiscriminator.
This method processes the input tensor x through a series of convolutional layers and returns the feature map produced at each stage of the network. The input audio is first resampled according to the configured sample rate and downsampling rate; the output is a list containing the feature maps from all convolutional layers in the MultiScaleDiscriminator (a simplified sketch follows below).
- Parameters: x (torch.Tensor) – Input tensor representing the audio signal with shape (batch_size, 1, num_samples).
- Returns: A list of feature maps, one from each convolutional layer, including the output of the final conv_post layer.
- Return type: List[torch.Tensor]
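The following is a minimal sketch of this flow, assuming torchaudio resampling as a stand-in for the module's internal resampling step and reusing the illustrative convs / conv_post stack from the earlier sketch; it is not the exact implementation.

```python
import torch
import torchaudio


def msd_forward_sketch(convs, conv_post, x, sample_rate=44100, rate=2):
    """Collect one feature map per conv layer, plus the conv_post output."""
    # Downsample the input to sample_rate // rate (illustrative stand-in
    # for the module's internal resampling step).
    x = torchaudio.functional.resample(x, sample_rate, sample_rate // rate)
    fmap = []
    for conv in convs:
        x = conv(x)            # non-linear activations omitted for brevity
        fmap.append(x)
    fmap.append(conv_post(x))  # final single-channel output
    return fmap
```

With the stack above, `msd_forward_sketch(convs, conv_post, torch.randn(1, 1, 44100))` returns `len(convs) + 1` tensors (one per conv layer plus the conv_post output); the real module behaves analogously with its own layer count.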
####### Examples
>>> discriminator = MultiScaleDiscriminator(rate=2, sample_rate=44100)
>>> audio_input = torch.randn(1, 1, 44100) # Example audio input
>>> feature_maps = discriminator(audio_input)
>>> print(len(feature_maps)) # Output: Number of feature maps
>>> print(feature_maps[0].shape) # Shape of the first feature map
NOTE
Ensure that the input tensor x is on the same device as the model (CPU or GPU) for the processing to work correctly.
- Raises: RuntimeError – If the input tensor does not have the expected shape or is not on the correct device.
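For example, the input can be moved to the model's device before the call (continuing the instantiation from the Examples block above):
>>> device = next(discriminator.parameters()).device
>>> feature_maps = discriminator(audio_input.to(device))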