espnet2.gan_svs.visinger2.visinger2_vocoder.MelScale

class espnet2.gan_svs.visinger2.visinger2_vocoder.MelScale(n_mels: int = 128, sample_rate: int = 24000, f_min: float = 0.0, f_max: float | None = None, n_stft: int | None = None)

Bases: Module

Turn a normal STFT into a mel frequency STFT using triangular filter banks.

The user can control which device the filter bank (fb) is on (e.g., fb.to(spec_f.device)).
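The triangular filter bank idea can be sketched in plain Python. This is an illustration only, not ESPnet's implementation: it assumes the HTK mel formula and unnormalized triangles, and the helper names (`hz_to_mel`, `mel_to_hz`, `triangular_filter_bank`, `apply_filter_bank`) are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # HTK-style mel formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_filter_bank(n_freqs, sample_rate, f_min, f_max, n_mels):
    """Return an n_freqs x n_mels matrix (list of rows) of triangular filters."""
    # Centre frequency of each STFT bin, from 0 Hz up to the Nyquist frequency.
    nyquist = sample_rate / 2
    bin_freqs = [nyquist * k / (n_freqs - 1) for k in range(n_freqs)]

    # n_mels + 2 points equally spaced on the mel scale give the triangle edges.
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    mel_pts = [m_min + (m_max - m_min) * i / (n_mels + 1) for i in range(n_mels + 2)]
    hz_pts = [mel_to_hz(m) for m in mel_pts]

    fb = [[0.0] * n_mels for _ in range(n_freqs)]
    for k, f in enumerate(bin_freqs):
        for m in range(n_mels):
            left, centre, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
            rising = (f - left) / (centre - left)
            falling = (right - f) / (right - centre)
            # Inside the triangle the smaller slope wins; outside it is clipped to 0.
            fb[k][m] = max(0.0, min(rising, falling))
    return fb


def apply_filter_bank(spec, fb):
    """Map a freq x time spectrogram (list of rows) to n_mels x time."""
    n_freqs, n_time, n_mels = len(spec), len(spec[0]), len(fb[0])
    return [
        [sum(fb[k][m] * spec[k][t] for k in range(n_freqs)) for t in range(n_time)]
        for m in range(n_mels)
    ]


fb = triangular_filter_bank(n_freqs=65, sample_rate=24000, f_min=0.0, f_max=12000.0, n_mels=8)
spec = [[1.0] * 3 for _ in range(65)]  # toy magnitude spectrogram: 65 bins, 3 frames
mel = apply_filter_bank(spec, fb)
print(len(mel), len(mel[0]))  # 8 3
```

Each mel bin is a weighted sum of neighbouring STFT bins, so the frequency axis shrinks from the number of STFT bins to n_mels while the time axis is unchanged.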

Parameters:
- n_mels (int, optional) – Number of mel filterbanks. (Default: 128)
- sample_rate (int, optional) – Sample rate of audio signal. (Default: 24000)
- f_min (float, optional) – Minimum frequency. (Default: 0.0)
- f_max (float or None, optional) – Maximum frequency. (Default: sample_rate // 2)
- n_stft (int, optional) – Number of bins in STFT. Calculated from first input if None is given. See n_fft in the Spectrogram class. (Default: None)
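The relationship between n_stft and n_fft, and the "calculated from first input" behaviour, can be sketched as follows. The sketch assumes a one-sided STFT, where an FFT size of n_fft yields n_fft // 2 + 1 frequency bins. Both `n_stft_from_n_fft` and `LazyBinCount` are hypothetical names used for illustration, not part of ESPnet.

```python
def n_stft_from_n_fft(n_fft):
    """Number of frequency bins of a one-sided STFT with FFT size n_fft."""
    return n_fft // 2 + 1


class LazyBinCount:
    """Hypothetical stand-in for inferring n_stft from the first input."""

    def __init__(self, n_stft=None):
        self.n_stft = n_stft

    def observe(self, spec_shape):
        # spec_shape is (..., freq, time); the frequency axis is second to last.
        if self.n_stft is None:
            self.n_stft = spec_shape[-2]
        return self.n_stft


print(n_stft_from_n_fft(2048))                  # 1025
print(LazyBinCount().observe((1, 1025, 100)))   # 1025
```

Passing n_stft explicitly lets the filter bank be built up front; leaving it as None defers that until the first spectrogram is seen.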

n_mels

The number of mel filterbanks.

Type: int

sample_rate

The sample rate of the audio signal.

Type: int

f_min

The minimum frequency.

Type: float

f_max

The maximum frequency.

Type: float

fb

The filter bank matrix for converting STFT to mel scale.

Type: Tensor

Returns: Mel frequency spectrogram of size (…, n_mels, time).
Return type: Tensor

Examples:

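>>> import torch
>>> from espnet2.gan_svs.visinger2.visinger2_vocoder import MelScale
>>> mel_scale = MelScale(n_mels=128, sample_rate=24000)
>>> spectrogram = torch.rand(1, 1025, 100)  # stand-in magnitude STFT: (batch, freq, time)
>>> mel_spectrogram = mel_scale(spectrogram)  # size (1, 128, 100)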

Raises: AssertionError – If f_min is greater than f_max.
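A minimal sketch of the f_max default and the f_min/f_max check described above. The function `resolve_band` is a hypothetical helper written for illustration, and the assertion message is invented; the actual class may word its check differently.

```python
def resolve_band(sample_rate=24000, f_min=0.0, f_max=None):
    # Fall back to sample_rate // 2 (the Nyquist frequency) when f_max is None.
    if f_max is None:
        f_max = float(sample_rate // 2)
    # Reject a band whose lower edge lies above its upper edge.
    assert f_min <= f_max, f"Require f_min: {f_min} <= f_max: {f_max}"
    return f_min, f_max


print(resolve_band())  # (0.0, 12000.0)

try:
    resolve_band(f_min=13000.0)  # above the default f_max of 12000.0
except AssertionError as err:
    print("rejected:", err)
```

Because the check is an assert statement, it would be skipped if Python runs with optimizations enabled (the -O flag).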

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(specgram: Tensor) → Tensor

Turn a normal STFT into a mel frequency STFT using triangular filter banks.

The user can control which device the filter bank (fb) is on (e.g., fb.to(spec_f.device)).

Parameters:
- specgram (Tensor) – Input STFT spectrogram of size (…, freq, time).

The filter bank applied to specgram is determined by the constructor arguments:
- n_mels (int, optional) – Number of mel filterbanks. (Default: 128)
- sample_rate (int, optional) – Sample rate of audio signal. (Default: 24000)
- f_min (float, optional) – Minimum frequency. (Default: 0.0)
- f_max (float or None, optional) – Maximum frequency. (Default: sample_rate // 2)
- n_stft (int, optional) – Number of bins in STFT. Calculated from first input if None is given. See n_fft in the Spectrogram class. (Default: None)

n_mels

Number of mel filterbanks.

Type: int

sample_rate

Sample rate of audio signal.

Type: int

f_min

Minimum frequency.

Type: float

f_max

Maximum frequency.

Type: float

fb

Frequency bin conversion matrix.

Type: Tensor

Returns: Mel frequency spectrogram of size (…, n_mels, time).
Return type: Tensor

Examples:

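>>> import torch
>>> mel_scale = MelScale(n_mels=128, sample_rate=16000)
>>> specgram = torch.rand(1, 513, 100)  # stand-in magnitude STFT: (batch, freq, time)
>>> mel_spec = mel_scale(specgram)  # size (1, 128, 100)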

Raises: AssertionError – If f_min is greater than f_max.