espnet2.gan_svs.visinger2.visinger2_vocoder.MelScale
class espnet2.gan_svs.visinger2.visinger2_vocoder.MelScale(n_mels: int = 128, sample_rate: int = 24000, f_min: float = 0.0, f_max: float | None = None, n_stft: int | None = None)
Bases: Module
Turn a normal STFT into a mel frequency STFT using triangular filter banks.
The user can control which device the filter bank (fb) is on (e.g., fb.to(spec_f.device)).
- Parameters:
- n_mels (int, optional) – Number of mel filterbanks. (Default: 128)
- sample_rate (int, optional) – Sample rate of audio signal. (Default: 24000)
- f_min (float, optional) – Minimum frequency. (Default: 0.0)
- f_max (float or None, optional) – Maximum frequency. (Default: sample_rate // 2)
- n_stft (int or None, optional) – Number of bins in STFT. Calculated from the first input if None is given. See n_fft in Spectrogram. (Default: None)
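As a rough illustration of the two optional arguments, the pure-Python sketch below resolves f_max to sample_rate // 2 when it is None and relates n_stft to an FFT size. The helper names resolve_f_max and n_stft_from_n_fft are hypothetical and not part of ESPnet, and the n_fft // 2 + 1 relation assumes a one-sided STFT.

```python
from typing import Optional


def resolve_f_max(sample_rate: int, f_max: Optional[float] = None) -> float:
    """Hypothetical helper: fall back to sample_rate // 2 when f_max is None."""
    return float(sample_rate // 2) if f_max is None else float(f_max)


def n_stft_from_n_fft(n_fft: int) -> int:
    """Hypothetical helper: number of frequency bins in a one-sided STFT."""
    return n_fft // 2 + 1


print(resolve_f_max(24000))        # 12000.0
print(resolve_f_max(24000, 8000))  # 8000.0
print(n_stft_from_n_fft(2048))     # 1025
```

With n_fft = 2048, the STFT has 1025 bins, which matches the spectrogram shape used in the example for this class.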
n_mels
The number of mel filterbanks.
- Type: int
sample_rate
The sample rate of the audio signal.
- Type: int
f_min
The minimum frequency.
- Type: float
f_max
The maximum frequency.
- Type: float
fb
The filter bank matrix for converting STFT to mel scale.
- Type: Tensor
- Returns: Mel frequency spectrogram of size (…, n_mels, time).
- Return type: Tensor
####### Examples
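>>> import torch
>>> from espnet2.gan_svs.visinger2.visinger2_vocoder import MelScale
>>> mel_scale = MelScale(n_mels=128, sample_rate=24000)
>>> spectrogram = torch.randn(1, 1025, 100)  # Example STFT of size (batch, freq, time)
>>> mel_spectrogram = mel_scale(spectrogram)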
- Raises: AssertionError – If f_min is greater than f_max.
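To make "triangular filter banks" concrete, here is a minimal pure-Python sketch of how such a matrix could be built. It is not the ESPnet implementation: the function names are hypothetical, the HTK-style mel formula is an assumption (other mel variants exist), and the sketch assumes f_min < f_max and n_stft > 1, so degenerate cases are not handled.

```python
import math
from typing import List, Optional


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def triangular_filter_bank(
    n_stft: int,
    n_mels: int,
    sample_rate: int,
    f_min: float = 0.0,
    f_max: Optional[float] = None,
) -> List[List[float]]:
    """Return an (n_stft, n_mels) matrix of triangular filter weights."""
    if f_max is None:
        f_max = float(sample_rate // 2)
    # Mirrors the documented check: f_min must not exceed f_max.
    assert f_min <= f_max, f"Require f_min: {f_min} <= f_max: {f_max}"

    # Frequency of each STFT bin, linearly spaced from 0 to sample_rate / 2.
    nyquist = sample_rate / 2
    bin_freqs = [nyquist * i / (n_stft - 1) for i in range(n_stft)]

    # n_mels + 2 edge frequencies, evenly spaced on the mel scale.
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    edges = [
        mel_to_hz(m_min + (m_max - m_min) * i / (n_mels + 1))
        for i in range(n_mels + 2)
    ]

    fb = [[0.0] * n_mels for _ in range(n_stft)]
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        for k, f in enumerate(bin_freqs):
            rising = (f - left) / (centre - left)      # upward slope
            falling = (right - f) / (right - centre)   # downward slope
            fb[k][m] = max(0.0, min(rising, falling))  # triangle, clipped at 0
    return fb


fb = triangular_filter_bank(n_stft=1025, n_mels=128, sample_rate=24000)
print(len(fb), len(fb[0]))  # 1025 128
```

Each column is one triangle that rises from its left edge to its centre and falls to its right edge, so every weight lies between 0 and 1.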
forward(specgram: Tensor) → Tensor
Turn a normal STFT into a mel frequency STFT using triangular filter banks.
The user can control which device the filter bank (fb) is on (e.g., fb.to(spec_f.device)).
- Parameters:
- specgram (Tensor) – STFT spectrogram of size (…, freq, time).
- Returns: Mel frequency spectrogram of size (…, n_mels, time).
- Return type: Tensor
####### Examples
>>> mel_scale = MelScale(n_mels=128, sample_rate=16000)
>>> specgram = torch.randn(1, 1025, 100)  # Example STFT of size (batch, freq, time)
>>> mel_spec = mel_scale(specgram)
- Raises: AssertionError – If f_min is greater than f_max.
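The change of shape from (…, freq, time) to (…, n_mels, time) can be pictured as a matrix product along the frequency axis. The sketch below shows this for a single (freq, time) matrix in plain Python. It assumes the filter bank is laid out as (freq, n_mels); apply_filter_bank is a hypothetical name, and the toy values are chosen by hand rather than taken from a real filter bank.

```python
from typing import List

Matrix = List[List[float]]


def apply_filter_bank(specgram: Matrix, fb: Matrix) -> Matrix:
    """Map a (freq, time) spectrogram to (n_mels, time) with a (freq, n_mels) filter bank."""
    n_freq, n_time = len(specgram), len(specgram[0])
    assert len(fb) == n_freq, "filter bank rows must match the number of STFT bins"
    n_mels = len(fb[0])
    # mel[m][t] = sum over f of fb[f][m] * specgram[f][t]
    return [
        [sum(fb[f][m] * specgram[f][t] for f in range(n_freq)) for t in range(n_time)]
        for m in range(n_mels)
    ]


# Toy example: 3 STFT bins, 2 frames, 2 mel filters.
specgram = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fb = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
mel = apply_filter_bank(specgram, fb)
print(mel)  # [[2.5, 4.0], [6.5, 8.0]]
```

The time axis is untouched; only the frequency axis is reduced from the number of STFT bins to n_mels.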