espnet2.layers.log_mel.LogMel


class espnet2.layers.log_mel.LogMel(fs: int = 16000, n_fft: int = 512, n_mels: int = 80, fmin: float | None = None, fmax: float | None = None, htk: bool = False, log_base: float | None = None)

Bases: Module

Convert STFT to log Mel filterbank features.

This module transforms short-time Fourier transform (STFT) features into log Mel filterbank features. The constructor arguments mirror those of librosa.filters.mel, except that the sampling rate is passed as fs rather than sr.

mel_options

Configuration options for the Mel filterbank.

Type: dict

log_base

Base of the logarithm used for log Mel calculation.

Type: float or None
Parameters:
- fs (int) – Sampling rate of the incoming signal (default: 16000).
- n_fft (int) – Number of FFT components (default: 512).
- n_mels (int) – Number of Mel bands to generate (default: 80).
- fmin (float or None) – Lowest frequency (in Hz) (default: None). If None, defaults to 0.
- fmax (float or None) – Highest frequency (in Hz) (default: None). If None, defaults to fs / 2.0.
- htk (bool) – Use HTK formula instead of Slaney (default: False).
- log_base (float or None) – Base of the logarithm (default: None). If None, the natural logarithm is used.
Returns: A tuple containing:
- logmel_feat (torch.Tensor): Log Mel filterbank features.
- ilens (torch.Tensor): Input lengths.
Return type: Tuple[torch.Tensor, torch.Tensor]
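
The htk flag and the fmin/fmax defaults described above can be illustrated with a small standard-library sketch. It follows the two Mel-scale formulas as they are commonly defined (HTK: logarithmic everywhere; Slaney: linear below 1 kHz, logarithmic above). The function names hz_to_mel and resolve_band_edges are hypothetical helpers written for this illustration; they are not part of LogMel.

```python
import math


def hz_to_mel(freq_hz, htk=False):
    """Convert a frequency in Hz to the Mel scale."""
    if htk:
        # HTK formula: logarithmic over the whole frequency range.
        return 2595.0 * math.log10(1.0 + freq_hz / 700.0)
    # Slaney formula: linear below 1 kHz, logarithmic above it.
    f_sp = 200.0 / 3.0              # Hz per Mel in the linear region
    min_log_hz = 1000.0             # start of the logarithmic region
    min_log_mel = min_log_hz / f_sp
    logstep = math.log(6.4) / 27.0  # step size in the logarithmic region
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


def resolve_band_edges(fs, fmin=None, fmax=None):
    """Apply the documented defaults: fmin -> 0, fmax -> fs / 2.0."""
    fmin = 0.0 if fmin is None else fmin
    fmax = fs / 2.0 if fmax is None else fmax
    return fmin, fmax


fmin, fmax = resolve_band_edges(16000)
for htk in (False, True):
    # The two scales assign very different Mel values to the same band edges.
    print(f"htk={htk}: {hz_to_mel(fmin, htk):.2f} .. {hz_to_mel(fmax, htk):.2f} mel")
```

Because the two scales space the filter centers differently, features computed with htk=True and htk=False are not interchangeable.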

Examples

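>>> import torch
>>> from espnet2.layers.log_mel import LogMel
>>> logmel = LogMel(fs=16000, n_fft=512, n_mels=80)
>>> feat = torch.rand(2, 100, 257)  # (B, T, D1) with D1 = n_fft // 2 + 1
>>> ilens = torch.tensor([100, 80])  # valid frames per utterance
>>> logmel_feat, ilens = logmel(feat, ilens)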

NOTE

The Mel matrix created by librosa is different from the one used in Kaldi.

Raises: ValueError – If any of the input parameters are invalid.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

extra_repr()

Returns a string representation of the LogMel parameters.

This method returns the mel options used to build the Mel filterbank as a single string. It is mainly useful for debugging and logging, since it shows how the LogMel instance is configured.

mel_options

A dictionary containing the parameters used to create the Mel filter bank, including:

sr: Sampling rate of the incoming signal
n_fft: Number of FFT components
n_mels: Number of Mel bands to generate
fmin: Lowest frequency (in Hz)
fmax: Highest frequency (in Hz)
htk: Boolean indicating the use of the HTK formula instead of Slaney

Returns: A comma-separated string representation of the mel options.
Return type: str

Examples

>>> logmel = LogMel(fs=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000)
>>> print(logmel.extra_repr())
sr=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000, htk=False
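
As a rough illustration of how such a string can be produced, the sketch below joins the entries of an options dictionary as key=value pairs. The helper name format_options is hypothetical, and the assumption that extra_repr() works this way is inferred only from the output shown above.

```python
def format_options(options):
    """Join key=value pairs with commas, in dictionary insertion order."""
    return ", ".join(f"{key}={value}" for key, value in options.items())


# Same options as in the example above; the dict literal is illustrative.
mel_options = {
    "sr": 16000,
    "n_fft": 512,
    "n_mels": 80,
    "fmin": 0,
    "fmax": 8000,
    "htk": False,
}
print(format_options(mel_options))
```

Values are formatted with their default string conversion, so passing fmax=8000.0 instead of 8000 would show up as fmax=8000.0 in the result.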

forward(feat: Tensor, ilens: Tensor | None = None) → Tuple[Tensor, Tensor]

Convert STFT to log Mel spectrogram features.

This method takes short-time Fourier transform (STFT) features as input and converts them into log Mel spectrogram features by applying the Mel filterbank. The filterbank options are similar to the arguments of librosa.filters.mel.

mel_options

Dictionary containing the parameters for the Mel filter.

Type: dict

log_base

Base of the logarithm used for computing log Mel.

Type: float or None
Parameters:
- fs (int) – Sampling rate of the incoming signal. Must be greater than 0.
- n_fft (int) – Number of FFT components. Must be greater than 0.
- n_mels (int) – Number of Mel bands to generate. Must be greater than 0.
- fmin (float) – Lowest frequency (in Hz). Must be greater than or equal to 0.
- fmax (float) – Highest frequency (in Hz). If None, uses fmax = fs / 2.0.
- htk (bool) – If True, uses HTK formula instead of Slaney.
- log_base (float or None) – Base of the logarithm for log Mel computation.
Returns: A tuple containing:
- logmel_feat (torch.Tensor): The computed log Mel features of shape (B, T, D2), where B is the batch size, T is the number of time frames, and D2 is the number of Mel bands.
- ilens (torch.Tensor): The lengths of the input features for each batch.
Return type: Tuple[torch.Tensor, torch.Tensor]
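
The conversion described above can be sketched in plain Python on a toy input. The sketch assumes the forward pass is a matrix product with the Mel matrix, followed by a floor and a logarithm, with padded frames set to zero. The function name log_mel_frames, the floor value 1e-10, and the tiny Mel matrix are assumptions made for illustration; the module itself operates on torch tensors.

```python
import math


def log_mel_frames(spec, melmat, ilens=None, log_base=None, floor=1e-10):
    """Project (B, T, D1) spectra onto a (D1, D2) Mel matrix and take the log.

    `floor` is an assumed clamp value that keeps the logarithm finite.
    """
    n_mels = len(melmat[0])
    out = []
    for b, utterance in enumerate(spec):
        valid = len(utterance) if ilens is None else ilens[b]
        frames = []
        for t, frame in enumerate(utterance):
            if t >= valid:
                # Frames past the true length are padding and are zeroed.
                frames.append([0.0] * n_mels)
                continue
            row = []
            for m in range(n_mels):
                energy = sum(x * melmat[d][m] for d, x in enumerate(frame))
                value = math.log(max(energy, floor))
                if log_base is not None:
                    value /= math.log(log_base)  # change of base
                row.append(value)
            frames.append(row)
        out.append(frames)
    if ilens is None:
        # Without explicit lengths, every frame counts as valid.
        ilens = [len(utterance) for utterance in spec]
    return out, ilens


# Toy input: B=2 utterances, T=2 frames, D1=3 bins, D2=2 Mel bands.
melmat = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
spec = [
    [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0]],  # second frame is padding
]
feats, lengths = log_mel_frames(spec, melmat, ilens=[2, 1], log_base=10.0)
print(feats)
print(lengths)
```

The floor matters for silent frames: a band with zero energy would otherwise yield minus infinity.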

Examples

>>> logmel = LogMel()
>>> feat = torch.rand(2, 100, 257)  # (B=2, T=100, D1=n_fft // 2 + 1)
>>> ilens = torch.tensor([100, 80])  # valid frames per utterance
>>> logmel_feat, ilens = logmel(feat, ilens)
>>> print(logmel_feat.shape)
torch.Size([2, 100, 80])

NOTE

The Mel matrix generated by librosa differs from that used in Kaldi.
The input tensor feat should have the shape (B, T, D1), where B is the batch size, T is the number of time frames, and D1 is the number of frequency bins of the one-sided spectrum, n_fft // 2 + 1 (257 for the default n_fft = 512).

Raises: ValueError – If the input tensor feat does not have the expected shape.
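
The shape requirement on feat can be checked before calling the module. librosa.filters.mel returns a matrix with 1 + n_fft // 2 columns, so the last dimension of feat has to match that count. The helpers below, n_freq_bins and check_feat_shape, are hypothetical and not part of ESPnet; whether LogMel itself performs such a check is not shown here.

```python
def n_freq_bins(n_fft):
    """Number of bins in a one-sided spectrum from an n_fft-point FFT."""
    return n_fft // 2 + 1


def check_feat_shape(shape, n_fft=512):
    """Raise ValueError unless shape is (B, T, n_fft // 2 + 1)."""
    expected = n_freq_bins(n_fft)
    if len(shape) != 3 or shape[-1] != expected:
        raise ValueError(f"expected (B, T, {expected}), got {tuple(shape)}")


check_feat_shape((2, 100, 257))  # passes silently
try:
    check_feat_shape((2, 100, 512))  # last dimension equals n_fft, not the bin count
except ValueError as err:
    print(err)
```

With a torch tensor, the same check can be applied by passing feat.shape.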