espnet2.layers.log_mel.LogMel
class espnet2.layers.log_mel.LogMel(fs: int = 16000, n_fft: int = 512, n_mels: int = 80, fmin: float | None = None, fmax: float | None = None, htk: bool = False, log_base: float | None = None)
Bases: Module
Convert STFT to log Mel filterbank features.
This module transforms short-time Fourier transform (STFT) features into log Mel filterbank features. The filterbank parameters mirror those of librosa.filters.mel, with fs passed as sr; log_base only controls the logarithm applied afterwards.
mel_options
Configuration options for the Mel filterbank.
Type: dict
log_base
Base of the logarithm used for log Mel calculation.
Type: float or None
Parameters:
- fs (int) – Sampling rate of the incoming signal (default: 16000).
- n_fft (int) – Number of FFT components (default: 512).
- n_mels (int) – Number of Mel bands to generate (default: 80).
- fmin (float or None) – Lowest frequency (in Hz) (default: None). If None, defaults to 0.
- fmax (float or None) – Highest frequency (in Hz) (default: None). If None, defaults to fs / 2.0.
- htk (bool) – Use HTK formula instead of Slaney (default: False).
- log_base (float or None) – Base of the logarithm (default: None).
Returns: A tuple containing:
- logmel_feat (torch.Tensor): Log Mel filterbank features.
- ilens (torch.Tensor): Input lengths.
Return type: Tuple[torch.Tensor, torch.Tensor]
######### Examples
>>> import torch
>>> from espnet2.layers.log_mel import LogMel
>>> logmel = LogMel(fs=16000, n_fft=512, n_mels=80)
>>> feat = torch.rand(2, 100, 257) # (B, T, D1) with D1 = n_fft // 2 + 1
>>> ilens = torch.tensor([100, 80]) # input lengths
>>> logmel_feat, ilens = logmel(feat, ilens)
NOTE
The Mel matrix created by librosa is different from the one used in Kaldi.
Raises: ValueError – If any of the input parameters are invalid.
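The htk flag selects between two Hz-to-Mel mappings. As a rough illustration of how they differ, the sketch below implements both with the commonly cited constants. hz_to_mel is a hypothetical helper written for this page, not part of LogMel, and librosa's own implementation should be treated as authoritative.

```python
import math


def hz_to_mel(freq_hz: float, htk: bool = False) -> float:
    """Map a frequency in Hz onto the Mel scale (hypothetical helper)."""
    if htk:
        # HTK convention: a single logarithmic curve.
        return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

    # Slaney convention: linear below 1 kHz, logarithmic above.
    f_sp = 200.0 / 3.0               # Hz per Mel in the linear region
    min_log_hz = 1000.0              # where the logarithmic region starts
    min_log_mel = min_log_hz / f_sp  # Mel value at 1 kHz (15)
    logstep = math.log(6.4) / 27.0   # log-region step size
    if freq_hz < min_log_hz:
        return freq_hz / f_sp
    return min_log_mel + math.log(freq_hz / min_log_hz) / logstep


# Compare the two conventions at a few frequencies.
for freq in (0.0, 500.0, 1000.0, 8000.0):
    print(freq, hz_to_mel(freq, htk=False), hz_to_mel(freq, htk=True))
```

Under the Slaney convention 1000 Hz maps to 15 Mel, while the HTK formula maps it to roughly 1000 Mel, so the two scales are not numerically interchangeable even though both place the band edges perceptually.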
extra_repr()
Returns a string representation of the LogMel parameters.
The string lists the Mel filterbank options held in mel_options. PyTorch includes it when the module is printed, which makes it useful for debugging and for logging the configuration of a LogMel instance.
mel_options
A dictionary containing the parameters used to create the Mel filter bank, including:
- sr: Sampling rate of the incoming signal
- n_fft: Number of FFT components
- n_mels: Number of Mel bands to generate
- fmin: Lowest frequency (in Hz)
- fmax: Highest frequency (in Hz)
- htk: Whether the HTK formula is used instead of Slaney
Returns: A comma-separated string representation of the mel options.
Return type: str
######### Examples
>>> logmel = LogMel(fs=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000)
>>> print(logmel.extra_repr())
sr=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000, htk=False
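To make the relationship between the constructor arguments, mel_options, and the string above concrete, here is a minimal sketch in plain Python. build_mel_options and format_options are hypothetical names used only for illustration; the sketch assumes the defaults described in the parameter list (fmin falls back to 0, fmax to fs / 2.0) and a simple key=value join.

```python
def build_mel_options(fs=16000, n_fft=512, n_mels=80,
                      fmin=None, fmax=None, htk=False) -> dict:
    """Collect the filterbank options, resolving None defaults (hypothetical helper)."""
    return {
        "sr": fs,  # fs is stored under librosa's name, sr
        "n_fft": n_fft,
        "n_mels": n_mels,
        "fmin": 0 if fmin is None else fmin,
        "fmax": fs / 2.0 if fmax is None else fmax,
        "htk": htk,
    }


def format_options(options: dict) -> str:
    """Join the options as comma-separated key=value pairs."""
    return ", ".join(f"{key}={value}" for key, value in options.items())


print(format_options(build_mel_options(fmin=0, fmax=8000)))
# sr=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000, htk=False

print(format_options(build_mel_options()))
# sr=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000.0, htk=False
```

Note that under this sketch's assumptions, leaving fmax unset yields a float (8000.0) rather than the integer passed explicitly in the first call.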
forward(feat: Tensor, ilens: Tensor | None = None) → Tuple[Tensor, Tensor]
Convert STFT to log Mel spectrogram features.
This module takes the Short-Time Fourier Transform (STFT) features as input and converts them into log Mel spectrogram features. The conversion is based on the Mel filter bank, and the arguments are similar to those used in librosa.filters.mel.
mel_options
Dictionary containing the parameters for the Mel filter.
Type: dict
log_base
Base of the logarithm used for computing log Mel.
Type: float or None
Parameters:
- feat (torch.Tensor) – Input STFT features of shape (B, T, D1).
- ilens (torch.Tensor or None) – Length of each sequence in the batch (default: None).
The Mel filterbank applied here is configured by the constructor arguments:
- fs (int) – Sampling rate of the incoming signal. Must be greater than 0.
- n_fft (int) – Number of FFT components. Must be greater than 0.
- n_mels (int) – Number of Mel bands to generate. Must be greater than 0.
- fmin (float or None) – Lowest frequency (in Hz). Must be greater than or equal to 0.
- fmax (float or None) – Highest frequency (in Hz). If None, uses fmax = fs / 2.0.
- htk (bool) – If True, uses HTK formula instead of Slaney.
- log_base (float or None) – Base of the logarithm for log Mel computation.
Returns: A tuple containing:
- logmel_feat (torch.Tensor): The computed log Mel features of shape (B, T, D2), where B is the batch size, T is the number of time frames, and D2 is the number of Mel bands.
- ilens (torch.Tensor): The lengths of the input features for each batch.
Return type: Tuple[torch.Tensor, torch.Tensor]
######### Examples
>>> logmel = LogMel()
>>> feat = torch.rand(2, 100, 257) # Example input (B=2, T=100, D1=n_fft // 2 + 1=257)
>>> ilens = torch.tensor([100, 80]) # Example lengths
>>> logmel_feat, ilens = logmel(feat, ilens)
>>> print(logmel_feat.shape)
torch.Size([2, 100, 80])
NOTE
- The Mel matrix generated by librosa differs from that used in Kaldi.
- The input tensor feat should have the shape (B, T, D1), where B is the batch size, T is the number of time frames, and D1 = n_fft // 2 + 1 is the number of frequency bins (257 for n_fft = 512), matching the Mel matrix produced by librosa.filters.mel.
Raises: ValueError – If the input tensor feat does not have the expected shape.
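The shape bookkeeping in the notes above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the module's implementation: log_mel_sketch is a hypothetical function, the filterbank is a random stand-in rather than the output of librosa.filters.mel, and the flooring constant, the change-of-base division, and the zeroing of padded frames are choices made for this sketch.

```python
import numpy as np


def log_mel_sketch(feat, melmat, ilens=None, log_base=None, floor=1e-10):
    """Project (B, T, D1) features onto D2 Mel bands and take the log (hypothetical)."""
    # (B, T, D1) @ (D1, D2) -> (B, T, D2); floor to avoid log(0) (assumed value).
    mel_feat = np.maximum(feat @ melmat, floor)

    # Natural log by default; otherwise apply the change-of-base rule.
    logmel = np.log(mel_feat)
    if log_base is not None:
        logmel = logmel / np.log(log_base)

    if ilens is None:
        # Without lengths, treat every sequence as full length T.
        ilens = np.full(feat.shape[0], feat.shape[1], dtype=np.int64)
    else:
        ilens = np.asarray(ilens)
        # Frames at or beyond each sequence length count as padding and are zeroed.
        pad = np.arange(feat.shape[1])[None, :] >= ilens[:, None]  # (B, T)
        logmel = np.where(pad[:, :, None], 0.0, logmel)
    return logmel, ilens


n_fft, n_mels = 512, 80
n_bins = n_fft // 2 + 1  # D1 = 257 frequency bins

rng = np.random.default_rng(0)
feat = rng.random((2, 100, n_bins))    # non-negative stand-in for STFT power
melmat = rng.random((n_bins, n_mels))  # stand-in matrix, NOT a real Mel filterbank

logmel_feat, ilens = log_mel_sketch(feat, melmat, ilens=[100, 80])
print(logmel_feat.shape)  # (2, 100, 80)
```

The point of the sketch is the dimension flow: the last axis of feat must equal the first axis of the Mel matrix, which is why an input with 512 bins would not match a filterbank built for n_fft = 512.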