espnet2.layers.sinc_conv.MelScale
class espnet2.layers.sinc_conv.MelScale
Bases: object
Mel frequency scale.
This class provides methods for converting between Hertz and Mel frequency scales, as well as generating filter banks based on the Mel scale.
The Mel scale is a perceptual scale of pitches which approximates the human ear’s response to different frequencies. It is widely used in audio processing, particularly in speech and music analysis.
convert(f)
Convert frequency in Hertz to Mel scale.
invert(x)
Convert frequency in Mel scale back to Hertz.
bank(channels, fs)
Obtain initialization values for the Mel scale filter bank.
- Parameters:
- channels – Number of channels for the filter bank.
- fs – Sample rate of the input signal.
- Returns: Filter start frequencies and stop frequencies.
- Return type: torch.Tensor
######### Examples
>>> import torch
>>> from espnet2.layers.sinc_conv import MelScale
>>> mel = MelScale()
>>> mel_freq = mel.convert(torch.tensor([1000.0]))  # Convert Hz to Mel
>>> hz_freq = mel.invert(mel_freq)  # Convert Mel back to Hz
>>> filter_bank = mel.bank(channels=40, fs=16000.0)  # Start/stop frequencies for 40 filters
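As a rough illustration of what a Mel-spaced filter bank initialization involves, the sketch below places band edges evenly on the mel scale and maps them back to Hertz. It is written in plain Python so the arithmetic is easy to follow, and it is not the ESPnet code. The helper names (`hz_to_mel`, `mel_to_hz`, `mel_band_edges`), the 30 Hz lower edge, the Nyquist upper edge, and the use of `channels + 2` edge points are all assumptions made for this example.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Hz to mel, using mel = 1125 * ln(f / 700 + 1)."""
    return 1125.0 * math.log(f_hz / 700.0 + 1.0)


def mel_to_hz(mel: float) -> float:
    """Mel to Hz, the algebraic inverse of hz_to_mel."""
    return 700.0 * (math.exp(mel / 1125.0) - 1.0)


def mel_band_edges(channels: int, fs: float, f_min: float = 30.0):
    """Return (start_hz, stop_hz) pairs for `channels` overlapping bands.

    f_min is an assumed lower edge; the upper edge is the Nyquist frequency.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(fs / 2.0)
    n_points = channels + 2
    step = (hi - lo) / (n_points - 1)
    # Points are equally spaced in mel, then mapped back to Hz.
    points_hz = [mel_to_hz(lo + i * step) for i in range(n_points)]
    # Band i runs from point i to point i + 2, so neighbouring bands overlap.
    return list(zip(points_hz[:-2], points_hz[2:]))


edges = mel_band_edges(channels=40, fs=16000.0)
print(len(edges))           # number of bands
print(edges[0], edges[-1])  # lowest and highest band
```

Because the points are equally spaced in mel rather than in Hertz, the bands in this sketch are narrow at low frequencies and widen toward the Nyquist frequency.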
classmethod bank(channels: int, fs: float) → Tensor
Sinc convolutions.
This module contains implementations of the log compression activation function, the Sinc convolution, and the Mel and Bark frequency scales. The Sinc convolution uses Sinc filters as kernels in the time domain; these act as band-pass filters in the spectral domain.
Because the filtering is a time-domain convolution, no transformation to the spectral domain is needed. This implementation is inspired by Ravanelli et al. and adapted for the ESPnet toolkit. It combines Sinc convolutions with a log compression activation function.
Notes: Currently, the same filters are applied to all input channels. The windowing function is applied to the kernel to obtain a smoother filter, rather than to the input values, which differs from traditional Automatic Speech Recognition (ASR) front ends.
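To make the idea of a time-domain Sinc band-pass filter concrete, here is a minimal sketch in plain Python. It builds a band-pass kernel as the difference of two ideal low-pass sinc filters, applies a window to the kernel (not to the input), and filters two test tones by direct convolution. This is a textbook construction, not the SincConv layer itself; the function names, the Hamming window, the kernel length of 101, and the test frequencies are assumptions chosen for illustration.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def bandpass_kernel(f1_hz: float, f2_hz: float, fs: float, length: int = 101):
    """Windowed band-pass kernel: difference of two ideal low-pass filters."""
    half = (length - 1) // 2
    a, b = f1_hz / fs, f2_hz / fs  # cutoffs in cycles per sample
    kernel = []
    for i in range(length):
        n = i - half
        ideal = 2.0 * b * sinc(2.0 * b * n) - 2.0 * a * sinc(2.0 * a * n)
        # Hamming window, applied to the kernel rather than to the signal.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
        kernel.append(ideal * window)
    return kernel


def convolve_valid(signal, kernel):
    """Direct time-domain convolution over fully overlapping positions."""
    k = len(kernel)
    flipped = kernel[::-1]
    return [
        sum(signal[t + j] * flipped[j] for j in range(k))
        for t in range(len(signal) - k + 1)
    ]


def rms(values):
    """Root mean square of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 16000.0
kernel = bandpass_kernel(1000.0, 2000.0, fs)
tone_in = [math.sin(2.0 * math.pi * 1500.0 * t / fs) for t in range(800)]
tone_out = [math.sin(2.0 * math.pi * 5000.0 * t / fs) for t in range(800)]
rms_in = rms(convolve_valid(tone_in, kernel))    # tone inside the pass band
rms_out = rms(convolve_valid(tone_out, kernel))  # tone outside the pass band
print(rms_in, rms_out)
```

The 1500 Hz tone lies inside the 1000 to 2000 Hz pass band and passes through largely unchanged, while the 5000 Hz tone is strongly attenuated, all without computing a spectrum.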
Classes:
- LogCompression: Log Compression Activation Function.
- SincConv: Sinc Convolution Layer.
- MelScale: Mel Frequency Scale.
- BarkScale: Bark Frequency Scale.
static convert(f)
Convert Hz to mel.
This method converts a frequency in Hertz to the corresponding value on the mel scale using the following formula, where log is the natural logarithm:
mel = 1125 * log(f / 700 + 1)
- Parameters: f (torch.Tensor) – A tensor representing frequency values in Hertz.
- Returns: A tensor containing the converted values in the mel scale.
######### Examples
>>> import torch
>>> mel_values = MelScale.convert(torch.tensor([440.0, 880.0]))
>>> print(mel_values)
# approximately tensor([548.67, 915.86])
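The Hz-to-mel formula can be checked by hand with a few lines of plain Python. The sketch below applies mel = 1125 * ln(f / 700 + 1) to scalar values; `hz_to_mel` is a hypothetical stand-in written for this example and works on floats rather than tensors.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Scalar version of mel = 1125 * ln(f / 700 + 1)."""
    return 1125.0 * math.log(f_hz / 700.0 + 1.0)


# Print a small conversion table for a few reference frequencies.
for f_hz in (0.0, 440.0, 880.0, 1000.0):
    print(f"{f_hz:7.1f} Hz -> {hz_to_mel(f_hz):8.2f} mel")

# Doubling the frequency (one octave) does not double the mel value.
octave_ratio = hz_to_mel(880.0) / hz_to_mel(440.0)
print(octave_ratio)
```

The mapping is compressive: the ratio printed at the end is below 2, reflecting that equal steps in Hertz correspond to progressively smaller steps in mel at higher frequencies.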
static invert(x)
Convert mel to Hz.
This function takes a tensor of values on the mel scale and converts them back to their corresponding frequencies in Hertz (Hz) using the inverse of the formula used by convert:
f = 700 * (exp(x / 1125) - 1)
- Parameters: x (torch.Tensor) – A tensor containing values in the mel scale.
- Returns: A tensor containing the corresponding frequencies in Hz.
- Return type: torch.Tensor
######### Examples
>>> mel_values = torch.tensor([0.0, 100.0, 200.0])
>>> hz_values = MelScale.invert(mel_values)
>>> print(hz_values)
# approximately tensor([0.00, 65.07, 136.19])
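The inverse mapping follows from solving mel = 1125 * ln(f / 700 + 1) for f, which gives f = 700 * (exp(mel / 1125) - 1). The plain-Python sketch below evaluates it for the mel values used in the example and checks that a round trip through both directions returns the starting frequency. The scalar helpers `hz_to_mel` and `mel_to_hz` are hypothetical names introduced for this illustration.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Scalar version of mel = 1125 * ln(f / 700 + 1)."""
    return 1125.0 * math.log(f_hz / 700.0 + 1.0)


def mel_to_hz(mel: float) -> float:
    """Scalar inverse: f = 700 * (exp(mel / 1125) - 1)."""
    return 700.0 * (math.exp(mel / 1125.0) - 1.0)


# Evaluate the inverse for the mel values from the example.
for mel in (0.0, 100.0, 200.0):
    print(f"{mel:6.1f} mel -> {mel_to_hz(mel):7.2f} Hz")

# Round trip: Hz -> mel -> Hz should reproduce the input up to rounding.
round_trip_error = max(
    abs(mel_to_hz(hz_to_mel(f_hz)) - f_hz) for f_hz in (30.0, 440.0, 8000.0)
)
print(round_trip_error)
```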