espnet2.gan_svs.visinger2.visinger2_vocoder.create_fb_matrix
espnet2.gan_svs.visinger2.visinger2_vocoder.create_fb_matrix(n_freqs: int, f_min: float, f_max: float, n_mels: int, sample_rate: int, norm: str | None = None) → Tensor
Create a frequency bin conversion matrix.
This function generates a triangular filter bank matrix for converting a linear-frequency spectrogram to the mel scale. Each column of the matrix is one triangular filter, so multiplying a spectrogram by the matrix applies all filters at once.
- Parameters:
- n_freqs (int) – Number of frequency bins to apply the filters to, i.e. the size of the spectrogram's last dimension.
- f_min (float) – Minimum frequency (Hz).
- f_max (float) – Maximum frequency (Hz).
- n_mels (int) – Number of mel filterbanks.
- sample_rate (int) – Sample rate of the audio waveform.
- norm (Optional[str]) – If ‘slaney’, divides the triangular mel weights by the width of the mel band (area normalization). (Default: None).
- Returns: Triangular filter banks (fb matrix) of size (n_freqs, n_mels), i.e. the number of frequency bins by the number of filter banks. Each column is a filter bank, so for a matrix A of size (…, n_freqs), the applied result is the matrix product of A and create_fb_matrix(A.size(-1), …), which has size (…, n_mels).
- Return type: Tensor
- Raises: ValueError – If norm is neither None nor ‘slaney’.
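To make the shape convention in the Returns entry concrete, here is a minimal pure-Python sketch of the matrix product on a toy example. The helper `apply_filter_bank` and the tiny 3 x 2 filter bank are hypothetical and only illustrate the shapes; with real tensors the same operation would typically be written as `torch.matmul(A, fb)`.

```python
def apply_filter_bank(spec, fb):
    """Multiply a (frames, n_freqs) spectrogram by an (n_freqs, n_mels) matrix.

    Hypothetical helper for illustration; it uses nested lists instead of tensors.
    """
    n_freqs = len(fb)
    n_mels = len(fb[0])
    out = []
    for frame in spec:
        if len(frame) != n_freqs:
            raise ValueError("each frame must have n_freqs entries")
        # Column j of fb holds the weights of filter j over all frequency bins.
        out.append(
            [sum(frame[k] * fb[k][j] for k in range(n_freqs)) for j in range(n_mels)]
        )
    return out


# Toy filter bank: 3 frequency bins, 2 filters (one per column).
toy_fb = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]
# Toy spectrogram: 2 frames, 3 frequency bins each.
toy_spec = [
    [2.0, 4.0, 6.0],
    [1.0, 0.0, 1.0],
]

toy_mel = apply_filter_bank(toy_spec, toy_fb)
print(toy_mel)  # [[4.0, 8.0], [1.0, 1.0]]
```

The last dimension changes from n_freqs to n_mels, while all leading dimensions (here, the frame axis) are preserved.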
Examples
>>> fb_matrix = create_fb_matrix(1024, 0.0, 8000.0, 128, 22050)
>>> fb_matrix.shape
torch.Size([1024, 128])
NOTE
The triangular filter banks are constructed in a way similar to Librosa’s implementation for creating mel filter banks.
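The construction described above can be sketched in plain Python. This is an illustration under stated assumptions, not the ESPnet source: it assumes the HTK mel formula (2595 · log10(1 + f / 700)), frequency bins spaced linearly from 0 to `sample_rate // 2`, and a ‘slaney’ scaling of 2 / (band width in Hz). The names `hz_to_mel`, `mel_to_hz`, `linspace`, and `triangular_fb_sketch` are hypothetical.

```python
import math


def hz_to_mel(f):
    # HTK-style mel formula (an assumption about the mel scale in use).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def linspace(start, stop, num):
    # Evenly spaced values including both endpoints.
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def triangular_fb_sketch(n_freqs, f_min, f_max, n_mels, sample_rate, norm=None):
    if norm is not None and norm != "slaney":
        raise ValueError("norm must be one of None or 'slaney'")

    # Center frequency of each input bin (assumed to span 0 .. Nyquist).
    all_freqs = linspace(0.0, sample_rate // 2, n_freqs)
    # n_mels + 2 points equally spaced on the mel scale, mapped back to Hz.
    m_pts = linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    f_pts = [mel_to_hz(m) for m in m_pts]

    fb = []
    for f in all_freqs:
        row = []
        for j in range(n_mels):
            left, center, right = f_pts[j], f_pts[j + 1], f_pts[j + 2]
            up = (f - left) / (center - left)      # rising edge of the triangle
            down = (right - f) / (right - center)  # falling edge of the triangle
            weight = max(0.0, min(up, down))
            if norm == "slaney":
                # Area normalization: scale by 2 / band width in Hz.
                weight *= 2.0 / (right - left)
            row.append(weight)
        fb.append(row)
    return fb  # nested list of shape (n_freqs, n_mels)


sketch_fb = triangular_fb_sketch(
    n_freqs=9, f_min=0.0, f_max=4000.0, n_mels=4, sample_rate=8000
)
print(len(sketch_fb), len(sketch_fb[0]))  # 9 4
```

Without normalization every weight lies between 0 and 1 and each triangle peaks at its center frequency; with ‘slaney’ the wider high-frequency filters are scaled down so that each filter covers roughly the same area.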