espnet2.layers.sinc_conv.SincConv

class espnet2.layers.sinc_conv.SincConv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, window_func: str = 'hamming', scale_type: str = 'mel', fs: int | float = 16000)

Bases: Module

Sinc Convolution.

This module performs a convolution that uses Sinc filters in the time domain as kernels. Sinc filters act as band-pass filters in the spectral domain; because the filtering is done as a convolution in the time domain, no transformation to the spectral domain is necessary.
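To make the band-pass idea concrete, here is a minimal sketch in the SincNet style. The exact parameterization used by this module is an assumption: an ideal band-pass filter with cutoffs f1 < f2 (normalized by the sample rate) is the difference of two low-pass sinc filters, h[n] = (sin(2π·f2·n) − sin(2π·f1·n)) / (π·n), with h[0] = 2·(f2 − f1). The kernel is truncated, tapered with a Hamming window, and applied with `torch.nn.functional.conv1d`. The function name `bandpass_sinc_kernel` and the 300 to 3400 Hz band are illustrative choices, not part of ESPnet.

```python
import math

import torch
import torch.nn.functional as F


def bandpass_sinc_kernel(f1: float, f2: float, kernel_size: int) -> torch.Tensor:
    """Hypothetical helper: Hamming-windowed band-pass kernel.

    f1 < f2 are cutoffs normalized by the sample rate (0 <= f1 < f2 <= 0.5).
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    n = torch.arange(-half, half + 1, dtype=torch.float32)
    safe_n = torch.where(n == 0, torch.ones_like(n), n)  # avoid 0/0 at the center tap
    # difference of two ideal low-pass filters = ideal band-pass filter
    h = (torch.sin(2 * math.pi * f2 * safe_n) - torch.sin(2 * math.pi * f1 * safe_n)) / (math.pi * safe_n)
    h = torch.where(n == 0, torch.full_like(n, 2 * (f2 - f1)), h)  # limit value at n = 0
    # taper the truncated kernel: the window is applied to the filter, not to the input
    return h * torch.hamming_window(kernel_size, periodic=False)


fs = 16000
kernel = bandpass_sinc_kernel(300 / fs, 3400 / fs, kernel_size=101)

# time-domain filtering: (B, 1, D_in) -> (B, 1, D_in - kernel_size + 1)
x = torch.randn(2, 1, 1600)
y = F.conv1d(x, kernel.view(1, 1, -1))
print(y.shape)  # torch.Size([2, 1, 1500])

# magnitude response on a 1024-point grid; bin k corresponds to k * fs / 1024 Hz
mag = torch.fft.rfft(kernel, n=1024).abs()
print(mag[64].item(), mag[384].item())  # gain at 1000 Hz (in band) and 6000 Hz (out of band)
```

The FFT check shows the point of the construction: the kernel is built and applied entirely in the time domain, yet its frequency response is close to 1 inside the chosen band and close to 0 outside it.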

This implementation of the Sinc convolution is heavily inspired by Ravanelli et al. (https://github.com/mravanelli/SincNet) and adapted for the ESPnet toolkit. It combines Sinc convolutions with a log compression activation function, as described in https://arxiv.org/abs/2010.07597.

Notes

Currently, the same filters are applied to all input channels. The window function is applied to the kernel, to obtain a smoother filter, rather than to the input values as in traditional ASR feature extraction.

in_channels

Number of input channels.

Type: int

out_channels

Number of output channels.

Type: int

kernel_size

Sinc filter kernel size (must be odd).

Type: int

stride

Stride for the convolution.

Type: int

padding

Padding for the convolution.

Type: int

dilation

Dilation for the convolution.

Type: int

window_func

Window function applied to the filter.

Type: callable

scale

Scale type for frequency representation.

Type: callable

fs

Sample rate of the input data.

Type: float

sinc_filters

Calculated Sinc filters for convolution.

Type: torch.Tensor
Parameters:
- in_channels – Number of input channels.
- out_channels – Number of output channels.
- kernel_size – Sinc filter kernel size (needs to be an odd number).
- stride – See torch.nn.functional.conv1d.
- padding – See torch.nn.functional.conv1d.
- dilation – See torch.nn.functional.conv1d.
- window_func – Window function on the filter, one of [“hamming”, “none”].
- scale_type – Scale used to initialize the filterbank, one of [“mel”, “bark”].
- fs – Sample rate of the input data.
Raises:
- NotImplementedError – If an unsupported window function or scale type is specified.
- ValueError – If the kernel size is not an odd number.
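The error conditions above can be illustrated with a small hypothetical helper. `check_sinc_conv_args` is not an ESPnet function, and the accepted scale names (“mel”, “bark”) are assumed from the Mel/Bark filterbank options described under init_filters.

```python
# Hypothetical helper (not part of ESPnet) mirroring the checks listed under "Raises".
WINDOW_FUNCS = ("hamming", "none")
SCALE_TYPES = ("mel", "bark")  # assumed option names for the Mel and Bark scales


def check_sinc_conv_args(kernel_size: int, window_func: str = "hamming", scale_type: str = "mel") -> None:
    """Raise the exception types documented for the SincConv constructor."""
    if window_func not in WINDOW_FUNCS:
        raise NotImplementedError(f"window_func must be one of {WINDOW_FUNCS}, got {window_func!r}")
    if scale_type not in SCALE_TYPES:
        raise NotImplementedError(f"scale_type must be one of {SCALE_TYPES}, got {scale_type!r}")
    if kernel_size % 2 == 0:
        # an odd length leaves exactly one center tap, so the kernel can be symmetric
        raise ValueError(f"kernel_size must be odd, got {kernel_size}")


check_sinc_conv_args(31)  # valid: no exception
for bad in ({"kernel_size": 32}, {"kernel_size": 31, "window_func": "hann"}):
    try:
        check_sinc_conv_args(**bad)
    except (ValueError, NotImplementedError) as err:
        print(type(err).__name__, err)
```

An odd kernel size gives a single center tap, so the filter can be symmetric around n = 0, which keeps it linear-phase.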

Examples

>>> import torch
>>> from espnet2.layers.sinc_conv import SincConv
>>> sinc_conv = SincConv(in_channels=1, out_channels=16, kernel_size=31)
>>> input_tensor = torch.randn(10, 1, 160)  # batch size 10, 1 channel, 160 samples
>>> output_tensor = sinc_conv(input_tensor)
>>> output_tensor.shape  # stride=1, padding=0: D_out = 160 - 31 + 1
torch.Size([10, 16, 130])

forward(xs: Tensor) → Tensor

Sinc convolution forward function.

Applies the Sinc convolution to the input tensor xs of shape (B, C_in, D_in), where B is the batch size, C_in is the number of input channels, and D_in is the input dimension (number of samples). The output has shape (B, C_out, D_out), where C_out is the number of output channels and D_out is determined by the input size, kernel size, stride, padding, and dilation.

Parameters: xs – Batch in form of torch.Tensor (B, C_in, D_in).
Returns: Batch in form of torch.Tensor (B, C_out, D_out).
Return type: torch.Tensor

Examples

>>> import torch
>>> from espnet2.layers.sinc_conv import SincConv
>>> sinc_conv = SincConv(in_channels=1, out_channels=2, kernel_size=31)
>>> input_tensor = torch.randn(10, 1, 100)  # (B, C_in, D_in)
>>> output_tensor = sinc_conv(input_tensor)
>>> print(output_tensor.shape)  # (B, C_out, D_out) with D_out = 100 - 31 + 1
torch.Size([10, 2, 70])

NOTE

The Sinc filters must be created before the convolution is performed. This is handled internally: forward calls _create_filters(), so no separate setup call is needed.

Raises:
- RuntimeError – If the input tensor does not have the expected shape.
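The NOTE above means the kernels are derived from parameters and rebuilt before the convolution runs. The sketch below shows that pattern for a single input channel. `TinySincConv`, its evenly spaced initial band edges, and the parameter names `f_low` and `f_band` are hypothetical choices for illustration, not ESPnet's internals. Because the kernels are recomputed inside `forward`, gradients flow back to the band edges during training.

```python
import math

import torch
import torch.nn.functional as F


class TinySincConv(torch.nn.Module):
    """Hypothetical sketch: learnable band edges, kernels rebuilt on every forward call."""

    def __init__(self, out_channels: int, kernel_size: int):
        super().__init__()
        if kernel_size % 2 == 0:
            raise ValueError("kernel_size must be odd")
        # normalized band edges (fraction of the sample rate), evenly spaced for simplicity
        edges = torch.linspace(0.01, 0.49, out_channels + 1)
        self.f_low = torch.nn.Parameter(edges[:-1].clone())
        self.f_band = torch.nn.Parameter((edges[1:] - edges[:-1]).clone())
        half = kernel_size // 2
        self.register_buffer("n", torch.arange(-half, half + 1, dtype=torch.float32))
        self.register_buffer("window", torch.hamming_window(kernel_size, periodic=False))

    def _create_filters(self) -> torch.Tensor:
        f1 = self.f_low.abs().unsqueeze(1)        # (C_out, 1)
        f2 = f1 + self.f_band.abs().unsqueeze(1)  # keeps f2 >= f1
        n = torch.where(self.n == 0, torch.ones_like(self.n), self.n)  # avoid 0/0
        h = (torch.sin(2 * math.pi * f2 * n) - torch.sin(2 * math.pi * f1 * n)) / (math.pi * n)
        h = torch.where(self.n == 0, 2 * (f2 - f1), h)  # limit value at the center tap
        return (h * self.window).unsqueeze(1)     # (C_out, 1, K)

    def forward(self, xs: torch.Tensor) -> torch.Tensor:
        # xs: (B, 1, D_in) -> (B, C_out, D_out); filters are recomputed each call
        return F.conv1d(xs, self._create_filters())


torch.manual_seed(0)
conv = TinySincConv(out_channels=4, kernel_size=31)
x = torch.randn(2, 1, 200)
y = conv(x)
print(y.shape)  # torch.Size([2, 4, 170])
y.pow(2).mean().backward()
print(conv.f_low.grad.shape)  # torch.Size([4])
```

Only 2·C_out numbers are learned here, while each kernel has kernel_size taps; this compact parameterization is the main motivation for Sinc convolutions.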

get_odim(idim: int) → int

Obtain the output dimension D_out of the convolution for an input dimension idim.
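Since stride, padding, and dilation are documented as those of torch.nn.functional.conv1d, the output dimension presumably follows the standard conv1d length formula, D_out = floor((D_in + 2·padding − dilation·(kernel_size − 1) − 1) / stride) + 1. The sketch below uses the hypothetical name `conv1d_output_length` and checks the formula against the length that conv1d actually returns.

```python
import torch
import torch.nn.functional as F


def conv1d_output_length(idim: int, kernel_size: int, stride: int = 1,
                         padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (torch conv1d shape formula)."""
    return (idim + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# compare the formula with the length conv1d produces for a few configurations
cases = [(160, 31, 1, 0, 1), (100, 31, 1, 0, 1), (160, 31, 2, 15, 1), (400, 51, 3, 0, 2)]
for idim, k, s, p, d in cases:
    out = F.conv1d(torch.randn(1, 1, idim), torch.randn(4, 1, k), stride=s, padding=p, dilation=d)
    print((idim, k, s, p, d), conv1d_output_length(idim, k, s, p, d), out.shape[-1])
```

For example, with kernel_size=31, stride=1, and padding=0, an input of 160 samples gives 130 output frames.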

static hamming_window(x: Tensor) → Tensor

Hamming Windowing function.

This function computes the Hamming window, which is a type of tapering function used to smooth the Sinc filter coefficients. The Hamming window is particularly effective in reducing spectral leakage in frequency analysis.

Parameters: x – A tensor of shape (N,), where N is the number of points of the window.
Returns: A tensor of Hamming window values with the same shape as the input tensor x.
Return type: torch.Tensor

Examples

>>> import torch
>>> from espnet2.layers.sinc_conv import SincConv
>>> window_size = 10
>>> x = torch.linspace(0, window_size - 1, window_size)
>>> hamming_window = SincConv.hamming_window(x)
>>> hamming_window.shape
torch.Size([10])
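For reference, the textbook symmetric M-point Hamming window is w[n] = 0.54 − 0.46·cos(2πn / (M − 1)) for n = 0, 1, ..., M − 1. This page does not show how SincConv.hamming_window maps the values of x to window positions, so the sketch below implements only the textbook form (under the hypothetical name `hamming`) and compares it with `torch.hamming_window`.

```python
import math

import torch


def hamming(M: int) -> torch.Tensor:
    """Symmetric M-point Hamming window: 0.54 - 0.46 * cos(2*pi*n / (M - 1))."""
    n = torch.arange(M, dtype=torch.float32)
    return 0.54 - 0.46 * torch.cos(2 * math.pi * n / (M - 1))


w = hamming(31)
# torch's built-in window with periodic=False uses the same symmetric definition
print(torch.allclose(w, torch.hamming_window(31, periodic=False), atol=1e-5))  # True
print(round(w[0].item(), 4), round(w[15].item(), 4))  # 0.08 1.0
```

The window is 1 at the center and tapers to 0.08 at the edges; multiplying a truncated sinc kernel by it suppresses the ripples caused by the abrupt truncation.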

init_filters()

Initialize filters with filterbank values.

This method computes the initial filterbank values based on the specified scale (Mel or Bark) and the sample rate. The resulting filterbank is stored as a parameter that can be learned during training.

Raises: NotImplementedError – If the specified scale type is not supported.

Examples

>>> from espnet2.layers.sinc_conv import SincConv
>>> sinc_conv = SincConv(in_channels=1, out_channels=10, kernel_size=51)
>>> sinc_conv.init_filters()
>>> sinc_conv.f.shape  # filterbank parameters, one row per output channel
torch.Size([10, 2])
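init_filters derives the initial filterbank from the Mel (or Bark) scale and the sample rate. The sketch below computes Mel-spaced band edges under several assumptions: the common formula mel(f) = 1125·ln(1 + f/700), a 30 Hz lower edge, and overlapping bands whose edges are every other point of an equally spaced Mel grid. These choices and the function names are illustrative, not ESPnet's documented behavior. The (channels, 2) result has the same shape as the filterbank parameters in the example above.

```python
import torch


def hz_to_mel(f: torch.Tensor) -> torch.Tensor:
    # one common mel formula (assumption; other variants exist)
    return 1125.0 * torch.log1p(f / 700.0)


def mel_to_hz(m: torch.Tensor) -> torch.Tensor:
    return 700.0 * torch.expm1(m / 1125.0)


def mel_band_edges(channels: int, fs: float, f_min: float = 30.0) -> torch.Tensor:
    """Return a (channels, 2) tensor of [lower, upper] band edges in Hz.

    Points are equally spaced on the mel scale between f_min and fs / 2;
    band i spans grid points i and i + 2, so neighboring bands overlap.
    """
    lo = hz_to_mel(torch.tensor(f_min))
    hi = hz_to_mel(torch.tensor(fs / 2))
    grid = mel_to_hz(torch.linspace(lo.item(), hi.item(), channels + 2))
    return torch.stack([grid[:-2], grid[2:]], dim=1)


edges = mel_band_edges(10, 16000.0)
print(edges.shape)  # torch.Size([10, 2])
print(edges[0].tolist(), edges[-1].tolist())  # narrow low band, wide high band
```

Equal spacing on the Mel scale gives narrow bands at low frequencies and wide bands at high frequencies, which is the usual motivation for Mel-based initialization.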

static none_window(x: Tensor) → Tensor

Identity-like windowing function.

This function returns a tensor of ones with the same shape as x. Multiplying the filter by it leaves the filter unchanged, so no windowing effect is applied.

Parameters: x – Tensor whose shape determines the shape of the output.
Returns: A tensor of ones with the same shape as input x.
Return type: torch.Tensor

Examples

>>> import torch
>>> from espnet2.layers.sinc_conv import SincConv
>>> input_tensor = torch.tensor([1.0, 2.0, 3.0])
>>> output_tensor = SincConv.none_window(input_tensor)
>>> print(output_tensor)
tensor([1., 1., 1.])

static sinc(x: Tensor) → Tensor

Element-wise sinc function.
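This page does not state which sinc convention is used. PyTorch's `torch.sinc` is the normalized form sin(πx)/(πx), whereas sinc-based band-pass formulas such as SincNet's are usually written with the unnormalized sin(x)/x; which one SincConv.sinc computes is an assumption here. The sketch below implements the unnormalized version with a small offset to avoid 0/0 at x = 0 (`sinc_unnormalized` and `eps` are hypothetical names) and checks it against `torch.sinc(x / π)`.

```python
import math

import torch


def sinc_unnormalized(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """sin(x) / x, shifted by a tiny eps so that x == 0 gives ~1 instead of NaN."""
    x = x + eps
    return torch.sin(x) / x


x = torch.linspace(-20.0, 20.0, 401)
approx = sinc_unnormalized(x)
exact = torch.sinc(x / math.pi)  # torch.sinc(t) = sin(pi * t) / (pi * t)
print(torch.allclose(approx, exact, atol=1e-4))  # True
print(sinc_unnormalized(torch.tensor([0.0])))
```

The offset trades a negligible bias for a division that never hits 0/0; an alternative is an explicit `torch.where` on x == 0.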
