espnet2.layers.sinc_conv.SincConv
class espnet2.layers.sinc_conv.SincConv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, window_func: str = 'hamming', scale_type: str = 'mel', fs: int | float = 16000)
Bases: Module
Sinc Convolution.
This module performs a convolution using Sinc filters in the time domain as the kernel. Sinc filters function as band passes in the spectral domain. The filtering is done as a convolution in the time domain, and no transformation to the spectral domain is necessary.
This implementation of the Sinc convolution is heavily inspired by Ravanelli et al. (https://github.com/mravanelli/SincNet) and adapted for the ESPnet toolkit. It combines Sinc convolutions with a log compression activation function, as described in: https://arxiv.org/abs/2010.07597.
Notes
Currently, the same filters are applied to all input channels. The windowing function is applied on the kernel to obtain a smoother filter and not on the input values, which is different from traditional ASR.
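To make the band-pass behaviour and the kernel-side windowing concrete, here is a minimal standalone sketch; it is not the ESPnet implementation. It builds one band-pass kernel as the difference of two ideal low-pass sinc responses and tapers it with a symmetric Hamming window. The names `sinc_bandpass_kernel` and `magnitude_at` and the cutoff values are illustrative assumptions, and only the standard library is used.

```python
import cmath
import math


def sinc_bandpass_kernel(f_low, f_high, kernel_size, fs=16000):
    """Return a Hamming-windowed band-pass kernel (hypothetical helper).

    The ideal band-pass impulse response is the difference of two
    low-pass sinc responses with cutoffs f_high and f_low (in Hz).
    """
    if kernel_size < 3 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be an odd number >= 3")
    half = kernel_size // 2
    kernel = []
    for n in range(-half, half + 1):
        if n == 0:
            # Limit of the expression in the else-branch as n -> 0.
            h = 2.0 * (f_high - f_low) / fs
        else:
            h = (
                math.sin(2.0 * math.pi * f_high * n / fs)
                - math.sin(2.0 * math.pi * f_low * n / fs)
            ) / (math.pi * n)
        # The window multiplies the kernel taps, not the input samples.
        w = 0.54 + 0.46 * math.cos(2.0 * math.pi * n / (kernel_size - 1))
        kernel.append(h * w)
    return kernel


def magnitude_at(kernel, freq, fs=16000):
    """Magnitude of the kernel's frequency response at freq (in Hz)."""
    half = len(kernel) // 2
    response = sum(
        h * cmath.exp(-2j * math.pi * freq * (i - half) / fs)
        for i, h in enumerate(kernel)
    )
    return abs(response)


kernel = sinc_bandpass_kernel(f_low=1000.0, f_high=2000.0, kernel_size=101)
in_band = magnitude_at(kernel, 1500.0)   # frequency inside the pass band
out_band = magnitude_at(kernel, 5000.0)  # frequency far outside the pass band
```

With these settings, `in_band` should come out close to 1 and `out_band` close to 0, which is the band-pass behaviour obtained purely by time-domain convolution.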
in_channels
Number of input channels.
- Type: int
out_channels
Number of output channels.
- Type: int
kernel_size
Sinc filter kernel size (must be odd).
- Type: int
stride
Stride for the convolution.
- Type: int
padding
Padding for the convolution.
- Type: int
dilation
Dilation for the convolution.
- Type: int
window_func
Window function applied to the filter.
- Type: callable
scale
Scale type for frequency representation.
- Type: callable
fs
Sample rate of the input data.
- Type: float
sinc_filters
Calculated Sinc filters for convolution.
- Type: torch.Tensor
Parameters:
- in_channels – Number of input channels.
- out_channels – Number of output channels.
- kernel_size – Sinc filter kernel size (needs to be an odd number).
- stride – See torch.nn.functional.conv1d.
- padding – See torch.nn.functional.conv1d.
- dilation – See torch.nn.functional.conv1d.
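- window_func – Window function on the filter, one of ["hamming", "none"].
- scale_type – Frequency scale used to initialize the filters ("mel" by default; "bark" is also supported).
- fs – Sample rate of the input data.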
Raises:
- NotImplementedError – If an unsupported window function or scale type is specified.
- ValueError – If the kernel size is not an odd number.
Examples
>>> import torch
>>> from espnet2.layers.sinc_conv import SincConv
>>> sinc_conv = SincConv(in_channels=1, out_channels=16, kernel_size=31)
>>> input_tensor = torch.randn(10, 1, 160)  # Batch size 10, 1 channel, 160 samples
>>> output_tensor = sinc_conv(input_tensor)
>>> output_tensor.shape  # D_out = 160 - (31 - 1) = 130 with the default stride, padding, and dilation
torch.Size([10, 16, 130])
forward(xs: Tensor) → Tensor
Sinc convolution forward function.
Applies the Sinc convolution operation to the input tensor xs. The input tensor should have the shape (B, C_in, D_in), where B is the batch size, C_in is the number of input channels, and D_in is the input dimension. The output tensor will have the shape (B, C_out, D_out), where C_out is the number of output channels and D_out is the output dimension calculated based on the input size, stride, padding, and dilation.
- Parameters: xs – Batch in form of torch.Tensor (B, C_in, D_in).
- Returns: Batch in form of torch.Tensor (B, C_out, D_out).
- Return type: torch.Tensor
Examples
>>> sinc_conv = SincConv(in_channels=1, out_channels=2, kernel_size=31)
>>> input_tensor = torch.randn(10, 1, 100)  # (B, C_in, D_in)
>>> output_tensor = sinc_conv(input_tensor)
>>> print(output_tensor.shape)  # (B, C_out, D_out), with D_out = 100 - (31 - 1) = 70
torch.Size([10, 2, 70])
NOTE
This method requires that the Sinc filters be created before performing the convolution, which is handled internally by calling _create_filters().
- Raises: RuntimeError – If the input tensor does not have the expected shape.
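The shape contract of forward can be illustrated without PyTorch. The pure-Python sketch below applies a small bank of kernels to a single-channel signal; `apply_filterbank` and the toy kernels are hypothetical and stand in for real Sinc filters. Like torch.nn.functional.conv1d, it slides each kernel over the signal without flipping it. It is an illustration of the input/output shapes, not of how forward is implemented.

```python
def apply_filterbank(signal, kernels, stride=1, padding=0, dilation=1):
    """Apply each kernel to a 1-D signal; returns one output row per kernel."""
    # Zero-pad both ends, as conv1d does for an integer padding value.
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    outputs = []
    for kernel in kernels:
        # Number of input samples covered by one (possibly dilated) kernel.
        span = dilation * (len(kernel) - 1) + 1
        row = []
        for start in range(0, len(padded) - span + 1, stride):
            row.append(
                sum(k * padded[start + j * dilation] for j, k in enumerate(kernel))
            )
        outputs.append(row)
    return outputs


signal = [float(v) for v in range(1, 11)]      # D_in = 10
kernels = [[1.0, 0.0, -1.0], [1.0, 1.0, 1.0]]  # C_out = 2, kernel_size = 3

out = apply_filterbank(signal, kernels)
print(len(out), len(out[0]))  # 2 8

out_strided = apply_filterbank(signal, kernels, stride=2, padding=1)
print(len(out_strided), len(out_strided[0]))  # 2 5
```

The number of rows corresponds to C_out and the row length to D_out; a batch dimension would simply repeat this per example.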
get_odim(idim: int) → int
Obtain the output dimension of the filter.
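Assuming get_odim follows the output-length formula documented for torch.nn.Conv1d (an assumption, since the method body is not shown here), the computation can be sketched as follows. `conv1d_output_length` is a hypothetical standalone function, not part of ESPnet.

```python
def conv1d_output_length(idim, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution, per the torch.nn.Conv1d formula."""
    # Input samples spanned by one kernel once dilation is taken into account.
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (idim + 2 * padding - effective_kernel) // stride + 1


print(conv1d_output_length(160, kernel_size=31))  # 130
print(conv1d_output_length(100, kernel_size=31, stride=2, padding=15))  # 50
```

Knowing the output dimension ahead of time is useful when sizing the layers that consume the Sinc convolution output.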
static hamming_window(x: Tensor) → Tensor
Hamming Windowing function.
This function computes the Hamming window, which is a type of tapering function used to smooth the Sinc filter coefficients. The Hamming window is particularly effective in reducing spectral leakage in frequency analysis.
- Parameters: x – A tensor of shape (N,), where N is the number of points in the window.
- Returns: A tensor containing the Hamming window values, with the same shape as the input tensor x.
- Return type: torch.Tensor
Examples
>>> import torch
>>> window_size = 10
>>> x = torch.linspace(0, window_size - 1, window_size)
>>> hamming_window = SincConv.hamming_window(x)
>>> hamming_window.shape  # same shape as x
torch.Size([10])
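For reference, the textbook symmetric Hamming window is w[n] = 0.54 - 0.46 cos(2πn / (N - 1)). The sketch below evaluates it in plain Python. `hamming` is a hypothetical helper, and SincConv.hamming_window may index or scale its argument differently (for example, to cover only one half of a symmetric kernel), so its values are not expected to match this sketch exactly.

```python
import math


def hamming(num_points):
    """Textbook symmetric Hamming window with num_points samples."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_points - 1))
        for n in range(num_points)
    ]


window = hamming(11)
# The ends taper to about 0.08 and the centre sample is about 1.0.
print(round(window[0], 2), round(window[5], 2), round(window[10], 2))
```

Multiplying a truncated sinc kernel by such a taper smooths the abrupt cut-off at the kernel edges, which is the source of the spectral leakage mentioned above.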
init_filters()
Initialize filters with filterbank values.
This method computes the initial filterbank values based on the specified scale (Mel or Bark) and the sample rate. The resulting filterbank is stored as a parameter that can be learned during training.
- Raises: NotImplementedError – If the specified scale type is not supported.
Examples
>>> sinc_conv = SincConv(in_channels=1, out_channels=10, kernel_size=51)
>>> sinc_conv.init_filters()
>>> sinc_conv.f.shape  # shape of the filterbank parameters
torch.Size([10, 2])
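As an illustration of what a mel-scale initialization might produce, the sketch below spaces band edges uniformly on the mel scale and normalizes them by the sample rate, giving one (low, high) pair per filter, consistent with the (out_channels, 2) shape above. The mel conversion constants, the 30 Hz lower bound, the overlapping band layout, and the names `hz_to_mel`, `mel_to_hz`, and `mel_band_edges` are all assumptions chosen for the example; ESPnet's own scale classes may differ.

```python
import math


def hz_to_mel(hz):
    """Convert Hz to mel (one common convention; assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_filters, fs=16000, f_min=30.0):
    """Return n_filters (low, high) band edges, normalized by fs."""
    mel_lo = hz_to_mel(f_min)
    mel_hi = hz_to_mel(fs / 2.0)  # Nyquist frequency
    # n_filters + 2 points, equally spaced on the mel scale.
    points = [
        mel_to_hz(mel_lo + (mel_hi - mel_lo) * i / (n_filters + 1))
        for i in range(n_filters + 2)
    ]
    # Filter i spans points[i] .. points[i + 2], so neighbours overlap.
    return [(points[i] / fs, points[i + 2] / fs) for i in range(n_filters)]


edges = mel_band_edges(10)
print(len(edges), len(edges[0]))  # 10 2
```

Because the edges are stored as learnable parameters, such values only serve as a starting point that training can move.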
static none_window(x: Tensor) → Tensor
Identity-like windowing function.
This function returns a tensor of ones with the same shape as x. Multiplying the filter by this window leaves it unchanged, so no windowing effect is applied.
- Parameters: x – A tensor whose shape determines the shape of the window.
- Returns: A tensor of ones with the same shape as input x.
- Return type: torch.Tensor
Examples
>>> input_tensor = torch.tensor([1.0, 2.0, 3.0])
>>> output_tensor = SincConv.none_window(input_tensor)
>>> print(output_tensor)
tensor([1., 1., 1.])
static sinc(x: Tensor) → Tensor
Sinc function.
Evaluates the sinc function element-wise on the input tensor x.
- Parameters: x – Input tensor.
- Returns: A tensor with the same shape as x.
- Return type: torch.Tensor