espnet2.asr.preencoder.sinc.SpatialDropout
class espnet2.asr.preencoder.sinc.SpatialDropout(dropout_probability: float = 0.15, shape: tuple | list | None = None)
Bases: Module
Spatial dropout module.
This module applies dropout to entire channels of input tensors of shape (B, C, D), where B is the batch size, C is the number of channels, and D is the feature dimension. Dropping whole channels rather than individual elements is a useful regularizer for preventing overfitting in deep learning models.
dropout
An instance of torch.nn.Dropout2d used to apply the dropout.
shape
The permutation applied to the input dimensions before dropout.
- Parameters:
- dropout_probability (float) – Probability of an element being zeroed. Default is 0.15.
- shape (Optional[Union[tuple, list]]) – Permutation applied to the input dimensions before dropout. Defaults to (0, 2, 1) when None.
####### Examples
>>> spatial_dropout = SpatialDropout(dropout_probability=0.2,
... shape=(0, 2, 1))
>>> input_tensor = torch.randn(10, 3, 5) # Example input
>>> output_tensor = spatial_dropout(input_tensor)
>>> output_tensor.shape
torch.Size([10, 3, 5])
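As with any dropout layer, spatial dropout is active only in training mode. A quick sanity check (an illustrative sketch, not from the original documentation):
>>> sd = SpatialDropout(dropout_probability=0.5)
>>> sd = sd.eval()  # disable dropout for inference
>>> x = torch.randn(4, 3, 8)
>>> torch.equal(sd(x), x)  # identity when not training
True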
NOTE
The input tensor should be of shape (B, C, D). The shape parameter determines how the dimensions are permuted before applying dropout.
- Raises: ValueError – If the shape parameter is not a tuple or list.
Initialize.
- Parameters:
- dropout_probability – Dropout probability.
- shape (tuple, list) – Permutation applied to the input dimensions before dropout.
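The configured submodule is exposed through the dropout attribute described above. A brief illustrative check (not from the original documentation):
>>> sd = SpatialDropout(dropout_probability=0.1, shape=[0, 2, 1])
>>> sd.dropout
Dropout2d(p=0.1, inplace=False)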
forward(x: Tensor) → Tensor
Apply spatial dropout.
This method permutes the input tensor according to the shape parameter (by default (0, 2, 1)), applies torch.nn.Dropout2d so that entire channels are zeroed out together rather than individual elements, and then permutes the result back to its original layout.
- Parameters: x (torch.Tensor) – Input tensor of shape (B, C, D).
- Returns: Output tensor of the same shape (B, C, D), with entire channels randomly zeroed during training.
- Return type: torch.Tensor
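For reference, the forward pass boils down to a permute / channel-dropout / permute-back pattern. The sketch below is a minimal illustration of that behavior, not the ESPnet source; SpatialDropoutSketch and its parameter names are hypothetical:
>>> import torch
>>> class SpatialDropoutSketch(torch.nn.Module):
...     def __init__(self, p=0.15, shape=(0, 2, 1)):
...         super().__init__()
...         self.dropout = torch.nn.Dropout2d(p)
...         self.shape = tuple(shape)
...     def forward(self, x):
...         y = x.permute(*self.shape)  # move the axis to drop into the channel slot
...         y = self.dropout(y)  # Dropout2d zeroes entire channels at once
...         return y.permute(*self.shape)  # (0, 2, 1) is its own inverse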
####### Examples
>>> spatial_dropout = SpatialDropout(dropout_probability=0.2)
>>> input_tensor = torch.randn(16, 3, 100)  # (B, C, D)
>>> output_tensor = spatial_dropout(input_tensor)
>>> output_tensor.shape
torch.Size([16, 3, 100])
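During training, surviving activations are rescaled by 1/(1 - p), matching standard PyTorch dropout semantics. A sketch of that behavior (assumes default PyTorch scaling; not from the original documentation):
>>> sd = SpatialDropout(dropout_probability=0.5)
>>> sd = sd.train()
>>> x = torch.ones(2, 3, 4)
>>> y = sd(x)
>>> set(y.unique().tolist()) <= {0.0, 2.0}  # each value is zeroed or scaled by 1/(1-p)
True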