espnet2.asr.preencoder.sinc.SpatialDropout
class espnet2.asr.preencoder.sinc.SpatialDropout(dropout_probability: float = 0.15, shape: tuple | list | None = None)
Bases: Module
Spatial dropout module.
This module applies dropout to entire channels of input tensors of shape (B, C, D), where B is the batch size, C is the number of channels, and D is the feature dimension. Dropping whole channels rather than individual elements is a useful regularizer for preventing overfitting in deep learning models.
dropout
An instance of torch.nn.Dropout2d used to apply the dropout.
shape
The permutation applied to the input dimensions before dropout.
- Parameters:
- dropout_probability (float) – Probability of an element being zeroed. Default is 0.15.
- shape (Optional[Union[tuple, list]]) – Permutation applied to the input dimensions before dropout. Defaults to (0, 2, 1) when None.
####### Examples
>>> spatial_dropout = SpatialDropout(dropout_probability=0.2,
... shape=(0, 2, 1))
>>> input_tensor = torch.randn(10, 3, 5) # Example input
>>> output_tensor = spatial_dropout(input_tensor)
>>> output_tensor.shape
torch.Size([10, 3, 5])
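As with any dropout layer, spatial dropout is active only in training mode. A quick sanity check (an illustrative sketch, not from the original documentation):
>>> sd = SpatialDropout(dropout_probability=0.5)
>>> sd = sd.eval()  # disable dropout for inference
>>> x = torch.randn(4, 3, 8)
>>> torch.equal(sd(x), x)  # identity when not training
True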
NOTE
The input tensor should be of shape (B, C, D). The shape parameter determines how the dimensions are permuted before applying dropout.
- Raises: ValueError – If the shape parameter is not a tuple or list.
Initialize.
- Parameters:
- dropout_probability – Dropout probability.
- shape (tuple, list) – Permutation applied to the input dimensions before dropout.
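The configured submodule is exposed through the dropout attribute described above. A brief illustrative check (not from the original documentation):
>>> sd = SpatialDropout(dropout_probability=0.1, shape=[0, 2, 1])
>>> sd.dropout
Dropout2d(p=0.1, inplace=False)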
forward(x: Tensor) → Tensor
Apply spatial dropout.
This method permutes the input tensor according to the shape parameter (by default (0, 2, 1)), applies torch.nn.Dropout2d so that entire channels are zeroed out together rather than individual elements, and then permutes the result back to its original layout.
- Parameters: x (torch.Tensor) – Input tensor of shape (B, C, D).
- Returns: Output tensor of the same shape (B, C, D), with entire channels randomly zeroed during training.
- Return type: torch.Tensor
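For reference, the forward pass boils down to a permute / channel-dropout / permute-back pattern. The sketch below is a minimal illustration of that behavior, not the ESPnet source; SpatialDropoutSketch and its parameter names are hypothetical:
>>> import torch
>>> class SpatialDropoutSketch(torch.nn.Module):
...     def __init__(self, p=0.15, shape=(0, 2, 1)):
...         super().__init__()
...         self.dropout = torch.nn.Dropout2d(p)
...         self.shape = tuple(shape)
...     def forward(self, x):
...         y = x.permute(*self.shape)  # move the axis to drop into the channel slot
...         y = self.dropout(y)  # Dropout2d zeroes entire channels at once
...         return y.permute(*self.shape)  # (0, 2, 1) is its own inverse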
####### Examples
>>> spatial_dropout = SpatialDropout(dropout_probability=0.2)
>>> input_tensor = torch.randn(16, 3, 100)  # (B, C, D)
>>> output_tensor = spatial_dropout(input_tensor)
>>> output_tensor.shape
torch.Size([16, 3, 100])
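During training, surviving activations are rescaled by 1/(1 - p), matching standard PyTorch dropout semantics. A sketch of that behavior (assumes default PyTorch scaling; not from the original documentation):
>>> sd = SpatialDropout(dropout_probability=0.5)
>>> sd = sd.train()
>>> x = torch.ones(2, 3, 4)
>>> y = sd(x)
>>> set(y.unique().tolist()) <= {0.0, 2.0}  # each value is zeroed or scaled by 1/(1-p)
True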