espnet2.asr.state_spaces.pool.DownPool2d
class espnet2.asr.state_spaces.pool.DownPool2d(d_input, d_output, stride=1, transposed=True, weight_norm=True)
Bases: SequenceModule
DownPool2d is a pooling layer that performs downsampling on 2D inputs.
This class implements a downsampling layer using average pooling followed by a linear transformation. It is designed for processing 2D inputs such as images or feature maps in a sequence-to-sequence model.
linear
A linear transformation applied after pooling.
- Type: LinearActivation
pool
A tuple containing the AvgPool2d layer used for downsampling.
- Type: tuple
Parameters:
- d_input (int) – The number of input features.
- d_output (int) – The number of output features after pooling.
- stride (int) – The stride (and kernel size) of the pooling operation. Defaults to 1.
- transposed (bool) – If True, operates on transposed (channels-first) inputs. Defaults to True.
- weight_norm (bool) – If True, applies weight normalization to the linear layer. Defaults to True.
Returns: The output tensor after applying downsampling and linear transformation.
Return type: Tensor
####### Examples
>>> import torch
>>> from espnet2.asr.state_spaces.pool import DownPool2d
>>> down_pool = DownPool2d(d_input=16, d_output=8, stride=2)
>>> x = torch.randn(1, 16, 32, 32)  # Batch size of 1, 16 channels, 32x32 feature map
>>> output = down_pool(x)
>>> output.shape
torch.Size([1, 8, 16, 16]) # Output shape after downsampling
NOTE
This layer is typically used in models where spatial dimensions need to be reduced while retaining important features.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
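Based on the attributes listed above, the layer amounts to an AvgPool2d that shrinks the spatial dimensions followed by a linear map over the channel dimension. The snippet below is a minimal sketch of that composition for a channels-first (batch, channels, H, W) layout; the `SimpleDownPool2d` name and the 1x1 convolution standing in for `LinearActivation` are illustrative assumptions, not the actual ESPnet implementation.

```python
import torch
import torch.nn as nn


class SimpleDownPool2d(nn.Module):
    """Illustrative stand-in: average-pool spatial dims, then project channels.

    This mirrors the composition described above (AvgPool2d + linear map);
    it is not the exact ESPnet DownPool2d code.
    """

    def __init__(self, d_input: int, d_output: int, stride: int = 1):
        super().__init__()
        # Spatial downsampling by `stride` in both H and W.
        self.pool = nn.AvgPool2d(kernel_size=stride, stride=stride)
        # Channel projection; a 1x1 conv acts as a per-position linear layer
        # on channels-first inputs (assumed stand-in for LinearActivation).
        self.linear = nn.Conv2d(d_input, d_output, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_input, H, W) -> (batch, d_output, H // stride, W // stride)
        return self.linear(self.pool(x))


if __name__ == "__main__":
    layer = SimpleDownPool2d(d_input=16, d_output=8, stride=2)
    out = layer(torch.randn(1, 16, 32, 32))
    print(out.shape)  # torch.Size([1, 8, 16, 16])
```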
forward(x)
Forward pass for the DownPool layer.
This method rearranges the input tensor according to the stride and transposition settings and then applies a linear transformation, downsampling the sequence length by the stride factor (see the sketch at the end of this section).
- Parameters:x (torch.Tensor) – The input tensor of shape (…, H) where H is the number of features in the input.
- Returns: A tuple containing the transformed tensor and None; the shape of the transformed tensor is determined by the stride and the linear transformation.
- Return type: tuple
####### Examples
>>> import torch
>>> from espnet2.asr.state_spaces.pool import DownPool
>>> down_pool = DownPool(d_input=128, d_output=64, stride=2, transposed=False)
>>> input_tensor = torch.randn(10, 32, 128)  # (batch_size, sequence_length, features)
>>> output_tensor, _ = down_pool(input_tensor)
>>> output_tensor.shape
torch.Size([10, 16, 64]) # Expected output shape after downsampling
NOTE
The method expects the input tensor to have at least three dimensions. The transposed option controls whether the feature dimension is placed before (channels-first) or after the length dimension.
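For the non-transposed (batch, length, features) layout used in the example above, the rearrange-then-project step described in this section can be sketched as follows. This is an assumption about the shape arithmetic based on the description, not a copy of the library's forward() code; the einops pattern is illustrative.

```python
import torch
from einops import rearrange

batch, length, d_input, d_output, stride = 10, 32, 128, 64, 2

x = torch.randn(batch, length, d_input)                    # (B, L, H)
# Fold `stride` adjacent time steps into the feature dimension:
x = rearrange(x, "... (l s) h -> ... l (h s)", s=stride)   # (B, L // s, H * s)
# Project the stacked features down to d_output (stands in for LinearActivation):
linear = torch.nn.Linear(d_input * stride, d_output)
y = linear(x)
print(y.shape)  # torch.Size([10, 16, 64])
```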