espnet2.asr.state_spaces.pool.DownPool

class espnet2.asr.state_spaces.pool.DownPool(d_input, d_output=None, expand=None, stride=1, transposed=True, weight_norm=True, initializer=None, activation=None)

Bases: SequenceModule

Downsampling layer that shortens a sequence by a factor of stride and projects it with a linear layer.

The layer folds each group of stride consecutive time steps into the feature dimension, giving d_input * stride features per remaining step, and then applies a linear transformation (with an optional activation) that maps those features to d_output. It can be used in neural network architectures that process sequential data at several time resolutions.

d_output

The dimensionality of the output after downsampling.

Type: int

stride

The factor by which to downsample the input.

Type: int

transposed

Layout of the input tensor. If True, inputs are shaped (batch, features, length); if False, (batch, length, features).

Type: bool

linear

A linear layer that transforms the input features.

Type: LinearActivation
Parameters:
- d_input (int) – The dimensionality of the input features.
- d_output (int, optional) – The dimensionality of the output features. If not provided, it is computed as d_input * expand.
- expand (int, optional) – The factor by which to expand the output features. If provided, d_output must be None.
- stride (int, optional) – The downsampling stride. Defaults to 1.
- transposed (bool, optional) – Whether inputs use the (batch, features, length) layout instead of (batch, length, features). Defaults to True.
- weight_norm (bool, optional) – Whether to apply weight normalization to the linear layer. Defaults to True.
- initializer (callable, optional) – Function for initializing the weights. Defaults to None.
- activation (callable, optional) – Activation function to apply after the linear transformation. Defaults to None.
Returns: The downsampled output and None (placeholder for potential future states).
Return type: Tuple[torch.Tensor, None]

########### Examples

>>> import torch
>>> from espnet2.asr.state_spaces.pool import DownPool
>>> down_pool = DownPool(d_input=64, d_output=32, stride=2, transposed=False)
>>> x = torch.randn(10, 6, 64)  # (batch_size, sequence_length, features)
>>> output, _ = down_pool(x)
>>> print(output.shape)
torch.Size([10, 3, 32])  # (batch_size, sequence_length // stride, d_output)

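######## NOTE The sequence length must be divisible by stride, and the feature axis must have size d_input. With the default transposed=True the feature axis comes before the length axis, so pass transposed=False for (batch, length, features) inputs.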
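The shape rule in the note above can be sketched in plain Python. The helper `downpool_output_shape` is hypothetical and not part of espnet2; it assumes the (…, features, length) layout when transposed is True and (…, length, features) otherwise, and only predicts shapes without running the layer.

```python
def downpool_output_shape(shape, d_input, d_output, stride=1, transposed=True):
    """Predict the output shape of a stride-based downsampling layer.

    Hypothetical helper, not an espnet2 API. Assumes (..., features, length)
    when transposed is True and (..., length, features) otherwise.
    """
    if len(shape) < 2:
        raise ValueError("expected at least a feature axis and a length axis")
    *batch, a, b = shape
    features, length = (a, b) if transposed else (b, a)
    if features != d_input:
        raise ValueError(f"feature axis is {features}, expected d_input={d_input}")
    if length % stride != 0:
        raise ValueError(f"length {length} is not divisible by stride {stride}")
    new_length = length // stride
    tail = (d_output, new_length) if transposed else (new_length, d_output)
    return (*batch, *tail)


print(downpool_output_shape((10, 6, 64), 64, 32, stride=2, transposed=False))  # (10, 3, 32)
print(downpool_output_shape((10, 64, 6), 64, 32, stride=2, transposed=True))   # (10, 32, 3)
```

Calling the helper with a length of 5 and stride=2 raises ValueError, which is the kind of mismatch the note warns about.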

default_state(*batch_shape, device=None)

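Return the initial recurrent state consumed by step(). In the upstream implementation this is an empty list, which step() fills with incoming inputs.

The attributes, parameters, and example that follow restate the DownPool class documentation: the module downsamples input sequences and applies a linear layer, and the output dimension can be specified either directly or through an expansion factor.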

d_input

The input dimension of the sequence.

Type: int

d_output

The output dimension of the sequence.

Type: int

stride

The downsampling factor for the input sequence.

Type: int

transposed

Whether inputs use the (batch, features, length) layout (True) or the (batch, length, features) layout (False).

Type: bool

linear

A linear layer for transforming the downsampled output.

Type: LinearActivation
Parameters:
- d_input (int) – The dimension of the input.
- d_output (int, optional) – The dimension of the output. If None, it is computed as d_input * expand.
- expand (int, optional) – Expansion factor for the output dimension. Must be None when d_output is given.
- stride (int) – The downsampling factor for the input sequence.
- transposed (bool) – Whether inputs use the (batch, features, length) layout.
- weight_norm (bool) – Whether to apply weight normalization.
- initializer (callable, optional) – A function to initialize weights.
- activation (callable, optional) – An activation function to apply after the linear transformation.
Returns: The downsampled output tensor and None.
Return type: Tuple[Tensor, None]
Raises: AssertionError – If d_output and expand are both None or both provided.

########### Examples

>>> down_pool = DownPool(d_input=64, expand=2, stride=2, transposed=False)
>>> x = torch.randn(32, 10, 64)  # (batch_size, sequence_length, d_input)
>>> output, _ = down_pool(x)
>>> output.shape
torch.Size([32, 5, 128])  # (batch_size, sequence_length // stride, d_output)

######## NOTE The output dimension is calculated as: d_output = d_input * expand if d_output is None.
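As a minimal sketch of this rule, the function below resolves the output width from exactly one of d_output or expand. The name `resolve_d_output` is hypothetical and exists only for illustration; it mirrors the documented constraint and AssertionError rather than reproducing the library's code.

```python
def resolve_d_output(d_input, d_output=None, expand=None):
    """Return the output width, given exactly one of d_output or expand.

    Hypothetical helper that mirrors the documented rule; not an espnet2 API.
    """
    # Exactly one of the two arguments may be set.
    assert (d_output is None) != (expand is None), "set exactly one of d_output or expand"
    return d_input * expand if d_output is None else d_output


print(resolve_d_output(64, expand=2))     # 128
print(resolve_d_output(64, d_output=32))  # 32
```

Passing both arguments, or neither, trips the assertion.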

forward(x)

Perform the forward pass of the DownPool layer.

This method downsamples the input tensor x by the configured stride. It folds each group of stride consecutive time steps into the feature dimension, using the layout selected by transposed, and passes the result through the linear activation layer.

Parameters: x (torch.Tensor) – Input tensor of shape (…, H, L) if transposed is True, or (…, L, H) otherwise, where H = d_input is the feature dimension and L is the sequence length.
Returns: The output tensor after downsampling and linear transformation, and None.
Return type: Tuple[torch.Tensor, None]

########### Examples

>>> import torch
>>> from espnet2.asr.state_spaces.pool import DownPool
>>> down_pool = DownPool(d_input=64, d_output=32, stride=2, transposed=False)
>>> x = torch.randn(10, 16, 64)  # Batch size of 10, sequence length of 16
>>> output, _ = down_pool(x)  # forward returns (output, None)
>>> output.shape
torch.Size([10, 8, 32])  # Output shape after downsampling

######## NOTE The sequence length of x must be divisible by stride, and the layout of x must match the transposed setting; otherwise the rearrangement or the linear layer fails at runtime.
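The rearrangement described for forward can be illustrated on a single sequence with plain lists. This is a sketch under assumptions: `fold_time_steps` is a hypothetical helper, it handles only the non-transposed (length, features) layout, and the feature-major ordering inside each folded vector is an assumption about the layout rather than something this page specifies.

```python
def fold_time_steps(sequence, stride):
    """Fold each group of `stride` consecutive steps into one wider step.

    `sequence` is a list of L feature vectors, each a list of H numbers.
    Returns L // stride vectors of H * stride numbers. Hypothetical helper.
    """
    if len(sequence) % stride != 0:
        raise ValueError("sequence length must be divisible by stride")
    folded = []
    for start in range(0, len(sequence), stride):
        group = sequence[start:start + stride]
        # Feature-major order: every step's value for feature 0, then feature 1, ...
        folded.append([step[h] for h in range(len(group[0])) for step in group])
    return folded


seq = [[1, 2], [3, 4], [5, 6], [7, 8]]  # L=4 steps, H=2 features
print(fold_time_steps(seq, stride=2))   # [[1, 3, 2, 4], [5, 7, 6, 8]]
```

The folded sequence has half as many steps and twice as many features per step, which is why the linear layer that follows takes d_input * stride inputs.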

step(x, state, **kwargs)

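Process a single time step in recurrent mode.

Rather than a whole sequence, step receives one input x of shape (…, H) together with the current state. In the upstream implementation, inputs are buffered in the state until stride of them have arrived; the layer then applies the linear transformation to the buffered inputs and returns the result with a reset state, and it returns None as the output for the steps in between. The attributes, parameters, and example that follow restate the DownPool class documentation, which applies in both transposed and non-transposed modes.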

d_input

The dimensionality of the input features.

Type: int

d_output

The dimensionality of the output features.

Type: int

stride

The stride used for downsampling.

Type: int

transposed

Whether inputs use the (batch, features, length) layout (True) or the (batch, length, features) layout (False).

Type: bool

linear

The linear activation layer used for transformation.

Type: LinearActivation
Parameters:
- d_input (int) – The number of input features.
- d_output (int, optional) – The number of output features. If not provided, it will be calculated based on expand.
- expand (int, optional) – Expansion factor for the output features. If provided, d_output must be None.
- stride (int, optional) – The stride for downsampling. Default is 1.
- transposed (bool, optional) – Indicates if the transformation should be transposed. Default is True.
- weight_norm (bool, optional) – Indicates whether to apply weight normalization. Default is True.
- initializer (callable, optional) – Function for weight initialization.
- activation (callable, optional) – Activation function to be used.
Returns: A tuple containing the transformed output and a state (None).
Return type: tuple

########### Examples

>>> down_pool = DownPool(d_input=64, d_output=32, stride=2, transposed=False)
>>> input_tensor = torch.randn(10, 20, 64)  # (batch_size, seq_len, features)
>>> output, _ = down_pool(input_tensor)
>>> output.shape
torch.Size([10, 10, 32])  # Output shape after downsampling

######## NOTE

The step method is intended for use in recurrent models where stateful processing is required.

Ensure that the input shape is compatible with the defined stride and expand parameters.

Raises: AssertionError – If d_output and expand are both provided or both None when the layer is constructed.
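To make the stateful use of step more concrete, the sketch below shows one common way to run a stride-based downsampler one input at a time: buffer inputs until stride of them are available, then emit a single output. The class `StridedStepBuffer` and its `project` argument are hypothetical stand-ins, and the buffering behaviour is an assumption about how such a layer works recurrently; check the espnet2 source before relying on it.

```python
class StridedStepBuffer:
    """Collect `stride` single-step inputs, then emit one combined output.

    Hypothetical stand-in for recurrent downsampling; `project` plays the
    role of the layer's linear transformation.
    """

    def __init__(self, stride, project):
        self.stride = stride
        self.project = project

    def default_state(self):
        return []  # nothing buffered yet

    def step(self, x, state):
        if x is None:
            return None, state  # nothing to consume at this step
        state = state + [x]
        if len(state) < self.stride:
            return None, state  # still waiting for more inputs
        return self.project(state), []  # emit and reset the buffer


buffer = StridedStepBuffer(stride=2, project=sum)
state = buffer.default_state()
for x in [1, 2, 3, 4]:
    y, state = buffer.step(x, state)
    print(x, "->", y)
```

With stride=2, every second call produces an output and the calls in between return None, so downstream code in a recurrent loop has to handle missing outputs.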