espnet2.asr.state_spaces.pool.DownLinearPool

class espnet2.asr.state_spaces.pool.DownLinearPool(d_input, stride=1, expand=1, transposed=True)

Bases: SequenceModule

Applies linear downsampling to input sequences with trainable parameters.

This module performs downsampling of the input sequences using a linear transformation. The downsampling is achieved by rearranging the input tensor, applying a linear layer, and optionally transposing the output.
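The rearrange-then-project idea can be illustrated without any framework. The following is a minimal sketch, not the module's implementation: it assumes the non-transposed layout (a sequence is a list of frames, each frame a list of features), it concatenates features frame by frame (the real module may order the merged features differently), and the helper names `group_frames` and `linear` as well as the weight values are hypothetical. Bias and activation are omitted.

```python
def group_frames(frames, stride):
    """Concatenate every `stride` consecutive frames into one wider frame."""
    if len(frames) % stride != 0:
        raise ValueError("sequence length must be divisible by stride")
    grouped = []
    for start in range(0, len(frames), stride):
        merged = []
        for frame in frames[start:start + stride]:
            merged.extend(frame)
        grouped.append(merged)
    return grouped


def linear(frames, weight):
    """Apply y = W x to every frame; `weight` is a list of rows (d_out x d_in)."""
    return [
        [sum(w * v for w, v in zip(row, frame)) for row in weight]
        for frame in frames
    ]


# Toy sizes: length 4, d_input 2, stride 2, expand 1.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
grouped = group_frames(frames, stride=2)  # 2 frames, each of width d_input * stride = 4

# Hypothetical weight with d_input * expand = 2 rows and d_input * stride = 4 columns.
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
pooled = linear(grouped, weight)  # 2 frames, each of width d_input * expand = 2
```

The sequence length shrinks by the stride while the linear map decides the output width, which is why the output feature size depends on expand rather than on stride.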

d_input

The dimensionality of the input feature vectors.

Type: int

stride

The factor by which to downsample the input sequences.

Type: int

expand

The factor by which to expand the output feature vectors.

Type: int

transposed

Indicates whether to apply transposed operations.

Type: bool

linear

A linear layer that transforms the input features.

Type: LinearActivation
Parameters:
- d_input (int) – The dimensionality of the input feature vectors.
- stride (int, optional) – The downsampling factor (default is 1).
- expand (int, optional) – The expansion factor for output features (default is 1).
- transposed (bool, optional) – Whether to use transposed operations (default is True).
Returns: The transformed output tensor after applying the linear transformation and downsampling.
Return type: Tensor
Raises:
- NotImplementedError – If stride or expand is greater than 1 in the step method.

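Examples

>>> import torch
>>> from espnet2.asr.state_spaces.pool import DownLinearPool
>>> # transposed=False: features are on the last axis, (batch_size, length, d_input)
>>> down_linear_pool = DownLinearPool(d_input=64, stride=2, expand=1, transposed=False)
>>> input_tensor = torch.randn(10, 8, 64)  # (batch_size, length, d_input)
>>> output_tensor = down_linear_pool(input_tensor)
>>> output_tensor.shape
torch.Size([10, 4, 64])  # Output shape after downsampling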

NOTE

The output dimensionality is computed as d_input * expand.

property d_output

Calculates the output dimension based on input and expand factors.

This property computes the output dimension for the DownLinearPool layer by multiplying the input dimension by the expand factor. It is useful for determining the shape of the output tensor after applying the pooling operation.

Returns: The calculated output dimension.
Return type: int

Examples

>>> down_linear_pool = DownLinearPool(d_input=64, expand=2)
>>> print(down_linear_pool.d_output)
128
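To make the shape arithmetic concrete, here is a small framework-free sketch that predicts the output shape from the input shape. The helper name `down_linear_pool_shape` is hypothetical, and the layout convention is an assumption: (..., d_input, length) when transposed is True and (..., length, d_input) otherwise. It also assumes the length must be divisible by stride.

```python
def down_linear_pool_shape(shape, d_input, stride=1, expand=1, transposed=True):
    """Predict the output shape under the assumed layout convention."""
    if len(shape) < 2:
        raise ValueError("expected at least a (length, features) pair of axes")
    batch = tuple(shape[:-2])
    a, b = shape[-2], shape[-1]
    # Assumption: transposed puts features before length on the last two axes.
    features, length = (a, b) if transposed else (b, a)
    if features != d_input:
        raise ValueError("feature axis does not match d_input")
    if length % stride != 0:
        raise ValueError("length must be divisible by stride")
    d_output = d_input * expand      # same formula as the d_output property
    new_length = length // stride    # downsampling along the sequence axis
    tail = (d_output, new_length) if transposed else (new_length, d_output)
    return batch + tail


shape_a = down_linear_pool_shape((10, 8, 64), d_input=64, stride=2, expand=2, transposed=False)
shape_b = down_linear_pool_shape((10, 64, 8), d_input=64, stride=2, expand=1, transposed=True)
```

Under these assumptions `shape_a` is `(10, 4, 128)` and `shape_b` is `(10, 64, 4)`.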

forward(x)

Applies a linear transformation with downsampling on input sequences.

This method downsamples the input and then applies the linear layer. The input is first reshaped according to stride (the transposed setting affects this reshaping), and the linear layer then produces d_input * expand output features.

The d_input, stride, expand, transposed, and linear attributes used here are set in the constructor and are described in the class documentation above.

Parameters:
- x (torch.Tensor) – The input sequence. The example below passes a tensor of shape (batch_size, sequence_length, d_input).
Returns: The transformed output after applying downsampling and the linear transformation.
Return type: torch.Tensor

Examples

>>> # transposed=False: features are on the last axis, (batch_size, sequence_length, d_input)
>>> down_linear_pool = DownLinearPool(d_input=64, stride=2, expand=2, transposed=False)
>>> input_tensor = torch.randn(10, 8, 64)  # (batch_size, sequence_length, d_input)
>>> output_tensor = down_linear_pool(input_tensor)
>>> output_tensor.shape
torch.Size([10, 4, 128])  # Downsampled length and expanded features

Raises:
- NotImplementedError – If stride or expand is greater than 1 in
- the step method. –

step(x, state, **kwargs)

Processes a single step of input together with the carried state.

This is the step-wise counterpart of forward, for processing a sequence one step at a time. The d_input, stride, expand, transposed, and linear attributes are set in the constructor and are described in the class documentation above.

Parameters:
- x (Tensor) – The input for the current step.
- state – The state carried over from the previous step.
- **kwargs – Additional keyword arguments.
Returns: The transformed output tensor after downsampling.
Return type: Tensor
Raises:
- NotImplementedError – If stride or expand is greater than 1.

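Examples

>>> # transposed=False: features are on the last axis, (batch_size, length, features)
>>> down_pool = DownLinearPool(d_input=64, stride=2, expand=1, transposed=False)
>>> x = torch.randn(10, 6, 64)  # (batch_size, length, features); length divisible by stride
>>> output = down_pool(x)
>>> print(output.shape)  # Expected (10, 3, 64): length 6 / stride 2, features 64 * expand 1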

NOTE

The input tensor should have at least three dimensions. With transposed=False, the last dimension corresponds to the feature dimension.
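Since step is documented to raise NotImplementedError when stride or expand is greater than 1, a caller may want to check the configuration before choosing between step-wise and full-sequence processing. The sketch below is only an illustration: the helper `can_use_step` is hypothetical, and the stand-in objects expose nothing but the documented `stride` and `expand` attributes.

```python
from types import SimpleNamespace


def can_use_step(pool):
    """Return True when `pool` is configured so that step() is documented to work."""
    return pool.stride == 1 and pool.expand == 1


# Stand-ins for DownLinearPool instances (assumption: only the attributes matter here).
streaming_ok = SimpleNamespace(stride=1, expand=1)
needs_forward = SimpleNamespace(stride=2, expand=1)

for name, pool in [("streaming_ok", streaming_ok), ("needs_forward", needs_forward)]:
    mode = "step" if can_use_step(pool) else "forward"
    print(f"{name}: use {mode}")
```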