espnet2.asr.state_spaces.pool.DownLinearPool
class espnet2.asr.state_spaces.pool.DownLinearPool(d_input, stride=1, expand=1, transposed=True)
Bases: SequenceModule
Applies linear downsampling to input sequences with trainable parameters.
This module downsamples a sequence in two steps: it folds every stride consecutive time steps into the feature dimension by rearranging the input tensor, then applies a linear layer to the folded features. The transposed flag selects which tensor layout the rearrangement and the linear layer operate on.
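The fold-then-project idea can be sketched in NumPy for a (batch, length, d_input) layout. This is an illustrative sketch under stated assumptions, not the ESPnet implementation: the function name `fold_and_project`, the random weight matrix, and the missing bias term are all made up for the example, and the ordering of features inside the folded axis may differ from the library's.

```python
import numpy as np


def fold_and_project(x, weight, stride):
    """Fold `stride` consecutive steps into the feature axis, then project.

    x:      array of shape (batch, length, d_input)
    weight: array of shape (d_input * stride, d_out); hypothetical stand-in
            for the module's linear layer (no bias, no activation)
    """
    batch, length, d_input = x.shape
    if length % stride != 0:
        raise ValueError("sequence length must be divisible by stride")
    # (batch, length, d_input) -> (batch, length // stride, stride * d_input)
    folded = x.reshape(batch, length // stride, stride * d_input)
    # The same linear map is applied at every downsampled step
    return folded @ weight


rng = np.random.default_rng(0)
d_input, stride, expand = 64, 2, 1
x = rng.standard_normal((10, 8, d_input))
weight = rng.standard_normal((d_input * stride, d_input * expand))
y = fold_and_project(x, weight, stride)
print(y.shape)  # (10, 4, 64)
```

The sequence length shrinks by a factor of stride while the linear map decides the output width, which is why the two factors can be chosen independently.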
d_input
The dimensionality of the input feature vectors.
- Type: int
stride
The factor by which to downsample the input sequences.
- Type: int
expand
The factor by which to expand the output feature vectors.
- Type: int
transposed
Whether the feature dimension comes before the sequence dimension, i.e. input of shape (batch, d_input, length) instead of (batch, length, d_input).
- Type: bool
linear
A linear layer that transforms the input features.
- Type: LinearActivation
Parameters:
- d_input (int) – The dimensionality of the input feature vectors.
- stride (int, optional) – The downsampling factor (default is 1).
- expand (int, optional) – The expansion factor for output features (default is 1).
- transposed (bool, optional) – Whether to use transposed operations (default is True).
Returns: The transformed output tensor after applying the linear transformation and downsampling.
Return type: Tensor
Raises:
- NotImplementedError – If stride or expand is greater than 1 when the step method is called.
########### Examples
>>> import torch
>>> from espnet2.asr.state_spaces.pool import DownLinearPool
>>> down_linear_pool = DownLinearPool(d_input=64, stride=2, expand=1, transposed=False)
>>> input_tensor = torch.randn(10, 8, 64)  # (batch_size, length, d_input)
>>> output_tensor = down_linear_pool(input_tensor)
>>> output_tensor.shape  # length halved by stride=2
torch.Size([10, 4, 64])
NOTE
The output dimensionality is computed as d_input * expand.
property d_output
Calculates the output dimension based on input and expand factors.
This property computes the output dimension for the DownLinearPool layer by multiplying the input dimension by the expand factor. It is useful for determining the shape of the output tensor after applying the pooling operation.
- Returns: The calculated output dimension.
- Return type: int
########### Examples
>>> down_linear_pool = DownLinearPool(d_input=64, expand=2)
>>> print(down_linear_pool.d_output)
128
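The same arithmetic extends to the full output shape. The helper below is a hypothetical convenience function, not part of ESPnet. It assumes that with transposed=True the feature axis precedes the sequence axis, i.e. (batch, d_input, length), and that the sequence length must be divisible by stride.

```python
def pooled_shape(input_shape, d_input, stride=1, expand=1, transposed=True):
    """Predict the output shape of a linear downsampling pool (illustrative only)."""
    *batch, a, b = input_shape
    # Assumed layouts: (..., d_input, length) if transposed else (..., length, d_input)
    features, length = (a, b) if transposed else (b, a)
    if features != d_input:
        raise ValueError(f"expected {d_input} features, got {features}")
    if length % stride != 0:
        raise ValueError("sequence length must be divisible by stride")
    d_output = d_input * expand    # mirrors the d_output property
    new_length = length // stride  # stride steps are merged into one
    tail = (d_output, new_length) if transposed else (new_length, d_output)
    return (*batch, *tail)


print(pooled_shape((10, 8, 64), d_input=64, stride=2, expand=2, transposed=False))
# (10, 4, 128)
print(pooled_shape((10, 64, 8), d_input=64, stride=2, expand=2, transposed=True))
# (10, 128, 4)
```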
forward(x)
Applies a linear transformation with downsampling on input sequences.
Each group of stride consecutive time steps is folded into the feature dimension, and the linear layer is then applied to the folded features. As a result, the sequence length is divided by stride and the feature dimension becomes d_input * expand.
d_input
The dimensionality of the input features.
- Type: int
stride
The downsampling factor along the sequence length.
- Type: int
expand
The factor by which to expand the output features.
- Type: int
transposed
Indicates whether to apply the transformation in transposed mode (affecting the input reshaping).
- Type: bool
linear
A linear layer that applies the transformation after downsampling.
- Type: LinearActivation
Parameters:
- x (torch.Tensor) – Input sequence whose length is divisible by stride. The layout is (batch_size, length, d_input) when transposed is False; the feature and sequence dimensions are swapped when transposed is True.
Returns: The transformed output after applying downsampling and the linear transformation.
Return type: torch.Tensor
########### Examples
>>> down_linear_pool = DownLinearPool(d_input=64, stride=2, expand=2, transposed=False)
>>> input_tensor = torch.randn(10, 8, 64)  # (batch_size, sequence_length, d_input)
>>> output_tensor = down_linear_pool(input_tensor)
>>> output_tensor.shape  # downsampled length and expanded features
torch.Size([10, 4, 128])
Raises:
- NotImplementedError – If stride or expand is greater than 1 in the step method.
step(x, state, **kwargs)
Processes a single time step of the sequence.
This class combines a linear layer with a stride and an expansion factor, but single-step mode is only available when both stride and expand are 1; any larger value raises NotImplementedError.
d_input
The dimensionality of the input features.
- Type: int
stride
The stride factor for downsampling.
- Type: int
expand
The expansion factor for the output feature dimension.
- Type: int
transposed
Indicates whether to apply the operation in a transposed manner.
- Type: bool
linear
The linear transformation layer.
- Type: LinearActivation
Parameters:
- d_input (int) – The input feature dimension.
- stride (int, optional) – The downsampling stride. Default is 1.
- expand (int, optional) – The expansion factor for the output feature dimension. Default is 1.
- transposed (bool, optional) – If True, the operation is transposed. Default is True.
Returns: The transformed output tensor after downsampling.
Return type: Tensor
Raises:
- NotImplementedError – If stride or expand is greater than 1 in the step method.
########### Examples
>>> down_pool = DownLinearPool(d_input=64, stride=2, expand=1)
>>> x = torch.randn(10, 64)  # one time step; the shape is illustrative
>>> down_pool.step(x, state=None)  # stride > 1, so step is unsupported
Traceback (most recent call last):
    ...
NotImplementedError
NOTE
The input tensor should have at least three dimensions, where the last dimension corresponds to the feature dimension.
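The restriction on step can be illustrated with a toy stand-in. The class below is hypothetical and does not reproduce the ESPnet code. In particular, returning the input and state unchanged when stride and expand are both 1 is an assumption made for this sketch, since the documentation above only specifies the error case.

```python
class ToyPool:
    """Hypothetical stand-in that mimics only the documented step() restriction."""

    def __init__(self, d_input, stride=1, expand=1):
        self.d_input = d_input
        self.stride = stride
        self.expand = expand

    def step(self, x, state, **kwargs):
        # Documented behavior: single-step mode is unavailable when
        # the layer downsamples (stride > 1) or expands (expand > 1)
        if self.stride > 1 or self.expand > 1:
            raise NotImplementedError("step() requires stride == 1 and expand == 1")
        # Assumption for this sketch: pass input and state through unchanged
        return x, state


try:
    ToyPool(d_input=2, stride=2).step(x=[0.0, 0.0], state=None)
except NotImplementedError as err:
    print("not supported:", err)

print(ToyPool(d_input=2).step(x=[0.0, 0.0], state=None))
```

Code that needs step-by-step inference with a pooling layer would therefore have to check stride and expand up front, or catch NotImplementedError and fall back to processing whole sequences with forward.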