espnet2.asr.state_spaces.pool.UpPool
class espnet2.asr.state_spaces.pool.UpPool(d_input, d_output, stride, transposed=True, weight_norm=True, initializer=None, activation=None)
Bases: SequenceModule
Upsampling layer that applies a linear transformation followed by reshaping.
This class implements an upsampling operation that uses a linear layer to transform the input tensor, followed by reshaping the output tensor based on the specified stride and transposition settings. The upsampling process can be configured to include a skip connection, allowing for the addition of previous activations.
d_input
The number of input features.
- Type: int
_d_output
The number of output features after upsampling.
- Type: int
stride
The factor by which the input is upsampled.
- Type: int
transposed
Indicates if the transformation should be transposed.
- Type: bool
linear
A linear activation layer for feature transformation.
Type: LinearActivation
Parameters:
- d_input (int) – The number of input features.
- d_output (int) – The number of output features after upsampling.
- stride (int) – The upsampling factor.
- transposed (bool) – Whether to apply the transformation in transposed mode.
- weight_norm (bool) – If True, applies weight normalization to the linear layer.
- initializer (callable) – Function to initialize the weights of the linear layer.
- activation (callable) – Activation function to apply after the linear layer.
Returns: The upsampled tensor and None (for compatibility).
Return type: Tuple[Tensor, None]
############# Examples
>>> up_pool = UpPool(d_input=16, d_output=32, stride=2, transposed=False)
>>> x = torch.randn(10, 5, 16)  # Batch of 10, sequence length 5, 16 features
>>> output, _ = up_pool(x)
>>> output.shape
torch.Size([10, 10, 32])  # Sequence length doubled by stride=2
######## NOTE The upsampling operation shifts the tensor to ensure causality during the transformation, which is particularly important in sequence models.
- Raises: AssertionError – If the state is empty when calling the step method.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
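A minimal usage sketch of the skip connection mentioned above, assuming the default transposed=True layout where tensors are shaped (batch, features, length); the shapes shown are illustrative:

import torch
from espnet2.asr.state_spaces.pool import UpPool

up_pool = UpPool(d_input=16, d_output=32, stride=2)  # transposed=True by default

x = torch.randn(10, 16, 5)      # (batch, d_input, length)
skip = torch.randn(10, 32, 10)  # must match the upsampled output shape
y, _ = up_pool(x, skip=skip)    # skip is added to the upsampled output
print(y.shape)                  # torch.Size([10, 32, 10]): length 5 * stride 2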
property d_output
The output dimension of the layer.
This property returns the output feature dimension configured at initialization, i.e. the d_output argument (stored internally as _d_output).
- Returns: The output feature dimension of the layer.
- Return type: int
############# Examples
>>> up_pool = UpPool(d_input=64, d_output=128, stride=2)
>>> up_pool.d_output
128
- Type: int
default_state(*batch_shape, device=None)
Create the default state for the UpPool module.
This method initializes the state used for recurrent decoding with the step method. The state is a list of stride tensors of zeros, each with the batch shape followed by the output dimension, holding the upsampled output frames that successive step calls will emit.
- Parameters:
- *batch_shape – Variable length argument list that defines the shape of the input batch. This is typically the shape of the input data excluding the sequence length and feature dimensions.
- device (torch.device , optional) – The device on which to create the state tensors. If None, the tensors are created on the default device.
- Returns: A list of stride tensors initialized to zeros, representing the default state of the UpPool module. Each tensor has the batch shape followed by the output dimension of the layer.
- Return type: list
############# Examples
>>> up_pool = UpPool(d_input=128, d_output=256, stride=2)
>>> state = up_pool.default_state(32, device='cuda')
>>> len(state)
2
>>> state[0].shape
torch.Size([32, 256]) # Example output shape based on d_output
######## NOTE The state is designed to be used in conjunction with the step method, which processes one time step at a time in a recurrent manner.
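A short sketch of how default_state pairs with step, assuming the stride-sized buffer behavior described above; tensor shapes are illustrative:

import torch
from espnet2.asr.state_spaces.pool import UpPool

up_pool = UpPool(d_input=128, d_output=256, stride=2)
state = up_pool.default_state(32)    # list of stride zero tensors of shape (32, 256)

# While buffered frames remain, step is called with x=None ...
y, state = up_pool.step(None, state)
# ... and once the buffer is empty, a new input frame refills it.
y, state = up_pool.step(torch.randn(32, 128), state)
print(y.shape, len(state))           # torch.Size([32, 256]) 2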
forward(x, skip=None)
Upsample an input sequence.
The input is passed through the linear layer, shifted by one position along the sequence dimension to preserve causality, and reshaped so that the sequence length grows by a factor of stride. If a skip tensor is provided, it is added to the upsampled output.
Parameters:
- x (Tensor) – Input tensor of shape (batch, d_input, length) when transposed=True, or (batch, length, d_input) when transposed=False.
- skip (Tensor , optional) – Skip-connection tensor added to the upsampled output. Must match the output shape. Defaults to None.
Returns: The upsampled tensor and a None placeholder for the state.
Return type: tuple
############# Examples
>>> upsample_layer = UpPool(d_input=64, d_output=128, stride=2, transposed=False)
>>> input_tensor = torch.randn(10, 20, 64)  # (batch, length, feature)
>>> output_tensor, _ = upsample_layer(input_tensor)
>>> output_tensor.shape
torch.Size([10, 40, 128])  # Upsampled output shape (length 20 * stride 2)
######## NOTE The output length equals the input length multiplied by stride, and the feature dimension becomes d_output. A one-position shift is applied before reshaping so that the transformation remains causal.
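The two input layouts supported by forward can be contrasted with a small sketch; the shapes are illustrative and assume the reshaping behavior described above:

import torch
from espnet2.asr.state_spaces.pool import UpPool

# transposed=True (default): channels-first input of shape (batch, d_input, length)
up_t = UpPool(d_input=64, d_output=128, stride=2, transposed=True)
y_t, _ = up_t(torch.randn(10, 64, 20))
print(y_t.shape)  # torch.Size([10, 128, 40])

# transposed=False: channels-last input of shape (batch, length, d_input)
up_f = UpPool(d_input=64, d_output=128, stride=2, transposed=False)
y_f, _ = up_f(torch.randn(10, 20, 64))
print(y_f.shape)  # torch.Size([10, 40, 128])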
step(x, state, **kwargs)
Process one time step of the upsampled sequence.
The method pops the next buffered output frame from the state. When the buffer becomes empty after popping, a new input frame x must be provided; it is transformed by the linear layer and reshaped into stride output frames that refill the state. While buffered frames remain, x is expected to be None.
Parameters:
- x (Tensor or None) – Input frame of shape (..., d_input), required only when the buffer needs to be refilled; otherwise None.
- state (list) – List of buffered output frames, each of shape (..., d_output), as produced by default_state or a previous call to step.
- **kwargs – Additional keyword arguments (unused).
Returns: A tuple (y, state) containing the next output frame of shape (..., d_output) and the updated state list.
Return type: tuple
############# Examples
>>> up_pool = UpPool(d_input=64, d_output=128, stride=2)
>>> state = up_pool.default_state(32)  # Batch of 32
>>> y, state = up_pool.step(None, state)  # A buffered frame remains, so x is None
>>> y.shape
torch.Size([32, 128])
>>> y, state = up_pool.step(torch.randn(32, 64), state)  # Buffer empty: feed a new frame
>>> len(state)
2
######## NOTE The input handling follows the transposed setting of the layer. step is intended for recurrent, one-step-at-a-time decoding and should be used together with default_state.
- Raises: AssertionError – If the state is empty, if x is None when the buffer needs to be refilled, or if x is provided while buffered frames remain.
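As an illustration of the stride-sized buffering, a sketch of a recurrent decoding loop over several input frames; the interleaving of None and input frames is an assumption based on the buffer behavior described above, and the shapes are illustrative:

import torch
from espnet2.asr.state_spaces.pool import UpPool

up_pool = UpPool(d_input=64, d_output=128, stride=2)
state = up_pool.default_state(32)         # stride buffered zero frames

outputs = []
for frame in torch.randn(8, 32, 64):      # 8 input frames for a batch of 32
    # Drain the buffer down to its last frame with x=None ...
    for _ in range(up_pool.stride - 1):
        y, state = up_pool.step(None, state)
        outputs.append(y)
    # ... then feed the next input frame, which refills the buffer.
    y, state = up_pool.step(frame, state)
    outputs.append(y)

print(len(outputs), outputs[0].shape)     # 16 torch.Size([32, 128])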