espnet2.asr.state_spaces.model.SequenceModel

class espnet2.asr.state_spaces.model.SequenceModel(d_model, n_layers=1, transposed=False, dropout=0.0, tie_dropout=False, prenorm=True, n_repeat=1, layer=None, residual=None, norm=None, pool=None, track_norms=True, dropinp=0.0, drop_path=0.0)

Bases: SequenceModule

Isotropic deep sequence model backbone, inspired by ResNets and Transformers.

The SequenceModel class implements a generic transformation from (batch, length, d_input) to (batch, length, d_output), where d_input equals d_model. It stacks residual blocks built from the `layer` configuration; depth, normalization, dropout, and pooling are controlled by the constructor parameters.
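To make the ResNet/Transformer-style residual pattern concrete, here is a minimal plain-Python sketch of one residual block in the pre-norm and post-norm arrangements selected by the `prenorm` parameter. It is not ESPnet's implementation: vectors are Python lists, `layer_norm` has no learnable parameters, `toy_layer` is a hypothetical stand-in for a real sequence layer, and dropout and drop-path are left out.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a feature vector to zero mean and unit variance (no affine params)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(x: Vector, layer: Callable[[Vector], Vector], prenorm: bool) -> Vector:
    """One residual block in the pre-norm or post-norm arrangement."""
    if prenorm:
        # Pre-norm: normalize the layer input, then add the raw input back.
        y = layer(layer_norm(x))
        return [a + b for a, b in zip(x, y)]
    # Post-norm: add the residual first, then normalize the sum.
    y = layer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def toy_layer(x: Vector) -> Vector:
    """Hypothetical stand-in for a sequence layer: scales every feature by 0.5."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 3.0, 4.0]
pre = residual_block(x, toy_layer, prenorm=True)
post = residual_block(x, toy_layer, prenorm=False)
```

In the pre-norm variant the raw input reaches the output unchanged through the skip path, which is the usual reason to prefer it in deep stacks; in the post-norm variant every block output is normalized.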

d_model

Dimensionality of the input features.

Type: int

transposed

If True, the layers operate on (batch, d_model, length) tensors; forward still takes and returns (batch, length, dim) tensors.

Type: bool

track_norms

If True, records the mean squared value of the input and of each layer's output in the `metrics` attribute during the forward pass.

Type: bool

drop

Dropout layer applied to the inputs.

Type: nn.Module

layers

List of sequential residual blocks.

Type: nn.ModuleList

norm

Normalization layer applied at the end, if specified.

Type: nn.Module

d_output

Dimensionality of the output features.

Type: int
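The `transposed` flag concerns the axis order the layers see. Below is a minimal sketch of that layout change using nested lists and a hypothetical `swap_last_two` helper in place of a tensor transpose. It assumes, in line with the forward description, that callers still pass (batch, length, dim) data and that the model converts to (batch, dim, length) internally and back afterwards.

```python
from typing import List

# One example is a nested list; a batch is a list of examples.
Example = List[List[float]]


def swap_last_two(batch: List[Example]) -> List[Example]:
    """Swap the last two axes of every example in the batch.

    (batch, length, dim) -> (batch, dim, length); applying it again undoes it.
    """
    return [[list(col) for col in zip(*example)] for example in batch]


# One example with length 2 and dim 3.
batch = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
channels_first = swap_last_two(batch)  # [[[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]]
restored = swap_last_two(channels_first)  # back to (batch, length, dim)
```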
Parameters:
- d_model (int) – The dimensionality of the model input.
- n_layers (int) – Number of layers (repetitions of the `layer` configuration) in the model. Default is 1.
- transposed (bool) – If True, layers receive inputs in (batch, dim, length) layout. Default is False.
- dropout (float) – Dropout rate applied on each residual connection, and inside each layer whose configuration does not set its own rate. Default is 0.0.
- tie_dropout (bool) – If True, ties the dropout mask across the sequence, like nn.Dropout1d/nn.Dropout2d. Default is False.
- prenorm (bool) – If True, applies normalization before each layer (pre-norm); otherwise after the residual addition (post-norm). Default is True.
- n_repeat (int) – Number of times each layer is repeated before pooling is applied. Default is 1.
- layer (dict or list) – Configuration for a single layer, or a list of configurations applied in order. Must be specified.
- residual (dict) – Configuration for the residual connections. Default is None.
- norm (dict or str) – Normalization configuration (e.g. ‘layer’, ‘batch’). Default is None.
- pool (dict) – Configuration for the pooling layer applied at the end of each stage. Default is None.
- track_norms (bool) – If True, tracks and logs the norms of each layer output. Default is True.
- dropinp (float) – Dropout rate applied to the inputs. Default is 0.0.
- drop_path (float) – Stochastic depth (drop-path) rate for each residual path. Default is 0.0.
Returns: A tuple containing:
- outputs (torch.Tensor): The output tensor of shape (batch, length, d_output).
- next_states (list): The updated per-layer states after processing.
Return type: tuple
Raises: ValueError – If the layer configuration is invalid.
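One way to read how `layer`, `n_layers`, `n_repeat`, and `pool` combine is sketched below. It assumes the layer list is tiled `n_layers * n_repeat` times, one residual block per entry, and that the pooling config is attached to every `n_repeat`-th block. `build_schedule` and the `{"stride": 2}` pool config are hypothetical; the block only computes the schedule and builds no modules.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Config = Dict[str, Any]


def build_schedule(
    layer: Union[Config, List[Config]],
    n_layers: int = 1,
    n_repeat: int = 1,
    pool: Optional[Config] = None,
) -> List[Tuple[Config, Optional[Config]]]:
    """Expand layer configs into (layer_cfg, pool_cfg) pairs, one per residual block."""
    layers = layer if isinstance(layer, list) else [layer]
    # Assumption: the whole layer list is tiled n_layers * n_repeat times.
    expanded = layers * n_layers * n_repeat
    schedule = []
    for i, cfg in enumerate(expanded):
        # Assumption: pooling is attached to every n_repeat-th block only.
        pool_cfg = pool if (i + 1) % n_repeat == 0 else None
        schedule.append((cfg, pool_cfg))
    return schedule


sched = build_schedule(
    [{"_name_": "s4"}, {"_name_": "ff"}], n_layers=2, n_repeat=2, pool={"stride": 2}
)
pooled = [p is not None for _, p in sched]
```

Under this reading, a two-entry layer list with `n_repeat=2` yields eight blocks with pooling after every second block, so the pooling positions depend on the length of the `layer` list as well as on `n_repeat`.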

############# Examples

>>> import torch
>>> from espnet2.asr.state_spaces.model import SequenceModel
>>> model = SequenceModel(d_model=128, n_layers=4, layer=[{"_name_": "ff"}])  # "_name_" selects a registered layer type
>>> inputs = torch.randn(32, 10, 128)  # (batch, length, d_model)
>>> outputs, states = model(inputs)
>>> outputs.shape[-1] == model.d_output  # d_output depends on the layer configuration
True

######### NOTE This model can be used for various sequence modeling tasks such as automatic speech recognition (ASR) and other sequence-based applications.
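The `tie_dropout` option in the parameter list is compared to nn.Dropout1d/nn.Dropout2d: instead of dropping individual (time, feature) entries, a whole feature channel is dropped for the entire sequence. A pure-Python sketch of that idea for a single (length, dim) example follows; `tied_dropout` is a hypothetical helper, not ESPnet's implementation.

```python
import random
from typing import List


def tied_dropout(x: List[List[float]], p: float, rng: random.Random) -> List[List[float]]:
    """Dropout on one (length, dim) example with one mask per feature, shared over time."""
    if p == 0.0:
        return [row[:] for row in x]
    dim = len(x[0])
    scale = 1.0 / (1.0 - p)  # inverted dropout keeps the expected value unchanged
    # Draw a single keep/drop decision per feature channel.
    mask = [0.0 if rng.random() < p else scale for _ in range(dim)]
    # Reuse the same mask at every time step.
    return [[v * m for v, m in zip(row, mask)] for row in x]


x = [[1.0] * 8 for _ in range(5)]  # length 5, dim 8
y = tied_dropout(x, 0.5, random.Random(0))
```

Each column of `y` is either all zeros or all scaled by 2.0, whereas untied dropout would zero entries independently at each time step.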


property d_state

Return the dimension of the tensor produced by `state_to_tensor`, i.e. the combined state size of all layers.
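A minimal sketch of how such a total could be computed, assuming each layer reports its own state size and that stateless layers report None and are skipped; `total_state_dim` is a hypothetical helper.

```python
from typing import List, Optional


def total_state_dim(layer_state_dims: List[Optional[int]]) -> int:
    """Sum per-layer state sizes, ignoring layers that report None (stateless)."""
    return sum(d for d in layer_state_dims if d is not None)


combined = total_state_dim([64, None, 64])  # two stateful layers around a stateless one
```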

default_state(*batch_shape, device=None)

Generate the default state for each layer in the sequence model.

This method creates an initial state for each layer based on the specified batch shape and device. The default state can be used as a starting point for processing inputs through the model.

Parameters:
- *batch_shape – One or more integers giving the batch dimensions of the state (e.g. 32 for a batch of 32 sequences). The sequence length and feature dimensions are not included.
- device (torch.device , optional) – The device on which to create the state tensors. If not specified, the default device will be used.
Returns: A list with one default state per layer. Each state's shape is determined by the layer's internal state configuration and the given batch shape; layers without a recurrent state may return None.
Return type: list

############# Examples

>>> model = SequenceModel(d_model=128, n_layers=3, layer=[{"_name_": "s4"}])
>>> default_states = model.default_state(10)  # batch size of 10
>>> len(default_states) == len(model.layers)  # one state per layer
True

######### NOTE The shapes of the returned state tensors depend on the individual layer configurations and the specified batch shape. Each layer may have a different state shape based on its design.

Raises: ValueError – If the batch shape is invalid or incompatible with the model’s architecture.
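The per-layer structure of the returned list can be sketched with two hypothetical layer classes: a recurrent one that needs a zero state per batch element and a stateless one that returns None. This mirrors the description above under those assumptions; it does not reproduce ESPnet's state shapes.

```python
from typing import Any, List, Optional


class CounterLayer:
    """Hypothetical recurrent layer whose state is one running value per batch element."""

    def default_state(self, *batch_shape: int, device: Optional[str] = None) -> List[float]:
        # Only the batch dimensions are used; device is ignored in this sketch.
        size = 1
        for d in batch_shape:
            size *= d
        return [0.0] * size


class FeedForwardLayer:
    """Hypothetical stateless layer: it has no state to initialize."""

    def default_state(self, *batch_shape: int, device: Optional[str] = None) -> None:
        return None


def default_state(layers: List[Any], *batch_shape: int, device: Optional[str] = None) -> List[Any]:
    """One default state per layer, in layer order."""
    return [layer.default_state(*batch_shape, device=device) for layer in layers]


states = default_state([CounterLayer(), FeedForwardLayer(), CounterLayer()], 4)
```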

forward(inputs, *args, state=None, **kwargs)

Forward pass for the SequenceModel, which processes the input tensor through the defined layers and applies normalization if specified.

This method assumes that the input tensor is shaped as (batch, sequence, dim) and applies dropout, layers, and normalization sequentially.

Parameters:
- inputs (torch.Tensor) – The input tensor of shape (batch, sequence, dim).
- *args – Additional positional arguments passed to each layer.
- state (list , optional) – A list of previous states for each layer. If None, initializes to a list of None.
- **kwargs – Additional keyword arguments passed to each layer.
Returns: A tuple containing:
- outputs (torch.Tensor): The output tensor after processing, shaped as (batch, sequence, d_output).
- next_states (list): A list of states for each layer after processing.
Return type: tuple
Raises: ValueError – If the input tensor does not match the expected shape.

############# Examples

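>>> import torch
>>> from espnet2.asr.state_spaces.model import SequenceModel
>>> model = SequenceModel(d_model=128, n_layers=3, layer=[{"_name_": "ff"}])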
>>> inputs = torch.randn(32, 10, 128)  # (batch, sequence, dim)
>>> outputs, states = model(inputs)

######### NOTE The method tracks the norms of outputs at each layer if track_norms is set to True, which can be accessed via the metrics attribute after the forward pass.
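The state threading and norm tracking described for forward can be sketched in plain Python. Assumptions: each layer is a hypothetical callable mapping (input, previous state) to (output, next state), the tracked "norm" is the mean squared activation, the `norm/{i}` metric names are illustrative, and the final normalization and dropout are omitted.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Vector = List[float]
# A layer maps (input, previous state) -> (output, next state).
Layer = Callable[[Vector, Any], Tuple[Vector, Any]]


def mean_square(x: Vector) -> float:
    return sum(v * v for v in x) / len(x)


def sequence_forward(
    layers: List[Layer], x: Vector, state: Optional[List[Any]] = None, track_norms: bool = True
) -> Tuple[Vector, List[Any], Dict[str, float]]:
    # A missing state list means every layer starts from None.
    prev_states = [None] * len(layers) if state is None else state
    norms = [mean_square(x)] if track_norms else []
    next_states = []
    for layer, prev in zip(layers, prev_states):
        x, new_state = layer(x, prev)
        next_states.append(new_state)
        if track_norms:
            norms.append(mean_square(x))
    metrics = {f"norm/{i}": v for i, v in enumerate(norms)}
    return x, next_states, metrics


def doubling_layer(x: Vector, prev: Any) -> Tuple[Vector, int]:
    """Hypothetical layer: doubles its input and counts how often it has run."""
    count = 0 if prev is None else prev
    return [2 * v for v in x], count + 1


out, states, metrics = sequence_forward([doubling_layer, doubling_layer], [1.0, -1.0])
out2, states2, _ = sequence_forward([doubling_layer, doubling_layer], [1.0, -1.0], state=states)
```

Passing the returned states back in is what lets consecutive calls continue where the previous one stopped.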

property state_to_tensor

Return a function that converts a list of per-layer states into a single tensor.

The returned function calls each layer’s state_to_tensor on the corresponding state and concatenates the resulting tensors along the last dimension, producing one tensor that represents the entire state of the model.

Parameters: state (list) – A list of states, one for each layer in the model (the argument of the returned function).
Returns: A tensor containing the concatenated states of all layers.
Return type: torch.Tensor

############# Examples

>>> model = SequenceModel(d_model=128, n_layers=2, layer=[{"_name_": "s4"}])
>>> states = model.default_state(10, device="cpu")  # batch size of 10
>>> tensor_representation = model.state_to_tensor(states)
>>> print(tensor_representation.shape)  # last dimension: concatenated per-layer states (see d_state)

######### NOTE This method assumes that each layer’s state_to_tensor method is implemented and returns a tensor. If any layer returns None, it will be excluded from the concatenation.
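Because `state_to_tensor` is a property that returns a function, it can be sketched as a factory. In this plain-Python version each layer is represented by a hypothetical converter that returns a flat list or None, the batch dimension is omitted, and list concatenation stands in for concatenation along the last tensor dimension.

```python
from typing import Any, Callable, List, Optional

Converter = Callable[[Any], Optional[List[float]]]


def make_state_to_tensor(converters: List[Converter]) -> Callable[[List[Any]], List[float]]:
    """Return a function that flattens per-layer states into one vector."""

    def fn(states: List[Any]) -> List[float]:
        parts = [conv(s) for conv, s in zip(converters, states)]
        # Stateless layers yield None and are excluded.
        parts = [p for p in parts if p is not None]
        # Concatenate along the (only) feature axis.
        return [v for p in parts for v in p]

    return fn


to_tensor = make_state_to_tensor([lambda s: s, lambda s: None, lambda s: s])
flat = to_tensor([[1.0, 2.0], None, [3.0]])
```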

step(x, state, **kwargs)

Processes a single time step of input through the model layers.

This method applies each layer of the sequence model to the input tensor x and updates the hidden states. It is typically used in scenarios where the model needs to process input sequentially, such as in recurrent architectures or during inference.

Parameters:
- x (torch.Tensor) – Input tensor of shape (batch_size, d_input).
- state (list) – List of previous states for each layer, or None if no state is to be used.
- **kwargs – Additional keyword arguments passed to each layer’s step method.
Returns: A tuple containing:
- torch.Tensor: The output tensor after processing through all layers, of shape (batch_size, d_output).
- list: Updated list of states for each layer.
Return type: tuple

############# Examples

>>> import torch
>>> model = SequenceModel(d_model=64, n_layers=2, layer=[{"_name_": "ff"}], norm="layer")
>>> input_tensor = torch.randn(32, 64)  # one time step: (batch_size, d_input)
>>> initial_state = model.default_state(32)  # default state for a batch of 32
>>> output, next_state = model.step(input_tensor, initial_state)

######### NOTE This method is designed to work with the assumption that the input x is formatted as (batch_size, d_input).
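The sketch below illustrates why step is useful at inference time: for recurrent layers, feeding one time step at a time while carrying each layer's state reproduces the full-sequence result. `EMALayer` is a hypothetical stand-in for a state-space layer (its `forward` uses a convolutional form, its `step` the recurrent form), features are scalars, and the final normalization is omitted.

```python
from typing import List, Optional, Tuple


class EMALayer:
    """Hypothetical recurrent layer: h_t = decay * h_{t-1} + (1 - decay) * x_t."""

    def __init__(self, decay: float) -> None:
        self.decay = decay

    def step(self, x: float, state: Optional[float]) -> Tuple[float, float]:
        h = 0.0 if state is None else state
        h = self.decay * h + (1.0 - self.decay) * x
        return h, h  # output and next state coincide for this layer

    def forward(self, xs: List[float]) -> List[float]:
        # Convolutional view of the same recurrence: y_t = sum_k (1 - d) * d**k * x_{t-k}.
        d = self.decay
        return [
            sum((1.0 - d) * d ** k * xs[t - k] for k in range(t + 1))
            for t in range(len(xs))
        ]


def model_step(layers: List[EMALayer], x: float, states: Optional[List[Optional[float]]]):
    """Push one time step through every layer, threading per-layer states."""
    prev_states = [None] * len(layers) if states is None else states
    next_states = []
    for layer, prev in zip(layers, prev_states):
        x, new_state = layer.step(x, prev)
        next_states.append(new_state)
    return x, next_states


layers = [EMALayer(0.5), EMALayer(0.9)]
xs = [1.0, 0.0, 2.0, -1.0]

full = xs
for layer in layers:
    full = layer.forward(full)  # whole sequence, layer by layer

stepped, states = [], None
for x in xs:
    y, states = model_step(layers, x, states)  # one time step at a time
    stepped.append(y)
```

The two paths agree up to floating-point rounding, which is the property that lets a model trained on full sequences be decoded step by step.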