espnet2.asr.encoder.linear_encoder.LinearEncoder


class espnet2.asr.encoder.linear_encoder.LinearEncoder(input_size: int, output_size: int = 256, dropout_rate: float = 0.1, input_layer: str | None = 'conv2d', normalize_before: bool = True, padding_idx: int = -1)

Bases: AbsEncoder

Linear encoder module for processing input features.

This class implements a linear encoder whose input layer, selected by input_layer, can be a linear projection, a convolutional subsampling layer, or an embedding layer. It applies that input layer to the input tensor and, if normalize_before is True, applies layer normalization to the result before it is passed to subsequent layers in the network.

_output_size

The dimension of the output features.

Type: int

embed

The embedding layer, which can be a linear layer, convolutional subsampling, or embedding layer.

Type: torch.nn.Module

normalize_before

Flag indicating whether layer normalization (after_norm) is applied to the output of the input layer.

Type: bool

after_norm

Layer normalization applied to the output if normalize_before is True.

Type: LayerNorm

Parameters:
- input_size (int) – The dimension of the input features.
- output_size (int, optional) – The dimension of the output features. Defaults to 256.
- dropout_rate (float, optional) – The dropout rate for regularization. Defaults to 0.1.
- input_layer (str, optional) – The type of input layer to use. One of 'linear', 'conv2d', 'conv2d2', 'conv2d6', 'conv2d8', or 'embed'. Defaults to 'conv2d'.
- normalize_before (bool, optional) – Whether to apply layer normalization (after_norm) to the output of the input layer. Defaults to True.
- padding_idx (int, optional) – The padding index used when input_layer is 'embed'. Defaults to -1.
Raises: ValueError – If an unknown input_layer type is specified.
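As a rough guide to how each input_layer choice affects the time axis, the sketch below maps every documented name to an approximate time subsampling factor and raises ValueError for anything else. The factors (1 for 'linear' and 'embed', 2/4/6/8 for 'conv2d2'/'conv2d'/'conv2d6'/'conv2d8') are inferred from the layer names and are approximate, since the convolutions trim a few frames at the edges. `SUBSAMPLING_FACTOR` and `time_subsampling_factor` are hypothetical helpers, not part of ESPnet.

```python
# Hypothetical helper, not part of ESPnet: approximate factor by which each
# input_layer accepted by LinearEncoder shortens the time axis L.
SUBSAMPLING_FACTOR = {
    "linear": 1,   # frame-wise projection, no subsampling
    "embed": 1,    # one embedding vector per input position
    "conv2d2": 2,  # convolutional subsampling to about 1/2 length
    "conv2d": 4,   # convolutional subsampling to about 1/4 length
    "conv2d6": 6,  # convolutional subsampling to about 1/6 length
    "conv2d8": 8,  # convolutional subsampling to about 1/8 length
}


def time_subsampling_factor(input_layer: str) -> int:
    """Return the approximate time subsampling factor for input_layer."""
    if input_layer not in SUBSAMPLING_FACTOR:
        # Mirrors the documented behavior: unknown names raise ValueError.
        raise ValueError(f"unknown input_layer: {input_layer}")
    return SUBSAMPLING_FACTOR[input_layer]


if __name__ == "__main__":
    for name in ("linear", "conv2d", "conv2d8"):
        print(name, time_subsampling_factor(name))
```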

######### Examples

Initialize a linear encoder with a linear input layer:

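>>> import torch
>>> from espnet2.asr.encoder.linear_encoder import LinearEncoder
>>> encoder = LinearEncoder(input_size=128, output_size=256, input_layer="linear")

Forward pass through the encoder:

>>> xs_pad = torch.randn(10, 20, 128)  # (B, L, D)
>>> ilens = torch.tensor([20] * 10)  # input lengths
>>> output, olens, _ = encoder(xs_pad, ilens)  # output: (10, 20, 256), no subsampling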

NOTE

The encoder expects the input tensor to be of shape (B, L, D) where B is the batch size, L is the sequence length, and D is the input dimension.
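Because of this (B, L, D) layout, variable-length utterances have to be padded to a common length L, with their true lengths passed separately as ilens. A minimal stdlib sketch of that packing step follows, using nested lists in place of tensors; `pad_batch` is a hypothetical helper, and with PyTorch the same job is usually done by `torch.nn.utils.rnn.pad_sequence(..., batch_first=True)`.

```python
from typing import List, Tuple

Frame = List[float]  # one D-dimensional feature vector


def pad_batch(
    seqs: List[List[Frame]], dim: int, pad_value: float = 0.0
) -> Tuple[List[List[Frame]], List[int]]:
    """Pad B sequences of shape (L_i, dim) into a (B, L_max, dim) batch.

    Returns the padded batch and the true lengths (the ilens argument).
    """
    ilens = [len(seq) for seq in seqs]
    max_len = max(ilens, default=0)
    padded = [
        [list(frame) for frame in seq]
        + [[pad_value] * dim for _ in range(max_len - len(seq))]
        for seq in seqs
    ]
    return padded, ilens


if __name__ == "__main__":
    batch = [[[1.0, 2.0]] * 3, [[3.0, 4.0]]]  # lengths 3 and 1, D = 2
    padded, ilens = pad_batch(batch, dim=2)
    print(ilens)       # [3, 1]
    print(padded[1])   # [[3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
```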


forward(xs_pad: Tensor, ilens: Tensor, prev_states: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor | None]

Encode a padded batch of input features and return the encoded features with their lengths.

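This method passes xs_pad through the input layer (embed) selected when the LinearEncoder was initialized and, when normalize_before is True, through after_norm. It builds a padding mask from ilens; the conv2d* input layers subsample this mask together with the features, and the output lengths are computed from the resulting mask.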
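The length bookkeeping can be illustrated without tensors. In the sketch below, a (B, L) boolean mask (True for real frames) is built from ilens, then subsampled with the slice `[:-2:2]` applied twice, which is our understanding of how the default 'conv2d' layer shrinks its mask; the output lengths are the counts of True entries that remain. With a padded length of 20, the kept output frames start at input positions 0, 4, 8 and 12, so an utterance keeps a frame whenever that position lies within its length. The slicing pattern is an assumption about ESPnet's internals, and all helper names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]  # (B, L) with True for real frames


def make_mask(ilens: List[int]) -> Mask:
    """Build a (B, L_max) mask: True for real frames, False for padding."""
    max_len = max(ilens)
    return [[t < n for t in range(max_len)] for n in ilens]


def conv2d_subsample_mask(mask: Mask) -> Mask:
    """Assumed 'conv2d' mask update: slice [:-2:2] applied twice."""
    return [row[:-2:2][:-2:2] for row in mask]


def output_lengths(mask: Mask) -> List[int]:
    """Count the frames still marked as real in each row (olens)."""
    return [sum(row) for row in mask]


if __name__ == "__main__":
    mask = make_mask([20, 9, 7])
    sub = conv2d_subsample_mask(mask)
    print(len(sub[0]))           # 4 output frames in the padded batch
    print(output_lengths(sub))   # [4, 3, 2]
```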

Parameters:
- xs_pad (torch.Tensor) – Input tensor of shape (B, L, D), where B is the batch size, L is the sequence length, and D is the input feature dimension.
- ilens (torch.Tensor) – The length of each input sequence in the batch. Shape (B).
- prev_states (torch.Tensor, optional) – Not used currently. Defaults to None.
Returns: A tuple containing:
- The encoded tensor of shape (B, L', output_size), where L' equals L when no subsampling is applied (e.g. input_layer='linear') and is roughly L divided by the subsampling factor for the conv2d* layers.
- The output lengths, of shape (B).
- None (no state is returned).
Return type: Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]
Raises: TooShortUttError – If L is too short for the conv2d* subsampling layer in use.
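The too-short condition follows from convolution arithmetic: an unpadded convolution with kernel k and stride s maps a length n to floor((n - k) / s) + 1, so very short inputs leave no output frames. The sketch below finds, for each conv2d* layer, the smallest input length that still yields one output frame. The (kernel, stride) layouts in `OUT_LEN` are our assumption about ESPnet's Conv2dSubsampling* modules, and `conv_len` and `min_input_frames` are hypothetical helpers; ESPnet runs its own check before subsampling, and its thresholds may not match this derivation exactly.

```python
from typing import Callable, Dict


def conv_len(n: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded convolution along the time axis."""
    return (n - kernel) // stride + 1


# Assumed time-axis layouts of the conv2d* input layers (kernel, stride).
OUT_LEN: Dict[str, Callable[[int], int]] = {
    "conv2d2": lambda n: conv_len(conv_len(n, 3, 2), 3, 1),
    "conv2d": lambda n: conv_len(conv_len(n, 3, 2), 3, 2),
    "conv2d6": lambda n: conv_len(conv_len(n, 3, 2), 5, 3),
    "conv2d8": lambda n: conv_len(conv_len(conv_len(n, 3, 2), 3, 2), 3, 2),
}


def min_input_frames(input_layer: str, limit: int = 10_000) -> int:
    """Smallest input length L that still yields at least one output frame."""
    out_len = OUT_LEN[input_layer]
    for n in range(1, limit + 1):
        if out_len(n) >= 1:
            return n
    raise ValueError(f"no length up to {limit} works for {input_layer}")


if __name__ == "__main__":
    for name in OUT_LEN:
        print(name, min_input_frames(name))
    # conv2d2 7, conv2d 7, conv2d6 11, conv2d8 15
```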

######### Examples

>>> import torch
>>> from espnet2.asr.encoder.linear_encoder import LinearEncoder
>>> encoder = LinearEncoder(input_size=128, output_size=256)  # default input_layer="conv2d"
>>> xs_pad = torch.randn(10, 20, 128)  # batch of 10, 20 timesteps, 128 features
>>> ilens = torch.tensor([20] * 10)  # all lengths are 20
>>> output, olens, _ = encoder(xs_pad, ilens)
>>> print(output.shape)  # torch.Size([10, 4, 256]): conv2d subsamples time by about 4
>>> print(olens.shape)  # torch.Size([10])

output_size() → int

Returns the output size of the linear encoder.

This method provides the dimension of the output produced by the encoder, which is defined during the initialization of the LinearEncoder class.

Returns: The output size of the encoder.
Return type: int

######### Examples

>>> encoder = LinearEncoder(input_size=128, output_size=256)
>>> encoder.output_size()
256