espnet2.asr.encoder.linear_encoder.LinearEncoder
class espnet2.asr.encoder.linear_encoder.LinearEncoder(input_size: int, output_size: int = 256, dropout_rate: float = 0.1, input_layer: str | None = 'conv2d', normalize_before: bool = True, padding_idx: int = -1)
Bases: AbsEncoder
Linear encoder module for processing input features.
This class implements an encoder made up of a single input layer, chosen by input_layer: a linear projection, a convolutional subsampling block (which shortens the time axis), or an embedding lookup. The input layer maps the input features to output_size dimensions. If normalize_before is True, layer normalization (after_norm) is then applied to the result.
_output_size
The dimension of the output features.
- Type: int
embed
The embedding layer, which can be a linear layer, convolutional subsampling, or embedding layer.
- Type: torch.nn.Module
normalize_before
Whether layer normalization (after_norm) is applied to the output of the input layer.
- Type: bool
after_norm
Layer normalization applied to the output if normalize_before is True.
- Type: LayerNorm
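The after_norm step can be sketched without PyTorch. The following is a minimal stdlib-only sketch of layer normalization over the feature dimension of a (B, L, D) batch held as nested lists. It assumes the affine parameters are still at their initial values (weight 1, bias 0) and uses eps = 1e-5, the torch.nn.LayerNorm default. ESPnet's LayerNorm may use a different eps. The names layer_norm and apply_after_norm are hypothetical.

```python
import math
from typing import List

Frame = List[float]


def layer_norm(frame: Frame, eps: float = 1e-5) -> Frame:
    """Normalize one D-dimensional frame to zero mean and unit variance.

    Assumes weight = 1 and bias = 0 (LayerNorm at initialization).
    """
    mean = sum(frame) / len(frame)
    # Layer normalization uses the biased (population) variance.
    var = sum((v - mean) ** 2 for v in frame) / len(frame)
    return [(v - mean) / math.sqrt(var + eps) for v in frame]


def apply_after_norm(xs: List[List[Frame]], normalize_before: bool = True) -> List[List[Frame]]:
    """Hypothetical helper: normalize every frame of a (B, L, D) batch if enabled."""
    if not normalize_before:
        return xs
    return [[layer_norm(frame) for frame in utt] for utt in xs]


normed = apply_after_norm([[[1.0, 2.0, 3.0, 4.0]]])
print(normed[0][0])  # approximately [-1.342, -0.447, 0.447, 1.342]
```

Each frame is normalized independently across its D features, so the time and batch axes are untouched.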
Parameters:
- input_size (int) – The dimension of the input features.
- output_size (int, optional) – The dimension of the output features. Defaults to 256.
- dropout_rate (float, optional) – The dropout rate for regularization. Defaults to 0.1.
- input_layer (str, optional) – The type of input layer to use. One of 'linear', 'conv2d', 'conv2d2', 'conv2d6', 'conv2d8', or 'embed'. Defaults to 'conv2d'.
- normalize_before (bool, optional) – Whether to apply layer normalization (after_norm) to the output of the input layer. Defaults to True.
- padding_idx (int, optional) – The padding index used when input_layer is 'embed'. Defaults to -1.
Raises: ValueError – If an unknown input_layer type is specified.
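The input_layer choice also determines how much the time axis is shortened. The sketch below records a nominal subsampling factor for each documented name and rejects anything else with ValueError, in line with the Raises entry. The factors are an assumption based on the "to 1/N length" behavior of ESPnet's Conv2dSubsampling, Conv2dSubsampling2, Conv2dSubsampling6 and Conv2dSubsampling8 layers. Actual output lengths come out slightly shorter because those convolutions are unpadded. NOMINAL_SUBSAMPLING and nominal_subsampling are hypothetical names.

```python
# Nominal time-subsampling factor for each documented input_layer value.
# Assumption: taken from the "to 1/N length" behavior of ESPnet's
# Conv2dSubsampling* layers; 'linear' and 'embed' keep every frame.
NOMINAL_SUBSAMPLING = {
    "linear": 1,
    "embed": 1,
    "conv2d2": 2,
    "conv2d": 4,
    "conv2d6": 6,
    "conv2d8": 8,
}


def nominal_subsampling(input_layer: str) -> int:
    """Hypothetical helper: return the factor or raise ValueError for unknown names."""
    if input_layer not in NOMINAL_SUBSAMPLING:
        raise ValueError("unknown input_layer: " + str(input_layer))
    return NOMINAL_SUBSAMPLING[input_layer]


# Rough frame count a 1000-frame utterance keeps with the default layer.
print(1000 // nominal_subsampling("conv2d"))  # 250 (the exact count is slightly lower)
```

A lookup table keeps the set of valid names and their factors in one place, so an unsupported name fails immediately instead of later in the forward pass.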
######### Examples
Initialize a linear encoder with a linear input layer:
>>> import torch
>>> from espnet2.asr.encoder.linear_encoder import LinearEncoder
>>> encoder = LinearEncoder(input_size=128, output_size=256, input_layer='linear')
Forward pass through the encoder:
>>> xs_pad = torch.randn(10, 20, 128)  # (B, L, D)
>>> ilens = torch.tensor([20] * 10)  # input lengths
>>> output, olens, _ = encoder(xs_pad, ilens)  # output: (10, 20, 256)
NOTE
The encoder expects the input tensor to be of shape (B, L, D) where B is the batch size, L is the sequence length, and D is the input dimension.
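Real batches contain utterances of different lengths. They are padded to a common L before being passed in, and their true lengths are kept in ilens. Below is a stdlib-only sketch of that preparation on nested lists. With PyTorch one would typically use torch.nn.utils.rnn.pad_sequence(..., batch_first=True) instead. pad_batch is a hypothetical helper, and padding with 0.0 is an assumption. The encoder tells real frames from padding by ilens, not by the pad value.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def pad_batch(
    utts: Sequence[Sequence[Frame]], pad_value: float = 0.0
) -> Tuple[List[List[Frame]], List[int]]:
    """Hypothetical helper: pad utterances to a (B, L, D) nested list.

    Returns the padded batch and ilens, the true length of each utterance.
    Assumes every utterance is non-empty and all frames share the same D.
    """
    ilens = [len(u) for u in utts]
    max_len = max(ilens)
    dim = len(utts[0][0])
    padded = [
        [list(frame) for frame in u] + [[pad_value] * dim for _ in range(max_len - len(u))]
        for u in utts
    ]
    return padded, ilens


xs_pad, ilens = pad_batch([[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[7.0, 8.0]]])
print(ilens)  # [3, 1]
print(xs_pad[1])  # [[7.0, 8.0], [0.0, 0.0], [0.0, 0.0]]
```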
forward(xs_pad: Tensor, ilens: Tensor, prev_states: Tensor | None = None) → Tuple[Tensor, Tensor, Tensor | None]
Embed positions in tensor.
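This method applies the input layer defined at initialization to xs_pad. It builds a padding mask from ilens, subsamples that mask along with the features when a conv2d* input layer is used, and derives the output lengths from the mask. If normalize_before is True, after_norm is then applied to the encoded features.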
- Parameters:
  - xs_pad (torch.Tensor) – Input tensor of shape (B, L, D), where B is the batch size, L is the sequence length, and D is the input feature dimension.
  - ilens (torch.Tensor) – Length of each input sequence in the batch, shape (B).
  - prev_states (torch.Tensor, optional) – Currently unused. Defaults to None.
- Returns: A tuple containing:
  - The encoded tensor of shape (B, L', output_size), where L' = L for 'linear' and 'embed', and L' is the subsampled length for the conv2d* layers.
  - The output lengths, shape (B).
  - None (no states are returned).
- Return type: Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]
- Raises: TooShortUttError – If the input sequence is too short for the selected conv2d* subsampling layer.
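The output lengths come from a padding mask rather than from a direct formula. The stdlib-only sketch below models that bookkeeping for the 'linear', 'embed' and default 'conv2d' layers. For 'conv2d' it assumes the mask is subsampled the way ESPnet's Conv2dSubsampling does it (slicing [:-2:2] along time, twice). The other conv2d* variants are not modeled. All function names are hypothetical.

```python
from typing import List


def make_non_pad_mask(ilens: List[int]) -> List[List[bool]]:
    """One row per utterance: True for real frames, False for padding."""
    max_len = max(ilens)
    return [[t < n for t in range(max_len)] for n in ilens]


def subsample_mask_conv2d(row: List[bool]) -> List[bool]:
    """Assumed conv2d mask subsampling: keep every 2nd step, dropping the last two, twice."""
    return row[:-2:2][:-2:2]


def output_lengths(ilens: List[int], input_layer: str = "conv2d") -> List[int]:
    """Hypothetical helper: compute olens from the (possibly subsampled) mask."""
    mask = make_non_pad_mask(ilens)
    if input_layer == "conv2d":
        mask = [subsample_mask_conv2d(row) for row in mask]
    elif input_layer not in ("linear", "embed"):
        raise NotImplementedError("sketch only models 'linear', 'embed' and 'conv2d'")
    # olens counts the positions still marked as real frames.
    return [sum(row) for row in mask]


print(output_lengths([20, 9], "linear"))  # [20, 9]
print(output_lengths([20, 9], "conv2d"))  # [4, 3]
```

Under this mask rule, a subsampled position counts as a real frame whenever the first input frame it maps to is a real frame. That is why the 9-frame utterance keeps 3 positions in a batch padded to 20.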
######### Examples
>>> import torch
>>> from espnet2.asr.encoder.linear_encoder import LinearEncoder
>>> encoder = LinearEncoder(input_size=128, output_size=256)  # default input_layer='conv2d'
>>> xs_pad = torch.randn(10, 20, 128)  # batch of 10, 20 timesteps, 128 features
>>> ilens = torch.tensor([20] * 10)  # all lengths are 20
>>> output, olens, _ = encoder(xs_pad, ilens)
>>> print(output.shape)  # torch.Size([10, 4, 256]): conv2d subsamples time about 4x
>>> print(olens.shape)  # torch.Size([10])
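With the default conv2d input layer, the output has fewer frames than the input. The sketch below assumes that layer applies two unpadded convolutions with kernel size 3 and stride 2 along time, as ESPnet's Conv2dSubsampling does. It computes the resulting frame count, which is 4 for the 20-frame batch above. It also finds the shortest input that still yields one output frame, which is presumably the limit behind TooShortUttError for this layer. conv_out_len and conv2d_time_len are hypothetical names.

```python
def conv_out_len(n: int, kernel: int = 3, stride: int = 2) -> int:
    """Frames left after one unpadded convolution along the time axis."""
    return max(0, (n - kernel) // stride + 1)


def conv2d_time_len(n_frames: int) -> int:
    """Assumed conv2d layout: two kernel-3, stride-2 convolutions in a row."""
    return conv_out_len(conv_out_len(n_frames))


print(conv2d_time_len(20))  # 4
# Shortest input that still produces at least one output frame.
min_frames = next(n for n in range(1, 100) if conv2d_time_len(n) >= 1)
print(min_frames)  # 7
```

The same formula, floor((n - kernel) / stride) + 1, is the standard output-length rule for an unpadded convolution. Chaining it once per layer gives the total subsampling.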
output_size() → int
Returns the output size of the linear encoder.
This method provides the dimension of the output produced by the encoder, which is defined during the initialization of the LinearEncoder class.
- Returns: The output size of the encoder.
- Return type: int
######### Examples
>>> encoder = LinearEncoder(input_size=128, output_size=256)
>>> size = encoder.output_size()  # size is 256