espnet2.asr.postencoder.length_adaptor_postencoder.LengthAdaptorPostEncoder
class espnet2.asr.postencoder.length_adaptor_postencoder.LengthAdaptorPostEncoder(input_size: int, length_adaptor_n_layers: int = 0, input_layer: str | None = None, output_size: int | None = None, dropout_rate: float = 0.1, return_int_enc: bool = False)
Bases: AbsPostEncoder
Length Adaptor PostEncoder.
This class implements a Length Adaptor PostEncoder, a component that adapts the length of input sequences through convolutional layers according to the specified parameters. It is particularly useful in automatic speech recognition systems where input lengths may vary.
embed
The embedding layer for input processing.
- Type: torch.nn.Sequential
out_sz
The output size of the encoder.
- Type: int
length_adaptor
The sequential layers for length adaptation.
- Type: torch.nn.Sequential
length_adaptor_ratio
The ratio by which input lengths are adjusted.
- Type: int
return_int_enc
A flag to determine if the integer encoding should be returned.
- Type: bool
Parameters:
- input_size (int) – The size of the input features.
- length_adaptor_n_layers (int, optional) – The number of convolutional layers in the length adaptor. Defaults to 0.
- input_layer (Optional[str], optional) – Type of input layer (‘linear’ or None). Defaults to None.
- output_size (Optional[int], optional) – The size of the output features; used only when input_layer is ‘linear’. Defaults to None.
- dropout_rate (float, optional) – The dropout rate for regularization. Defaults to 0.1.
- return_int_enc (bool, optional) – Whether to return integer encoding. Defaults to False.
Returns: The adapted input tensor and the updated lengths of the input sequences.
Return type: Tuple[torch.Tensor, torch.Tensor]
Raises: TooShortUttError – If the input sequence length is shorter than the required length for subsampling.
######### Examples
>>> post_encoder = LengthAdaptorPostEncoder(input_size=128,
...                                         length_adaptor_n_layers=2,
...                                         input_layer="linear",
...                                         output_size=256)
>>> input_tensor = torch.randn(10, 50, 128)  # (batch_size, seq_len, features)
>>> input_lengths = torch.tensor([50] * 10)  # lengths of each input
>>> output, new_lengths = post_encoder(input_tensor, input_lengths)
>>> print(output.shape)   # seq_len reduced by 2 ** 2 = 4, e.g. torch.Size([10, 12, 256])
>>> print(new_lengths)    # updated lengths after adaptation
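The sketch below illustrates one plausible internal structure for the length adaptor: a stack of stride-2 1-D convolutions, each halving the time axis, which reproduces the 2 ** length_adaptor_n_layers reduction factor noted under forward(). The specific layer choices (kernel size 2, stride 2, ReLU) are illustrative assumptions, not a verbatim copy of the ESPnet source.
>>> # Hypothetical sketch (assumed layer choices): stride-2 Conv1d stack
>>> import torch
>>> layers = []
>>> for _ in range(2):  # length_adaptor_n_layers = 2
...     layers.append(torch.nn.Conv1d(256, 256, kernel_size=2, stride=2))
...     layers.append(torch.nn.ReLU())
>>> adaptor = torch.nn.Sequential(*layers)
>>> x = torch.randn(10, 256, 50)  # Conv1d expects (batch, channels, time)
>>> adaptor(x).shape              # time axis: 50 -> 25 -> 12
torch.Size([10, 256, 12])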
####### NOTE This implementation follows the design described in the paper “Length Adaptor for End-to-End ASR” (ACL 2021).
Initialize the module.
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Forward pass through the LengthAdaptorPostEncoder.
This method takes an input tensor and its corresponding lengths, applies the embedding layer if specified, processes the input through the length adaptor, and returns the transformed input along with updated lengths.
- Parameters:
- input (torch.Tensor) – Input tensor of shape (batch_size, sequence_length, input_size).
- input_lengths (torch.Tensor) – Tensor of shape (batch_size,) containing the lengths of each input sequence.
- Returns: A tuple containing:
  - output (torch.Tensor): Transformed output tensor of shape (batch_size, new_sequence_length, output_size).
  - output_lengths (torch.Tensor): Updated lengths of the output sequences.
- Return type: Tuple[torch.Tensor, torch.Tensor]
- Raises: TooShortUttError – If the input sequence is shorter than the required length for subsampling.
######### Examples
>>> encoder = LengthAdaptorPostEncoder(input_size=128,
...                                    length_adaptor_n_layers=2)
>>> input_tensor = torch.randn(10, 20, 128)  # (batch_size, seq_len, features)
>>> input_lengths = torch.tensor([20] * 10)  # all sequences have length 20
>>> output, output_lengths = encoder.forward(input_tensor, input_lengths)
####### NOTE The length adaptor reduces the sequence length by a factor of 2 ** length_adaptor_n_layers. Ensure that the input sequences are sufficiently long to avoid raising the TooShortUttError.
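As a quick arithmetic check of that factor (a sketch assuming the new lengths are floor-divided by the reduction ratio):
>>> import torch
>>> ratio = 2 ** 2                             # length_adaptor_n_layers = 2
>>> input_lengths = torch.tensor([20, 17, 3])
>>> input_lengths // ratio                     # adapted lengths after the reduction
tensor([5, 4, 0])
>>> # an utterance shorter than the ratio (here 3 < 4) is the case that
>>> # triggers TooShortUttError in forward()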
output_size() → int
Get the output size.
This method returns the output size of the LengthAdaptorPostEncoder, which is determined during initialization. The output size is either set explicitly through the output_size parameter or defaults to the input_size if no embedding layer is used.
- Returns: The output size of the encoder.
- Return type: int
######### Examples
>>> encoder = LengthAdaptorPostEncoder(input_size=256, input_layer="linear",
...                                    output_size=128)
>>> encoder.output_size()
128
>>> encoder_no_embed = LengthAdaptorPostEncoder(input_size=256)
>>> encoder_no_embed.output_size()
256
####### NOTE The output size is important for downstream tasks and should be configured based on the model architecture requirements.
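For example, a downstream projection can be sized from this value (a minimal sketch; the projection layer here is a hypothetical consumer, not part of this class):
>>> import torch
>>> encoder = LengthAdaptorPostEncoder(input_size=256, input_layer="linear",
...                                    output_size=128)
>>> projection = torch.nn.Linear(encoder.output_size(), 64)  # hypothetical downstream layer
>>> projection.in_features
128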