espnet2.asr.postencoder.length_adaptor_postencoder.LengthAdaptorPostEncoder
class espnet2.asr.postencoder.length_adaptor_postencoder.LengthAdaptorPostEncoder(input_size: int, length_adaptor_n_layers: int = 0, input_layer: str | None = None, output_size: int | None = None, dropout_rate: float = 0.1, return_int_enc: bool = False)
Bases: AbsPostEncoder
Length Adaptor PostEncoder.
This class implements a Length Adaptor PostEncoder, a component that adapts the length of input sequences through convolutional layers according to the specified parameters. It is particularly useful in automatic speech recognition systems where input lengths may vary.
embed
The embedding layer for input processing.
- Type: torch.nn.Sequential
out_sz
The output size of the encoder.
- Type: int
length_adaptor
The sequential layers for length adaptation.
- Type: torch.nn.Sequential
length_adaptor_ratio
The ratio by which input lengths are adjusted.
- Type: int
return_int_enc
A flag to determine if the integer encoding should be returned.
- Type: bool
Parameters:
- input_size (int) – The size of the input features.
- length_adaptor_n_layers (int, optional) – The number of convolutional layers in the length adaptor. Defaults to 0.
- input_layer (Optional[str], optional) – Type of input layer (‘linear’ or None). Defaults to None.
- output_size (Optional[int], optional) – The size of the output features; used only when input_layer is ‘linear’. Defaults to None.
- dropout_rate (float, optional) – The dropout rate for regularization. Defaults to 0.1.
- return_int_enc (bool, optional) – Whether to return integer encoding. Defaults to False.
Returns: The adapted input tensor and the updated lengths of the input sequences.
Return type: Tuple[torch.Tensor, torch.Tensor]
Raises: TooShortUttError – If the input sequence length is shorter than the required length for subsampling.
######### Examples
>>> post_encoder = LengthAdaptorPostEncoder(input_size=128,
...                                         length_adaptor_n_layers=2,
...                                         input_layer="linear",
...                                         output_size=256)
>>> input_tensor = torch.randn(10, 50, 128)  # (batch_size, seq_len, features)
>>> input_lengths = torch.tensor([50] * 10)  # lengths of each input
>>> output, new_lengths = post_encoder(input_tensor, input_lengths)
>>> print(output.shape)   # seq_len reduced by 2 ** 2 = 4, e.g. torch.Size([10, 12, 256])
>>> print(new_lengths)    # updated lengths after adaptation
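The sketch below illustrates one plausible internal structure for the length adaptor: a stack of stride-2 1-D convolutions, each halving the time axis, which reproduces the 2 ** length_adaptor_n_layers reduction factor noted under forward(). The specific layer choices (kernel size 2, stride 2, ReLU) are illustrative assumptions, not a verbatim copy of the ESPnet source.
>>> # Hypothetical sketch (assumed layer choices): stride-2 Conv1d stack
>>> import torch
>>> layers = []
>>> for _ in range(2):  # length_adaptor_n_layers = 2
...     layers.append(torch.nn.Conv1d(256, 256, kernel_size=2, stride=2))
...     layers.append(torch.nn.ReLU())
>>> adaptor = torch.nn.Sequential(*layers)
>>> x = torch.randn(10, 256, 50)  # Conv1d expects (batch, channels, time)
>>> adaptor(x).shape              # time axis: 50 -> 25 -> 12
torch.Size([10, 256, 12])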
####### NOTE This implementation follows the design described in the paper “Length Adaptor for End-to-End ASR” (ACL 2021).
Initialize the module.
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Forward pass through the LengthAdaptorPostEncoder.
This method takes an input tensor and its corresponding lengths, applies the embedding layer if specified, processes the input through the length adaptor, and returns the transformed input along with updated lengths.
- Parameters:
- input (torch.Tensor) – Input tensor of shape (batch_size, sequence_length, input_size).
- input_lengths (torch.Tensor) – Tensor of shape (batch_size,) containing the lengths of each input sequence.
- Returns: A tuple containing:
  - output (torch.Tensor): Transformed output tensor of shape (batch_size, new_sequence_length, output_size).
  - output_lengths (torch.Tensor): Updated lengths of the output sequences.
- Return type: Tuple[torch.Tensor, torch.Tensor]
- Raises: TooShortUttError – If the input sequence is shorter than the required length for subsampling.
######### Examples
>>> encoder = LengthAdaptorPostEncoder(input_size=128,
...                                    length_adaptor_n_layers=2)
>>> input_tensor = torch.randn(10, 20, 128)  # (batch_size, seq_len, features)
>>> input_lengths = torch.tensor([20] * 10)  # all sequences have length 20
>>> output, output_lengths = encoder.forward(input_tensor, input_lengths)
####### NOTE The length adaptor reduces the sequence length by a factor of 2 ** length_adaptor_n_layers. Ensure that the input sequences are sufficiently long to avoid raising the TooShortUttError.
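As a quick arithmetic check of that factor (a sketch assuming the new lengths are floor-divided by the reduction ratio):
>>> import torch
>>> ratio = 2 ** 2                             # length_adaptor_n_layers = 2
>>> input_lengths = torch.tensor([20, 17, 3])
>>> input_lengths // ratio                     # adapted lengths after the reduction
tensor([5, 4, 0])
>>> # an utterance shorter than the ratio (here 3 < 4) is the case that
>>> # triggers TooShortUttError in forward()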
output_size() → int
Get the output size.
This method returns the output size of the LengthAdaptorPostEncoder, which is determined during initialization. The output size is either set explicitly through the output_size parameter or defaults to the input_size if no embedding layer is used.
- Returns: The output size of the encoder.
- Return type: int
######### Examples
>>> encoder = LengthAdaptorPostEncoder(input_size=256, input_layer="linear",
...                                    output_size=128)
>>> encoder.output_size()
128
>>> encoder_no_embed = LengthAdaptorPostEncoder(input_size=256)
>>> encoder_no_embed.output_size()
256
####### NOTE The output size is important for downstream tasks and should be configured based on the model architecture requirements.
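For example, a downstream projection can be sized from this value (a minimal sketch; the projection layer here is a hypothetical consumer, not part of this class):
>>> import torch
>>> encoder = LengthAdaptorPostEncoder(input_size=256, input_layer="linear",
...                                    output_size=128)
>>> projection = torch.nn.Linear(encoder.output_size(), 64)  # hypothetical downstream layer
>>> projection.in_features
128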