espnet2.asr_transducer.encoder.modules.positional_encoding.RelPositionalEncoding
class espnet2.asr_transducer.encoder.modules.positional_encoding.RelPositionalEncoding(size: int, dropout_rate: float = 0.0, max_len: int = 5000)
Bases: Module
Relative positional encoding module for sequence processing.
This module implements relative positional encoding, which enhances the performance of attention mechanisms in sequence models by providing contextual information about the position of elements in the input sequences.
size
The dimensionality of the positional encoding.
- Type: int
pe
The computed positional encodings.
- Type: torch.Tensor
dropout
The dropout layer applied to the positional encodings.
- Type: torch.nn.Dropout
Parameters:
- size (int) – Module size, representing the dimensionality of the positional encoding.
- max_len (int) – Maximum length of input sequences for which positional encodings will be computed.
- dropout_rate (float, optional) – Dropout rate applied to the output positional encodings. Default is 0.0.
extend_pe(x: torch.Tensor, left_context: int = 0) -> None: Resets the positional encoding based on the input sequences.
forward(x: torch.Tensor, left_context: int = 0) -> torch.Tensor: Computes the positional encoding for the given input sequences.
######### Examples
>>> # Create a relative positional encoding module
>>> rpe = RelPositionalEncoding(size=128, dropout_rate=0.1, max_len=5000)
>>> # Input tensor of shape (B, T, ?)
>>> input_tensor = torch.randn(32, 100, 128)
>>> # Get the positional encoding
>>> pos_enc = rpe(input_tensor, left_context=10)
>>> print(pos_enc.shape)  # (B, 2 * (T - 1), ?)
####### NOTE The extend_pe method is invoked during the forward pass to ensure the positional encodings are appropriately sized for the input.
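For intuition, the table of relative encodings can be sketched in plain Python. This is a hypothetical, dependency-free illustration of the standard sinusoidal scheme over both positive and negative displacements, not the ESPnet implementation: rows cover offsets from max_len - 1 down to -(max_len - 1), which is where the 2 * T - 1-style output lengths come from.

```python
import math

def make_rel_pe(size: int, max_len: int) -> list[list[float]]:
    """Sketch of a relative positional encoding table (illustrative only).

    Rows correspond to relative offsets max_len - 1, ..., 0, ..., -(max_len - 1),
    giving 2 * max_len - 1 rows so both past and future displacements are
    covered. Each row uses sin on even feature indices and cos on odd ones.
    """
    positions = range(max_len - 1, -max_len, -1)  # positive offsets first
    table = []
    for pos in positions:
        row = [0.0] * size
        for i in range(0, size, 2):
            angle = pos / (10000 ** (i / size))
            row[i] = math.sin(angle)
            if i + 1 < size:
                row[i + 1] = math.cos(angle)
        table.append(row)
    return table
```

The middle row corresponds to offset 0, so its features are sin(0) = 0 and cos(0) = 1; forward-style methods then slice a window of this table around the center.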
Construct a RelPositionalEncoding object.
#
extend_pe(x: torch.Tensor, left_context: int = 0) -> None
Reset the positional encodings based on the input sequences.
The cached encodings are recomputed only when the input (plus any requested left context) exceeds the currently cached length; otherwise the existing table is reused.
- Parameters:
- x (torch.Tensor) – Input sequences of shape (B, T, ?).
- left_context (int) – Number of previous frames the attention module can see in the current chunk. Default is 0.
####### NOTE The extend_pe method is called internally in the forward method, so it rarely needs to be invoked directly.
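One common way such a reset helper avoids redundant work is a grow-only cache that only rebuilds when a longer table is needed. The stand-in class below sketches that pattern; the names and behavior are illustrative, not the ESPnet code.

```python
class PECache:
    """Hypothetical sketch of extend_pe-style lazy resizing.

    The cached table is rebuilt only when the requested length exceeds
    what was computed before, mirroring how a forward pass can call the
    extension step on every input without paying for a rebuild each time.
    """

    def __init__(self) -> None:
        self.cached_len = 0  # length of the currently cached table
        self.builds = 0      # how many times a rebuild actually happened

    def extend(self, needed_len: int) -> None:
        if self.cached_len >= needed_len:
            return  # cache already covers this length; nothing to do
        self.cached_len = needed_len
        self.builds += 1  # a real module would recompute the table here
```

Calling `extend` with a shorter or equal length is a no-op, so repeated forward passes over same-length batches trigger no recomputation.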
#
forward(x: torch.Tensor, left_context: int = 0) -> torch.Tensor
Compute positional encoding.
This method generates the positional encoding for the input sequences, utilizing relative positional encoding to enhance the model’s ability to attend to previous elements in the input.
- Parameters:
- x – Input sequences of shape (B, T, ?), where B is the batch size, T is the sequence length, and ? represents any additional dimensions.
- left_context – Number of previous frames the attention module can see in the current chunk. This is used to determine the size of the positional encoding.
- Returns: Positional embedding sequences of shape (B, 2 * (T - 1), ?), incorporating both positive and negative positional encodings.
- Return type: pos_enc
######### Examples
>>> rel_pos_enc = RelPositionalEncoding(size=128)
>>> input_tensor = torch.randn(10, 20, 128) # Batch of 10, seq len 20
>>> output = rel_pos_enc.forward(input_tensor, left_context=5)
>>> output.shape
torch.Size([10, 39, 128]) # Output shape will vary based on left_context
####### NOTE The method uses the extend_pe function to ensure that the positional encodings are correctly sized for the input sequences before applying the dropout.
- Raises: ValueError – If the input tensor x does not have the expected shape.
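As a sanity check on the shape arithmetic, the following sketch assumes that with left_context = 0 the returned encoding covers every displacement from -(T - 1) up to T - 1, i.e. 2 * T - 1 entries; this is an assumption for illustration, and larger left_context values enlarge the window of visible past offsets.

```python
def rel_offsets(T: int) -> list[int]:
    """Relative offsets covered for a length-T sequence (assumes left_context = 0).

    Both positive and negative displacements are represented, so offsets
    run from -(T - 1) up to T - 1: 2 * T - 1 values in total.
    """
    return list(range(-(T - 1), T))
```

For T = 20 this gives a window of 39 offsets, matching the length-39 encoding in the example above.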