espnet2.asr_transducer.encoder.modules.positional_encoding.RelPositionalEncoding
class espnet2.asr_transducer.encoder.modules.positional_encoding.RelPositionalEncoding(size: int, dropout_rate: float = 0.0, max_len: int = 5000)
Bases: Module
Relative positional encoding module for sequence processing.
This module implements relative positional encoding, which enhances the performance of attention mechanisms in sequence models by providing contextual information about the position of elements in the input sequences.
size
The dimensionality of the positional encoding.
- Type: int
pe
The computed positional encodings.
- Type: torch.Tensor
dropout
The dropout layer applied to the positional encodings.
- Type: torch.nn.Dropout
Parameters:
- size (int) – Module size, representing the dimensionality of the positional encoding.
- max_len (int) – Maximum length of input sequences for which positional encodings will be computed.
- dropout_rate (float, optional) – Dropout rate applied to the output positional encodings. Default is 0.0.
extend_pe(x: torch.Tensor, left_context: int = 0) -> None: Resets the positional encoding based on the input sequences.
forward(x: torch.Tensor, left_context: int = 0) -> torch.Tensor: Computes the positional encoding for the given input sequences.
######### Examples
>>> # Create a relative positional encoding module
>>> rpe = RelPositionalEncoding(size=128, dropout_rate=0.1, max_len=5000)
>>> # Input tensor of shape (B, T, ?)
>>> input_tensor = torch.randn(32, 100, 128)
>>> # Get the positional encoding
>>> pos_enc = rpe(input_tensor, left_context=10)
>>> print(pos_enc.shape)  # (B, 2 * (T - 1), ?)
####### NOTE The extend_pe method is invoked during the forward pass to ensure the positional encodings are appropriately sized for the input.
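For intuition, the table of relative encodings can be sketched in plain Python. This is a hypothetical, dependency-free illustration of the standard sinusoidal scheme over both positive and negative displacements, not the ESPnet implementation: rows cover offsets from max_len - 1 down to -(max_len - 1), which is where the 2 * T - 1-style output lengths come from.

```python
import math

def make_rel_pe(size: int, max_len: int) -> list[list[float]]:
    """Sketch of a relative positional encoding table (illustrative only).

    Rows correspond to relative offsets max_len - 1, ..., 0, ..., -(max_len - 1),
    giving 2 * max_len - 1 rows so both past and future displacements are
    covered. Each row uses sin on even feature indices and cos on odd ones.
    """
    positions = range(max_len - 1, -max_len, -1)  # positive offsets first
    table = []
    for pos in positions:
        row = [0.0] * size
        for i in range(0, size, 2):
            angle = pos / (10000 ** (i / size))
            row[i] = math.sin(angle)
            if i + 1 < size:
                row[i + 1] = math.cos(angle)
        table.append(row)
    return table
```

The middle row corresponds to offset 0, so its features are sin(0) = 0 and cos(0) = 1; forward-style methods then slice a window of this table around the center.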
Construct a RelPositionalEncoding object.
#
extend_pe(x: torch.Tensor, left_context: int = 0) -> None
Reset the positional encodings based on the input sequences.
The cached encodings are recomputed only when the input (plus any requested left context) exceeds the currently cached length; otherwise the existing table is reused.
- Parameters:
- x (torch.Tensor) – Input sequences of shape (B, T, ?).
- left_context (int) – Number of previous frames the attention module can see in the current chunk. Default is 0.
####### NOTE The extend_pe method is called internally in the forward method, so it rarely needs to be invoked directly.
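One common way such a reset helper avoids redundant work is a grow-only cache that only rebuilds when a longer table is needed. The stand-in class below sketches that pattern; the names and behavior are illustrative, not the ESPnet code.

```python
class PECache:
    """Hypothetical sketch of extend_pe-style lazy resizing.

    The cached table is rebuilt only when the requested length exceeds
    what was computed before, mirroring how a forward pass can call the
    extension step on every input without paying for a rebuild each time.
    """

    def __init__(self) -> None:
        self.cached_len = 0  # length of the currently cached table
        self.builds = 0      # how many times a rebuild actually happened

    def extend(self, needed_len: int) -> None:
        if self.cached_len >= needed_len:
            return  # cache already covers this length; nothing to do
        self.cached_len = needed_len
        self.builds += 1  # a real module would recompute the table here
```

Calling `extend` with a shorter or equal length is a no-op, so repeated forward passes over same-length batches trigger no recomputation.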
#
forward(x: torch.Tensor, left_context: int = 0) -> torch.Tensor
Compute positional encoding.
This method generates the positional encoding for the input sequences, utilizing relative positional encoding to enhance the model’s ability to attend to previous elements in the input.
- Parameters:
- x – Input sequences of shape (B, T, ?), where B is the batch size, T is the sequence length, and ? represents any additional dimensions.
- left_context – Number of previous frames the attention module can see in the current chunk. This is used to determine the size of the positional encoding.
- Returns: Positional embedding sequences of shape (B, 2 * (T - 1), ?), incorporating both positive and negative positional encodings.
- Return type: pos_enc
######### Examples
>>> rel_pos_enc = RelPositionalEncoding(size=128)
>>> input_tensor = torch.randn(10, 20, 128) # Batch of 10, seq len 20
>>> output = rel_pos_enc.forward(input_tensor, left_context=5)
>>> output.shape
torch.Size([10, 39, 128]) # Output shape will vary based on left_context
####### NOTE The method uses the extend_pe function to ensure that the positional encodings are correctly sized for the input sequences before applying the dropout.
- Raises: ValueError – If the input tensor x does not have the expected shape.
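As a sanity check on the shape arithmetic, the following sketch assumes that with left_context = 0 the returned encoding covers every displacement from -(T - 1) up to T - 1, i.e. 2 * T - 1 entries; this is an assumption for illustration, and larger left_context values enlarge the window of visible past offsets.

```python
def rel_offsets(T: int) -> list[int]:
    """Relative offsets covered for a length-T sequence (assumes left_context = 0).

    Both positive and negative displacements are represented, so offsets
    run from -(T - 1) up to T - 1: 2 * T - 1 values in total.
    """
    return list(range(-(T - 1), T))
```

For T = 20 this gives a window of 39 offsets, matching the length-39 encoding in the example above.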