espnet2.asr_transducer.decoder.modules.mega.positional_bias.RotaryRelativePositionBias
class espnet2.asr_transducer.decoder.modules.mega.positional_bias.RotaryRelativePositionBias(size: int, max_positions: int = 2048)
Bases: Module
RotaryRelativePositionBias module definition.
This module computes rotary relative position biases using sinusoidal positional embeddings. It is designed to enhance the performance of transformer models by providing a mechanism to capture relative positional information.
- Parameters:
- size – Module embedding size.
- max_positions – Maximum number of relative positions (default is 2048).
sine
Sine components of the sinusoidal embeddings.
cosine
Cosine components of the sinusoidal embeddings.
alpha
Learnable parameter representing one set of positional embeddings.
beta
Learnable parameter representing another set of positional embeddings.
size
The embedding size for the module.
max_positions
The maximum number of positions for relative bias.
#### Examples
>>> rpb = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> output_bias = rpb.forward(length=10)
>>> print(output_bias.shape)  # Output shape is (10, 10)
torch.Size([10, 10])
#### NOTE
This module is based on research and implementations from Facebook Research’s MEGA project.

- Raises: ValueError – If the length exceeds the maximum number of positions.
Construct a RotaryRelativePositionBias object.
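The sketch below illustrates one way the (L, L) bias produced by this module might be added to attention scores before the softmax. The score tensor, head count, and variable names are placeholders for illustration and are not part of this module's API.

```python
import torch

from espnet2.asr_transducer.decoder.modules.mega.positional_bias import (
    RotaryRelativePositionBias,
)

# Hypothetical attention scores for 4 heads over a sequence of length 10.
length, num_heads = 10, 4
scores = torch.randn(num_heads, length, length)

rel_pos_bias = RotaryRelativePositionBias(size=128, max_positions=2048)

# The (L, L) bias is broadcast over the head dimension and added to the
# raw scores before the softmax.
bias = rel_pos_bias(length)  # shape: (10, 10)
weights = torch.softmax(scores + bias, dim=-1)
```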
forward(length: int) → Tensor
Compute rotary relative position bias.
This method calculates the rotary relative position bias based on the input sequence length. It generates the bias using rotary positional embeddings computed from the module’s parameters.
- Parameters: length – Sequence length. This should not exceed the maximum number of relative positions defined during the initialization of the module.
- Returns: bias – Rotary relative position bias of shape (L, L), where L is the sequence length.
- Return type: torch.Tensor
- Raises: ValueError – If the input length exceeds the maximum number of positions defined during the initialization of the module.
#### Examples
>>> rotary_bias = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> bias_matrix = rotary_bias.forward(length=10)
>>> print(bias_matrix.shape)
torch.Size([10, 10])
#### NOTE
The method applies the rotary transform to the parameters alpha and beta and then computes the bias with an einsum operation.
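As a rough, standalone sketch of the computation described in the note above (rotary embeddings of alpha and beta combined with an einsum), assuming alpha and beta are (1, size) parameters and rotary returns a (length, size) tensor; the function and argument names here are illustrative, not the module's actual internals.

```python
import torch


def rotary_relative_bias(alpha, beta, rotary, length, size, max_positions):
    """Sketch of forward(): pairwise products of rotary-embedded alpha and beta."""
    if length > max_positions:
        raise ValueError(f"Length {length} exceeds max_positions {max_positions}.")
    a = rotary(alpha.expand(length, size))  # (length, size)
    b = rotary(beta.expand(length, size))   # (length, size)
    # Pairwise dot products between the two sets of rotary embeddings
    # give the (length, length) relative position bias.
    return torch.einsum("md,nd->mn", a, b)
```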
static get_sinusoid_embeddings(max_positions: int, size: int) → Tuple[Tensor, Tensor]
Compute sinusoidal positional embeddings.

This static method generates the sine and cosine components used to build the rotary relative position bias. The embeddings are computed for the specified maximum number of relative positions and embedding size.

- Parameters:
  - max_positions (int) – Maximum number of relative positions.
  - size (int) – Module embedding size.
- Returns: Sine and cosine components of the sinusoidal embeddings.
- Return type: Tuple[torch.Tensor, torch.Tensor]

#### Examples
>>> sine, cosine = RotaryRelativePositionBias.get_sinusoid_embeddings(2048, 128)

#### NOTE
The sinusoidal embeddings are calculated with sine and cosine functions, which lets the model represent relative positions in a continuous manner.
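The following is a minimal sketch of how such sinusoidal embeddings are commonly computed, assuming the usual half-dimension, log-spaced frequency schedule with base 10000; the helper name and exact schedule are assumptions rather than the library's verified implementation.

```python
import math
from typing import Tuple

import torch


def sinusoid_embeddings(max_positions: int, size: int) -> Tuple[torch.Tensor, torch.Tensor]:
    """Sketch: sine/cosine embeddings over half the embedding size."""
    half_size = size // 2
    # Inverse frequencies spaced geometrically between 1 and 1/10000.
    inv_freq = torch.exp(
        torch.arange(half_size, dtype=torch.float) * -(math.log(10000.0) / half_size)
    )
    # Outer product of positions and inverse frequencies: (max_positions, size // 2).
    angles = torch.arange(max_positions, dtype=torch.float).unsqueeze(1) * inv_freq.unsqueeze(0)
    return torch.sin(angles), torch.cos(angles)


sine, cosine = sinusoid_embeddings(max_positions=2048, size=128)
print(sine.shape)  # torch.Size([2048, 64])
```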
reset_parameters(val: float = 0.0, std: float = 0.02) → None
Reset module parameters.
This method initializes the parameters of the RotaryRelativePositionBias module (alpha and beta) using a normal distribution. The mean and standard deviation of the distribution can be specified via the parameters val and std.
- Parameters:
- val – Initialization value (mean of the normal distribution). Defaults to 0.0.
- std – Standard deviation of the normal distribution. Defaults to 0.02.
#### Examples
>>> rrp_bias = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> rrp_bias.reset_parameters(val=0.1, std=0.01)
#### NOTE
This method is typically called during the initialization of the module to ensure that parameters are set to reasonable starting values.
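Below is a standalone sketch of this kind of normal-distribution initialization, applied to free-standing (1, size) parameters rather than the module's own alpha and beta; the names and shapes are illustrative.

```python
import torch

size = 128
alpha = torch.nn.Parameter(torch.empty(1, size))
beta = torch.nn.Parameter(torch.empty(1, size))

# Draw both parameter sets from a normal distribution N(val, std^2),
# mirroring the defaults documented above.
val, std = 0.0, 0.02
torch.nn.init.normal_(alpha, mean=val, std=std)
torch.nn.init.normal_(beta, mean=val, std=std)
```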
rotary(x: Tensor) → Tensor
Compute rotary positional embeddings.

This method applies the rotary transform to the input tensor, using the sine and cosine components of the sinusoidal embeddings to encode each position.

- Parameters: x – Input sequences. (L, size)
- Returns: Rotary positional embeddings. (L, size)
- Return type: torch.Tensor
- Raises: ValueError – If the input length exceeds the maximum number of positions.

#### Examples
>>> rrp_bias = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> x = torch.randn(10, 128)  # Example input (L, size)
>>> x_rot = rrp_bias.rotary(x)

#### NOTE
The rotary embeddings are computed from the input sequence length, which allows the bias to adapt to varying input sizes.
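The sketch below shows one common way a rotary transform of this kind is applied: the input is split into two halves that are rotated with precomputed sine/cosine tables. The helper, the exact rotation formula, and the placeholder tables are assumptions about the internals, not the module's verified code.

```python
import torch


def apply_rotary(x: torch.Tensor, sine: torch.Tensor, cosine: torch.Tensor) -> torch.Tensor:
    """Sketch of a rotary transform over the two halves of x."""
    length = x.size(0)
    x1, x2 = torch.chunk(x, 2, dim=-1)          # each (length, size // 2)
    sin, cos = sine[:length], cosine[:length]   # truncate to the sequence length
    # Standard 2D rotation applied pairwise to the two halves.
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)


length, size = 10, 128
x = torch.randn(length, size)
# Placeholder sine/cosine tables with a (max_positions, size // 2) layout.
sine = torch.randn(2048, size // 2)
cosine = torch.randn(2048, size // 2)
print(apply_rotary(x, sine, cosine).shape)  # torch.Size([10, 128])
```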