espnet2.asr_transducer.decoder.modules.mega.positional_bias.RotaryRelativePositionBias
class espnet2.asr_transducer.decoder.modules.mega.positional_bias.RotaryRelativePositionBias(size: int, max_positions: int = 2048)
Bases: Module
RotaryRelativePositionBias module definition.
This module computes rotary relative position biases using sinusoidal positional embeddings. It is designed to enhance the performance of transformer models by providing a mechanism to capture relative positional information.
- Parameters:
- size – Module embedding size.
- max_positions – Maximum number of relative positions (default is 2048).
sine
Sine components of the sinusoidal embeddings.
cosine
Cosine components of the sinusoidal embeddings.
alpha
Learnable parameter representing one set of positional embeddings.
beta
Learnable parameter representing another set of positional embeddings.
size
The embedding size for the module.
max_positions
The maximum number of positions for relative bias.
#### Examples
>>> rpb = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> output_bias = rpb.forward(length=10)
>>> print(output_bias.shape)  # Output shape is (10, 10)
torch.Size([10, 10])
#### NOTE
This module is based on research and implementations from Facebook Research’s MEGA project.

- Raises: ValueError – If the length exceeds the maximum number of positions.
Construct a RotaryRelativePositionBias object.
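The sketch below illustrates one way the (L, L) bias produced by this module might be added to attention scores before the softmax. The score tensor, head count, and variable names are placeholders for illustration and are not part of this module's API.

```python
import torch

from espnet2.asr_transducer.decoder.modules.mega.positional_bias import (
    RotaryRelativePositionBias,
)

# Hypothetical attention scores for 4 heads over a sequence of length 10.
length, num_heads = 10, 4
scores = torch.randn(num_heads, length, length)

rel_pos_bias = RotaryRelativePositionBias(size=128, max_positions=2048)

# The (L, L) bias is broadcast over the head dimension and added to the
# raw scores before the softmax.
bias = rel_pos_bias(length)  # shape: (10, 10)
weights = torch.softmax(scores + bias, dim=-1)
```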
forward(length: int) → Tensor
Compute rotary relative position bias.
This method calculates the rotary relative position bias based on the input sequence length. It generates the bias using rotary positional embeddings computed from the module’s parameters.
- Parameters: length – Sequence length. This should not exceed the maximum number of relative positions defined during the initialization of the module.
- Returns: bias – Rotary relative position bias of shape (L, L), where L is the sequence length.
- Return type: torch.Tensor
- Raises: ValueError – If the input length exceeds the maximum number of positions defined during the initialization of the module.
#### Examples
>>> rotary_bias = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> bias_matrix = rotary_bias.forward(length=10)
>>> print(bias_matrix.shape)
torch.Size([10, 10])
#### NOTE
The method applies the rotary transform to the parameters alpha and beta and then computes the bias with an einsum operation.
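As a rough, standalone sketch of the computation described in the note above (rotary embeddings of alpha and beta combined with an einsum), assuming alpha and beta are (1, size) parameters and rotary returns a (length, size) tensor; the function and argument names here are illustrative, not the module's actual internals.

```python
import torch


def rotary_relative_bias(alpha, beta, rotary, length, size, max_positions):
    """Sketch of forward(): pairwise products of rotary-embedded alpha and beta."""
    if length > max_positions:
        raise ValueError(f"Length {length} exceeds max_positions {max_positions}.")
    a = rotary(alpha.expand(length, size))  # (length, size)
    b = rotary(beta.expand(length, size))   # (length, size)
    # Pairwise dot products between the two sets of rotary embeddings
    # give the (length, length) relative position bias.
    return torch.einsum("md,nd->mn", a, b)
```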
static get_sinusoid_embeddings(max_positions: int, size: int) → Tuple[Tensor, Tensor]
Compute sinusoidal positional embeddings.

This static method generates the sine and cosine components used to build the rotary relative position bias. The embeddings are computed for the specified maximum number of relative positions and embedding size.

- Parameters:
  - max_positions (int) – Maximum number of relative positions.
  - size (int) – Module embedding size.
- Returns: Sine and cosine components of the sinusoidal embeddings.
- Return type: Tuple[torch.Tensor, torch.Tensor]

#### Examples
>>> sine, cosine = RotaryRelativePositionBias.get_sinusoid_embeddings(2048, 128)

#### NOTE
The sinusoidal embeddings are calculated with sine and cosine functions, which lets the model represent relative positions in a continuous manner.
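The following is a minimal sketch of how such sinusoidal embeddings are commonly computed, assuming the usual half-dimension, log-spaced frequency schedule with base 10000; the helper name and exact schedule are assumptions rather than the library's verified implementation.

```python
import math
from typing import Tuple

import torch


def sinusoid_embeddings(max_positions: int, size: int) -> Tuple[torch.Tensor, torch.Tensor]:
    """Sketch: sine/cosine embeddings over half the embedding size."""
    half_size = size // 2
    # Inverse frequencies spaced geometrically between 1 and 1/10000.
    inv_freq = torch.exp(
        torch.arange(half_size, dtype=torch.float) * -(math.log(10000.0) / half_size)
    )
    # Outer product of positions and inverse frequencies: (max_positions, size // 2).
    angles = torch.arange(max_positions, dtype=torch.float).unsqueeze(1) * inv_freq.unsqueeze(0)
    return torch.sin(angles), torch.cos(angles)


sine, cosine = sinusoid_embeddings(max_positions=2048, size=128)
print(sine.shape)  # torch.Size([2048, 64])
```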
reset_parameters(val: float = 0.0, std: float = 0.02) → None
Reset module parameters.
This method initializes the parameters of the RotaryRelativePositionBias module (alpha and beta) using a normal distribution. The mean and standard deviation of the distribution can be specified via the parameters val and std.
- Parameters:
- val – Initialization value (mean of the normal distribution). Defaults to 0.0.
- std – Standard deviation of the normal distribution. Defaults to 0.02.
#### Examples
>>> rrp_bias = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> rrp_bias.reset_parameters(val=0.1, std=0.01)
#### NOTE
This method is typically called during the initialization of the module to ensure that parameters are set to reasonable starting values.
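Below is a standalone sketch of this kind of normal-distribution initialization, applied to free-standing (1, size) parameters rather than the module's own alpha and beta; the names and shapes are illustrative.

```python
import torch

size = 128
alpha = torch.nn.Parameter(torch.empty(1, size))
beta = torch.nn.Parameter(torch.empty(1, size))

# Draw both parameter sets from a normal distribution N(val, std^2),
# mirroring the defaults documented above.
val, std = 0.0, 0.02
torch.nn.init.normal_(alpha, mean=val, std=std)
torch.nn.init.normal_(beta, mean=val, std=std)
```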
rotary(x: Tensor) → Tensor
Compute rotary positional embeddings.

This method applies the rotary transform to the input tensor, using the sine and cosine components of the sinusoidal embeddings to encode each position.

- Parameters: x – Input sequences. (L, size)
- Returns: Rotary positional embeddings. (L, size)
- Return type: torch.Tensor
- Raises: ValueError – If the input length exceeds the maximum number of positions.

#### Examples
>>> rrp_bias = RotaryRelativePositionBias(size=128, max_positions=2048)
>>> x = torch.randn(10, 128)  # Example input (L, size)
>>> x_rot = rrp_bias.rotary(x)

#### NOTE
The rotary embeddings are computed from the input sequence length, which allows the bias to adapt to varying input sizes.
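The sketch below shows one common way a rotary transform of this kind is applied: the input is split into two halves that are rotated with precomputed sine/cosine tables. The helper, the exact rotation formula, and the placeholder tables are assumptions about the internals, not the module's verified code.

```python
import torch


def apply_rotary(x: torch.Tensor, sine: torch.Tensor, cosine: torch.Tensor) -> torch.Tensor:
    """Sketch of a rotary transform over the two halves of x."""
    length = x.size(0)
    x1, x2 = torch.chunk(x, 2, dim=-1)          # each (length, size // 2)
    sin, cos = sine[:length], cosine[:length]   # truncate to the sequence length
    # Standard 2D rotation applied pairwise to the two halves.
    return torch.cat([x1 * cos - x2 * sin, x2 * cos + x1 * sin], dim=-1)


length, size = 10, 128
x = torch.randn(length, size)
# Placeholder sine/cosine tables with a (max_positions, size // 2) layout.
sine = torch.randn(2048, size // 2)
cosine = torch.randn(2048, size // 2)
print(apply_rotary(x, sine, cosine).shape)  # torch.Size([10, 128])
```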