espnet2.asr_transducer.encoder.modules.attention.RelPositionMultiHeadedAttention
class espnet2.asr_transducer.encoder.modules.attention.RelPositionMultiHeadedAttention(num_heads: int, embed_size: int, dropout_rate: float = 0.0, simplified_attention_score: bool = False)
Bases: Module
RelPositionMultiHeadedAttention definition.
- Parameters:
- num_heads – Number of attention heads.
- embed_size – Embedding size.
- dropout_rate – Dropout rate.
- simplified_attention_score – Whether to use the simplified attention score computation.
Construct a RelPositionMultiHeadedAttention object.
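A minimal construction sketch; the head count, embedding size, and dropout rate below are illustrative values, not taken from any recipe:

```python
from espnet2.asr_transducer.encoder.modules.attention import (
    RelPositionMultiHeadedAttention,
)

# 4 heads over a 256-dim embedding, i.e. d_k = embed_size / num_heads = 64.
mha = RelPositionMultiHeadedAttention(
    num_heads=4,
    embed_size=256,
    dropout_rate=0.1,
)
```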
compute_attention_score(query: Tensor, key: Tensor, pos_enc: Tensor, left_context: int = 0) → Tensor
Attention score computation.
- Parameters:
- query – Transformed query tensor. (B, H, T_1, d_k)
- key – Transformed key tensor. (B, H, T_2, d_k)
- pos_enc – Positional embedding tensor. (B, 2 * T_1 - 1, size)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Attention score. (B, H, T_1, T_2)
compute_simplified_attention_score(query: Tensor, key: Tensor, pos_enc: Tensor, left_context: int = 0) → Tensor
Simplified attention score computation.
Reference: https://github.com/k2-fsa/icefall/pull/458
- Parameters:
- query – Transformed query tensor. (B, H, T_1, d_k)
- key – Transformed key tensor. (B, H, T_2, d_k)
- pos_enc – Positional embedding tensor. (B, 2 * T_1 - 1, size)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Attention score. (B, H, T_1, T_2)
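This method is not usually called directly; which scoring path forward() takes is presumably selected at construction time through the simplified_attention_score flag shown in the class signature:

```python
from espnet2.asr_transducer.encoder.modules.attention import (
    RelPositionMultiHeadedAttention,
)

# Same external interface; only the internal score computation changes.
mha = RelPositionMultiHeadedAttention(
    num_heads=4, embed_size=256, simplified_attention_score=True
)
```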
forward(query: Tensor, key: Tensor, value: Tensor, pos_enc: Tensor, mask: Tensor, chunk_mask: Tensor | None = None, left_context: int = 0) → Tensor
Compute scaled dot product attention with rel. positional encoding.
- Parameters:
- query – Query tensor. (B, T_1, size)
- key – Key tensor. (B, T_2, size)
- value – Value tensor. (B, T_2, size)
- pos_enc – Positional embedding tensor. (B, 2 * T_1 - 1, size)
- mask – Source mask. (B, T_2)
- chunk_mask – Chunk mask. (T_1, T_1)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Output tensor. (B, T_1, H * d_k)
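A hedged end-to-end sketch of a self-attention call with T_1 = T_2 = T. The positional embedding is normally produced by the encoder's relative positional encoding module; a random tensor of the documented shape stands in for it here, and the boolean source mask is assumed to flag frames that should be ignored (all False below, i.e. no padding):

```python
import torch

from espnet2.asr_transducer.encoder.modules.attention import (
    RelPositionMultiHeadedAttention,
)

B, T, size, num_heads = 2, 16, 256, 4
mha = RelPositionMultiHeadedAttention(num_heads=num_heads, embed_size=size)

x = torch.randn(B, T, size)                 # query = key = value for self-attention
pos_enc = torch.randn(B, 2 * T - 1, size)   # stand-in for the (B, 2 * T_1 - 1, size) rel. pos. embeddings
mask = torch.zeros(B, T, dtype=torch.bool)  # (B, T_2); assumed True = masked (padded) frame

out = mha(x, x, x, pos_enc, mask)
print(out.shape)  # expected: torch.Size([2, 16, 256]) == (B, T_1, H * d_k)
```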
forward_attention(value: Tensor, scores: Tensor, mask: Tensor, chunk_mask: Tensor | None = None) → Tensor
Compute attention context vector.
- Parameters:
- value – Transformed value. (B, H, T_2, d_k)
- scores – Attention score. (B, H, T_1, T_2)
- mask – Source mask. (B, T_2)
- chunk_mask – Chunk mask. (T_1, T_1)
- Returns: Transformed value weighted by attention score. (B, T_1, H * d_k)
- Return type: attn_output
forward_qkv(query: Tensor, key: Tensor, value: Tensor) → Tuple[Tensor, Tensor, Tensor]
Transform query, key and value.
- Parameters:
- query – Query tensor. (B, T_1, size)
- key – Key tensor. (B, T_2, size)
- value – Value tensor. (B, T_2, size)
- Returns:
  - q: Transformed query tensor. (B, H, T_1, d_k)
  - k: Transformed key tensor. (B, H, T_2, d_k)
  - v: Transformed value tensor. (B, H, T_2, d_k)
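Judging from the documented shapes, forward() composes forward_qkv, one of the score computations, and forward_attention. The sketch below chains them manually to inspect intermediate shapes; it mirrors, but is not guaranteed to match exactly, the module's internal control flow:

```python
import torch

from espnet2.asr_transducer.encoder.modules.attention import (
    RelPositionMultiHeadedAttention,
)

B, T, size, num_heads = 2, 16, 256, 4
mha = RelPositionMultiHeadedAttention(num_heads=num_heads, embed_size=size)

x = torch.randn(B, T, size)
pos_enc = torch.randn(B, 2 * T - 1, size)
mask = torch.zeros(B, T, dtype=torch.bool)

q, k, v = mha.forward_qkv(x, x, x)                   # each (B, H, T, d_k)
scores = mha.compute_attention_score(q, k, pos_enc)  # (B, H, T_1, T_2)
out = mha.forward_attention(v, scores, mask)         # (B, T_1, H * d_k)
print(q.shape, scores.shape, out.shape)
```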
rel_shift(x: Tensor, left_context: int = 0) → Tensor
Compute relative positional encoding.
- Parameters:
- x – Input sequence. (B, H, T_1, 2 * T_1 - 1)
- left_context – Number of previous frames to use for current chunk attention computation.
- Returns: Output sequence. (B, H, T_1, T_2)
- Return type: x
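In Transformer-XL style relative attention, this shift re-indexes scores laid out over relative offsets (last dimension 2 * T_1 - 1) into scores indexed by absolute key positions. A hedged shape check, with left_context = 0 so that T_2 = T_1:

```python
import torch

from espnet2.asr_transducer.encoder.modules.attention import (
    RelPositionMultiHeadedAttention,
)

B, H, T = 2, 4, 16
mha = RelPositionMultiHeadedAttention(num_heads=H, embed_size=256)

x = torch.randn(B, H, T, 2 * T - 1)  # (B, H, T_1, 2 * T_1 - 1)
shifted = mha.rel_shift(x)           # (B, H, T_1, T_2); here T_2 = T_1
print(shifted.shape)                 # expected: torch.Size([2, 4, 16, 16])
```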
