espnet2.asr.encoder.e_branchformer_ctc_encoder.EBranchformerEncoderLayer
class espnet2.asr.encoder.e_branchformer_ctc_encoder.EBranchformerEncoderLayer(size: int, attn: Module, cgmlp: Module, feed_forward: Module | None, feed_forward_macaron: Module | None, cross_attn: Module | None, dropout_rate: float, merge_conv_kernel: int = 3)
Bases: Module
E-Branchformer encoder layer module.
This layer implements an enhanced version of the E-Branchformer encoder layer with an optional cross-attention module. Input sequences are processed through parallel self-attention and convolutional gating (cgMLP) branches whose outputs are merged, as sketched below.
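The two branches are merged by concatenating their outputs, applying a depth-wise convolution over time, and projecting back to the model dimension. A minimal sketch of that merge step (illustrative only; x1 and x2 stand for the attention- and cgMLP-branch outputs, and the module attributes are those listed below):
>>> x_concat = torch.cat([x1, x2], dim=-1)  # (#batch, time, 2 * size)
>>> x_tmp = depthwise_conv_fusion(x_concat.transpose(1, 2)).transpose(1, 2)
>>> x = x + dropout(merge_proj(x_concat + x_tmp))  # residual, back to (#batch, time, size)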
size
The dimension of the model.
- Type: int
attn
The attention module, either standard or efficient.
- Type: torch.nn.Module
cgmlp
The Convolutional Gating MLP module.
- Type: torch.nn.Module
feed_forward
The feed-forward module, if applicable.
- Type: Optional[torch.nn.Module]
feed_forward_macaron
A macaron-style feed-forward module, if applicable.
- Type: Optional[torch.nn.Module]
cross_attn
The cross-attention module, if applicable.
- Type: Optional[torch.nn.Module]
dropout
The dropout layer for regularization.
- Type: torch.nn.Dropout
depthwise_conv_fusion
The depthwise convolution layer for merging branches.
- Type: torch.nn.Conv1d
merge_proj
The linear projection layer for merging outputs.
- Type: torch.nn.Linear
Parameters:
- size (int) – Model dimension.
- attn (torch.nn.Module) – Attention module (self-attention or efficient attention).
- cgmlp (torch.nn.Module) – Convolutional Gating MLP.
- feed_forward (Optional[torch.nn.Module]) – Feed-forward module.
- feed_forward_macaron (Optional[torch.nn.Module]) – Macaron-style feed-forward module.
- cross_attn (Optional[torch.nn.Module]) – Cross-attention module.
- dropout_rate (float) – Dropout probability.
- merge_conv_kernel (int) – Kernel size of the depth-wise conv in the merge module.
Raises: NotImplementedError – If cache is provided in the forward pass, as this functionality is not implemented.
####### Examples
>>> import torch
>>> from espnet.nets.pytorch_backend.transformer.attention import MultiHeadedAttention
>>> from espnet2.asr.layers.cgmlp import ConvolutionalGatingMLP
>>> from espnet2.asr.encoder.e_branchformer_ctc_encoder import EBranchformerEncoderLayer
>>> encoder_layer = EBranchformerEncoderLayer(
... size=256,
... attn=MultiHeadedAttention(4, 256, 0.1),
... cgmlp=ConvolutionalGatingMLP(
... 256, 2048, 31, 0.1,
... use_linear_after_conv=False,  # required cgMLP arguments (typical values)
... gate_activation="identity",
... ),
... feed_forward=None,
... feed_forward_macaron=None,
... cross_attn=None,
... dropout_rate=0.1
... )
>>> x_input = torch.randn(32, 10, 256) # (batch, time, size)
>>> mask = torch.ones(32, 1, 10) # (batch, 1, time)
>>> output, output_mask = encoder_layer(x_input, mask)
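If the preceding embedding layer produces a relative positional encoding, the input may instead be passed as a tuple; the layer then returns the positional embedding alongside the output (a sketch using a randomly initialized tensor of the documented shape):
>>> pos_emb = torch.randn(1, 10, 256)  # (1, time, size)
>>> (output, pos_emb_out), output_mask = encoder_layer((x_input, pos_emb), mask)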
forward(x_input, mask, cache=None, memory=None, memory_mask=None)
Compute encoded features.
This method processes the input tensor through the E-Branchformer encoder layer. It utilizes both self-attention and convolutional gating mechanisms, merging the results to produce the final output tensor.
- Parameters:
x_input (Union[Tuple, torch.Tensor]) – Input tensor, with or without positional embedding. Either:
- A tuple containing:
  - torch.Tensor: Input tensor of shape (#batch, time, size).
  - torch.Tensor: Positional embedding tensor of shape (1, time, size).
- A torch.Tensor of shape (#batch, time, size) without positional embedding.
mask (torch.Tensor) – Mask tensor for the input with shape (#batch, 1, time) to indicate valid positions.
cache (torch.Tensor, optional) – Cache tensor of the input with shape (#batch, time - 1, size). If provided, the function raises NotImplementedError.
memory (torch.Tensor, optional) – Memory tensor for cross-attention, with shape (#batch, memory_time, size).
memory_mask (torch.Tensor, optional) – Mask for the memory tensor, with shape (#batch, 1, memory_time).
- Returns: A tuple of the encoded output and the mask.
  If positional embedding is provided:
  - Tuple[torch.Tensor, torch.Tensor]: Output tensor of shape (#batch, time, size) together with the positional embedding tensor.
  - torch.Tensor: Mask tensor of shape (#batch, 1, time).
  Otherwise:
  - torch.Tensor: Output tensor of shape (#batch, time, size).
  - torch.Tensor: Mask tensor of shape (#batch, 1, time).
- Return type: Tuple[Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor], torch.Tensor]
- Raises: NotImplementedError – If cache is not None.
####### Examples
>>> layer = EBranchformerEncoderLayer(...)
>>> input_tensor = torch.randn(2, 10, 256) # (batch, time, size)
>>> mask = torch.ones(2, 1, 10) # (batch, 1, time)
>>> output, output_mask = layer(input_tensor, mask)
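If the layer was constructed with a cross-attention module, a memory sequence can also be attended to (a sketch; assumes cross_attn was supplied at construction rather than None):
>>> memory = torch.randn(2, 20, 256)  # (#batch, memory_time, size)
>>> memory_mask = torch.ones(2, 1, 20)  # (#batch, 1, memory_time)
>>> output, output_mask = layer(input_tensor, mask, memory=memory, memory_mask=memory_mask)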
NOTE
The cache parameter is not implemented and will raise an error if provided.