espnet2.asr.encoder.e_branchformer_ctc_encoder.EBranchformerEncoderLayer
class espnet2.asr.encoder.e_branchformer_ctc_encoder.EBranchformerEncoderLayer(size: int, attn: Module, cgmlp: Module, feed_forward: Module | None, feed_forward_macaron: Module | None, cross_attn: Module | None, dropout_rate: float, merge_conv_kernel: int = 3)
Bases: Module
E-Branchformer encoder layer module.
This layer implements an enhanced version of the E-Branchformer encoder layer with an optional cross-attention module. Input sequences are processed through parallel self-attention and convolutional gating (cgMLP) branches whose outputs are merged, as sketched below.
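The two branches are merged by concatenating their outputs, applying a depth-wise convolution over time, and projecting back to the model dimension. A minimal sketch of that merge step (illustrative only; x1 and x2 stand for the attention- and cgMLP-branch outputs, and the module attributes are those listed below):
>>> x_concat = torch.cat([x1, x2], dim=-1)  # (#batch, time, 2 * size)
>>> x_tmp = depthwise_conv_fusion(x_concat.transpose(1, 2)).transpose(1, 2)
>>> x = x + dropout(merge_proj(x_concat + x_tmp))  # residual, back to (#batch, time, size)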
size
The dimension of the model.
- Type: int
attn
The attention module, either standard or efficient.
- Type: torch.nn.Module
cgmlp
The Convolutional Gating MLP module.
- Type: torch.nn.Module
feed_forward
The feed-forward module, if applicable.
- Type: Optional[torch.nn.Module]
feed_forward_macaron
A macaron-style feed-forward module, if applicable.
- Type: Optional[torch.nn.Module]
cross_attn
The cross-attention module, if applicable.
- Type: Optional[torch.nn.Module]
dropout
The dropout layer for regularization.
- Type: torch.nn.Dropout
depthwise_conv_fusion
The depthwise convolution layer for merging branches.
- Type: torch.nn.Conv1d
merge_proj
The linear projection layer for merging outputs.
- Type: torch.nn.Linear
Parameters:
- size (int) – Model dimension.
- attn (torch.nn.Module) – Attention module (self-attention or efficient attention).
- cgmlp (torch.nn.Module) – Convolutional Gating MLP.
- feed_forward (Optional[torch.nn.Module]) – Feed-forward module.
- feed_forward_macaron (Optional[torch.nn.Module]) – Macaron-style feed-forward module.
- cross_attn (Optional[torch.nn.Module]) – Cross-attention module.
- dropout_rate (float) – Dropout probability.
- merge_conv_kernel (int) – Kernel size of the depth-wise conv in the merge module.
Raises: NotImplementedError – If cache is provided in the forward pass, as this functionality is not implemented.
####### Examples
>>> import torch
>>> from espnet.nets.pytorch_backend.transformer.attention import MultiHeadedAttention
>>> from espnet2.asr.layers.cgmlp import ConvolutionalGatingMLP
>>> from espnet2.asr.encoder.e_branchformer_ctc_encoder import EBranchformerEncoderLayer
>>> encoder_layer = EBranchformerEncoderLayer(
... size=256,
... attn=MultiHeadedAttention(4, 256, 0.1),
... cgmlp=ConvolutionalGatingMLP(
... 256, 2048, 31, 0.1,
... use_linear_after_conv=False,  # required cgMLP arguments (typical values)
... gate_activation="identity",
... ),
... feed_forward=None,
... feed_forward_macaron=None,
... cross_attn=None,
... dropout_rate=0.1
... )
>>> x_input = torch.randn(32, 10, 256) # (batch, time, size)
>>> mask = torch.ones(32, 1, 10) # (batch, 1, time)
>>> output, output_mask = encoder_layer(x_input, mask)
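If the preceding embedding layer produces a relative positional encoding, the input may instead be passed as a tuple; the layer then returns the positional embedding alongside the output (a sketch using a randomly initialized tensor of the documented shape):
>>> pos_emb = torch.randn(1, 10, 256)  # (1, time, size)
>>> (output, pos_emb_out), output_mask = encoder_layer((x_input, pos_emb), mask)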
forward(x_input, mask, cache=None, memory=None, memory_mask=None)
Compute encoded features.
This method processes the input tensor through the E-Branchformer encoder layer. It utilizes both self-attention and convolutional gating mechanisms, merging the results to produce the final output tensor.
- Parameters:
x_input (Union[Tuple, torch.Tensor]) – Input tensor, with or without positional embedding. Either:
- A tuple containing:
  - torch.Tensor: Input tensor of shape (#batch, time, size).
  - torch.Tensor: Positional embedding tensor of shape (1, time, size).
- A torch.Tensor of shape (#batch, time, size) without positional embedding.
mask (torch.Tensor) – Mask tensor for the input with shape (#batch, 1, time) to indicate valid positions.
cache (torch.Tensor, optional) – Cache tensor of the input with shape (#batch, time - 1, size). If provided, the function raises NotImplementedError.
memory (torch.Tensor, optional) – Memory tensor for cross-attention, with shape (#batch, memory_time, size).
memory_mask (torch.Tensor, optional) – Mask for the memory tensor, with shape (#batch, 1, memory_time).
- Returns: A tuple of the encoded output and the mask.
  If positional embedding is provided:
  - Tuple[torch.Tensor, torch.Tensor]: Output tensor of shape (#batch, time, size) together with the positional embedding tensor.
  - torch.Tensor: Mask tensor of shape (#batch, 1, time).
  Otherwise:
  - torch.Tensor: Output tensor of shape (#batch, time, size).
  - torch.Tensor: Mask tensor of shape (#batch, 1, time).
- Return type: Tuple[Union[Tuple[torch.Tensor, torch.Tensor], torch.Tensor], torch.Tensor]
- Raises: NotImplementedError – If cache is not None.
####### Examples
>>> layer = EBranchformerEncoderLayer(...)
>>> input_tensor = torch.randn(2, 10, 256) # (batch, time, size)
>>> mask = torch.ones(2, 1, 10) # (batch, 1, time)
>>> output, output_mask = layer(input_tensor, mask)
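If the layer was constructed with a cross-attention module, a memory sequence can also be attended to (a sketch; assumes cross_attn was supplied at construction rather than None):
>>> memory = torch.randn(2, 20, 256)  # (#batch, memory_time, size)
>>> memory_mask = torch.ones(2, 1, 20)  # (#batch, 1, memory_time)
>>> output, output_mask = layer(input_tensor, mask, memory=memory, memory_mask=memory_mask)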
NOTE
The cache parameter is not implemented and will raise an error if provided.