espnet2.asr_transducer.decoder.modules.mega.feed_forward.NormalizedPositionwiseFeedForward
class espnet2.asr_transducer.decoder.modules.mega.feed_forward.NormalizedPositionwiseFeedForward(size: int, hidden_size: int, normalization: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, dropout_rate: float = 0.0)
Bases: Module
Normalized position-wise feed-forward module for MEGA block.
This module implements a normalized position-wise feed-forward layer of the kind commonly used in transformer architectures. The input is projected to a hidden size, passed through an activation function and dropout, projected back to the input size, combined with the input through a residual connection, and normalized (a functional sketch follows the examples below).
linear1
First linear transformation layer.
- Type: torch.nn.Linear
linear2
Second linear transformation layer.
- Type: torch.nn.Linear
normalization
Normalization module to apply.
- Type: torch.nn.Module
activation
Activation function to apply.
- Type: torch.nn.Module
dropout
Dropout layer for regularization.
- Type: torch.nn.Dropout
hidden_dropout
Dropout layer for hidden units.
- Type: torch.nn.Dropout
Parameters:
- size (int) – Input/Output size.
- hidden_size (int) – Hidden size.
- normalization (torch.nn.Module, optional) – Normalization module (default: torch.nn.LayerNorm).
- activation (torch.nn.Module, optional) – Activation function (default: torch.nn.ReLU).
- dropout_rate (float, optional) – Dropout rate (default: 0.0).
######### Examples
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> input_tensor = torch.randn(32, 10, 512) # (B, L, size)
>>> output_tensor = ff(input_tensor)
>>> output_tensor.shape
torch.Size([32, 10, 512])
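Concretely, the computation chains the documented submodules. The following is a minimal functional sketch of that data flow (illustrative only, not the ESPnet source; it assumes the default LayerNorm/ReLU configuration and omits the two dropout layers for clarity):
>>> import torch
>>> linear1 = torch.nn.Linear(512, 2048)  # size -> hidden_size
>>> linear2 = torch.nn.Linear(2048, 512)  # hidden_size -> size
>>> norm = torch.nn.LayerNorm(512)
>>> x = torch.randn(32, 10, 512)  # (B, L, size)
>>> y = norm(linear2(torch.relu(linear1(x))) + x)  # expand, activate, project, residual, normalize
>>> y.shape
torch.Size([32, 10, 512])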
####### NOTE The module is designed to be used within a larger neural network architecture, in particular the MEGA block of the ESPnet2 Transducer decoder.
- Raises: ValueError – If size or hidden_size is non-positive.
Construct a NormalizedPositionwiseFeedForward object.
forward(x: Tensor) → Tensor
Compute feed-forward module.
This method applies a two-layer feed-forward network with a residual connection and normalization. It first passes the input through a linear layer and an activation function, applies dropout, and passes the result through a second linear layer. The input is then added back to the output (residual connection), and finally the sum is normalized. Both dropout layers are inactive in evaluation mode (a quick check appears after the examples below).
- Parameters: x – Input tensor of shape (B, L, size), where B is the batch size, L is the sequence length, and size is the input/output size.
- Returns: Output tensor of the same shape as input (B, L, size).
- Return type: torch.Tensor
######### Examples
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> input_tensor = torch.rand(32, 10, 512) # Batch of 32, sequence length of 10
>>> output_tensor = ff(input_tensor)  # calling the module invokes forward()
>>> print(output_tensor.shape)
torch.Size([32, 10, 512])
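Because both dropout layers are standard torch.nn.Dropout modules, they are disabled in evaluation mode, making inference deterministic. A quick check, assuming the construction shown in the examples above:
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048, dropout_rate=0.1)
>>> _ = ff.eval()  # disables both dropout layers
>>> x = torch.randn(2, 5, 512)
>>> torch.equal(ff(x), ff(x))  # same input, same output in eval mode
True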
####### NOTE The activation function and normalization module can be customized during the initialization of the NormalizedPositionwiseFeedForward object.
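For instance, a GELU activation and a pre-built LayerNorm could be supplied at construction time. Note this involves an assumption about the expected argument types: the signature above lists the classes torch.nn.LayerNorm and torch.nn.ReLU as defaults, so whether classes or instantiated modules are expected should be checked against the caller:
>>> ff = NormalizedPositionwiseFeedForward(
...     size=512,
...     hidden_size=2048,
...     normalization=torch.nn.LayerNorm(512),  # assumption: instantiated module
...     activation=torch.nn.GELU(),             # assumption: instantiated module
...     dropout_rate=0.1,
... )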
reset_parameters(val: float = 0.0, std: float = 0.02) → None
Reset module parameters.
This method initializes the weights and biases of the linear layers in the NormalizedPositionwiseFeedForward module. The weights are initialized using a normal distribution with a specified mean and standard deviation, while the biases are initialized to a constant value.
- Parameters:
- val – Initialization value for the biases and the mean of the weight distribution (default: 0.0).
- std – Standard deviation of the normal distribution used to initialize the weights (default: 0.02).
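The described scheme corresponds to the standard torch.nn.init helpers. A minimal sketch of the equivalent effect (illustrative, not the ESPnet source), using the documented linear1/linear2 attributes:
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> for layer in (ff.linear1, ff.linear2):
...     torch.nn.init.normal_(layer.weight, mean=0.0, std=0.02)  # weights ~ N(val, std)
...     torch.nn.init.constant_(layer.bias, 0.0)                 # biases = val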
######### Examples
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> ff.reset_parameters(val=0.1, std=0.01)
####### NOTE This method should be called if you want to reinitialize the parameters of the module after it has been created or modified.