espnet2.asr_transducer.decoder.modules.mega.feed_forward.NormalizedPositionwiseFeedForward
class espnet2.asr_transducer.decoder.modules.mega.feed_forward.NormalizedPositionwiseFeedForward(size: int, hidden_size: int, normalization: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.normalization.LayerNorm'>, activation: ~torch.nn.modules.module.Module = <class 'torch.nn.modules.activation.ReLU'>, dropout_rate: float = 0.0)
Bases: Module
Normalized position-wise feed-forward module for MEGA block.
This module implements a normalized position-wise feed-forward layer of the kind commonly used in transformer architectures. The input is projected to a hidden size, passed through an activation function and dropout, projected back to the input size, combined with the input through a residual connection, and normalized (a functional sketch follows the examples below).
linear1
First linear transformation layer.
- Type: torch.nn.Linear
linear2
Second linear transformation layer.
- Type: torch.nn.Linear
normalization
Normalization module to apply.
- Type: torch.nn.Module
activation
Activation function to apply.
- Type: torch.nn.Module
dropout
Dropout layer for regularization.
- Type: torch.nn.Dropout
hidden_dropout
Dropout layer for hidden units.
- Type: torch.nn.Dropout
Parameters:
- size (int) – Input/Output size.
- hidden_size (int) – Hidden size.
- normalization (torch.nn.Module, optional) – Normalization module (default: torch.nn.LayerNorm).
- activation (torch.nn.Module, optional) – Activation function (default: torch.nn.ReLU).
- dropout_rate (float, optional) – Dropout rate (default: 0.0).
######### Examples
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> input_tensor = torch.randn(32, 10, 512) # (B, L, size)
>>> output_tensor = ff(input_tensor)
>>> output_tensor.shape
torch.Size([32, 10, 512])
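Concretely, the computation chains the documented submodules. The following is a minimal functional sketch of that data flow (illustrative only, not the ESPnet source; it assumes the default LayerNorm/ReLU configuration and omits the two dropout layers for clarity):
>>> import torch
>>> linear1 = torch.nn.Linear(512, 2048)  # size -> hidden_size
>>> linear2 = torch.nn.Linear(2048, 512)  # hidden_size -> size
>>> norm = torch.nn.LayerNorm(512)
>>> x = torch.randn(32, 10, 512)  # (B, L, size)
>>> y = norm(linear2(torch.relu(linear1(x))) + x)  # expand, activate, project, residual, normalize
>>> y.shape
torch.Size([32, 10, 512])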
####### NOTE The module is designed to be used within a larger neural network architecture, in particular the MEGA block of the ESPnet2 Transducer decoder.
- Raises: ValueError – If size or hidden_size is non-positive.
Construct a NormalizedPositionwiseFeedForward object.
forward(x: Tensor) → Tensor
Compute feed-forward module.
This method applies a two-layer feed-forward network with a residual connection and normalization. It first passes the input through a linear layer and an activation function, applies dropout, and passes the result through a second linear layer. The input is then added back to the output (residual connection), and finally the sum is normalized. Both dropout layers are inactive in evaluation mode (a quick check appears after the examples below).
- Parameters: x – Input tensor of shape (B, L, size), where B is the batch size, L is the sequence length, and size is the input/output size.
- Returns: Output tensor of the same shape as input (B, L, size).
- Return type: torch.Tensor
######### Examples
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> input_tensor = torch.rand(32, 10, 512) # Batch of 32, sequence length of 10
>>> output_tensor = ff(input_tensor)  # calling the module invokes forward()
>>> print(output_tensor.shape)
torch.Size([32, 10, 512])
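Because both dropout layers are standard torch.nn.Dropout modules, they are disabled in evaluation mode, making inference deterministic. A quick check, assuming the construction shown in the examples above:
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048, dropout_rate=0.1)
>>> _ = ff.eval()  # disables both dropout layers
>>> x = torch.randn(2, 5, 512)
>>> torch.equal(ff(x), ff(x))  # same input, same output in eval mode
True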
####### NOTE The activation function and normalization module can be customized during the initialization of the NormalizedPositionwiseFeedForward object.
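For instance, a GELU activation and a pre-built LayerNorm could be supplied at construction time. Note this involves an assumption about the expected argument types: the signature above lists the classes torch.nn.LayerNorm and torch.nn.ReLU as defaults, so whether classes or instantiated modules are expected should be checked against the caller:
>>> ff = NormalizedPositionwiseFeedForward(
...     size=512,
...     hidden_size=2048,
...     normalization=torch.nn.LayerNorm(512),  # assumption: instantiated module
...     activation=torch.nn.GELU(),             # assumption: instantiated module
...     dropout_rate=0.1,
... )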
reset_parameters(val: float = 0.0, std: float = 0.02) → None
Reset module parameters.
This method initializes the weights and biases of the linear layers in the NormalizedPositionwiseFeedForward module. The weights are initialized using a normal distribution with a specified mean and standard deviation, while the biases are initialized to a constant value.
- Parameters:
- val – Initialization value for the biases and the mean of the weight distribution (default: 0.0).
- std – Standard deviation of the normal distribution used to initialize the weights (default: 0.02).
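The described scheme corresponds to the standard torch.nn.init helpers. A minimal sketch of the equivalent effect (illustrative, not the ESPnet source), using the documented linear1/linear2 attributes:
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> for layer in (ff.linear1, ff.linear2):
...     torch.nn.init.normal_(layer.weight, mean=0.0, std=0.02)  # weights ~ N(val, std)
...     torch.nn.init.constant_(layer.bias, 0.0)                 # biases = val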
######### Examples
>>> ff = NormalizedPositionwiseFeedForward(size=512, hidden_size=2048)
>>> ff.reset_parameters(val=0.1, std=0.01)
####### NOTE This method should be called if you want to reinitialize the parameters of the module after it has been created or modified.