espnet2.asr.layers.cgmlp.ConvolutionalSpatialGatingUnit
class espnet2.asr.layers.cgmlp.ConvolutionalSpatialGatingUnit(size: int, kernel_size: int, dropout_rate: float, use_linear_after_conv: bool, gate_activation: str)
Bases: Module
Convolutional Spatial Gating Unit (CSGU) for convolutional gating in MLPs.
This module implements a Convolutional Spatial Gating Unit, which is part of the convolutional gating MLP (cgMLP) architecture. It splits the input tensor into two halves along the channel dimension, applies layer normalization, a convolution, and an optional linear transformation to the gating half, and then multiplies the result with the other half.
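The data flow can be sketched in plain PyTorch as follows. This is a minimal sketch, not the ESPnet implementation: the "same" padding choice and the omission of the optional linear layer, activation, and dropout are simplifying assumptions.
import torch
import torch.nn as nn

class CSGUSketch(nn.Module):
    # Hypothetical sketch of the gating flow described above.
    def __init__(self, size: int, kernel_size: int):
        super().__init__()
        half = size // 2
        self.norm = nn.LayerNorm(half)
        # padding keeps the sequence length unchanged for odd kernel sizes
        self.conv = nn.Conv1d(half, half, kernel_size,
                              padding=(kernel_size - 1) // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_r, x_g = x.chunk(2, dim=-1)  # (N, T, D) -> two (N, T, D/2) halves
        x_g = self.norm(x_g)  # normalize the gating half
        x_g = self.conv(x_g.transpose(1, 2)).transpose(1, 2)  # Conv1d expects (N, C, T)
        return x_r * x_g  # gate one half with the other: (N, T, D/2)

For example, CSGUSketch(64, 3)(torch.randn(10, 20, 64)) has shape (10, 20, 32).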
norm
Layer normalization applied to the gating input.
- Type: LayerNorm
conv
1D convolutional layer applied to the normalized gating input.
- Type: Conv1d
linear
Optional linear layer applied after the convolution.
- Type: Linear or None
act
Activation function applied to the gated output.
- Type: nn.Module
dropout
Dropout layer applied to the output.
- Type: Dropout
Parameters:
- size (int) – The total number of input channels.
- kernel_size (int) – The size of the convolutional kernel.
- dropout_rate (float) – The dropout rate for the output.
- use_linear_after_conv (bool) – Flag indicating if a linear layer should be applied after convolution.
- gate_activation (str) – The activation function to use for the gating mechanism. Can be ‘identity’ or any valid activation name supported by PyTorch.
espnet_initialization_fn()
Initializes weights and biases for the layers.
forward(x: torch.Tensor, gate_add: Optional[torch.Tensor] = None) -> torch.Tensor
Performs the forward pass of the unit.
Examples
>>> csgu = ConvolutionalSpatialGatingUnit(size=64, kernel_size=3,
... dropout_rate=0.1, use_linear_after_conv=True, gate_activation='relu')
>>> input_tensor = torch.randn(10, 20, 64) # (N, T, D)
>>> gate_add_tensor = torch.randn(10, 20, 32) # (N, T, D/2)
>>> output = csgu(input_tensor, gate_add=gate_add_tensor)
>>> print(output.shape) # Should output: torch.Size([10, 20, 32])
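Gating can also be left unmodulated by passing gate_activation='identity' (a hedged variant of the example above; gate_add is optional and omitted here):
>>> csgu_id = ConvolutionalSpatialGatingUnit(size=64, kernel_size=3,
...     dropout_rate=0.0, use_linear_after_conv=False,
...     gate_activation='identity')
>>> csgu_id(input_tensor).shape
torch.Size([10, 20, 32])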
NOTE: The input tensor should have its last dimension equal to size, and if gate_add is provided, its last dimension should equal size / 2.
- Raises: ValueError – If the input tensor dimensions do not match the expected size.
espnet_initialization_fn()
Initializes the weights and biases of the Convolutional Spatial Gating Unit (CSGU) components using a normal distribution for weights and ones for biases.
This method performs the following initializations:
- Initializes the convolutional layer's weights with a normal distribution with mean 0 and standard deviation 1e-6.
- Initializes the convolutional layer's biases to ones.
- If a linear layer is used after the convolution, initializes its weights similarly and its biases to ones.
NOTE: This method is typically called after the model's parameters have been set to ensure that they start from a reasonable point for training.
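In terms of standard torch.nn.init calls, the behavior described above corresponds roughly to the following sketch (not the verbatim ESPnet code; csgu is assumed to be an instance of this class):
import torch

# near-zero weights and unit biases, per the description above
torch.nn.init.normal_(csgu.conv.weight, mean=0.0, std=1e-6)
torch.nn.init.ones_(csgu.conv.bias)
if csgu.linear is not None:
    torch.nn.init.normal_(csgu.linear.weight, mean=0.0, std=1e-6)
    torch.nn.init.ones_(csgu.linear.bias)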
Examples
>>> csgu = ConvolutionalSpatialGatingUnit(size=128, kernel_size=3,
... dropout_rate=0.1, use_linear_after_conv=True,
... gate_activation='relu')
>>> csgu.espnet_initialization_fn() # Initialize parameters
forward(x: torch.Tensor, gate_add: Optional[torch.Tensor] = None) -> torch.Tensor
Performs the forward pass of the Convolutional Spatial Gating Unit (CSGU).
This method applies a convolutional gating mechanism to the input tensor, allowing dynamic modulation of the input based on learned spatial patterns. The input tensor is split into two halves: one half is processed through normalization and convolution to form the gate, which then modulates the other half.
Parameters:
- x (torch.Tensor) – Input tensor of shape (N, T, D), where D equals size and must be even.
- gate_add (torch.Tensor, optional) – Optional tensor of shape (N, T, D/2) added to the gating branch.
Returns: The output tensor of shape (N, T, D/2), where the first half of the input channels has been gated by the processed second half.
Return type: torch.Tensor
Examples
>>> import torch
>>> csgu = ConvolutionalSpatialGatingUnit(size=4, kernel_size=3,
... dropout_rate=0.1,
... use_linear_after_conv=True,
... gate_activation='relu')
>>> x = torch.randn(2, 10, 4) # (N, T, D)
>>> output = csgu(x)
>>> output.shape
torch.Size([2, 10, 2])
NOTE: The input tensor x must have an even number of channels so it can be split into two halves for gating; the output has half as many channels as the input.
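As a quick check of this shape contract (a hedged example mirroring the one above):
>>> x = torch.randn(2, 10, 6)  # D = 6 is even
>>> csgu6 = ConvolutionalSpatialGatingUnit(size=6, kernel_size=3,
...     dropout_rate=0.0, use_linear_after_conv=False,
...     gate_activation='identity')
>>> csgu6(x).shape  # output has D/2 = 3 channels
torch.Size([2, 10, 3])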