espnet2.asr.layers.cgmlp.ConvolutionalGatingMLP
class espnet2.asr.layers.cgmlp.ConvolutionalGatingMLP(size: int, linear_units: int, kernel_size: int, dropout_rate: float, use_linear_after_conv: bool, gate_activation: str)
Bases: Module
Convolutional Gating MLP (cgMLP) class.
This class implements a Convolutional Gating Multi-Layer Perceptron (cgMLP), which uses a convolutional spatial gating unit to process input features. The cgMLP is designed to enhance the representation of sequential data by leveraging both linear and convolutional layers.
channel_proj1
A sequential module consisting of a linear layer followed by a GELU activation.
Type: torch.nn.Sequential
csgu
The convolutional spatial gating unit that applies the gating mechanism to the projected input.
Type: ConvolutionalSpatialGatingUnit
channel_proj2
A linear layer that projects the output of the spatial gating unit back to the original feature size.
Type: torch.nn.Linear
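For orientation, the forward pass composes these three submodules roughly as sketched below. This is a minimal illustration of the data flow only (dropout and positional-embedding handling are omitted), not the exact ESPnet implementation.
x = self.channel_proj1(x)   # (N, T, size) -> (N, T, linear_units); Linear + GELU
x = self.csgu(x)            # (N, T, linear_units) -> (N, T, linear_units // 2); convolutional gating
x = self.channel_proj2(x)   # (N, T, linear_units // 2) -> (N, T, size)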
Parameters:
- size (int) – The size of the input features.
- linear_units (int) – The number of units in the linear layer before applying the spatial gating unit.
- kernel_size (int) – The size of the convolutional kernel used in the spatial gating unit.
- dropout_rate (float) – The dropout rate to be applied after the gating operation.
- use_linear_after_conv (bool) – If True, applies a linear layer after the convolutional operation within the spatial gating unit.
- gate_activation (str) – The activation function to use for the gating mechanism. Common options include ‘relu’, ‘sigmoid’, or ‘identity’.
Returns: The output tensor of shape (N, T, D), where N is the batch size, T is the sequence length, and D is the feature size.
Return type: torch.Tensor
####### Examples
>>> model = ConvolutionalGatingMLP(size=256, linear_units=512, kernel_size=3,
... dropout_rate=0.1,
... use_linear_after_conv=True,
... gate_activation='relu')
>>> input_tensor = torch.randn(10, 20, 256) # (N, T, D)
>>> output = model(input_tensor, mask=None)
>>> print(output.shape)
torch.Size([10, 20, 256])  # Output shape matches the input shape
NOTE
This model is particularly useful in scenarios where capturing spatial dependencies in sequential data is crucial, such as in speech recognition tasks.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x, mask)
Forward method for the Convolutional Gating MLP (cgMLP).
This method processes the input tensor through a series of transformations, including linear projections and convolutional gating. It can optionally incorporate a positional embedding if provided.
- Parameters:
- x (Union[torch.Tensor, tuple]) – Input tensor of shape (N, T, D), or a tuple containing:
- xs_pad (torch.Tensor): Input tensor of shape (N, T, D).
- pos_emb (torch.Tensor): Positional embedding tensor of shape (N, T, D).
- mask (torch.Tensor) – A mask tensor used for masking inputs, typically of shape (N, T).
- Returns: If a positional embedding is provided, returns a tuple containing:
- Output tensor of shape (N, T, size) after processing.
- Positional embedding tensor of shape (N, T, D).
If no positional embedding is provided, returns only the output tensor (see the tuple-input sketch at the end of this section).
- Return type: torch.Tensor or tuple
####### Examples
>>> model = ConvolutionalGatingMLP(size=128, linear_units=256,
... kernel_size=3, dropout_rate=0.1,
... use_linear_after_conv=True,
... gate_activation='relu')
>>> input_tensor = torch.randn(32, 10, 128) # (N, T, D)
>>> mask = torch.ones(32, 10) # (N, T)
>>> output = model(input_tensor, mask)
>>> print(output.shape) # Output shape should be (32, 10, 128)
NOTE
The last dimension of the input tensor is expected to equal the size specified when constructing the model.
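Continuing the example above, the following sketches the tuple-input case described in the Returns section. The positional embedding shape here is an assumption made for illustration:
>>> pos_emb = torch.randn(32, 10, 128)  # assumed (N, T, D) positional embedding
>>> out, pos_out = model((input_tensor, pos_emb), mask)
>>> print(out.shape, pos_out.shape)
torch.Size([32, 10, 128]) torch.Size([32, 10, 128])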