espnet2.asr.layers.multiconv_cgmlp.MultiConvolutionalSpatialGatingUnit
class espnet2.asr.layers.multiconv_cgmlp.MultiConvolutionalSpatialGatingUnit(size: int, arch_type: str, kernel_sizes: str, merge_conv_kernel: int, use_non_linear: bool, dropout_rate: float, use_linear_after_conv: bool, activation, gate_activation: str)
Bases: Module
Multi Convolutional Spatial Gating Unit (M-CSGU).
This class implements a multi-convolutional spatial gating unit. It splits the input channels in half, applies several convolutional layers with different kernel sizes to one half, merges their outputs according to the specified architecture type, and uses the result to gate the other half. It is intended as a building block for neural network architectures in speech and audio processing.
norm
Layer normalization applied to the input channels.
- Type: LayerNorm
arch_type
Type of architecture for merging convolutions (‘sum’, ‘weighted_sum’, ‘concat’, or ‘concat_fusion’).
- Type: str
convs
List of convolutional layers applied to the input.
- Type: ModuleList
use_non_linear
Flag indicating whether to apply a non-linear activation after convolution.
- Type: bool
kernel_prob_gen
Module to generate kernel probabilities for weighted sum architecture.
- Type: Sequential
depthwise_conv_fusion
Convolution layer for concatenation fusion architecture.
- Type: Conv1d
linear
Optional linear layer applied after convolutions.
- Type: Linear
model_act
Activation function for the output.
act
Activation function for gating.
dropout
Dropout layer for regularization.
- Type: Dropout
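The arch_type values differ mainly in how the per-kernel branch outputs are combined. The dependency-free sketch below illustrates the idea on plain Python lists for a single time step. The helper names (`merge_sum`, `merge_weighted_sum`, `merge_concat`) and the use of a softmax to produce the kernel probabilities are assumptions made for illustration, not the module's actual code. The 'concat_fusion' variant is omitted; per the attribute list it additionally involves depthwise_conv_fusion.

```python
import math

def merge_sum(branches):
    """Element-wise sum of equally sized branch outputs."""
    return [sum(vals) for vals in zip(*branches)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def merge_weighted_sum(branches, scores):
    """Sum of branch outputs, each scaled by a per-kernel probability."""
    probs = softmax(scores)
    return [sum(p * v for p, v in zip(probs, vals)) for vals in zip(*branches)]

def merge_concat(branches):
    """Concatenate branch outputs along the channel axis."""
    return [v for branch in branches for v in branch]

# Two hypothetical branch outputs (e.g. kernel sizes 3 and 5), 4 channels each.
b3 = [1.0, 2.0, 3.0, 4.0]
b5 = [0.5, 0.5, 0.5, 0.5]

summed = merge_sum([b3, b5])                         # [1.5, 2.5, 3.5, 4.5]
weighted = merge_weighted_sum([b3, b5], [0.0, 0.0])  # equal scores give a plain average
concatenated = merge_concat([b3[:2], b5[:2]])        # narrower branches keep total width 4
```

In the summing variants every branch keeps the full channel width, whereas in the concatenation sketch each branch is narrower so that the combined width stays the same.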
Parameters:
- size (int) – Total number of input channels, which will be split in half.
- arch_type (str) – Type of architecture for merging convolutions.
- kernel_sizes (str) – Comma-separated string of kernel sizes for convolutional layers.
- merge_conv_kernel (int) – Kernel size for the depthwise convolution in concatenation fusion.
- use_non_linear (bool) – Whether to apply a non-linear activation after convolution.
- dropout_rate (float) – Dropout rate for regularization.
- use_linear_after_conv (bool) – Whether to apply a linear layer after convolutions.
- activation – Activation function to be used in the model.
- gate_activation (str) – Activation function to be applied to the gating.
Returns: Output tensor with shape (N, T, D/2).
Return type: out (torch.Tensor)
Raises: NotImplementedError – If an unknown architecture type is specified.
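To make the parameter descriptions concrete, the sketch below shows one plausible way that size and a comma-separated kernel_sizes string translate into per-branch settings. The function `plan_branches` is hypothetical. The channel split for the concatenation variants and the `(k - 1) // 2` padding are assumptions inferred from the documented (N, T, D/2) output shape, not confirmed details of the implementation.

```python
def plan_branches(size, arch_type, kernel_sizes):
    """Return (kernel, padding, out_channels) per branch (illustrative only)."""
    if size % 2 != 0:
        raise ValueError("size must be even: the input is split in half")
    n_channels = size // 2
    kernels = [int(k) for k in kernel_sizes.split(",")]

    if arch_type in ("sum", "weighted_sum"):
        out_channels = n_channels                  # every branch keeps full width
    elif arch_type in ("concat", "concat_fusion"):
        if n_channels % len(kernels) != 0:
            raise ValueError("channels must divide evenly among kernels")
        out_channels = n_channels // len(kernels)  # widths add up to D/2
    else:
        raise NotImplementedError(f"unknown arch_type: {arch_type}")

    # Padding of (k - 1) // 2 keeps the sequence length T unchanged for odd k.
    return [(k, (k - 1) // 2, out_channels) for k in kernels]

print(plan_branches(64, "sum", "3,5"))     # [(3, 1, 32), (5, 2, 32)]
print(plan_branches(64, "concat", "3,5"))  # [(3, 1, 16), (5, 2, 16)]
```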
######### Examples
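>>> import torch
>>> from espnet2.asr.layers.multiconv_cgmlp import MultiConvolutionalSpatialGatingUnit
>>> mcs_gating_unit = MultiConvolutionalSpatialGatingUnit(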
... size=64,
... arch_type='sum',
... kernel_sizes='3,5',
... merge_conv_kernel=3,
... use_non_linear=True,
... dropout_rate=0.1,
... use_linear_after_conv=True,
... activation=torch.nn.ReLU(),
... gate_activation='sigmoid'
... )
>>> input_tensor = torch.randn(10, 20, 64) # (N, T, D)
>>> output_tensor = mcs_gating_unit(input_tensor)
>>> output_tensor.shape
torch.Size([10, 20, 32]) # Output shape is (N, T, D/2)
####### NOTE The input tensor should have an even number of channels, as it is split into two halves for processing.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
espnet_initialization_fn()
Initializes the weights and biases of the convolutional layers.
This method applies a normal distribution initialization to the weights of each convolutional layer and sets the biases to ones. It also handles the initialization for any additional linear layers present in the MultiConvolutionalSpatialGatingUnit.
The initialization strategy is as follows:
- Convolutional weights are drawn from a normal distribution with a standard deviation of 1e-6.
- Biases are initialized to one.
This method is typically called after the model has been created to ensure that the parameters are initialized appropriately before training.
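A short, dependency-free sketch of why this scheme can be useful: with near-zero weights and unit biases, each convolution initially outputs values very close to 1, so (assuming an identity gate activation) the gate starts out passing the ungated half of the input through almost unchanged. The helper `depthwise_conv1d` is a hypothetical single-channel stand-in, not the module's code.

```python
import random

def depthwise_conv1d(signal, weights, bias):
    """Single-channel 1-D convolution with zero padding that preserves length."""
    pad = (len(weights) - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[t + j] for j, w in enumerate(weights)) + bias
        for t in range(len(signal))
    ]

random.seed(0)
kernel_size = 3
weights = [random.gauss(0.0, 1e-6) for _ in range(kernel_size)]  # std = 1e-6
bias = 1.0                                                       # bias = 1

signal = [random.uniform(-5.0, 5.0) for _ in range(10)]
gate = depthwise_conv1d(signal, weights, bias)

# Every gate value is within a tiny margin of 1.0 at initialization.
max_deviation = max(abs(g - 1.0) for g in gate)
print(max_deviation < 1e-3)  # True
```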
######### Examples
>>> mcs_gu = MultiConvolutionalSpatialGatingUnit(size=64,
... arch_type='sum', kernel_sizes='3,5', merge_conv_kernel=3,
... use_non_linear=True, dropout_rate=0.1,
... use_linear_after_conv=True, activation=torch.nn.ReLU(),
... gate_activation='sigmoid')
>>> mcs_gu.espnet_initialization_fn() # Initialize parameters
####### NOTE Ensure that this method is called after creating an instance of the MultiConvolutionalSpatialGatingUnit and before starting training.
- Raises: NotImplementedError – If the architecture type does not match any known configuration.
forward(x, gate_add=None)
Perform the forward pass of the MultiConvolutionalSpatialGatingUnit.
This method takes an input tensor and processes it through multiple convolutional layers, applying spatial gating mechanisms based on the specified architecture type. The output is then modulated by a gating mechanism, which can include an optional additive gate.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (N, T, D), where N is the batch size, T is the sequence length, and D is the number of input channels.
- gate_add (torch.Tensor, optional) – Tensor of shape (N, T, D/2) to be added to the output after gating. If None, no addition is performed.
- Returns: Output tensor of shape (N, T, D/2) after applying convolutions, gating, and dropout.
- Return type: out (torch.Tensor)
######### Examples
>>> model = MultiConvolutionalSpatialGatingUnit(size=64, arch_type='sum',
... kernel_sizes='3,5', merge_conv_kernel=3, use_non_linear=True,
... dropout_rate=0.1, use_linear_after_conv=True,
... activation=torch.nn.ReLU(), gate_activation='sigmoid')
>>> x = torch.randn(32, 10, 64) # Example input
>>> output = model(x)  # calling the module invokes forward()
>>> print(output.shape) # Should output: torch.Size([32, 10, 32])
####### NOTE The input tensor is split along the channel dimension into two halves, referred to here as the real and imaginary components. The imaginary half undergoes normalization and convolution; the output tensor is the gated product of the real half and the processed imaginary half.
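The data flow described in this note can be sketched for a single time step as follows. This is a simplified illustration in plain Python under stated assumptions: `layer_norm` omits the learnable scale and shift, `process` is a hypothetical placeholder for the convolution branches and merge step, and the gate activation is taken to be the identity.

```python
import math

def layer_norm(values, eps=1e-5):
    """Normalize to zero mean and unit variance (no learnable parameters)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def process(values):
    """Placeholder for the multi-kernel convolutions and merge (identity here)."""
    return list(values)

def gating_step(frame):
    """Split one frame of D channels in half and gate the first half."""
    half = len(frame) // 2
    x_r, x_i = frame[:half], frame[half:]  # the two halves of the input
    x_g = process(layer_norm(x_i))         # normalized, then "convolved"
    return [r * g for r, g in zip(x_r, x_g)]

frame = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]  # D = 8
out = gating_step(frame)
print(len(out))  # 4, i.e. D/2
```

The output has half as many channels as the input because only the first half survives, scaled element-wise by the gate computed from the second half.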