espnet2.spk.encoder.ska_tdnn_encoder.cwSKAttention
class espnet2.spk.encoder.ska_tdnn_encoder.cwSKAttention(freq=40, channel=128, kernels=[3, 5], receptive=[3, 5], dilations=[1, 1], reduction=8, groups=1, L=16)
Bases: Module
cwSKAttention is a channel-wise selective kernel attention module that operates on feature maps in a multi-scale fashion. It applies several convolutions with different kernel sizes to capture feature representations at multiple scales and fuses them through a learned attention mechanism (a minimal sketch of this construction follows the NOTE below).
convs
A list of convolutional layers with different kernel sizes for feature extraction.
- Type: nn.ModuleList
avg_pool
Adaptive average pooling layer to reduce feature dimensions.
- Type: nn.AdaptiveAvgPool2d
D
The dimension of the intermediate representation, determined by the reduction factor.
- Type: int
fc
Fully connected layer for transforming the pooled features.
- Type: nn.Linear
relu
ReLU activation function.
- Type: nn.ReLU
fcs
A list of fully connected layers to generate attention weights for each kernel.
- Type: nn.ModuleList
softmax
Softmax layer to normalize the attention weights.
- Type: nn.Softmax
Parameters:
- freq (int) – The frequency dimension of the input features.
- channel (int) – The number of channels in the input feature map.
- kernels (list) – List of kernel sizes to be used in the convolutional layers.
- receptive (list) – List of receptive field sizes corresponding to each kernel.
- dilations (list) – List of dilation rates for each convolutional layer.
- reduction (int) – Reduction factor for the dimensionality of the features.
- groups (int) – Number of groups for grouped convolution.
- L (int) – Lower bound on the dimensionality of the intermediate representation (the intermediate size is the larger of channel // reduction and L).
Returns: The output feature map after applying selective kernel attention.
Return type: Tensor
####### Examples
>>> import torch
>>> model = cwSKAttention(freq=40, channel=128, kernels=[3, 5])
>>> input_tensor = torch.randn(8, 128, 40, 100) # (B, C, F, T)
>>> output = model(input_tensor)
>>> print(output.shape)
torch.Size([8, 128, 40, 100]) # Output shape matches input shape
NOTE
The cwSKAttention module is designed to enhance the representational capacity of convolutional neural networks: through its adaptive attention mechanism, the model learns which features are most important for the task at hand.
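Below is a minimal, self-contained sketch of how a channel-wise selective kernel attention block of this kind can be assembled from the attributes listed above (convs, avg_pool, D, fc, relu, fcs, softmax). The class name SKAttentionSketch, the padding scheme, and the bottleneck rule max(channel // reduction, L) are illustrative assumptions and not the exact ESPnet implementation.

import torch
import torch.nn as nn

class SKAttentionSketch(nn.Module):
    """Illustrative sketch of channel-wise selective kernel attention (not the exact ESPnet code)."""

    def __init__(self, channel=128, kernels=(3, 5), reduction=8, L=16):
        super().__init__()
        # One convolutional branch per kernel size; padding keeps [F, T] unchanged.
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channel, channel, k, padding=k // 2),
                nn.BatchNorm2d(channel),
                nn.ReLU(inplace=True),
            )
            for k in kernels
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # [B, C, F, T] -> [B, C, 1, 1]
        self.D = max(channel // reduction, L)    # bottleneck width (assumed rule)
        self.fc = nn.Linear(channel, self.D)     # shared squeeze transform
        self.relu = nn.ReLU(inplace=True)
        self.fcs = nn.ModuleList(nn.Linear(self.D, channel) for _ in kernels)
        self.softmax = nn.Softmax(dim=0)         # normalize over the kernel branches

    def forward(self, x):
        # x: [B, C, F, T]
        feats = torch.stack([conv(x) for conv in self.convs], dim=0)  # [K, B, C, F, T]
        fused = feats.sum(dim=0)                                      # element-wise fusion
        z = self.relu(self.fc(self.avg_pool(fused).flatten(1)))       # squeeze: [B, D]
        logits = torch.stack([fc(z) for fc in self.fcs], dim=0)       # [K, B, C]
        attn = self.softmax(logits)[..., None, None]                  # [K, B, C, 1, 1]
        return (attn * feats).sum(dim=0)                              # [B, C, F, T]

x = torch.randn(8, 128, 40, 100)       # (B, C, F, T)
print(SKAttentionSketch()(x).shape)    # torch.Size([8, 128, 40, 100])

Every branch sees the same input; the softmax over the branch axis lets the network decide, per channel, how much each kernel size contributes to the fused output.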
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Forward function.
This method performs a forward pass through the cwSKAttention module. It takes an input tensor and applies a series of convolutional layers, attention mechanisms, and activation functions to compute the output tensor.
- Parameters:x (torch.Tensor) – Input tensor of shape [B, C, F, T], where B is the batch size, C is the number of channels, F is the frequency dimension, and T is the time dimension.
- Returns: Output tensor of shape [B, C, F, T] after applying attention and convolutions.
- Return type: torch.Tensor
####### Examples
>>> import torch
>>> model = cwSKAttention()
>>> input_tensor = torch.randn(16, 128, 40, 100) # Batch of 16
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([16, 128, 40, 100])
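As a quick, standalone check of the normalization step described above (the sizes K, B, C are assumed for illustration), the softmax taken over the kernel-branch axis produces per-channel weights that sum to one, so the branch outputs are blended without changing the overall feature scale:

>>> import torch
>>> K, B, C = 2, 8, 128                      # kernel branches, batch, channels
>>> logits = torch.randn(K, B, C)            # one [B, C] logit map per conv branch
>>> weights = torch.softmax(logits, dim=0)   # normalize across the K branches
>>> weights.sum(dim=0).allclose(torch.ones(B, C))
True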