espnet2.spk.encoder.ska_tdnn_encoder.cwSKAttention
class espnet2.spk.encoder.ska_tdnn_encoder.cwSKAttention(freq=40, channel=128, kernels=[3, 5], receptive=[3, 5], dilations=[1, 1], reduction=8, groups=1, L=16)
Bases: Module
cwSKAttention is a channel-wise selective kernel attention module that operates on feature maps in a multi-scale fashion. It applies several convolutions with different kernel sizes to capture feature representations at multiple scales and fuses them through a learned attention mechanism (a minimal sketch of this construction follows the NOTE below).
convs
A list of convolutional layers with different kernel sizes for feature extraction.
- Type: nn.ModuleList
avg_pool
Adaptive average pooling layer to reduce feature dimensions.
- Type: nn.AdaptiveAvgPool2d
D
The dimension of the intermediate representation, determined by the reduction factor.
- Type: int
fc
Fully connected layer for transforming the pooled features.
- Type: nn.Linear
relu
ReLU activation function.
- Type: nn.ReLU
fcs
A list of fully connected layers to generate attention weights for each kernel.
- Type: nn.ModuleList
softmax
Softmax layer to normalize the attention weights.
- Type: nn.Softmax
Parameters:
- freq (int) – The frequency dimension of the input features.
- channel (int) – The number of channels in the input feature map.
- kernels (list) – List of kernel sizes to be used in the convolutional layers.
- receptive (list) – List of receptive field sizes corresponding to each kernel.
- dilations (list) – List of dilation rates for each convolutional layer.
- reduction (int) – Reduction factor for the dimensionality of the features.
- groups (int) – Number of groups for grouped convolution.
- L (int) – Lower bound on the dimensionality of the intermediate representation (the intermediate size is the larger of channel // reduction and L).
Returns: The output feature map after applying selective kernel attention.
Return type: Tensor
####### Examples
>>> import torch
>>> model = cwSKAttention(freq=40, channel=128, kernels=[3, 5])
>>> input_tensor = torch.randn(8, 128, 40, 100) # (B, C, F, T)
>>> output = model(input_tensor)
>>> print(output.shape)
torch.Size([8, 128, 40, 100]) # Output shape matches input shape
NOTE
The cwSKAttention module is designed to enhance the representational capacity of convolutional neural networks: through its adaptive attention mechanism, the model learns which features are most important for the task at hand.
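Below is a minimal, self-contained sketch of how a channel-wise selective kernel attention block of this kind can be assembled from the attributes listed above (convs, avg_pool, D, fc, relu, fcs, softmax). The class name SKAttentionSketch, the padding scheme, and the bottleneck rule max(channel // reduction, L) are illustrative assumptions and not the exact ESPnet implementation.

import torch
import torch.nn as nn

class SKAttentionSketch(nn.Module):
    """Illustrative sketch of channel-wise selective kernel attention (not the exact ESPnet code)."""

    def __init__(self, channel=128, kernels=(3, 5), reduction=8, L=16):
        super().__init__()
        # One convolutional branch per kernel size; padding keeps [F, T] unchanged.
        self.convs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channel, channel, k, padding=k // 2),
                nn.BatchNorm2d(channel),
                nn.ReLU(inplace=True),
            )
            for k in kernels
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # [B, C, F, T] -> [B, C, 1, 1]
        self.D = max(channel // reduction, L)    # bottleneck width (assumed rule)
        self.fc = nn.Linear(channel, self.D)     # shared squeeze transform
        self.relu = nn.ReLU(inplace=True)
        self.fcs = nn.ModuleList(nn.Linear(self.D, channel) for _ in kernels)
        self.softmax = nn.Softmax(dim=0)         # normalize over the kernel branches

    def forward(self, x):
        # x: [B, C, F, T]
        feats = torch.stack([conv(x) for conv in self.convs], dim=0)  # [K, B, C, F, T]
        fused = feats.sum(dim=0)                                      # element-wise fusion
        z = self.relu(self.fc(self.avg_pool(fused).flatten(1)))       # squeeze: [B, D]
        logits = torch.stack([fc(z) for fc in self.fcs], dim=0)       # [K, B, C]
        attn = self.softmax(logits)[..., None, None]                  # [K, B, C, 1, 1]
        return (attn * feats).sum(dim=0)                              # [B, C, F, T]

x = torch.randn(8, 128, 40, 100)       # (B, C, F, T)
print(SKAttentionSketch()(x).shape)    # torch.Size([8, 128, 40, 100])

Every branch sees the same input; the softmax over the branch axis lets the network decide, per channel, how much each kernel size contributes to the fused output.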
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Forward function.
This method performs a forward pass through the cwSKAttention module. It takes an input tensor and applies a series of convolutional layers, attention mechanisms, and activation functions to compute the output tensor.
- Parameters:x (torch.Tensor) – Input tensor of shape [B, C, F, T], where B is the batch size, C is the number of channels, F is the frequency dimension, and T is the time dimension.
- Returns: Output tensor of shape [B, C, F, T] after applying attention and convolutions.
- Return type: torch.Tensor
####### Examples
>>> import torch
>>> model = cwSKAttention()
>>> input_tensor = torch.randn(16, 128, 40, 100) # Batch of 16
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([16, 128, 40, 100])
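As a quick, standalone check of the normalization step described above (the sizes K, B, C are assumed for illustration), the softmax taken over the kernel-branch axis produces per-channel weights that sum to one, so the branch outputs are blended without changing the overall feature scale:

>>> import torch
>>> K, B, C = 2, 8, 128                      # kernel branches, batch, channels
>>> logits = torch.randn(K, B, C)            # one [B, C] logit map per conv branch
>>> weights = torch.softmax(logits, dim=0)   # normalize across the K branches
>>> weights.sum(dim=0).allclose(torch.ones(B, C))
True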