espnet2.spk.encoder.ska_tdnn_encoder.Bottle2neck


class espnet2.spk.encoder.ska_tdnn_encoder.Bottle2neck(inplanes, planes, kernel_size=None, kernel_sizes=[5, 7], dilation=None, scale=8, group=1)

Bases: Module

Bottle2neck module for SKA-TDNN architecture.

This module implements a bottleneck layer with selective kernel attention, allowing for adaptive feature extraction through multiple convolutional kernels. It utilizes a squeeze-and-excitation mechanism to enhance the representation power of the network.
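As a rough illustration of the selective-kernel idea described above, the following pure-Python sketch fuses the outputs of several kernel branches using per-channel softmax weights. The names `softmax` and `select_kernels` and the externally supplied `logits` are assumptions made for illustration only; in the real module the weights would presumably be produced by the learned attention module rather than passed in.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_kernels(
    branch_outputs: Sequence[Sequence[float]],
    logits: Sequence[Sequence[float]],
) -> List[float]:
    """Fuse kernel branches channel by channel (hypothetical helper).

    branch_outputs[k][c] is the output of kernel branch k for channel c.
    logits[k][c] is an attention score for that branch and channel.
    """
    num_kernels = len(branch_outputs)
    num_channels = len(branch_outputs[0])
    fused = []
    for c in range(num_channels):
        # Softmax across kernels, so the weights for one channel sum to 1.
        weights = softmax([logits[k][c] for k in range(num_kernels)])
        fused.append(
            sum(w * branch_outputs[k][c] for k, w in enumerate(weights))
        )
    return fused


# Two branches (e.g. kernel sizes 5 and 7) and two channels.
outputs = [[1.0, 2.0], [3.0, 6.0]]
equal_logits = [[0.0, 0.0], [0.0, 0.0]]
fused = select_kernels(outputs, equal_logits)  # equal logits give a plain average
```

With equal logits every branch contributes equally; raising one branch's logit shifts the fused value toward that branch's output.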

Parameters:
- inplanes (int) – Number of input channels.
- planes (int) – Number of output channels.
- kernel_size (int, optional) – Size of the convolution kernel. Defaults to None.
- kernel_sizes (list of int, optional) – List of kernel sizes for the selective kernel convolution. Defaults to [5, 7].
- dilation (int, optional) – Dilation rate for the convolution. Defaults to None.
- scale (int, optional) – Scaling factor for the width of the bottleneck. Defaults to 8.
- group (int, optional) – Number of groups for grouped convolution. Defaults to 1.
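The parameters above interact through simple integer arithmetic. The sketch below assumes the Res2Net-style convention that the bottleneck width is `planes` divided by `scale` and that `scale - 1` groups are convolved; neither is confirmed by this page. The helper names `bottleneck_layout`, `conv1d_out_length`, and `same_padding` are hypothetical. The output-length formula is the standard one for a 1-D convolution.

```python
import math


def bottleneck_layout(planes: int, scale: int = 8) -> dict:
    """Channel bookkeeping under the assumed Res2Net-style convention."""
    width = math.floor(planes / scale)  # channels per group (assumption)
    return {
        "width": width,
        "hidden_channels": width * scale,
        "nums": scale - 1,  # assumed: the last group is left unconvolved
    }


def conv1d_out_length(length: int, kernel_size: int, dilation: int = 1,
                      padding: int = 0, stride: int = 1) -> int:
    """Output length of a 1-D convolution (standard formula)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that preserves length for an odd kernel size at stride 1."""
    return (kernel_size // 2) * dilation


layout = bottleneck_layout(planes=128, scale=8)  # width 16, nums 7

# Both default kernel sizes keep a length-100 sequence at length 100.
lengths = [
    conv1d_out_length(100, k, dilation=2, padding=same_padding(k, 2))
    for k in (5, 7)
]
```

Odd kernel sizes such as the defaults 5 and 7 allow symmetric padding, which is consistent with the sequence length staying unchanged in the examples on this page.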

conv1

First convolutional layer.

Type: nn.Conv1d

relu

ReLU activation function.

Type: nn.ReLU

bn1

Batch normalization layer.

Type: nn.BatchNorm1d

nums

Number of selective kernel convolutions.

Type: int

skconvs

List of selective kernel convolution modules.

Type: nn.ModuleList

skse

Selective kernel attention module.

Type: SKAttentionModule

conv3

Second convolutional layer.

Type: nn.Conv1d

bn3

Batch normalization layer.

Type: nn.BatchNorm1d

se

Squeeze-and-excitation module.

Type: SEModule

width

Width of the bottleneck.

Type: int

Returns: Output tensor after applying the bottleneck operation.
Return type: torch.Tensor

Example

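>>> import torch
>>> from espnet2.spk.encoder.ska_tdnn_encoder import Bottle2neck
>>> # inplanes equals planes so the skip connection can add the input to the output
>>> model = Bottle2neck(inplanes=128, planes=128, dilation=2)
>>> x = torch.randn(32, 128, 100)  # (batch_size, channels, sequence_length)
>>> output = model(x)
>>> output.shape
torch.Size([32, 128, 100])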

NOTE

This module is designed to work within the SKA-TDNN architecture and expects inputs of the shape (batch_size, inplanes, sequence_length).

Raises: ValueError – If kernel_size is provided but not valid.


forward(x)

Computes the forward pass of the Bottle2neck module.

This method passes the input tensor x through the block's layers, adds a skip connection, and returns the result. The stages are an initial convolution with ReLU activation and batch normalization, the selective kernel convolutions with their attention module, a second convolution with batch normalization, and the squeeze-and-excitation module.

Parameters: x (torch.Tensor) – The input tensor of shape (B, C, T), where B is the batch size, C is the number of channels, and T is the sequence length.
Returns: The output tensor of shape (B, planes, T) after applying the series of transformations.
Return type: torch.Tensor
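
This page does not spell out how the selective kernel convolutions are chained. A minimal pure-Python sketch follows, assuming the Res2Net-style hierarchy that blocks named Bottle2neck commonly use: each branch receives its own channel group plus the previous branch's output, and the last group passes through unchanged. `hierarchical_branches` and `branch` are hypothetical names, and plain lists of numbers stand in for channel tensors.

```python
from typing import Callable, List, Sequence


def hierarchical_branches(
    channels: Sequence[float],
    scale: int,
    branch: Callable[[List[float]], List[float]],
) -> List[float]:
    """Res2Net-style split, accumulate, and concatenate (assumed data flow)."""
    width = len(channels) // scale
    # Split the channel axis into `scale` equal groups.
    groups = [list(channels[i * width:(i + 1) * width]) for i in range(scale)]

    out: List[float] = []
    carry: List[float] = []
    for i in range(scale - 1):
        if i == 0:
            inputs = groups[i]
        else:
            # Add the previous branch output to the current group.
            inputs = [a + b for a, b in zip(carry, groups[i])]
        carry = branch(inputs)  # stands in for one selective kernel convolution
        out.extend(carry)

    # The final group is concatenated without passing through a branch.
    out.extend(groups[scale - 1])
    return out


# Toy branch that doubles every value, applied to 8 "channels" in 4 groups.
result = hierarchical_branches(
    [1, 2, 3, 4, 5, 6, 7, 8], scale=4, branch=lambda g: [2 * v for v in g]
)
```

Because each branch sees the accumulated output of the earlier ones, later groups effectively cover a wider temporal context than a single convolution would.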

Example

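>>> import torch
>>> from espnet2.spk.encoder.ska_tdnn_encoder import Bottle2neck
>>> # inplanes equals planes so the skip connection can add the input to the output
>>> model = Bottle2neck(inplanes=128, planes=128, dilation=2)
>>> input_tensor = torch.randn(32, 128, 100)  # Batch size of 32
>>> output_tensor = model(input_tensor)
>>> print(output_tensor.shape)
torch.Size([32, 128, 100])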

NOTE

This method relies on the internal layers defined in the Bottle2neck class and the proper initialization of those layers in the constructor.