espnet2.spk.layers.ecapa_block.EcapaBlock

class espnet2.spk.layers.ecapa_block.EcapaBlock(inplanes, planes, kernel_size=None, dilation=None, scale=8)

Bases: Module

Basic blocks for ECAPA-TDNN.

Code from https://github.com/TaoRuijie/ECAPA-TDNN/blob/main/model.py

The EcapaBlock class implements a building block for the ECAPA-TDNN model, which is used in speaker recognition tasks. It incorporates multiple convolutional layers, batch normalization, and a squeeze-and-excitation module to enhance feature extraction.

conv1

The first 1D convolution layer.

Type: nn.Conv1d

bn1

Batch normalization layer following the first conv layer.

Type: nn.BatchNorm1d

nums

Number of convolutional branches (scale - 1).

Type: int

convs

List of convolutional layers for each branch.

Type: ModuleList

bns

List of batch normalization layers for each branch.

Type: ModuleList

conv3

The final 1D convolution layer.

Type: nn.Conv1d

bn3

Batch normalization layer following the final conv layer.

Type: nn.BatchNorm1d

relu

ReLU activation function.

Type: ReLU

width

Width of each convolutional branch.

Type: int

se

Squeeze-and-excitation module for channel-wise attention.

Type: SEModule
Parameters:
- inplanes (int) – Number of input channels.
- planes (int) – Number of output channels.
- kernel_size (int, optional) – Size of the convolutional kernel. Defaults to None.
- dilation (int, optional) – Dilation rate for convolution. Defaults to None.
- scale (int, optional) – Number of channel groups the features are split into; scale - 1 of them get their own convolutional branch. Defaults to 8.
Returns: The output tensor after applying the ECAPA block.
Return type: Tensor
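
The relationship between planes, scale, width, and nums described above can be sketched with plain arithmetic. This is a minimal illustration, not the class's code: `branch_layout` is a hypothetical helper, and the rule `width = floor(planes / scale)` is an assumption based on the upstream implementation linked above. Only `nums = scale - 1` is stated on this page.

```python
import math


def branch_layout(planes: int, scale: int = 8) -> dict:
    """Hypothetical helper: derive per-branch sizes from planes and scale."""
    width = math.floor(planes / scale)  # channels per branch (assumed rule)
    nums = scale - 1                    # branches that get their own conv + BN
    hidden = width * scale              # channels entering the split (assumed)
    return {"width": width, "nums": nums, "hidden": hidden}


layout = branch_layout(planes=128, scale=8)
print(layout)  # {'width': 16, 'nums': 7, 'hidden': 128}
```

Under these assumptions, the default scale of 8 splits 128 channels into eight groups of 16. Seven groups pass through a convolutional branch, and the remaining one is carried over unchanged.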

####### Examples

>>> import torch
>>> from espnet2.spk.layers.ecapa_block import EcapaBlock
>>> # inplanes equals planes so the residual addition in forward() matches shapes
>>> ecapa_block = EcapaBlock(inplanes=128, planes=128, kernel_size=3, dilation=2)
>>> input_tensor = torch.randn(1, 128, 100)  # batch size 1, 128 channels, length 100
>>> output_tensor = ecapa_block(input_tensor)
>>> output_tensor.shape
torch.Size([1, 128, 100])

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Forward pass of the ECAPA block.

This method processes the input tensor through a series of convolutional layers, batch normalization, and activation functions, then applies a squeeze-and-excitation module to reweight the channels. The input is added back to the processed features as a residual connection, so the input and output channel counts need to match.

Parameters: x (torch.Tensor) – The input tensor of shape (batch_size, in_channels, sequence_length).
Returns: The output tensor after applying the ECAPA block, with shape (batch_size, out_channels, sequence_length).
Return type: torch.Tensor
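
The sequence length is unchanged between input and output. A short sketch of why, using the output-length formula from the PyTorch `Conv1d` documentation: the padding rule `floor(kernel_size / 2) * dilation` is an assumption about how the branch convolutions are configured, and both helper names are hypothetical.

```python
def same_padding(kernel_size: int, dilation: int) -> int:
    """Assumed padding rule for the branch convolutions."""
    return (kernel_size // 2) * dilation


def conv1d_out_len(length: int, kernel_size: int, dilation: int,
                   padding: int, stride: int = 1) -> int:
    """Output length of a 1D convolution (formula from the Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


for k, d in [(3, 1), (3, 2), (5, 3)]:
    pad = same_padding(k, d)
    print(k, d, pad, conv1d_out_len(100, k, d, pad))
```

For odd kernel sizes this padding cancels the shrinkage caused by the dilated kernel, so each combination above yields a length of 100.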

####### Examples

>>> ecapa_block = EcapaBlock(inplanes=128, planes=128, kernel_size=3,
...                          dilation=2)
>>> input_tensor = torch.randn(8, 128, 100)  # batch_size=8, channels=128, seq_len=100
>>> output_tensor = ecapa_block(input_tensor)
>>> output_tensor.shape
torch.Size([8, 128, 100])
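
The forward pass described above (1x1 convolution, channel split into branches, 1x1 convolution, squeeze-and-excitation, residual add) can be sketched in plain PyTorch. This is an illustrative reconstruction, not the ESPnet source. The class names `SqueezeExciteSketch` and `BlockSketch` are hypothetical, as is the `bottleneck` size. The conv, ReLU, batch-norm ordering and the way each branch adds the previous branch's output follow the upstream reference implementation as an assumption.

```python
import torch
import torch.nn as nn


class SqueezeExciteSketch(nn.Module):
    """Hypothetical stand-in for SEModule: pooled statistics gate each channel."""

    def __init__(self, channels: int, bottleneck: int = 128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),                         # squeeze over time
            nn.Conv1d(channels, bottleneck, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(bottleneck, channels, kernel_size=1),
            nn.Sigmoid(),                                    # gate in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)


class BlockSketch(nn.Module):
    """Illustrative block with equal input and output channels."""

    def __init__(self, channels: int, kernel_size: int = 3,
                 dilation: int = 2, scale: int = 8):
        super().__init__()
        self.width = channels // scale
        self.nums = scale - 1
        pad = (kernel_size // 2) * dilation  # keeps the sequence length
        hidden = self.width * scale
        self.conv1 = nn.Conv1d(channels, hidden, kernel_size=1)
        self.bn1 = nn.BatchNorm1d(hidden)
        self.convs = nn.ModuleList([
            nn.Conv1d(self.width, self.width, kernel_size,
                      dilation=dilation, padding=pad)
            for _ in range(self.nums)
        ])
        self.bns = nn.ModuleList(
            [nn.BatchNorm1d(self.width) for _ in range(self.nums)]
        )
        self.conv3 = nn.Conv1d(hidden, channels, kernel_size=1)
        self.bn3 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()
        self.se = SqueezeExciteSketch(channels)

    def forward(self, x):
        out = self.bn1(self.relu(self.conv1(x)))
        chunks = torch.split(out, self.width, dim=1)  # scale groups of width
        outputs, prev = [], None
        for i in range(self.nums):
            # Each branch also sees the previous branch's output.
            inp = chunks[i] if prev is None else prev + chunks[i]
            prev = self.bns[i](self.relu(self.convs[i](inp)))
            outputs.append(prev)
        outputs.append(chunks[self.nums])  # last group passes through unchanged
        out = torch.cat(outputs, dim=1)
        out = self.bn3(self.relu(self.conv3(out)))
        return self.se(out) + x  # residual connection


block = BlockSketch(channels=128).eval()
with torch.no_grad():
    y = block(torch.randn(8, 128, 100))
print(tuple(y.shape))  # (8, 128, 100)
```

Because the residual is a plain addition with no projection, the sketch takes a single `channels` argument rather than separate input and output sizes.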