espnet2.spk.layers.ecapa_block.EcapaBlock
class espnet2.spk.layers.ecapa_block.EcapaBlock(inplanes, planes, kernel_size=None, dilation=None, scale=8)
Bases: Module
Basic blocks for ECAPA-TDNN.
Code from https://github.com/TaoRuijie/ECAPA-TDNN/blob/main/model.py
The EcapaBlock class implements a building block of the ECAPA-TDNN model used for speaker recognition. It combines an input convolution (conv1), scale - 1 parallel convolutional branches (convs), and an output convolution (conv3), each followed by batch normalization, with a squeeze-and-excitation module (se) that applies channel-wise attention.
conv1
The first 1D convolution layer.
- Type: nn.Conv1d
bn1
Batch normalization layer following the first conv layer.
- Type: nn.BatchNorm1d
nums
Number of convolutional branches (scale - 1).
- Type: int
convs
List of convolutional layers for each branch.
- Type: ModuleList
bns
List of batch normalization layers for each branch.
- Type: ModuleList
conv3
The final 1D convolution layer.
- Type: nn.Conv1d
bn3
Batch normalization layer following the final conv layer.
- Type: nn.BatchNorm1d
relu
ReLU activation function.
- Type: ReLU
width
Width of each convolutional branch.
- Type: int
se
Squeeze-and-excitation module for channel-wise attention.
- Type: SEModule
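The relationship between scale, nums, and width can be illustrated with plain arithmetic. This is a minimal sketch: nums = scale - 1 comes from the attribute description above, while width = floor(planes / scale) is an assumption based on the referenced ECAPA-TDNN implementation. The helper `branch_layout` is hypothetical and not part of ESPnet.

```python
import math


def branch_layout(planes: int, scale: int = 8) -> dict:
    """Hypothetical helper: derive the branch layout of the block."""
    # Assumption: each channel group holds floor(planes / scale) channels.
    width = math.floor(planes / scale)
    # From the attribute list: scale - 1 groups get their own convolution.
    nums = scale - 1
    return {"width": width, "nums": nums, "total_channels": width * scale}


layout = branch_layout(planes=128, scale=8)
print(layout)  # {'width': 16, 'nums': 7, 'total_channels': 128}
```

Under this assumption, choosing planes as a multiple of scale keeps width * scale equal to planes, so no channels are lost to rounding.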
Parameters:
- inplanes (int) – Number of input channels.
- planes (int) – Number of output channels.
- kernel_size (int, optional) – Size of the convolutional kernel. Defaults to None.
- dilation (int, optional) – Dilation rate for convolution. Defaults to None.
- scale (int, optional) – Scale factor for width. Defaults to 8.
Returns: The output tensor after applying the ECAPA block.
Return type: Tensor
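The output keeps the sequence length of the input, which requires the dilated branch convolutions to be padded in proportion to kernel_size and dilation. The sketch below uses plain `torch.nn.Conv1d` and assumes a padding of floor(kernel_size / 2) * dilation, as in the referenced implementation; the exact rule in ESPnet should be checked against the source.

```python
import math

import torch
import torch.nn as nn

kernel_size, dilation, width = 3, 2, 16

# Assumed padding rule: keeps the time dimension unchanged for odd kernel sizes.
padding = math.floor(kernel_size / 2) * dilation

conv = nn.Conv1d(
    width, width, kernel_size=kernel_size, dilation=dilation, padding=padding
)

x = torch.randn(4, width, 100)  # (batch, channels, time)
y = conv(x)
print(y.shape)  # torch.Size([4, 16, 100])
```

Because this padding is computed from both values, kernel_size and dilation would need to be integers rather than the default None.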
####### Examples
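>>> import torch
>>> from espnet2.spk.layers.ecapa_block import EcapaBlock
>>> # inplanes equals planes so the residual addition is shape-compatible;
>>> # dilation is passed explicitly rather than left as None
>>> ecapa_block = EcapaBlock(inplanes=128, planes=128, kernel_size=3, dilation=2)
>>> input_tensor = torch.randn(4, 128, 100)  # batch size 4, 128 channels, length 100
>>> output_tensor = ecapa_block(input_tensor)
>>> output_tensor.shape
torch.Size([4, 128, 100])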
forward(x)
Forward pass of the ECAPA block.
This method passes the input tensor through the first convolution, the convolutional branches, and the final convolution, each combined with a ReLU activation and batch normalization. A squeeze-and-excitation module then reweights the channels of the result, and the input is added back as a residual connection to produce the output.
- Parameters: x (torch.Tensor) – The input tensor of shape (batch_size, in_channels, sequence_length).
- Returns: The output tensor after applying the ECAPA block, with shape (batch_size, out_channels, sequence_length).
- Return type: torch.Tensor
####### Examples
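>>> import torch
>>> from espnet2.spk.layers.ecapa_block import EcapaBlock
>>> # inplanes equals planes so that the residual addition is shape-compatible
>>> ecapa_block = EcapaBlock(inplanes=128, planes=128, kernel_size=3,
...                          dilation=2)
>>> input_tensor = torch.randn(8, 128, 100)  # batch_size=8, channels=128, seq_len=100
>>> output_tensor = ecapa_block(input_tensor)
>>> output_tensor.shape
torch.Size([8, 128, 100])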
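The forward pass described above can be sketched as a standalone module. This is an illustration under stated assumptions, not the ESPnet source: the class names `SESketch` and `BlockSketch` are hypothetical, and the convolution, ReLU, batch normalization ordering, the cumulative branch inputs, and the squeeze-and-excitation bottleneck of 128 follow the referenced ECAPA-TDNN implementation rather than anything confirmed on this page.

```python
import torch
import torch.nn as nn


class SESketch(nn.Module):
    """Hypothetical squeeze-and-excitation gate (not ESPnet's SEModule)."""

    def __init__(self, channels: int, bottleneck: int = 128):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),  # squeeze: average over time
            nn.Conv1d(channels, bottleneck, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(bottleneck, channels, kernel_size=1),
            nn.Sigmoid(),  # excitation: per-channel weight in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)


class BlockSketch(nn.Module):
    """Hypothetical multi-branch block with SE and an identity residual."""

    def __init__(self, channels: int, kernel_size: int = 3,
                 dilation: int = 2, scale: int = 8):
        super().__init__()
        self.width = channels // scale
        self.nums = scale - 1
        pad = (kernel_size // 2) * dilation  # assumed length-preserving padding
        self.conv1 = nn.Conv1d(channels, self.width * scale, kernel_size=1)
        self.bn1 = nn.BatchNorm1d(self.width * scale)
        self.convs = nn.ModuleList([
            nn.Conv1d(self.width, self.width, kernel_size,
                      dilation=dilation, padding=pad)
            for _ in range(self.nums)
        ])
        self.bns = nn.ModuleList(
            [nn.BatchNorm1d(self.width) for _ in range(self.nums)]
        )
        self.conv3 = nn.Conv1d(self.width * scale, channels, kernel_size=1)
        self.bn3 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()
        self.se = SESketch(channels)

    def forward(self, x):
        residual = x
        out = self.bn1(self.relu(self.conv1(x)))
        groups = torch.split(out, self.width, dim=1)  # `scale` channel groups
        outputs = []
        sp = None
        for i in range(self.nums):
            # Every branch after the first also receives the previous branch's output.
            sp = groups[i] if i == 0 else sp + groups[i]
            sp = self.bns[i](self.relu(self.convs[i](sp)))
            outputs.append(sp)
        outputs.append(groups[self.nums])  # last group passes through unchanged
        out = torch.cat(outputs, dim=1)
        out = self.bn3(self.relu(self.conv3(out)))
        out = self.se(out)
        return out + residual  # identity residual: channel counts must match


block = BlockSketch(channels=128)
x = torch.randn(8, 128, 100)
out = block(x)
print(out.shape)  # torch.Size([8, 128, 100])
```

The sketch uses a single channels argument because an identity residual only works when the input and output channel counts are equal.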