espnet2.spk.encoder.ecapa_tdnn_encoder.EcapaTdnnEncoder
class espnet2.spk.encoder.ecapa_tdnn_encoder.EcapaTdnnEncoder(input_size: int, block: str = 'EcapaBlock', model_scale: int = 8, ndim: int = 1024, output_size: int = 1536, **kwargs)
Bases: AbsEncoder
ECAPA-TDNN Encoder
This class implements the ECAPA-TDNN encoder, which extracts frame-level ECAPA-TDNN embeddings from mel-filterbank energy or MFCC features. It is based on the paper: B. Desplanques et al., "ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification," in Proc. INTERSPEECH, 2020.
_output_size
The output embedding dimension of the encoder.
- Type: int
conv
Convolutional layer for initial feature extraction.
- Type: nn.Conv1d
relu
Activation function applied after convolution.
- Type: nn.ReLU
bn
Batch normalization layer for feature scaling.
- Type: nn.BatchNorm1d
layer1
First ECAPA block layer.
- Type: block
layer2
Second ECAPA block layer.
- Type: block
layer3
Third ECAPA block layer.
- Type: block
layer4
Final convolutional layer for output embedding.
- Type: nn.Conv1d
mp3
Max pooling layer with a kernel size of 3.
- Type: nn.MaxPool1d
Parameters:
- input_size (int) – Input feature dimension.
- block (str) – Type of encoder block class to use (default: “EcapaBlock”).
- model_scale (int) – Scale value of the Res2Net architecture (default: 8).
- ndim (int) – Dimensionality of the hidden representation (default: 1024).
- output_size (int) – Output embedding dimension (default: 1536).
- **kwargs – Additional keyword arguments for further customization.
######### Examples
>>> import torch
>>> from espnet2.spk.encoder.ecapa_tdnn_encoder import EcapaTdnnEncoder
>>> encoder = EcapaTdnnEncoder(input_size=80)
>>> input_tensor = torch.randn(16, 100, 80)  # Batch size 16, 100 frames
>>> output = encoder(input_tensor)
>>> print(output.shape)  # torch.Size([16, 100, 1536])
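The remaining constructor arguments can be tuned as well; a minimal sketch with purely illustrative values (these are assumptions, not recommended settings):
>>> small = EcapaTdnnEncoder(input_size=40, model_scale=4, ndim=512, output_size=512)
>>> small.output_size()
512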
- Raises: ValueError – If an unsupported block type is provided.
NOTE
The encoder is designed for speaker verification tasks and utilizes a series of convolutional layers followed by ECAPA blocks to enhance the feature extraction process.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
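Below is a minimal sketch of how the submodules documented above might compose during forward propagation. The attribute names (conv, relu, bn, layer1 through layer4) are taken from the list above, but the exact wiring, in particular the residual inputs to the ECAPA blocks and the concatenation feeding layer4 (the multi-layer feature aggregation of the ECAPA-TDNN paper), is an assumption rather than a transcription of the implementation:

```python
import torch

def ecapa_forward_sketch(encoder, x: torch.Tensor) -> torch.Tensor:
    # (B, L, input_size) -> (B, input_size, L) for Conv1d
    x = x.permute(0, 2, 1)
    x = encoder.bn(encoder.relu(encoder.conv(x)))
    # Each ECAPA block sees the running sum of earlier outputs (assumed wiring).
    x1 = encoder.layer1(x)
    x2 = encoder.layer2(x + x1)
    x3 = encoder.layer3(x + x1 + x2)
    # 1x1 conv over the concatenated block outputs -> output_size channels.
    out = encoder.relu(encoder.layer4(torch.cat((x1, x2, x3), dim=1)))
    # Back to (B, L, output_size), matching the documented forward contract.
    return out.permute(0, 2, 1)
```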
forward(x: Tensor)
Calculate forward propagation through the ECAPA-TDNN encoder.
This method processes the input tensor through several convolutional and layer blocks to produce an output tensor representing the frame-level ECAPA-TDNN embeddings.
- Parameters: x (torch.Tensor) – Input tensor with shape (#batch, L, input_size), where L is the sequence length and input_size is the dimension of the input features.
- Returns: Output tensor with shape (#batch, L, output_size), where output_size is the dimension of the extracted embeddings.
- Return type: torch.Tensor
######### Examples
>>> import torch
>>> encoder = EcapaTdnnEncoder(input_size=80)
>>> input_tensor = torch.randn(32, 100, 80)  # Example input
>>> output_tensor = encoder(input_tensor)
>>> print(output_tensor.shape)
torch.Size([32, 100, 1536])
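In practice the input features come from an acoustic front end. A self-contained sketch using torchaudio to produce 80-dimensional mel-filterbank features (an illustration only; within ESPnet the configured frontend normally supplies these features):

```python
import torch
import torchaudio
from espnet2.spk.encoder.ecapa_tdnn_encoder import EcapaTdnnEncoder

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)
encoder = EcapaTdnnEncoder(input_size=80)

waveform = torch.randn(1, 16000)   # one second of dummy 16 kHz audio
feats = mel(waveform)              # (batch, n_mels, frames)
feats = feats.transpose(1, 2)      # (batch, frames, n_mels) = (#batch, L, input_size)
embeddings = encoder(feats)        # (#batch, L, output_size)
```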
output_size() → int
Return the output embedding dimension of the encoder.
- Returns: The embedding dimension (output_size) configured at initialization.
- Return type: int
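######### Examples
A minimal check of the accessor (the default output_size is 1536):
>>> encoder = EcapaTdnnEncoder(input_size=80)
>>> encoder.output_size()
1536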