espnet2.spk.encoder.ecapa_tdnn_encoder.EcapaTdnnEncoder
class espnet2.spk.encoder.ecapa_tdnn_encoder.EcapaTdnnEncoder(input_size: int, block: str = 'EcapaBlock', model_scale: int = 8, ndim: int = 1024, output_size: int = 1536, **kwargs)
Bases: AbsEncoder
ECAPA-TDNN Encoder
This class implements the ECAPA-TDNN encoder, which extracts frame-level ECAPA-TDNN embeddings from mel-filterbank energy or MFCC features. It is based on the paper: B. Desplanques et al., "ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification," in Proc. INTERSPEECH, 2020.
_output_size
The output embedding dimension of the encoder.
- Type: int
conv
Convolutional layer for initial feature extraction.
- Type: nn.Conv1d
relu
Activation function applied after convolution.
- Type: nn.ReLU
bn
Batch normalization layer for feature scaling.
- Type: nn.BatchNorm1d
layer1
First ECAPA block layer.
- Type: block
layer2
Second ECAPA block layer.
- Type: block
layer3
Third ECAPA block layer.
- Type: block
layer4
Final convolutional layer for output embedding.
- Type: nn.Conv1d
mp3
Max pooling layer with a kernel size of 3.
- Type: nn.MaxPool1d
Parameters:
- input_size (int) – Input feature dimension.
- block (str) – Type of encoder block class to use (default: “EcapaBlock”).
- model_scale (int) – Scale value of the Res2Net architecture (default: 8).
- ndim (int) – Dimensionality of the hidden representation (default: 1024).
- output_size (int) – Output embedding dimension (default: 1536).
- **kwargs – Additional keyword arguments for further customization.
######### Examples
>>> import torch
>>> from espnet2.spk.encoder.ecapa_tdnn_encoder import EcapaTdnnEncoder
>>> encoder = EcapaTdnnEncoder(input_size=80)
>>> input_tensor = torch.randn(16, 100, 80)  # Batch size 16, 100 frames
>>> output = encoder(input_tensor)
>>> print(output.shape)  # torch.Size([16, 100, 1536])
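The remaining constructor arguments can be tuned as well; a minimal sketch with purely illustrative values (these are assumptions, not recommended settings):
>>> small = EcapaTdnnEncoder(input_size=40, model_scale=4, ndim=512, output_size=512)
>>> small.output_size()
512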
- Raises: ValueError – If an unsupported block type is provided.
NOTE
The encoder is designed for speaker verification tasks and utilizes a series of convolutional layers followed by ECAPA blocks to enhance the feature extraction process.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
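Below is a minimal sketch of how the submodules documented above might compose during forward propagation. The attribute names (conv, relu, bn, layer1 through layer4) are taken from the list above, but the exact wiring, in particular the residual inputs to the ECAPA blocks and the concatenation feeding layer4 (the multi-layer feature aggregation of the ECAPA-TDNN paper), is an assumption rather than a transcription of the implementation:

```python
import torch

def ecapa_forward_sketch(encoder, x: torch.Tensor) -> torch.Tensor:
    # (B, L, input_size) -> (B, input_size, L) for Conv1d
    x = x.permute(0, 2, 1)
    x = encoder.bn(encoder.relu(encoder.conv(x)))
    # Each ECAPA block sees the running sum of earlier outputs (assumed wiring).
    x1 = encoder.layer1(x)
    x2 = encoder.layer2(x + x1)
    x3 = encoder.layer3(x + x1 + x2)
    # 1x1 conv over the concatenated block outputs -> output_size channels.
    out = encoder.relu(encoder.layer4(torch.cat((x1, x2, x3), dim=1)))
    # Back to (B, L, output_size), matching the documented forward contract.
    return out.permute(0, 2, 1)
```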
forward(x: Tensor)
Calculate forward propagation through the ECAPA-TDNN encoder.
This method processes the input tensor through several convolutional and layer blocks to produce an output tensor representing the frame-level ECAPA-TDNN embeddings.
- Parameters: x (torch.Tensor) – Input tensor with shape (#batch, L, input_size), where L is the sequence length and input_size is the dimension of the input features.
- Returns: Output tensor with shape (#batch, L, output_size), where output_size is the dimension of the extracted embeddings.
- Return type: torch.Tensor
######### Examples
>>> import torch
>>> encoder = EcapaTdnnEncoder(input_size=80)
>>> input_tensor = torch.randn(32, 100, 80)  # Example input
>>> output_tensor = encoder(input_tensor)
>>> print(output_tensor.shape)
torch.Size([32, 100, 1536])
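In practice the input features come from an acoustic front end. A self-contained sketch using torchaudio to produce 80-dimensional mel-filterbank features (an illustration only; within ESPnet the configured frontend normally supplies these features):

```python
import torch
import torchaudio
from espnet2.spk.encoder.ecapa_tdnn_encoder import EcapaTdnnEncoder

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)
encoder = EcapaTdnnEncoder(input_size=80)

waveform = torch.randn(1, 16000)   # one second of dummy 16 kHz audio
feats = mel(waveform)              # (batch, n_mels, frames)
feats = feats.transpose(1, 2)      # (batch, frames, n_mels) = (#batch, L, input_size)
embeddings = encoder(feats)        # (#batch, L, output_size)
```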
output_size() → int
Return the output embedding dimension of the encoder.
- Returns: The embedding dimension (output_size) configured at initialization.
- Return type: int
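######### Examples
A minimal check of the accessor (the default output_size is 1536):
>>> encoder = EcapaTdnnEncoder(input_size=80)
>>> encoder.output_size()
1536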