espnet2.spk.projector.ska_tdnn_projector.SkaTdnnProjector


class espnet2.spk.projector.ska_tdnn_projector.SkaTdnnProjector(input_size, output_size)

Bases: AbsProjector

SkaTdnnProjector is the speaker embedding projector paired with the SKA-TDNN (selective kernel attention time-delay neural network) encoder. The projector itself contains no time-delay layers. It applies batch normalization, a fully connected layer, and a second batch normalization to map each input feature vector to a vector of the specified output size, which makes it suitable for tasks such as speaker recognition or verification.
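The layer stack described above can be sketched as a plain PyTorch module. This is a hypothetical stand-in, not the ESPnet source: the class name `SketchSkaTdnnProjector` and the layer names `bn_in`, `fc` and `bn_out` are chosen for illustration, and the sketch assumes the layers are applied in the order listed (batch norm, linear, batch norm).

```python
import torch


class SketchSkaTdnnProjector(torch.nn.Module):
    """Hypothetical stand-in: BatchNorm1d -> Linear -> BatchNorm1d."""

    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self._output_size = output_size
        self.bn_in = torch.nn.BatchNorm1d(input_size)       # normalizes input features
        self.fc = torch.nn.Linear(input_size, output_size)  # projects to output_size
        self.bn_out = torch.nn.BatchNorm1d(output_size)     # normalizes projected features

    def output_size(self) -> int:
        return self._output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, input_size) -> (batch_size, output_size)
        return self.bn_out(self.fc(self.bn_in(x)))


torch.manual_seed(0)
proj = SketchSkaTdnnProjector(input_size=128, output_size=64)
out = proj(torch.randn(10, 128))
print(out.shape)  # torch.Size([10, 64])
```

Because the final layer is a batch normalization, in training mode (and with the default initial scale of 1 and shift of 0) each output feature has a batch mean of approximately zero.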

Attributes:
- Input batch normalization (torch.nn.BatchNorm1d) – Normalizes the input features.
- Fully connected layer (torch.nn.Linear) – Maps input_size features to output_size features.
- Output batch normalization (torch.nn.BatchNorm1d) – Normalizes the projected output features.
- _output_size (int) – The size of the output feature vector.
Parameters:
- input_size (int) – The size of the input feature vector.
- output_size (int) – The size of the output feature vector.
Returns: The transformed output tensor after batch normalization, the linear transformation, and a second batch normalization.
Return type: torch.Tensor
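The parameters above fully determine the model size. A minimal sketch of the trainable-parameter count follows, assuming PyTorch defaults (`affine=True` for `BatchNorm1d`, `bias=True` for `Linear`); the helper name `projector_param_count` is hypothetical. Running statistics of the batch-norm layers are buffers, not trainable parameters, so they are not counted.

```python
def projector_param_count(input_size: int, output_size: int) -> int:
    """Trainable parameters of a BatchNorm1d -> Linear -> BatchNorm1d stack,
    assuming PyTorch defaults (affine=True, bias=True)."""
    bn_in = 2 * input_size                       # scale (gamma) + shift (beta)
    fc = input_size * output_size + output_size  # weight matrix + bias vector
    bn_out = 2 * output_size                     # scale (gamma) + shift (beta)
    return bn_in + fc + bn_out


print(projector_param_count(128, 64))  # 8640
```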

######### Examples

>>> import torch
>>> from espnet2.spk.projector.ska_tdnn_projector import SkaTdnnProjector
>>> projector = SkaTdnnProjector(input_size=128, output_size=64)
>>> input_tensor = torch.randn(10, 128)  # Batch of 10 input vectors
>>> output_tensor = projector(input_tensor)
>>> print(output_tensor.shape)
torch.Size([10, 64])

NOTE

The input tensor should have the shape (batch_size, input_size).
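A practical consequence of the batch-normalization layers: in training mode they compute statistics over the batch, so a batch of a single vector is rejected, while in evaluation mode the running statistics are used instead. A minimal sketch using a standalone `torch.nn.BatchNorm1d` as a stand-in for the projector's first layer:

```python
import torch

# Stand-in for the projector's input batch-normalization layer.
bn = torch.nn.BatchNorm1d(128)

# Training mode: batch statistics need more than one sample per feature.
bn.train()
try:
    bn(torch.randn(1, 128))
    single_sample_ok_in_train = True
except ValueError:
    single_sample_ok_in_train = False

# Eval mode: running statistics are used, so a batch of one is accepted.
bn.eval()
single = bn(torch.randn(1, 128))
print(single_sample_ok_in_train, tuple(single.shape))  # False (1, 128)
```

When extracting embeddings one utterance at a time, calling `.eval()` on the model first avoids this error.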


forward(x)

Computes the forward pass of the SkaTdnnProjector.

This method applies batch normalization to the input tensor, followed by a linear transformation and another batch normalization. The transformations are defined by the layers initialized in the constructor.

Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, input_size).
Returns: Output tensor of shape (batch_size, output_size) after applying the transformations.
Return type: torch.Tensor
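The forward computation can be written out numerically. The NumPy sketch below assumes training-mode batch normalization with the initial scale 1 and shift 0 (PyTorch's defaults at construction), normalizing with the biased batch variance; the function names and the random weights are hypothetical and only illustrate y = BN2(BN1(x) Wᵀ + b).

```python
import numpy as np


def batch_norm_train(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Training-mode batch norm over the batch axis, scale=1, shift=0."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)  # biased (ddof=0) batch variance
    return (x - mean) / np.sqrt(var + eps)


def projector_forward(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """y = BN2(BN1(x) @ W^T + b) for x of shape (batch_size, input_size)."""
    h = batch_norm_train(x)      # (batch_size, input_size)
    z = h @ weight.T + bias      # (batch_size, output_size), as torch.nn.Linear computes
    return batch_norm_train(z)   # (batch_size, output_size)


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 128))
W = rng.standard_normal((64, 128)) * 0.05  # hypothetical weights, shape (out, in)
b = np.zeros(64)
y = projector_forward(x, W, b)
print(y.shape)  # (32, 64)
```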

######### Examples

>>> import torch
>>> from espnet2.spk.projector.ska_tdnn_projector import SkaTdnnProjector
>>> projector = SkaTdnnProjector(input_size=128, output_size=64)
>>> input_tensor = torch.randn(32, 128)  # Batch of 32 samples
>>> output_tensor = projector(input_tensor)
>>> print(output_tensor.shape)
torch.Size([32, 64])

output_size()

Returns the output size of the projector.

This method returns the output size that was set when the SkaTdnnProjector instance was initialized. It determines the last dimension of the tensor returned by the forward pass.

Returns: The output size of the projector.
Return type: int

######### Examples

>>> projector = SkaTdnnProjector(input_size=128, output_size=64)
>>> assert projector.output_size() == 64

NOTE

The returned value is the output_size argument passed to the constructor; the same value sets the output dimension of the linear layer.
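A typical use of `output_size()` is sizing a downstream layer without hard-coding the embedding dimension. The sketch below uses a hypothetical stand-in, `ToyProjector`, that keeps only the linear layer, and a hypothetical speaker-classification head with `num_speakers` classes; neither is part of ESPnet.

```python
import torch


class ToyProjector(torch.nn.Module):
    """Hypothetical stand-in exposing output_size() as a method."""

    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self._output_size = output_size
        self.fc = torch.nn.Linear(input_size, output_size)

    def output_size(self) -> int:
        return self._output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


projector = ToyProjector(input_size=128, output_size=64)
num_speakers = 1000  # hypothetical number of training speakers
# Size the downstream head from the projector instead of hard-coding 64.
classifier = torch.nn.Linear(projector.output_size(), num_speakers)
logits = classifier(projector(torch.randn(4, 128)))
print(projector.output_size(), tuple(logits.shape))  # 64 (4, 1000)
```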