espnet2.spk.projector.ska_tdnn_projector.SkaTdnnProjector
class espnet2.spk.projector.ska_tdnn_projector.SkaTdnnProjector(input_size, output_size)
Bases: AbsProjector
SkaTdnnProjector is a speaker embedding projector for Time-Delay Neural Network (TDNN) based models. It applies batch normalization, a fully connected layer, and a second batch normalization to transform input feature vectors into embeddings of a specified size, making it suitable for tasks such as speaker recognition or verification.
bn
Batch normalization layer for input features.
- Type: torch.nn.BatchNorm1d
fc
Fully connected layer for transforming input to output features.
- Type: torch.nn.Linear
bn2
Batch normalization layer for output features.
- Type: torch.nn.BatchNorm1d
_output_size
The size of the output feature vector.
- Type: int
Parameters:
- input_size (int) – The size of the input feature vector.
- output_size (int) – The size of the output feature vector.
Returns: The transformed output tensor after applying batch normalization and the linear transformation.
Return type: torch.Tensor
######### Examples
>>> import torch
>>> from espnet2.spk.projector.ska_tdnn_projector import SkaTdnnProjector
>>> projector = SkaTdnnProjector(input_size=128, output_size=64)
>>> input_tensor = torch.randn(10, 128)  # Batch of 10 input vectors
>>> output_tensor = projector(input_tensor)
>>> print(output_tensor.shape)  # Should print: torch.Size([10, 64])
NOTE
The input tensor should have the shape (batch_size, input_size).
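The batch dimension matters because of how batch normalization behaves. The sketch below uses a bare torch.nn.BatchNorm1d as a stand-in for the projector's first layer (the size 128 is an arbitrary choice for illustration). It shows that a single embedding is rejected in training mode but accepted in evaluation mode, so switching to eval() is the likely requirement when embedding one utterance at a time.

```python
import torch

# Stand-in for the projector's input batch normalization layer.
bn = torch.nn.BatchNorm1d(128)
single = torch.randn(1, 128)  # one vector: (batch_size=1, input_size=128)

bn.train()
try:
    bn(single)
    train_ok = True
except ValueError:
    # Training-mode batch norm cannot estimate batch statistics from one sample.
    train_ok = False

bn.eval()
with torch.no_grad():
    out = bn(single)  # eval mode uses running statistics, so batch size 1 works

print(train_ok, out.shape)
```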
forward(x)
Computes the forward pass of the SkaTdnnProjector.
This method applies batch normalization to the input tensor, followed by a linear transformation and another batch normalization. The transformations are defined by the layers initialized in the constructor.
- Parameters: x (torch.Tensor) – Input tensor of shape (batch_size, input_size).
- Returns: Output tensor of shape (batch_size, output_size) after applying the transformations.
- Return type: torch.Tensor
######### Examples
>>> import torch
>>> from espnet2.spk.projector.ska_tdnn_projector import SkaTdnnProjector
>>> projector = SkaTdnnProjector(input_size=128, output_size=64)
>>> input_tensor = torch.randn(32, 128)  # Batch of 32 samples
>>> output_tensor = projector.forward(input_tensor)
>>> print(output_tensor.shape)  # Should print: torch.Size([32, 64])
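To make the order of operations in forward concrete, here is a minimal sketch in plain PyTorch that follows the description above: batch normalization, then a linear layer, then a second batch normalization. The class name ProjectorSketch and the attribute names bn_in and bn_out are hypothetical. This is not the ESPnet source, only an illustration of the documented behavior.

```python
import torch


class ProjectorSketch(torch.nn.Module):
    """Hypothetical stand-in mirroring the documented layer order."""

    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self._output_size = output_size
        self.bn_in = torch.nn.BatchNorm1d(input_size)    # normalizes input features
        self.fc = torch.nn.Linear(input_size, output_size)
        self.bn_out = torch.nn.BatchNorm1d(output_size)  # normalizes projected features

    def output_size(self) -> int:
        return self._output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch_size, input_size) -> (batch_size, output_size)
        return self.bn_out(self.fc(self.bn_in(x)))


sketch = ProjectorSketch(input_size=128, output_size=64)
out = sketch(torch.randn(32, 128))
print(out.shape)  # torch.Size([32, 64])
```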
output_size()
Returns the output size of the projector.
This method returns the output size that was set when the SkaTdnnProjector instance was initialized. It is the size of the last dimension of the tensor produced by the forward pass.
- Returns: The output size of the projector.
- Return type: int
######### Examples
>>> projector = SkaTdnnProjector(input_size=128, output_size=64)
>>> assert projector.output_size() == 64
NOTE
The returned value is the output_size argument passed to the constructor of the SkaTdnnProjector class, which also sets the output dimension of the linear layer.
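One common use of output_size() is sizing whatever layer consumes the embeddings. The sketch below assumes a hypothetical StandInProjector that exposes the same accessor, and a hypothetical num_speakers value. It is meant only to show the pattern, not how ESPnet wires its own training heads.

```python
import torch


class StandInProjector(torch.nn.Module):
    """Hypothetical stand-in: only the parts needed to show output_size()."""

    def __init__(self, input_size: int, output_size: int):
        super().__init__()
        self._output_size = output_size
        self.fc = torch.nn.Linear(input_size, output_size)

    def output_size(self) -> int:
        return self._output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)


projector = StandInProjector(input_size=128, output_size=64)
num_speakers = 10  # hypothetical number of training speakers

# The downstream layer takes its input width from the projector itself.
classifier = torch.nn.Linear(projector.output_size(), num_speakers)

logits = classifier(projector(torch.randn(4, 128)))
print(logits.shape)  # torch.Size([4, 10])
```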