espnet2.spk.encoder.identity_encoder.IdentityEncoder


class espnet2.spk.encoder.identity_encoder.IdentityEncoder(input_size: int)

Bases: AbsEncoder

Identity encoder. Applies no learned transformation; it only passes the frontend features on to the pooling layer.

This encoder is intended for cases where the frontend already produces a good representation, such as self-supervised learning (SSL) features. It leaves the feature values unmodified and only transposes the last two dimensions of the input tensor.

_output_size

The output feature dimension, which is the same as the input feature dimension.

Type: int
Parameters: input_size (int) – Input feature dimension.
Returns: The input tensor with its last two dimensions transposed.
Return type: torch.Tensor

Examples

>>> import torch
>>> from espnet2.spk.encoder.identity_encoder import IdentityEncoder
>>> encoder = IdentityEncoder(input_size=128)
>>> input_tensor = torch.randn(32, 128, 10)  # (batch_size, input_size, time)
>>> output_tensor = encoder.forward(input_tensor)
>>> output_tensor.shape  # shape after transposition
torch.Size([32, 10, 128])

NOTE

The forward method transposes the input tensor from shape (batch_size, input_size, time) to (batch_size, time, input_size).

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor)

Passes the input tensor through with its values unchanged, transposing its last two dimensions.

This method is primarily intended for use in scenarios where the input features are already adequately represented and require no further processing. It simply transposes the input tensor from shape (batch_size, input_size, seq_length) to (batch_size, seq_length, input_size).

Parameters: x (torch.Tensor) – The input tensor to be processed. It should have shape (batch_size, input_size, seq_length).
Returns: The transposed tensor with shape (batch_size, seq_length, input_size).
Return type: torch.Tensor

Examples

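>>> import torch
>>> from espnet2.spk.encoder.identity_encoder import IdentityEncoder
>>> encoder = IdentityEncoder(input_size=128)
>>> input_tensor = torch.randn(32, 128, 50)  # (batch_size, input_size, seq_length)
>>> output_tensor = encoder.forward(input_tensor)
>>> output_tensor.shape
torch.Size([32, 50, 128])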
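
The transposition described above can be illustrated without installing ESPnet. The following is a minimal sketch, assuming the behavior is equivalent to swapping dimensions 1 and 2; `IdentityEncoderSketch` is a hypothetical stand-in written for this example, not the ESPnet class itself.

```python
import torch
from torch import nn


class IdentityEncoderSketch(nn.Module):
    """Hypothetical stand-in that mirrors the documented behavior."""

    def __init__(self, input_size: int):
        super().__init__()
        self._output_size = input_size

    def output_size(self) -> int:
        return self._output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swap the feature and time axes; no values are changed.
        return x.transpose(1, 2)


encoder = IdentityEncoderSketch(input_size=128)
x = torch.randn(4, 128, 50)  # (batch_size, input_size, seq_length)
y = encoder(x)               # (batch_size, seq_length, input_size)

# Transposing back recovers the input exactly: y[b, t, f] == x[b, f, t].
values_preserved = torch.equal(y.transpose(1, 2), x)
```

In this sketch `transpose` returns a view of the input rather than a copy, so a downstream operation that needs contiguous memory may have to call `.contiguous()` on the result first.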

output_size() → int

Returns the output size of the encoder, which is equal to the input size.

This method returns the dimension of the features that the encoder outputs. It is useful for sizing the layers that follow the encoder so that their input dimension matches.

Returns: The size of the output features.
Return type: int

Examples

>>> from espnet2.spk.encoder.identity_encoder import IdentityEncoder
>>> encoder = IdentityEncoder(input_size=128)
>>> size = encoder.output_size()  # size will be 128
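
One way `output_size()` helps with compatibility is by sizing the next layer from the encoder instead of hard-coding a width. The sketch below assumes a hypothetical downstream `nn.Linear` projection and uses `IdentityEncoderSketch`, a stand-in defined here for illustration, rather than the ESPnet class or its actual pooling layers.

```python
import torch
from torch import nn


class IdentityEncoderSketch(nn.Module):
    """Hypothetical stand-in that mirrors the documented behavior."""

    def __init__(self, input_size: int):
        super().__init__()
        self._output_size = input_size

    def output_size(self) -> int:
        return self._output_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.transpose(1, 2)


encoder = IdentityEncoderSketch(input_size=128)

# Hypothetical downstream layer: its input width is read from the encoder,
# so changing input_size above needs no further edits here.
projection = nn.Linear(encoder.output_size(), 64)

features = torch.randn(8, 128, 20)  # (batch_size, input_size, seq_length)
encoded = encoder(features)         # (batch_size, seq_length, input_size)
projected = projection(encoded)     # (batch_size, seq_length, 64)
```

Because `nn.Linear` acts on the last dimension, the transposed layout with the feature axis last is what makes this particular pairing work.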