espnet2.asvspoof.decoder.linear_decoder.LinearDecoder

class espnet2.asvspoof.decoder.linear_decoder.LinearDecoder(encoder_output_size: int)

Bases: AbsDecoder

Linear decoder for speaker diarization.

This class implements a linear decoder used in the context of speaker diarization. It is responsible for transforming the encoder’s output into a suitable representation for further processing or classification.

encoder_output_size

The size of the output from the encoder.

Type: int
Parameters: encoder_output_size (int) – Size of the encoder’s output dimension.

forward(input: torch.Tensor, ilens: Optional[torch.Tensor]) -> Optional[torch.Tensor]: Processes the input tensor and applies a linear projection.

####### Examples

>>> import torch
>>> from espnet2.asvspoof.decoder.linear_decoder import LinearDecoder
>>> decoder = LinearDecoder(encoder_output_size=256)
>>> input_tensor = torch.randn(10, 100, 256)  # Batch of 10, T=100, F=256
>>> input_lengths = torch.tensor([100] * 10)  # All sequences are of length 100
>>> output = decoder(input_tensor, input_lengths)  # Calls forward() via nn.Module
>>> print(output.shape)  # Shape depends on the implementation details

NOTE

This class currently contains placeholder TODOs for implementation. The forward method is expected to compute the mean over the time dimension and apply a linear projection layer.
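The behavior described in the note above (mean over the time dimension, then a linear projection) can be sketched as follows. This is a minimal illustration, not the ESPnet implementation: the class name `MeanPoolLinearSketch` and the `output_size` parameter are hypothetical, and the default of 1 output feature is an assumption, since the documentation only refers to the output width as F_out.

```python
from typing import Optional

import torch


class MeanPoolLinearSketch(torch.nn.Module):
    """Illustrative stand-in for a mean-pool + linear decoder (not the ESPnet class)."""

    def __init__(self, encoder_output_size: int, output_size: int = 1):
        super().__init__()
        # output_size is an assumption; the documentation only calls it F_out.
        self.proj = torch.nn.Linear(encoder_output_size, output_size)

    def forward(
        self, input: torch.Tensor, ilens: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        if input.dim() != 3:
            raise ValueError(f"expected [Batch, T, F], got {tuple(input.shape)}")
        # Average over the time axis (dimension 1): [Batch, T, F] -> [Batch, F].
        # ilens is accepted for interface compatibility but unused in this sketch.
        pooled = input.mean(dim=1)
        # Project the pooled features: [Batch, F] -> [Batch, output_size].
        return self.proj(pooled)


sketch = MeanPoolLinearSketch(encoder_output_size=256)
out = sketch(torch.randn(10, 100, 256))
print(out.shape)  # torch.Size([10, 1])
```

Because the pooling collapses the time axis, the output has one row per utterance regardless of T.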

Raises: ValueError – If the input tensor has an incorrect shape.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(input: Tensor, ilens: Tensor | None)

Perform the forward pass of the LinearDecoder.

This method takes the encoder output and computes the linear projection for speaker diarization. It processes the input tensor and utilizes the specified input lengths to ensure proper handling of variable-length sequences.

Parameters:
- input (torch.Tensor) – A tensor representing the hidden space with shape [Batch, T, F], where Batch is the number of samples, T is the sequence length, and F is the feature dimension.
- ilens (Optional[torch.Tensor]) – A tensor containing the lengths of the input sequences with shape [Batch]. This is used to handle variable-length inputs properly.
Returns: The output of the linear projection layer, which will have shape [Batch, F_out], where F_out is the size of the output features after applying the linear projection.
Return type: torch.Tensor
Raises: ValueError – If the input tensor shape does not match the expected dimensions or if ilens is not compatible with input.

####### Examples

>>> import torch
>>> from espnet2.asvspoof.decoder.linear_decoder import LinearDecoder
>>> decoder = LinearDecoder(encoder_output_size=128)
>>> input_tensor = torch.randn(32, 10, 128)  # Batch of 32, T=10, F=128
>>> ilens = torch.tensor([10] * 32)  # All sequences are of length 10
>>> output = decoder(input_tensor, ilens)  # Calls forward() via nn.Module
>>> print(output.shape)  # Should print: torch.Size([32, F_out])

NOTE

The actual implementation of the forward pass is yet to be completed. This includes computing the mean over the time dimension and applying the projection layer.
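The documentation says ilens is meant to handle variable-length inputs, but it does not say how. One possible approach, shown here purely as an assumption, is to average only over the valid frames of each sequence so that padding does not affect the pooled features. The helper name `masked_time_mean` is hypothetical and is not part of ESPnet.

```python
import torch


def masked_time_mean(hidden: torch.Tensor, ilens: torch.Tensor) -> torch.Tensor:
    """Average hidden [Batch, T, F] over the first ilens[b] frames of each sequence."""
    if hidden.dim() != 3 or ilens.shape != (hidden.size(0),):
        raise ValueError("hidden must be [Batch, T, F] and ilens must be [Batch]")
    # Clamp lengths into [1, T] to avoid division by zero and out-of-range counts.
    lengths = ilens.to(hidden.device).clamp(min=1, max=hidden.size(1))
    steps = torch.arange(hidden.size(1), device=hidden.device)  # [T]
    mask = steps.unsqueeze(0) < lengths.unsqueeze(1)            # [Batch, T], True on valid frames
    mask = mask.unsqueeze(-1).to(hidden.dtype)                  # [Batch, T, 1]
    summed = (hidden * mask).sum(dim=1)                         # [Batch, F]
    return summed / lengths.unsqueeze(1).to(hidden.dtype)       # [Batch, F]


hidden = torch.ones(2, 4, 3)
hidden[1, 2:] = 100.0  # Simulated padding frames that should be ignored
pooled = masked_time_mean(hidden, torch.tensor([4, 2]))
print(pooled.shape)  # torch.Size([2, 3])
```

When every sequence has full length T, this reduces to a plain mean over dimension 1, so it stays consistent with the behavior described in the note above.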