espnet2.asvspoof.decoder.linear_decoder.LinearDecoder
class espnet2.asvspoof.decoder.linear_decoder.LinearDecoder(encoder_output_size: int)
Bases: AbsDecoder
Linear decoder for speaker diarization.
This class implements a linear decoder used in the context of speaker diarization. It is responsible for transforming the encoder’s output into a suitable representation for further processing or classification.
encoder_output_size
The size of the output from the encoder.
Type: int
Parameters: encoder_output_size (int) – Size of the encoder’s output dimension.
forward(input: torch.Tensor, ilens: Optional[torch.Tensor]) -> Optional[torch.Tensor]: Processes the input tensor and applies a linear projection.
####### Examples
>>> import torch
>>> from espnet2.asvspoof.decoder.linear_decoder import LinearDecoder
>>> decoder = LinearDecoder(encoder_output_size=256)
>>> input_tensor = torch.randn(10, 100, 256) # Batch of 10, T=100, F=256
>>> input_lengths = torch.tensor([100] * 10) # All sequences are of length 100
>>> output = decoder.forward(input_tensor, input_lengths)
>>> print(output.shape) # Shape depends on the implementation details
NOTE
This class currently contains placeholder TODOs for implementation. The forward method is expected to compute the mean over the time dimension and apply a linear projection layer.
- Raises: ValueError – If the input tensor has an incorrect shape.
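The behaviour described in the note above (a mean over the time dimension followed by a linear projection) could be sketched roughly as follows. This is a minimal illustration and not the ESPnet implementation: the class name `MeanPoolLinearSketch` and the `output_size` argument are assumptions made for this example, since the source does not state the size of the projected output.

```python
import torch
import torch.nn as nn


class MeanPoolLinearSketch(nn.Module):
    """Hypothetical stand-in for illustration; not the ESPnet class."""

    def __init__(self, encoder_output_size: int, output_size: int = 1):
        super().__init__()
        # output_size is an assumption: the source does not specify F_out.
        self.proj = nn.Linear(encoder_output_size, output_size)

    def forward(self, input: torch.Tensor, ilens=None) -> torch.Tensor:
        # Average over the time axis (dim 1): [Batch, T, F] -> [Batch, F].
        pooled = input.mean(dim=1)
        # Project the pooled features: [Batch, F] -> [Batch, F_out].
        return self.proj(pooled)


sketch = MeanPoolLinearSketch(encoder_output_size=128, output_size=1)
sketch_out = sketch(torch.randn(32, 10, 128))
print(sketch_out.shape)  # torch.Size([32, 1])
```

This sketch ignores `ilens` and averages over every frame, including any padding, which is only appropriate when all sequences in the batch have the same length.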
forward(input: torch.Tensor, ilens: Optional[torch.Tensor]) -> Optional[torch.Tensor]
Perform the forward pass of the LinearDecoder.
This method takes the encoder output and computes the linear projection for speaker diarization. It processes the input tensor and utilizes the specified input lengths to ensure proper handling of variable-length sequences.
- Parameters:
- input (torch.Tensor) – A tensor representing the hidden space with shape [Batch, T, F], where Batch is the number of samples, T is the sequence length, and F is the feature dimension.
- ilens (Optional[torch.Tensor]) – A tensor containing the lengths of the input sequences with shape [Batch]. This is used to handle variable-length inputs properly.
- Returns: The output of the linear projection layer, which will have shape [Batch, F_out], where F_out is the size of the output features after applying the linear projection.
- Return type: torch.Tensor
- Raises:
- ValueError – If the input tensor shape does not match the expected dimensions or if ilens is not compatible with input.
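The shape conditions listed under Raises could be checked along these lines. This is only a sketch under stated assumptions: the helper `check_decoder_inputs` and its exact rules (3-D input, matching feature size, one length per batch item, lengths between 1 and T) are hypothetical and are not taken from the ESPnet source.

```python
from typing import Optional

import torch


def check_decoder_inputs(
    input: torch.Tensor, ilens: Optional[torch.Tensor], feature_size: int
) -> None:
    """Hypothetical validation helper; raises ValueError on bad shapes."""
    if input.dim() != 3:
        raise ValueError(f"expected [Batch, T, F], got {tuple(input.shape)}")
    if input.size(2) != feature_size:
        raise ValueError(
            f"expected feature size {feature_size}, got {input.size(2)}"
        )
    if ilens is None:
        return
    if ilens.dim() != 1 or ilens.size(0) != input.size(0):
        raise ValueError("ilens must have shape [Batch]")
    if ilens.numel() > 0:
        # Every length must refer to frames that exist in the input.
        if int(ilens.min()) < 1 or int(ilens.max()) > input.size(1):
            raise ValueError("ilens values must lie in [1, T]")


# A well-formed batch passes silently.
check_decoder_inputs(torch.zeros(2, 5, 4), torch.tensor([5, 3]), feature_size=4)
```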
####### Examples
>>> import torch
>>> from espnet2.asvspoof.decoder.linear_decoder import LinearDecoder
>>> decoder = LinearDecoder(encoder_output_size=128)
>>> input_tensor = torch.randn(32, 10, 128) # Batch of 32, T=10, F=128
>>> ilens = torch.tensor([10] * 32) # All sequences are of length 10
>>> output = decoder.forward(input_tensor, ilens)
>>> print(output.shape) # Expected once forward is implemented: torch.Size([32, F_out])
NOTE
The actual implementation of the forward pass is yet to be completed. This includes computing the mean over the time dimension and applying the projection layer.
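When sequences in a batch differ in length, a plain mean over the time dimension also averages the padded frames. One possible way to use `ilens` is a masked mean, sketched below. The function name `masked_time_mean` is hypothetical, and the source does not say whether the intended implementation masks padding at all, so treat this as one option rather than the expected behaviour.

```python
import torch


def masked_time_mean(input: torch.Tensor, ilens: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean over valid frames only.

    input: [Batch, T, F], ilens: [Batch] -> returns [Batch, F].
    """
    ilens = ilens.to(input.device)
    steps = torch.arange(input.size(1), device=input.device)
    # mask[b, t] is True when frame t lies inside sequence b.
    mask = steps.unsqueeze(0) < ilens.unsqueeze(1)      # [Batch, T]
    mask = mask.unsqueeze(-1).to(input.dtype)           # [Batch, T, 1]
    summed = (input * mask).sum(dim=1)                  # [Batch, F]
    # Clamp to avoid division by zero for empty sequences.
    counts = ilens.clamp(min=1).unsqueeze(1).to(input.dtype)
    return summed / counts


# Two sequences with T=3, F=1; the first has only 2 valid frames.
frames = torch.tensor([[[1.0], [3.0], [100.0]], [[2.0], [4.0], [6.0]]])
lengths = torch.tensor([2, 3])
pooled = masked_time_mean(frames, lengths)
print(pooled)  # tensor([[2.], [4.]])
```

In the example, the padded value 100.0 in the first sequence does not affect the result, whereas `frames.mean(dim=1)` would include it.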