espnet2.layers.label_aggregation.LabelAggregate
class espnet2.layers.label_aggregation.LabelAggregate(win_length: int = 512, hop_length: int = 128, center: bool = True)
Bases: Module
LabelAggregate is a PyTorch module that performs label aggregation over input sequences. It processes the input tensor to produce aggregated labels based on the specified window and hop lengths, which is useful for speech recognition and other sequence labeling tasks.
win_length
The length of the window used for aggregation.
- Type: int
hop_length
The hop length used to slide the window across the input.
- Type: int
center
If True, pads the input tensor on both sides before aggregation.
- Type: bool
Parameters:
- win_length (int , optional) – The length of the window for aggregation. Default is 512.
- hop_length (int , optional) – The hop length for sliding the window. Default is 128.
- center (bool , optional) – Whether to pad the input tensor symmetrically. Default is True.
Returns:
- output (torch.Tensor): The aggregated label tensor of shape (Batch, Frames, Label_dim).
- olens (Optional[torch.Tensor]): The lengths of the output sequences if ilens is provided, otherwise None.
Return type: Tuple[torch.Tensor, Optional[torch.Tensor]]
######### Examples
>>> label_aggregate = LabelAggregate(win_length=256, hop_length=64)
>>> input_tensor = torch.rand(10, 1000, 20) # (Batch, Nsamples, Label_dim)
>>> ilens = torch.tensor([1000] * 10) # Lengths for each sequence in the batch
>>> output, olens = label_aggregate(input_tensor, ilens)
>>> print(output.shape) # Output shape should be (10, Frames, 20)
NOTE
The default behavior of label aggregation is compatible with torch.stft regarding framing and padding.
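Because framing follows the torch.stft convention, the number of output frames can be estimated ahead of time. The helper below is a hedged sketch (the name expected_frames is hypothetical, not part of ESPnet) that assumes symmetric padding of win_length // 2 samples on each side when center is True:
>>> def expected_frames(nsamples, win_length=512, hop_length=128, center=True):
...     # Assumption: torch.stft-style framing with symmetric centering.
...     if center:
...         nsamples += 2 * (win_length // 2)
...     return (nsamples - win_length) // hop_length + 1
>>> expected_frames(1000, win_length=256, hop_length=64)  # matches the example above
16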
Initialize internal Module state, shared by both nn.Module and ScriptModule.
extra_repr()
Returns a string representation of the LabelAggregate parameters.
This method provides a formatted string that includes the values of the win_length, hop_length, and center attributes of the LabelAggregate instance, which can be useful for debugging and logging purposes.
win_length
The length of the window used for aggregation.
- Type: int
hop_length
The number of samples to hop for each frame.
- Type: int
center
Whether to pad the input tensor on both sides.
- Type: bool
Returns: A string representation of the LabelAggregate parameters.
Return type: str
######### Examples
>>> label_agg = LabelAggregate(win_length=512, hop_length=128, center=True)
>>> print(label_agg.extra_repr())
win_length=512, hop_length=128, center=True
forward(input: torch.Tensor, ilens: Optional[torch.Tensor] = None)
LabelAggregate forward function.
This method processes the input tensor through a series of steps to perform label aggregation, which is useful in tasks such as speech processing. It takes an input tensor and an optional lengths tensor, and returns the aggregated output together with the processed lengths.
Parameters:
- input – A tensor of shape (Batch, Nsamples, Label_dim) representing the input data.
- ilens – An optional tensor of shape (Batch) that represents the lengths of each input sequence.
Returns:
- output (torch.Tensor): A tensor of shape (Batch, Frames, Label_dim) containing the aggregated labels.
- olens (Optional[torch.Tensor]): A tensor of processed lengths with shape (Batch) if ilens is provided, otherwise None.
Return type: Tuple[torch.Tensor, Optional[torch.Tensor]]
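Assuming olens follows the same torch.stft-compatible framing as the aggregated output (an assumption inferred from the NOTE above, not a statement of the exact implementation), the output lengths for the defaults win_length=512, hop_length=128, center=True can be reproduced as:
>>> import torch
>>> ilens = torch.tensor([1000, 800])
>>> (ilens + 2 * (512 // 2) - 512) // 128 + 1  # center padding, then framing
tensor([8, 7])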
######### Examples
>>> label_aggregate = LabelAggregate(win_length=512, hop_length=128)
>>> input_tensor = torch.randn(2, 1000, 10) # Batch size 2, 1000 samples, 10 labels
>>> ilens = torch.tensor([1000, 800]) # Lengths of each sequence
>>> output, olens = label_aggregate(input_tensor, ilens)
>>> print(output.shape) # Should print: torch.Size([2, Frames, 10])
>>> print(olens) # Lengths of the output sequences if ilens is provided
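For intuition, here is a minimal self-contained sketch of an equivalent aggregation, assuming the module pads the time axis when center is True, unfolds it into overlapping windows, and marks a label active in a frame when it is active for more than half of the window. This mirrors the documented framing behavior but is not the verbatim ESPnet implementation (aggregate_sketch is a hypothetical name):
>>> import torch
>>> import torch.nn.functional as F
>>> def aggregate_sketch(x, win_length=512, hop_length=128, center=True):
...     # x: (Batch, Nsamples, Label_dim) sample-level binary labels
...     if center:
...         pad = win_length // 2
...         x = F.pad(x, (0, 0, pad, pad))  # zero-pad the time axis on both sides
...     frames = x.unfold(1, win_length, hop_length)  # (Batch, Frames, Label_dim, win_length)
...     return (frames.sum(dim=-1) > win_length // 2).float()  # majority vote per window
>>> x = (torch.rand(2, 1000, 10) > 0.5).float()
>>> aggregate_sketch(x).shape
torch.Size([2, 8, 10])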