espnet2.layers.mask_along_axis.MaskAlongAxisVariableMaxWidth

class espnet2.layers.mask_along_axis.MaskAlongAxisVariableMaxWidth(mask_width_ratio_range: float | Sequence[float] = (0.0, 0.05), num_mask: int = 2, dim: int | str = 'time', replace_with_zero: bool = True)

Bases: Module

Mask input spec along a specified axis with variable maximum width.

This module masks the input tensor along a specified axis. The maximum mask width is a fraction of the input sequence length rather than a fixed number of frames, so longer inputs can receive proportionally wider masks. Each mask is placed at random, with a width drawn from the range defined by mask_width_ratio_range.

Formula: max_width = max_width_ratio * seq_len
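The formula above can be illustrated with a small sketch. The helper name `mask_width_bounds` is hypothetical, and rounding down plus clamping to the sequence length are assumptions made for illustration, not a statement of how the module itself rounds.

```python
import math


def mask_width_bounds(seq_len, ratio_range=(0.0, 0.05)):
    """Hypothetical helper: turn a ratio range into integer width bounds."""
    min_ratio, max_ratio = ratio_range
    # Assumption: widths are rounded down and clamped to [0, seq_len].
    min_width = max(0, math.floor(seq_len * min_ratio))
    max_width = min(seq_len, math.floor(seq_len * max_ratio))
    return min_width, max_width


print(mask_width_bounds(100))                # (0, 5)
print(mask_width_bounds(1000, (0.1, 0.2)))   # (100, 200)
print(mask_width_bounds(10))                 # (0, 0)
```

With the default range, a 10-frame input yields a maximum width of 0 under these assumptions, so no frames would be masked.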

mask_width_ratio_range

A tuple of floats defining the minimum and maximum ratios for the mask width relative to the sequence length. The default is (0.0, 0.05).

num_mask

An integer defining the number of masks to apply. The default is 2.

dim

An integer or string defining the axis along which to apply the mask: 1 or “time” for the time axis, 2 or “freq” for the frequency axis. The default is “time”.

replace_with_zero

A boolean selecting the fill value for masked positions: zero if True, the mean of the tensor if False. The default is True.

Parameters:
- mask_width_ratio_range – (Union[float, Sequence[float]]): Range of mask width ratios. Default is (0.0, 0.05).
- num_mask – (int): Number of masks to apply. Default is 2.
- dim – (Union[int, str]): Axis to apply mask. Can be ‘time’ (1) or ‘freq’ (2). Default is ‘time’.
- replace_with_zero – (bool): If True, replace masked values with zero; otherwise replace them with the mean of the tensor. Default is True.
Raises:
- TypeError – If mask_width_ratio_range is neither a float nor a sequence of two floats.
- ValueError – If dim is not an int, ‘time’, or ‘freq’.
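The dim convention described above ('time' for axis 1, 'freq' for axis 2, ValueError otherwise) can be sketched as follows. `resolve_dim` is a hypothetical helper written for illustration; it is not part of the espnet2 API, and the error message is invented.

```python
def resolve_dim(dim):
    """Hypothetical helper mirroring the documented dim convention."""
    if isinstance(dim, str):
        mapping = {"time": 1, "freq": 2}
        if dim not in mapping:
            raise ValueError(f"dim must be int, 'time' or 'freq': {dim!r}")
        return mapping[dim]
    if isinstance(dim, int):
        # Integer axes are passed through unchanged.
        return dim
    raise ValueError(f"dim must be int, 'time' or 'freq': {dim!r}")


print(resolve_dim("time"))  # 1
print(resolve_dim("freq"))  # 2
print(resolve_dim(2))       # 2
```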

Examples

>>> import torch
>>> from espnet2.layers.mask_along_axis import MaskAlongAxisVariableMaxWidth
>>> mask_layer = MaskAlongAxisVariableMaxWidth(num_mask=3, dim=2)
>>> spec = torch.randn(5, 100, 80)  # Batch of 5, Length 100, Freq 80
>>> masked_spec, lengths = mask_layer(spec)

NOTE

The input tensor must have at least 3 dimensions.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

extra_repr()

Returns a string representation of the MaskAlongAxisVariableMaxWidth instance’s parameters.

This method provides a concise summary of the key attributes of the instance, which include the range of mask widths, the number of masks, and the axis along which the masking occurs.

mask_width_ratio_range

The range of the mask width ratio used to determine the maximum width of the mask relative to the input sequence length.

num_mask

The number of masks to apply to the input tensor.

mask_axis

The axis along which the masking will be applied, represented as either “time” or “freq”.

Returns: A string summarizing the instance’s parameters.

Examples

>>> mask_layer = MaskAlongAxisVariableMaxWidth(mask_width_ratio_range=(0.1, 0.2),
...                                              num_mask=3,
...                                              dim='time',
...                                              replace_with_zero=False)
>>> print(mask_layer.extra_repr())
mask_width_ratio_range=(0.1, 0.2), num_mask=3, axis=time

forward(spec: Tensor, spec_lengths: Tensor | None = None)

Forward function.

Applies masking to the input tensor along the specified axis with variable maximum width determined by the mask width ratio range.

Parameters:
- spec – A tensor of shape (Batch, Length, Freq) representing the input data to be masked.
- spec_lengths – Optional tensor representing the lengths of the sequences in the batch. Default is None.
Returns: A tuple containing:
- The masked tensor of the same shape as spec.
- The original spec_lengths tensor.

Examples

>>> mask = MaskAlongAxisVariableMaxWidth(mask_width_ratio_range=(0.0, 0.1))
>>> input_spec = torch.randn(4, 100, 80)  # Batch of 4, Length 100, Freq 80
>>> masked_spec, lengths = mask(input_spec)

NOTE

Masking is applied only if the maximum mask width is greater than the minimum mask width. For short inputs or small ratios the two widths can coincide, in which case no masking takes place.

Raises:
- ValueError – If dim is not an integer or one of “time” or “freq”.
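To make the forward behaviour more concrete, here is a minimal pure-Python sketch of masking along the time axis of a single (Length, Freq) array stored as nested lists. The function `mask_time_axis`, its `rng` parameter, and the sampling scheme (width drawn with an exclusive upper bound, start drawn so the span fits) are assumptions for illustration only; the actual module operates on batched torch tensors and may sample differently.

```python
import random


def mask_time_axis(spec, min_width, max_width, num_mask=2,
                   replace_with_zero=True, rng=None):
    """Illustrative only: mask random time spans of a (Length, Freq) nested list."""
    rng = rng or random.Random()
    length = len(spec)
    max_width = min(max_width, length)
    out = [row[:] for row in spec]  # copy so the input is left untouched
    # Mirror the documented rule: mask only if max width exceeds min width.
    if max_width <= min_width:
        return out
    if replace_with_zero:
        fill = 0.0
    else:
        values = [v for row in spec for v in row]
        fill = sum(values) / len(values)
    for _ in range(num_mask):
        # Assumption: width in [min_width, max_width), start chosen so the span fits.
        width = rng.randrange(min_width, max_width)
        start = rng.randrange(0, max(1, length - width))
        for t in range(start, start + width):
            out[t] = [fill] * len(out[t])
    return out


spec = [[1.0] * 4 for _ in range(100)]  # Length 100, Freq 4
masked = mask_time_axis(spec, 0, 5, num_mask=2, rng=random.Random(0))
print(sum(1 for row in masked if row[0] == 0.0), "frames masked")
```

Passing a seeded `random.Random` makes the sketch reproducible, which is convenient when inspecting which frames were masked.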