espnet2.asr.specaug.abs_specaug.AbsSpecAug

class espnet2.asr.specaug.abs_specaug.AbsSpecAug(*args, **kwargs)

Bases: Module

Abstract base class for spectrogram augmentation in speech processing.

This class serves as a blueprint for implementing various spectrogram augmentation techniques. The augmentation process is typically part of a speech recognition pipeline that includes frontend processing, spectrogram augmentation, normalization, encoding, and decoding.
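The stage ordering described above can be sketched as a small helper that takes the frontend output and passes it through augmentation and then normalization before encoding. The function name `prepare_encoder_input`, the `Stage` alias, and the `training` flag are illustrative assumptions, not part of ESPnet; in particular, applying augmentation only during training is an assumption of this sketch.

```python
from typing import Any, Callable, Optional, Tuple

# A stage maps (features, lengths) to (features, lengths),
# matching the documented forward() signature.
Stage = Callable[[Any, Any], Tuple[Any, Any]]


def prepare_encoder_input(
    feats: Any,
    lengths: Any,
    specaug: Optional[Stage] = None,
    normalize: Optional[Stage] = None,
    training: bool = True,
) -> Tuple[Any, Any]:
    """Hypothetical helper: augment, then normalize, the frontend output."""
    # Assumption: augmentation runs only while training, so evaluation
    # sees unmodified features.
    if specaug is not None and training:
        feats, lengths = specaug(feats, lengths)
    # Normalization follows augmentation, per the pipeline order above.
    if normalize is not None:
        feats, lengths = normalize(feats, lengths)
    return feats, lengths
```

Any object whose call signature matches `forward(x, x_lengths)` can be passed as `specaug`, which is what makes a shared abstract interface useful here.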

Parameters: None
Returns: A tuple containing the augmented spectrogram tensor and optionally the lengths of the input sequences.
Return type: Tuple[torch.Tensor, Optional[torch.Tensor]]
Raises:
- NotImplementedError – If the forward method is not implemented by the subclass.

####### Examples

To implement a specific spectrogram augmentation, subclass AbsSpecAug and define the forward method:

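```python
from espnet2.asr.specaug.abs_specaug import AbsSpecAug


class MySpecAug(AbsSpecAug):
    def forward(self, x, x_lengths=None):
        # Implement the augmentation logic here.
        augmented_x = x  # placeholder: returns the input unchanged
        return augmented_x, x_lengths
```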

NOTE

This class is intended to be subclassed, and the forward method must be overridden to provide specific augmentation behavior.
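As a slightly fuller illustration of the subclassing pattern, here is a minimal sketch of a subclass that zeroes one random span along the last (time) axis. `TimeMaskSketch` and `max_width` are hypothetical names, and skipping augmentation in evaluation mode is a design choice of this sketch, not documented behavior. So that the snippet runs without ESPnet installed, it falls back to a stand-in base class that mirrors the documented interface.

```python
import torch

try:
    from espnet2.asr.specaug.abs_specaug import AbsSpecAug
except ImportError:
    class AbsSpecAug(torch.nn.Module):
        """Stand-in mirroring the documented interface (assumption)."""

        def forward(self, x, x_lengths=None):
            raise NotImplementedError


class TimeMaskSketch(AbsSpecAug):
    """Hypothetical subclass: zero one random span along the time axis."""

    def __init__(self, max_width: int = 10):
        super().__init__()
        self.max_width = max_width

    def forward(self, x, x_lengths=None):
        # Design choice of this sketch: leave inputs untouched in eval mode.
        if not self.training:
            return x, x_lengths
        time_steps = x.size(-1)
        # Draw a mask width in [0, max_width], capped at the sequence length.
        width = min(int(torch.randint(0, self.max_width + 1, (1,))), time_steps)
        start = int(torch.randint(0, time_steps - width + 1, (1,)))
        # Clone so the caller's tensor is not modified in place.
        out = x.clone()
        out[..., start:start + width] = 0.0
        # Masking does not change sequence lengths, so pass them through.
        return out, x_lengths
```

The same span is masked for every item in the batch here, which keeps the sketch short; per-utterance masks would need a batched index.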

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x: Tensor, x_lengths: Tensor | None = None) → Tuple[Tensor, Tensor | None]

Performs the forward pass of the spectrogram augmentation.

This method takes an input spectrogram tensor and, optionally, the length of each sequence in the batch, and returns the augmented spectrogram together with the (possibly updated) lengths. The base class provides no augmentation logic of its own; the actual processing is defined by the subclass.

Parameters:
- x (torch.Tensor) – A tensor of shape (batch_size, num_channels, time_steps) representing the input spectrogram.
- x_lengths (torch.Tensor , optional) – A tensor of shape (batch_size,) containing the lengths of each input in the batch. If None, lengths are assumed to be the maximum length of the inputs.
Returns: A tuple where the first element is the augmented spectrogram tensor of shape (batch_size, num_channels, time_steps) and the second element is the updated lengths tensor, or None if lengths were not provided.
Return type: Tuple[torch.Tensor, Optional[torch.Tensor]]
Raises:
- NotImplementedError – This method should be implemented in a subclass of AbsSpecAug.
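The `x_lengths` default described above (missing lengths are treated as the maximum length) can be sketched as a helper that a subclass might call internally. `resolve_lengths` and `padding_mask` are hypothetical names, not ESPnet functions, and the sketch assumes time is the last axis, as in the documented shape.

```python
from typing import Optional

import torch


def resolve_lengths(
    x: torch.Tensor, x_lengths: Optional[torch.Tensor] = None
) -> torch.Tensor:
    """Hypothetical helper: use the full time dimension when lengths are missing."""
    if x_lengths is None:
        # Every sequence is treated as spanning all time steps (last axis).
        return torch.full(
            (x.size(0),), x.size(-1), dtype=torch.long, device=x.device
        )
    return x_lengths


def padding_mask(
    x: torch.Tensor, x_lengths: Optional[torch.Tensor] = None
) -> torch.Tensor:
    """Return a (batch_size, time_steps) mask that is True on padded frames."""
    lengths = resolve_lengths(x, x_lengths)
    positions = torch.arange(x.size(-1), device=x.device)
    # Broadcast (1, time_steps) against (batch_size, 1).
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)
```

A subclass could use such a mask to avoid counting padded frames when choosing where to augment, while still returning `None` for the lengths when none were passed in.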

####### Examples

>>> import torch
>>> model = MySpecAug()  # MySpecAug is a subclass of AbsSpecAug
>>> input_tensor = torch.randn(2, 1, 100)  # Batch of 2, 1 channel, 100 time steps
>>> lengths = torch.tensor([100, 90])  # Lengths of the inputs
>>> output, updated_lengths = model.forward(input_tensor, lengths)
>>> print(output.shape)  # Should print: torch.Size([2, 1, 100])
>>> print(updated_lengths)  # Should print: tensor([100, 90]) or modified lengths

NOTE

This method is intended to be overridden in subclasses to provide specific augmentation logic.