espnet2.uasr.segmenter.abs_segmenter.AbsSegmenter

About 2 min

espnet2.uasr.segmenter.abs_segmenter.AbsSegmenter

class espnet2.uasr.segmenter.abs_segmenter.AbsSegmenter(*args, **kwargs)

Bases: Module, ABC

Segmenter definition for UASR (Unsupervised Automatic Speech Recognition) task.

This abstract base class provides the structure for segmenting frame-level outputs from a generator. In practice, consecutive frames may predict the same phoneme, making it too easy for the discriminator. Therefore, this segmenter is designed to merge frames with similar predictions from the generator output.

None

Parameters:None
Returns: None
Yields: None
Raises:
- NotImplementedError – If the abstract methods are not implemented by
- subclasses. –

######### Examples

class MySegmenter(AbsSegmenter): : def pre_segment(self, xs_pad: torch.Tensor, ilens: torch.Tensor) -> torch.Tensor: : # Implementation of pre_segment pass <br/> def logit_segment(self, xs_pad: torch.Tensor, ilens: torch.Tensor) -> torch.Tensor: : # Implementation of logit_segment pass

NOTE

This class is meant to be subclassed. The methods pre_segment and logit_segment must be implemented by any subclass.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract logit_segment(xs_pad: Tensor, ilens: Tensor) → Tensor

Segmenter definition for UASR task.

This class provides an abstract base for segmenting audio frames based on the output of a generator in the UASR (Unsupervised Automatic Speech Recognition) task. The segmenter is designed to merge frames with similar predictions, thereby improving the discriminator’s ability to learn from the generator’s output.

None

Parameters:
- xs_pad (torch.Tensor) – A padded tensor containing the input audio frames.
- ilens (torch.Tensor) – A tensor containing the lengths of the input sequences.
Returns: A tensor containing the segmented audio frames.
Return type: torch.Tensor
Raises:NotImplementedError – If the method is called directly from this abstract class without being overridden in a derived class.

######### Examples

class MySegmenter(AbsSegmenter): : def pre_segment(self, xs_pad, ilens): : # Implementation of pre-segment method pass <br/> def logit_segment(self, xs_pad, ilens): : # Implementation of logit_segment method return segmented_output

segmenter = MySegmenter() output = segmenter.logit_segment(xs_pad, ilens)

NOTE

This class is intended to be subclassed. Implementations of the abstract methods must be provided in derived classes.

abstract pre_segment(xs_pad: Tensor, ilens: Tensor) → Tensor

Segmenter definition for UASR task.

Practically, the output of the generator (in frame-level) may predict the same phoneme for consecutive frames, which makes it too easy for the discriminator. So, the segmenter here is to merge frames with a similar prediction from the generator output.

pre_segment(xs_pad

torch.Tensor, ilens: torch.Tensor) -> torch.Tensor: Abstract method to perform pre-segmentation on the input data.

logit_segment(xs_pad

torch.Tensor, ilens: torch.Tensor) -> torch.Tensor: Abstract method to obtain logits for segmentation from the input data.

None

Parameters:
- xs_pad (torch.Tensor) – A tensor containing padded input sequences.
- ilens (torch.Tensor) – A tensor containing the lengths of the input sequences.
Returns: A tensor containing the segmented output.
Return type: torch.Tensor
Raises:NotImplementedError – If the method is not implemented in a subclass.

######### Examples

To use this class, you must inherit from it and implement the abstract methods:

class MySegmenter(AbsSegmenter): : def pre_segment(self, xs_pad, ilens): : # Implementation of pre-segmentation logic pass <br/> def logit_segment(self, xs_pad, ilens): : # Implementation of segmentation logits logic pass