espnet2.diar.layers.abs_mask.AbsMask

class espnet2.diar.layers.abs_mask.AbsMask(*args, **kwargs)

Bases: Module, ABC

Abstract base class for defining a masking mechanism in speaker diarization.

This class serves as a blueprint for masks that are applied to the input features in speaker diarization tasks. It inherits from PyTorch’s torch.nn.Module and requires subclasses to implement the max_num_spk property and the forward method, which together specify the masking behavior.

max_num_spk

The maximum number of speakers that the mask can handle. This must be defined in subclasses.

Type: int

forward(input, ilens, bottleneck_feat, num_spk) → Tuple[Tuple[torch.Tensor], torch.Tensor, OrderedDict]

Abstract method that must be implemented by subclasses to define the forward pass of the mask.

Parameters:
- input (torch.Tensor) – The input tensor representing the audio features.
- ilens (torch.Tensor) – A tensor containing the lengths of the input sequences.
- bottleneck_feat (torch.Tensor) – Features extracted from the bottleneck layer.
- num_spk (int) – The number of speakers to consider in the masking process.
Returns: A tuple containing:
- A tuple of tensors representing the masks for each speaker.
- A tensor representing the combined output.
- An ordered dictionary containing any additional information.
Return type: Tuple[Tuple[torch.Tensor], torch.Tensor, OrderedDict]
Raises: NotImplementedError – If the subclass does not implement the max_num_spk property or the forward method.

Examples

    from espnet2.diar.layers.abs_mask import AbsMask

    class MyMask(AbsMask):
        @property
        def max_num_spk(self) -> int:
            return 5

        def forward(self, input, ilens, bottleneck_feat, num_spk):
            # Implement the masking logic here
            pass

    # input_tensor, input_lengths, bottleneck_features and num_speakers
    # are placeholders for values supplied by the caller.
    my_mask = MyMask()
    output = my_mask(input_tensor, input_lengths, bottleneck_features, num_speakers)

NOTE

This class should not be instantiated directly. Subclasses must provide concrete implementations of the abstract methods.
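The note above can be illustrated with a minimal sketch. To stay runnable without PyTorch or ESPnet installed, it uses a hypothetical stand-in base class, `AbsMaskLike`, that copies only the abstract property and abstract method pattern; it is not the real AbsMask, and `IncompleteMask`, `CompleteMask`, and `can_instantiate` are likewise names invented for illustration. With Python's `abc` machinery, a class that still has unimplemented abstract members is normally rejected at instantiation with `TypeError`, while `NotImplementedError` would surface only if a base implementation is actually invoked.

```python
from abc import ABC, abstractmethod


class AbsMaskLike(ABC):
    """Hypothetical stand-in mirroring AbsMask's abstract members (no torch)."""

    @property
    @abstractmethod
    def max_num_spk(self) -> int:
        raise NotImplementedError

    @abstractmethod
    def forward(self, input, ilens, bottleneck_feat, num_spk):
        raise NotImplementedError


class IncompleteMask(AbsMaskLike):
    """Overrides the property but not forward, so it is still abstract."""

    @property
    def max_num_spk(self) -> int:
        return 5


class CompleteMask(IncompleteMask):
    """Overrides both abstract members, so it can be instantiated."""

    def forward(self, input, ilens, bottleneck_feat, num_spk):
        # Placeholder body; a real mask would compute per-speaker masks here.
        return (), None, {}


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abc rejects it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

In this sketch, `can_instantiate(AbsMaskLike)` and `can_instantiate(IncompleteMask)` are both false, and only `CompleteMask` can be constructed.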

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input, ilens, bottleneck_feat, num_spk) → Tuple[Tuple[Tensor], Tensor, OrderedDict]

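Defines the forward pass of the mask. Subclasses must implement this method.

Returns: A tuple containing the per-speaker masks (a tuple of tensors), a tensor representing the combined output, and an ordered dictionary with any additional information.
Return type: Tuple[Tuple[Tensor], Tensor, OrderedDict]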

abstract property max_num_spk : int

Abstract property that defines the maximum number of speakers supported by the masking model. Subclasses of AbsMask must implement it.

Returns: The maximum number of speakers supported by the model.
Return type: int
Raises: NotImplementedError – If the property is accessed without being overridden in a subclass.

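Examples

    from espnet2.diar.layers.abs_mask import AbsMask

    class MyMask(AbsMask):
        @property
        def max_num_spk(self) -> int:
            return 5

        def forward(self, input, ilens, bottleneck_feat, num_spk):
            pass  # forward is also abstract and must be defined before instantiation

    my_mask = MyMask()
    print(my_mask.max_num_spk)  # Output: 5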
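One way calling code might use max_num_spk is to validate a requested speaker count before running the mask. The sketch below is an assumption about usage, not part of the ESPnet API: `check_num_spk` is a hypothetical helper, and `FakeMask` is a stand-in that exposes only the property so the example runs without PyTorch or ESPnet.

```python
def check_num_spk(mask, num_spk: int) -> int:
    """Validate num_spk against mask.max_num_spk (hypothetical helper)."""
    if not 1 <= num_spk <= mask.max_num_spk:
        raise ValueError(
            f"num_spk={num_spk} is outside the supported range "
            f"1..{mask.max_num_spk}"
        )
    return num_spk


class FakeMask:
    """Stand-in exposing only the max_num_spk property (not a real AbsMask)."""

    @property
    def max_num_spk(self) -> int:
        return 5
```

Here `check_num_spk(FakeMask(), 3)` returns the count unchanged, while a count of 6 raises `ValueError`. Whether a real subclass should reject, clamp, or otherwise handle an out-of-range count is a design choice the source does not specify.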