espnet2.diar.layers.abs_mask.AbsMask
class espnet2.diar.layers.abs_mask.AbsMask(*args, **kwargs)
Bases: Module, ABC
Abstract base class for defining a masking mechanism in speaker diarization.
This class serves as a blueprint for creating different types of masks that can be applied to the input features in speaker diarization tasks. It inherits from PyTorch’s torch.nn.Module and requires subclasses to implement the necessary methods and properties to specify the masking behavior.
max_num_spk
The maximum number of speakers that the mask can handle. This must be defined in subclasses.
- Type: int
forward(input, ilens, bottleneck_feat, num_spk) → Tuple[Tuple[torch.Tensor], torch.Tensor, OrderedDict]
Abstract method that must be implemented by subclasses to define the forward pass of the mask.
- Parameters:
- input (torch.Tensor) – The input tensor representing the audio features.
- ilens (torch.Tensor) – A tensor containing the lengths of the input sequences.
- bottleneck_feat (torch.Tensor) – Features extracted from the bottleneck layer.
- num_spk (int) – The number of speakers to consider in the masking process.
- Returns: A tuple containing:
  - A tuple of tensors representing the masks for each speaker.
  - A tensor representing the combined output.
  - An ordered dictionary containing any additional information.
- Return type: Tuple[Tuple[torch.Tensor], torch.Tensor, OrderedDict]
- Raises: NotImplementedError – If the subclass does not implement the max_num_spk property or the forward method.
####### Examples
    from espnet2.diar.layers.abs_mask import AbsMask

    class MyMask(AbsMask):
        @property
        def max_num_spk(self) -> int:
            return 5

        def forward(self, input, ilens, bottleneck_feat, num_spk):
            # Implement the masking logic here
            pass

    my_mask = MyMask()
    output = my_mask(input_tensor, input_lengths, bottleneck_features, num_speakers)
NOTE
This class should not be instantiated directly. Subclasses must provide concrete implementations of the abstract methods.
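The note above can be illustrated with a minimal sketch. It assumes that AbsMask declares its members with `abc.abstractmethod`, as the `abstract` labels on this page suggest. `AbsMaskSketch`, `IncompleteMask`, `CompleteMask`, and `can_instantiate` are hypothetical names used only for this illustration, and the stand-in base class omits `torch.nn.Module` so the snippet runs with the standard library alone. Python's `abc` machinery rejects instantiation of any class that still has abstract members by raising `TypeError`.

```python
from abc import ABC, abstractmethod


class AbsMaskSketch(ABC):
    """Hypothetical stand-in mirroring the abstract members of AbsMask."""

    @property
    @abstractmethod
    def max_num_spk(self) -> int:
        raise NotImplementedError

    @abstractmethod
    def forward(self, input, ilens, bottleneck_feat, num_spk):
        raise NotImplementedError


class IncompleteMask(AbsMaskSketch):
    """Overrides the property but not forward, so it is still abstract."""

    @property
    def max_num_spk(self) -> int:
        return 5


class CompleteMask(IncompleteMask):
    """Overrides both abstract members, so it can be instantiated."""

    def forward(self, input, ilens, bottleneck_feat, num_spk):
        # Placeholder: a real mask would compute per-speaker masks here.
        return (), None, {}


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abc raises TypeError."""
    try:
        cls()
        return True
    except TypeError:
        return False


results = {
    cls.__name__: can_instantiate(cls)
    for cls in (AbsMaskSketch, IncompleteMask, CompleteMask)
}
print(results)
```

Only `CompleteMask` can be constructed; the base class and the partially implemented subclass are both rejected at instantiation time, before `forward` is ever called.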
Initialize internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(input, ilens, bottleneck_feat, num_spk) → Tuple[Tuple[Tensor], Tensor, OrderedDict]
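Computes the speaker masks for the given input. Subclasses must implement this method.
- Returns: A tuple of per-speaker mask tensors, a tensor representing the combined output, and an ordered dictionary containing any additional information.
- Return type: Tuple[Tuple[Tensor], Tensor, OrderedDict]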
abstract property max_num_spk : int
Abstract property that defines the maximum number of speakers supported by the masking model. This property should be implemented in subclasses of the AbsMask class to specify the maximum number of speakers that can be handled.
- Returns: The maximum number of speakers supported by the model.
- Return type: int
- Raises: NotImplementedError – If the property is accessed without being overridden in a subclass.
####### Examples
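    from espnet2.diar.layers.abs_mask import AbsMask

    class MyMask(AbsMask):
        @property
        def max_num_spk(self) -> int:
            return 5

        def forward(self, input, ilens, bottleneck_feat, num_spk):
            # forward is abstract too, so it must be overridden
            # before MyMask can be instantiated
            pass

    my_mask = MyMask()
    print(my_mask.max_num_spk)  # Output: 5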
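One way a caller might use max_num_spk is to validate the num_spk argument before invoking forward. The sketch below assumes that valid values of num_spk run from 1 to max_num_spk inclusive, which this page implies but does not state. `FakeMask` and `check_num_spk` are hypothetical names, not part of ESPnet, and the stand-in class exposes only the property so that the snippet needs no PyTorch installation.

```python
class FakeMask:
    """Hypothetical stand-in exposing only the max_num_spk property."""

    @property
    def max_num_spk(self) -> int:
        return 5


def check_num_spk(mask, num_spk: int) -> int:
    """Return num_spk if the mask can handle it, otherwise raise ValueError.

    Assumption: valid values are 1..mask.max_num_spk inclusive.
    """
    if not 1 <= num_spk <= mask.max_num_spk:
        raise ValueError(
            f"num_spk={num_spk} is outside the supported range "
            f"1..{mask.max_num_spk}"
        )
    return num_spk


mask = FakeMask()

# Three speakers fits within the limit of five.
accepted = check_num_spk(mask, 3)

# Six speakers exceeds the limit and is rejected.
try:
    check_num_spk(mask, 6)
    rejected = False
except ValueError:
    rejected = True
```

Because the helper relies only on the max_num_spk attribute, it would work unchanged with any object that provides that property, including a real AbsMask subclass.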