espnet2.diar.layers.multi_mask.MultiMask
class espnet2.diar.layers.multi_mask.MultiMask(input_dim: int, bottleneck_dim: int = 128, max_num_spk: int = 3, mask_nonlinear='relu')
Bases: AbsMask
Multiple 1x1 convolution layer module.
This module corresponds to the final 1x1 conv block and non-linear function in TCNSeparator. It holds multiple 1x1 conv blocks, and one of them is selected according to the given num_spk, so that a variable number of speakers can be handled.
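The selection mechanism described above can be illustrated with a minimal NumPy sketch. This is not ESPnet's code: the helper names `make_mask_convs` and `select_and_project` are hypothetical, the weights are random placeholders, and it assumes that block z (1-based) maps the B bottleneck channels to z * N output channels. It relies on the fact that a 1x1 convolution over time is the same as multiplying every frame by one weight matrix.

```python
import numpy as np


def make_mask_convs(bottleneck_dim: int, input_dim: int, max_num_spk: int, seed: int = 0):
    """Hypothetical: one 1x1-conv weight per possible speaker count.

    Block z (z = 1..max_num_spk) is stored as a (B, z * N) matrix, since a
    1x1 convolution is a per-frame linear map. Weights are random placeholders.
    """
    rng = np.random.default_rng(seed)
    return [
        rng.standard_normal((bottleneck_dim, z * input_dim)) * 0.1
        for z in range(1, max_num_spk + 1)
    ]


def select_and_project(bottleneck_feat, convs, num_spk: int, input_dim: int):
    """Apply the block chosen by num_spk: [M, K, B] -> [M, num_spk, K, N]."""
    if not 1 <= num_spk <= len(convs):
        raise ValueError(f"num_spk must be in [1, {len(convs)}], got {num_spk}")
    weight = convs[num_spk - 1]                 # (B, num_spk * N)
    score = bottleneck_feat @ weight            # (M, K, num_spk * N)
    M, K, _ = score.shape
    # Split the channel axis into (speaker, feature): channel = spk * N + n
    score = score.reshape(M, K, num_spk, input_dim)
    return score.transpose(0, 2, 1, 3)          # (M, num_spk, K, N)


convs = make_mask_convs(bottleneck_dim=8, input_dim=4, max_num_spk=3)
feat = np.ones((2, 5, 8))                       # [M, K, B]
scores = select_and_project(feat, convs, num_spk=2, input_dim=4)
```

Because max_num_spk blocks are built up front, num_spk can change between calls without rebuilding the module; it only has to stay within 1..max_num_spk.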
- Parameters:
- input_dim – Number of filters in the autoencoder
- bottleneck_dim – Number of channels in the bottleneck 1x1-conv block
- max_num_spk – Number of mask_conv1x1 modules (must be >= the maximum number of speakers in the dataset)
- mask_nonlinear – Non-linear function used to generate the masks
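The mask_nonlinear choice can be pictured as a lookup from a name to an elementwise function. The sketch below is a hypothetical NumPy illustration: only "relu" (the default in the signature) is confirmed by this page, and the other options ("sigmoid", "tanh", "softmax") are an assumption based on common TasNet-style mask estimators. Softmax is assumed to run over the speaker axis so that the masks sum to 1 across speakers.

```python
import numpy as np


def apply_mask_nonlinear(score, mask_nonlinear: str = "relu"):
    """Hypothetical: map raw scores (M, num_spk, K, N) to masks.

    Only "relu" is confirmed by the documented default; the other
    options are assumptions for illustration.
    """
    if mask_nonlinear == "relu":
        return np.maximum(score, 0.0)
    if mask_nonlinear == "sigmoid":
        return 1.0 / (1.0 + np.exp(-score))
    if mask_nonlinear == "tanh":
        return np.tanh(score)
    if mask_nonlinear == "softmax":
        # Softmax over the speaker axis (axis=1); subtract max for stability
        shifted = score - score.max(axis=1, keepdims=True)
        e = np.exp(shifted)
        return e / e.sum(axis=1, keepdims=True)
    raise ValueError(f"Unsupported mask_nonlinear: {mask_nonlinear!r}")
```

ReLU and sigmoid produce independent, non-negative masks per speaker, whereas a softmax over speakers forces the masks to partition each time-frequency bin.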
forward(input: Tensor | ComplexTensor, ilens: Tensor, bottleneck_feat: Tensor, num_spk: int) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
This API is kept the same as that of TasNet.
Parameters:
- input – [M, K, N], where M is the batch size, K the number of frames, and N the feature dimension
- ilens (torch.Tensor) – (M,)
- bottleneck_feat – [M, K, B], where B is the bottleneck dimension
- num_spk – Number of speakers (training: the oracle number; inference: estimated by another module, e.g., EEND-EDA)
Returns:
- masked (List[Union[torch.Tensor, ComplexTensor]]): [(M, K, N), …], one masked input per speaker
- ilens (torch.Tensor): (M,)
- others (OrderedDict): other predicted data, e.g. masks:
  OrderedDict[
  'mask_spk1': torch.Tensor(Batch, Frames, Freq),
  'mask_spk2': torch.Tensor(Batch, Frames, Freq),
  …
  'mask_spkn': torch.Tensor(Batch, Frames, Freq),
  ]
Return type: Tuple[List[Union[torch.Tensor, ComplexTensor]], torch.Tensor, OrderedDict]
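How the three return values relate can be sketched in NumPy, assuming the masks have already been computed with shape (M, num_spk, K, N). This is an illustration of the documented output layout, not ESPnet's implementation; `build_outputs` is a hypothetical helper. The same elementwise product also works for complex-valued spectra, since multiplying a complex array by a real mask is well defined.

```python
from collections import OrderedDict

import numpy as np


def build_outputs(mix, ilens, masks):
    """Hypothetical: assemble (masked, ilens, others) from precomputed masks.

    mix:   (M, K, N) input features (real or complex)
    masks: (M, num_spk, K, N) estimated masks
    """
    per_spk = [masks[:, i] for i in range(masks.shape[1])]   # each (M, K, N)
    masked = [mix * m for m in per_spk]                       # one output per speaker
    others = OrderedDict((f"mask_spk{i + 1}", m) for i, m in enumerate(per_spk))
    return masked, ilens, others


# Usage with toy shapes: M=2 utterances, K=5 frames, N=4 features, 3 speakers
M, K, N, num_spk = 2, 5, 4, 3
mix = np.ones((M, K, N))
masks = np.stack([np.full((M, K, N), 0.5 * i) for i in range(num_spk)], axis=1)
masked, ilens, others = build_outputs(mix, np.array([5, 5]), masks)
```

ilens is passed through unchanged because masking does not alter the number of frames.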
property max_num_spk : int
