espnet2.enh.separator.dccrn_separator.DCCRNSeparator
class espnet2.enh.separator.dccrn_separator.DCCRNSeparator(input_dim: int, num_spk: int = 1, rnn_layer: int = 2, rnn_units: int = 256, masking_mode: str = 'E', use_clstm: bool = True, bidirectional: bool = False, use_cbn: bool = False, kernel_size: int = 5, kernel_num: List[int] = [32, 64, 128, 256, 256, 256], use_builtin_complex: bool = True, use_noise_mask: bool = False)
Bases: AbsSeparator
DCCRN separator.
- Parameters:
- input_dim (int) – input dimension.
- num_spk (int, optional) – number of speakers. Defaults to 1.
- rnn_layer (int, optional) – number of LSTM layers in the CRN. Defaults to 2.
- rnn_units (int, optional) – number of RNN units. Defaults to 256.
- masking_mode (str, optional) – usage of the estimated mask. Defaults to 'E'.
- use_clstm (bool, optional) – whether to use complex LSTM. Defaults to True.
- bidirectional (bool, optional) – whether to use BLSTM. Defaults to False.
- use_cbn (bool, optional) – whether to use complex BN. Defaults to False.
- kernel_size (int, optional) – convolution kernel size. Defaults to 5.
- kernel_num (list, optional) – output dimension of each layer of the encoder. Defaults to [32, 64, 128, 256, 256, 256].
- use_builtin_complex (bool, optional) – use torch.complex if True, else ComplexTensor. Defaults to True.
- use_noise_mask (bool, optional) – whether to estimate the mask of noise. Defaults to False.
apply_masks(masks: List[Tensor | ComplexTensor], real: Tensor, imag: Tensor)
Apply the estimated masks to the noisy spectrum.
- Parameters:
- masks – est_masks, [(B, T, F), …]
- real (torch.Tensor) – real part of the noisy spectrum, (B, F, T)
- imag (torch.Tensor) – imaginary part of the noisy spectrum, (B, F, T)
- Returns: masked, [(B, T, F), …]
- Return type: List[Union(torch.Tensor, ComplexTensor)]
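The page does not spell out what the values of masking_mode do. As a rough sketch, assuming they follow the 'E', 'C' and 'R' variants described in the DCCRN paper (an assumption, not something stated on this page), applying a complex mask to a single time-frequency bin might look like the following. `apply_mask_bin` is a hypothetical helper, not part of ESPnet, and it works on Python complex scalars rather than tensors.

```python
import cmath
import math


def apply_mask_bin(mask: complex, spec: complex, mode: str = "E") -> complex:
    """Apply one complex mask value to one noisy STFT bin (illustrative only)."""
    if mode == "E":
        # Polar form: tanh-bounded magnitude gain plus an additive phase shift.
        gain = math.tanh(abs(mask))
        phase = cmath.phase(spec) + cmath.phase(mask)
        return cmath.rect(abs(spec) * gain, phase)
    if mode == "C":
        # Plain complex multiplication of spectrum and mask.
        return spec * mask
    if mode == "R":
        # Real and imaginary parts are scaled independently.
        return complex(spec.real * mask.real, spec.imag * mask.imag)
    raise ValueError(f"unknown masking mode: {mode}")


noisy_bin = complex(4.0, 3.0)
mask_value = complex(0.5, 2.0)
for mode in ("E", "C", "R"):
    print(mode, apply_mask_bin(mask_value, noisy_bin, mode))
```

In this sketch the 'E' mode keeps the magnitude gain below 1 because of the tanh, while 'C' and 'R' leave the gain unbounded.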
create_masks(mask_tensor: Tensor)
Create the estimated mask for each speaker.
- Parameters: mask_tensor (torch.Tensor) – output of decoder, shape (B, 2*num_spk, F-1, T)
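The channel dimension of size 2*num_spk suggests one real and one imaginary mask channel per speaker. The sketch below shows one way such a tensor could be indexed, assuming the channels are interleaved as [real_1, imag_1, real_2, imag_2, ...]. Both that layout and the helper `mask_channel_pairs` are assumptions for illustration and are not taken from this page.

```python
from typing import List, Tuple


def mask_channel_pairs(num_channels: int, num_spk: int) -> List[Tuple[int, int]]:
    """Return (real_index, imag_index) channel pairs, one pair per speaker."""
    if num_channels != 2 * num_spk:
        raise ValueError(
            f"expected {2 * num_spk} channels for {num_spk} speakers, "
            f"got {num_channels}"
        )
    # Assumed layout: [real_1, imag_1, real_2, imag_2, ...]
    return [(2 * spk, 2 * spk + 1) for spk in range(num_spk)]


print(mask_channel_pairs(4, 2))  # → [(0, 1), (2, 3)]
```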
flatten_parameters()
forward(input: Tensor | ComplexTensor, ilens: Tensor, additional: Dict | None = None) → Tuple[List[Tensor | ComplexTensor], Tensor, OrderedDict]
Forward.
Parameters:
- input (torch.Tensor or ComplexTensor) – encoded feature [B, T, F]
- ilens (torch.Tensor) – input lengths [Batch]
- additional (Dict or None) – other data included in model. NOTE: not used in this model.
Returns:
- masked (List[Union(torch.Tensor, ComplexTensor)]) – [(B, T, F), …]
- ilens (torch.Tensor) – (B,)
- others – predicted data, e.g. masks: OrderedDict[
'mask_spk1': torch.Tensor(Batch, Frames, Freq), 'mask_spk2': torch.Tensor(Batch, Frames, Freq), … 'mask_spkn': torch.Tensor(Batch, Frames, Freq),
]
Return type: Tuple[List[Union(torch.Tensor, ComplexTensor)], torch.Tensor, OrderedDict]
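The third return value is keyed by speaker index, starting from 1. As a small illustration of that naming scheme only, the sketch below builds such a dictionary from a list of per-speaker masks. `name_speaker_masks` is a hypothetical helper, and plain strings stand in for the mask tensors.

```python
from collections import OrderedDict
from typing import Any, List


def name_speaker_masks(masks: List[Any]) -> "OrderedDict[str, Any]":
    """Map per-speaker masks to keys 'mask_spk1', 'mask_spk2', ..."""
    return OrderedDict((f"mask_spk{i + 1}", m) for i, m in enumerate(masks))


# Strings are placeholders for tensors of shape (Batch, Frames, Freq).
others = name_speaker_masks(["mask for speaker 1", "mask for speaker 2"])
print(list(others))  # → ['mask_spk1', 'mask_spk2']
```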
property num_spk
