espnet2.enh.layers.beamformer.get_rank1_mwf_vector



espnet2.enh.layers.beamformer.get_rank1_mwf_vector(psd_speech, psd_noise, reference_vector: Tensor | int, denoising_weight: float = 1.0, approx_low_rank_psd_speech: bool = False, iterations: int = 3, diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08)

Return the R1-MWF (Rank-1 Multi-channel Wiener Filter) vector.

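The R1-MWF is computed as

h = (Npsd^-1 @ Spsd) / (mu + Tr(Npsd^-1 @ Spsd)) @ u

where Spsd is psd_speech, Npsd is psd_noise, mu is denoising_weight, u is the reference vector, and Tr(·) is the matrix trace.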
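The formula can be restated in a few lines of NumPy. This is a minimal sketch, not ESPnet's implementation: the function name `rank1_mwf` is hypothetical, it operates on arrays of shape (..., C, C) (so frequency can simply be a batch axis), and the placement of `eps` in the denominator is an assumption. It uses a linear solve instead of an explicit inverse of Npsd and accepts either an integer channel index or a vector u, mirroring the two forms of `reference_vector`.

```python
import numpy as np


def rank1_mwf(psd_speech, psd_noise, reference, mu=1.0, eps=1e-8):
    """Hypothetical sketch of h = (Npsd^-1 @ Spsd) / (mu + Tr(Npsd^-1 @ Spsd)) @ u.

    psd_speech, psd_noise: complex arrays of shape (..., C, C).
    reference: int channel index, or array u of shape (C,).
    Returns h of shape (..., C).
    """
    # Npsd^-1 @ Spsd, computed with a linear solve instead of an explicit inverse
    numerator = np.linalg.solve(psd_noise, psd_speech)
    # Trace over the last two axes, reshaped to (..., 1, 1) for broadcasting
    trace = np.trace(numerator, axis1=-2, axis2=-1)[..., None, None]
    ws = numerator / (mu + trace + eps)
    if isinstance(reference, int):
        # An integer index selects one column, equivalent to a one-hot u
        return ws[..., reference]
    return ws @ reference


# Usage: one frequency bin, 3 channels, rank-1 speech PSD S = d d^H
rng = np.random.default_rng(0)
C = 3
d = rng.standard_normal(C) + 1j * rng.standard_normal(C)  # hypothetical steering vector
psd_speech = np.outer(d, d.conj())
a = rng.standard_normal((C, C)) + 1j * rng.standard_normal((C, C))
psd_noise = a @ a.conj().T + np.eye(C)  # Hermitian positive definite

h_mvdr = rank1_mwf(psd_speech, psd_noise, 0, mu=0.0)
# With mu = 0 (MVDR case) the filter is distortionless up to eps:
# h^H d equals the steering vector entry at the reference channel
print(np.vdot(h_mvdr, d), d[0])
```

Setting mu = 0 reproduces the MVDR behavior mentioned for `denoising_weight`: for rank-1 speech, the response toward the steering vector is exactly its reference-channel entry.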

References:
[1] Z. Wang et al., "Rank-1 constrained multichannel Wiener filter for speech recognition in noisy environments," 2018. https://hal.inria.fr/hal-01634449/document
[2] R. Serizel, "Low-rank approximation based multichannel Wiener filter algorithms for noise reduction with application in cochlear implants," 2014. https://ieeexplore.ieee.org/document/6730918

Parameters:
- psd_speech (torch.complex64/ComplexTensor) – Speech covariance matrix (…, F, C, C).
- psd_noise (torch.complex64/ComplexTensor) – Noise covariance matrix (…, F, C, C).
- reference_vector (torch.Tensor or int) – Reference vector of shape (…, C), e.g. a one-hot vector selecting the reference channel, or an integer index of the reference channel.
- denoising_weight (float) – Trade-off parameter between noise reduction and speech distortion. A larger value leads to more noise reduction at the expense of more speech distortion. When denoising_weight = 0, it corresponds to the MVDR beamformer.
- approx_low_rank_psd_speech (bool) – Whether to replace original input psd_speech with its low-rank approximation as in [1].
- iterations (int) – Number of iterations of the power method; only used when approx_low_rank_psd_speech = True.
- diagonal_loading (bool) – Whether to add a small regularization term to the diagonal of psd_noise.
- diag_eps (float) – Regularization factor for diagonal loading.
- eps (float) – Small constant to avoid division by zero.
Returns: Beamforming vector of shape (…, F, C).
Return type: beamform_vector (torch.complex64/ComplexTensor)
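The `approx_low_rank_psd_speech` and `iterations` parameters replace psd_speech with a rank-1 approximation obtained by a power method. The sketch below illustrates the idea for a single (C, C) matrix under stated assumptions: the helper name `rank1_psd_speech` is hypothetical, power iteration runs on Npsd^-1 @ Spsd starting from the one-hot column of the reference channel, the result is mapped back through Spsd to estimate a steering vector d, and d d^H is rescaled to keep the trace of the original psd_speech. ESPnet's exact iteration scheme and normalization may differ.

```python
import numpy as np


def rank1_psd_speech(psd_speech, psd_noise, reference, iterations=3):
    """Hypothetical rank-1 approximation of a single (C, C) speech PSD matrix."""
    phi = np.linalg.solve(psd_noise, psd_speech)  # Npsd^-1 @ Spsd
    # Start from the one-hot vector of the reference channel
    v = np.zeros(psd_speech.shape[-1], dtype=complex)
    v[reference] = 1.0
    for _ in range(iterations):
        v = phi @ v
        v = v / np.linalg.norm(v)  # keep the iterate from over/underflowing
    d = psd_speech @ v  # steering-vector estimate (up to scale)
    approx = np.outer(d, d.conj())
    # Rescale so the approximation keeps the total power (trace) of psd_speech
    return approx * (np.trace(psd_speech).real / np.trace(approx).real)


# Usage: an exactly rank-1 speech PSD is recovered, a full-rank one is projected
rng = np.random.default_rng(1)
C = 4
d_true = rng.standard_normal(C) + 1j * rng.standard_normal(C)
S = np.outer(d_true, d_true.conj())
b = rng.standard_normal((C, C)) + 1j * rng.standard_normal((C, C))
N = b @ b.conj().T + np.eye(C)
e = rng.standard_normal((C, C)) + 1j * rng.standard_normal((C, C))
S_full = S + 0.1 * (e @ e.conj().T)

S_r1 = rank1_psd_speech(S, N, reference=0)
S_full_r1 = rank1_psd_speech(S_full, N, reference=0, iterations=3)
```

For exactly rank-1 speech the power method converges in one step, so the approximation reproduces the input; for full-rank input it returns a rank-1 matrix with the same trace.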

Examples

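>>> import torch
>>> from espnet2.enh.layers.beamformer import get_rank1_mwf_vector
>>> # Hermitian positive semi-definite PSD matrices of shape (B, F, C, C)
>>> x = torch.randn(10, 8, 4, 16, dtype=torch.complex64)
>>> psd_speech = x @ x.conj().transpose(-1, -2)
>>> n = torch.randn(10, 8, 4, 16, dtype=torch.complex64)
>>> psd_noise = n @ n.conj().transpose(-1, -2)
>>> # One-hot reference vector selecting channel 0 (complex to match the PSDs)
>>> ref_vec = torch.tensor([1.0, 0.0, 0.0, 0.0], dtype=torch.complex64)
>>> result = get_rank1_mwf_vector(psd_speech, psd_noise, ref_vec)
>>> print(result.shape)
torch.Size([10, 8, 4])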

NOTE

When diagonal_loading = True, the function applies Tikhonov regularization (diagonal loading controlled by diag_eps) to psd_noise to stabilize its inversion.
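Diagonal loading can be illustrated with a short NumPy sketch. It assumes the loading term is scaled by the trace of the noise PSD, i.e. N + (diag_eps * Tr(N) + eps) * I, which is one common form of Tikhonov regularization; the helper name `tikhonov_load` is hypothetical and ESPnet's own helper may differ in detail. A rank-deficient noise PSD (for example, a single noise source) is singular, and the loaded matrix becomes well-conditioned.

```python
import numpy as np


def tikhonov_load(psd_noise, diag_eps=1e-7, eps=1e-8):
    """Hypothetical diagonal loading: add (diag_eps * Tr(N) + eps) * I to N.

    Works on arrays of shape (..., C, C). Scaling by the trace keeps the
    loading proportional to the overall power of each matrix, so one diag_eps
    behaves similarly across frequency bins with very different power.
    """
    C = psd_noise.shape[-1]
    trace = np.trace(psd_noise, axis1=-2, axis2=-1).real[..., None, None]
    loading = diag_eps * trace + eps
    return psd_noise + loading * np.eye(C)


# Usage: a rank-1 noise PSD is not invertible until it is loaded
rng = np.random.default_rng(2)
C = 4
v = rng.standard_normal(C) + 1j * rng.standard_normal(C)
N_singular = np.outer(v, v.conj())
N_loaded = tikhonov_load(N_singular, diag_eps=1e-3)
print(np.linalg.cond(N_singular), np.linalg.cond(N_loaded))
```

A larger diag_eps makes the inversion more stable but biases the noise statistics further from the estimate, which is why the default value is tiny.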