espnet2.enh.layers.beamformer.get_mvdr_vector_with_rtf

espnet2.enh.layers.beamformer.get_mvdr_vector_with_rtf(psd_n: Tensor | ComplexTensor, psd_speech: Tensor | ComplexTensor, psd_noise: Tensor | ComplexTensor, iterations: int = 3, reference_vector: int | Tensor | None = None, diagonal_loading: bool = True, diag_eps: float = 1e-07, eps: float = 1e-08) → Tensor | ComplexTensor

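Return the MVDR (Minimum Variance Distortionless Response) beamforming vector calculated with the RTF (relative transfer function) of the target speech, which is estimated from psd_speech and psd_noise by the power method. With Npsd denoting psd_n: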

h = (Npsd^-1 @ rtf) / (rtf^H @ Npsd^-1 @ rtf)
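As a rough illustration of the formula above, the following NumPy sketch computes h for a given noise covariance and an RTF that is assumed to be known already. It is not ESPnet's torch implementation: the helper name `mvdr_from_rtf` is hypothetical, and it uses a linear solve in place of an explicit matrix inverse.

```python
import numpy as np


def mvdr_from_rtf(psd_noise, rtf, eps=1e-8):
    """Hypothetical helper: h = (Npsd^-1 @ rtf) / (rtf^H @ Npsd^-1 @ rtf).

    psd_noise: (..., C, C) Hermitian positive-definite noise covariance (Npsd)
    rtf:       (..., C) relative transfer function
    returns:   (..., C) beamforming vector h
    """
    # Npsd^-1 @ rtf via a batched linear solve (rtf passed as a column matrix)
    numerator = np.linalg.solve(psd_noise, rtf[..., None])[..., 0]
    # rtf^H @ Npsd^-1 @ rtf is real and positive when Npsd is Hermitian PD
    denominator = np.einsum("...c,...c->...", rtf.conj(), numerator).real
    return numerator / (denominator[..., None] + eps)


rng = np.random.default_rng(0)
C = 4
x = rng.standard_normal((C, C)) + 1j * rng.standard_normal((C, C))
psd_noise = x @ x.conj().T + np.eye(C)  # Hermitian positive definite
rtf = rng.standard_normal(C) + 1j * rng.standard_normal(C)
rtf = rtf / rtf[0]  # normalized so the reference channel (0) has gain 1
h = mvdr_from_rtf(psd_noise, rtf)
# Distortionless constraint: h^H @ rtf should be numerically close to 1
print(np.vdot(h, rtf))
```

The denominator only rescales the vector so that the distortionless constraint h^H @ rtf = 1 holds; the ellipsis-based einsum keeps the sketch usable for batched (…, F, C, C) inputs.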

Reference: On optimal frequency-domain multichannel linear filtering for noise reduction; M. Souden et al., 2010; https://ieeexplore.ieee.org/document/5089420

Parameters:
- psd_n (torch.complex64/ComplexTensor) – observation/noise covariance matrix (…, F, C, C)
- psd_speech (torch.complex64/ComplexTensor) – speech covariance matrix (…, F, C, C)
- psd_noise (torch.complex64/ComplexTensor) – noise covariance matrix (…, F, C, C)
- iterations (int) – number of power-method iterations used to estimate the RTF
- reference_vector (torch.Tensor or int) – index of the reference channel (int), or channel weights of shape (…, C)
- diagonal_loading (bool) – Whether to add a tiny term to the diagonal of psd_n
- diag_eps (float) – regularization factor for diagonal loading
- eps (float) – small constant to avoid division by zero
Returns: beamform_vector – MVDR beamforming vector of shape (…, F, C)
Return type: torch.complex64/ComplexTensor
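The iterations, diagonal_loading, and diag_eps parameters govern how the RTF is estimated. One common formulation (covariance whitening) finds the principal eigenvector v of Npsd^-1 @ Spsd by power iteration and takes Npsd @ v, normalized at the reference channel, as the RTF. The NumPy sketch below follows that formulation under stated assumptions: the helper name `rtf_power_method` is hypothetical, the trace-scaled diagonal loading is an assumed form, and ESPnet's own code may differ in details such as the starting vector and the final projection.

```python
import numpy as np


def rtf_power_method(psd_speech, psd_noise, iterations=3, ref=0,
                     diag_eps=1e-7, eps=1e-8):
    """Hypothetical sketch: estimate the RTF (..., C) by the power method."""
    C = psd_noise.shape[-1]
    # Assumed diagonal loading: add (diag_eps * trace + eps) to the diagonal
    trace = np.trace(psd_noise, axis1=-2, axis2=-1).real
    psd_noise = psd_noise + (diag_eps * trace + eps)[..., None, None] * np.eye(C)
    # phi = Npsd^-1 @ Spsd, computed with a linear solve
    phi = np.linalg.solve(psd_noise, psd_speech)
    # First application of phi: phi @ e_ref is the reference column
    v = phi[..., :, ref]
    # Remaining applications, renormalizing to keep the magnitude bounded
    for _ in range(iterations - 1):
        v = np.einsum("...ij,...j->...i", phi, v)
        v = v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)
    # Npsd @ v is proportional to the steering vector of the speech source
    rtf = np.einsum("...ij,...j->...i", psd_noise, v)
    # Normalize so the reference-channel entry equals 1
    return rtf / (rtf[..., ref, None] + eps)


rng = np.random.default_rng(1)
C = 4
d = np.array([1.0, 0.5 + 0.5j, -0.3j, 0.8])  # true RTF, reference channel 0
psd_speech = np.outer(d, d.conj())           # rank-1 speech covariance d @ d^H
x = rng.standard_normal((C, C)) + 1j * rng.standard_normal((C, C))
psd_noise = x @ x.conj().T + np.eye(C)
# For a rank-1 speech covariance the estimate should be close to d
print(np.round(rtf_power_method(psd_speech, psd_noise), 6))
```

With a rank-1 speech covariance, Npsd^-1 @ Spsd has a single nonzero eigenvalue, so the iteration settles immediately; with real, full-rank estimates more iterations move the estimate closer to the principal eigenvector.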

Examples

>>> import torch
>>> from espnet2.enh.layers.beamformer import get_mvdr_vector_with_rtf
>>> def rand_psd(*shape):  # random Hermitian PSD matrices (..., C, C)
...     x = torch.randn(*shape, dtype=torch.complex64)
...     return x @ x.conj().transpose(-1, -2)
>>> psd_speech = rand_psd(2, 5, 3, 3)  # (B, F, C, C)
>>> psd_noise = rand_psd(2, 5, 3, 3)
>>> vector = get_mvdr_vector_with_rtf(psd_noise, psd_speech, psd_noise, reference_vector=0)
>>> vector.shape
torch.Size([2, 5, 3])

NOTE

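The reference_vector can be either an integer index of the reference channel or a (…, C) tensor of channel weights; it is used to scale the output.
The function accepts both native complex torch.Tensor inputs (e.g. torch.complex64) and torch_complex ComplexTensor inputs.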
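To make the two accepted forms of reference_vector concrete: when a reference is applied as a weighted sum over channels, an integer index c behaves exactly like the one-hot vector e_c. The NumPy helper below is hypothetical and only illustrates that equivalence; it is not the library's internal code, and a soft weight vector (for example one produced by an attention module) is an assumed use case.

```python
import numpy as np


def select_reference(x, reference_vector):
    """Hypothetical helper: pick or mix channels of x (..., C) by a reference."""
    if isinstance(reference_vector, int):
        # Integer: take that channel directly
        return x[..., reference_vector]
    # Weight vector (..., C): weighted sum over the channel axis
    return np.sum(x * reference_vector, axis=-1)


x = np.array([[1 + 1j, 2.0, 3j], [4.0, -1j, 0.5]])  # (F=2, C=3)
one_hot = np.eye(3)[1]                              # e_1, same as index 1
print(select_reference(x, 1), select_reference(x, one_hot))
```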