espnet2.enh.layers.beamformer.get_rtf_matrix

espnet2.enh.layers.beamformer.get_rtf_matrix(psd_speeches, psd_noises, diagonal_loading: bool = True, ref_channel: int = 0, rtf_iterations: int = 3, diag_eps: float = 1e-07, eps: float = 1e-08)

Calculate the RTF matrix, in which each column is the relative transfer function (RTF) of the corresponding source.

The matrix is computed from a list of speech covariance matrices and a list of noise covariance matrices, one pair per source. Each column holds the RTF of one speech source relative to the reference channel given by ref_channel.

Parameters:
- psd_speeches (list) – A list of speech covariance matrices, where each matrix has the shape (…, F, C, C).
- psd_noises (list) – A list of noise covariance matrices, where each matrix has the shape (…, F, C, C).
- diagonal_loading (bool) – If True, adds a small regularization term to the diagonal of the noise covariance matrices.
- ref_channel (int) – The index of the reference channel for RTF normalization.
- rtf_iterations (int) – The number of iterations for the RTF computation.
- diag_eps (float) – The regularization factor to be added to the diagonal when diagonal loading is enabled.
- eps (float) – A small constant to avoid division by zero.
Returns: The RTF matrix with shape (…, F, C, num_spk), where num_spk is the number of speech sources.
Return type: torch.complex64/ComplexTensor
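
To make the roles of diagonal_loading, diag_eps, eps, ref_channel and rtf_iterations more concrete, here is a minimal sketch in plain PyTorch for a single source. It assumes a trace-scaled diagonal loading and a power-iteration RTF estimate. Those choices, and the helper names load_diagonal, rtf_power_sketch and random_psd, are illustrative assumptions and are not taken from the ESPnet source.

```python
import torch


def load_diagonal(psd, diag_eps=1e-7, eps=1e-8):
    """Hypothetical helper: add a small trace-scaled term to the diagonal."""
    channels = psd.size(-1)
    eye = torch.eye(channels, dtype=psd.dtype, device=psd.device)
    # Trace of each (C, C) matrix; it is real for a Hermitian matrix.
    trace = torch.diagonal(psd, dim1=-2, dim2=-1).sum(-1).real
    return psd + (trace * diag_eps + eps)[..., None, None] * eye


def rtf_power_sketch(psd_speech, psd_noise, ref_channel=0, iterations=3):
    """Hypothetical helper: power-iteration RTF estimate, shape (..., C, 1)."""
    # phi = psd_noise^{-1} @ psd_speech
    phi = torch.linalg.solve(psd_noise, psd_speech)
    # Start from the column of phi that belongs to the reference channel.
    rtf = phi[..., ref_channel, None]
    for _ in range(iterations - 2):
        rtf = phi @ rtf
    return psd_speech @ rtf


def random_psd(freq_bins, channels, seed):
    """Hypothetical helper: random Hermitian positive definite matrices (F, C, C)."""
    gen = torch.Generator().manual_seed(seed)
    shape = (freq_bins, channels, 2 * channels)
    x = torch.complex(
        torch.randn(shape, generator=gen), torch.randn(shape, generator=gen)
    )
    return x @ x.conj().transpose(-1, -2) / shape[-1]


psd_s = random_psd(10, 4, seed=0)
psd_n = random_psd(10, 4, seed=1)
loaded = load_diagonal(psd_n)
rtf = rtf_power_sketch(psd_s, loaded, ref_channel=0, iterations=3)
print(rtf.shape)  # torch.Size([10, 4, 1])
```

Calling such a routine once per source and concatenating the results along the last dimension would give the (…, F, C, num_spk) layout described above.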

NOTE

The function assumes that psd_speeches and psd_noises are both lists of the same length, corresponding to the number of sources.
The output RTF matrix is normalized at the reference channel.
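
The normalization can be illustrated on a stand-in tensor. This sketch assumes that "normalized at the reference channel" means dividing each column by its own entry at ref_channel, so that this entry becomes 1 for every source. The random rtf_mat below is a placeholder, not the output of get_rtf_matrix.

```python
import torch

gen = torch.Generator().manual_seed(0)

# Placeholder for unnormalized RTF columns: (F=10, C=4, num_spk=2).
shape = (10, 4, 2)
rtf_mat = torch.complex(
    torch.randn(shape, generator=gen), torch.randn(shape, generator=gen)
)

ref_channel = 0
# Select the reference-channel row, keep a singleton channel axis for
# broadcasting, and divide every channel of each column by it.
normalized = rtf_mat / rtf_mat[..., ref_channel, None, :]

# Largest deviation from 1 in the reference-channel row.
max_dev = (normalized[..., ref_channel, :] - 1).abs().max()
print(normalized.shape, float(max_dev) < 1e-5)
```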

Examples

>>> import torch
>>> from espnet2.enh.layers.beamformer import get_rtf_matrix
>>> psd_speeches = [torch.rand(10, 4, 4), torch.rand(10, 4, 4)]
>>> psd_noises = [torch.rand(10, 4, 4), torch.rand(10, 4, 4)]
>>> rtf_matrix = get_rtf_matrix(psd_speeches, psd_noises)
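
The inputs in the example above come from torch.rand, which yields real, non-symmetric matrices rather than true covariance matrices. For a more realistic test, one option is to build complex Hermitian positive definite matrices from random multichannel frames, as sketched below. The helper name make_psd and its arguments are hypothetical and used only for illustration.

```python
import torch


def make_psd(freq_bins=10, channels=4, frames=50, seed=0):
    """Hypothetical helper: sample covariance (F, C, C) of random complex frames."""
    gen = torch.Generator().manual_seed(seed)
    shape = (freq_bins, channels, frames)
    x = torch.complex(
        torch.randn(shape, generator=gen), torch.randn(shape, generator=gen)
    )
    # Average of outer products x x^H over frames, one matrix per frequency bin.
    return x @ x.conj().transpose(-1, -2) / frames


# Two sources, each with its own speech and noise covariance.
psd_speeches = [make_psd(seed=0), make_psd(seed=1)]
psd_noises = [make_psd(seed=2), make_psd(seed=3)]
print(psd_speeches[0].shape, psd_speeches[0].dtype)
```

Lists built this way have the (…, F, C, C) shape listed under Parameters and can be passed as get_rtf_matrix(psd_speeches, psd_noises).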