espnet2.enh.layers.beamformer.get_rtf
espnet2.enh.layers.beamformer.get_rtf(psd_speech, psd_noise, mode='power', reference_vector: int | Tensor = 0, iterations: int = 3)
Calculate the relative transfer function (RTF).
The RTF is calculated using either the power method or eigenvalue decomposition. The algorithm for the power method is as follows:
- rtf = reference_vector
- for i in range(iterations):
  - rtf = (psd_noise^-1 @ psd_speech) @ rtf
  - rtf = rtf / ||rtf||_2 # this normalization can be skipped
- rtf = psd_noise @ rtf
- rtf = rtf / rtf[…, ref_channel, :]
Note: The last step (normalization at the reference channel) is not performed by this function; the returned RTF is the result of the preceding step.
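The power-method steps listed above can be sketched in plain PyTorch as follows. This is an illustrative sketch, not ESPnet's implementation: the function name `power_method_rtf` is hypothetical, the reference vector is assumed to be a one-hot vector selecting an integer reference channel, and both inputs are assumed to be native complex tensors with an invertible noise covariance.

```python
import torch


def power_method_rtf(psd_speech, psd_noise, ref_channel=0, iterations=3):
    """Sketch of the listed power-method steps (hypothetical helper, not ESPnet code)."""
    # Step 1: start from a one-hot reference vector of shape (..., F, C, 1)
    # (assumption: an integer reference selects a single channel).
    rtf = torch.zeros(
        *psd_speech.shape[:-1], 1, dtype=psd_speech.dtype, device=psd_speech.device
    )
    rtf[..., ref_channel, 0] = 1

    # phi = psd_noise^-1 @ psd_speech, computed with a linear solve
    # rather than an explicit matrix inverse.
    phi = torch.linalg.solve(psd_noise, psd_speech)

    # Step 2: repeated multiplication pulls rtf toward the dominant eigenvector of phi.
    for _ in range(iterations):
        rtf = phi @ rtf
        # Optional L2 normalization over the channel axis keeps magnitudes bounded.
        rtf = rtf / torch.linalg.vector_norm(rtf, dim=-2, keepdim=True)

    # Step 3: map back through the noise covariance.
    # The reference-channel normalization (last listed step) is left to the caller.
    return psd_noise @ rtf
```

For a rank-1 speech covariance built from a steering vector, the result should be proportional to that steering vector, which is a convenient sanity check for a sketch like this.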
- Parameters:
- psd_speech (torch.complex64/ComplexTensor) – Speech covariance matrix with shape (…, F, C, C).
- psd_noise (torch.complex64/ComplexTensor) – Noise covariance matrix with shape (…, F, C, C).
- mode (str) – One of (“power”, “evd”).
- “power”: Uses the power method.
- “evd”: Uses eigenvalue decomposition.
- reference_vector (torch.Tensor or int) – Can be either a tensor of shape (…, C) or a scalar.
- iterations (int) – Number of iterations to perform in the power method.
- Returns: The calculated RTF with shape (…, F, C, 1).
- Return type: rtf (torch.complex64/ComplexTensor)
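For the "evd" mode, the source only states that eigenvalue decomposition is used. One plausible reading, sketched below, takes the eigenvector of psd_noise^-1 @ psd_speech with the largest-magnitude eigenvalue and maps it back through psd_noise, mirroring the power method. The function name `evd_rtf` is hypothetical, and ESPnet's actual implementation may differ (for example, it might use a generalized eigenvalue decomposition).

```python
import torch


def evd_rtf(psd_speech, psd_noise):
    """Sketch: RTF from the principal eigenvector of psd_noise^-1 @ psd_speech.

    Hypothetical helper for illustration; not ESPnet's implementation.
    """
    # phi = psd_noise^-1 @ psd_speech, shape (..., F, C, C).
    phi = torch.linalg.solve(psd_noise, psd_speech)

    # General eigendecomposition; the columns of eigvecs are the eigenvectors.
    eigvals, eigvecs = torch.linalg.eig(phi)

    # Index of the largest-magnitude eigenvalue for each (..., F) entry.
    idx = eigvals.abs().argmax(dim=-1)

    # Select the matching eigenvector column, giving shape (..., F, C, 1).
    idx = idx[..., None, None].expand(*eigvecs.shape[:-1], 1)
    principal = eigvecs.gather(-1, idx)

    # Map back through the noise covariance, as in the power method.
    return psd_noise @ principal
```

Eigenvectors are only defined up to a complex scale factor, so the output of this sketch is meaningful only after normalization, for example at a reference channel.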
Examples
>>> import torch
>>> from espnet2.enh.layers.beamformer import get_rtf
>>> psd_s = torch.randn(10, 8, 4, 4, dtype=torch.complex64)
>>> psd_n = torch.randn(10, 8, 4, 4, dtype=torch.complex64)
>>> rtf = get_rtf(psd_s, psd_n, mode="power", iterations=5)
>>> print(rtf.shape)
torch.Size([10, 8, 4, 1])
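Because the returned RTF is not normalized at the reference channel, a caller who needs that step can apply it afterwards. The sketch below assumes an RTF of shape (…, F, C, 1) and an integer reference channel; `normalize_at_reference` is a hypothetical helper, not part of ESPnet, and a random tensor stands in for a real `get_rtf` output.

```python
import torch


def normalize_at_reference(rtf, ref_channel=0):
    """Divide an RTF of shape (..., F, C, 1) by its reference-channel entry."""
    # Slicing with ref_channel:ref_channel + 1 keeps the channel axis, so the
    # denominator has shape (..., F, 1, 1) and broadcasts over all C channels.
    reference = rtf[..., ref_channel : ref_channel + 1, :]
    return rtf / reference


# Stand-in for a get_rtf output (assumption: shape (10, 8, 4, 1)).
rtf = torch.randn(10, 8, 4, 1, dtype=torch.complex64)
rtf_normalized = normalize_at_reference(rtf, ref_channel=0)
print(rtf_normalized.shape)  # same shape as the input
```

After this step the entry at the reference channel equals 1 for every frequency bin, so the remaining entries express each channel relative to the reference.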