espnet2.enh.layers.beamformer.get_rtf

espnet2.enh.layers.beamformer.get_rtf(psd_speech, psd_noise, mode='power', reference_vector: int | Tensor = 0, iterations: int = 3)

Calculate the relative transfer function (RTF).

The RTF is calculated using either the power method or eigenvalue decomposition. The algorithm for the power method is as follows:

rtf = reference_vector
for i in range(iterations):
    rtf = (psd_noise^-1 @ psd_speech) @ rtf
    rtf = rtf / ||rtf||_2  # this normalization can be skipped
rtf = psd_noise @ rtf
rtf = rtf / rtf[..., ref_channel, :]

Note: The final step above (normalization at the reference channel) is not performed by this function.
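The power-method steps above can be sketched in plain PyTorch as follows. This is a minimal illustration, not ESPnet's implementation: the helper name `power_method_rtf_sketch` and the toy inputs are assumptions made for the example, and the reference vector is taken to be a one-hot vector at `ref_channel`. As in the note above, the result is not normalized at the reference channel.

```python
import torch


def power_method_rtf_sketch(psd_speech, psd_noise, ref_channel=0, iterations=3):
    """Illustrative power-method RTF estimate (hypothetical helper, not ESPnet code)."""
    num_channels = psd_speech.shape[-1]
    # phi = psd_noise^-1 @ psd_speech, via a linear solve instead of an explicit inverse
    phi = torch.linalg.solve(psd_noise, psd_speech)
    # One-hot start vector selecting the reference channel, shape (C, 1)
    rtf = torch.zeros(
        num_channels, 1, dtype=psd_speech.dtype, device=psd_speech.device
    )
    rtf[ref_channel] = 1
    for _ in range(iterations):
        rtf = phi @ rtf  # broadcasts to (..., F, C, 1)
        # Optional L2 normalization over the channel axis keeps values bounded
        rtf = rtf / torch.linalg.vector_norm(rtf, dim=-2, keepdim=True)
    # Map back through the noise covariance; no reference-channel normalization
    return psd_noise @ rtf


# Toy inputs: Hermitian positive-definite matrices built as A @ A^H + I
a = torch.randn(2, 3, 4, 4, dtype=torch.complex64)
b = torch.randn(2, 3, 4, 4, dtype=torch.complex64)
eye = torch.eye(4, dtype=torch.complex64)
psd_s = a @ a.conj().transpose(-1, -2) + eye
psd_n = b @ b.conj().transpose(-1, -2) + eye

rtf = power_method_rtf_sketch(psd_s, psd_n, ref_channel=0, iterations=5)
print(rtf.shape)  # torch.Size([2, 3, 4, 1])
```

Using a linear solve avoids forming the inverse of the noise covariance explicitly, which tends to be more stable when that matrix is poorly conditioned.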

Parameters:
- psd_speech (torch.complex64/ComplexTensor) – Speech covariance matrix with shape (…, F, C, C).
- psd_noise (torch.complex64/ComplexTensor) – Noise covariance matrix with shape (…, F, C, C).
- mode (str) – One of ("power", "evd").
  - "power": Uses the power method.
  - "evd": Uses eigenvalue decomposition.
- reference_vector (torch.Tensor or int) – Either a tensor of shape (…, C) or an int giving the reference channel index.
- iterations (int) – Number of iterations to perform in the power method.
Returns: The calculated RTF with shape (…, F, C, 1).
Return type: rtf (torch.complex64/ComplexTensor)
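For intuition about the "evd" mode, one common way to estimate an RTF by eigenvalue decomposition is to take the principal generalized eigenvector of the pair (psd_speech, psd_noise) and multiply it by psd_noise. The sketch below does this by whitening with a Cholesky factor. It is an assumption about the general technique, not a description of what ESPnet does internally, and the helper name `evd_rtf_sketch` is hypothetical.

```python
import torch


def evd_rtf_sketch(psd_speech, psd_noise):
    """Illustrative EVD-based RTF estimate (hypothetical helper, not ESPnet code)."""
    # psd_noise = chol @ chol^H, with chol lower triangular
    chol = torch.linalg.cholesky(psd_noise)
    # whitened = chol^-1 @ psd_speech @ chol^-H, built from two triangular solves
    tmp = torch.linalg.solve_triangular(chol, psd_speech, upper=False)
    whitened = torch.linalg.solve_triangular(
        chol, tmp.conj().transpose(-1, -2), upper=False
    )
    # eigh returns eigenvalues in ascending order; take the last eigenvector
    _, vecs = torch.linalg.eigh(whitened)
    w = vecs[..., :, -1:]  # (..., F, C, 1)
    # psd_noise @ (chol^-H @ w) simplifies to chol @ w
    return chol @ w


# Toy inputs: Hermitian positive-definite matrices built as A @ A^H + I
a = torch.randn(2, 3, 4, 4, dtype=torch.complex64)
b = torch.randn(2, 3, 4, 4, dtype=torch.complex64)
eye = torch.eye(4, dtype=torch.complex64)
psd_s = a @ a.conj().transpose(-1, -2) + eye
psd_n = b @ b.conj().transpose(-1, -2) + eye

rtf_evd = evd_rtf_sketch(psd_s, psd_n)
print(rtf_evd.shape)  # torch.Size([2, 3, 4, 1])
```

An eigenvector is only defined up to a complex scale factor, so the output of this sketch has an arbitrary phase until it is normalized at a reference channel.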

Examples

>>> import torch
>>> from espnet2.enh.layers.beamformer import get_rtf
>>> psd_s = torch.randn(10, 8, 4, 4, dtype=torch.complex64)
>>> psd_n = torch.randn(10, 8, 4, 4, dtype=torch.complex64)
>>> rtf = get_rtf(psd_s, psd_n, mode="power", iterations=5)
>>> print(rtf.shape)
torch.Size([10, 8, 4, 1])
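Because the function does not normalize at the reference channel, that step can be applied to the result afterwards if it is needed. A minimal sketch, assuming the documented output shape (…, F, C, 1): the helper name `normalize_at_reference` is hypothetical, and a random tensor stands in for the output of `get_rtf` so the snippet runs without ESPnet installed.

```python
import torch


def normalize_at_reference(rtf, ref_channel=0):
    """Divide an RTF of shape (..., F, C, 1) by its reference-channel entry."""
    # Slice rather than index so the channel axis is kept for broadcasting
    ref = rtf[..., ref_channel : ref_channel + 1, :]
    return rtf / ref


# Stand-in for the output of get_rtf, with shape (..., F, C, 1)
rtf = torch.randn(10, 8, 4, 1, dtype=torch.complex64)
rtf_norm = normalize_at_reference(rtf, ref_channel=0)
print(rtf_norm.shape)  # torch.Size([10, 8, 4, 1])
```

After this step the entry at the reference channel is 1 (up to rounding) for every frequency bin, which also fixes the overall scale and phase of the estimate.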