espnet2.enh.layers.beamformer.get_power_spectral_density_matrix

espnet2.enh.layers.beamformer.get_power_spectral_density_matrix(xs, mask, normalization=True, reduction='mean', eps: float = 1e-15)

Return cross-channel power spectral density (PSD) matrix.

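This function computes the cross-channel power spectral density (PSD) matrix of a multi-channel signal, using the provided time-frequency mask to weight each frame. The mask is first reduced over the channel axis and optionally normalized along the time axis. It is then used to weight the outer products of each frame's channel vector with its conjugate, and these weighted products are summed over time. Mask-based beamformers such as MVDR use these matrices as estimates of the spatial statistics of speech and noise.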
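Assuming the default reduction="mean" and normalization=True, the computation for each frequency bin f can be written as follows. Here x_{f,t} is the length-C vector of channel values at bin f and frame t, m is the mask, and ε is eps; this notation is ours, not the source's. With reduction="median" the channel mean is replaced by a median, and with normalization=False the weight \bar{m}_{f,t} is used directly.

```latex
\bar{m}_{f,t} = \frac{1}{C}\sum_{c=1}^{C} m_{f,c,t}, \qquad
\tilde{m}_{f,t} = \frac{\bar{m}_{f,t}}{\sum_{\tau=1}^{T} \bar{m}_{f,\tau} + \epsilon}, \qquad
\boldsymbol{\Phi}_f = \sum_{t=1}^{T} \tilde{m}_{f,t}\, \mathbf{x}_{f,t}\, \mathbf{x}_{f,t}^{\mathsf{H}} \in \mathbb{C}^{C \times C}
```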

Parameters:
- xs (torch.complex64/ComplexTensor) – The input signal tensor of shape (…, F, C, T), where F is the number of frequency bins, C is the number of channels, and T is the number of time frames.
- mask (torch.Tensor) – Real-valued time-frequency mask of shape (…, F, C, T), typically with values in [0, 1], used to weight each time frame when computing the PSD.
- normalization (bool) – Whether to normalize the channel-reduced mask so that it sums to one along the time axis before computing the PSD. Default is True.
- reduction (str) – Method used to reduce the mask over the channel axis, either “mean” or “median”. Default is “mean”.
- eps (float) – Small constant added to the denominator of the mask normalization to avoid division by zero. Default is 1e-15.
Returns: The PSD matrix of shape (…, F, C, C), whose (c, e) entry is the mask-weighted cross-power between channels c and e; the diagonal holds the power of each channel.
Return type: psd (torch.complex64/ComplexTensor)
Raises: ValueError – If an unknown reduction mode is specified.
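The default path (reduction="mean", normalization=True) can be sketched in plain PyTorch as below, assuming native complex tensors. psd_sketch is a hypothetical name and this is not espnet2's implementation. The sketch cross-checks the vectorized torch.einsum form against an explicit loop over frames and confirms that each returned matrix is Hermitian.

```python
import torch


def psd_sketch(xs: torch.Tensor, mask: torch.Tensor, eps: float = 1e-15) -> torch.Tensor:
    """Hypothetical re-implementation of the default path (mean reduction, normalization).

    xs:   complex tensor of shape (..., F, C, T)
    mask: real tensor of shape (..., F, C, T)
    Returns a complex tensor of shape (..., F, C, C).
    """
    # Reduce the mask over the channel axis: (..., F, C, T) -> (..., F, 1, T)
    weight = mask.mean(dim=-2, keepdim=True)
    # Normalize so that the weights of each frequency bin sum to one over time
    weight = weight / (weight.sum(dim=-1, keepdim=True) + eps)
    # Weighted sum over time of the outer products x_t x_t^H
    return torch.einsum("...ct,...et->...ce", xs * weight, xs.conj())


torch.manual_seed(0)
F, C, T = 3, 2, 5
xs = torch.randn(F, C, T, dtype=torch.complex64)
mask = torch.rand(F, C, T)
psd = psd_sketch(xs, mask)
print(psd.shape)  # torch.Size([3, 2, 2])

# Reference for frequency bin 0, built frame by frame with explicit outer products
w = mask.mean(dim=-2)  # (F, T)
w = w / (w.sum(dim=-1, keepdim=True) + 1e-15)
ref = sum(w[0, t] * torch.outer(xs[0, :, t], xs[0, :, t].conj()) for t in range(T))
print(torch.allclose(psd[0], ref, atol=1e-5))

# Each PSD matrix is Hermitian: psd[..., c, e] == conj(psd[..., e, c])
print(torch.allclose(psd, psd.conj().transpose(-2, -1), atol=1e-5))
```

Because each matrix is a real-weighted sum of outer products x x^H, it is Hermitian. It is also positive semi-definite whenever the mask weights are non-negative, which is one reason to use a non-negative mask.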

Examples

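>>> import torch
>>> from espnet2.enh.layers.beamformer import get_power_spectral_density_matrix
>>> xs = torch.randn(2, 256, 4, 100, dtype=torch.complex64)  # (B, F, C, T)
>>> mask = torch.rand(2, 256, 4, 100)  # non-negative mask in [0, 1), (B, F, C, T)
>>> psd_matrix = get_power_spectral_density_matrix(xs, mask)
>>> print(psd_matrix.shape)
torch.Size([2, 256, 4, 4])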
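To illustrate what normalization and eps do, the sketch below uses a hypothetical psd_sketch helper (mean reduction only, not the library code). With an all-ones mask, normalization turns the weighted sum into the sample covariance (1/T) Σ_t x x^H. Without normalization the result is T times larger. For an all-zero mask, eps keeps the division finite, so the PSD is zero rather than NaN.

```python
import torch


def psd_sketch(xs, mask, normalization=True, eps=1e-15):
    # Hypothetical helper mirroring the documented steps, "mean" reduction only
    weight = mask.mean(dim=-2, keepdim=True)  # (..., F, 1, T)
    if normalization:
        # Make the weights of each frequency bin sum to (almost) one over time;
        # eps keeps the denominator non-zero for an all-zero mask
        weight = weight / (weight.sum(dim=-1, keepdim=True) + eps)
    return torch.einsum("...ct,...et->...ce", xs * weight, xs.conj())


torch.manual_seed(0)
F, C, T = 4, 3, 50
xs = torch.randn(F, C, T, dtype=torch.complex64)

# Sample covariance: average of the outer products x_t x_t^H over the T frames
sample_cov = torch.einsum("fct,fet->fce", xs, xs.conj()) / T

ones = torch.ones(F, C, T)
normalized = psd_sketch(xs, ones, normalization=True)
unnormalized = psd_sketch(xs, ones, normalization=False)
print(torch.allclose(normalized, sample_cov, atol=1e-5))
print(torch.allclose(unnormalized, T * sample_cov, atol=1e-4))

# All-zero mask: 0 / (0 + eps) == 0, so the PSD is all zeros
zeros = torch.zeros(F, C, T)
print(psd_sketch(xs, zeros).abs().max())
# Without eps the same mask gives 0 / 0, i.e. NaN entries
print(psd_sketch(xs, zeros, eps=0.0).abs().isnan().any())
```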

NOTE

The reduction mode determines how the mask is aggregated across channels. The (…, F, C, T) mask is collapsed to shape (…, F, 1, T) by taking its mean (“mean”) or its median (“median”) over the channel dimension, so the same time-frequency weight is applied to every channel when the PSD is computed.
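The channel reduction can be sketched as follows. reduce_mask is a hypothetical helper, not part of espnet2. It mirrors the documented behavior: mean or median over the channel axis, and ValueError for any other mode. Note that torch.Tensor.median with a dim argument returns a (values, indices) pair, so the values must be selected explicitly.

```python
import torch


def reduce_mask(mask: torch.Tensor, reduction: str = "mean") -> torch.Tensor:
    """Hypothetical helper: collapse a (..., F, C, T) mask to (..., F, 1, T)."""
    if reduction == "mean":
        return mask.mean(dim=-2, keepdim=True)
    elif reduction == "median":
        # Tensor.median(dim=...) returns (values, indices); keep only the values
        return mask.median(dim=-2, keepdim=True).values
    raise ValueError(f"Unknown reduction mode: {reduction}")


# One frequency bin, three channels, two frames; channel 2 is an outlier in frame 0
mask = torch.tensor([[[0.9, 0.1],
                      [0.8, 0.2],
                      [0.0, 0.3]]])  # shape (F=1, C=3, T=2)

mean_weight = reduce_mask(mask, "mean")      # shape (1, 1, 2)
median_weight = reduce_mask(mask, "median")  # shape (1, 1, 2)
print(mean_weight, median_weight)

try:
    reduce_mask(mask, "max")
except ValueError as err:
    print(err)  # Unknown reduction mode: max
```

Here the outlier in channel 2 pulls the mean weight of frame 0 down to about 0.57, while the median stays at 0.8. When a single channel's mask estimate is unreliable, the median can therefore be the more robust choice.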