espnet2.enh.layers.beamformer.prepare_beamformer_stats

espnet2.enh.layers.beamformer.prepare_beamformer_stats(signal, masks_speech, mask_noise, powers=None, beamformer_type='mvdr', bdelay=3, btaps=5, eps=1e-06)

Prepare necessary statistics for constructing the specified beamformer.

This function computes the statistics required by the selected beamformer type from the input signal, the speech and noise masks, and the optional source powers. The returned statistics can include power spectral densities (PSD) of noise and speech, which beamforming algorithms use to derive their filters.
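
To make the idea of a mask-based PSD concrete, here is a minimal sketch in plain PyTorch. It assumes the common mask-weighted outer-product estimate, with the mask averaged over channels and normalized over time. The helper name `masked_psd` is hypothetical, and the sketch is not necessarily how this function computes its statistics internally.

```python
import torch


def masked_psd(signal: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Mask-weighted PSD estimate (hypothetical helper, not part of ESPnet).

    signal: complex tensor of shape (..., F, C, T)
    mask:   real tensor of shape (..., F, C, T)
    returns a complex tensor of shape (..., F, C, C)
    """
    # Average the mask over channels: one weight per time-frequency bin, (..., F, T)
    weight = mask.mean(dim=-2)
    # Normalize over time; clamp the denominator to avoid division by zero
    weight = weight / weight.sum(dim=-1, keepdim=True).clamp(min=eps)
    # Weighted sum over time of the outer product x x^H
    weighted = signal * weight.unsqueeze(-2)
    return torch.einsum("...ct,...et->...ce", weighted, signal.conj())


torch.manual_seed(0)
signal = torch.randn(1, 64, 2, 128, dtype=torch.complex64)
mask = torch.rand(1, 64, 2, 128)
psd = masked_psd(signal, mask)
print(psd.shape)  # torch.Size([1, 64, 2, 2])
```

Because the weights are real, the result is Hermitian in its last two dimensions, which is the property beamformer derivations typically rely on.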

Parameters:
- signal (torch.complex64/ComplexTensor) – Input signal tensor of shape (…, F, C, T), where F is the number of frequency bins, C is the number of channels, and T is the number of time frames.
- masks_speech (List[torch.Tensor]) – A list of masks for all speech sources, each of shape (…, F, C, T).
- mask_noise (torch.Tensor) – Noise mask tensor of shape (…, F, C, T).
- powers (List[torch.Tensor], optional) – List of power tensors for all speech sources, each of shape (…, F, T). Used for wMPDR or WPD beamformers. Defaults to None.
- beamformer_type (str, optional) – Specifies the type of beamformer to use. Options include “mvdr”, “wmpdr”, “wpd”, etc. Defaults to “mvdr”.
- bdelay (int, optional) – Delay factor used for WPD beamformers. Defaults to 3.
- btaps (int, optional) – Number of filter taps used for WPD beamformers. Defaults to 5.
- eps (torch.Tensor, optional) – A small constant to prevent division by zero. Defaults to 1e-6.
Returns: A dictionary containing necessary statistics, including:
- “psd_n”: Power spectral density of noise.
- “psd_speech”: Power spectral density of speech.
- “psd_distortion”: Power spectral density of distortion.
Note:
- When masks_speech is a tensor or a single-element list, all returned statistics are tensors.
- When masks_speech is a multi-element list, some returned statistics can be lists, e.g., “psd_n” for MVDR, “psd_speech”, and “psd_distortion”.
Return type: beamformer_stats (dict)
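
Since a returned entry may be either a single tensor or a list of tensors depending on masks_speech, downstream code may want to treat both cases uniformly. A minimal sketch of one way to do that follows. The helper `stat_as_list` is hypothetical, and the strings below are placeholders standing in for tensors.

```python
from typing import Any, Dict, List


def stat_as_list(stats: Dict[str, Any], key: str) -> List[Any]:
    """Return stats[key] as a list, wrapping a single value if needed.

    Hypothetical helper for illustration; not part of ESPnet.
    """
    value = stats[key]
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# Placeholder dictionaries; real entries would be tensors.
single_source = {"psd_speech": "psd_spk1"}
multi_source = {"psd_speech": ["psd_spk1", "psd_spk2"]}

per_speaker_single = stat_as_list(single_source, "psd_speech")
per_speaker_multi = stat_as_list(multi_source, "psd_speech")
```

With this, a loop over speakers can be written once, regardless of how many speech masks were passed.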

Examples

>>> import torch
>>> from espnet2.enh.layers.beamformer import prepare_beamformer_stats
>>> signal = torch.randn(1, 64, 2, 128, dtype=torch.complex64)
>>> masks_speech = [torch.rand(1, 64, 2, 128) for _ in range(2)]
>>> mask_noise = torch.rand(1, 64, 2, 128)
>>> stats = prepare_beamformer_stats(signal, masks_speech, mask_noise)

Raises: AssertionError – If the specified beamformer type is not supported.
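
The powers argument is documented as one tensor per speech source, each of shape (…, F, T). As a sketch of how such tensors could be built for the wMPDR or WPD case, the snippet below assumes a mask-weighted signal power averaged over channels. The helper name `masked_powers` is hypothetical, and this is an illustrative choice rather than a documented requirement of the function.

```python
import torch


def masked_powers(signal: torch.Tensor, masks_speech: list) -> list:
    """Per-source power estimates of shape (..., F, T).

    Hypothetical helper for illustration; not part of ESPnet.
    signal: complex tensor of shape (..., F, C, T)
    masks_speech: list of real tensors, each of shape (..., F, C, T)
    """
    # Instantaneous power of every channel and time-frequency bin: (..., F, C, T)
    power = signal.real ** 2 + signal.imag ** 2
    # Weight by each source mask, then average over the channel axis
    return [(power * m).mean(dim=-2) for m in masks_speech]


torch.manual_seed(0)
signal = torch.randn(1, 64, 2, 128, dtype=torch.complex64)
masks_speech = [torch.rand(1, 64, 2, 128) for _ in range(2)]
powers = masked_powers(signal, masks_speech)
print([tuple(p.shape) for p in powers])  # [(1, 64, 128), (1, 64, 128)]
```

The resulting list has one entry per speech mask, matching the documented shape of powers.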