espnet2.gan_svs.visinger2.ddsp.multiscale_fft
espnet2.gan_svs.visinger2.ddsp.multiscale_fft(signal, scales, overlap)
Compute the multiscale Short-Time Fourier Transform (STFT) of a signal.
This function calculates the STFT for a given signal across multiple scales specified in the scales list. The overlapping factor between consecutive frames is defined by the overlap parameter.
- Parameters:
- signal (torch.Tensor) – Input audio signal tensor of shape (N, T), where N is the number of channels and T is the length of the signal.
- scales (list of int) – List of scales (window sizes) for which the STFT will be computed.
- overlap (float) – Fraction of overlap between consecutive frames, where 0 < overlap < 1.
- Returns: A list containing the STFTs of the signal, one per scale, where each tensor has shape (N, F, T’), F is the number of frequency bins, and T’ is the number of time frames for the corresponding scale.
- Return type: list of torch.Tensor
Examples
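>>> import torch
>>> from espnet2.gan_svs.visinger2.ddsp import multiscale_fft
>>> signal = torch.randn(1, 16000)  # Simulated audio signal, shape (N, T)
>>> scales = [256, 512, 1024]  # One window size per STFT
>>> overlap = 0.5  # 50% overlap between consecutive frames
>>> stfts = multiscale_fft(signal, scales, overlap)
>>> for idx, stft in enumerate(stfts):
...     print(f'STFT for scale {scales[idx]}: {stft.shape}')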
NOTE
The function uses a Hann window for the STFT computation. Ensure that the input signal is a torch.Tensor of shape (N, T).
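To make the relationship between scales, overlap, and the output shapes concrete, the following is a minimal sketch of how a multiscale STFT of this kind could be computed with torch.stft. It is not the ESPnet source. The name multiscale_stft_sketch is hypothetical, and the sketch assumes that the hop size is int(scale * (1 - overlap)), that frames are centered (center=True), and that the magnitude of the complex STFT is returned. The actual function may differ on any of these points.

```python
import torch


def multiscale_stft_sketch(signal, scales, overlap):
    """Hypothetical illustration, not the ESPnet implementation."""
    stfts = []
    for scale in scales:
        # Assumption: the hop size follows from the overlap fraction.
        hop = int(scale * (1 - overlap))
        # Hann window of the same length as the FFT size, as stated in the note.
        window = torch.hann_window(
            scale, dtype=signal.dtype, device=signal.device
        )
        spec = torch.stft(
            signal,
            n_fft=scale,
            hop_length=hop,
            win_length=scale,
            window=window,
            center=True,  # assumption: frames are centered with padding
            return_complex=True,
        )
        # Assumption: only the magnitude is kept.
        stfts.append(spec.abs())
    return stfts


signal = torch.randn(1, 16000)
shapes = [
    tuple(s.shape)
    for s in multiscale_stft_sketch(signal, [256, 512, 1024], 0.5)
]
print(shapes)
```

Under these assumptions, each scale yields F = scale // 2 + 1 frequency bins and T’ = 1 + T // hop frames, so the sketch prints [(1, 129, 126), (1, 257, 63), (1, 513, 32)] for a 16000-sample signal.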