espnet2.layers.augmentation.time_stretch

espnet2.layers.augmentation.time_stretch(waveform, sample_rate: int, factor: float, n_fft: float = 0.032, win_length: float | None = None, hop_length: float = 0.008, window: str | None = 'hann')

Time scaling via a phase vocoder: changes the playback speed (faster or slower) without modifying the pitch.

Note: This function should be used with caution as it changes the signal duration.

Parameters:
- waveform (torch.Tensor) – audio signal (…, time)
- sample_rate (int) – sampling rate in Hz
- factor (float) – speed-up factor (e.g., 0.9 for 90% speed and 1.3 for 130% speed)
- n_fft (float) – length of FFT (in seconds)
- win_length (float or None) – The window length (in seconds) used for STFT. If None, it is treated as equal to n_fft
- hop_length (float) – The hop size (in seconds) used for STFT
- window (str or None) – The windowing function applied to the signal after padding with zeros
Returns: perturbed (time-stretched) signal (…, time)
Return type: torch.Tensor
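
Because n_fft, win_length, and hop_length are given in seconds rather than samples, it can help to see what the defaults correspond to at a given sampling rate. The following is a minimal sketch, not the library's code: the helper name `seconds_to_samples` is hypothetical, and rounding to the nearest sample is an assumption (the function itself may truncate instead).

```python
def seconds_to_samples(duration_s: float, sample_rate: int) -> int:
    """Hypothetical helper: convert a duration in seconds to a sample count.

    Assumption: rounds to the nearest sample; the library may truncate.
    """
    return int(round(duration_s * sample_rate))


sample_rate = 16000
n_fft = seconds_to_samples(0.032, sample_rate)       # default n_fft
hop_length = seconds_to_samples(0.008, sample_rate)  # default hop_length
win_length = n_fft  # win_length=None is treated as equal to n_fft
print(n_fft, hop_length, win_length)
```

At 16 kHz, the defaults work out to a 512-sample FFT and a 128-sample hop.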

Examples

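>>> import torch
>>> from espnet2.layers.augmentation import time_stretch
>>> waveform = torch.randn(1, 16000)  # Simulated audio: 1 channel, 1 second at 16 kHz
>>> sample_rate = 16000
>>> factor = 1.5  # 150% speed, so the output is shorter than the input
>>> stretched_waveform = time_stretch(waveform, sample_rate, factor)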
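
Since factor is a speed-up factor, the output of the example above should be shorter than the input. As a rough sketch, the expected duration can be estimated without running the function, assuming the output length is about the input length divided by factor. That relation is an assumption drawn from the definition of a speed-up factor, and the exact length returned may differ by a few samples. The helper name `expected_num_samples` is hypothetical.

```python
def expected_num_samples(num_samples: int, factor: float) -> int:
    """Hypothetical helper: estimate the output length after time stretching.

    Assumption: output length ~= input length / factor.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return int(round(num_samples / factor))


sample_rate = 16000
num_in = 16000  # 1 second of audio, as in the example above
for factor in (0.9, 1.0, 1.5):
    num_out = expected_num_samples(num_in, factor)
    # Report the estimated length in samples and in seconds.
    print(f"factor={factor}: ~{num_out} samples (~{num_out / sample_rate:.3f} s)")
```

Under this assumption, a factor below 1 lengthens the signal and a factor above 1 shortens it, which matters whenever downstream code expects a fixed number of samples.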