espnet2.layers.augmentation.pitch_shift

espnet2.layers.augmentation.pitch_shift(waveform, sample_rate: int, n_steps: int, bins_per_octave: int = 12, n_fft: float = 0.032, win_length: float | None = None, hop_length: float = 0.008, window: str | None = 'hann')

Shift the pitch of a waveform by n_steps steps.

Note: this function is slow.

Parameters:
- waveform (torch.Tensor) – audio signal (…, time).
- sample_rate (int) – sampling rate in Hz.
- n_steps (int) – the number of (fractional) steps by which to shift the pitch. For example, -4 shifts the pitch down by 4/bins_per_octave octaves, and 4 shifts it up by 4/bins_per_octave octaves.
- bins_per_octave (int) – number of steps per octave.
- n_fft (float) – FFT length (in seconds).
- win_length (float or None) – The window length (in seconds) used for the STFT. If None, it is treated as equal to n_fft.
- hop_length (float) – The hop size (in seconds) used for the STFT.
- window (str or None) – The windowing function applied to the signal after padding with zeros.
Returns: pitch-shifted signal (…, time).
Return type: ret (torch.Tensor)
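
Because n_fft, win_length, and hop_length are given in seconds rather than samples, it can help to see what the defaults correspond to at a given sampling rate. The following is a minimal sketch, not the library's implementation: `seconds_to_samples` is a hypothetical helper, and rounding to the nearest sample is an assumption (the library may convert differently).

```python
def seconds_to_samples(duration_s: float, sample_rate: int) -> int:
    """Convert a duration in seconds to a whole number of samples.

    Hypothetical helper for illustration only. Assumption: round to the
    nearest sample; the library's own conversion may differ.
    """
    return round(duration_s * sample_rate)


sample_rate = 16000
n_fft = seconds_to_samples(0.032, sample_rate)       # default n_fft
hop_length = seconds_to_samples(0.008, sample_rate)  # default hop_length
win_length = n_fft  # win_length=None is treated as equal to n_fft

print(n_fft, hop_length, win_length)  # prints: 512 128 512
```

Under this assumption, the defaults at 16 kHz give a 512-sample FFT with a 128-sample hop, so consecutive frames overlap by 75%.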

Examples

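>>> import torch
>>> from espnet2.layers.augmentation import pitch_shift
>>> waveform = torch.randn(1, 16000)  # example waveform: 1 second at 16 kHz
>>> sample_rate = 16000
>>> n_steps = 4  # shift up by 4/12 of an octave with the default bins_per_octave
>>> shifted_waveform = pitch_shift(waveform, sample_rate, n_steps)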
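
To make the meaning of n_steps concrete: one octave doubles the frequency, so a shift of n_steps/bins_per_octave octaves scales every frequency by 2^(n_steps/bins_per_octave). The sketch below computes that ratio in plain Python. `pitch_ratio` is a hypothetical helper introduced here for illustration and is not part of espnet2.

```python
def pitch_ratio(n_steps: float, bins_per_octave: int = 12) -> float:
    """Frequency scaling factor for a shift of n_steps steps.

    Hypothetical helper for illustration only. A shift of
    n_steps / bins_per_octave octaves multiplies frequencies by
    2 ** (n_steps / bins_per_octave).
    """
    return 2.0 ** (n_steps / bins_per_octave)


up = pitch_ratio(4)        # 4 steps up with 12 steps per octave
down = pitch_ratio(-4)     # 4 steps down
octave = pitch_ratio(12)   # a full octave up

print(f"{up:.4f} {down:.4f} {octave:.1f}")  # prints: 1.2599 0.7937 2.0
```

With the default bins_per_octave of 12, each step is one semitone, so n_steps = 4 raises a 440 Hz tone to roughly 554 Hz.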