espnet2.gan_svs.visinger2.visinger2_vocoder.TorchSTFT

class espnet2.gan_svs.visinger2.visinger2_vocoder.TorchSTFT(sample_rate, fft_size, hop_size, win_size, normalized=False, domain='linear', mel_scale=False, ref_level_db=20, min_level_db=-100)

Bases: Module

Compute Short-Time Fourier Transform (STFT) for audio signals.

This class performs the Short-Time Fourier Transform (STFT) on input audio signals, converting them from the time domain to the frequency domain. It supports options for normalization and mel scale conversion.
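To make the time-to-frequency conversion concrete, here is a minimal pure-Python sketch of a windowed STFT. It is not the class's implementation: `hann_window` and `naive_stft` are hypothetical helpers written for illustration, the sizes are deliberately tiny, and no edge padding is applied (unlike `torch.stft`, which pads by default).

```python
import cmath
import math


def hann_window(win_size):
    # Periodic Hann window: 0.5 - 0.5 * cos(2*pi*n / win_size).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / win_size) for n in range(win_size)]


def naive_stft(signal, fft_size, hop_size):
    """Return a list of frames, each holding fft_size // 2 + 1 complex bins."""
    window = hann_window(fft_size)
    frames = []
    # Slide a window over the signal, advancing hop_size samples per frame.
    for start in range(0, len(signal) - fft_size + 1, hop_size):
        chunk = [signal[start + n] * window[n] for n in range(fft_size)]
        bins = []
        # One-sided DFT: keep only the non-negative frequencies.
        for k in range(fft_size // 2 + 1):
            bins.append(
                sum(
                    chunk[n] * cmath.exp(-2j * math.pi * k * n / fft_size)
                    for n in range(fft_size)
                )
            )
        frames.append(bins)
    return frames


# A sinusoid that completes exactly 4 cycles per 32-sample frame.
fft_size, hop_size = 32, 8
signal = [math.sin(2.0 * math.pi * 4 * n / fft_size) for n in range(128)]

frames = naive_stft(signal, fft_size, hop_size)
magnitudes = [abs(c) for c in frames[0]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(len(frames), len(frames[0]), peak_bin)
```

The energy of the sinusoid concentrates in the bin matching its frequency, which is the information the magnitude output carries for each frame.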

fft_size

Size of the FFT.

Type: int

hop_size

Hop size for the STFT.

Type: int

win_size

Window size for the STFT.

Type: int

ref_level_db

Reference level in dB.

Type: float

min_level_db

Minimum level in dB.

Type: float

window

Hann window tensor for STFT.

Type: Tensor

normalized

Whether to normalize the output.

Type: bool

domain

Domain type (‘linear’, ‘log’, or ‘double’).

Type: str

mel_scale

Instance of MelScale if mel_scale is True.

Type: MelScale or None

Parameters:
- sample_rate (int) – Sample rate of the audio signal.
- fft_size (int) – Size of the FFT.
- hop_size (int) – Hop size for the STFT.
- win_size (int) – Window size for the STFT.
- normalized (bool , optional) – If True, normalize the output. Default is False.
- domain (str , optional) – Domain of the output (‘linear’, ‘log’, ‘double’). Default is ‘linear’.
- mel_scale (bool , optional) – If True, apply mel scale conversion. Default is False.
- ref_level_db (float , optional) – Reference level in dB. Default is 20.
- min_level_db (float , optional) – Minimum level in dB. Default is -100.

Examples:

>>> import torch
>>> from espnet2.gan_svs.visinger2.visinger2_vocoder import TorchSTFT
>>> stft = TorchSTFT(sample_rate=22050, fft_size=1024, hop_size=256, win_size=1024)
>>> audio = torch.randn(1, 22050)  # Simulated audio signal, (batch_size, time)
>>> mag, phase = stft.transform(audio)

Returns: Magnitude and phase tensors after STFT transformation.
Return type: Tuple[Tensor, Tensor]
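The size of the returned tensors can be estimated from the constructor arguments. The sketch below assumes a one-sided spectrum and the framing rules that `torch.stft` documents for `center=True` and `center=False`; whether `transform` relies on those defaults is an assumption, and `stft_output_shape` is a hypothetical helper, not part of the class.

```python
def stft_output_shape(num_samples, fft_size, hop_size, center=True):
    """Estimate (frequency_bins, frames) for a one-sided STFT."""
    # A real-valued input yields fft_size // 2 + 1 non-redundant bins.
    n_freq = fft_size // 2 + 1
    if center:
        # The signal is padded by fft_size // 2 on both sides before framing.
        n_frames = 1 + num_samples // hop_size
    else:
        # Only frames that fit entirely inside the signal are produced.
        n_frames = 1 + (num_samples - fft_size) // hop_size
    return n_freq, n_frames


centered = stft_output_shape(22050, fft_size=1024, hop_size=256)
uncentered = stft_output_shape(22050, fft_size=1024, hop_size=256, center=False)
print(centered, uncentered)
```

Under these assumptions, one second of 22.05 kHz audio with the example settings gives 513 frequency bins per frame.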

NOTE

The output of transform depends on the domain (‘linear’, ‘log’, or ‘double’) specified during initialization.
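As an illustration of how a log-domain magnitude could use `ref_level_db` and `min_level_db`, the sketch below converts a linear magnitude to decibels and rescales it to the range [0, 1]. The formula is an assumption chosen for illustration, not a statement of what the class computes; `normalized_log_magnitude` is a hypothetical helper.

```python
import math


def normalized_log_magnitude(mag, ref_level_db=20.0, min_level_db=-100.0):
    """Map a linear magnitude to a clipped, normalized dB value (assumed formula)."""
    # Convert to decibels relative to the reference level.
    # The 1e-7 floor only avoids log10(0); it is a choice made for this sketch.
    mag_db = 20.0 * math.log10(max(mag, 1e-7)) - ref_level_db
    # Map [min_level_db, 0] dB onto [0, 1], then clip values outside that range.
    scaled = (mag_db - min_level_db) / -min_level_db
    return min(max(scaled, 0.0), 1.0)


values = [normalized_log_magnitude(m) for m in (1e-7, 1.0, 1e5)]
print(values)
```

With the default levels, very quiet bins saturate at 0 and very loud bins at 1, which keeps the log-domain output in a bounded range.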

Initialize internal Module state, shared by both nn.Module and ScriptModule.

complex(x)

Compute the complex representation of the STFT of the input signal.

This module computes the Short-Time Fourier Transform of an input signal. It allows for various configurations such as window size, FFT size, and normalization. The output can be returned in different domains including linear, log, and double.

fft_size

Size of the FFT.

Type: int

hop_size

Number of samples between successive frames.

Type: int

win_size

Size of the window used for STFT.

Type: int

ref_level_db

Reference level in decibels.

Type: float

min_level_db

Minimum level in decibels.

Type: float

window

Hann window tensor.

Type: Tensor

normalized

Flag to determine if the output should be normalized.

Type: bool

domain

Domain of the output signal.

Type: str

mel_scale

Mel scale transformation object if mel_scale is True.

Type: MelScale or None

Parameters:
- sample_rate (int) – Sample rate of the audio signal.
- fft_size (int) – Size of the FFT.
- hop_size (int) – Hop size for the STFT.
- win_size (int) – Window size for the STFT.
- normalized (bool , optional) – Whether to normalize the output. Defaults to False.
- domain (str , optional) – Domain of the output (‘linear’, ‘log’, or ‘double’). Defaults to ‘linear’.
- mel_scale (bool , optional) – Whether to apply mel scale transformation. Defaults to False.
- ref_level_db (float , optional) – Reference level in decibels. Defaults to 20.
- min_level_db (float , optional) – Minimum level in decibels. Defaults to -100.

Examples:

>>> stft = TorchSTFT(sample_rate=22050, fft_size=1024, hop_size=256,
...                   win_size=1024, normalized=True, domain='log')
>>> input_signal = torch.randn(1, 22050)  # (batch_size, time)
>>> magnitude, phase = stft.transform(input_signal)

transform(x)

Computes the STFT of the input tensor.

complex(x)

Computes the complex representation of the STFT.

transform(x)

Compute the STFT of the input tensor and return its magnitude and phase.

This module performs the Short-Time Fourier Transform (STFT) on an input audio signal, providing both magnitude and phase information. It can also convert the resulting STFT to a mel frequency representation if required.
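For readers unfamiliar with the mel scale, the sketch below shows the widely used HTK-style conversion between hertz and mels and how evenly spaced mel points map back to unevenly spaced frequencies. Which mel formula the `MelScale` object in this class applies is not stated here, so treat the formula choice as an assumption; all three functions are hypothetical helpers.

```python
import math


def hz_to_mel(freq_hz):
    # HTK-style mel formula.
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels, f_min, f_max):
    """Return n_mels center frequencies (Hz), equally spaced on the mel axis."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


centers = mel_center_frequencies(n_mels=8, f_min=0.0, f_max=11025.0)
print([round(c, 1) for c in centers])
```

The centers sit close together at low frequencies and spread out at high frequencies, which is the perceptual warping a mel conversion applies to the linear STFT bins.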

fft_size

The size of the FFT.

Type: int

hop_size

The hop length between frames.

Type: int

win_size

The window size for the STFT.

Type: int

ref_level_db

Reference level in decibels.

Type: float

min_level_db

Minimum level in decibels.

Type: float

window

The Hann window used for the STFT.

Type: Tensor

normalized

Whether to normalize the output.

Type: bool

domain

The domain of the output (‘linear’, ‘log’, or ‘double’).

Type: str

mel_scale

Mel scale converter if mel_scale is True.

Type: MelScale or None

Parameters:
- sample_rate (int) – Sample rate of the input audio signal.
- fft_size (int) – Size of the FFT.
- hop_size (int) – Hop size for the STFT.
- win_size (int) – Window size for the STFT.
- normalized (bool , optional) – If True, normalize the output. Defaults to False.
- domain (str , optional) – The domain of the output (‘linear’, ‘log’, ‘double’). Defaults to ‘linear’.
- mel_scale (bool , optional) – If True, convert the STFT to mel frequency. Defaults to False.
- ref_level_db (float , optional) – Reference level in decibels. Defaults to 20.
- min_level_db (float , optional) – Minimum level in decibels. Defaults to -100.

Examples:

>>> stft = TorchSTFT(sample_rate=22050, fft_size=1024, hop_size=256, win_size=1024)
>>> audio_input = torch.randn(1, 22050)  # Example audio input, (batch_size, time)
>>> magnitude, phase = stft.transform(audio_input)

transform(x)

Computes the STFT of the input tensor and returns magnitude and phase.

complex(x)

Computes the complex STFT of the input tensor.
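The two methods describe the same STFT in different coordinates: a complex bin can be written as real and imaginary parts or as magnitude and phase. The sketch below shows that relationship for a single bin. It assumes `complex` exposes real and imaginary components, which the description above does not spell out; `to_polar` and `to_rect` are hypothetical helpers.

```python
import math


def to_polar(real, imag):
    """Convert one complex STFT bin to (magnitude, phase in radians)."""
    magnitude = math.hypot(real, imag)
    phase = math.atan2(imag, real)
    return magnitude, phase


def to_rect(magnitude, phase):
    """Convert (magnitude, phase) back to (real, imag)."""
    return magnitude * math.cos(phase), magnitude * math.sin(phase)


mag, phase = to_polar(3.0, 4.0)
real, imag = to_rect(mag, phase)
print(mag, phase, real, imag)
```

Because the conversion is invertible, either representation can be recovered from the other as long as the magnitude has not been mel-scaled or log-compressed.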