espnet2.gan_svs.visinger2.visinger2_vocoder.TorchSTFT
class espnet2.gan_svs.visinger2.visinger2_vocoder.TorchSTFT(sample_rate, fft_size, hop_size, win_size, normalized=False, domain='linear', mel_scale=False, ref_level_db=20, min_level_db=-100)
Bases: Module
Compute Short-Time Fourier Transform (STFT) for audio signals.
This class performs the Short-Time Fourier Transform (STFT) on input audio signals, converting them from the time domain to the frequency domain. It supports options for normalization and mel scale conversion.
fft_size
Size of the FFT.
- Type: int
hop_size
Hop size for the STFT.
- Type: int
win_size
Window size for the STFT.
- Type: int
ref_level_db
Reference level in dB.
- Type: float
min_level_db
Minimum level in dB.
- Type: float
window
Hann window tensor for STFT.
- Type: Tensor
normalized
Whether to normalize the output.
- Type: bool
domain
Domain type (‘linear’, ‘log’, or ‘double’).
- Type: str
mel_scale
Instance of MelScale if mel_scale is True.
- Type: MelScale
Parameters:
- sample_rate (int) – Sample rate of the audio signal.
- fft_size (int) – Size of the FFT.
- hop_size (int) – Hop size for the STFT.
- win_size (int) – Window size for the STFT.
- normalized (bool, optional) – If True, normalize the output. Default is False.
- domain (str, optional) – Domain of the output (‘linear’, ‘log’, ‘double’). Default is ‘linear’.
- mel_scale (bool, optional) – If True, apply mel scale conversion. Default is False.
- ref_level_db (float, optional) – Reference level in dB. Default is 20.
- min_level_db (float, optional) – Minimum level in dB. Default is -100.
Examples
>>> stft = TorchSTFT(sample_rate=22050, fft_size=1024, hop_size=256, win_size=1024)
>>> audio = torch.randn(1, 22050)  # Simulated audio signal, shape (batch, time)
>>> mag, phase = stft.transform(audio)
- Returns: Magnitude and phase tensors after STFT transformation.
- Return type: Tuple[Tensor, Tensor]
NOTE
The transform method will return different outputs based on the domain specified during initialization.
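Because the returned tensors depend on `domain`, it can help to estimate the expected shapes before calling `transform`. The sketch below is plain arithmetic, not the library's code. It assumes the class forwards to `torch.stft` with centered framing and a one-sided spectrum, and that `'double'` stacks the linear and log magnitudes along the frequency axis. `expected_shape` is a hypothetical helper; verify the result against the actual output.

```python
def expected_shape(num_samples, fft_size, hop_size, domain="linear"):
    """Estimate (freq_bins, frames) for one waveform (hypothetical helper)."""
    # One-sided spectrum of a real signal: fft_size // 2 + 1 frequency bins.
    freq_bins = fft_size // 2 + 1
    # With centered framing the signal is padded by fft_size // 2 on each side,
    # which yields 1 + num_samples // hop_size frames.
    frames = 1 + num_samples // hop_size
    if domain == "double":
        # Assumption: linear and log magnitudes are concatenated on the bin axis.
        freq_bins *= 2
    elif domain not in ("linear", "log"):
        raise ValueError(f"unknown domain: {domain!r}")
    return freq_bins, frames


print(expected_shape(22050, 1024, 256))            # (513, 87)
print(expected_shape(22050, 1024, 256, "double"))  # (1026, 87)
```

A batch dimension, if present, would precede these two axes.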
complex(x)
Short-Time Fourier Transform (STFT) module.
This module computes the Short-Time Fourier Transform of an input signal. It allows for various configurations such as window size, FFT size, and normalization. The output can be returned in different domains including linear, log, and double.
fft_size
Size of the FFT.
- Type: int
hop_size
Number of samples between successive frames.
- Type: int
win_size
Size of the window used for STFT.
- Type: int
ref_level_db
Reference level in decibels.
- Type: float
min_level_db
Minimum level in decibels.
- Type: float
window
Hann window tensor.
- Type: Tensor
normalized
Flag to determine if the output should be normalized.
- Type: bool
domain
Domain of the output signal.
- Type: str
mel_scale
Mel scale transformation object if mel_scale is True.
- Type: MelScale
Parameters:
- sample_rate (int) – Sample rate of the audio signal.
- fft_size (int) – Size of the FFT.
- hop_size (int) – Hop size for the STFT.
- win_size (int) – Window size for the STFT.
- normalized (bool, optional) – Whether to normalize the output. Defaults to False.
- domain (str, optional) – Domain of the output (‘linear’, ‘log’, or ‘double’). Defaults to ‘linear’.
- mel_scale (bool, optional) – Whether to apply mel scale transformation. Defaults to False.
- ref_level_db (float, optional) – Reference level in decibels. Defaults to 20.
- min_level_db (float, optional) – Minimum level in decibels. Defaults to -100.
Examples
>>> stft = TorchSTFT(sample_rate=22050, fft_size=1024, hop_size=256,
... win_size=1024, normalized=True, domain='log')
>>> input_signal = torch.randn(1, 22050)  # (batch_size, time)
>>> magnitude, phase = stft.transform(input_signal)
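For `domain='log'`, the `ref_level_db` and `min_level_db` parameters suggest the decibel normalization common in speech synthesis front ends. The sketch below applies that mapping to a single magnitude value using only the standard library. The formula and the helper name `normalize_db` are assumptions for illustration; check the ESPnet source for the exact operation.

```python
import math


def normalize_db(mag, ref_level_db=20.0, min_level_db=-100.0, eps=1e-7):
    """Map a linear magnitude to [0, 1] on a dB scale (illustrative only)."""
    # Convert to decibels relative to the reference level; eps avoids log10(0).
    db = 20.0 * math.log10(max(mag, eps)) - ref_level_db
    # Shift and scale so min_level_db maps to 0 and 0 dB maps to 1, then clip.
    scaled = (db - min_level_db) / -min_level_db
    return min(max(scaled, 0.0), 1.0)


print(normalize_db(1.0))   # 0.8
print(normalize_db(1e-7))  # 0.0
```

Under these assumptions, a magnitude of 1.0 sits 20 dB below the reference level and lands at 0.8, while very quiet bins are clipped to 0.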
transform(x)
Computes the STFT of the input tensor.
complex(x)
Computes the complex representation of the STFT.
transform(x)
Short-Time Fourier Transform (STFT) module for audio processing.
This module performs the Short-Time Fourier Transform (STFT) on an input audio signal, providing both magnitude and phase information. It can also convert the resulting STFT to a mel frequency representation if required.
fft_size
The size of the FFT.
- Type: int
hop_size
The hop length between frames.
- Type: int
win_size
The window size for the STFT.
- Type: int
ref_level_db
Reference level in decibels.
- Type: float
min_level_db
Minimum level in decibels.
- Type: float
window
The Hann window used for the STFT.
- Type: Tensor
normalized
Whether to normalize the output.
- Type: bool
domain
The domain of the output (‘linear’, ‘log’, or ‘double’).
- Type: str
mel_scale
Mel scale converter if mel_scale is True.
- Type: MelScale or None
Parameters:
- sample_rate (int) – Sample rate of the input audio signal.
- fft_size (int) – Size of the FFT.
- hop_size (int) – Hop size for the STFT.
- win_size (int) – Window size for the STFT.
- normalized (bool, optional) – If True, normalize the output. Defaults to False.
- domain (str, optional) – The domain of the output (‘linear’, ‘log’, ‘double’). Defaults to ‘linear’.
- mel_scale (bool, optional) – If True, convert the STFT to mel frequency. Defaults to False.
- ref_level_db (float, optional) – Reference level in decibels. Defaults to 20.
- min_level_db (float, optional) – Minimum level in decibels. Defaults to -100.
Examples
>>> stft = TorchSTFT(sample_rate=22050, fft_size=1024, hop_size=256, win_size=1024)
>>> audio_input = torch.randn(1, 22050)  # Example audio input, shape (batch, time)
>>> magnitude, phase = stft.transform(audio_input)
transform(x)
Computes the STFT of the input tensor and returns magnitude and phase.
complex(x)
Computes the complex STFT of the input tensor.
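The two methods describe the same STFT coefficients in different forms. Assuming `complex` yields real and imaginary parts while `transform` yields magnitude and phase, the relation for a single coefficient can be sketched with the standard library alone. The value below is made up, and the snippet does not call ESPnet.

```python
import cmath
import math

# One hypothetical STFT coefficient (a single time-frequency bin).
coeff = complex(3.0, 4.0)

# Polar form: the magnitude and phase of that bin.
magnitude = abs(coeff)      # sqrt(re**2 + im**2)
phase = cmath.phase(coeff)  # atan2(im, re), in radians

# Rebuild the complex value from magnitude and phase.
rebuilt = cmath.rect(magnitude, phase)

print(magnitude)  # 5.0
```

Converting back with `cmath.rect` recovers the original coefficient up to floating-point error, which is why either representation can be used for reconstruction.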