espnet2.tts.feats_extract.linear_spectrogram.LinearSpectrogram
class espnet2.tts.feats_extract.linear_spectrogram.LinearSpectrogram(n_fft: int = 1024, win_length: int | None = None, hop_length: int = 256, window: str | None = 'hann', center: bool = True, normalized: bool = False, onesided: bool = True)
Bases: AbsFeatsExtract
Linear amplitude spectrogram.
This class computes the linear amplitude spectrogram from an input signal using Short-Time Fourier Transform (STFT). It inherits from the AbsFeatsExtract abstract class.
n_fft
Number of FFT points.
- Type: int
hop_length
Number of samples between adjacent STFT columns.
- Type: int
win_length
Window length. If None, it defaults to n_fft.
- Type: Optional[int]
window
Window function type. Defaults to “hann”.
- Type: Optional[str]
stft
STFT instance used for transforming the input signal.
- Type: Stft
Parameters:
- n_fft (int, optional) – Number of FFT points. Defaults to 1024.
- win_length (Optional[int], optional) – Window length. If None, n_fft is used. Defaults to None.
- hop_length (int, optional) – Number of samples between adjacent STFT columns. Defaults to 256.
- window (Optional[str], optional) – Window type. Defaults to “hann”.
- center (bool, optional) – If True, the signal is padded so that the t-th frame is centered at time t × hop_length. Defaults to True.
- normalized (bool, optional) – If True, the STFT output is normalized. Defaults to False.
- onesided (bool, optional) – If True, only the non-negative frequency bins are returned (n_fft // 2 + 1 bins). Defaults to True.
Returns: A tuple containing the amplitude spectrogram and the lengths of the features.
Return type: Tuple[torch.Tensor, torch.Tensor]
########### Examples
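>>> import torch
>>> from espnet2.tts.feats_extract.linear_spectrogram import LinearSpectrogram
>>> # Create a LinearSpectrogram instance
>>> spectrogram = LinearSpectrogram(n_fft=2048, hop_length=512)
>>> # Forward pass with an input tensor
>>> input_tensor = torch.randn(1, 16000)  # Example input
>>> amp_spectrogram, lengths = spectrogram(input_tensor)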
NOTE
The input tensor should have dimensions (batch_size, num_samples).
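How `hop_length` and `center` translate into a number of output frames can be sketched in plain Python. This is only an illustration: it assumes the framing convention of `torch.stft` (padding of `n_fft // 2` samples on each side when `center=True`), and the helper name `num_frames` is hypothetical, not part of ESPnet.

```python
def num_frames(num_samples: int, n_fft: int = 1024, hop_length: int = 256,
               center: bool = True) -> int:
    """Estimate the STFT frame count (hypothetical helper, torch.stft convention assumed)."""
    if center:
        # Centered framing pads n_fft // 2 samples on both sides of the signal.
        num_samples = num_samples + 2 * (n_fft // 2)
    if num_samples < n_fft:
        raise ValueError("signal is shorter than one analysis frame")
    # One frame at offset 0, then one more for every full hop that still fits.
    return (num_samples - n_fft) // hop_length + 1


# One second of 16 kHz audio with the default n_fft and hop_length.
frames_centered = num_frames(16000)
frames_uncentered = num_frames(16000, center=False)
print(frames_centered, frames_uncentered)
```

Under these assumptions, centering yields a few more frames than uncentered framing because the padded signal is longer.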
forward(input: Tensor, input_lengths: Tensor | None = None) → Tuple[Tensor, Tensor]
Computes the linear amplitude spectrogram from the input tensor using Short-Time Fourier Transform (STFT).
The forward method takes an input tensor, applies STFT to convert the time domain signal into the frequency domain, and then computes the amplitude spectrum from the resulting complex spectrogram.
- Parameters:
  - input (torch.Tensor) – Input tensor of shape (…, T), where T is the number of time steps. This tensor represents the audio waveform.
  - input_lengths (torch.Tensor, optional) – A tensor containing the lengths of the input sequences. This is useful for batching inputs of varying lengths. If not provided, the method assumes all inputs are of the same length.
- Returns: A tuple containing:
  - A tensor of shape (…, F) representing the amplitude spectrum, where F is the number of frequency bins.
  - A tensor of shape (…,) representing the lengths of the features after applying STFT.
- Return type: Tuple[torch.Tensor, torch.Tensor]
- Raises: AssertionError – If the STFT output has fewer than 4 dimensions, or if its last dimension (the real and imaginary parts) does not have size 2.
########### Examples
>>> linear_spectrogram = LinearSpectrogram(n_fft=1024)
>>> input_tensor = torch.randn(1, 16000)  # Example input: (batch_size, num_samples)
>>> input_lengths = torch.tensor([16000])  # Number of valid samples per batch item
>>> amp_spectrum, feats_lengths = linear_spectrogram(input_tensor, input_lengths)
>>> print(amp_spectrum.shape)  # Output shape will be (..., F)
>>> print(feats_lengths.shape)  # Output shape will be (...,)
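The step from a complex spectrum to an amplitude spectrum can be illustrated without any audio library. The sketch below is a simplification rather than the ESPnet implementation: it applies a naive DFT to a single frame, uses no window function, and the function name `amplitude_spectrum` is hypothetical. It shows the two ideas described above, keeping the n_fft // 2 + 1 one-sided bins and taking the square root of real² + imag² for each bin.

```python
import cmath
import math
from typing import List


def amplitude_spectrum(frame: List[float]) -> List[float]:
    """One-sided amplitude spectrum of a single frame (naive DFT, no window)."""
    n_fft = len(frame)
    amplitudes = []
    for k in range(n_fft // 2 + 1):  # keep only the non-negative frequency bins
        bin_value = sum(
            x * cmath.exp(-2j * math.pi * k * n / n_fft) for n, x in enumerate(frame)
        )
        # Amplitude = sqrt(real^2 + imag^2) of the complex bin.
        amplitudes.append(math.sqrt(bin_value.real ** 2 + bin_value.imag ** 2))
    return amplitudes


# A cosine that completes exactly 2 cycles in an 8-sample frame.
frame = [math.cos(2 * math.pi * 2 * n / 8) for n in range(8)]
amps = amplitude_spectrum(frame)
print(len(amps))  # number of one-sided bins: n_fft // 2 + 1
print([round(a, 6) for a in amps])
```

The energy of the cosine concentrates in bin 2, the bin matching its frequency, while the remaining bins stay at numerical zero.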
get_parameters() → Dict[str, Any]
Return the parameters required by Vocoder.
This method returns a dictionary containing the key parameters necessary for the vocoder, which include the number of FFT points, the hop length, the window length, and the window type.
- Returns: A dictionary with the following keys:
  - n_fft (int): The number of FFT points.
  - n_shift (int): The hop length.
  - win_length (Optional[int]): The window length.
  - window (Optional[str]): The window type.
- Return type: Dict[str, Any]
########### Examples
>>> spectrogram = LinearSpectrogram(n_fft=2048, hop_length=512)
>>> params = spectrogram.get_parameters()
>>> print(params)
{'n_fft': 2048, 'n_shift': 512, 'win_length': None, 'window': 'hann'}
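Because `win_length` can come back as `None`, code that consumes this dictionary may need to resolve it first. The sketch below applies the fallback stated for the `win_length` attribute (a `None` window length means `n_fft`). The function name `resolve_stft_parameters` is hypothetical, and the dictionary literal merely stands in for the result of `get_parameters()`.

```python
from typing import Any, Dict


def resolve_stft_parameters(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with a None win_length replaced by n_fft."""
    resolved = dict(params)  # copy, so the caller's dictionary is left untouched
    if resolved.get("win_length") is None:
        resolved["win_length"] = resolved["n_fft"]
    return resolved


# Stand-in for LinearSpectrogram(n_fft=2048, hop_length=512).get_parameters().
params = {"n_fft": 2048, "n_shift": 512, "win_length": None, "window": "hann"}
resolved = resolve_stft_parameters(params)
print(resolved["win_length"])
```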
output_size() → int
Returns the output size of the linear spectrogram, which is calculated as half of the FFT size plus one (n_fft // 2 + 1). This value represents the number of frequency bins in the output spectrogram.
- Returns: The number of frequency bins in the output spectrogram.
- Return type: int
########### Examples
>>> spectrogram = LinearSpectrogram(n_fft=1024)
>>> spectrogram.output_size()
513