espnet2.tts.feats_extract.linear_spectrogram.LinearSpectrogram


class espnet2.tts.feats_extract.linear_spectrogram.LinearSpectrogram(n_fft: int = 1024, win_length: int | None = None, hop_length: int = 256, window: str | None = 'hann', center: bool = True, normalized: bool = False, onesided: bool = True)

Bases: AbsFeatsExtract

Linear amplitude spectrogram.

This class computes the linear amplitude spectrogram of an input waveform. The signal is transformed with the Short-Time Fourier Transform (STFT), and the magnitude of each complex time-frequency bin is returned. It inherits from the AbsFeatsExtract abstract class.
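The pipeline can be illustrated with a minimal pure-Python sketch. It assumes the default settings (center=True, win_length equal to n_fft, onesided=True, normalized=False), a periodic Hann window, and reflect padding, which is torch.stft's default pad mode. The function names are hypothetical; the real class delegates to ESPnet's Stft layer. The direct DFT here costs O(n_fft²) per frame and is for illustration only.

```python
import cmath
import math
from typing import List


def hann_window(n: int) -> List[float]:
    # Periodic Hann window (the default convention of torch.hann_window).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def reflect_pad(x: List[float], pad: int) -> List[float]:
    # Mirror the signal at both ends without repeating the edge sample.
    if pad >= len(x):
        raise ValueError("reflect padding needs pad < len(x)")
    left = [x[i] for i in range(pad, 0, -1)]
    right = [x[len(x) - 2 - i] for i in range(pad)]
    return left + x + right


def linear_amplitude_spectrogram(
    x: List[float], n_fft: int = 1024, hop_length: int = 256, eps: float = 1e-10
) -> List[List[float]]:
    """Return a (num_frames, n_fft // 2 + 1) amplitude spectrogram of a 1-D signal."""
    window = hann_window(n_fft)
    padded = reflect_pad(x, n_fft // 2)  # center=True: frame t is centered at t * hop
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    spec = []
    for t in range(n_frames):
        frame = padded[t * hop_length : t * hop_length + n_fft]
        row = []
        for k in range(n_fft // 2 + 1):  # one-sided: non-negative bins only
            z = sum(
                frame[n] * window[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            # Amplitude of the complex bin, with eps added inside the square root.
            row.append(math.sqrt(z.real ** 2 + z.imag ** 2 + eps))
        spec.append(row)
    return spec
```

A cosine that completes exactly two cycles per 8-sample frame peaks in bin 2 of the interior frames, while silence yields sqrt(eps) everywhere.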

n_fft

Number of FFT points.

Type: int

hop_length

Number of samples between adjacent STFT columns.

Type: int

win_length

Window length. If None, it defaults to n_fft.

Type: Optional[int]
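When win_length is smaller than n_fft, torch.stft zero-pads the window on both sides to length n_fft before applying it. The sketch below builds that effective window. It assumes a periodic Hann window and a left padding of (n_fft - win_length) // 2; the helper name is hypothetical.

```python
import math
from typing import List, Optional


def effective_window(n_fft: int, win_length: Optional[int] = None) -> List[float]:
    # win_length=None falls back to n_fft.
    if win_length is None:
        win_length = n_fft
    if not 0 < win_length <= n_fft:
        raise ValueError("win_length must be in (0, n_fft]")
    # Periodic Hann window of length win_length.
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * i / win_length) for i in range(win_length)
    ]
    # Zero-pad both sides so the window sits centered in an n_fft-long frame.
    left = (n_fft - win_length) // 2
    right = n_fft - win_length - left
    return [0.0] * left + window + [0.0] * right
```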

window

Window function type. Defaults to “hann”.

Type: Optional[str]

stft

STFT instance used for transforming the input signal.

Type: Stft

Parameters:
- n_fft (int, optional) – Number of FFT points. Defaults to 1024.
- win_length (Optional[int], optional) – Window length. If None, n_fft is used. Defaults to None.
- hop_length (int, optional) – Number of samples between adjacent STFT columns. Defaults to 256.
- window (Optional[str], optional) – Window type. Defaults to “hann”.
- center (bool, optional) – If True, the signal is padded on both sides so that the t-th frame is centered at sample t × hop_length. Defaults to True.
- normalized (bool, optional) – If True, the normalized STFT is returned (see torch.stft). Defaults to False.
- onesided (bool, optional) – If True, only the n_fft // 2 + 1 non-negative frequency bins are returned. Defaults to True.
Returns: A tuple (produced by forward) containing the amplitude spectrogram and the lengths of the features.
Return type: Tuple[torch.Tensor, torch.Tensor]
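The parameters above fix the output shape. The following sketch predicts the (num_frames, num_bins) of one mono waveform. It assumes torch.stft's framing rules (n_fft // 2 samples of padding per side when center=True), and the helper name is hypothetical.

```python
from typing import Tuple


def expected_feats_shape(
    num_samples: int,
    n_fft: int = 1024,
    hop_length: int = 256,
    center: bool = True,
    onesided: bool = True,
) -> Tuple[int, int]:
    """Return (num_frames, num_bins) for one mono waveform."""
    if center:
        # center=True pads n_fft // 2 samples on each side of the signal.
        num_samples += 2 * (n_fft // 2)
    if num_samples < n_fft:
        raise ValueError("signal is shorter than one STFT frame")
    # Frames start every hop_length samples; the last one must fit entirely.
    num_frames = 1 + (num_samples - n_fft) // hop_length
    # onesided=True keeps only the non-negative frequency bins.
    num_bins = n_fft // 2 + 1 if onesided else n_fft
    return num_frames, num_bins
```

For example, one second of 16 kHz audio with the defaults gives 63 frames of 513 bins. Turning off center drops the padding and yields 59 frames.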

Examples

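>>> import torch
>>> from espnet2.tts.feats_extract.linear_spectrogram import LinearSpectrogram
>>> # Create a LinearSpectrogram instance
>>> spectrogram = LinearSpectrogram(n_fft=2048, hop_length=512)
>>> # Forward pass with a (batch_size, num_samples) waveform
>>> input_tensor = torch.randn(1, 16000)
>>> amp_spectrogram, lengths = spectrogram(input_tensor)
>>> amp_spectrogram.shape  # (batch_size, num_frames, n_fft // 2 + 1)
torch.Size([1, 32, 1025])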

NOTE

The input tensor should have dimensions (batch_size, num_samples).
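Because the input must be a single (batch_size, num_samples) tensor, utterances of different lengths have to be padded to a common length, with their true lengths passed as input_lengths. The sketch below shows the idea in plain Python; the helper name is hypothetical. With torch, torch.nn.utils.rnn.pad_sequence(..., batch_first=True) performs the same right-padding.

```python
from typing import List, Tuple


def pad_batch(waveforms: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    # Record each waveform's true length before padding.
    lengths = [len(w) for w in waveforms]
    max_len = max(lengths)
    # Right-pad every waveform with zeros up to the longest one.
    padded = [w + [0.0] * (max_len - len(w)) for w in waveforms]
    return padded, lengths
```

The padded list and the lengths would then be converted with torch.tensor(...) and passed as input and input_lengths.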


forward(input: Tensor, input_lengths: Tensor | None = None) → Tuple[Tensor, Tensor]

Computes the linear amplitude spectrogram from the input tensor using the Short-Time Fourier Transform (STFT).

The forward method applies the STFT to convert the time-domain signal into the frequency domain. It then takes the magnitude of each complex bin, sqrt(real² + imag²), adding a small constant (1e-10) inside the square root for numerical stability.

Parameters:
- input (torch.Tensor) – Waveform tensor of shape (B, T), where B is the batch size and T is the number of samples. The underlying Stft layer also accepts multichannel input of shape (B, T, C).
- input_lengths (torch.Tensor, optional) – Tensor of shape (B,) holding the number of valid samples in each waveform, used when batching inputs of different lengths. If omitted, all samples are treated as valid and the returned feature lengths are None.
Returns: A tuple containing:
- The amplitude spectrogram of shape (B, T', F), where T' is the number of frames and F is the number of frequency bins (n_fft // 2 + 1 when onesided=True). Multichannel input gives shape (B, T', C, F).
- The feature lengths of shape (B,), i.e., the number of valid frames in each sequence, or None if input_lengths is not given.
Return type: Tuple[torch.Tensor, torch.Tensor]
Raises: AssertionError – If the STFT output has fewer than 4 dimensions or its last dimension (the real and imaginary parts) is not 2.
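The feature lengths follow from the sample lengths. The sketch below assumes center=True adds n_fft // 2 samples of padding per side, and that frames past each sequence's valid length are zeroed in the padded batch. This mirrors the behavior of ESPnet's Stft layer as we understand it. Both helper names are hypothetical.

```python
from typing import List, Optional


def feats_lengths(
    input_lengths: Optional[List[int]],
    n_fft: int = 1024,
    hop_length: int = 256,
    center: bool = True,
) -> Optional[List[int]]:
    # Without input_lengths there is nothing to compute: lengths are None.
    if input_lengths is None:
        return None
    out = []
    for n in input_lengths:
        if center:
            n += 2 * (n_fft // 2)  # padding added on both sides
        out.append((n - n_fft) // hop_length + 1)
    return out


def mask_padded_frames(feats: List[List[List[float]]], lengths: List[int]) -> None:
    # Zero every frame at or beyond each sequence's valid length (in place).
    for b, valid in enumerate(lengths):
        for t in range(valid, len(feats[b])):
            feats[b][t] = [0.0] * len(feats[b][t])
```

For a batch holding 1 s and 0.5 s of 16 kHz audio, the defaults give 63 and 32 valid frames. The shorter sequence's trailing frames in the padded output are set to zero.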

Examples

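>>> import torch
>>> from espnet2.tts.feats_extract.linear_spectrogram import LinearSpectrogram
>>> linear_spectrogram = LinearSpectrogram(n_fft=1024)
>>> input_tensor = torch.randn(1, 16000)  # (batch_size, num_samples)
>>> input_lengths = torch.tensor([16000])
>>> amp_spectrum, feats_lengths = linear_spectrogram(input_tensor, input_lengths)
>>> amp_spectrum.shape  # (batch_size, num_frames, n_fft // 2 + 1)
torch.Size([1, 63, 513])
>>> feats_lengths
tensor([63])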

get_parameters() → Dict[str, Any]

Returns the parameters required by the vocoder.

This method returns a dictionary containing the key parameters needed by the vocoder: the number of FFT points, the hop length, the window length, and the window type.

Returns: A dictionary with the following keys:
- n_fft (int): The number of FFT points.
- n_shift (int): The hop length.
- win_length (Optional[int]): The window length.
- window (Optional[str]): The window type.
Return type: Dict[str, Any]
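The dictionary reports the hop length under the key n_shift. The sketch below shows how a consumer might express these values in time units. It assumes a sampling rate fs, which get_parameters does not return, and the helper name is hypothetical.

```python
from typing import Any, Dict


def describe_stft_params(params: Dict[str, Any], fs: int) -> Dict[str, float]:
    """Express get_parameters() output in milliseconds and frames per second."""
    # win_length=None means the window spans the full n_fft points.
    win_length = params["n_fft"] if params["win_length"] is None else params["win_length"]
    return {
        "frame_shift_ms": 1000.0 * params["n_shift"] / fs,
        "window_ms": 1000.0 * win_length / fs,
        "frames_per_second": fs / params["n_shift"],
    }
```

At 16 kHz, the default n_fft=1024 and hop length 256 correspond to a 16 ms frame shift, a 64 ms window, and 62.5 frames per second.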

Examples

>>> spectrogram = LinearSpectrogram(n_fft=2048, hop_length=512)
>>> params = spectrogram.get_parameters()
>>> print(params)
{'n_fft': 2048, 'n_shift': 512, 'win_length': None, 'window': 'hann'}

output_size() → int

Returns the output size of the linear spectrogram, computed as n_fft // 2 + 1 (half the FFT size plus one). This value is the number of frequency bins in the output spectrogram.

Returns: The number of frequency bins in the output spectrogram.
Return type: int
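Each of the n_fft // 2 + 1 bins corresponds to a frequency of k · fs / n_fft Hz, running from 0 Hz up to the Nyquist frequency fs / 2 for even n_fft. The sketch below lists them. The sampling rate fs is an assumed input (the class itself does not store it), and the helper name is hypothetical.

```python
from typing import List


def bin_frequencies(n_fft: int, fs: float) -> List[float]:
    """Center frequency in Hz of each of the n_fft // 2 + 1 one-sided bins."""
    return [k * fs / n_fft for k in range(n_fft // 2 + 1)]
```

With n_fft=1024 at 16 kHz, this gives 513 bins spaced 15.625 Hz apart, the last one at 8000 Hz.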

Examples

>>> spectrogram = LinearSpectrogram(n_fft=1024)
>>> spectrogram.output_size()
513