espnet2.gan_svs.uhifigan.sine_generator.SineGen

class espnet2.gan_svs.uhifigan.sine_generator.SineGen(sample_rate, harmonic_num=0, sine_amp=0.1, noise_std=0.003, voiced_threshold=0, flag_for_pulse=False)

Bases: Module

Definition of a sine generator for audio synthesis.

This class implements a sine wave generator that can produce harmonic sine waves and corresponding unvoiced/voiced (U/V) signals based on the provided fundamental frequency (F0) values. It can also introduce Gaussian noise to the generated waveforms.
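
As a rough illustration of the description above, the following pure-Python sketch accumulates a running phase per harmonic and derives a U/V flag from F0. It is not the ESPnet implementation: the function name `harmonic_sines`, the `seed` argument, the zero initial phase, and the use of a single noise level for voiced and unvoiced samples are all assumptions made for this example.

```python
import math
import random


def harmonic_sines(f0, sample_rate, harmonic_num=0, sine_amp=0.1,
                   noise_std=0.003, voiced_threshold=0.0, seed=0):
    """Return (sines, uv) for a list of per-sample F0 values in Hz.

    sines[t][k] is harmonic k (k = 0 is the fundamental) at sample t.
    uv[t] is 1.0 for voiced samples and 0.0 for unvoiced ones.
    """
    rng = random.Random(seed)  # hypothetical: fixed seed for repeatability
    dim = harmonic_num + 1
    phase = [0.0] * dim  # running phase per harmonic, measured in cycles
    sines, uv = [], []
    for hz in f0:
        voiced = 1.0 if hz > voiced_threshold else 0.0
        row = []
        for k in range(dim):
            # Advance the phase by the instantaneous frequency of harmonic k.
            phase[k] = (phase[k] + hz * (k + 1) / sample_rate) % 1.0
            clean = sine_amp * math.sin(2.0 * math.pi * phase[k])
            # Voiced samples keep the sine plus Gaussian noise; unvoiced
            # samples are noise only (same noise level: a simplification).
            row.append(voiced * clean + rng.gauss(0.0, noise_std))
        sines.append(row)
        uv.append(voiced)
    return sines, uv


f0 = [440.0] * 4 + [0.0] * 4  # four voiced samples, then four unvoiced
sines, uv = harmonic_sines(f0, sample_rate=16000, harmonic_num=3)
print(len(sines), len(sines[0]))  # 8 4
print(uv)  # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```

Accumulating phase, rather than evaluating sin(2πft) directly, keeps the waveform continuous when F0 changes from one sample to the next.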

sine_amp

Amplitude of the sine waveform (default: 0.1).

Type: float

noise_std

Standard deviation of Gaussian noise (default: 0.003).

Type: float

harmonic_num

Number of harmonic overtones (default: 0).

Type: int

dim

Dimension of the output, equal to harmonic_num + 1.

Type: int

sampling_rate

Sampling rate in Hz.

Type: int

voiced_threshold

F0 threshold for U/V classification (default: 0).

Type: float

flag_for_pulse

Indicates if the generator is used in pulse mode (default: False).

Type: bool
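
The relation between `harmonic_num` and `dim` listed above can be sketched in a few lines. The helper name `harmonic_frequencies` is hypothetical, and the sketch assumes overtone k sits at (k + 1) times F0, the usual definition of a harmonic series.

```python
def harmonic_frequencies(f0_hz, harmonic_num=0):
    """Return the fundamental followed by harmonic_num overtones, in Hz."""
    dim = harmonic_num + 1  # mirrors the documented `dim` attribute
    return [f0_hz * (k + 1) for k in range(dim)]


freqs = harmonic_frequencies(220.0, harmonic_num=3)
print(freqs)       # [220.0, 440.0, 660.0, 880.0]
print(len(freqs))  # 4, i.e. harmonic_num + 1
```
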
Parameters:
- sample_rate (int) – Sampling rate in Hz.
- harmonic_num (int, optional) – Number of harmonic overtones (default: 0).
- sine_amp (float, optional) – Amplitude of the sine waveform (default: 0.1).
- noise_std (float, optional) – Standard deviation of Gaussian noise (default: 0.003).
- voiced_threshold (float, optional) – F0 threshold for U/V classification (default: 0).
- flag_for_pulse (bool, optional) – Flag indicating whether the SineGen is used inside PulseGen (default: False).
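
To make the role of `voiced_threshold` concrete, here is a minimal sketch of U/V classification on a plain list of F0 values. The helper name `uv_mask` is hypothetical, and the strict greater-than comparison is an assumption of this sketch rather than a documented guarantee.

```python
def uv_mask(f0_hz, voiced_threshold=0.0):
    """Return 1.0 where F0 exceeds the threshold (voiced), else 0.0."""
    return [1.0 if hz > voiced_threshold else 0.0 for hz in f0_hz]


print(uv_mask([0.0, 110.0, 220.0, 0.0]))              # [0.0, 1.0, 1.0, 0.0]
print(uv_mask([40.0, 110.0], voiced_threshold=50.0))  # [0.0, 1.0]
```

With the default threshold of 0, any frame whose F0 is set to 0 is treated as unvoiced.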

NOTE

When flag_for_pulse is True, the first time step of a voiced segment is always sin(np.pi) or cos(0).

Examples

>>> import torch
>>> from espnet2.gan_svs.uhifigan.sine_generator import SineGen
>>> sine_gen = SineGen(sample_rate=16000, harmonic_num=3)
>>> f0 = torch.tensor([[[440.0]]])  # F0 input of shape (1, length=1, 1)
>>> sine_waves, uv, noise = sine_gen(f0)
>>> print(sine_waves.shape)  # (1, length, dim), where dim = harmonic_num + 1 = 4
>>> print(uv.shape)          # (1, length, 1)

Raises: ValueError – If the provided sample_rate is not a positive integer.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(f0)

Forward SineGen.

Computes the sine waveforms and the voiced/unvoiced (uv) signal based on the input fundamental frequency (F0).

Parameters: f0 (torch.Tensor) – A tensor of shape (batchsize, length, dim=1) representing the input F0 values. The F0 for unvoiced steps should be set to 0.
Returns: A tuple containing:
- sine_tensor (torch.Tensor): A tensor of shape (batchsize, length, dim) representing the generated sine waveforms.
- uv (torch.Tensor): A tensor of shape (batchsize, length, 1) indicating voiced/unvoiced segments.
- noise (torch.Tensor): A tensor of shape (batchsize, length, dim) containing the noise component.
Return type: tuple
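
The shape contract described above can be checked without running the model. The helper below is a hypothetical illustration (`expected_output_shapes` is not part of ESPnet); it only encodes the documented shapes for `sine_tensor` and `uv`.

```python
def expected_output_shapes(f0_shape, harmonic_num=0):
    """Map an F0 shape (batchsize, length, 1) to the documented output shapes."""
    batchsize, length, last = f0_shape
    if last != 1:
        raise ValueError(f"expected last dimension 1, got {last}")
    dim = harmonic_num + 1
    return {
        "sine": (batchsize, length, dim),  # one channel per harmonic
        "uv": (batchsize, length, 1),      # one U/V flag per time step
    }


print(expected_output_shapes((2, 100, 1), harmonic_num=3))
# {'sine': (2, 100, 4), 'uv': (2, 100, 1)}
```
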

Examples

>>> sine_gen = SineGen(sample_rate=44100, harmonic_num=2)
>>> f0_input = torch.tensor([[[440.0]]])  # F0 input of shape (1, length=1, 1)
>>> sine_wave, uv_signal, noise = sine_gen.forward(f0_input)
>>> print(sine_wave.shape)   # torch.Size([1, length, dim]), where dim = harmonic_num + 1 = 3
>>> print(uv_signal.shape)   # torch.Size([1, length, 1])

NOTE

The F0 input tensor should have a last dimension of size 1, where unvoiced F0 values must be set to 0.
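
One way to satisfy this input convention is sketched below, assuming the F0 track arrives as a flat Python sequence in which unvoiced frames are marked with `None` or NaN. The helper name `to_f0_input` and that marking convention are assumptions for illustration only.

```python
import math


def to_f0_input(f0_track):
    """Wrap a 1-D F0 track as a nested list of shape (1, length, 1).

    Frames given as None or NaN are treated as unvoiced and set to 0.0.
    """
    def clean(value):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return 0.0
        return float(value)

    return [[[clean(value)] for value in f0_track]]


f0_input = to_f0_input([None, 220.0, float("nan"), 440])
print(f0_input)  # [[[0.0], [220.0], [0.0], [440.0]]]
# torch.tensor(f0_input) would then have shape (1, 4, 1).
```
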

Raises: ValueError – If the input tensor shape does not match (batchsize, length, dim).