espnet2.gan_svs.visinger2.visinger2_vocoder.Generator_Harm
class espnet2.gan_svs.visinger2.visinger2_vocoder.Generator_Harm(hidden_channels: int = 192, n_harmonic: int = 64, kernel_size: int = 3, padding: int = 1, dropout_rate: float = 0.1, sample_rate: int = 22050, hop_size: int = 256)
Bases: Module
Initialize harmonic generator module.
This module generates harmonics from input pitch and harmonic data. It uses a convolutional neural network architecture to process the input and produce harmonic signals.
Args:
- hidden_channels (int): Number of channels in the input and hidden layers.
- n_harmonic (int): Number of harmonic channels.
- kernel_size (int): Size of the convolutional kernel.
- padding (int): Amount of padding added to the input.
- dropout_rate (float): Dropout rate for regularization.
- sample_rate (int): Sampling rate of the input audio.
- hop_size (int): Hop size used in the analysis of the input audio.
Examples:
```python
import torch

from espnet2.gan_svs.visinger2.visinger2_vocoder import Generator_Harm

generator = Generator_Harm(hidden_channels=192, n_harmonic=64)
f0 = torch.randn(1, 1, 100)  # Example pitch tensor (B, 1, T)
harm = torch.randn(1, 192, 100)  # Example harmonic data (B, hidden_channels, T)
mask = torch.ones(1, 1, 100)  # Example mask (B, 1, T)
output = generator(f0, harm, mask)
print(output.shape)  # Documented shape: torch.Size([1, 64, 25600])
```
Returns: Tensor: Harmonic signal tensor of shape (B, n_harmonic, T * hop_size), where T is the number of input frames.
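The shape arithmetic behind the return value can be checked without instantiating the model. The sketch below is a minimal illustration: `expected_output_shape` and `frame_duration_ms` are hypothetical helpers, not part of ESPnet, and the code assumes that each input frame expands to exactly `hop_size` output samples, as the documented return shape suggests.

```python
def expected_output_shape(batch, n_frames, n_harmonic=64, hop_size=256):
    """Return (B, n_harmonic, T * hop_size) for B=batch and T=n_frames."""
    return (batch, n_harmonic, n_frames * hop_size)


def frame_duration_ms(sample_rate=22050, hop_size=256):
    """Duration of audio covered by one input frame, in milliseconds."""
    return 1000.0 * hop_size / sample_rate


shape = expected_output_shape(batch=1, n_frames=100)
print(shape)  # (1, 64, 25600)
print(round(frame_duration_ms(), 2))  # 11.61
```

With the default `sample_rate` and `hop_size`, 100 input frames therefore correspond to 25600 samples, or roughly 1.16 seconds of audio.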
Initialize harmonic generator module.
- Parameters:
- hidden_channels (int) – Number of channels in the input and hidden layers.
- n_harmonic (int) – Number of harmonic channels.
- kernel_size (int) – Size of the convolutional kernel.
- padding (int) – Amount of padding added to the input.
- dropout_rate (float) – Dropout rate.
- sample_rate (int) – Sampling rate of the input audio.
- hop_size (int) – Hop size used in the analysis of the input audio.
forward(f0, harm, mask)
Generate harmonic signals from input pitch (F0) and harmonic data.
- Parameters:
  - f0 (Tensor) – Pitch (F0) tensor of shape (B, 1, T).
  - harm (Tensor) – Harmonic data tensor of shape (B, hidden_channels, T).
  - mask (Tensor) – Mask tensor of shape (B, 1, T).
- Returns: Tensor – Harmonic signal tensor of shape (B, n_harmonic, T * hop_size).
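To make "generating harmonics from pitch" concrete, the following sketch shows one common way to build a bank of harmonic sinusoids from frame-level F0. It is an assumption about the general technique, not a transcription of the ESPnet implementation: the learned convolutional network that shapes the harmonics is omitted, `harmonic_bank` is a hypothetical name, and the code uses plain Python so that it runs without PyTorch.

```python
import math


def harmonic_bank(f0_frames, n_harmonic=4, sample_rate=22050, hop_size=256):
    """Return n_harmonic lists of samples, each of length len(f0_frames) * hop_size.

    f0_frames holds one F0 value in Hz per frame; 0 marks an unvoiced frame.
    """
    # Upsample frame-level F0 to sample level by repeating each value.
    f0_samples = [f0 for f0 in f0_frames for _ in range(hop_size)]

    nyquist = sample_rate / 2.0
    bank = [[0.0] * len(f0_samples) for _ in range(n_harmonic)]
    phase = 0.0  # running phase of the fundamental, in radians
    for n, f0 in enumerate(f0_samples):
        # Integrate instantaneous frequency to obtain the phase.
        phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
        for k in range(1, n_harmonic + 1):
            # Leave unvoiced samples and harmonics that would alias at zero.
            if f0 > 0.0 and k * f0 < nyquist:
                bank[k - 1][n] = math.sin(k * phase)
    return bank


bank = harmonic_bank([220.0, 220.0, 0.0], n_harmonic=4, hop_size=256)
print(len(bank), len(bank[0]))  # 4 768
```

The result has one row per harmonic and `hop_size` samples per input frame, which mirrors the (n_harmonic, T * hop_size) layout of the documented output for a single batch item.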