espnet2.tts.prodiff.denoiser.SpectogramDenoiser
class espnet2.tts.prodiff.denoiser.SpectogramDenoiser(idim: int, adim: int = 256, layers: int = 20, channels: int = 256, cycle_length: int = 1, timesteps: int = 200, timescale: int = 1, max_beta: float = 40.0, scheduler: str = 'vpsde', dropout_rate: float = 0.05)
Bases: Module
Spectrogram Denoiser.
Ref: https://arxiv.org/pdf/2207.06389.pdf.
Initialization.
- Parameters:
- idim (int) – Dimension of the inputs.
- adim (int, optional) – Dimension of the hidden states. Defaults to 256.
- layers (int, optional) – Number of layers. Defaults to 20.
- channels (int, optional) – Number of channels of each layer. Defaults to 256.
- cycle_length (int, optional) – Cycle length of the diffusion. Defaults to 1.
- timesteps (int, optional) – Number of timesteps of the diffusion. Defaults to 200.
- timescale (int, optional) – Number of timescales. Defaults to 1.
- max_beta (float, optional) – Maximum beta value for the scheduler. Defaults to 40.
- scheduler (str, optional) – Type of noise scheduler. Defaults to 'vpsde'.
- dropout_rate (float, optional) – Dropout rate. Defaults to 0.05.
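To make the roles of `timesteps`, `max_beta`, and `scheduler` more concrete, the sketch below computes a per-step beta table using one discretization of a variance-preserving SDE schedule that appears in several open-source diffusion TTS code bases. Whether the 'vpsde' option here uses exactly this formula is an assumption; `vpsde_betas` and `min_beta` are hypothetical names that are not part of the documented API.

```python
import math


def vpsde_betas(timesteps: int = 200, max_beta: float = 40.0,
                min_beta: float = 0.1) -> list[float]:
    """Return one beta per diffusion step (hypothetical helper, not ESPnet API).

    Assumed discretization of a VP-SDE schedule, for t = 1..T:
        beta_t = 1 - exp(-min_beta / T - 0.5 * (max_beta - min_beta) * (2t - 1) / T**2)
    `min_beta` is an illustrative value, not a documented default.
    """
    betas = []
    for t in range(1, timesteps + 1):
        t_coef = (2 * t - 1) / timesteps**2
        exponent = -min_beta / timesteps - 0.5 * (max_beta - min_beta) * t_coef
        betas.append(1.0 - math.exp(exponent))
    return betas


betas = vpsde_betas()
# Number of steps, then the smallest and largest beta in the table.
print(len(betas), betas[0], betas[-1])
```

Under this assumed formula the betas grow monotonically with the step index, so a larger `max_beta` means more noise is injected at the late steps.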
diffusion(xs_ref: Tensor, steps: Tensor, noise: Tensor | None = None) → Tensor
Calculate diffusion process during training.
- Parameters:
- xs_ref (torch.Tensor) – Input tensor.
- steps (torch.Tensor) – Step numbers.
- noise (Optional[torch.Tensor], optional) – Noise tensor. Defaults to None.
- Returns: Output tensor.
- Return type: torch.Tensor
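For orientation, the training-time diffusion step in DDPM-style models is usually the closed-form forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. The sketch below illustrates that formula on plain Python lists. It assumes this standard form and a hypothetical four-step beta table; `cumulative_alphas` and `q_sample` are illustrative names, not the method's actual implementation.

```python
import math
import random


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, step, alpha_bars, noise=None):
    """Noise a clean frame x0 to the given step (hypothetical helper).

    When noise is None, standard Gaussian noise is drawn, mirroring the
    optional `noise` argument described above.
    """
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bars[step])
    noise_scale = math.sqrt(1.0 - alpha_bars[step])
    return [signal_scale * x + noise_scale * n for x, n in zip(x0, noise)]


# Hypothetical beta table with 4 steps, chosen only for illustration.
betas = [0.1, 0.2, 0.3, 0.4]
alpha_bars = cumulative_alphas(betas)
x0 = [1.0, -2.0, 0.5]
x_noisy = q_sample(x0, step=3, alpha_bars=alpha_bars, noise=[0.0, 1.0, -1.0])
print(x_noisy)
```

Because the two scale factors satisfy signal_scale² + noise_scale² = 1, unit-variance data keeps unit variance at every step, which is what "variance preserving" refers to.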
forward(xs: Tensor, ys: Tensor | None = None, masks: Tensor | None = None, is_inference: bool = False) → Tensor
Calculate forward propagation.
- Parameters:
- xs (torch.Tensor) – Phoneme-encoded tensor (#batch, time, dims).
- ys (Optional[torch.Tensor], optional) – Mel-based reference tensor (#batch, time, mels). Defaults to None.
- masks (Optional[torch.Tensor], optional) – Mask tensor (#batch, time). Defaults to None.
- is_inference (bool, optional) – Whether to run in inference mode. Defaults to False.
- Returns: Output tensor (#batch, time, dims).
- Return type: torch.Tensor
forward_denoise(xs_noisy: Tensor, step: Tensor, condition: Tensor) → Tensor
Calculate forward for denoising diffusion.
- Parameters:
- xs_noisy (torch.Tensor) – Input tensor.
- step (torch.Tensor) – Step number.
- condition (torch.Tensor) – Conditioning tensor.
- Returns: Denoised tensor.
- Return type: torch.Tensor
inference(condition: Tensor) → Tensor
Calculate forward during inference.
- Parameters: condition (torch.Tensor) – Conditioning tensor (batch, time, dims).
- Returns: Output tensor.
- Return type: torch.Tensor
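As a rough picture of what iterative inference involves, the sketch below runs a generic DDPM-style reverse loop in which a denoiser predicts the clean signal at every step and the standard posterior q(x_{t-1} | x_t, x_0) is sampled. It is an assumption that this method behaves along these lines; `reverse_sample`, `predict_x0`, `toy_denoiser`, and the four-step beta table are hypothetical and exist only for illustration.

```python
import math
import random


def reverse_sample(predict_x0, condition, betas, size, seed=0):
    """Generic DDPM-style ancestral sampling with an x0-predicting denoiser.

    predict_x0(x_t, step, condition) is a hypothetical stand-in for a trained
    denoiser; this is not the ESPnet implementation.
    """
    rng = random.Random(seed)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from pure Gaussian noise.
    x_t = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for step in reversed(range(len(betas))):
        x0_hat = predict_x0(x_t, step, condition)
        abar_t = alpha_bars[step]
        abar_prev = alpha_bars[step - 1] if step > 0 else 1.0
        # Posterior mean coefficients of q(x_{t-1} | x_t, x_0).
        coef_x0 = betas[step] * math.sqrt(abar_prev) / (1.0 - abar_t)
        coef_xt = (1.0 - abar_prev) * math.sqrt(alphas[step]) / (1.0 - abar_t)
        # Posterior variance; no noise is added at the final step.
        var = betas[step] * (1.0 - abar_prev) / (1.0 - abar_t)
        sigma = math.sqrt(var) if step > 0 else 0.0
        x_t = [coef_x0 * x0 + coef_xt * xt + sigma * rng.gauss(0.0, 1.0)
               for x0, xt in zip(x0_hat, x_t)]
    return x_t


def toy_denoiser(x_t, step, condition):
    """Toy stand-in that always predicts the conditioning values."""
    return list(condition)


print(reverse_sample(toy_denoiser, [0.3, -0.7], [0.1, 0.2, 0.3, 0.4], size=2))
```

With the toy denoiser the loop ends on the predicted clean signal, because at step 0 the posterior mean puts all of its weight on the prediction and none on the noisy input.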
