espnet2.gan_tts.vits.duration_predictor.StochasticDurationPredictor
class espnet2.gan_tts.vits.duration_predictor.StochasticDurationPredictor(channels: int = 192, kernel_size: int = 3, dropout_rate: float = 0.5, flows: int = 4, dds_conv_layers: int = 3, global_channels: int = -1)
Bases: Module
Stochastic duration predictor module.
This module implements the stochastic duration predictor described in the paper Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech (VITS).
pre
Convolutional layer for preprocessing input.
- Type: torch.nn.Conv1d
dds
Dilated depth-separable convolution stack that encodes the preprocessed text features.
- Type: DilatedDepthSeparableConv
proj
Convolutional layer for projecting features.
- Type: torch.nn.Conv1d
log_flow
Elementwise log-transform flow that maps the dequantized durations to the log domain.
- Type: LogFlow
flows
Main stack of normalizing flows that models the distribution of log-durations conditioned on the text features.
- Type: torch.nn.ModuleList
post_pre
Input convolution of the posterior (dequantization) encoder, applied to the duration tensor w.
- Type: torch.nn.Conv1d
post_dds
Dilated depth-separable convolution stack of the posterior encoder.
- Type: DilatedDepthSeparableConv
post_proj
Output projection convolution of the posterior encoder.
- Type: torch.nn.Conv1d
post_flows
Flow modules of the posterior, used for variational dequantization of the integer durations.
- Type: torch.nn.ModuleList
global_conv
Convolutional layer that projects the global conditioning tensor; created only if global_channels > 0.
- Type: torch.nn.Conv1d, optional
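Durations are integers, so a continuous density cannot be fit to them directly. The attributes above suggest variational dequantization: the posterior modules (post_pre, post_dds, post_proj, post_flows) produce a noise term u in (0, 1) that is subtracted from each duration before log_flow moves the result to the log domain. The sketch below shows only that arithmetic in plain Python. The helper name `dequantize_durations` is hypothetical, and here u is simply a sigmoid of given or random logits rather than the output of the posterior flows.

```python
import math
import random


def dequantize_durations(durations, mask, logits=None):
    """Hypothetical sketch of variational dequantization for one utterance.

    durations: integer durations (>= 1 at valid positions).
    mask: 1 for valid tokens, 0 for padding.
    logits: stand-in for the posterior output; random if not given.
    """
    if logits is None:
        logits = [random.gauss(0.0, 1.0) for _ in durations]
    out = []
    for w, m, z in zip(durations, mask, logits):
        if not m:
            out.append(0.0)  # padded position: contributes nothing
            continue
        u = 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps u strictly in (0, 1)
        # w >= 1 and u < 1 guarantee w - u > 0, so the log is defined
        out.append(math.log(w - u))
    return out
```

With zero logits, u is 0.5, so durations 1 and 3 map to log(0.5) and log(2.5).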
Parameters:
- channels (int) – Number of channels.
- kernel_size (int) – Kernel size.
- dropout_rate (float) – Dropout rate.
- flows (int) – Number of flows.
- dds_conv_layers (int) – Number of conv layers in DDS conv.
- global_channels (int) – Number of global conditioning channels. Values <= 0 (the default is -1) disable global conditioning.
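When global_channels > 0, a per-utterance vector g of shape (B, global_channels, 1) is projected to `channels` dimensions. A common way to use such a projection, assumed here rather than confirmed by this page, is to add it to every time step of the text features, relying on broadcasting over the time axis. The hypothetical helper below shows that addition for one utterance with nested lists, where `g_proj` stands in for the output of global_conv.

```python
def add_global_condition(x, g_proj):
    """Hypothetical sketch: add a per-channel conditioning value to every frame.

    x: nested list of shape (channels, T_text) for one utterance.
    g_proj: list of shape (channels,), standing in for global_conv(g).
    """
    if len(x) != len(g_proj):
        raise ValueError("x and g_proj must have the same number of channels")
    # Each channel's scalar is broadcast across all time steps of that channel
    return [[v + g_c for v in row] for row, g_c in zip(x, g_proj)]
```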
####### Examples
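>>> import torch
>>> from espnet2.gan_tts.vits.duration_predictor import StochasticDurationPredictor
>>> predictor = StochasticDurationPredictor()
>>> x = torch.randn(2, 192, 50)  # text features (B, channels, T_text)
>>> x_mask = torch.ones(2, 1, 50)  # no padding in this example
>>> w = torch.randint(1, 6, (2, 1, 50)).float()  # positive integer durations
>>> nll = predictor(x, x_mask, w=w)  # NLL with shape (2,)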
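The example uses an all-ones x_mask, which assumes every sequence in the batch fills all T_text positions. For padded batches, the mask should be 1 at real tokens and 0 at padding. The hypothetical helper below builds that (B, 1, T_text) layout from per-utterance lengths using nested lists.

```python
def make_x_mask(lengths, max_len=None):
    """Hypothetical helper: build a (B, 1, T_text) mask from text lengths.

    Positions t < length are 1.0 (real tokens); the rest are 0.0 (padding).
    """
    if max_len is None:
        max_len = max(lengths)
    return [[[1.0 if t < n else 0.0 for t in range(max_len)]] for n in lengths]
```

In PyTorch, the same mask can be written as `(torch.arange(T)[None, :] < lengths[:, None]).unsqueeze(1).float()` for a 1-D `lengths` tensor.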
- Raises: AssertionError – If inverse is False and w is None when calling forward.
forward(x: Tensor, x_mask: Tensor, w: Tensor | None = None, g: Tensor | None = None, inverse: bool = False, noise_scale: float = 1.0) → Tensor
Calculate forward propagation.
This method performs the forward pass of the stochastic duration predictor. In training mode (inverse=False), it returns the negative log-likelihood (NLL) of the given durations w. In inference mode (inverse=True), it transforms scaled Gaussian noise through the inverted flows and returns log-durations.
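A normalizing flow's NLL follows from the change-of-variables rule. Assuming a standard normal prior on the latent z produced by the flows, each valid element contributes 0.5 * (log(2π) + z²), and the summed log-determinant of the flow Jacobians is subtracted. The sketch below computes only this prior term for one utterance; the module's returned value is expected to also include a term from the posterior (dequantization) flows, which is omitted here. The function name is hypothetical.

```python
import math


def flow_gaussian_nll(z, mask, logdet):
    """Hypothetical sketch: NLL of masked latents under a standard normal prior.

    z: flattened latent values for one utterance.
    mask: 1 for valid positions, 0 for padding.
    logdet: summed log|det J| of the flows that produced z.
    """
    nll = sum(0.5 * (math.log(2 * math.pi) + v * v) * m for v, m in zip(z, mask))
    # Change of variables: -log p(x) = -log N(z; 0, I) - log|det dz/dx|
    return nll - logdet
```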
- Parameters:
- x (Tensor) – Input tensor with shape (B, channels, T_text).
- x_mask (Tensor) – Mask tensor with shape (B, 1, T_text).
- w (Optional[Tensor]) – Duration tensor with shape (B, 1, T_text). Required when inverse is False.
- g (Optional[Tensor]) – Global conditioning tensor with shape (B, global_channels, 1).
- inverse (bool) – Whether to perform the inverse operation on the flow. Defaults to False.
- noise_scale (float) – Scale applied to the Gaussian noise sampled as the initial latent in inverse mode. Defaults to 1.0.
- Returns: If inverse is False, returns a negative log-likelihood (NLL) tensor with shape (B,). If inverse is True, returns a log-duration tensor with shape (B, 1, T_text).
- Return type: Tensor
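In inverse mode the output is a log-duration per token, not a frame count. A VITS-style generator would typically exponentiate it, zero out padded positions with the mask, optionally stretch it by a speaking-rate factor, and round up to whole frames. The sketch below shows that conversion in plain Python; the function name and the `length_scale` parameter are hypothetical.

```python
import math


def log_durations_to_frames(logw, mask, length_scale=1.0):
    """Hypothetical sketch: convert predicted log-durations to integer frame counts.

    length_scale > 1 lengthens every token (slower speech); padded positions
    (mask == 0) receive 0 frames.
    """
    return [int(math.ceil(math.exp(v) * m * length_scale)) for v, m in zip(logw, mask)]
```

Rounding up rather than to the nearest integer ensures that every valid token receives at least one frame.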
####### Examples
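>>> import torch
>>> from espnet2.gan_tts.vits.duration_predictor import StochasticDurationPredictor
>>> model = StochasticDurationPredictor()
>>> x = torch.randn(5, 192, 10)  # text features (B, channels, T_text)
>>> x_mask = torch.ones(5, 1, 10)  # no padding in this example
>>> w = torch.randint(1, 6, (5, 1, 10)).float()  # positive integer durations
>>> nll = model(x, x_mask, w)  # training: NLL with shape (5,)
>>> logw = model(x, x_mask, inverse=True, noise_scale=0.8)  # inference: shape (5, 1, 10)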