espnet2.gan_tts.vits.flow.ConvFlow
class espnet2.gan_tts.vits.flow.ConvFlow(in_channels: int, hidden_channels: int, kernel_size: int, layers: int, bins: int = 10, tail_bound: float = 5.0)
Bases: Module
Convolutional flow module for generative modeling.
This module is one of the flow-based transformations used in the VITS model. It splits the input channels in half, conditions on the first half through convolutional layers, and transforms the second half with a piecewise rational quadratic spline, so the overall mapping is invertible.
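As a rough illustration of the coupling structure, the sketch below keeps the first half of the channels unchanged, derives transform parameters from it, and transforms only the second half. It is a minimal sketch only: it substitutes a simple affine transform for the rational quadratic spline, and the class name, layer choices, and hyperparameters are hypothetical rather than the actual ESPnet implementation.

import torch
import torch.nn as nn

class ToyCouplingFlow(nn.Module):
    """Simplified coupling flow: keep x_a, transform x_b conditioned on x_a."""

    def __init__(self, in_channels: int, hidden_channels: int):
        super().__init__()
        self.half_channels = in_channels // 2
        # Small conv net that predicts per-element transform parameters from x_a.
        self.net = nn.Sequential(
            nn.Conv1d(self.half_channels, hidden_channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_channels, self.half_channels * 2, 3, padding=1),
        )

    def forward(self, x, x_mask, inverse=False):
        xa, xb = x.split(self.half_channels, dim=1)  # keep xa, transform xb
        log_s, b = self.net(xa * x_mask).chunk(2, dim=1)
        if not inverse:
            xb = (xb * torch.exp(log_s) + b) * x_mask
            # Per-sample log|det J| of the element-wise affine transform.
            logdet = torch.sum(log_s * x_mask, dim=[1, 2])
            return torch.cat([xa, xb], dim=1), logdet
        xb = ((xb - b) * torch.exp(-log_s)) * x_mask
        return torch.cat([xa, xb], dim=1)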
half_channels
Number of input channels divided by two.
- Type: int
hidden_channels
Number of hidden channels.
- Type: int
bins
Number of bins for the transformation.
- Type: int
tail_bound
Tail bound value for the transformation.
- Type: float
input_conv
Initial convolution layer.
- Type: torch.nn.Conv1d
dds_conv
Dilated depth-separable convolution layers.
proj
Projection layer.
- Type: torch.nn.Conv1d
Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Kernel size for the convolution.
- layers (int) – Number of layers in the convolutional flow.
- bins (int , optional) – Number of bins for the transformation (default: 10).
- tail_bound (float , optional) – Tail bound value for the transformation (default: 5.0).
Returns: A tuple of the output tensor and the log-determinant tensor (for NLL) if inverse is False; otherwise, only the output tensor.
Return type: Union[Tensor, Tuple[Tensor, Tensor]]
Examples
>>> conv_flow = ConvFlow(in_channels=64, hidden_channels=128, kernel_size=3, layers=4)
>>> x = torch.randn(32, 64, 100) # (B, channels, T)
>>> x_mask = torch.ones(32, 1, 100) # (B, 1, T)
>>> output, logdet = conv_flow(x, x_mask) # Forward propagation
>>> output_inv = conv_flow(x, x_mask, inverse=True) # Inverse propagation
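Because the flow is invertible, applying the forward mapping and then the inverse should recover the input up to numerical error. A quick sanity check continuing the example above (the tolerance is illustrative, not from the ESPnet test suite):

>>> z, logdet = conv_flow(x, x_mask)
>>> x_rec = conv_flow(z, x_mask, inverse=True)
>>> torch.allclose(x, x_rec, atol=1e-4)  # expected True up to numerical error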
NOTE
This implementation relies on a piecewise rational quadratic transform for the final mapping of the input.
Initialize ConvFlow module.
Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Kernel size.
- layers (int) – Number of layers.
- bins (int) – Number of bins.
- tail_bound (float) – Tail bound value.
forward(x: Tensor, x_mask: Tensor, g: Tensor | None = None, inverse: bool = False) → Tensor | Tuple[Tensor, Tensor]
Calculate forward propagation.
Parameters:
- x (Tensor) – Input tensor (B, channels, T).
- x_mask (Tensor) – Mask tensor (B, 1, T).
- g (Optional[Tensor]) – Global conditioning tensor (B, channels, 1).
- inverse (bool) – Whether to compute the inverse of the flow.
Returns: Output tensor (B, channels, T). If inverse is False, a log-determinant tensor for NLL (B,) is also returned.
Return type: Union[Tensor, Tuple[Tensor, Tensor]]
Examples
>>> conv_flow = ConvFlow(in_channels=16, hidden_channels=32, kernel_size=3,
... layers=2, bins=10, tail_bound=5.0)
>>> x = torch.randn(8, 16, 100) # Batch of 8, 16 channels, 100 time steps
>>> x_mask = torch.ones(8, 1, 100) # Mask with all values as 1
>>> output, logdet = conv_flow(x, x_mask)
>>> print(output.shape) # Should be (8, 16, 100)
>>> print(logdet.shape) # Should be (8,)
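The log-determinant returned in the forward direction is what enters a flow-based negative log-likelihood. A sketch of that usage, assuming a standard-normal base distribution (when several flows are stacked, as in VITS, their log-determinants are summed; the names here are illustrative):

>>> import math
>>> z, logdet = conv_flow(x, x_mask)
>>> log_base = -0.5 * (z ** 2 + math.log(2 * math.pi)) * x_mask
>>> nll = -(log_base.sum(dim=[1, 2]) + logdet)  # per-utterance NLL, shape (8,)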
NOTE
The piecewise_rational_quadratic_transform function is used to compute the transformation of the second half of the input tensor.
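In neural spline flows, the projection output for each transformed channel is conventionally split into bins widths, bins heights, and bins - 1 knot derivatives before normalization. The snippet below illustrates only that parameter split; the reshape, names, and shapes are a hypothetical sketch following the usual convention, not the exact ESPnet code.

import torch

B, half_channels, T, bins = 8, 8, 100, 10
# Projection output: half_channels * (3 * bins - 1) channels over time.
h = torch.randn(B, half_channels * (3 * bins - 1), T)
h = h.reshape(B, half_channels, 3 * bins - 1, T).permute(0, 1, 3, 2)  # (B, C/2, T, 3*bins-1)

unnorm_widths = h[..., :bins]            # bin widths (before softmax)
unnorm_heights = h[..., bins:2 * bins]   # bin heights (before softmax)
unnorm_derivatives = h[..., 2 * bins:]   # knot derivatives (before softplus)
# A scaling such as 1/sqrt(hidden_channels) is typically applied to widths and
# heights before the softmax; it is omitted in this sketch.
print(unnorm_widths.shape, unnorm_heights.shape, unnorm_derivatives.shape)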