espnet2.gan_tts.vits.flow.ConvFlow
class espnet2.gan_tts.vits.flow.ConvFlow(in_channels: int, hidden_channels: int, kernel_size: int, layers: int, bins: int = 10, tail_bound: float = 5.0)
Bases: Module
Convolutional flow module for generative modeling.
This module is one of the flow-based transformations used in the VITS model. It splits the input channels in half, conditions on the first half through convolutional layers, and transforms the second half with a piecewise rational quadratic spline, so the overall mapping is invertible.
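As a rough illustration of the coupling structure, the sketch below keeps the first half of the channels unchanged, derives transform parameters from it, and transforms only the second half. It is a minimal sketch only: it substitutes a simple affine transform for the rational quadratic spline, and the class name, layer choices, and hyperparameters are hypothetical rather than the actual ESPnet implementation.

import torch
import torch.nn as nn

class ToyCouplingFlow(nn.Module):
    """Simplified coupling flow: keep x_a, transform x_b conditioned on x_a."""

    def __init__(self, in_channels: int, hidden_channels: int):
        super().__init__()
        self.half_channels = in_channels // 2
        # Small conv net that predicts per-element transform parameters from x_a.
        self.net = nn.Sequential(
            nn.Conv1d(self.half_channels, hidden_channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden_channels, self.half_channels * 2, 3, padding=1),
        )

    def forward(self, x, x_mask, inverse=False):
        xa, xb = x.split(self.half_channels, dim=1)  # keep xa, transform xb
        log_s, b = self.net(xa * x_mask).chunk(2, dim=1)
        if not inverse:
            xb = (xb * torch.exp(log_s) + b) * x_mask
            # Per-sample log|det J| of the element-wise affine transform.
            logdet = torch.sum(log_s * x_mask, dim=[1, 2])
            return torch.cat([xa, xb], dim=1), logdet
        xb = ((xb - b) * torch.exp(-log_s)) * x_mask
        return torch.cat([xa, xb], dim=1)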
half_channels
Number of input channels divided by two.
- Type: int
hidden_channels
Number of hidden channels.
- Type: int
bins
Number of bins for the transformation.
- Type: int
tail_bound
Tail bound value for the transformation.
- Type: float
input_conv
Initial convolution layer.
- Type: torch.nn.Conv1d
dds_conv
Dilated depth-separable convolution layers.
proj
Projection layer.
- Type: torch.nn.Conv1d
Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Kernel size for the convolution.
- layers (int) – Number of layers in the convolutional flow.
- bins (int , optional) – Number of bins for the transformation (default: 10).
- tail_bound (float , optional) – Tail bound value for the transformation (default: 5.0).
Returns: A tuple of the output tensor and the log-determinant tensor (for NLL) if inverse is False; otherwise, only the output tensor.
Return type: Union[Tensor, Tuple[Tensor, Tensor]]
Examples
>>> conv_flow = ConvFlow(in_channels=64, hidden_channels=128, kernel_size=3, layers=4)
>>> x = torch.randn(32, 64, 100) # (B, channels, T)
>>> x_mask = torch.ones(32, 1, 100) # (B, 1, T)
>>> output, logdet = conv_flow(x, x_mask) # Forward propagation
>>> output_inv = conv_flow(x, x_mask, inverse=True) # Inverse propagation
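Because the flow is invertible, applying the forward mapping and then the inverse should recover the input up to numerical error. A quick sanity check continuing the example above (the tolerance is illustrative, not from the ESPnet test suite):

>>> z, logdet = conv_flow(x, x_mask)
>>> x_rec = conv_flow(z, x_mask, inverse=True)
>>> torch.allclose(x, x_rec, atol=1e-4)  # expected True up to numerical error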
NOTE
This implementation relies on a piecewise rational quadratic transform for the final mapping of the input.
Initialize ConvFlow module.
Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Kernel size.
- layers (int) – Number of layers.
- bins (int) – Number of bins.
- tail_bound (float) – Tail bound value.
forward(x: Tensor, x_mask: Tensor, g: Tensor | None = None, inverse: bool = False) → Tensor | Tuple[Tensor, Tensor]
Calculate forward propagation.
Parameters:
- x (Tensor) – Input tensor (B, channels, T).
- x_mask (Tensor) – Mask tensor (B, 1, T).
- g (Optional[Tensor]) – Global conditioning tensor (B, channels, 1).
- inverse (bool) – Whether to compute the inverse of the flow.
Returns: Output tensor (B, channels, T). If inverse is False, a log-determinant tensor for NLL (B,) is also returned.
Return type: Union[Tensor, Tuple[Tensor, Tensor]]
Examples
>>> conv_flow = ConvFlow(in_channels=16, hidden_channels=32, kernel_size=3,
... layers=2, bins=10, tail_bound=5.0)
>>> x = torch.randn(8, 16, 100) # Batch of 8, 16 channels, 100 time steps
>>> x_mask = torch.ones(8, 1, 100) # Mask with all values as 1
>>> output, logdet = conv_flow(x, x_mask)
>>> print(output.shape) # Should be (8, 16, 100)
>>> print(logdet.shape) # Should be (8,)
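The log-determinant returned in the forward direction is what enters a flow-based negative log-likelihood. A sketch of that usage, assuming a standard-normal base distribution (when several flows are stacked, as in VITS, their log-determinants are summed; the names here are illustrative):

>>> import math
>>> z, logdet = conv_flow(x, x_mask)
>>> log_base = -0.5 * (z ** 2 + math.log(2 * math.pi)) * x_mask
>>> nll = -(log_base.sum(dim=[1, 2]) + logdet)  # per-utterance NLL, shape (8,)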
NOTE
The piecewise_rational_quadratic_transform function is used to compute the transformation of the second half of the input tensor.
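In neural spline flows, the projection output for each transformed channel is conventionally split into bins widths, bins heights, and bins - 1 knot derivatives before normalization. The snippet below illustrates only that parameter split; the reshape, names, and shapes are a hypothetical sketch following the usual convention, not the exact ESPnet code.

import torch

B, half_channels, T, bins = 8, 8, 100, 10
# Projection output: half_channels * (3 * bins - 1) channels over time.
h = torch.randn(B, half_channels * (3 * bins - 1), T)
h = h.reshape(B, half_channels, 3 * bins - 1, T).permute(0, 1, 3, 2)  # (B, C/2, T, 3*bins-1)

unnorm_widths = h[..., :bins]            # bin widths (before softmax)
unnorm_heights = h[..., bins:2 * bins]   # bin heights (before softmax)
unnorm_derivatives = h[..., 2 * bins:]   # knot derivatives (before softplus)
# A scaling such as 1/sqrt(hidden_channels) is typically applied to widths and
# heights before the softmax; it is omitted in this sketch.
print(unnorm_widths.shape, unnorm_heights.shape, unnorm_derivatives.shape)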