espnet2.gan_tts.vits.residual_coupling.ResidualAffineCouplingBlock

class espnet2.gan_tts.vits.residual_coupling.ResidualAffineCouplingBlock(in_channels: int = 192, hidden_channels: int = 192, flows: int = 4, kernel_size: int = 5, base_dilation: int = 1, layers: int = 4, global_channels: int = -1, dropout_rate: float = 0.0, use_weight_norm: bool = True, bias: bool = True, use_only_mean: bool = True)

Bases: Module

Residual affine coupling block module.

This module implements the residual affine coupling block used as the “Flow” component in “Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech” (VITS).
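As a rough illustration of what a single coupling step does, the sketch below splits the channels in two halves, leaves the first half untouched, and shifts the second half by a value predicted from the first. It is a dependency-free toy, not the ESPnet implementation: `shift_net` is a hypothetical stand-in for the WaveNet-based network, and plain Python lists replace tensors.

```python
def shift_net(xa):
    # Hypothetical stand-in for the network that predicts the mean.
    # Any function of xa works; invertibility does not depend on it.
    return [2.0 * v + 1.0 for v in xa]


def coupling(x, inverse=False):
    """Mean-only coupling over a flat list of channel values."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    m = shift_net(xa)  # computed from the untouched half only
    if not inverse:
        xb = [b + s for b, s in zip(xb, m)]  # forward: add the predicted mean
    else:
        xb = [b - s for b, s in zip(xb, m)]  # inverse: subtract the same mean
    return xa + xb


x = [0.5, -1.0, 2.0, 3.0]
y = coupling(x)
x_rec = coupling(y, inverse=True)
print(y, x_rec)
```

Because the first half passes through unchanged, the inverse can recompute exactly the same shift, which is why the step is invertible no matter how complex the predicting network is.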

flows

A list of residual affine coupling layers and flip flows.

Type: ModuleList
Parameters:
- in_channels (int) – Number of input channels. Defaults to 192.
- hidden_channels (int) – Number of hidden channels. Defaults to 192.
- flows (int) – Number of flows. Defaults to 4.
- kernel_size (int) – Kernel size for WaveNet. Defaults to 5.
- base_dilation (int) – Base dilation factor for WaveNet. Defaults to 1.
- layers (int) – Number of layers of WaveNet. Defaults to 4.
- global_channels (int) – Number of global channels. Defaults to -1.
- dropout_rate (float) – Dropout rate. Defaults to 0.0.
- use_weight_norm (bool) – Whether to use weight normalization in WaveNet. Defaults to True.
- bias (bool) – Whether to use bias parameters in WaveNet. Defaults to True.
- use_only_mean (bool) – Whether to estimate only mean. Defaults to True.
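To make the effect of `use_only_mean` concrete, here is a minimal sketch that assumes the usual affine coupling formulation `y_b = m + x_b * exp(logs)`. The helper names `affine_forward` and `affine_inverse` are illustrative and not part of ESPnet. Under this assumption, estimating only the mean corresponds to `logs = 0`, which makes the transform volume-preserving (zero log-determinant).

```python
import math


def affine_forward(xb, m, logs):
    # y_b = m + x_b * exp(logs); the log-determinant is the sum of log-scales.
    yb = [mi + b * math.exp(s) for b, mi, s in zip(xb, m, logs)]
    logdet = sum(logs)
    return yb, logdet


def affine_inverse(yb, m, logs):
    # x_b = (y_b - m) * exp(-logs)
    return [(y - mi) * math.exp(-s) for y, mi, s in zip(yb, m, logs)]


xb = [2.0, 3.0]
m = [0.5, -1.0]

# Mean-only case: log-scales fixed at zero, so the log-determinant is zero.
yb_mean, logdet_mean = affine_forward(xb, m, [0.0, 0.0])

# Full affine case: learned log-scales contribute to the log-determinant.
logs = [0.5, -0.25]
yb_full, logdet_full = affine_forward(xb, m, logs)
xb_rec = affine_inverse(yb_full, m, logs)
```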

Examples:

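>>> import torch
>>> from espnet2.gan_tts.vits.residual_coupling import ResidualAffineCouplingBlock
>>> block = ResidualAffineCouplingBlock()
>>> x = torch.randn(1, 192, 100)  # (B, in_channels, T)
>>> x_mask = torch.ones(1, 1, 100)  # (B, 1, T), broadcast over channels
>>> output = block(x, x_mask)  # (B, in_channels, T)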

Raises: ValueError – If the input tensor does not match the expected dimensions.

Initialize ResidualAffineCouplingBlock module.

Parameters:
- in_channels (int) – Number of input channels.
- hidden_channels (int) – Number of hidden channels.
- flows (int) – Number of flows.
- kernel_size (int) – Kernel size for WaveNet.
- base_dilation (int) – Base dilation factor for WaveNet.
- layers (int) – Number of layers of WaveNet.
- global_channels (int) – Number of global channels.
- dropout_rate (float) – Dropout rate.
- use_weight_norm (bool) – Whether to use weight normalization in WaveNet.
- bias (bool) – Whether to use bias parameters in WaveNet.
- use_only_mean (bool) – Whether to estimate only mean.

forward(x: Tensor, x_mask: Tensor, g: Tensor | None = None, inverse: bool = False) → Tensor

Calculate forward propagation.

This method passes x through every flow in the block. With inverse=False (the default), the flows are applied in the forward direction; with inverse=True, their inverses are applied, undoing the forward transformation.

Parameters:
- x (Tensor) – Input tensor of shape (B, in_channels, T), where B is the batch size, in_channels is the number of input channels, and T is the sequence length.
- x_mask (Tensor) – A mask tensor of shape (B, 1, T) that marks valid time steps (1 for valid, 0 for masked) and is broadcast over the channel dimension.
- g (Optional[Tensor]) – Global conditioning tensor of shape (B, global_channels, 1). This is used for conditioning the flow.
- inverse (bool) – A flag indicating whether to perform the inverse flow operation. Defaults to False.
Returns: Output tensor of shape (B, in_channels, T) after applying the coupling block. If inverse is False, the output is the transformed tensor; otherwise, it is the original tensor reconstructed from the inverse flow.
Return type: Tensor
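The forward and inverse modes described above can be sketched with a toy chain of flows. This is an illustrative, dependency-free sketch, not the ESPnet code: it assumes each flow is a coupling step followed by a channel flip, and that the inverse pass walks the list in reverse order. `make_coupling`, `flip`, `build_flows`, and `run_block` are hypothetical names, and plain Python lists stand in for tensors.

```python
def make_coupling(scale):
    # Hypothetical coupling step: shifts the second half by scale * first half.
    def coupling(x, inverse=False):
        half = len(x) // 2
        xa, xb = x[:half], x[half:]
        sign = -1.0 if inverse else 1.0
        xb = [b + sign * scale * a for a, b in zip(xa, xb)]
        return xa + xb
    return coupling


def flip(x, inverse=False):
    # Reversing the channel order is its own inverse.
    return x[::-1]


def build_flows(n):
    flows = []
    for i in range(n):
        flows += [make_coupling(float(i + 1)), flip]
    return flows


def run_block(x, flows, inverse=False):
    # Forward: apply flows in order. Inverse: undo them last-to-first.
    order = reversed(flows) if inverse else flows
    for f in order:
        x = f(x, inverse=inverse)
    return x


flows = build_flows(4)
x = [1.0, 2.0, 3.0, 4.0]
z = run_block(x, flows)
x_rec = run_block(z, flows, inverse=True)
print(x_rec == x)
```

The flip between coupling steps ensures that the half left untouched by one step is transformed by the next, so after several flows every channel has been modified.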

Examples:

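>>> import torch
>>> from espnet2.gan_tts.vits.residual_coupling import ResidualAffineCouplingBlock
>>> block = ResidualAffineCouplingBlock()
>>> x = torch.randn(8, 192, 100)  # (B, in_channels, T)
>>> x_mask = torch.ones(8, 1, 100)  # (B, 1, T), all frames valid
>>> output = block(x, x_mask)
>>> output_inverse = block(output, x_mask, inverse=True)  # approximately recovers x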