espnet2.gan_tts.style_melgan.tade_res_block.TADEResBlock

class espnet2.gan_tts.style_melgan.tade_res_block.TADEResBlock(in_channels: int = 64, aux_channels: int = 80, kernel_size: int = 9, dilation: int = 2, bias: bool = True, upsample_factor: int = 2, upsample_mode: str = 'nearest', gated_function: str = 'softmax')

Bases: Module

TADEResBlock module for the StyleMelGAN architecture.

This module implements a residual block built around two TADE (temporal adaptive de-normalization) layers, each followed by a gated convolution. It takes an input tensor and an auxiliary tensor and returns the upsampled output together with the upsampled auxiliary tensor. The design is adapted from the original ParallelWaveGAN code.

tade1

The first TADE layer in the residual block.

Type: TADELayer

gated_conv1

The first gated convolution layer.

Type: Conv1d

tade2

The second TADE layer in the residual block.

Type: TADELayer

gated_conv2

The second gated convolution layer.

Type: Conv1d

upsample

Upsampling layer for the output tensor.

Type: Upsample

gated_function

The gating function applied in the block.

Type: Callable
Parameters:
- in_channels (int) – Number of input channels. Default is 64.
- aux_channels (int) – Number of auxiliary channels. Default is 80.
- kernel_size (int) – Size of the convolutional kernel. Default is 9.
- dilation (int) – Dilation rate for the second gated convolution. Default is 2.
- bias (bool) – Whether to use a bias parameter in convolutions. Default is True.
- upsample_factor (int) – Factor by which to upsample the output. Default is 2.
- upsample_mode (str) – Mode of upsampling (e.g., ‘nearest’). Default is ‘nearest’.
- gated_function (str) – Type of gated function (‘softmax’ or ‘sigmoid’). Default is ‘softmax’.
Raises: ValueError – If an unsupported gated_function type is provided.
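
The gated_function option and the ValueError above can be illustrated with a small standalone sketch. It is not the ESPnet implementation: select_gate and gated_activation are hypothetical helper names, and both the softmax over the channel dimension and the split of a 2 * in_channels tensor into two halves are assumptions about how the gate is applied.

```python
from functools import partial

import torch


def select_gate(name: str):
    """Map a gated_function string to a callable (hypothetical helper)."""
    if name == "softmax":
        # Assumption: the softmax is taken over the channel dimension (dim=1).
        return partial(torch.softmax, dim=1)
    if name == "sigmoid":
        return torch.sigmoid
    raise ValueError(f"{name} is not supported.")


def gated_activation(x: torch.Tensor, gate) -> torch.Tensor:
    """Split the channels in half and combine them as gate(xa) * tanh(xb)."""
    xa, xb = x.split(x.size(1) // 2, dim=1)
    return gate(xa) * torch.tanh(xb)


# Assumed input: a convolution output with 2 * in_channels = 128 channels.
x = torch.randn(1, 128, 100)
y = gated_activation(x, select_gate("softmax"))
print(y.shape)  # torch.Size([1, 64, 100])
```

Passing any other string, for example select_gate("relu"), raises ValueError in this sketch, mirroring the documented behaviour.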

Examples

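>>> import torch
>>> from espnet2.gan_tts.style_melgan.tade_res_block import TADEResBlock
>>> res_block = TADEResBlock(in_channels=64, aux_channels=80)
>>> x = torch.randn(1, 64, 100)  # Input tensor (B, in_channels, T)
>>> c = torch.randn(1, 80, 100)  # Auxiliary tensor, same time length as x
>>> output, aux = res_block(x, c)
>>> print(output.shape)  # Output shape will be (1, 64, 200)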

Initialize TADEResBlock module.

Parameters:
- in_channels (int) – Number of input channels.
- aux_channels (int) – Number of auxiliary channels.
- kernel_size (int) – Kernel size.
- dilation (int) – Dilation rate.
- bias (bool) – Whether to use a bias parameter in the convolutions.
- upsample_factor (int) – Upsample factor.
- upsample_mode (str) – Upsample mode.
- gated_function (str) – Gated function type (softmax or sigmoid).
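
To see how kernel_size and dilation interact with the time length, the following sketch applies the output-length formula from the PyTorch Conv1d documentation. The padding rule in same_padding is an assumption chosen so that the length is preserved for odd kernel sizes; both function names are hypothetical and not part of ESPnet.

```python
def conv1d_out_length(length, kernel_size, dilation=1, padding=0, stride=1):
    """Output length of a 1-D convolution (formula from the PyTorch Conv1d docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size, dilation=1):
    """Assumed padding scheme that keeps the length unchanged for odd kernels."""
    return (kernel_size - 1) // 2 * dilation


kernel_size, dilation, length = 9, 2, 100
pad = same_padding(kernel_size, dilation)
print(pad)  # 8
print(conv1d_out_length(length, kernel_size, dilation, pad))  # 100
```

With the defaults kernel_size=9 and dilation=2, a padding of 8 on each side leaves a length-100 sequence at length 100, so any change in length comes only from upsampling.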

forward(x: Tensor, c: Tensor) → Tensor

Calculate forward propagation.

Parameters:
- x (Tensor) – Input tensor (B, in_channels, T).
- c (Tensor) – Auxiliary input tensor (B, aux_channels, T’).
Returns: Two tensors, returned together: the output tensor (B, in_channels, T * upsample_factor) and the upsampled auxiliary tensor (B, in_channels, T * upsample_factor).
Return type: Tensor
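
The output shape can be checked with a shape-only sketch of the skip path. It assumes the input is upsampled with torch.nn.Upsample and added to the main branch; the branch tensor below is a random stand-in, not the result of the TADE layers and gated convolutions.

```python
import torch

upsample_factor = 2
upsample = torch.nn.Upsample(scale_factor=upsample_factor, mode="nearest")

x = torch.randn(1, 64, 100)  # (B, in_channels, T)
residual = upsample(x)       # (B, in_channels, T * upsample_factor)

# Stand-in for the main branch output, which the real block computes
# from x and c; only its shape matters for this sketch.
branch = torch.randn(1, 64, 100 * upsample_factor)

out = residual + branch
print(out.shape)  # torch.Size([1, 64, 200])
```

In 'nearest' mode with a factor of 2, every time step of x simply appears twice in residual, so the skip path adds no new information beyond the change in length.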