espnet2.gan_tts.parallel_wavegan.upsample.ConvInUpsampleNetwork
class espnet2.gan_tts.parallel_wavegan.upsample.ConvInUpsampleNetwork(upsample_scales: List[int], nonlinear_activation: str | None = None, nonlinear_activation_params: Dict[str, Any] = {}, interpolate_mode: str = 'nearest', freq_axis_kernel_size: int = 1, aux_channels: int = 80, aux_context_window: int = 0)
Bases: Module
Convolution + upsampling network module.
This module combines a convolutional layer with an upsampling network to process input tensors, typically for tasks like audio synthesis.
- Parameters:
- upsample_scales (List[int]) – List of upsampling scales.
- nonlinear_activation (Optional[str]) – Activation function name.
- nonlinear_activation_params (Dict[str, Any]) – Arguments for the specified activation function.
- interpolate_mode (str) – Interpolation mode.
- freq_axis_kernel_size (int) – Kernel size along the frequency axis.
- aux_channels (int) – Number of channels of the pre-conv layer.
- aux_context_window (int) – Context window size of the pre-conv layer.
####### Examples
>>> import torch
>>> model = ConvInUpsampleNetwork(
...     upsample_scales=[2, 2],
...     nonlinear_activation='ReLU',
...     aux_channels=80,
...     aux_context_window=2
... )
>>> input_tensor = torch.randn(1, 80, 100)  # (B, C, T_feats)
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([1, 80, 384])  # (B, C, T_wav) where T_wav = (T_feats - 2 * aux_context_window) * prod(upsample_scales)
Initialize ConvInUpsampleNetwork module.
forward(c: Tensor) → Tensor
Calculate forward propagation.
This method performs forward propagation through the convolutional and upsampling layers of the ConvInUpsampleNetwork. It processes the input tensor and produces an upsampled output tensor.
- Parameters: c (Tensor) – Input tensor with shape (B, C, T_feats), where:
- B is the batch size.
- C is the number of channels.
- T_feats is the number of feature frames.
- Returns: Upsampled tensor with shape (B, C, T_wav), where T_wav = T_feats * prod(upsample_scales) is the total number of time steps after upsampling. Note that when aux_context_window > 0, the unpadded pre-conv layer first trims aux_context_window frames from each side of the input, so the input and output lengths differ accordingly.
- Return type: Tensor
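The output length can be computed ahead of time. The helper below is an illustrative sketch, not part of the ESPnet API; it assumes the pre-conv layer uses a kernel of size 2 * aux_context_window + 1 with no padding, as in the ParallelWaveGAN reference implementation.

```python
from math import prod

def expected_t_wav(t_feats, upsample_scales, aux_context_window=0):
    """Illustrative helper: expected output length of ConvInUpsampleNetwork.

    Assumes the pre-conv layer uses kernel size 2 * aux_context_window + 1
    with no padding, so it removes aux_context_window frames from each side
    of the input before the upsampling network multiplies the length by
    prod(upsample_scales).
    """
    return (t_feats - 2 * aux_context_window) * prod(upsample_scales)

# With the default aux_context_window=0 the length is simply scaled:
print(expected_t_wav(10, [2, 2]))      # 40
# With aux_context_window=2, four frames are trimmed before upsampling:
print(expected_t_wav(100, [2, 2], 2))  # 384
```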
####### Examples
>>> import torch
>>> model = ConvInUpsampleNetwork(upsample_scales=[2, 2])
>>> input_tensor = torch.randn(4, 80, 10)  # (B, C, T_feats)
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([4, 80, 40])  # T_wav = 10 * (2 * 2) with the default aux_context_window=0
NOTE
The upsampling is performed sequentially through a series of Stretch2d and Conv2d layers, followed by an optional nonlinear activation function.
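The Stretch2d step can be pictured as nearest-neighbor repetition along the time axis. The pure-Python sketch below is a hypothetical stand-in, not the ESPnet layer; it shows the effect of interpolate_mode='nearest' for a single upsampling scale.

```python
def stretch_time(frames, scale):
    """Nearest-neighbor time upsampling: repeat each frame `scale` times.

    frames: a sequence of per-frame feature vectors (length T_feats).
    Mirrors what a Stretch2d layer does along the time axis with
    interpolate_mode='nearest'; the following Conv2d then smooths the
    repeated frames (not shown here).
    """
    return [frame for frame in frames for _ in range(scale)]

# Two frames upsampled by 3: each frame appears three times, in order.
print(stretch_time([[1.0], [2.0]], 3))  # [[1.0], [1.0], [1.0], [2.0], [2.0], [2.0]]
```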