espnet2.gan_tts.parallel_wavegan.upsample.ConvInUpsampleNetwork
class espnet2.gan_tts.parallel_wavegan.upsample.ConvInUpsampleNetwork(upsample_scales: List[int], nonlinear_activation: str | None = None, nonlinear_activation_params: Dict[str, Any] = {}, interpolate_mode: str = 'nearest', freq_axis_kernel_size: int = 1, aux_channels: int = 80, aux_context_window: int = 0)
Bases: Module
Convolution + upsampling network module.
This module combines a convolutional layer with an upsampling network to process input tensors, typically for tasks like audio synthesis.
- Parameters:
- upsample_scales (List[int]) – List of upsampling scales.
- nonlinear_activation (Optional[str]) – Activation function name.
- nonlinear_activation_params (Dict[str, Any]) – Arguments for the specified activation function.
- interpolate_mode (str) – Interpolation mode.
- freq_axis_kernel_size (int) – Kernel size along the frequency axis.
- aux_channels (int) – Number of channels of the pre-conv layer.
- aux_context_window (int) – Context window size of the pre-conv layer.
####### Examples
>>> import torch
>>> model = ConvInUpsampleNetwork(
...     upsample_scales=[2, 2],
...     nonlinear_activation='ReLU',
...     aux_channels=80,
...     aux_context_window=2
... )
>>> input_tensor = torch.randn(1, 80, 100)  # (B, C, T_feats)
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([1, 80, 384])  # (B, C, T_wav) where T_wav = (T_feats - 2 * aux_context_window) * prod(upsample_scales)
Initialize ConvInUpsampleNetwork module.
forward(c: Tensor) → Tensor
Calculate forward propagation.
This method performs forward propagation through the convolutional and upsampling layers of the ConvInUpsampleNetwork. It processes the input tensor and produces an upsampled output tensor.
- Parameters: c (Tensor) – Input tensor with shape (B, C, T_feats), where:
- B is the batch size.
- C is the number of channels.
- T_feats is the number of feature frames.
- Returns: Upsampled tensor with shape (B, C, T_wav), where T_wav = T_feats * prod(upsample_scales) is the total number of time steps after upsampling. Note that when aux_context_window > 0, the unpadded pre-conv layer first trims aux_context_window frames from each side of the input, so the input and output lengths differ accordingly.
- Return type: Tensor
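The output length can be computed ahead of time. The helper below is an illustrative sketch, not part of the ESPnet API; it assumes the pre-conv layer uses a kernel of size 2 * aux_context_window + 1 with no padding, as in the ParallelWaveGAN reference implementation.

```python
from math import prod

def expected_t_wav(t_feats, upsample_scales, aux_context_window=0):
    """Illustrative helper: expected output length of ConvInUpsampleNetwork.

    Assumes the pre-conv layer uses kernel size 2 * aux_context_window + 1
    with no padding, so it removes aux_context_window frames from each side
    of the input before the upsampling network multiplies the length by
    prod(upsample_scales).
    """
    return (t_feats - 2 * aux_context_window) * prod(upsample_scales)

# With the default aux_context_window=0 the length is simply scaled:
print(expected_t_wav(10, [2, 2]))      # 40
# With aux_context_window=2, four frames are trimmed before upsampling:
print(expected_t_wav(100, [2, 2], 2))  # 384
```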
####### Examples
>>> import torch
>>> model = ConvInUpsampleNetwork(upsample_scales=[2, 2])
>>> input_tensor = torch.randn(4, 80, 10)  # (B, C, T_feats)
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([4, 80, 40])  # T_wav = 10 * (2 * 2) with the default aux_context_window=0
NOTE
The upsampling is performed sequentially through a series of Stretch2d and Conv2d layers, followed by an optional nonlinear activation function.
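The Stretch2d step can be pictured as nearest-neighbor repetition along the time axis. The pure-Python sketch below is a hypothetical stand-in, not the ESPnet layer; it shows the effect of interpolate_mode='nearest' for a single upsampling scale.

```python
def stretch_time(frames, scale):
    """Nearest-neighbor time upsampling: repeat each frame `scale` times.

    frames: a sequence of per-frame feature vectors (length T_feats).
    Mirrors what a Stretch2d layer does along the time axis with
    interpolate_mode='nearest'; the following Conv2d then smooths the
    repeated frames (not shown here).
    """
    return [frame for frame in frames for _ in range(scale)]

# Two frames upsampled by 3: each frame appears three times, in order.
print(stretch_time([[1.0], [2.0]], 3))  # [[1.0], [1.0], [1.0], [2.0], [2.0], [2.0]]
```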