espnet2.gan_codec.hificodec.module.Encoder
class espnet2.gan_codec.hificodec.module.Encoder(resblock_num, resblock_kernel_sizes, resblock_dilation_sizes, upsample_rates, upsample_kernel_sizes)
Bases: Module
Encoder module for HiFi-GAN-based neural audio codec.
This Encoder processes input audio signals through a series of convolutional layers and residual blocks to generate a high-level representation of the audio. It is designed to be part of a GAN-based codec architecture.
num_kernels
Number of kernel sizes used in the residual blocks.
- Type: int
num_upsamples
Number of upsampling operations.
- Type: int
conv_pre
Initial convolutional layer.
- Type: torch.nn.Module
normalize
Group normalization layers for residual outputs.
- Type: nn.ModuleList
ups
List of upsampling convolutional layers.
- Type: nn.ModuleList
resblocks
List of residual blocks.
- Type: nn.ModuleList
conv_post
Final convolutional layer.
- Type: torch.nn.Module
Parameters:
- resblock_num (str) – Type of residual block to use ("1" or "2").
- resblock_kernel_sizes (List[int]) – List of kernel sizes for residual blocks.
- resblock_dilation_sizes (List[int]) – List of dilation sizes for residual blocks.
- upsample_rates (List[int]) – List of upsampling rates.
- upsample_kernel_sizes (List[int]) – List of kernel sizes for upsampling.
Returns: None
######### Examples
>>> encoder = Encoder(
...     resblock_num="1",
...     resblock_kernel_sizes=[3, 5],
...     resblock_dilation_sizes=[1, 3],
...     upsample_rates=[8, 8, 2],
...     upsample_kernel_sizes=[16, 16, 4])
>>> output = encoder(torch.randn(1, 1, 16000))  # Example input tensor
NOTE
The encoder is part of a larger GAN-based audio codec architecture.
- Raises: ValueError – If invalid parameters are passed during initialization.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Perform the forward pass of the Encoder.
This method takes an input tensor and applies a series of convolutional layers, followed by activation functions and normalization, to produce the output tensor. The input is progressively downsampled by strided convolutions and passed through residual blocks.
- Parameters:x (torch.Tensor) – Input tensor of shape (B, 1, T), where B is the batch size, 1 is the number of channels, and T is the length of the sequence.
- Returns: Output tensor of shape (B, 512, T'), where T' is the length of the output sequence after processing.
- Return type: torch.Tensor
######### Examples
>>> encoder = Encoder(resblock_num="1",
... resblock_kernel_sizes=[3, 5],
... resblock_dilation_sizes=[1, 2],
... upsample_rates=[2, 2],
... upsample_kernel_sizes=[4, 4])
>>> input_tensor = torch.randn(16, 1, 16000) # Example input
>>> output_tensor = encoder(input_tensor)
>>> print(output_tensor.shape) # Should output: torch.Size([16, 512, T'])
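Because the encoder mirrors the HiFi-GAN generator and strides by each rate in upsample_rates, the output length T' can be estimated as the input length divided by the product of the rates. The helper below is a hypothetical illustration of this relationship, not part of the espnet2 API:

```python
from math import prod

def encoder_output_length(input_length, upsample_rates):
    """Estimate T' assuming the time axis shrinks by the product of the rates."""
    return input_length // prod(upsample_rates)

# With upsample_rates=[2, 2], a 16000-sample input yields 16000 // 4 = 4000 frames.
print(encoder_output_length(16000, [2, 2]))      # 4000
# With upsample_rates=[8, 8, 2], the same input yields 16000 // 128 = 125 frames.
print(encoder_output_length(16000, [8, 8, 2]))   # 125
```

Edge effects from padding in the strided convolutions may shift T' slightly, so treat this as an estimate rather than an exact shape contract.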
remove_weight_norm()
Remove weight normalization from all layers in the generator or encoder.
This method iterates through the layers of the generator or encoder and removes weight normalization from each layer, including the convolutional layers and residual blocks. It is typically called when the model is being finalized for inference or evaluation.
For clarity, it prints a message indicating that weight normalization is being removed.
######### Examples
>>> generator = Generator(...)
>>> generator.remove_weight_norm()
Removing weight norm...
>>> encoder = Encoder(...)
>>> encoder.remove_weight_norm()
Removing weight norm...
NOTE
This operation is irreversible. Once weight normalization is removed, it cannot be added back without reinitializing the model.
- Raises: ValueError – If the model has not been properly initialized or if layers are missing weight normalization.
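To see why the removal is one-way, consider PyTorch's classic weight-norm utilities in isolation (this sketch uses a plain Conv1d, not the codec's layers): weight normalization reparameterizes each weight into magnitude (weight_g) and direction (weight_v) tensors, and removal folds them back into a single weight, discarding the decomposition.

```python
import torch
from torch.nn.utils import weight_norm, remove_weight_norm

# Wrap a plain Conv1d: .weight is recomputed as weight_g * weight_v / ||weight_v||.
conv = weight_norm(torch.nn.Conv1d(1, 32, kernel_size=7, padding=3))
print(hasattr(conv, "weight_g"))  # True: the layer is reparameterized

# Removal folds weight_g / weight_v back into a single .weight tensor.
remove_weight_norm(conv)
print(hasattr(conv, "weight_g"))  # False: the decomposition is gone for good
```

Once folded, the magnitude/direction split cannot be recovered from the merged weight, which is why the model must be reinitialized to restore weight normalization.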