espnet2.gan_codec.hificodec.module.Encoder
class espnet2.gan_codec.hificodec.module.Encoder(resblock_num, resblock_kernel_sizes, resblock_dilation_sizes, upsample_rates, upsample_kernel_sizes)
Bases: Module
Encoder module for HiFi-GAN-based neural audio codec.
This Encoder processes input audio signals through a series of convolutional layers and residual blocks to generate a high-level representation of the audio. It is designed to be part of a GAN-based codec architecture.
num_kernels
Number of kernel sizes used in the residual blocks.
- Type: int
num_upsamples
Number of upsampling operations.
- Type: int
conv_pre
Initial convolutional layer.
- Type: torch.nn.Module
normalize
Group normalization layers for residual outputs.
- Type: nn.ModuleList
ups
List of upsampling convolutional layers.
- Type: nn.ModuleList
resblocks
List of residual blocks.
- Type: nn.ModuleList
conv_post
Final convolutional layer.
- Type: torch.nn.Module
Parameters:
- resblock_num (str) – Type of residual block to use ("1" or "2").
- resblock_kernel_sizes (List[int]) – List of kernel sizes for residual blocks.
- resblock_dilation_sizes (List[int]) – List of dilation sizes for residual blocks.
- upsample_rates (List[int]) – List of upsampling rates.
- upsample_kernel_sizes (List[int]) – List of kernel sizes for upsampling.
Returns: None
######### Examples
>>> encoder = Encoder(
...     resblock_num="1",
...     resblock_kernel_sizes=[3, 5],
...     resblock_dilation_sizes=[1, 3],
...     upsample_rates=[8, 8, 2],
...     upsample_kernel_sizes=[16, 16, 4])
>>> output = encoder(torch.randn(1, 1, 16000))  # Example input tensor
NOTE
The encoder is part of a larger GAN-based audio codec architecture.
- Raises: ValueError – If invalid parameters are passed during initialization.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Perform the forward pass of the Encoder.
This method takes an input tensor and applies a series of convolutional layers, followed by activation functions and normalization, to produce the output tensor. The input is progressively downsampled by strided convolutions and passed through residual blocks.
- Parameters:x (torch.Tensor) – Input tensor of shape (B, 1, T), where B is the batch size, 1 is the number of channels, and T is the length of the sequence.
- Returns: Output tensor of shape (B, 512, T'), where T' is the length of the output sequence after processing.
- Return type: torch.Tensor
######### Examples
>>> encoder = Encoder(resblock_num="1",
... resblock_kernel_sizes=[3, 5],
... resblock_dilation_sizes=[1, 2],
... upsample_rates=[2, 2],
... upsample_kernel_sizes=[4, 4])
>>> input_tensor = torch.randn(16, 1, 16000) # Example input
>>> output_tensor = encoder(input_tensor)
>>> print(output_tensor.shape) # Should output: torch.Size([16, 512, T'])
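Because the encoder mirrors the HiFi-GAN generator and strides by each rate in upsample_rates, the output length T' can be estimated as the input length divided by the product of the rates. The helper below is a hypothetical illustration of this relationship, not part of the espnet2 API:

```python
from math import prod

def encoder_output_length(input_length, upsample_rates):
    """Estimate T' assuming the time axis shrinks by the product of the rates."""
    return input_length // prod(upsample_rates)

# With upsample_rates=[2, 2], a 16000-sample input yields 16000 // 4 = 4000 frames.
print(encoder_output_length(16000, [2, 2]))      # 4000
# With upsample_rates=[8, 8, 2], the same input yields 16000 // 128 = 125 frames.
print(encoder_output_length(16000, [8, 8, 2]))   # 125
```

Edge effects from padding in the strided convolutions may shift T' slightly, so treat this as an estimate rather than an exact shape contract.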
remove_weight_norm()
Remove weight normalization from all layers in the generator or encoder.
This method iterates through the layers of the generator or encoder and removes weight normalization from each layer, including the convolutional layers and residual blocks. It is typically called when the model is being finalized for inference or evaluation.
For clarity, it prints a message indicating that weight normalization is being removed.
######### Examples
>>> generator = Generator(...)
>>> generator.remove_weight_norm()
Removing weight norm...
>>> encoder = Encoder(...)
>>> encoder.remove_weight_norm()
Removing weight norm...
NOTE
This operation is irreversible. Once weight normalization is removed, it cannot be added back without reinitializing the model.
- Raises: ValueError – If the model has not been properly initialized or if layers are missing weight normalization.
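To see why the removal is one-way, consider PyTorch's classic weight-norm utilities in isolation (this sketch uses a plain Conv1d, not the codec's layers): weight normalization reparameterizes each weight into magnitude (weight_g) and direction (weight_v) tensors, and removal folds them back into a single weight, discarding the decomposition.

```python
import torch
from torch.nn.utils import weight_norm, remove_weight_norm

# Wrap a plain Conv1d: .weight is recomputed as weight_g * weight_v / ||weight_v||.
conv = weight_norm(torch.nn.Conv1d(1, 32, kernel_size=7, padding=3))
print(hasattr(conv, "weight_g"))  # True: the layer is reparameterized

# Removal folds weight_g / weight_v back into a single .weight tensor.
remove_weight_norm(conv)
print(hasattr(conv, "weight_g"))  # False: the decomposition is gone for good
```

Once folded, the magnitude/direction split cannot be recovered from the merged weight, which is why the model must be reinitialized to restore weight normalization.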