espnet2.gan_codec.hificodec.module.ResBlock1
class espnet2.gan_codec.hificodec.module.ResBlock1(channels, kernel_size=3, dilation=(1, 3, 5))
Bases: Module
Residual Block with multiple convolutional layers.
This class implements a residual block that consists of multiple convolutional layers with different dilation rates. The output of each layer is passed through a leaky ReLU activation function, and the input is added to the output to create a residual connection.
convs1
A list of convolutional layers for the first part of the residual block with varying dilation rates.
- Type: nn.ModuleList
convs2
A list of convolutional layers for the second part of the residual block with unit dilation.
- Type: nn.ModuleList
Parameters:
- channels (int) – The number of input and output channels of the convolutional layers.
- kernel_size (int, optional) – The size of the convolutional kernel. Defaults to 3.
- dilation (tuple, optional) – The dilation rates for the first set of convolutions (convs1). Defaults to (1, 3, 5).
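The structure described above can be sketched as a minimal HiFi-GAN-style residual block. This is an assumption about the ESPnet implementation (the class name `ResBlock1Sketch` and the 0.1 leaky-ReLU slope are illustrative), not the actual source:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a HiFi-GAN-style ResBlock1 (assumption: the ESPnet
# implementation follows the original HiFi-GAN design; names here are
# illustrative, not the actual ESPnet code).
class ResBlock1Sketch(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
        super().__init__()
        # convs1: dilated convolutions; padding is chosen so the
        # sequence length is preserved.
        self.convs1 = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size,
                      dilation=d, padding=(kernel_size - 1) // 2 * d)
            for d in dilation
        )
        # convs2: unit-dilation convolutions paired one-to-one with convs1.
        self.convs2 = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size,
                      padding=(kernel_size - 1) // 2)
            for _ in dilation
        )

    def forward(self, x):
        for c1, c2 in zip(self.convs1, self.convs2):
            # leaky ReLU -> dilated conv -> leaky ReLU -> unit conv
            xt = c2(F.leaky_relu(c1(F.leaky_relu(x, 0.1)), 0.1))
            x = xt + x  # residual connection
        return x
```

Because every convolution is padded to preserve length, the output shape matches the input shape exactly.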
######### Examples
>>> res_block = ResBlock1(channels=64)
>>> x = torch.randn(1, 64, 128) # Batch size of 1, 64 channels, length 128
>>> output = res_block(x)
>>> print(output.shape)
torch.Size([1, 64, 128])
####### NOTE The input tensor must have the shape (batch_size, channels, length).
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Forward pass through the residual block.
This method passes the input tensor x through each pair of convolutions in convs1 and convs2, applying a leaky ReLU activation before each convolution and adding the input back after each pair to form the residual connection.
- Parameters: x (torch.Tensor) – Input tensor of shape (B, C, T), where B is the batch size, C is the number of channels, and T is the sequence length.
- Returns: Output tensor of shape (B, C, T); the padded convolutions and residual connections preserve the input shape.
- Return type: torch.Tensor
######### Examples
>>> res_block = ResBlock1(channels=64, kernel_size=3, dilation=(1, 3, 5))
>>> x = torch.randn(1, 64, 128)  # Batch size of 1, 64 channels, length 128
>>> output = res_block(x)
>>> output.shape
torch.Size([1, 64, 128])
####### NOTE The output has the same shape as the input, since the convolutions are padded to preserve the sequence length.
remove_weight_norm()
Remove weight normalization from the layers of the block.
This method iterates over the convolutional layers in convs1 and convs2 and removes weight normalization from each. This is typically done before saving the model or running inference, so that the model operates directly on its fused learned weights.
- Returns: None
######### Examples
>>> res_block = ResBlock1(channels=64)
>>> res_block.remove_weight_norm()
####### NOTE Ensure that the model is fully trained before removing weight normalization; removing it prematurely may affect performance.
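The effect of removing weight normalization can be illustrated on a single layer. This is a sketch assuming the layers are wrapped with torch.nn.utils.weight_norm, as in the original HiFi-GAN; the specific layer and shapes are illustrative:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm, remove_weight_norm

# Illustrative sketch (assumption: the ESPnet layers use
# torch.nn.utils.weight_norm, as in HiFi-GAN). Removing weight
# normalization fuses the (g, v) parametrization back into a single
# weight tensor without changing the layer's output.
conv = weight_norm(nn.Conv1d(64, 64, kernel_size=3, padding=1))
x = torch.randn(1, 64, 16)
y_before = conv(x)
remove_weight_norm(conv)  # fuses g * v / ||v|| back into conv.weight
y_after = conv(x)
# outputs are numerically unchanged; only the parametrization differs
```

Because removal only reparametrizes the weights, the forward computation is unchanged, which is why it is safe to call before inference on a fully trained model.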