espnet2.enh.layers.ncsnpp_utils.layers.ResnetBlockDDPM
class espnet2.enh.layers.ncsnpp_utils.layers.ResnetBlockDDPM(act, in_ch, out_ch=None, temb_dim=None, conv_shortcut=False, dropout=0.1)
Bases: Module
The ResNet block used in DDPM.
This class implements the residual block used in Denoising Diffusion Probabilistic Models (DDPM). It combines group normalization, an activation function, and two convolutional layers, and adds a shortcut connection so the block learns a residual mapping. An optional time embedding can be projected and injected between the two convolutions to condition the features on the diffusion timestep.
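For orientation, the computation can be sketched as a small standalone module. This is a hedged re-implementation assuming the block follows the standard DDPM design of the score_sde reference code (32 GroupNorm groups, 3x3 convolutions); the class name ResnetBlockDDPMSketch is hypothetical, and the real layer is the one documented on this page:

import torch
import torch.nn as nn

class ResnetBlockDDPMSketch(nn.Module):
    # Illustrative sketch only; see ResnetBlockDDPM on this page for the real layer.
    def __init__(self, act, in_ch, out_ch=None, temb_dim=None,
                 conv_shortcut=False, dropout=0.1):
        super().__init__()
        out_ch = out_ch if out_ch is not None else in_ch
        self.act = act
        self.GroupNorm_0 = nn.GroupNorm(32, in_ch, eps=1e-6)  # assumes in_ch % 32 == 0
        self.Conv_0 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        if temb_dim is not None:
            self.Dense_0 = nn.Linear(temb_dim, out_ch)  # projects the time embedding
        self.GroupNorm_1 = nn.GroupNorm(32, out_ch, eps=1e-6)
        self.Dropout_0 = nn.Dropout(dropout)
        self.Conv_1 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        if in_ch != out_ch:
            # conv shortcut if requested, else a 1x1 (NIN-style) projection
            self.shortcut = (nn.Conv2d(in_ch, out_ch, 3, padding=1)
                             if conv_shortcut else nn.Conv2d(in_ch, out_ch, 1))
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, x, temb=None):
        assert x.shape[1] == self.in_ch  # channel check, as in the real block
        h = self.Conv_0(self.act(self.GroupNorm_0(x)))
        if temb is not None:
            # broadcast the projected embedding over the spatial dimensions
            h = h + self.Dense_0(self.act(temb))[:, :, None, None]
        h = self.Conv_1(self.Dropout_0(self.act(self.GroupNorm_1(h))))
        if self.in_ch != self.out_ch:
            x = self.shortcut(x)
        return x + h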
GroupNorm_0
First group normalization layer.
- Type: nn.GroupNorm
act
Activation function applied after normalization.
- Type: callable
Conv_0
First convolutional layer.
- Type: nn.Conv2d
Dense_0
Linear layer for time embedding (if provided).
- Type: nn.Linear
GroupNorm_1
Second group normalization layer.
- Type: nn.GroupNorm
Dropout_0
Dropout layer for regularization.
- Type: nn.Dropout
Conv_1
Second convolutional layer.
- Type: nn.Conv2d
Conv_2
Convolutional layer for shortcut connection (if needed).
- Type: nn.Conv2d
out_ch
Number of output channels.
- Type: int
in_ch
Number of input channels.
- Type: int
conv_shortcut
Flag indicating whether a convolutional shortcut is used.
- Type: bool
NIN_0
Network-in-Network (1x1 linear) layer for the shortcut connection (used when conv_shortcut is False and in_ch differs from out_ch).
- Type: NIN
Parameters:
- act (callable) – Activation function to use.
- in_ch (int) – Number of input channels.
- out_ch (int , optional) – Number of output channels. Defaults to in_ch.
- temb_dim (int , optional) – Dimension of the time embedding. If provided, a linear layer will be added to process it.
- conv_shortcut (bool , optional) – If True, use a convolutional layer for the shortcut connection; otherwise, use a Network-in-Network (NIN) 1x1 projection. Only relevant when in_ch differs from out_ch. Defaults to False.
- dropout (float , optional) – Dropout probability. Defaults to 0.1.
Returns: The output tensor after passing through the ResNet block.
Return type: Tensor
####### Examples
>>> import torch
>>> import torch.nn as nn
>>> from espnet2.enh.layers.ncsnpp_utils.layers import ResnetBlockDDPM
>>> block = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64, out_ch=128)
>>> input_tensor = torch.randn(8, 64, 32, 32) # (batch, channels, height, width)
>>> output_tensor = block(input_tensor)
>>> output_tensor.shape
torch.Size([8, 128, 32, 32])
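When a time embedding is used, temb_dim must be set at construction and a matching embedding passed to forward(). A hedged example; the (batch, temb_dim) shape is assumed from Dense_0 being a plain linear layer:
>>> block_t = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64, out_ch=128, temb_dim=256)
>>> temb = torch.randn(8, 256)  # (batch, temb_dim)
>>> block_t(torch.randn(8, 64, 32, 32), temb).shape
torch.Size([8, 128, 32, 32])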
NOTE
The output shape will change based on the out_ch parameter. If out_ch is not specified, it defaults to in_ch.
- Raises: AssertionError – If the input tensor does not have the expected number of channels.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x, temb=None)
Forward pass of the ResNet block.
The input is normalized (GroupNorm_0), activated, and convolved (Conv_0). If a time embedding is given, its Dense_0 projection is added, broadcast over the spatial dimensions. The result is normalized (GroupNorm_1), activated, regularized (Dropout_0), and convolved again (Conv_1). Finally the shortcut branch is added: Conv_2 or NIN_0 when in_ch differs from out_ch, the identity otherwise.
Parameters:
- x (Tensor) – Input tensor of shape (batch, in_ch, height, width).
- temb (Tensor , optional) – Time embedding of shape (batch, temb_dim). Only applicable if temb_dim was given at construction time.
Returns: The output of the ResNet block, of shape (batch, out_ch, height, width).
Return type: Tensor
####### Examples
>>> block = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64, out_ch=128)
>>> output_tensor = block(torch.randn(8, 64, 32, 32))
>>> output_tensor.shape
torch.Size([8, 128, 32, 32])
NOTE
The block combines convolutional layers, group normalization, and activation functions to enable residual learning; the spatial dimensions are preserved.
- Raises: AssertionError – If the input channels do not match in_ch.
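When out_ch is omitted it defaults to in_ch, so the shortcut is the identity and the output shape equals the input shape:
>>> same = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64)  # out_ch defaults to in_ch
>>> same(torch.randn(2, 64, 16, 16)).shape
torch.Size([2, 64, 16, 16])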