espnet2.enh.layers.ncsnpp_utils.layers.ResnetBlockDDPM
class espnet2.enh.layers.ncsnpp_utils.layers.ResnetBlockDDPM(act, in_ch, out_ch=None, temb_dim=None, conv_shortcut=False, dropout=0.1)
Bases: Module
The ResNet block used in DDPM.
This class implements the residual block used in Denoising Diffusion Probabilistic Models (DDPM). It combines group normalization, an activation function, and two convolutional layers, and adds a shortcut connection so the block learns a residual mapping. An optional time embedding can be projected and injected between the two convolutions to condition the features on the diffusion timestep.
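For orientation, the computation can be sketched as a small standalone module. This is a hedged re-implementation assuming the block follows the standard DDPM design of the score_sde reference code (32 GroupNorm groups, 3x3 convolutions); the class name ResnetBlockDDPMSketch is hypothetical, and the real layer is the one documented on this page:

import torch
import torch.nn as nn

class ResnetBlockDDPMSketch(nn.Module):
    # Illustrative sketch only; see ResnetBlockDDPM on this page for the real layer.
    def __init__(self, act, in_ch, out_ch=None, temb_dim=None,
                 conv_shortcut=False, dropout=0.1):
        super().__init__()
        out_ch = out_ch if out_ch is not None else in_ch
        self.act = act
        self.GroupNorm_0 = nn.GroupNorm(32, in_ch, eps=1e-6)  # assumes in_ch % 32 == 0
        self.Conv_0 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        if temb_dim is not None:
            self.Dense_0 = nn.Linear(temb_dim, out_ch)  # projects the time embedding
        self.GroupNorm_1 = nn.GroupNorm(32, out_ch, eps=1e-6)
        self.Dropout_0 = nn.Dropout(dropout)
        self.Conv_1 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        if in_ch != out_ch:
            # conv shortcut if requested, else a 1x1 (NIN-style) projection
            self.shortcut = (nn.Conv2d(in_ch, out_ch, 3, padding=1)
                             if conv_shortcut else nn.Conv2d(in_ch, out_ch, 1))
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, x, temb=None):
        assert x.shape[1] == self.in_ch  # channel check, as in the real block
        h = self.Conv_0(self.act(self.GroupNorm_0(x)))
        if temb is not None:
            # broadcast the projected embedding over the spatial dimensions
            h = h + self.Dense_0(self.act(temb))[:, :, None, None]
        h = self.Conv_1(self.Dropout_0(self.act(self.GroupNorm_1(h))))
        if self.in_ch != self.out_ch:
            x = self.shortcut(x)
        return x + h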
GroupNorm_0
First group normalization layer.
- Type: nn.GroupNorm
act
Activation function applied after normalization.
- Type: callable
Conv_0
First convolutional layer.
- Type: nn.Conv2d
Dense_0
Linear layer for time embedding (if provided).
- Type: nn.Linear
GroupNorm_1
Second group normalization layer.
- Type: nn.GroupNorm
Dropout_0
Dropout layer for regularization.
- Type: nn.Dropout
Conv_1
Second convolutional layer.
- Type: nn.Conv2d
Conv_2
Convolutional layer for shortcut connection (if needed).
- Type: nn.Conv2d
out_ch
Number of output channels.
- Type: int
in_ch
Number of input channels.
- Type: int
conv_shortcut
Flag indicating whether a convolutional shortcut is used.
- Type: bool
NIN_0
Network-in-Network (1x1 linear) layer for the shortcut connection (used when conv_shortcut is False and in_ch differs from out_ch).
- Type: NIN
Parameters:
- act (callable) – Activation function to use.
- in_ch (int) – Number of input channels.
- out_ch (int , optional) – Number of output channels. Defaults to in_ch.
- temb_dim (int , optional) – Dimension of the time embedding. If provided, a linear layer will be added to process it.
- conv_shortcut (bool , optional) – If True, use a convolutional layer for the shortcut connection; otherwise, use a Network-in-Network (NIN) 1x1 projection. Only relevant when in_ch differs from out_ch. Defaults to False.
- dropout (float , optional) – Dropout probability. Defaults to 0.1.
Returns: The output tensor after passing through the ResNet block.
Return type: Tensor
####### Examples
>>> import torch
>>> import torch.nn as nn
>>> from espnet2.enh.layers.ncsnpp_utils.layers import ResnetBlockDDPM
>>> block = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64, out_ch=128)
>>> input_tensor = torch.randn(8, 64, 32, 32) # (batch, channels, height, width)
>>> output_tensor = block(input_tensor)
>>> output_tensor.shape
torch.Size([8, 128, 32, 32])
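When a time embedding is used, temb_dim must be set at construction and a matching embedding passed to forward(). A hedged example; the (batch, temb_dim) shape is assumed from Dense_0 being a plain linear layer:
>>> block_t = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64, out_ch=128, temb_dim=256)
>>> temb = torch.randn(8, 256)  # (batch, temb_dim)
>>> block_t(torch.randn(8, 64, 32, 32), temb).shape
torch.Size([8, 128, 32, 32])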
NOTE
The output shape will change based on the out_ch parameter. If out_ch is not specified, it defaults to in_ch.
- Raises: AssertionError – If the input tensor does not have the expected number of channels.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x, temb=None)
Forward pass of the ResNet block.
The input is normalized (GroupNorm_0), activated, and convolved (Conv_0). If a time embedding is given, its Dense_0 projection is added, broadcast over the spatial dimensions. The result is normalized (GroupNorm_1), activated, regularized (Dropout_0), and convolved again (Conv_1). Finally the shortcut branch is added: Conv_2 or NIN_0 when in_ch differs from out_ch, the identity otherwise.
Parameters:
- x (Tensor) – Input tensor of shape (batch, in_ch, height, width).
- temb (Tensor , optional) – Time embedding of shape (batch, temb_dim). Only applicable if temb_dim was given at construction time.
Returns: The output of the ResNet block, of shape (batch, out_ch, height, width).
Return type: Tensor
####### Examples
>>> block = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64, out_ch=128)
>>> output_tensor = block(torch.randn(8, 64, 32, 32))
>>> output_tensor.shape
torch.Size([8, 128, 32, 32])
NOTE
The block combines convolutional layers, group normalization, and activation functions to enable residual learning; the spatial dimensions are preserved.
- Raises: AssertionError – If the input channels do not match in_ch.
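When out_ch is omitted it defaults to in_ch, so the shortcut is the identity and the output shape equals the input shape:
>>> same = ResnetBlockDDPM(act=nn.ReLU(), in_ch=64)  # out_ch defaults to in_ch
>>> same(torch.randn(2, 64, 16, 16)).shape
torch.Size([2, 64, 16, 16])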