espnet2.enh.layers.ncsnpp_utils.layers.MeanPoolConv


class espnet2.enh.layers.ncsnpp_utils.layers.MeanPoolConv(input_dim, output_dim, kernel_size=3, biases=True)

Bases: Module

Mean Pooling followed by a Convolutional layer.

This class applies a mean pooling operation to the input tensor, followed by a convolution. The mean pooling averages each 2x2 spatial region, which halves the height and width of the input.

conv

The convolutional layer that processes the output from the mean pooling operation.

Type: nn.Conv2d
Parameters:
- input_dim (int) – The number of input channels.
- output_dim (int) – The number of output channels.
- kernel_size (int , optional) – The size of the convolutional kernel. Default is 3.
- biases (bool , optional) – If True, adds a learnable bias to the convolutional layer. Default is True.
Returns: The output of the convolution after applying mean pooling on the input.
Return type: Tensor
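
The relationship between the constructor arguments and the output shape can be sketched with plain arithmetic. The helper below is hypothetical and not part of ESPnet. It assumes the convolution uses stride 1 and padding `kernel_size // 2`, which is consistent with the 64 to 32 example on this page but is not stated here, and it assumes the input height and width are even.

```python
def mean_pool_conv_output_shape(input_shape, output_dim, kernel_size=3):
    """Hypothetical helper: predict the output shape of a MeanPoolConv-like layer.

    Assumptions (not confirmed by this page): the convolution uses stride 1
    and padding of kernel_size // 2, and height and width are even.
    """
    batch, _channels, height, width = input_shape
    if height % 2 or width % 2:
        raise ValueError("this sketch assumes even height and width")

    # 2x2 mean pooling halves both spatial dimensions.
    pooled_h, pooled_w = height // 2, width // 2

    # Standard stride-1 convolution output size: in + 2 * padding - kernel + 1.
    padding = kernel_size // 2
    out_h = pooled_h + 2 * padding - kernel_size + 1
    out_w = pooled_w + 2 * padding - kernel_size + 1
    return (batch, output_dim, out_h, out_w)


print(mean_pool_conv_output_shape((1, 3, 64, 64), output_dim=16))
# (1, 16, 32, 32)
```

Under these assumptions, any odd `kernel_size` leaves the pooled height and width unchanged, so only the pooling step changes the spatial size.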

####### Examples

>>> import torch
>>> from espnet2.enh.layers.ncsnpp_utils.layers import MeanPoolConv
>>> mean_pool_conv = MeanPoolConv(input_dim=3, output_dim=16)
>>> input_tensor = torch.randn(1, 3, 64, 64)  # (batch_size, channels, height, width)
>>> output_tensor = mean_pool_conv(input_tensor)
>>> print(output_tensor.shape)  # Shape after mean pooling and convolution
torch.Size([1, 16, 32, 32])

NOTE

The mean pooling operation averages the values in a 2x2 region across the input tensor, which reduces the spatial dimensions by a factor of 2. The convolutional layer then processes this pooled output to produce the final result.
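
To make the averaging concrete, here is a minimal, dependency-free sketch of 2x2 mean pooling on a single channel stored as a nested list. The function name `mean_pool_2x2` and the sample values are illustrative assumptions. The layer itself operates on 4D PyTorch tensors, and this sketch is not its implementation.

```python
def mean_pool_2x2(grid):
    """Average each non-overlapping 2x2 block of a 2D list (one channel)."""
    height, width = len(grid), len(grid[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch assumes even height and width")
    return [
        [
            # Sum the four values of the 2x2 block whose top-left corner is (r, c).
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


feature_map = [
    [1.0, 3.0, 0.0, 4.0],
    [5.0, 7.0, 8.0, 0.0],
    [2.0, 2.0, 1.0, 1.0],
    [2.0, 2.0, 3.0, 3.0],
]
pooled = mean_pool_2x2(feature_map)
print(pooled)  # [[4.0, 3.0], [2.0, 2.0]]
```

The 4x4 input becomes a 2x2 output, which is the same halving of height and width that the layer applies to each channel before the convolution.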

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(inputs)

Performs the forward pass of the MeanPoolConv layer.

This method applies 2x2 mean pooling to the input tensor, which halves its height and width, and then passes the pooled tensor through the convolutional layer.

Parameters:
- inputs (torch.Tensor) – The input tensor of shape (B, C, H, W), where B is the batch size, C is the number of input channels (input_dim), H is the height, and W is the width.
Returns: The output of the convolution applied to the pooled input. With the default kernel_size of 3, the shape is (B, output_dim, H/2, W/2).
Return type: torch.Tensor

####### Examples

>>> layer = MeanPoolConv(input_dim=64, output_dim=128)
>>> inputs = torch.randn(8, 64, 32, 32)  # Batch of 8 feature maps
>>> output = layer(inputs)
>>> print(output.shape)
torch.Size([8, 128, 16, 16])