espnet2.enh.layers.dcunet.DCUNetComplexDecoderBlock
class espnet2.enh.layers.dcunet.DCUNetComplexDecoderBlock(in_chan, out_chan, kernel_size, stride, padding, dilation, output_padding=(0, 0), norm_type='bN', activation='leaky_relu', embed_dim=None, temb_layers=1, temb_activation='swish', complex_time_embedding=False)
Bases: Module
A complex-valued decoder block for the DCUNet architecture.
This block upsamples its input with a complex transposed convolution, followed by a normalization layer and an activation function. When embed_dim is set, it also integrates a time embedding so that the decoder can be conditioned on temporal information.
in_chan
Number of input channels.
- Type: int
out_chan
Number of output channels.
- Type: int
kernel_size
Size of the convolution kernel.
- Type: tuple
stride
Stride of the convolution.
- Type: tuple
padding
Padding added to both sides of the input.
- Type: tuple
dilation
Dilation rate for the convolution.
- Type: tuple
output_padding
Additional size added to the output.
- Type: tuple
complex_time_embedding
Flag indicating if complex time embedding is used.
- Type: bool
temb_layers
Number of layers for the time embedding.
- Type: int
temb_activation
Activation function for the time embedding.
- Type: str
embed_dim
Dimension of the embedding.
- Type: int, optional
deconv
The transposed convolution layer.
- Type: nn.Module
norm
Normalization layer.
- Type: nn.Module
activation
Activation layer.
- Type: nn.Module
embed_layer
Layer for processing time embeddings.
- Type: nn.Sequential, optional
Parameters:
- in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (tuple) – Size of the convolution kernel.
- stride (tuple) – Stride of the convolution.
- padding (tuple) – Padding added to both sides of the input.
- dilation (tuple) – Dilation rate for the convolution.
- output_padding (tuple , optional) – Additional size added to the output.
- norm_type (str , optional) – Type of normalization to use. Default is “bN”.
- activation (str , optional) – Activation function to use. Default is “leaky_relu”.
- embed_dim (int , optional) – Dimension of the embedding. Default is None.
- temb_layers (int , optional) – Number of layers for the time embedding. Default is 1.
- temb_activation (str , optional) – Activation function for the time embedding. Default is “swish”.
- complex_time_embedding (bool , optional) – Flag indicating if complex time embedding is used. Default is False.
####### Examples
>>> decoder_block = DCUNetComplexDecoderBlock(
... in_chan=64,
... out_chan=32,
... kernel_size=(3, 3),
... stride=(2, 2),
... padding=(1, 1),
... dilation=(1, 1),
... output_padding=(1, 1)
... )
>>> input_tensor = torch.randn(4, 64, 128, 128) + 1j * torch.randn(4, 64, 128, 128)
>>> output_tensor = decoder_block(input_tensor, t_embed=None)
>>> print(output_tensor.shape)
torch.Size([4, 32, 256, 256])
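The printed shape follows the standard transposed-convolution output-size rule. As a sanity check, here is the arithmetic, assuming the block follows torch.nn.ConvTranspose2d semantics for kernel_size, stride, padding, dilation, and output_padding (the helper below is illustrative and not part of the module):
>>> def deconv_out_size(n_in, kernel, stride, padding, dilation=1, output_padding=0):
...     # Standard torch.nn.ConvTranspose2d output-size formula
...     return ((n_in - 1) * stride - 2 * padding
...             + dilation * (kernel - 1) + output_padding + 1)
>>> deconv_out_size(128, kernel=3, stride=2, padding=1, output_padding=1)
256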
NOTE
The input tensor is expected to be complex-valued (e.g., torch.complex64); the complex transposed convolution operates on its real and imaginary parts internally.
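If the real and imaginary parts are held in separate real-valued tensors, they can be combined into the expected complex input with torch.complex:
>>> real = torch.randn(4, 64, 128, 128)
>>> imag = torch.randn(4, 64, 128, 128)
>>> complex_input = torch.complex(real, imag)
>>> complex_input.dtype
torch.complex64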
- Raises:
- ValueError – If the input tensor dimensions do not match the expected shape, or if the specified normalization type is not supported.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x, t_embed, output_size=None)
Performs the forward pass of the DCUNetComplexDecoderBlock.
This method takes an input tensor x and a time embedding tensor t_embed, processes them through a series of operations including complex transposed convolution, normalization, and activation, and returns the resulting tensor.
- Parameters:
- x (Tensor) – Input tensor, expected shape is (batch, in_chan, height, width) where in_chan is the number of input channels.
- t_embed (Tensor) – Time embedding tensor, shape is expected to match the embedding dimensions used during initialization.
- output_size (tuple , optional) – If provided, specifies the target output size for the transposed convolution. The shape should be (batch, out_chan, target_height, target_width). Defaults to None, in which case the output size is determined automatically.
- Returns: Output tensor after applying the decoder block, shape will be (batch, out_chan, height, width) or adjusted to match the provided output_size.
- Return type: Tensor
####### Examples
>>> decoder_block = DCUNetComplexDecoderBlock(
... in_chan=64, out_chan=32, kernel_size=(3, 3), stride=(2, 2),
... padding=(1, 1), dilation=(1, 1), output_padding=(1, 1))
>>> x = torch.randn(4, 64, 16, 16, dtype=torch.complex64)  # complex-valued input
>>> t_embed = torch.randn(4, 128)  # time embedding (unused here because embed_dim is None)
>>> output = decoder_block(x, t_embed, output_size=(4, 32, 32, 32))
>>> print(output.shape)
torch.Size([4, 32, 32, 32])
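In a U-Net style decoder, output_size is typically set to the shape of the corresponding encoder feature map so that the upsampled output lines up with a skip connection. A minimal sketch, where the skip tensor is a stand-in rather than the output of an actual encoder block:
>>> skip = torch.randn(4, 32, 32, 32, dtype=torch.complex64)  # stand-in for an encoder feature map
>>> decoder_block = DCUNetComplexDecoderBlock(
... in_chan=64, out_chan=32, kernel_size=(3, 3), stride=(2, 2),
... padding=(1, 1), dilation=(1, 1))
>>> x = torch.randn(4, 64, 16, 16, dtype=torch.complex64)
>>> y = decoder_block(x, t_embed=None, output_size=skip.shape)
>>> y.shape == skip.shape
True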