espnet2.enh.layers.dcunet.DCUNetComplexDecoderBlock
class espnet2.enh.layers.dcunet.DCUNetComplexDecoderBlock(in_chan, out_chan, kernel_size, stride, padding, dilation, output_padding=(0, 0), norm_type='bN', activation='leaky_relu', embed_dim=None, temb_layers=1, temb_activation='swish', complex_time_embedding=False)
Bases: Module
A complex-valued decoder block for the DCUNet architecture.
This block upsamples its input with a complex transposed convolution, followed by a normalization layer and an activation function. When embed_dim is set, it also integrates a time embedding so that the decoder can be conditioned on temporal information.
in_chan
Number of input channels.
- Type: int
out_chan
Number of output channels.
- Type: int
kernel_size
Size of the convolution kernel.
- Type: tuple
stride
Stride of the convolution.
- Type: tuple
padding
Padding added to both sides of the input.
- Type: tuple
dilation
Dilation rate for the convolution.
- Type: tuple
output_padding
Additional size added to the output.
- Type: tuple
complex_time_embedding
Flag indicating if complex time embedding is used.
- Type: bool
temb_layers
Number of layers for the time embedding.
- Type: int
temb_activation
Activation function for the time embedding.
- Type: str
embed_dim
Dimension of the embedding.
- Type: int, optional
deconv
The transposed convolution layer.
- Type: nn.Module
norm
Normalization layer.
- Type: nn.Module
activation
Activation layer.
- Type: nn.Module
embed_layer
Layer for processing time embeddings.
- Type: nn.Sequential, optional
Parameters:
- in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (tuple) – Size of the convolution kernel.
- stride (tuple) – Stride of the convolution.
- padding (tuple) – Padding added to both sides of the input.
- dilation (tuple) – Dilation rate for the convolution.
- output_padding (tuple , optional) – Additional size added to the output.
- norm_type (str , optional) – Type of normalization to use. Default is “bN”.
- activation (str , optional) – Activation function to use. Default is “leaky_relu”.
- embed_dim (int , optional) – Dimension of the embedding. Default is None.
- temb_layers (int , optional) – Number of layers for the time embedding. Default is 1.
- temb_activation (str , optional) – Activation function for the time embedding. Default is “swish”.
- complex_time_embedding (bool , optional) – Flag indicating if complex time embedding is used. Default is False.
####### Examples
>>> decoder_block = DCUNetComplexDecoderBlock(
... in_chan=64,
... out_chan=32,
... kernel_size=(3, 3),
... stride=(2, 2),
... padding=(1, 1),
... dilation=(1, 1),
... output_padding=(1, 1)
... )
>>> input_tensor = torch.randn(4, 64, 128, 128) + 1j * torch.randn(4, 64, 128, 128)
>>> output_tensor = decoder_block(input_tensor, t_embed=None)
>>> print(output_tensor.shape)
torch.Size([4, 32, 256, 256])
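The printed shape follows the standard transposed-convolution output-size rule. As a sanity check, here is the arithmetic, assuming the block follows torch.nn.ConvTranspose2d semantics for kernel_size, stride, padding, dilation, and output_padding (the helper below is illustrative and not part of the module):
>>> def deconv_out_size(n_in, kernel, stride, padding, dilation=1, output_padding=0):
...     # Standard torch.nn.ConvTranspose2d output-size formula
...     return ((n_in - 1) * stride - 2 * padding
...             + dilation * (kernel - 1) + output_padding + 1)
>>> deconv_out_size(128, kernel=3, stride=2, padding=1, output_padding=1)
256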
NOTE
The input tensor is expected to be complex-valued (e.g., torch.complex64); the complex transposed convolution operates on its real and imaginary parts internally.
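If the real and imaginary parts are held in separate real-valued tensors, they can be combined into the expected complex input with torch.complex:
>>> real = torch.randn(4, 64, 128, 128)
>>> imag = torch.randn(4, 64, 128, 128)
>>> complex_input = torch.complex(real, imag)
>>> complex_input.dtype
torch.complex64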
- Raises:
- ValueError – If the input tensor dimensions do not match the expected shape, or if the specified normalization type is not supported.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x, t_embed, output_size=None)
Performs the forward pass of the DCUNetComplexDecoderBlock.
This method takes an input tensor x and a time embedding tensor t_embed, processes them through a series of operations including complex transposed convolution, normalization, and activation, and returns the resulting tensor.
- Parameters:
- x (Tensor) – Input tensor, expected shape is (batch, in_chan, height, width) where in_chan is the number of input channels.
- t_embed (Tensor) – Time embedding tensor, shape is expected to match the embedding dimensions used during initialization.
- output_size (tuple , optional) – If provided, specifies the target output size for the transposed convolution. The shape should be (batch, out_chan, target_height, target_width). Defaults to None, in which case the output size is determined automatically.
- Returns: Output tensor after applying the decoder block, shape will be (batch, out_chan, height, width) or adjusted to match the provided output_size.
- Return type: Tensor
####### Examples
>>> decoder_block = DCUNetComplexDecoderBlock(
... in_chan=64, out_chan=32, kernel_size=(3, 3), stride=(2, 2),
... padding=(1, 1), dilation=(1, 1), output_padding=(1, 1))
>>> x = torch.randn(4, 64, 16, 16, dtype=torch.complex64)  # complex-valued input
>>> t_embed = torch.randn(4, 128)  # time embedding (unused here because embed_dim is None)
>>> output = decoder_block(x, t_embed, output_size=(4, 32, 32, 32))
>>> print(output.shape)
torch.Size([4, 32, 32, 32])
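In a U-Net style decoder, output_size is typically set to the shape of the corresponding encoder feature map so that the upsampled output lines up with a skip connection. A minimal sketch, where the skip tensor is a stand-in rather than the output of an actual encoder block:
>>> skip = torch.randn(4, 32, 32, 32, dtype=torch.complex64)  # stand-in for an encoder feature map
>>> decoder_block = DCUNetComplexDecoderBlock(
... in_chan=64, out_chan=32, kernel_size=(3, 3), stride=(2, 2),
... padding=(1, 1), dilation=(1, 1))
>>> x = torch.randn(4, 64, 16, 16, dtype=torch.complex64)
>>> y = decoder_block(x, t_embed=None, output_size=skip.shape)
>>> y.shape == skip.shape
True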