espnet2.gan_codec.shared.encoder.seanet_2d.SConv2d
class espnet2.gan_codec.shared.encoder.seanet_2d.SConv2d(in_channels: int, out_channels: int, kernel_size: int | Tuple[int, int], stride: int | Tuple[int, int] = 1, dilation: int | Tuple[int, int] = 1, groups: int = 1, bias: bool = True, causal: bool = False, norm: str = 'none', norm_kwargs: Dict[str, Any] = {}, pad_mode: str = 'reflect')
Bases: Module
Conv2d with built-in handling of asymmetric or causal padding and normalization.
Note: Causal padding only makes sense on the time (last) axis; the frequency (second-to-last) axis is always padded non-causally.
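To illustrate the note above, the padding layout can be sketched with plain torch.nn.functional.pad calls. This is only an illustration of where padding goes on each axis, not the code SConv2d actually runs; the exact amounts it computes depend on kernel size, stride, and dilation.
>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.randn(1, 1, 8, 10)  # (B, C, F, T)
>>> # Non-causal: the total padding is split over both sides of each axis.
>>> F.pad(x, (1, 1, 1, 1), mode="reflect").shape  # (T_left, T_right, F_left, F_right)
torch.Size([1, 1, 10, 12])
>>> # Causal: all time-axis padding goes to the left ("past") side,
>>> # while the frequency axis is still padded on both sides.
>>> F.pad(x, (2, 0, 1, 1), mode="reflect").shape
torch.Size([1, 1, 10, 12])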
conv
The convolutional layer with normalization.
- Type: NormConv2d
causal
Indicates if causal padding is used.
- Type: bool
pad_mode
Padding mode for the convolutions.
- Type: str
Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- kernel_size (Union[int, Tuple[int, int]]) – Size of the convolving kernel.
- stride (Union[int, Tuple[int, int]], optional) – Stride of the convolution. Default is 1.
- dilation (Union[int, Tuple[int, int]], optional) – Spacing between kernel elements. Default is 1.
- groups (int, optional) – Number of blocked connections from input to output. Default is 1.
- bias (bool, optional) – If True, adds a learnable bias to the output. Default is True.
- causal (bool, optional) – If True, applies causal padding. Default is False.
- norm (str, optional) – Normalization method to apply. Default is "none".
- norm_kwargs (Dict[str, Any], optional) – Additional keyword arguments for normalization. Default is an empty dict.
- pad_mode (str, optional) – Padding mode for the convolution. Default is "reflect".
Raises: AssertionError – If the input tensor does not have 4 dimensions.
####### Examples
>>> conv_layer = SConv2d(in_channels=1, out_channels=32, kernel_size=(3, 3))
>>> input_tensor = torch.randn(8, 1, 64, 64) # Batch of 8, 1 channel, 64x64
>>> output_tensor = conv_layer(input_tensor)
>>> output_tensor.shape
torch.Size([8, 32, 64, 64])  # Spatial dims are preserved by the built-in padding (stride 1)
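A further, hypothetical configuration with causal padding and a stride of 2 along the time axis. The output shape shown is what EnCodec-style "same" padding would produce and should be treated as illustrative; which norm strings the underlying NormConv2d wrapper accepts is not shown here, so norm is left at its default.
>>> causal_conv = SConv2d(in_channels=32, out_channels=32, kernel_size=(3, 4),
...                       stride=(1, 2), causal=True, pad_mode="reflect")
>>> y = causal_conv(torch.randn(8, 32, 64, 100))
>>> y.shape  # time axis downsampled by the stride, frequency axis preserved
torch.Size([8, 32, 64, 50])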
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Applies padding, convolution, and normalization to the input tensor.
This method pads the input tensor x according to the causal flag and pad_mode, then passes it through the convolutional layer and its normalization. The input tensor must have 4 dimensions (B, C, F, T), where B is the batch size, C is the number of channels, F is the frequency dimension, and T is the time dimension.
- Parameters: x (torch.Tensor) – Input tensor with shape (B, C, F, T).
- Returns: The output tensor after padding, convolution, and normalization, with shape (B, C', F', T'), where C' equals out_channels and F', T' depend on the stride.
- Return type: torch.Tensor
- Raises: AssertionError – If the input tensor does not have 4 dimensions.
####### Examples
>>> sconv2d = SConv2d(in_channels=3, out_channels=16, kernel_size=(3, 3))
>>> input_tensor = torch.randn(8, 3, 64, 64) # Example input
>>> output_tensor = sconv2d(input_tensor)
>>> output_tensor.shape
torch.Size([8, 16, 64, 64])  # Spatial dims are preserved by the built-in padding (stride 1)
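For reference, a sketch of the per-axis padding arithmetic that this family of layers (EnCodec/SEANet-style SConv) typically performs before the convolution. The helper name below is hypothetical and the actual code in seanet_2d may differ in detail.
>>> import math
>>> def padding_for_axis(length, kernel_size, stride, dilation=1):
...     effective_kernel = (kernel_size - 1) * dilation + 1
...     padding_total = effective_kernel - stride
...     n_frames = (length - effective_kernel + padding_total) / stride + 1
...     ideal_length = (math.ceil(n_frames) - 1) * stride + (effective_kernel - padding_total)
...     return padding_total, ideal_length - length  # (total padding, extra padding)
>>> padding_for_axis(64, 3, 1)  # time axis of the example above
(2, 0)
>>> # Causal: (left, right) = (padding_total, extra_padding) on the time axis.
>>> # Non-causal: the total is split roughly evenly between left and right.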