espnet2.enh.layers.dcunet.GaussianFourierProjection

class espnet2.enh.layers.dcunet.GaussianFourierProjection(embed_dim, scale=16, complex_valued=False)

Bases: Module

Gaussian random features for encoding time steps.

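This module maps scalar time steps (for example, the diffusion time in a score-based model) to a vector of random Fourier features, so that a network can be conditioned on time. Each time step t is multiplied by a fixed set of frequencies W drawn from a Gaussian distribution, and the resulting angles 2π·W·t are turned into sinusoids. Depending on the complex_valued flag, the output is either real-valued (sine and cosine features concatenated) or complex-valued (complex exponentials).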
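The projection can be sketched in a few lines of plain PyTorch. The code below is an illustrative re-implementation, not the ESPnet class. It assumes the standard Gaussian Fourier feature formulation (angles 2π·W·t with W drawn from N(0, scale²)), and the helper `fourier_features` is hypothetical.

```python
import math

import torch


def fourier_features(t: torch.Tensor, W: torch.Tensor,
                     complex_valued: bool = False) -> torch.Tensor:
    """Hypothetical helper: map time steps of shape (batch,) to Fourier features."""
    # Outer product of time steps and frequencies, scaled by 2*pi -> (batch, len(W))
    angles = 2 * math.pi * t[:, None] * W[None, :]
    if complex_valued:
        # One complex exponential exp(i * angle) per frequency
        return torch.exp(1j * angles)
    # Sine block followed by cosine block -> (batch, 2 * len(W))
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


torch.manual_seed(0)
scale = 16.0
W = torch.randn(64) * scale  # 64 fixed Gaussian frequencies, std = scale
t = torch.tensor([0.1, 0.2, 0.3])

real_emb = fourier_features(t, W)
print(real_emb.shape)  # torch.Size([3, 128])
```

Each sine/cosine pair lies on the unit circle, so every feature stays in [-1, 1] regardless of scale; scale only controls how quickly the features oscillate as t changes.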

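complex_valued

Whether the output is complex-valued.

Type: bool

W

Fixed random frequencies sampled from a standard Gaussian distribution and multiplied by scale. They are registered with requires_grad=False, so they are not updated during training. W has embed_dim // 2 entries when complex_valued is False and embed_dim entries when it is True.

Type: nn.Parameter
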
Parameters:
- embed_dim (int) – Size of the output embedding. When complex_valued is False it should be even, because the output has 2 * (embed_dim // 2) features.
- scale (float, optional) – Scaling factor for the weights, i.e. the standard deviation of the Gaussian used to sample W. Default is 16.
- complex_valued (bool, optional) – If True, the output is complex-valued. Default is False.
Returns: A tensor of shape (batch, embed_dim) containing the encoded time steps.
Return type: Tensor
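The bookkeeping around the random frequencies can be illustrated with a toy module. `FrozenFourierWeights` is a hypothetical name; the sketch assumes W is stored as a non-trainable nn.Parameter and that the real-valued mode keeps only embed_dim // 2 frequencies, since its sine and cosine features are concatenated.

```python
import torch
from torch import nn


class FrozenFourierWeights(nn.Module):
    """Hypothetical module holding fixed random frequencies (illustration only)."""

    def __init__(self, embed_dim: int, scale: float = 16.0, complex_valued: bool = False):
        super().__init__()
        # Real-valued output concatenates sin and cos, so half as many frequencies suffice
        n_freqs = embed_dim if complex_valued else embed_dim // 2
        # requires_grad=False: W is saved in state_dict but receives no gradient,
        # so gradient-based optimizers leave it unchanged
        self.W = nn.Parameter(torch.randn(n_freqs) * scale, requires_grad=False)


real_mod = FrozenFourierWeights(embed_dim=128)
complex_mod = FrozenFourierWeights(embed_dim=128, complex_valued=True)
print(real_mod.W.shape, complex_mod.W.shape)  # torch.Size([64]) torch.Size([128])

trainable = sum(p.numel() for p in real_mod.parameters() if p.requires_grad)
print(trainable)  # 0

# Loading the state dict into a fresh instance restores the same random frequencies
restored = FrozenFourierWeights(embed_dim=128)
restored.load_state_dict(real_mod.state_dict())
print(torch.equal(restored.W, real_mod.W))  # True
```

Keeping the frequencies as a frozen parameter, rather than regenerating them at load time, means a checkpoint reproduces exactly the same embedding; a registered buffer would serve the same purpose.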

Examples

>>> import torch
>>> from espnet2.enh.layers.dcunet import GaussianFourierProjection
>>> gfp = GaussianFourierProjection(embed_dim=128, scale=16)
>>> time_steps = torch.tensor([0.1, 0.2, 0.3])  # shape (batch,)
>>> output = gfp(time_steps)
>>> output.shape  # real-valued: 64 sine + 64 cosine features
torch.Size([3, 128])
>>> gfp_complex = GaussianFourierProjection(embed_dim=128, scale=16,
...                                          complex_valued=True)
>>> output_complex = gfp_complex(time_steps)
>>> output_complex.shape, output_complex.dtype
(torch.Size([3, 128]), torch.complex64)

NOTE

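For real-valued output, the features are the sines of 2π·W·t followed by the cosines, each block having embed_dim // 2 entries. For complex-valued output, each feature is the complex exponential exp(i·2π·W·t), which carries the cosine and sine as its real and imaginary parts, so embed_dim does not need to be halved.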
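The real-valued and complex-valued outputs are two encodings of the same angles. The sketch below checks this with Euler's formula, exp(i·a) = cos(a) + i·sin(a). The frequencies are shared between both variants purely for the comparison; two separately constructed GaussianFourierProjection instances draw their own random weights, so their outputs will not match element by element.

```python
import math

import torch

torch.manual_seed(0)
W = torch.randn(4) * 16.0           # shared frequencies (illustrative assumption)
t = torch.tensor([0.0, 0.25, 0.5])  # time steps, shape (batch,)
angles = 2 * math.pi * t[:, None] * W[None, :]

# Complex-valued variant: one complex exponential per frequency
z = torch.exp(1j * angles)

# Real-valued variant: sines first, then cosines
real_feats = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

# exp(i*a) = cos(a) + i*sin(a): imaginary part -> sines, real part -> cosines
rebuilt = torch.cat([z.imag, z.real], dim=-1)
print(torch.allclose(rebuilt, real_feats, atol=1e-5))  # True
```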


forward(t)

Compute the Gaussian Fourier embedding of a batch of time steps.

Each time step is multiplied by the fixed frequencies W and by 2π. If complex_valued is False, the sines and cosines of these angles are concatenated along the last dimension; otherwise, their complex exponentials are returned.

Parameters: t (Tensor) – A 1-D tensor of time steps with shape (batch,).
Returns: The time embedding with shape (batch, embed_dim), real-valued if complex_valued is False and complex-valued otherwise.
Return type: Tensor

Examples

>>> import torch
>>> from espnet2.enh.layers.dcunet import GaussianFourierProjection
>>> gfp = GaussianFourierProjection(embed_dim=8)
>>> emb = gfp(torch.zeros(4))  # four time steps, all at t = 0
>>> emb.shape
torch.Size([4, 8])
>>> torch.allclose(emb[:, 4:], torch.ones(4, 4))  # cos(0) = 1
True

NOTE

Pass t as a 1-D tensor. A tensor of shape (batch, 1) is broadcast against W and yields an output of shape (batch, 1, embed_dim) instead of (batch, embed_dim).
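Because the projection broadcasts t against the frequency vector, the shape of t determines the output shape. The sketch below reproduces that broadcasting in plain PyTorch, assuming forward computes `t[:, None] * W[None, :]` (our reading of the implementation); `embed` is a hypothetical stand-in, not the library function.

```python
import math

import torch

torch.manual_seed(0)
W = torch.randn(64) * 16.0  # 64 frequencies -> 128 real features


def embed(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in mimicking the assumed broadcasting t[:, None] * W[None, :]."""
    angles = 2 * math.pi * t[:, None] * W[None, :]
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


t_1d = torch.tensor([0.1, 0.2, 0.3])        # shape (3,): intended input
t_2d = torch.tensor([[0.1], [0.2], [0.3]])  # shape (3, 1): extra singleton axis

print(embed(t_1d).shape)              # torch.Size([3, 128])
print(embed(t_2d).shape)              # torch.Size([3, 1, 128])
print(embed(t_2d.reshape(-1)).shape)  # torch.Size([3, 128]) after flattening
```

Flattening t with reshape(-1) (or squeeze(-1)) before the call is the simplest way to get the (batch, embed_dim) layout.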