espnet2.enh.layers.bsrnn.ChannelFreqwiseLayerNorm
class espnet2.enh.layers.bsrnn.ChannelFreqwiseLayerNorm(channel_size, shape='BDTF')
Bases: Module
Channel-and-Frequency-wise Layer Normalization (cfLN).
This layer normalizes the input tensor jointly across the channel and frequency dimensions, improving the training stability and convergence of neural networks. For each batch element and time frame, it computes the mean and variance over the channel and frequency dimensions and normalizes the input accordingly.
gamma
Learnable scale parameter of shape [1, N, 1, 1].
- Type: torch.nn.Parameter
beta
Learnable shift parameter of shape [1, N, 1, 1].
- Type: torch.nn.Parameter
shape
The shape of the input tensor, either “BDTF” or “BTFD”.
- Type: str
Parameters:
- channel_size (int) – The number of channels in the input tensor.
- shape (str) – The layout of the input tensor; must be “BDTF” (Batch, Depth, Time, Frequency) or “BTFD” (Batch, Time, Frequency, Depth).
Raises: AssertionError – If the provided shape is not “BDTF” or “BTFD”.
######### Examples
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=16, shape="BDTF")
>>> input_tensor = torch.randn(8, 16, 100, 50)  # (B, D, T, F)
>>> output_tensor = layer_norm(input_tensor)
>>> output_tensor.shape
torch.Size([8, 16, 100, 50])
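The same layer also handles channel-last inputs when constructed with shape=”BTFD”; a brief sketch (the tensor sizes below are arbitrary):
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=16, shape="BTFD")
>>> input_tensor = torch.randn(8, 100, 50, 16)  # (B, T, F, D)
>>> layer_norm(input_tensor).shape
torch.Size([8, 100, 50, 16])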
####### NOTE The normalization is performed in a way that preserves the input shape.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(y)
Apply channel-and-frequency-wise layer normalization to the input tensor. The mean and variance are computed jointly over the channel and frequency dimensions for each batch element and time frame, and the normalized result is scaled by gamma and shifted by beta.
Parameters:
- y (torch.Tensor) – Input tensor of shape [M, N, T, K] for shape=”BDTF” or [M, T, K, N] for shape=”BTFD”, where M is the batch size, N the channel size, T the number of frames, and K the number of frequency bins.
Returns: The normalized tensor, with the same shape as the input.
Return type: torch.Tensor
######### Examples
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=64)
>>> input_tensor = torch.randn(32, 64, 128, 256) # [M, N, T, K]
>>> output_tensor = layer_norm(input_tensor)
>>> output_tensor.shape
torch.Size([32, 64, 128, 256])
####### NOTE The forward pass runs with PyTorch’s automatic mixed precision (AMP) disabled, so the normalization statistics are computed in full precision.
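The underlying computation is compact enough to sketch stand-alone. The following illustrative function (not the library code; the eps value is an assumed small constant) mirrors the cfLN formula for a “BDTF” input:
>>> import torch
>>> def cfln(y, gamma, beta, eps=1e-8):
...     # statistics over channel (dim 1) and frequency (dim 3),
...     # kept separate per batch element and time frame
...     mean = y.mean(dim=(1, 3), keepdim=True)
...     var = ((y - mean) ** 2).mean(dim=(1, 3), keepdim=True)
...     return gamma * (y - mean) / (var + eps).sqrt() + beta
>>> y = torch.randn(2, 4, 10, 6)  # [M, N, T, K]
>>> out = cfln(y, torch.ones(1, 4, 1, 1), torch.zeros(1, 4, 1, 1))
>>> out.shape
torch.Size([2, 4, 10, 6])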
reset_parameters()
Reset the learnable parameters to their initial values: the scale gamma is filled with ones and the shift beta with zeros, so a freshly constructed (or freshly reset) layer applies a pure standardization.
######### Examples
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=128)
>>> layer_norm.reset_parameters()
>>> float(layer_norm.gamma.mean()), float(layer_norm.beta.mean())
(1.0, 0.0)
####### NOTE The normalization is performed using the formula: cfLN_y = γ * (y - mean) / sqrt(var + EPS) + β, where mean and var are calculated across the channel and frequency dimensions.
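As a quick sanity check of the formula above: with the initial parameters (γ = 1, β = 0), the output should have approximately zero mean and unit variance over the channel and frequency dimensions. A sketch, with arbitrary tensor sizes:
>>> import torch
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=8)
>>> y = torch.randn(2, 8, 16, 10)  # [M, N, T, K]
>>> out = layer_norm(y)
>>> bool(out.mean(dim=(1, 3)).abs().max() < 1e-4)
True
>>> bool((out.var(dim=(1, 3), unbiased=False) - 1).abs().max() < 1e-3)
True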