espnet2.enh.layers.bsrnn.ChannelFreqwiseLayerNorm
class espnet2.enh.layers.bsrnn.ChannelFreqwiseLayerNorm(channel_size, shape='BDTF')
Bases: Module
Channel-and-Frequency-wise Layer Normalization (cfLN).
This layer normalizes the input tensor jointly across the channel and frequency dimensions, improving the training stability and convergence of neural networks. For each batch element and time frame, it computes the mean and variance over the channel and frequency dimensions and normalizes the input accordingly.
gamma
Learnable scale parameter of shape [1, N, 1, 1].
- Type: torch.nn.Parameter
beta
Learnable shift parameter of shape [1, N, 1, 1].
- Type: torch.nn.Parameter
shape
The shape of the input tensor, either “BDTF” or “BTFD”.
- Type: str
Parameters:
- channel_size (int) – The number of channels in the input tensor.
- shape (str) – The layout of the input tensor; must be “BDTF” (Batch, Depth, Time, Frequency) or “BTFD” (Batch, Time, Frequency, Depth).
Raises: AssertionError – If the provided shape is not “BDTF” or “BTFD”.
######### Examples
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=16, shape="BDTF")
>>> input_tensor = torch.randn(8, 16, 100, 50)  # (B, D, T, F)
>>> output_tensor = layer_norm(input_tensor)
>>> output_tensor.shape
torch.Size([8, 16, 100, 50])
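The same layer also handles channel-last inputs when constructed with shape=”BTFD”; a brief sketch (the tensor sizes below are arbitrary):
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=16, shape="BTFD")
>>> input_tensor = torch.randn(8, 100, 50, 16)  # (B, T, F, D)
>>> layer_norm(input_tensor).shape
torch.Size([8, 100, 50, 16])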
####### NOTE The normalization is performed in a way that preserves the input shape.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(y)
Apply channel-and-frequency-wise layer normalization to the input tensor. The mean and variance are computed jointly over the channel and frequency dimensions for each batch element and time frame, and the normalized result is scaled by gamma and shifted by beta.
Parameters:
- y (torch.Tensor) – Input tensor of shape [M, N, T, K] for shape=”BDTF” or [M, T, K, N] for shape=”BTFD”, where M is the batch size, N the channel size, T the number of frames, and K the number of frequency bins.
Returns: The normalized tensor, with the same shape as the input.
Return type: torch.Tensor
######### Examples
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=64)
>>> input_tensor = torch.randn(32, 64, 128, 256) # [M, N, T, K]
>>> output_tensor = layer_norm(input_tensor)
>>> output_tensor.shape
torch.Size([32, 64, 128, 256])
####### NOTE The forward pass runs with PyTorch’s automatic mixed precision (AMP) disabled, so the normalization statistics are computed in full precision.
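The underlying computation is compact enough to sketch stand-alone. The following illustrative function (not the library code; the eps value is an assumed small constant) mirrors the cfLN formula for a “BDTF” input:
>>> import torch
>>> def cfln(y, gamma, beta, eps=1e-8):
...     # statistics over channel (dim 1) and frequency (dim 3),
...     # kept separate per batch element and time frame
...     mean = y.mean(dim=(1, 3), keepdim=True)
...     var = ((y - mean) ** 2).mean(dim=(1, 3), keepdim=True)
...     return gamma * (y - mean) / (var + eps).sqrt() + beta
>>> y = torch.randn(2, 4, 10, 6)  # [M, N, T, K]
>>> out = cfln(y, torch.ones(1, 4, 1, 1), torch.zeros(1, 4, 1, 1))
>>> out.shape
torch.Size([2, 4, 10, 6])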
reset_parameters()
Reset the learnable parameters to their initial values: the scale gamma is filled with ones and the shift beta with zeros, so a freshly constructed (or freshly reset) layer applies a pure standardization.
######### Examples
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=128)
>>> layer_norm.reset_parameters()
>>> float(layer_norm.gamma.mean()), float(layer_norm.beta.mean())
(1.0, 0.0)
####### NOTE The normalization is performed using the formula: cfLN_y = γ * (y - mean) / sqrt(var + EPS) + β, where mean and var are calculated across the channel and frequency dimensions.
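As a quick sanity check of the formula above: with the initial parameters (γ = 1, β = 0), the output should have approximately zero mean and unit variance over the channel and frequency dimensions. A sketch, with arbitrary tensor sizes:
>>> import torch
>>> layer_norm = ChannelFreqwiseLayerNorm(channel_size=8)
>>> y = torch.randn(2, 8, 16, 10)  # [M, N, T, K]
>>> out = layer_norm(y)
>>> bool(out.mean(dim=(1, 3)).abs().max() < 1e-4)
True
>>> bool((out.var(dim=(1, 3), unbiased=False) - 1).abs().max() < 1e-3)
True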