espnet2.speechlm.module.valle.AdaLN
class espnet2.speechlm.module.valle.AdaLN(n_state, eps=1e-05)
Bases: Module
AdaLN is a custom layer normalization module that adapts its normalization
parameters to an additional embedding vector. The class extends torch.nn.Module and dynamically computes the layer normalization weight and bias from an input tensor, level_emb.
n_state
The dimensionality of the input and output tensors.
- Type: int
eps
A small value added for numerical stability in layer normalization.
- Type: float
Parameters:
- n_state (int) – The number of input features for the layer.
- eps (float, optional) – The epsilon value for numerical stability in layer normalization. Default is 1e-5.
Returns: The output tensor after applying the adaptive layer normalization.
Return type: Tensor
####### Examples
>>> import torch
>>> ada_ln = AdaLN(n_state=128)
>>> x = torch.randn(10, 128) # Batch of 10, 128 features
>>> level_emb = torch.randn(10, 128) # Batch of 10, 128 features
>>> output = ada_ln(x, level_emb)
>>> print(output.shape)
torch.Size([10, 128])
NOTE
This implementation uses two bias-free linear layers to compute the weight and bias for the layer normalization from the level_emb tensor, allowing dynamic adjustment based on contextual embeddings.
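For orientation, a minimal sketch consistent with this note (the class name AdaLNSketch, the attribute names weight and bias, and the use of F.layer_norm without affine parameters are assumptions for illustration, not the exact ESPnet code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaLNSketch(nn.Module):
    """Adaptive LayerNorm sketch: scale and bias come from level_emb."""

    def __init__(self, n_state: int, eps: float = 1e-05):
        super().__init__()
        self.n_state = n_state
        self.eps = eps
        # Two bias-free linear layers map the level embedding to the
        # per-example normalization weight and bias.
        self.weight = nn.Linear(n_state, n_state, bias=False)
        self.bias = nn.Linear(n_state, n_state, bias=False)

    def forward(self, x: torch.Tensor, level_emb: torch.Tensor) -> torch.Tensor:
        # Normalize without learned affine parameters, then apply the
        # embedding-conditioned scale and shift.
        h = F.layer_norm(x, (self.n_state,), eps=self.eps)
        return self.weight(level_emb) * h + self.bias(level_emb)
```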
forward(x: Tensor, level_emb: Tensor)
Apply the AdaLN layer normalization with learned scaling and bias.
This method performs layer normalization on the input tensor x using a scale and bias derived from the level_emb tensor: level_emb is passed through the weight and bias linear layers, respectively (see the formula sketch after the parameter list below).
- Parameters:
- x (Tensor) – The input tensor to be normalized, typically of shape (batch_size, n_state).
- level_emb (Tensor) – The embedding tensor used to compute the scaling and bias, of shape (batch_size, n_state).
- Returns: The output tensor after applying layer normalization with the learned scale and bias, with the same shape as the input tensor x.
- Return type: Tensor
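As a sketch of the computation (writing the two bias-free linear maps from the class note as $W_w$ and $W_b$, names assumed here for exposition):

$$\mathrm{AdaLN}(x, e) = (W_w e) \odot \mathrm{LN}_\varepsilon(x) + W_b e$$

where $\mathrm{LN}_\varepsilon$ denotes layer normalization with stability constant $\varepsilon$ (eps) and $\odot$ is elementwise multiplication.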
####### Examples
>>> import torch
>>> ada_ln = AdaLN(n_state=64)
>>> x = torch.randn(10, 64) # Batch of 10 samples
>>> level_emb = torch.randn(10, 64) # Corresponding level embeddings
>>> output = ada_ln.forward(x, level_emb)
>>> print(output.shape)
torch.Size([10, 64])
NOTE
The layer normalization is applied with an epsilon value, set at initialization of the AdaLN instance, to avoid division by zero.
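A small check of this behavior (a sketch, assuming AdaLN is importable from espnet2.speechlm.module.valle as documented above): a zero-variance input would divide by zero without eps, yet the output stays finite.

```python
import torch
from espnet2.speechlm.module.valle import AdaLN

ada_ln = AdaLN(n_state=8, eps=1e-5)
x = torch.zeros(4, 8)              # constant input: variance is exactly 0
level_emb = torch.randn(4, 8)
out = ada_ln(x, level_emb)
print(torch.isfinite(out).all())   # tensor(True): eps keeps normalization stable
```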