espnet2.gan_codec.shared.quantizer.modules.core_vq.EuclideanCodebook
class espnet2.gan_codec.shared.quantizer.modules.core_vq.EuclideanCodebook(dim: int, codebook_size: int, kmeans_init: bool = False, kmeans_iters: int = 10, decay: float = 0.99, epsilon: float = 1e-05, threshold_ema_dead_code: int = 2)
Bases: Module
Codebook with Euclidean distance for vector quantization.
This class implements a codebook that uses Euclidean distance for quantizing input vectors. It supports k-means initialization, exponential moving average (EMA) updates of the codebook, and expiration of dead codes based on cluster sizes.
decay
Decay factor for EMA updates of the codebooks.
- Type: float
codebook_size
The size of the codebook.
- Type: int
kmeans_iters
Number of iterations for the k-means algorithm.
- Type: int
epsilon
Small value for numerical stability during calculations.
- Type: float
threshold_ema_dead_code
Minimum cluster size for expiration of codes.
- Type: int
Parameters:
- dim (int) – Dimension of the input vectors.
- codebook_size (int) – Number of code vectors in the codebook.
- kmeans_init (bool) – If True, uses k-means for initializing the codebook.
- kmeans_iters (int) – Number of iterations for k-means initialization.
- decay (float) – Decay factor for EMA updates.
- epsilon (float) – Small value for numerical stability.
- threshold_ema_dead_code (int) – Minimum size threshold for dead code expiration.
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> x = torch.randn(10, 128) # 10 vectors of dimension 128
>>> embed_indices = codebook.encode(x)
>>> quantized_output = codebook.decode(embed_indices)
#### NOTE
The k-means initialization is performed only on the first batch of training data if kmeans_init is set to True. The class maintains an exponential moving average of the codebook vectors and their corresponding cluster sizes.
- Raises: ValueError – If dim or codebook_size is less than or equal to zero.
decode(embed_ind)
Decode the indices of the embedded vectors into their original representations.
This method takes the encoded indices (embedded indices) and retrieves the corresponding vectors from the codebook. It effectively reverses the quantization process, allowing you to obtain the actual vector representation for further processing or analysis.
- Parameters: embed_ind (Tensor) – A tensor containing the indices of the embeddings to be decoded. The shape should be (B, T), where B is the batch size and T is the number of time steps.
- Returns: A tensor containing the decoded vectors corresponding to the input indices. The shape will be (B, T, D), where D is the dimensionality of the vectors in the codebook.
- Return type: Tensor
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=512)
>>> indices = torch.tensor([[1, 2], [3, 4]])
>>> decoded_vectors = codebook.decode(indices)
>>> print(decoded_vectors.shape)
torch.Size([2, 2, 128])
#### NOTE
The decoded vectors are obtained using an embedding lookup on the codebook.
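As a sketch of that lookup (the tensor names here are illustrative; the codebook is assumed to be stored as a (codebook_size, dim) table):

import torch
import torch.nn.functional as F

embed = torch.randn(512, 128)             # Codebook table (codebook_size, dim)
indices = torch.tensor([[1, 2], [3, 4]])  # (B, T) indices
decoded = F.embedding(indices, embed)     # -> (B, T, dim) == (2, 2, 128)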
dequantize(embed_ind)
Dequantizes the given embedding indices using the codebook.
This method retrieves the corresponding vectors from the codebook based on the provided embedding indices. It effectively maps the quantized indices back to their respective continuous representations.
- Parameters: embed_ind (Tensor) – A tensor containing the indices of the quantized embeddings. The shape should be (B, T), where B is the batch size and T is the number of time steps.
- Returns: A tensor containing the dequantized vectors. The shape will be (B, T, D), where D is the dimension of the codebook vectors.
- Return type: Tensor
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> indices = torch.tensor([[0, 1], [2, 3]])
>>> dequantized_vectors = codebook.dequantize(indices)
>>> print(dequantized_vectors.shape)
torch.Size([2, 2, 128])
encode(x)
Encodes input tensor using the Euclidean codebook for vector quantization.
This method preprocesses the input tensor, quantizes it by finding the nearest codebook vectors, and then post-processes the resulting indices to match the original input shape.
- Parameters: x (Tensor) – Input tensor of shape (B, T, D), where B is the batch size, T is the sequence length, and D is the dimension of each vector.
- Returns: The encoded indices of shape (B, T) that correspond to the nearest codebook vectors.
- Return type: Tensor
#### Examples
>>> codebook = EuclideanCodebook(dim=256, codebook_size=512)
>>> input_tensor = torch.randn(10, 20, 256) # Batch of 10, 20 time steps, 256 dimensions
>>> encoded_indices = codebook.encode(input_tensor)
>>> print(encoded_indices.shape) # Output: torch.Size([10, 20])
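Conceptually, encode chains the helper methods documented below. A sketch of that equivalence (assuming a codebook constructed with the default random initialization):

import torch
from espnet2.gan_codec.shared.quantizer.modules.core_vq import EuclideanCodebook

codebook = EuclideanCodebook(dim=256, codebook_size=512)
x = torch.randn(10, 20, 256)
flat = codebook.preprocess(x)                 # Flatten leading dims: (200, 256)
ind = codebook.quantize(flat)                 # Nearest-code indices: (200,)
ind = codebook.postprocess_emb(ind, x.shape)  # Restore batch shape: (10, 20)
assert torch.equal(ind, codebook.encode(x))   # Same result as encode()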
expire_codes_(batch_samples)
Expire codes that have a low exponential moving average cluster size.
This method checks the cluster sizes and replaces any codes in the codebook that have an exponential moving average size less than the specified threshold (threshold_ema_dead_code) with randomly selected vectors from the current batch. This helps to ensure that the codebook remains relevant and does not contain unused or under-utilized codes.
- Parameters: batch_samples (Tensor) – A tensor containing the current batch of samples. This should be a 2D tensor where the first dimension is the number of samples and the second dimension is the feature size.
- Returns: This method modifies the codebook in place and does not return any value.
- Return type: None
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> batch_samples = torch.randn(32, 128) # 32 samples, 128 features
>>> codebook.expire_codes_(batch_samples)
#### NOTE
- The method only executes the expiration logic if threshold_ema_dead_code is greater than 0.
- If no codes are expired, the method returns early without making any changes to the codebook.
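A minimal sketch of the expiration logic, assuming the EMA cluster sizes are available as a plain tensor (names are illustrative; see replace_() below):

import torch

def expire_codes(codebook, ema_cluster_size, batch_samples, threshold):
    # Codes whose EMA usage fell below the threshold are considered dead
    # and are redrawn from the current batch via replace_().
    dead = ema_cluster_size < threshold
    if dead.any():
        codebook.replace_(batch_samples, dead)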
forward(x)
Codebook Forward with EMA.
This method processes the input tensor x through the codebook using exponential moving average (EMA) for quantization. It initializes the embedding if it is the first forward pass, performs quantization, and handles the training updates for the codebook.
- Parameters: x (Tensor) – Vector for quantization of shape (B, T, D), where B is the batch size, T is the sequence length, and D is the dimension of the vectors.
- Returns: A tuple containing:
  - Tensor: Quantized output of shape (B, T, D).
  - Tensor: Codebook indices of shape (B, T).
- Return type: Tuple[Tensor, Tensor]
#### Examples
>>> codebook = EuclideanCodebook(dim=256, codebook_size=512)
>>> input_tensor = torch.randn(8, 10, 256) # Batch of 8, seq len 10
>>> quantized_output, codebook_indices = codebook(input_tensor)
>>> print(quantized_output.shape) # Should output: torch.Size([8, 10, 256])
>>> print(codebook_indices.shape) # Should output: torch.Size([8, 10])
#### NOTE
The method will only initialize the embedding on the first forward call, which is controlled by the inited buffer. If self.training is True, it updates the codebook with the current batch statistics.
- Raises: RuntimeError – If the input tensor does not have the expected shape or is not a 3D tensor.
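For reference, here is a sketch of the standard EMA codebook update with Laplace smoothing used by EnCodec-style quantizers (buffer names are illustrative; the exact ESPnet internals may differ):

import torch
import torch.nn.functional as F

def ema_update(embed, embed_avg, cluster_size, flat_x, embed_ind,
               decay=0.99, epsilon=1e-5):
    # One-hot assignment of each input vector to its code: (N, C).
    onehot = F.one_hot(embed_ind, num_classes=embed.shape[0]).float()
    # EMA of per-code usage counts and per-code vector sums.
    cluster_size = decay * cluster_size + (1 - decay) * onehot.sum(0)
    embed_avg = decay * embed_avg + (1 - decay) * (onehot.t() @ flat_x)
    # Laplace smoothing keeps rarely used codes numerically stable.
    n = cluster_size.sum()
    smoothed = (cluster_size + epsilon) / (n + embed.shape[0] * epsilon) * n
    embed = embed_avg / smoothed[:, None]
    return embed, embed_avg, cluster_size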
init_embed_(data)
Initialize the codebook embeddings using k-means clustering.
This method initializes the codebook embeddings if they have not been initialized yet. It performs k-means clustering on the input data to determine the initial codebook vectors and updates the relevant buffers.
- Parameters: data (Tensor) – The input data used for initializing the codebook. The shape of the tensor should be (N, D), where N is the number of samples and D is the dimension of each sample.
- Returns: None. This method modifies the internal state of the class in place.
- Return type: None
#### NOTE
This method is only activated during the first call, as it checks the inited flag to determine whether the embeddings have already been initialized. If they have, the method returns early without making any changes.
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> data = torch.randn(1000, 128)
>>> codebook.init_embed_(data) # Initializes the codebook embeddings
- Raises: ValueError – If the input data is not of the expected shape.
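For intuition, a self-contained sketch of the k-means step (the actual helper in ESPnet may differ in details such as initialization and tie handling):

import torch

def simple_kmeans(data, num_clusters, iters=10):
    # data: (N, D). Start from randomly chosen data points as centroids.
    means = data[torch.randperm(data.shape[0])[:num_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(data, means).argmin(dim=1)  # Nearest centroid
        for k in range(num_clusters):
            pts = data[assign == k]
            if len(pts) > 0:              # Keep empty clusters unchanged
                means[k] = pts.mean(dim=0)
    return means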
postprocess_emb(embed_ind, shape)
Post-process the embedding indices to reshape them into the desired output shape.
This method takes the embedding indices produced by the quantization process and reshapes them according to the specified shape. It is particularly useful in maintaining the original dimensions of the input tensor after quantization.
- Parameters:
- embed_ind (torch.Tensor) – The embedding indices obtained from the quantization process, typically a flattened tensor of shape (B*T,), where B is the batch size and T is the number of time steps.
- shape (tuple) – The shape of the original input tensor before quantization, typically (B, T, D). The indices are reshaped to this shape with the feature dimension dropped.
- Returns: The reshaped embedding indices of shape (B, T).
- Return type: torch.Tensor
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> embed_ind = torch.tensor([0, 1, 2, 3, 4, 5])  # Flattened indices (B*T,)
>>> shape = (2, 3, 128)  # Original input shape (B, T, D)
>>> reshaped_embed_ind = codebook.postprocess_emb(embed_ind, shape)
>>> print(reshaped_embed_ind.shape)
torch.Size([2, 3])
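Under the hood this is presumably just a reshape that drops the feature dimension from the target shape, e.g.:

import torch

shape = (2, 3, 128)                     # Original (B, T, D) input shape
embed_ind = torch.arange(6)             # Flattened indices of shape (B*T,)
reshaped = embed_ind.view(*shape[:-1])  # -> torch.Size([2, 3])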
preprocess(x)
Preprocess input data for the Euclidean codebook.
This function rearranges the input tensor x so it has the correct shape for further processing within the codebook. Specifically, it flattens all leading dimensions into one while keeping the feature (last) dimension intact.
- Parameters: x (torch.Tensor) – The input tensor to preprocess. It can have any number of leading dimensions, with the last dimension representing the features.
- Returns: The preprocessed tensor with all leading dimensions flattened into a single dimension and the feature dimension retained.
- Return type: torch.Tensor
#### Examples
>>> import torch
>>> codebook = EuclideanCodebook(dim=4, codebook_size=16)
>>> x = torch.randn(2, 3, 4)  # A tensor with shape (2, 3, 4)
>>> preprocessed_x = codebook.preprocess(x)
>>> print(preprocessed_x.shape)
torch.Size([6, 4])
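The flattening is equivalent to a plain reshape that keeps the feature dimension last (a sketch, not necessarily the exact implementation):

import torch

x = torch.randn(2, 3, 4)
flat = x.reshape(-1, x.shape[-1])  # (2, 3, 4) -> (6, 4); features stay last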
quantize(x)
Quantize input vectors to the indices of their nearest codebook entries.
This method computes the Euclidean distance between each input vector and every codebook vector, and returns, for each input, the index of the closest codebook entry.
- Parameters: x (torch.Tensor) – Flattened input tensor of shape (N, D), where N is the number of vectors and D is the codebook dimension.
- Returns: The indices of the nearest codebook entries, of shape (N,).
- Return type: torch.Tensor
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> x = torch.randn(200, 128)  # 200 flattened vectors of dimension 128
>>> embed_ind = codebook.quantize(x)
>>> print(embed_ind.shape)
torch.Size([200])
#### NOTE
This method operates on flattened (N, D) inputs; use encode() to handle batched (B, T, D) tensors end to end.
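For intuition, a minimal sketch of the nearest-neighbor search using the usual expanded squared-distance formulation (the function name and tensor layout are illustrative, not the exact internals):

import torch

def nearest_codebook_indices(x, embed):
    # x: (N, D) inputs; embed: (C, D) codebook.
    # Expanded squared Euclidean distance |x|^2 - 2 x.e + |e|^2,
    # negated so the closest entry has the largest score.
    dist = -(
        x.pow(2).sum(1, keepdim=True)  # (N, 1)
        - 2 * x @ embed.t()            # (N, C)
        + embed.pow(2).sum(1)          # (C,)
    )
    return dist.max(dim=-1).indices    # (N,)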
replace_(samples, mask)
Replace the entries in the codebook based on a mask.
This method updates the codebook entries with randomly selected vectors from the provided samples where the corresponding mask is True. The existing codebook entries remain unchanged where the mask is False.
- Parameters:
- samples (Tensor) – The tensor containing sample vectors from which new codebook entries will be drawn. Shape should be (N, D) where N is the number of samples and D is the dimensionality of each sample.
- mask (Tensor) – A boolean tensor indicating which codebook entries should be replaced. Shape should be (C,) where C is the number of codebook entries.
#### Examples
>>> codebook = EuclideanCodebook(dim=128, codebook_size=256)
>>> samples = torch.randn(10, 128)  # 10 samples of dimension 128
>>> mask = torch.zeros(256, dtype=torch.bool)  # One flag per codebook entry
>>> mask[:8] = True  # Replace the first 8 codebook entries
>>> codebook.replace_(samples, mask)
#### NOTE
This method is typically used in the training process to adapt the codebook entries based on the current batch of samples.
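A minimal sketch of the masked replacement, assuming samples are redrawn with replacement when there are fewer samples than codes (names are illustrative):

import torch

def replace_masked_codes(embed, samples, mask):
    # embed: (C, D) codebook; samples: (N, D); mask: (C,) bool.
    idx = torch.randint(0, samples.shape[0], (embed.shape[0],))
    # Swap in the drawn sample vectors only where the mask is True.
    return torch.where(mask[:, None], samples[idx], embed)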