espnet2.gan_codec.shared.quantizer.modules.core_vq.VectorQuantization
class espnet2.gan_codec.shared.quantizer.modules.core_vq.VectorQuantization(dim: int, codebook_size: int, codebook_dim: int | None = None, decay: float = 0.99, epsilon: float = 1e-05, kmeans_init: bool = True, kmeans_iters: int = 50, threshold_ema_dead_code: int = 2, commitment_weight: float = 1.0, quantizer_dropout: bool = False)
Bases: Module
Vector quantization implementation.
Currently supports only Euclidean distance.
- Parameters:
- dim (int) – Dimension of the input data.
- codebook_size (int) – Size of the codebook.
- codebook_dim (Optional[int]) – Dimension of the codebook. If not defined, uses the dimension specified in dim.
- decay (float) – Decay for exponential moving average over the codebooks.
- epsilon (float) – Epsilon value for numerical stability.
- kmeans_init (bool) – Whether to use k-means to initialize the codebooks.
- kmeans_iters (int) – Number of iterations used for k-means initialization.
- threshold_ema_dead_code (int) – Threshold for dead code expiration. Replace any codes that have an exponential moving average cluster size less than the specified threshold with a randomly selected vector from the current batch.
- commitment_weight (float) – Weight for the commitment loss during training (a minimal sketch of this term follows the parameter list).
- quantizer_dropout (bool) – Whether to apply dropout to the quantizer.
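The commitment term scaled by commitment_weight follows the standard VQ-VAE formulation. Below is a minimal, illustrative sketch; the tensors are hypothetical stand-ins, not the module's internals:
>>> import torch
>>> import torch.nn.functional as F
>>> commitment_weight = 1.0
>>> x = torch.randn(8, 100, 256)              # hypothetical pre-quantization vectors
>>> quantize = x + 0.1 * torch.randn_like(x)  # stand-in for their codebook lookups
>>> # Gradients flow only into x; the codebook itself is updated via EMA.
>>> commit_loss = commitment_weight * F.mse_loss(quantize.detach(), x)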
############# Examples
>>> vq = VectorQuantization(dim=256, codebook_size=512)
>>> x = torch.randn(10, 20, 256) # Batch of 10, 20 time steps, 256-dim vectors
>>> quantized_output, embed_indices, loss = vq(x)
####### NOTE
The forward method returns different outputs depending on whether the model is in training or evaluation mode. In training mode, it returns the quantized output, embedding indices, and losses; in evaluation mode, it returns only the quantized output and embedding indices.
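A minimal usage sketch of the mode-dependent return values described in the note above (the unpacking below assumes exactly the outputs listed there):
>>> vq.train()
>>> quantized_output, embed_indices, loss = vq(x)  # training: losses are returned
>>> vq.eval()
>>> quantized_output, embed_indices = vq(x)        # evaluation: no losses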
Initialize internal Module state, shared by both nn.Module and ScriptModule.
property codebook
Core vector quantization implementation.
This module provides a framework for vector quantization, including an implementation of Euclidean distance-based codebooks and the vector quantization algorithm. The main classes included are EuclideanCodebook, VectorQuantization, and ResidualVectorQuantization, each with specific functions to handle encoding, decoding, and loss calculations.
- Parameters:
- dim (int) – Dimension of the input vectors.
- codebook_size (int) – Size of the codebook.
- codebook_dim (Optional[int]) – Dimension of the codebook. If not defined, uses the dimension specified in dim.
- decay (float) – Decay for exponential moving average over the codebooks.
- epsilon (float) – Epsilon value for numerical stability.
- kmeans_init (bool) – Whether to use k-means to initialize the codebooks.
- kmeans_iters (int) – Number of iterations used for k-means initialization.
- threshold_ema_dead_code (int) – Threshold for dead code expiration. Replace any codes that have an exponential moving average cluster size less than the specified threshold with a randomly selected vector from the current batch.
- commitment_weight (float) – Weight for commitment loss in vector quantization.
- quantizer_dropout (bool) – Flag to enable dropout in quantization layers.
############# Examples
Initialize a vector quantization model:
>>> vq_model = VectorQuantization(dim=256, codebook_size=512)
Forward pass with an input tensor:
>>> quantized_output, indices, loss = vq_model(input_tensor)
Encode and decode:
>>> encoded_indices = vq_model.encode(input_tensor)
>>> reconstructed_tensor = vq_model.decode(encoded_indices)
####### NOTE
The code is inspired by implementations from the encodec and vector-quantize-pytorch repositories and is licensed under the Apache 2.0 and MIT licenses.
decode(embed_ind)
Decode the quantized indices back to their original vector representations.
This method takes the indices of the quantized vectors and retrieves the corresponding vectors from the codebook, effectively reversing the quantization process.
- Parameters: embed_ind (Tensor) – A tensor containing the indices of the quantized vectors. It is expected to be of shape (B, T), where B is the batch size and T is the number of time steps.
- Returns: A tensor of shape (B, T, D) containing the decoded vectors, where D is the dimension of the original vectors.
- Return type: Tensor
############# Examples
>>> quantizer = VectorQuantization(dim=128, codebook_size=256)
>>> indices = torch.randint(0, 256, (10, 20)) # Simulated indices
>>> decoded_vectors = quantizer.decode(indices)
>>> print(decoded_vectors.shape) # Output: torch.Size([10, 20, 128])
encode(x)
Encodes the input tensor into indices of the codebook.
This method preprocesses the input tensor, quantizes it by finding the nearest codebook entries, and then post-processes the indices to match the original shape.
- Parameters: x (Tensor) – Input tensor to be encoded. The expected shape is (B, D, N), where B is the batch size, D is the dimension of each vector, and N is the number of vectors.
- Returns: A tensor of shape (B, N) containing the indices of the quantized vectors from the codebook.
- Return type: Tensor
############# Examples
>>> vq = VectorQuantization(dim=256, codebook_size=512)
>>> input_tensor = torch.randn(2, 256, 10) # Batch of 2, 10 vectors of dimension 256
>>> indices = vq.encode(input_tensor)
>>> print(indices.shape) # Output: torch.Size([2, 10])
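As an illustration, encode and decode can be chained to run an input through the codebook and back; the shapes follow the encode and decode documentation above:
>>> vq = VectorQuantization(dim=256, codebook_size=512)
>>> x = torch.randn(2, 256, 10)        # (B, D, N) input, as expected by encode
>>> indices = vq.encode(x)             # (B, N) codebook indices
>>> reconstructed = vq.decode(indices) # codebook vectors looked up from the indices
>>> # The reconstruction is the nearest-codebook approximation of x, not x itself.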
forward(x, mask=None)
Codebook Forward with EMA.
This method performs the forward pass of the vector quantization process, applying exponential moving average (EMA) updates to the codebook embeddings during training.
- Parameters: x (Tensor) – Vector for quantization with shape (B, T, D), where B is the batch size, T is the sequence length, and D is the dimension of the vectors.
- Returns: A tuple containing:
  - Tensor: Quantized output with shape (B, T, D).
  - Tensor: Codebook indices with shape (B, T).
- Return type: Tuple[Tensor, Tensor]
####### NOTE
During the training phase, the method updates the codebook using EMA and expires codes based on their usage.
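The EMA codebook update referred to above follows the usual exponential-moving-average rule for VQ codebooks. The following is a simplified, self-contained sketch of that rule and of dead-code expiration (illustrative only, not the module's internal code):
>>> import torch
>>> decay, epsilon, threshold = 0.99, 1e-5, 2.0
>>> codebook_size, dim = 512, 128
>>> embed = torch.randn(codebook_size, dim)            # current codebook entries
>>> embed_avg = embed.clone()                          # EMA of summed assigned vectors
>>> cluster_size = torch.zeros(codebook_size)          # EMA of per-code assignment counts
>>> flat_x = torch.randn(200, dim)                     # hypothetical batch of input vectors
>>> assign = torch.cdist(flat_x, embed).argmin(dim=1)  # nearest code by Euclidean distance
>>> onehot = torch.nn.functional.one_hot(assign, codebook_size).float()
>>> # EMA statistics: assignment counts and per-code sums of assigned inputs
>>> cluster_size = decay * cluster_size + (1 - decay) * onehot.sum(0)
>>> embed_avg = decay * embed_avg + (1 - decay) * (onehot.t() @ flat_x)
>>> # Laplace-smoothed normalization, then refresh the codebook entries
>>> n = cluster_size.sum()
>>> smoothed = (cluster_size + epsilon) / (n + codebook_size * epsilon) * n
>>> embed = embed_avg / smoothed.unsqueeze(1)
>>> # Dead-code expiration: replace rarely used codes with random batch vectors
>>> dead = cluster_size < threshold
>>> embed[dead] = flat_x[torch.randint(0, flat_x.shape[0], (int(dead.sum()),))]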
############# Examples
>>> model = VectorQuantization(dim=128, codebook_size=512)
>>> input_tensor = torch.randn(10, 20, 128) # Batch of 10, 20 time steps
>>> quantized_output, codebook_index = model(input_tensor)