espnet2.gan_codec.shared.quantizer.modules.core_vq.ResidualVectorQuantization
class espnet2.gan_codec.shared.quantizer.modules.core_vq.ResidualVectorQuantization(*, num_quantizers, **kwargs)
Bases: Module
Residual vector quantization implementation.
Follows Algorithm 1 in https://arxiv.org/pdf/2107.03312.pdf.
This class implements a residual vector quantization mechanism that allows for more efficient encoding by utilizing multiple quantizers in sequence. Each quantizer processes the residual from the previous step, effectively reducing the quantization error iteratively.
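As a rough illustration, the residual loop can be sketched as follows (a minimal sketch assuming layers is a sequence of single-stage quantizers whose forward returns (quantized, indices, loss), as VectorQuantization does here; this is not the exact ESPnet implementation):

import torch

def residual_vq_sketch(x, layers, n_q=None):
    # Each stage quantizes the residual left over by the previous stage;
    # the reconstructed output is the sum of all per-stage codewords.
    n_q = n_q or len(layers)
    quantized_out = torch.zeros_like(x)
    residual = x
    all_indices, all_losses = [], []
    for layer in layers[:n_q]:
        quantized, indices, loss = layer(residual)
        residual = residual - quantized
        quantized_out = quantized_out + quantized
        all_indices.append(indices)
        all_losses.append(loss)
    return quantized_out, all_indices, all_losses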
layers
A list of VectorQuantization layers.
- Type: nn.ModuleList
quantizer_dropout
Whether to apply dropout to quantizers during training.
- Type: bool
Parameters:
- num_quantizers (int) – The number of quantizers to be used in the residual vector quantization process.
- **kwargs – Additional keyword arguments passed to the VectorQuantization initialization.
Returns: quantized_out (Tensor): The final quantized output.
out_indices (Tensor): Indices of the quantized representations from each quantizer.
out_losses (Tensor): Losses associated with each quantization step during training.
Return type: Tuple[Tensor, Tensor, Tensor]
########### Examples
>>> rvq = ResidualVectorQuantization(num_quantizers=3, dim=128,
... codebook_size=256)
>>> x = torch.randn(16, 128, 10)  # (B, D, N): batch of 16, dim 128, length 10
>>> quantized_out, indices, losses = rvq(x)
NOTE
The implementation includes dropout behavior for the quantizers, which can be controlled via the quantizer_dropout parameter. This allows for random selection of quantizers during training, improving model robustness.
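The idea can be sketched as follows (an illustrative, hypothetical sketch of per-item quantizer dropout in the style of DAC; the names and shapes here are assumptions, not the exact ESPnet code):

import torch

# Hypothetical sketch: during training each batch item is assigned a
# random quantization depth, so the codec learns to reconstruct well
# from any prefix of the quantizer stack (i.e. at variable bitrates).
num_quantizers, batch_size = 8, 4
depth = torch.randint(1, num_quantizers + 1, (batch_size,))
for i in range(num_quantizers):
    mask = (depth > i).float()  # 1.0 where layer i is still active
    # per-item update: quantized_out += mask[:, None, None] * quantized_i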
- Raises: ValueError – If num_quantizers is not a positive integer.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
decode(q_indices: Tensor) → Tensor
Decode the quantized indices into their corresponding vectors.
This method decodes the indices produced by each quantizer and sums the resulting vectors, reversing the residual quantization process.
- Parameters: q_indices (Tensor) – A tensor of quantized indices of shape (n_q, B, T), where n_q is the number of quantizers, B is the batch size, and T is the number of time steps.
- Returns: A tensor containing the decoded output of shape (B, D, T), where D is the dimension of the vectors.
- Return type: Tensor
########### Examples
>>> rvq = ResidualVectorQuantization(num_quantizers=2, dim=128,
...                                   codebook_size=256)
>>> indices = torch.randint(0, 256, (2, 4, 10))  # (n_q, B, T)
>>> decoded = rvq.decode(indices)
>>> print(decoded.shape)  # Should output: torch.Size([4, 128, 10])
encode(x: Tensor, n_q: int | None = None) → Tensor
Encode an input tensor using the residual vector quantization process.
This method passes the input through each quantizer in turn, encoding the residual at every stage, and stacks the resulting indices. The output is a tensor containing the codebook indices produced by each quantizer.
- Parameters:
- x (Tensor) – Input tensor of shape (B, D, N), where B is the batch size, D is the dimensionality of the input vectors, and N is the sequence length.
- n_q (Optional[int]) – Number of quantizers to use. If None, uses all available quantizers.
- Returns: A tensor of shape (n_q, B, N) containing the indices of the : quantized vectors from each quantizer.
- Return type: Tensor
########### Examples
>>> import torch
>>> rvq = ResidualVectorQuantization(num_quantizers=3, dim=128,
...                                   codebook_size=256)
>>> input_tensor = torch.randn(10, 128, 20)  # (B, D, N)
>>> indices = rvq.encode(input_tensor)
>>> print(indices.shape)  # Output: torch.Size([3, 10, 20])
forward(x, n_q: int | None = None)
Perform forward pass for residual vector quantization.
This method processes the input tensor through a series of vector quantization layers, computing the quantized output, indices, and losses associated with the quantization process.
- Parameters:
- x (Tensor) – Input tensor to be quantized (B, D, N).
- n_q (Optional[int]) – Number of quantizers to use. If None, uses all available quantizers.
- Returns: quantized_out (Tensor): Quantized output (B, D, N).
  out_indices (List[Tensor]): Indices of the quantized vectors for each layer.
  out_losses (List[Tensor]): Losses associated with each layer's quantization (if not using dropout).
- Return type: Tuple[Tensor, List[Tensor], List[Tensor]]
NOTE
If quantizer_dropout is enabled, only a subset of quantizers will be used during training, based on a dropout mechanism.
########### Examples
>>> rvq = ResidualVectorQuantization(num_quantizers=3,
... dim=128,
... codebook_size=256)
>>> input_tensor = torch.randn(10, 128, 20)
>>> quantized_output, indices, losses = rvq(input_tensor)
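The n_q argument limits how many quantizers are applied, which is useful for decoding at a lower bitrate:

>>> quantized_output, indices, losses = rvq(input_tensor, n_q=2)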