espnet2.gan_codec.shared.quantizer.modules.core_vq.kmeans

espnet2.gan_codec.shared.quantizer.modules.core_vq.kmeans(samples, num_clusters: int, num_iters: int = 10)

Perform k-means clustering on the given samples.

This function applies the k-means algorithm to partition the input samples into num_clusters clusters. Clustering proceeds iteratively: in each iteration, every sample is assigned to its nearest centroid, and each centroid is then updated to the mean of the samples assigned to it.
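The iteration described above is Lloyd's algorithm. The following is a minimal PyTorch sketch of it under stated assumptions, not the ESPnet source: the name `kmeans_sketch` is hypothetical, centroids are assumed to be initialised from randomly chosen input samples, a cluster that receives no samples is assumed to keep its previous centroid, and the returned sizes come from the last assignment step.

```python
import torch


def kmeans_sketch(samples: torch.Tensor, num_clusters: int, num_iters: int = 10):
    """Hypothetical k-means sketch; returns (centroids, cluster_sizes)."""
    num_samples, dim = samples.shape
    # Assumption: initialise centroids with randomly chosen input samples.
    if num_samples >= num_clusters:
        idx = torch.randperm(num_samples, device=samples.device)[:num_clusters]
    else:
        idx = torch.randint(0, num_samples, (num_clusters,), device=samples.device)
    means = samples[idx].clone()

    counts = torch.zeros(num_clusters, dtype=torch.long, device=samples.device)
    for _ in range(num_iters):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        assignments = torch.cdist(samples, means).argmin(dim=-1)  # (N,)

        # Number of samples assigned to each cluster.
        counts = torch.bincount(assignments, minlength=num_clusters)

        # Update step: sum the samples of each cluster, then divide by its size.
        sums = torch.zeros(num_clusters, dim, dtype=samples.dtype, device=samples.device)
        sums.index_add_(0, assignments, samples)
        new_means = sums / counts.clamp(min=1).unsqueeze(-1).to(samples.dtype)

        # Assumption: an empty cluster keeps its previous centroid.
        means = torch.where((counts == 0).unsqueeze(-1), means, new_means)

    return means, counts
```

Calling `kmeans_sketch(torch.rand(100, 2), num_clusters=3, num_iters=5)` returns tensors of shape (3, 2) and (3,). Dividing by `counts.clamp(min=1)` avoids a division by zero for empty clusters, whose placeholder mean is then discarded by `torch.where`.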

Parameters:
- samples (Tensor) – Input tensor of shape (N, D) where N is the number of samples and D is the dimension of each sample.
- num_clusters (int) – The number of clusters to form.
- num_iters (int, optional) – The number of iterations to run the k-means algorithm. Defaults to 10.
Returns: A tuple containing:
- Tensor: The final cluster centroids, of shape (num_clusters, D).
- Tensor: The number of samples assigned to each cluster, of shape (num_clusters,).

Return type: Tuple[Tensor, Tensor]
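The second return value can be read as a histogram of nearest-centroid assignments: its entries sum to N, and a zero entry marks an empty cluster. The sketch below reproduces such a histogram for fixed centroids with `torch.bincount`; the helper `cluster_sizes` is hypothetical and assumes Euclidean nearest-centroid assignment.

```python
import torch


def cluster_sizes(samples: torch.Tensor, centroids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: count how many samples are nearest to each centroid."""
    # Nearest centroid for each sample (Euclidean distance).
    assignments = torch.cdist(samples, centroids).argmin(dim=-1)
    # Histogram of assignments; minlength keeps empty clusters as zeros.
    return torch.bincount(assignments, minlength=centroids.shape[0])


samples = torch.tensor([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centroids = torch.tensor([[0.0, 0.0], [5.0, 5.0], [100.0, 100.0]])
sizes = cluster_sizes(samples, centroids)
print(sizes)  # tensor([2, 1, 0])
```

Here the third centroid attracts no samples, so its size is 0.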

Examples

>>> import torch
>>> from espnet2.gan_codec.shared.quantizer.modules.core_vq import kmeans
>>> samples = torch.rand(100, 2)  # 100 samples in 2D
>>> centroids, cluster_sizes = kmeans(samples, num_clusters=3, num_iters=5)
>>> centroids.shape
torch.Size([3, 2])
>>> cluster_sizes.shape
torch.Size([3])
>>> int(cluster_sizes.sum())  # every sample is assigned to exactly one cluster
100

NOTE

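The input samples must be a 2D tensor of shape (N, D). Because the algorithm runs for exactly num_iters iterations, the centroids may not have converged when it returns, especially when num_clusters is large or the data is not well separated. As with any k-means procedure, the result is a local optimum that depends on the initial centroids.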
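The note has two practical consequences: features with extra leading dimensions (for example batch × time × D) must be flattened to (N, D) first, and a fixed iteration count gives no signal about whether the assignments have settled. The sketch below illustrates both under stated assumptions; `kmeans_until_stable` is a hypothetical helper, not part of ESPnet, that takes explicit initial centroids, rejects non-2D input, and stops once the assignments stop changing.

```python
import torch


def kmeans_until_stable(samples: torch.Tensor, init_means: torch.Tensor, max_iters: int = 100):
    """Hypothetical helper: Lloyd iterations until assignments stop changing.

    Returns (centroids, converged), where converged is False if max_iters was hit.
    """
    if samples.dim() != 2:
        raise ValueError(f"expected a 2D (N, D) tensor, got shape {tuple(samples.shape)}")
    means = init_means.clone()
    prev_assignments = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid for every sample.
        assignments = torch.cdist(samples, means).argmin(dim=-1)
        if prev_assignments is not None and torch.equal(assignments, prev_assignments):
            return means, True  # fixed point: another update would not move the centroids
        prev_assignments = assignments
        # Update step: per-cluster mean; empty clusters keep their old centroid.
        counts = torch.bincount(assignments, minlength=means.shape[0])
        sums = torch.zeros_like(means).index_add_(0, assignments, samples)
        new_means = sums / counts.clamp(min=1).unsqueeze(-1).to(means.dtype)
        means = torch.where((counts == 0).unsqueeze(-1), means, new_means)
    return means, False


# Features with extra leading dimensions must be flattened to (N, D) first.
features = torch.rand(4, 50, 8)
samples = features.reshape(-1, features.shape[-1])  # shape (200, 8)
centroids, converged = kmeans_until_stable(samples, samples[:16])
```

Checking for unchanged assignments is one common stopping rule; a tolerance on centroid movement is another.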