espnet2.gan_codec.shared.quantizer.modules.core_vq.kmeans
espnet2.gan_codec.shared.quantizer.modules.core_vq.kmeans(samples, num_clusters: int, num_iters: int = 10)
Perform k-means clustering on the given samples.
This function applies the k-means algorithm to partition the input samples into num_clusters clusters. Each iteration assigns every sample to its nearest centroid and then updates each centroid from the samples assigned to it.
- Parameters:
- samples (Tensor) – Input tensor of shape (N, D) where N is the number of samples and D is the dimension of each sample.
- num_clusters (int) – The number of clusters to form.
- num_iters (int, optional) – The number of iterations to run the k-means algorithm. Defaults to 10.
- Returns: A tuple containing:
  - Tensor: The final cluster centroids, of shape (num_clusters, D).
  - Tensor: The number of samples assigned to each cluster, of shape (num_clusters,).
- Return type: Tuple[Tensor, Tensor]
Examples
>>> import torch
>>> from espnet2.gan_codec.shared.quantizer.modules.core_vq import kmeans
>>> samples = torch.rand(100, 2) # 100 samples in 2D
>>> centroids, cluster_sizes = kmeans(samples, num_clusters=3, num_iters=5)
>>> print(centroids.shape) # Should print: torch.Size([3, 2])
>>> print(cluster_sizes.shape) # Should print: torch.Size([3])
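The iterative procedure described above can be sketched in plain PyTorch as follows. This is a minimal illustration, not the ESPnet implementation: the name kmeans_sketch is hypothetical, and the random-sample initialisation and the handling of empty clusters are assumptions made for the example.

```python
import torch


def kmeans_sketch(samples, num_clusters, num_iters=10):
    """Illustrative k-means loop; hypothetical helper, not ESPnet's code."""
    num_samples, _ = samples.shape
    # Assumption: initialise centroids from randomly chosen samples
    # (requires num_samples >= num_clusters).
    indices = torch.randperm(num_samples)[:num_clusters]
    centroids = samples[indices].clone()
    cluster_sizes = torch.zeros(num_clusters, dtype=torch.long)

    for _ in range(num_iters):
        # Assignment step: each sample goes to its nearest centroid.
        distances = torch.cdist(samples, centroids)  # (N, num_clusters)
        assignments = distances.argmin(dim=-1)  # (N,)
        cluster_sizes = torch.bincount(assignments, minlength=num_clusters)

        # Update step: each centroid becomes the mean of its samples.
        sums = torch.zeros_like(centroids)
        sums.index_add_(0, assignments, samples)
        means = sums / cluster_sizes.clamp(min=1).unsqueeze(-1)

        # Assumption: an empty cluster keeps its previous centroid.
        empty = (cluster_sizes == 0).unsqueeze(-1)
        centroids = torch.where(empty, centroids, means)

    return centroids, cluster_sizes


torch.manual_seed(0)
samples = torch.rand(100, 2)
centroids, cluster_sizes = kmeans_sketch(samples, num_clusters=3, num_iters=5)
print(centroids.shape)  # torch.Size([3, 2])
print(cluster_sizes.shape)  # torch.Size([3])
```

The returned cluster_sizes reflect the assignments made in the last iteration, so they always sum to the number of input samples.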
NOTE
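The input samples must be a 2D tensor of shape (N, D). Because the algorithm runs for a fixed number of iterations (num_iters), the centroids may not have converged when the function returns, particularly when the number of clusters is too large or the data is not well separated.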
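Features often arrive with extra leading dimensions, for example a batch of frame sequences. A minimal sketch of collapsing them into the required (N, D) layout is shown below; flatten_for_kmeans is a hypothetical helper and the (batch, time, dim) layout is an assumption for illustration.

```python
import torch


def flatten_for_kmeans(features):
    """Hypothetical helper: collapse all leading dims into N, keep D last."""
    return features.reshape(-1, features.shape[-1])


# Assumed layout: (batch, time, dim)
features = torch.rand(4, 50, 8)
samples = flatten_for_kmeans(features)
print(samples.shape)  # torch.Size([200, 8])
```

The resulting tensor can then be passed as the samples argument.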