espnet3.components.dataset.ShardedDataset
class espnet3.components.dataset.ShardedDataset
Bases: ABC, Dataset
Abstract base class for datasets that support sharding.
This interface is used in ESPnet’s multiple-iterator mode, where datasets are split into shards for parallel or distributed data loading. Any dataset subclassing ShardedDataset must implement the shard() method.
NOTE
- This class is intended to be used with CombinedDataset in ESPnet.
- All datasets combined must subclass ShardedDataset if sharding is used.
Example
>>> class MyDataset(ShardedDataset):
...     def shard(self, idx):
...         return Subset(self, shard_indices[idx])

shard(idx: int)
Return a new dataset shard corresponding to the given index.
This method must be implemented by subclasses to return a subset of the data for sharded training or evaluation.
- Parameters: idx (int) – The index of the shard to return.
- Returns: A dataset instance representing the shard.
- Return type: Dataset
- Raises: NotImplementedError – Always in the base class. Must be overridden.
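
The contract described above can be illustrated with a minimal sketch. To stay runnable without ESPnet or PyTorch installed, it defines a hypothetical stand-in base class (`ShardedDatasetStandIn`) that mirrors only the documented interface; in real code you would subclass `espnet3.components.dataset.ShardedDataset` instead. The `ListShardedDataset` class, its `num_shards` argument, and the strided split are all assumptions made for illustration, not part of ESPnet. The use of `@abstractmethod` in the stand-in is likewise an assumption about how the real base class is declared.

```python
from abc import ABC, abstractmethod


class ShardedDatasetStandIn(ABC):
    """Hypothetical stand-in for ShardedDataset (documented interface only)."""

    @abstractmethod
    def shard(self, idx: int):
        # The documented base class always raises here; subclasses override it.
        raise NotImplementedError


class ListShardedDataset(ShardedDatasetStandIn):
    """Hypothetical list-backed dataset that splits itself into strided shards."""

    def __init__(self, items, num_shards):
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.items = list(items)
        self.num_shards = num_shards

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

    def shard(self, idx: int):
        if not 0 <= idx < self.num_shards:
            raise IndexError(f"shard index {idx} out of range")
        # Strided split: shard idx takes items idx, idx + num_shards, ...
        # The returned shard is itself a dataset holding only that subset.
        return ListShardedDataset(self.items[idx::self.num_shards], num_shards=1)


dataset = ListShardedDataset(range(10), num_shards=3)
shards = [dataset.shard(i) for i in range(dataset.num_shards)]
shard_sizes = [len(s) for s in shards]
```

A strided split keeps shard sizes within one item of each other and ensures that every item lands in exactly one shard. Other partitioning schemes (contiguous ranges, precomputed index lists as in the `Subset` example) satisfy the same interface equally well.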
