espnet2.speechlm.dataloader.dataset.CombinedDataset
class espnet2.speechlm.dataloader.dataset.CombinedDataset(datasets: List[Tuple[str, str]] = [], registered_datasets: List[str] = [], rank: int = 0, world_size: int = 1)
Bases: Dataset
Combined ESPnet Speech Language Model Dataset.
Combines multiple datasets from both direct paths and registered datasets.
- Parameters:
  - datasets – List of (name, json_path) tuples for direct dataset paths (default: [])
  - registered_datasets – List of registered dataset names to look up in the registry (default: [])
  - rank – Process rank for distributed training (default: 0)
  - world_size – Total number of processes (default: 1)
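A minimal sketch of how the constructor might be called, based only on the signature above. The helper `build_combined_dataset`, the dataset name `my_corpus`, and the JSON path are hypothetical placeholders; the expected contents of the JSON file are not described here, so the helper is defined but not invoked.

```python
from typing import List, Tuple


def build_combined_dataset(
    datasets: List[Tuple[str, str]],
    registered_datasets: List[str],
    rank: int = 0,
    world_size: int = 1,
):
    """Construct a CombinedDataset for one process (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without ESPnet installed.
    from espnet2.speechlm.dataloader.dataset import CombinedDataset

    return CombinedDataset(
        datasets=datasets,
        registered_datasets=registered_datasets,
        rank=rank,
        world_size=world_size,
    )


# Placeholder arguments: the name and path below are assumptions for
# illustration, not files shipped with ESPnet.
example_kwargs = dict(
    datasets=[("my_corpus", "data/my_corpus/train.json")],
    registered_datasets=[],
    rank=0,  # this process's rank
    world_size=1,  # single-process run
)

# With ESPnet installed and a real JSON file in place, one would call:
# dataset = build_combined_dataset(**example_kwargs)
```

Passing explicit lists for `datasets` and `registered_datasets`, as above, avoids relying on the mutable `[]` defaults in the signature.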
property dataset_names : List[str]
Return list of all dataset names.
get_all_examples() → Dict[str, List[str]]
Return all examples as a dictionary mapping dataset names to sample IDs.
- Returns: Dictionary mapping dataset names to lists of sample IDs
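The documented return type can be consumed like any nested mapping. The sketch below uses a hand-written dictionary with the shape `Dict[str, List[str]]` in place of a real `get_all_examples()` result; the dataset names, sample IDs, and both helper functions are hypothetical.

```python
from typing import Dict, List, Tuple

# Stand-in for the value returned by get_all_examples(); names and IDs
# are made up for illustration.
all_examples: Dict[str, List[str]] = {
    "corpus_a": ["utt_0001", "utt_0002"],
    "corpus_b": ["utt_1001"],
}


def count_examples(examples: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the number of sample IDs per dataset name."""
    return {name: len(ids) for name, ids in examples.items()}


def flatten_examples(examples: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (dataset_name, sample_id) pairs in dictionary order."""
    return [(name, sid) for name, ids in examples.items() for sid in ids]


counts = count_examples(all_examples)
pairs = flatten_examples(all_examples)
```

Assuming the keys of the returned dictionary match `dataset_names`, the per-dataset counts give a quick check that every configured dataset actually contributed samples.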
