espnet3.components.data_organizer.DataOrganizer
class espnet3.components.data_organizer.DataOrganizer(train: List[DatasetConfig | Dict[str, Any]] | None = None, valid: List[DatasetConfig | Dict[str, Any]] | None = None, test: List[DatasetConfig | Dict[str, Any]] | None = None, preprocessor: Callable[[dict], dict] | None = None)
Bases: object
Organizes training, validation, and test datasets into a unified interface.
This class constructs combined datasets for training and validation, and individual named datasets for testing, optionally applying a transform and preprocessor per dataset.
- Parameters:
- train (Optional[List[Union[DatasetConfig, Dict[str, Any]]]]): A list of training dataset configuration objects.
- valid (Optional[List[Union[DatasetConfig, Dict[str, Any]]]]): A list of validation dataset configuration objects.
- test (Optional[List[Union[DatasetConfig, Dict[str, Any]]]]): A list of test dataset configurations, each with a name, a corresponding dataset, and an optional transform.
- preprocessor (Optional[Callable]): A global preprocessor function applied after each dataset's transform. If it is an instance of AbsPreprocessor, (uid, sample) is passed.
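The ordering described for preprocessor (per-dataset transform first, then the global preprocessor) can be illustrated with a small self-contained sketch. It does not import espnet3: `AbsPreprocessorStub` and `apply_pipeline` are hypothetical names made up for this illustration, and how the real class obtains the uid is an assumption.

```python
from typing import Any, Callable, Dict


class AbsPreprocessorStub:
    """Hypothetical stand-in for AbsPreprocessor; it is called with (uid, sample)."""

    def __call__(self, uid: str, sample: Dict[str, Any]) -> Dict[str, Any]:
        # Tag the sample with its uid so the call path is visible.
        return {**sample, "uid": uid}


def apply_pipeline(
    uid: str,
    raw: Dict[str, Any],
    transform: Callable[[Dict[str, Any]], Dict[str, Any]],
    preprocessor: Callable[..., Dict[str, Any]],
) -> Dict[str, Any]:
    """Run the per-dataset transform first, then the global preprocessor."""
    sample = transform(raw)
    if isinstance(preprocessor, AbsPreprocessorStub):
        # Preprocessor-style objects also receive the utterance id (assumed here).
        return preprocessor(uid, sample)
    # Plain callables receive only the sample dict.
    return preprocessor(sample)


def lowercase_text(sample: Dict[str, Any]) -> Dict[str, Any]:
    return {**sample, "text": sample["text"].lower()}


def add_length(sample: Dict[str, Any]) -> Dict[str, Any]:
    return {**sample, "length": len(sample["text"])}


plain = apply_pipeline("utt1", {"text": "HELLO"}, lowercase_text, add_length)
styled = apply_pipeline("utt1", {"text": "HELLO"}, lowercase_text, AbsPreprocessorStub())
```

In this sketch a plain function sees only the transformed sample, while the stub class also sees the uid, which mirrors the two call conventions the parameter description mentions.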
train
Combined dataset built from training configurations, or None if not provided.
- Type: CombinedDataset
valid
Combined dataset built from validation configurations, or None if not provided.
- Type: CombinedDataset
test_sets
Dictionary mapping test set names to DatasetWithTransform instances.
- Type: Dict[str, DatasetWithTransform]
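As a rough illustration of the name-to-dataset mapping, the sketch below builds such a dictionary from plain dict configurations. It is not the espnet3 implementation: `TransformedView`, `build_test_sets`, and the config keys "name", "dataset", and "transform" are assumptions chosen to match the parameter description.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class TransformedView:
    """Hypothetical stand-in for DatasetWithTransform."""

    def __init__(
        self,
        dataset: Sequence[Any],
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.dataset = dataset
        # Fall back to the identity when no transform is configured.
        self.transform = transform if transform is not None else (lambda x: x)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        # Apply the transform lazily, at access time.
        return self.transform(self.dataset[idx])


def build_test_sets(
    configs: Optional[List[Dict[str, Any]]],
) -> Dict[str, TransformedView]:
    """Map each test set name to its wrapped dataset; empty dict if omitted."""
    test_sets: Dict[str, TransformedView] = {}
    for cfg in configs or []:
        test_sets[cfg["name"]] = TransformedView(cfg["dataset"], cfg.get("transform"))
    return test_sets


test_sets = build_test_sets(
    [
        {"name": "test_clean", "dataset": ["a", "b"], "transform": str.upper},
        {"name": "test_other", "dataset": ["c"]},  # no transform configured
    ]
)
first_clean = test_sets["test_clean"][0]
```

Keeping test sets separate by name, rather than combining them as for train and valid, lets each one be evaluated and reported on its own.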
Raises:
- RuntimeError: If only one of train or valid is provided.
- RuntimeError: If train and valid are of mismatched types (e.g., one is CombinedDataset, the other is None).
- AssertionError: If preprocessor is not callable.
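The argument checks listed above can be sketched as a standalone function. This is a minimal illustration under stated assumptions, not the library's code: `check_organizer_args` is a hypothetical name, and treating preprocessor=None as acceptable is an assumption based on the parameter being optional.

```python
from typing import Any, Callable, List, Optional


def check_organizer_args(
    train: Optional[List[Any]],
    valid: Optional[List[Any]],
    preprocessor: Optional[Callable[..., Any]] = None,
) -> None:
    """Reject argument combinations that the Raises section rules out."""
    # train and valid must be given together or omitted together.
    if (train is None) != (valid is None):
        raise RuntimeError("train and valid must be provided together.")
    # Assumption: None means "no preprocessor"; anything else must be callable.
    assert preprocessor is None or callable(preprocessor), (
        "preprocessor must be callable."
    )


# Accepted: both splits provided, or a test-only setup with neither.
check_organizer_args(train=["cfg_a"], valid=["cfg_b"], preprocessor=len)
check_organizer_args(train=None, valid=None)
```

Calling `check_organizer_args(train=["cfg_a"], valid=None)` raises RuntimeError in this sketch, and passing a non-callable such as the string "not a function" as preprocessor triggers the AssertionError.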
NOTE
The DataOrganizer is designed to support both training and testing workflows:
- For training: provide both train and valid.
- For testing only: provide test and omit train / valid.
- All three (train, valid, test) can also be provided simultaneously.
If any of train, valid, or test is omitted, the corresponding attribute is set to None or left empty.
Example (training + validation):

```python
>>> organizer = DataOrganizer(
...     train=train_cfgs,
...     valid=valid_cfgs,
...     preprocessor=MyPreprocessor()
... )
>>> sample = organizer.train[0]
>>> # Only valid if a "test_clean" entry was also passed via test=.
>>> test_sample = organizer.test["test_clean"][0]
```

Example (testing only):

```python
>>> organizer = DataOrganizer(
...     test=test_cfgs,
...     preprocessor=MyPreprocessor()
... )
>>> test_sample = organizer.test["test_clean"][0]
```

property test
