espnet3.components.data.data_organizer.DataOrganizer
class espnet3.components.data.data_organizer.DataOrganizer(train: List[DatasetConfig | Dict[str, Any]] | None = None, valid: List[DatasetConfig | Dict[str, Any]] | None = None, test: List[DatasetConfig | Dict[str, Any]] | None = None, preprocessor: Callable[[dict], dict] | None = None)
Bases: object
Organizes training, validation, and test datasets into a unified interface.
This class builds combined datasets for training and validation and individually named datasets for testing, applying each dataset’s optional transform followed by an optional global preprocessor.
- Parameters:
- train (Optional[List[Union[DatasetConfig, Dict[str, Any]]]]) – A list of training dataset configuration objects.
- valid (Optional[List[Union[DatasetConfig, Dict[str, Any]]]]) – A list of validation dataset configuration objects.
- test (Optional[List[Union[DatasetConfig, Dict[str, Any]]]]) – A list of test dataset configurations, each with a name, a dataset, and an optional transform.
- preprocessor (Optional[Callable]) – A global preprocessor function applied after each dataset’s transform. If it is an instance of AbsPreprocessor, it is called with (uid, sample).
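The ordering described for `preprocessor` (per-dataset transform first, then the global preprocessor) can be sketched as follows. This is a minimal illustration under stated assumptions, not ESPnet’s implementation: `process_sample` is a hypothetical helper, and `AbsPreprocessor` below is a local stand-in for ESPnet’s class, assumed to be invoked as `preprocessor(uid, sample)`, while any other callable is assumed to receive only the sample.

```python
from typing import Any, Callable, Dict, Optional

Sample = Dict[str, Any]


class AbsPreprocessor:
    """Local stand-in for ESPnet's AbsPreprocessor (assumed call signature: (uid, data))."""

    def __call__(self, uid: str, data: Sample) -> Sample:
        raise NotImplementedError


def process_sample(
    uid: str,
    sample: Sample,
    transform: Optional[Callable[[Sample], Sample]] = None,
    preprocessor: Optional[Callable] = None,
) -> Sample:
    """Hypothetical helper: apply the per-dataset transform, then the global preprocessor."""
    # Step 1: the dataset-specific transform runs first.
    if transform is not None:
        sample = transform(sample)
    # Step 2: the global preprocessor runs on the transformed sample.
    if preprocessor is None:
        return sample
    if isinstance(preprocessor, AbsPreprocessor):
        # AbsPreprocessor-style objects also receive the utterance ID.
        return preprocessor(uid, sample)
    # Plain callables receive only the sample dict.
    return preprocessor(sample)


class TagWithUid(AbsPreprocessor):
    """Toy preprocessor that records the uid it was called with."""

    def __call__(self, uid: str, data: Sample) -> Sample:
        return {**data, "uid": uid}


def add_length(sample: Sample) -> Sample:
    """Toy per-dataset transform."""
    return {**sample, "length": len(sample["text"])}


out = process_sample("utt1", {"text": "hello"}, transform=add_length, preprocessor=TagWithUid())
print(out)  # {'text': 'hello', 'length': 5, 'uid': 'utt1'}
```

Dispatching on `isinstance` keeps plain functions usable as preprocessors while still giving ESPnet-style preprocessors access to the utterance ID.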
train
Combined dataset built from training configurations, or None if not provided.
- Type: CombinedDataset
valid
Combined dataset built from validation configurations, or None if not provided.
- Type: CombinedDataset
test_sets
Dictionary mapping test set names to DatasetWithTransform instances.
- Type: Dict[str, DatasetWithTransform]
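How `CombinedDataset` indexes across several configurations is not spelled out here. A common approach, assumed for illustration, is to concatenate the datasets end to end and map a global index to (dataset, local index) through cumulative lengths, in the same spirit as PyTorch’s `ConcatDataset`. `ConcatSketch` is hypothetical, and the real `CombinedDataset` may behave differently.

```python
import bisect
from itertools import accumulate
from typing import Any, List, Sequence, Tuple


class ConcatSketch:
    """Hypothetical stand-in for CombinedDataset: concatenates datasets end to end."""

    def __init__(self, datasets: List[Sequence[Any]]):
        self.datasets = datasets
        # cumulative_sizes[i] = total number of items in datasets[0..i] (inclusive)
        self.cumulative_sizes = list(accumulate(len(d) for d in datasets))

    def __len__(self) -> int:
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def locate(self, idx: int) -> Tuple[int, int]:
        """Map a global index to (dataset index, local index)."""
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # bisect_right skips past any dataset whose cumulative size equals idx,
        # which also skips empty datasets.
        ds_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        start = self.cumulative_sizes[ds_idx - 1] if ds_idx > 0 else 0
        return ds_idx, idx - start

    def __getitem__(self, idx: int) -> Any:
        ds_idx, local = self.locate(idx)
        return self.datasets[ds_idx][local]


combined = ConcatSketch([["a0", "a1"], ["b0", "b1", "b2"]])
print(len(combined), combined[1], combined[2], combined.locate(4))  # 5 a1 b0 (1, 2)
```

Binary search over cumulative sizes keeps each lookup at O(log k) for k datasets, which matters little for a handful of corpora but avoids a linear scan.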
Raises:
- RuntimeError – If only one of train or valid is provided.
- RuntimeError – If train and valid are of mismatched types (e.g., one is CombinedDataset, the other is None).
- AssertionError – If preprocessor is not callable.
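The checks listed above can be sketched as a standalone function. `validate_organizer_args` is hypothetical; the exact error messages and check order inside `DataOrganizer` are not documented here, so the sketch reproduces only the documented conditions.

```python
from typing import Any, Callable, Optional


def validate_organizer_args(
    train: Optional[Any] = None,
    valid: Optional[Any] = None,
    preprocessor: Optional[Callable] = None,
) -> None:
    """Hypothetical re-creation of the documented DataOrganizer argument checks."""
    # train and valid must be provided together or omitted together.
    if (train is None) != (valid is None):
        raise RuntimeError("train and valid must both be provided or both omitted")
    # A preprocessor, if given, must be callable (assert is skipped under `python -O`).
    if preprocessor is not None:
        assert callable(preprocessor), "preprocessor must be callable"


for kwargs in ({"train": ["t"], "valid": ["v"]}, {"train": ["t"]}, {"preprocessor": 42}):
    try:
        validate_organizer_args(**kwargs)
        print("ok")
    except (RuntimeError, AssertionError) as exc:
        print(type(exc).__name__)
# prints: ok, RuntimeError, AssertionError (one per line)
```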
NOTE
The DataOrganizer is designed to support both training and testing workflows:
- For training: provide both train and valid.
- For testing only: provide test and omit train / valid.
- All three (train, valid, test) can also be provided simultaneously.
If train, valid, or test is omitted, the corresponding attribute is set to None (train, valid) or left empty (test).
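For the testing workflow, the named test configurations become a dictionary keyed by name, which stays empty when test is omitted. A minimal sketch under assumptions: the config keys `name`, `dataset`, and `transform` and the helpers `WrappedDataset` and `build_test_sets` are hypothetical, real `DatasetConfig` objects may describe how to instantiate a dataset rather than hold one directly, and the global preprocessor is omitted for brevity.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class WrappedDataset:
    """Hypothetical stand-in for DatasetWithTransform: applies a transform on access."""

    def __init__(self, dataset: Sequence[Any], transform: Optional[Callable] = None):
        self.dataset = dataset
        self.transform = transform

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        item = self.dataset[idx]
        return self.transform(item) if self.transform is not None else item


def build_test_sets(test: Optional[List[Dict[str, Any]]] = None) -> Dict[str, WrappedDataset]:
    """Map each test config's name to a wrapped dataset; empty dict when test is omitted."""
    test_sets: Dict[str, WrappedDataset] = {}
    for cfg in test or []:
        # Assumed keys: "name" (required), "dataset" (required), "transform" (optional).
        test_sets[cfg["name"]] = WrappedDataset(cfg["dataset"], cfg.get("transform"))
    return test_sets


test_sets = build_test_sets([
    {"name": "test_clean", "dataset": ["hello", "world"], "transform": str.upper},
    {"name": "test_other", "dataset": ["foo"]},
])
print(sorted(test_sets), test_sets["test_clean"][0], test_sets["test_other"][0])
# ['test_clean', 'test_other'] HELLO foo
print(build_test_sets())  # {}
```

Keeping test sets separate rather than combining them lets each one be evaluated and reported under its own name.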
Example (training + validation):
```python
>>> organizer = DataOrganizer(
...     train=train_configs,
...     valid=valid_configs,
...     preprocessor=MyPreprocessor()
... )
>>> sample = organizer.train[0]
```
Example (testing only):
```python
>>> organizer = DataOrganizer(
...     test=test_configs,
...     preprocessor=MyPreprocessor()
... )
>>> test_sample = organizer.test["test_clean"][0]
```
Initialize DataOrganizer object.
property test
Get the dictionary of test datasets.
