espnet3.components.data.data_organizer.DatasetConfig
class espnet3.components.data.data_organizer.DatasetConfig(name: str | None = None, data_src: str | None = None, data_src_args: Dict[str, Any] | None = None, transform: Dict[str, Any] | None = None, split: str | None = None)
Bases: object
Configuration class for dataset metadata and construction.
This class encapsulates the fields needed to define and instantiate a dataset. It is used with Hydra to allow modular, flexible configuration via YAML or dictionaries.
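The following is a minimal sketch of how one dataset entry, loaded as a plain dictionary, maps onto the documented fields. `DatasetConfigSketch` is a hypothetical stand-in that only mirrors the signature shown above; it is not the espnet3 class, and it assumes the real class behaves like a plain dataclass whose fields all default to `None`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in mirroring the documented signature."""

    name: Optional[str] = None
    data_src: Optional[str] = None
    data_src_args: Optional[Dict[str, Any]] = None
    transform: Optional[Dict[str, Any]] = None
    split: Optional[str] = None


# One dataset entry as it might look after a YAML config is loaded into a dict.
entry = {
    "name": "custom",
    "data_src": "mini_an4/asr",
    "data_src_args": {"split": "test"},
}

# Keys in the dict become keyword arguments; omitted fields keep their default.
config = DatasetConfigSketch(**entry)
print(config.data_src)
print(config.transform)
```

Fields left out of the entry, such as `transform` and `split` here, stay `None` under this assumption.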
name
Name identifier for the dataset.
- Type: Optional[str]
data_src
Optional dataset source reference resolved via espnet3.components.data.dataset_module.
- Type: Optional[str]
data_src_args
Keyword arguments passed to the recipe Dataset class.
- Type: Optional[Dict[str, Any]]
transform
A dictionary for Hydra instantiation of a transform applied to each sample after loading.
- Type: Optional[Dict[str, Any]]
split
Optional split label kept for compatibility with configs that still carry it as metadata.
- Type: Optional[str]
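To illustrate what a target-based `transform` entry amounts to, the sketch below resolves a dotted path to a callable and applies it to a sample, using only the standard library. It is not Hydra's instantiation logic and not the espnet3 code path: `resolve_dotted_path` is a hypothetical helper, `string.capwords` stands in for a project-specific transform, the `target` key name follows the example entry on this page, and applying the callable to a single `text` field is an assumption made for illustration.

```python
import importlib
from typing import Any, Callable, Dict


def resolve_dotted_path(path: str) -> Callable[..., Any]:
    """Import `pkg.module.attr` and return `attr` (hypothetical helper)."""
    module_name, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Hypothetical transform entry; `string.capwords` is a stdlib stand-in for a
# project function such as `my_project.transforms.uppercase_transform`.
transform_cfg: Dict[str, Any] = {"target": "string.capwords"}

transform = resolve_dotted_path(transform_cfg["target"])

sample = {"text": "hello world"}
# Assumption: the transform is applied to one field of each loaded sample.
sample["text"] = transform(sample["text"])
print(sample)
```

Hydra's own `instantiate` handles more than this, for example passing extra keyword arguments from the dictionary, so treat the helper only as a way to read the configuration shape.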
Examples
**Recipe-backed dataset entry.**

```python
>>> config_dict = {
...     "name": "custom",
...     "data_src": "mini_an4/asr",
...     "data_src_args": {"split": "test"},
...     "transform": {
...         "target": "my_project.transforms.uppercase_transform",
...     },
... }
>>> config = DatasetConfig(**config_dict)
>>> config.data_src
'mini_an4/asr'
```

**Local recipe dataset entry.**

```python
>>> config = DatasetConfig(
...     name="local_eval",
...     data_src_args={"split": "eval"},
... )
>>> config.data_src is None
True
```
