ESPnet3 Dataset Configuration
This page explains the dataset section used across training, inference, and metrics configs.
Important
ESPnet3 dataset config is centered on DataOrganizer plus dataset reference entries. Do not assume each split directly instantiates a _target_ dataset class.
Basic shape
Most recipes use DataOrganizer and define train, valid, and test entries under dataset.
    dataset:
      _target_: espnet3.components.data.data_organizer.DataOrganizer
      recipe_dir: ${recipe_dir}
      train:
        - data_src_args:
            split: train
      valid:
        - data_src_args:
            split: valid
      test:
        - name: test
          data_src_args:
            split: test
What each dataset entry contains
Each dataset entry may contain:
- name
- data_src
- data_src_args
- transform
The main rule is:
- data_src_args goes to Dataset(...)
- name and transform stay in organizer space
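The rule above can be pictured with a small Python sketch. This is not DataOrganizer's actual implementation: `split_entry`, `ORGANIZER_KEYS`, and the toy `Dataset` class are hypothetical names used only for illustration, and the sketch assumes a dataset entry behaves like a plain mapping.

```python
from typing import Any, Dict, Tuple

# Keys that, per the rule above, stay in organizer space (illustrative set).
ORGANIZER_KEYS = ("name", "transform")


class Dataset:
    """Toy stand-in for a recipe's exported Dataset class (hypothetical)."""

    def __init__(self, split: str, **kwargs: Any) -> None:
        self.split = split
        self.extra = kwargs


def split_entry(entry: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate organizer-space fields from the kwargs passed to Dataset."""
    organizer_fields = {k: entry[k] for k in ORGANIZER_KEYS if k in entry}
    dataset_kwargs = dict(entry.get("data_src_args") or {})
    return organizer_fields, dataset_kwargs


entry = {"name": "test", "data_src_args": {"split": "test"}}
organizer_fields, dataset_kwargs = split_entry(entry)

# Only the contents of data_src_args reach the Dataset constructor.
dataset = Dataset(**dataset_kwargs)
```

In this sketch `name` never reaches the constructor, which is the practical consequence of keeping it in organizer space.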
data_src
data_src tells ESPnet3 where the dataset module comes from.
Supported forms:
Recipe tag
    data_src: mini_an4/asr
This resolves to:
    egs3.mini_an4.asr.dataset
Explicit module path
    data_src: egs3.mini_an4.asr.dataset
Omit data_src
If data_src is omitted, ESPnet3 loads:
    ${recipe_dir}/dataset/__init__.py
This is the normal pattern for recipe-local dataset code.
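The three forms of data_src can be summarized in a short Python sketch. The helper `describe_data_src` is hypothetical and does not come from ESPnet3. It also assumes that a slash is what distinguishes a recipe tag from a module path, which this page does not state.

```python
from pathlib import Path
from typing import Optional


def describe_data_src(data_src: Optional[str], recipe_dir: str) -> str:
    """Return the module path or file that a data_src value points to.

    Illustrative only: mirrors the three documented forms, not ESPnet3's code.
    """
    if data_src is None:
        # Omitted: fall back to the recipe-local dataset package.
        return (Path(recipe_dir) / "dataset" / "__init__.py").as_posix()
    if "/" in data_src:
        # Recipe tag such as "mini_an4/asr" -> "egs3.mini_an4.asr.dataset".
        # Assumption: a slash marks the value as a recipe tag.
        return "egs3." + data_src.strip("/").replace("/", ".") + ".dataset"
    # Otherwise treat the value as an explicit module path.
    return data_src


tag_target = describe_data_src("mini_an4/asr", "egs3/mini_an4/asr")
explicit_target = describe_data_src("egs3.mini_an4.asr.dataset", "egs3/mini_an4/asr")
local_target = describe_data_src(None, "egs3/mini_an4/asr")
```

Here the recipe tag and the explicit module path end up naming the same module, while the omitted form points at the file inside the recipe directory.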
data_src_args
data_src_args is forwarded directly to the exported Dataset class.
    data_src_args:
      split: test-clean
      recipe_dir: ${recipe_dir}
      source_dir: ${dataset_dir}
This becomes:
    Dataset(split="test-clean", recipe_dir=recipe_dir, source_dir=dataset_dir)
Split meanings
- train is used by collect_stats and train
- valid is used by collect_stats and for validation during training
- test is used by infer and may also be used by measure
For inference, each dataset.test[].name becomes the output subdirectory under ${inference_dir}.
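As a rough illustration of the naming rule for inference outputs, the sketch below maps test entries to output directories. The function `inference_output_dirs`, the `exp/inference` path, and the `test-other` entry are made-up example values, not ESPnet3 APIs or defaults.

```python
from pathlib import Path
from typing import Dict, List


def inference_output_dirs(inference_dir: str, test_entries: List[dict]) -> Dict[str, Path]:
    """Map each test entry name to its output subdirectory (illustrative)."""
    return {entry["name"]: Path(inference_dir) / entry["name"] for entry in test_entries}


# Example test entries; the names and the inference_dir value are hypothetical.
test_entries = [
    {"name": "test-clean", "data_src_args": {"split": "test-clean"}},
    {"name": "test-other", "data_src_args": {"split": "test-other"}},
]
dirs = inference_output_dirs("exp/inference", test_entries)
```

Since the name doubles as a directory name, it makes sense to give every test entry a distinct name.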
preprocessor
dataset.preprocessor is optional. When set, it is instantiated by DataOrganizer and applied after dataset loading.
Relationship to create_dataset
create_dataset is separate from dataset.
- create_dataset prepares manifests or raw assets
- dataset describes how later stages load them
If you want the full dataset module contract, see Dataset references and builders.
