ESPnet3 Training Configuration
This page explains the training.yaml schema used by create_dataset, train_tokenizer, collect_stats, and train.
Important
training.yaml usually drives more than training itself. It is often the main config for dataset preparation, tokenizer work, stats collection, and model fitting.
Minimum required keys
Typical training runs need:
- exp_dir
- dataset
- dataloader
- trainer
- model or task
- optimizer and scheduler, or optimizers and schedulers
Common optional keys:
- stats_dir
- create_dataset
- tokenizer
- parallel
- best_model_criterion
- fit
- seed
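As a rough illustration, the required-key rules above can be checked with a few lines of Python. This is a sketch, not ESPnet3's own validation: the function name `missing_training_keys` is hypothetical, and treating a key that is present but set to null as missing is an assumption.

```python
# Keys that a typical training run needs unconditionally.
REQUIRED_KEYS = ("exp_dir", "dataset", "dataloader", "trainer")


def missing_training_keys(cfg: dict) -> list:
    """Return descriptions of required entries that are absent from cfg."""

    def is_set(key: str) -> bool:
        # Assumption: an empty YAML entry (None) counts as "not set".
        return cfg.get(key) is not None

    problems = [key for key in REQUIRED_KEYS if not is_set(key)]

    # Either a model block or a task path must be given.
    if not (is_set("model") or is_set("task")):
        problems.append("model or task")

    # Either the single-optimizer pair or the multiple-optimizer pair.
    single = is_set("optimizer") and is_set("scheduler")
    multiple = is_set("optimizers") and is_set("schedulers")
    if not (single or multiple):
        problems.append("optimizer and scheduler, or optimizers and schedulers")

    return problems
```

Calling `missing_training_keys(cfg)` on a loaded config dictionary returns an empty list when every required entry is present.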
Basic shape
num_device: 1
num_nodes: 1
task:
recipe_dir: .
data_dir: ${recipe_dir}/data
exp_tag: ${self_name:}
exp_dir: ${recipe_dir}/exp/${exp_tag}
stats_dir: ${recipe_dir}/exp/stats
dataset_dir: /path/to/your/dataset
create_dataset:
  func: src.creating_dataset.create_dataset
  dataset_dir: ${dataset_dir}
dataset:
  _target_: espnet3.components.data.data_organizer.DataOrganizer
  recipe_dir: ${recipe_dir}
  train:
  valid:
  test:
  preprocessor:
tokenizer:
model:
optimizer:
  _target_: torch.optim.Adam
  lr: 0.002
scheduler:
  _target_: espnet2.schedulers.warmup_lr.WarmupLR
  warmup_steps: 15000
parallel:
  env: local
  n_workers: 1
dataloader:
  collate_fn:
    _target_: espnet2.train.collate_fn.CommonCollateFn
    int_pad_value: -1
  train:
    total_shards: 1
    dist_world_size: 1
    iter_factory:
      _target_: espnet2.iterators.sequence_iter_factory.SequenceIterFactory
      shuffle: true
      collate_fn: ${dataloader.collate_fn}
      batches:
        type: unsorted
        shape_files:
          - ${stats_dir}/train/feats_shape
        batch_size: 4
        batch_bins: 4000000
  valid:
    total_shards: 1
    dist_world_size: 1
    iter_factory:
      _target_: espnet2.iterators.sequence_iter_factory.SequenceIterFactory
      shuffle: false
      collate_fn: ${dataloader.collate_fn}
      batches:
        type: ${dataloader.train.iter_factory.batches.type}
        shape_files:
          - ${stats_dir}/valid/feats_shape
        batch_size: ${dataloader.train.iter_factory.batches.batch_size}
        batch_bins: ${dataloader.train.iter_factory.batches.batch_bins}
trainer:
  accelerator: auto
  devices: ${num_device}
  num_nodes: ${num_nodes}
Main sections
| Section | Description |
|---|---|
| task | Optional ESPnet task class path used by get_espnet_model(...) |
| recipe_dir, exp_dir, stats_dir, dataset_dir | Path scaffold |
| create_dataset | Optional hook for the create_dataset stage |
| dataset | Dataset organizer and split definitions |
| tokenizer | Optional tokenizer config used by train_tokenizer |
| model | Model config passed to ESPnet task logic or Hydra instantiate |
| optimizer / scheduler | Single-optimizer path |
| optimizers / schedulers | Multiple-optimizer path |
| parallel | Dask config for collect_stats and runner-based helpers |
| dataloader | Iterator, collate, batching, and sharding |
| trainer | lightning.Trainer(...) settings |
task vs. model
If task is set, ESPnet3 uses the ESPnet2 task-side model path and passes model into get_espnet_model(...).
If task is unset, ESPnet3 instantiates model directly through Hydra.
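The branch described above can be pictured with a short, self-contained sketch. It is not ESPnet3's implementation: `build_model` and `instantiate_target` are hypothetical names, `instantiate_target` is a much-simplified stand-in for Hydra's instantiate (no recursion, no partials), and `build_from_task` is a placeholder callable for the `get_espnet_model(...)` path, whose real signature is not assumed here.

```python
import importlib


def instantiate_target(node: dict):
    """Simplified stand-in for Hydra instantiate: import _target_ and call it."""
    kwargs = {key: value for key, value in node.items() if key != "_target_"}
    module_name, _, attr_name = node["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**kwargs)


def build_model(cfg: dict, build_from_task):
    """Choose the model-building path based on whether `task` is set."""
    if cfg.get("task"):
        # Task path: the task class path and the model block are handed
        # to the task-side builder (placeholder for get_espnet_model).
        return build_from_task(cfg["task"], cfg["model"])
    # No task: the model block is instantiated directly from its _target_.
    return instantiate_target(cfg["model"])
```

The point of the sketch is only the dispatch: with `task` set, the `model` block is data for the task logic; without it, the `model` block must carry its own `_target_`.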
create_dataset
create_dataset.func is resolved and called with the remaining fields in the create_dataset block as keyword arguments.
Use this when the recipe needs to generate manifests or prepare dataset assets before later stages run.
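A minimal sketch of that calling convention, assuming `func` is a dotted import path and every other field is forwarded as a keyword argument. The helper name `run_create_dataset` is hypothetical and the actual ESPnet3 resolver may differ in detail.

```python
import importlib


def run_create_dataset(create_dataset_cfg: dict):
    """Resolve `func` from its dotted path and call it with the other fields."""
    kwargs = dict(create_dataset_cfg)  # copy so the config is not mutated
    dotted_path = kwargs.pop("func")
    module_name, _, func_name = dotted_path.rpartition(".")
    func = getattr(importlib.import_module(module_name), func_name)
    # Everything except `func` becomes a keyword argument.
    return func(**kwargs)
```

For example, with `func: builtins.dict` and `dataset_dir: /path/to/your/dataset`, the call is equivalent to `dict(dataset_dir="/path/to/your/dataset")`. A recipe function therefore needs to accept each extra field in the block as a named parameter.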
dataset
Training uses DataOrganizer. Split entries are defined under train, valid, and optionally test.
For the full dataset reference contract, see Dataset Config and Dataset References.
Optimizers and schedulers
ESPnet3 supports:
- single optimizer via optimizer + scheduler
- multiple optimizers via optimizers + schedulers
Single-optimizer example:
optimizer:
  _target_: torch.optim.Adam
  lr: 0.002
scheduler:
  _target_: espnet2.schedulers.warmup_lr.WarmupLR
  warmup_steps: 15000
For the multiple-optimizer contract, see Optimizer Configuration and Multiple Optimizers.
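To get a feel for what lr and warmup_steps mean together in the single-optimizer example, the following sketch evaluates the schedule as documented for ESPnet2's WarmupLR, where the rate rises linearly during warmup and then decays with the inverse square root of the step. The function `warmup_lr` is a standalone illustration rather than the scheduler class itself, and it assumes steps are counted from 1.

```python
def warmup_lr(base_lr: float, warmup_steps: int, step: int) -> float:
    """Learning rate at a given step (step >= 1) under the warmup schedule."""
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


# With the values from the example config, the peak is reached at step 15000.
for step in (1, 7500, 15000, 60000):
    print(step, warmup_lr(0.002, 15000, step))
```

Under this formula the configured lr is the peak value reached at the end of warmup, not the starting value.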
Dataloader
The current template uses SequenceIterFactory by default. You can also set iter_factory: null and use standard DataLoader-style fields such as batch_size, num_workers, and shuffle.
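The two modes can be told apart by whether iter_factory is set. The sketch below assumes the DataLoader-style fields sit directly in the split block (for example under dataloader.train), which the text above does not spell out. The helper `describe_loader` and the field tuple are hypothetical.

```python
# DataLoader-style fields named in the text; other fields may also apply.
DATALOADER_FIELDS = ("batch_size", "num_workers", "shuffle")


def describe_loader(split_cfg: dict):
    """Return which loading path a split config selects, plus its settings."""
    iter_factory = split_cfg.get("iter_factory")
    if iter_factory is not None:
        # Iterator-factory path, e.g. SequenceIterFactory from the template.
        return "iter_factory", iter_factory
    # iter_factory is null or absent: collect plain DataLoader-style fields.
    kwargs = {key: split_cfg[key] for key in DATALOADER_FIELDS if key in split_cfg}
    return "dataloader", kwargs
```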
Trainer
Most fields under trainer are passed directly to lightning.Trainer(...).
This is also where model-training parallelism belongs, for example:
- devices
- num_nodes
- strategy
Use parallel only for Dask-backed helper execution, not for Lightning DDP.
