Config diff
ESPnet3 is much more config-driven than ESPnet2.
If you are porting a recipe, this usually means:
- move shell variables into YAML
- split one large config surface into several files
- keep the model part mostly intact
- rewrite dataset, dataloader, and inference sections around ESPnet3 concepts
The biggest change
In ESPnet2, you usually combine:
- shell options in `asr.sh`
- model config in `conf/train_asr_*.yaml`
- decode config in `conf/decode_asr.yaml`
In ESPnet3, those concerns are split across:
- `training.yaml`
- `inference.yaml`
- `metrics.yaml`
- `publication.yaml`
- `demo.yaml`
So the first migration step is not "rename one file". It is "split one ESPnet2 config surface into several ESPnet3 configs".
File mapping
| ESPnet2 | ESPnet3 |
|---|---|
| `conf/train_asr_*.yaml` | `conf/training.yaml` |
| `conf/decode_asr.yaml` | `conf/inference.yaml` |
| shell-side scoring options | `conf/metrics.yaml` |
| shell-side packing / upload settings | `conf/publication.yaml` |
| no direct equivalent | `conf/demo.yaml` |
CLI flag mapping
| ESPnet2 shell side | ESPnet3 |
|---|---|
| `--asr_config` | `--training_config` |
| `--inference_config` | `--inference_config` |
| shell scoring stage options | `--metrics_config` |
| shell packing/upload options | `--publication_config` |
| no direct equivalent | `--demo_config` |
Important
ESPnet3 expects canonical file and flag names. Use training.yaml, inference.yaml, metrics.yaml, publication.yaml, demo.yaml, and the matching --*_config flags.
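Because the file names and flags pair up one-to-one, the flag list can be derived mechanically. The following is a minimal sketch, not part of ESPnet3: the helper name `config_flags` and the default `conf` directory are assumptions, and the launcher command that would consume these flags is deliberately left out.

```python
from pathlib import PurePosixPath

# Canonical ESPnet3 config stems, taken from the tables above.
CANONICAL_STEMS = ("training", "inference", "metrics", "publication", "demo")


def config_flags(conf_dir="conf", stems=CANONICAL_STEMS):
    """Pair each canonical config file with its matching --*_config flag.

    Hypothetical helper: it only builds the argument list and does not
    check that the files exist or know which script receives them.
    """
    flags = []
    for stem in stems:
        if stem not in CANONICAL_STEMS:
            raise ValueError(f"not a canonical ESPnet3 config name: {stem!r}")
        path = PurePosixPath(conf_dir) / f"{stem}.yaml"
        flags += [f"--{stem}_config", str(path)]
    return flags


print(config_flags(stems=("training", "inference")))
```

Rejecting non-canonical stems such as `decode` makes a leftover ESPnet2 name fail early instead of being silently passed along.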
What you can usually copy almost as-is
The safest part to port from ESPnet2 is the model definition itself.
For many ASR recipes, this means:
- `encoder`
- `encoder_conf`
- `decoder`
- `decoder_conf`
- `model_conf`
- `optim` / `optim_conf`
- `scheduler` / `scheduler_conf`
These often map directly into the model, optimizer, and scheduler sections of training.yaml.
Model migration
ESPnet2 style
Typical ESPnet2 training config:
```yaml
encoder: transformer
encoder_conf:
  output_size: 256
  attention_heads: 4
decoder: transformer
decoder_conf:
  attention_heads: 4
model_conf:
  ctc_weight: 0.3
optim: adam
optim_conf:
  lr: 0.001
scheduler: warmuplr
scheduler_conf:
  warmup_steps: 2500
```

ESPnet3 style
In ESPnet3, there are two main paths.
Reuse the ESPnet2 task path
This is the easiest migration path.
Set task, then put the old task-side model config under model:.
```yaml
task: espnet2.tasks.asr.ASRTask
model:
  encoder: transformer
  encoder_conf:
    output_size: 256
    attention_heads: 4
  decoder: transformer
  decoder_conf:
    attention_heads: 4
  model_conf:
    ctc_weight: 0.3
```

Then migrate optimizer and scheduler into Hydra-style sections:
```yaml
optimizer:
  _target_: torch.optim.Adam
  lr: 0.001
scheduler:
  _target_: espnet2.schedulers.warmup_lr.WarmupLR
  warmup_steps: 2500
```

This is usually the best first port.
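The task-bridge port is mechanical enough to express as a small dictionary transform. The sketch below is illustrative only: `port_model_section` is a hypothetical helper, and the lookup tables contain just the two names that appear in the example on this page (`adam` and `warmuplr`). Any other optimizer or scheduler would need its `_target_` filled in by hand.

```python
# Lookup tables limited to the names used in this page's example.
OPTIMIZER_TARGETS = {"adam": "torch.optim.Adam"}
SCHEDULER_TARGETS = {"warmuplr": "espnet2.schedulers.warmup_lr.WarmupLR"}

# ESPnet2 top-level keys that move under `model:` on the task-bridge path.
MODEL_KEYS = ("encoder", "encoder_conf", "decoder", "decoder_conf", "model_conf")


def _hydra_block(name, conf, targets, kind):
    """Turn an ESPnet2 (name, *_conf) pair into a Hydra-style block."""
    if name not in targets:
        raise KeyError(f"no known _target_ for {kind} {name!r}")
    return {"_target_": targets[name], **(conf or {})}


def port_model_section(espnet2_cfg, task="espnet2.tasks.asr.ASRTask"):
    """Build the task/model/optimizer/scheduler part of training.yaml as a dict."""
    ported = {
        "task": task,
        "model": {k: espnet2_cfg[k] for k in MODEL_KEYS if k in espnet2_cfg},
    }
    if "optim" in espnet2_cfg:
        ported["optimizer"] = _hydra_block(
            espnet2_cfg["optim"], espnet2_cfg.get("optim_conf"),
            OPTIMIZER_TARGETS, "optimizer",
        )
    if "scheduler" in espnet2_cfg:
        ported["scheduler"] = _hydra_block(
            espnet2_cfg["scheduler"], espnet2_cfg.get("scheduler_conf"),
            SCHEDULER_TARGETS, "scheduler",
        )
    return ported


espnet2_cfg = {
    "encoder": "transformer",
    "encoder_conf": {"output_size": 256, "attention_heads": 4},
    "decoder": "transformer",
    "decoder_conf": {"attention_heads": 4},
    "model_conf": {"ctc_weight": 0.3},
    "optim": "adam",
    "optim_conf": {"lr": 0.001},
    "scheduler": "warmuplr",
    "scheduler_conf": {"warmup_steps": 2500},
}
ported = port_model_section(espnet2_cfg)
```

Raising on an unknown name is a deliberate choice here: guessing a `_target_` path would produce a config that only fails later at instantiation time.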
Instantiate a pure ESPnet3 model directly
If you are no longer using the ESPnet2 task bridge, model becomes a direct Hydra instantiation block:
```yaml
model:
  _target_: my_package.models.MyModel
  hidden_size: 256
```

For an ESPnet2-to-ESPnet3 migration, however, start with the first path unless you already need a custom model wrapper.
Training Config
See where model, optimizer, scheduler, dataloader, and trainer settings live.
Model Components
Compare the ESPnet2 task path with direct model instantiation.
Optimizer Config
See Hydra-style optimizer and scheduler configuration.
Dataloader migration
This is the second big change.
In ESPnet2, you often have top-level fields such as:
- `batch_type`
- `batch_size`
- `accum_grad`
- sometimes shell-side `num_splits_*`
In ESPnet3, batching moves into dataloader.
ESPnet2 style
```yaml
batch_type: folded
batch_size: 64
accum_grad: 1
max_epoch: 200
```

ESPnet3 style
```yaml
dataloader:
  train:
    iter_factory:
      shuffle: true
      batches:
        type: folded
        shape_files:
          - ${stats_dir}/train/feats_shape
        batch_size: 64
trainer:
  accumulate_grad_batches: 1
  max_epochs: 200
```

Practical rewrite rule
Use this mapping:
| ESPnet2 | ESPnet3 |
|---|---|
| `batch_type` | `dataloader.*.iter_factory.batches.type` |
| `batch_size` | `dataloader.*.iter_factory.batches.batch_size` |
| `accum_grad` | `trainer.accumulate_grad_batches` |
| `max_epoch` | `trainer.max_epochs` |
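The mapping table can be applied as a plain dictionary rewrite. This is a minimal sketch under two assumptions: the helper name `port_batching` is hypothetical, and the `*` in `dataloader.*` is expanded only to the splits passed in (just `train` by default). `shape_files` has no ESPnet2 top-level counterpart in the table, so it is left for manual editing.

```python
def port_batching(espnet2_cfg, splits=("train",)):
    """Move ESPnet2 batching and epoch knobs to their ESPnet3 locations."""
    # batch_type / batch_size -> dataloader.<split>.iter_factory.batches.*
    batches = {}
    if "batch_type" in espnet2_cfg:
        batches["type"] = espnet2_cfg["batch_type"]
    if "batch_size" in espnet2_cfg:
        batches["batch_size"] = espnet2_cfg["batch_size"]

    # accum_grad / max_epoch -> trainer.*
    trainer = {}
    if "accum_grad" in espnet2_cfg:
        trainer["accumulate_grad_batches"] = espnet2_cfg["accum_grad"]
    if "max_epoch" in espnet2_cfg:
        trainer["max_epochs"] = espnet2_cfg["max_epoch"]

    # Each split gets its own copy so later edits do not leak across splits.
    dataloader = {
        split: {"iter_factory": {"batches": dict(batches)}} for split in splits
    }
    return {"dataloader": dataloader, "trainer": trainer}


old = {"batch_type": "folded", "batch_size": 64, "accum_grad": 1, "max_epoch": 200}
new = port_batching(old)
```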
Dataloader
See how iter_factory, samplers, batch samplers, and collate functions work.
Stats Collection
See where feats_shape files come from before folded batching uses them.
Train Stage
See how training reads training.yaml and launches the trainer.
Path and experiment settings
ESPnet3 makes many path settings explicit.
This is one of the biggest "extra config" additions compared with ESPnet2.
Typical training.yaml path scaffold:
```yaml
recipe_dir: .
data_dir: ${recipe_dir}/data
exp_tag: ${self_name:}
exp_dir: ${recipe_dir}/exp/${exp_tag}
stats_dir: ${recipe_dir}/exp/stats
dataset_dir: /path/to/your/dataset
```

ESPnet2 often kept these in:
- shell vars like `expdir`, `dumpdir`
- recipe-local assumptions
- stage-local defaults
ESPnet3 prefers making them visible in config.
Main new path concepts
| ESPnet3 key | Why it exists |
|---|---|
| `recipe_dir` | anchor for local modules and relative paths |
| `exp_tag` | experiment naming |
| `exp_dir` | run output root |
| `stats_dir` | collect-stats outputs |
| `dataset_dir` | single shared dataset root across stages |
| `inference_dir` | inference output root |
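To make the dependencies between these keys concrete, the scaffold can be mirrored in plain Python. This is only a sketch of how the derived paths follow from the anchors. It does not use the real config resolver, and the values standing in for `${self_name:}` (`exp_tag` and `inference_name` below) are assumptions, since this page does not define what that resolver returns.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PathScaffold:
    """Plain-Python mirror of the path keys in the YAML scaffold."""

    recipe_dir: str = "."
    exp_tag: str = "my_experiment"      # assumed value; the YAML uses ${self_name:}
    inference_name: str = "inference"   # assumed stand-in for ${self_name:}

    @property
    def data_dir(self):
        return str(PurePosixPath(self.recipe_dir) / "data")

    @property
    def exp_dir(self):
        return str(PurePosixPath(self.recipe_dir) / "exp" / self.exp_tag)

    @property
    def stats_dir(self):
        # Does not depend on exp_tag, matching ${recipe_dir}/exp/stats.
        return str(PurePosixPath(self.recipe_dir) / "exp" / "stats")

    @property
    def inference_dir(self):
        return str(PurePosixPath(self.exp_dir) / self.inference_name)


paths = PathScaffold(recipe_dir="/work/recipe", exp_tag="run1")
print(paths.exp_dir, paths.stats_dir, paths.inference_dir)
```

One thing the sketch makes visible: in the scaffold shown on this page, `stats_dir` is shared by every experiment under the same recipe, while `exp_dir` and `inference_dir` change with `exp_tag`.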
Config Overview
See how config files, defaults, and path resolvers fit together.
System and Stages
See which config object each stage receives.
Recipe Structure
See where conf, data, exp, and src files live in ESPnet3 recipes.
Dataset migration
This is a major structural change.
In ESPnet2, data flow often depends on:
- `data/<split>/wav.scp`, `text`, `utt2spk`
- shell stage logic in `local/data.sh`
In ESPnet3, dataset access is described in config through DataOrganizer.
ESPnet3 training dataset shape
```yaml
dataset:
  train:
    - data_src_args:
        split: train
  valid:
    - data_src_args:
        split: valid
  test:
    - name: test
      data_src_args:
        split: test
  preprocessor:
```

New concepts to learn
| ESPnet3 concept | Meaning |
|---|---|
| `data_src` | where the dataset class comes from |
| `data_src_args` | kwargs passed to `Dataset(...)` |
| `name` | logical test-set name |
| `preprocessor` | shared preprocessing layer |
| `recipe_dir` | allows local `dataset/__init__.py` resolution |
Migration rule
If ESPnet2 used local/data.sh plus manifest-style files, the ESPnet3 version usually becomes:
- `dataset/builder.py` for source prep and manifest generation
- `dataset/__init__.py` or `dataset/dataset.py` for `Dataset`
- `dataset:` entries in `training.yaml` and `inference.yaml`
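As a rough illustration of the `Dataset` side of this rule, here is a minimal map-style class that accepts `split` as a keyword argument, the way `data_src_args` is described above. Everything else is an assumption made for the sketch rather than the ESPnet3 contract: the TSV manifest layout (`utt_id`, `wav_path`, `text`), the `manifest_dir` parameter, and the keys of the returned item.

```python
import csv
from pathlib import Path


class Dataset:
    """Map-style dataset over a manifest written by a builder step.

    Assumed layout (hypothetical): <manifest_dir>/<split>.tsv with a header
    row and the columns utt_id, wav_path, text.
    """

    def __init__(self, split, manifest_dir="data/manifests"):
        manifest = Path(manifest_dir) / f"{split}.tsv"
        with manifest.open(newline="", encoding="utf-8") as f:
            self.rows = list(csv.DictReader(f, delimiter="\t"))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, index):
        row = self.rows[index]
        # Audio decoding is omitted; a real dataset would load wav_path here.
        return {
            "utt_id": row["utt_id"],
            "wav_path": row["wav_path"],
            "text": row["text"],
        }
```

With this shape, a config entry such as `data_src_args: {split: train}` corresponds to calling `Dataset(split="train")`.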
Dataset Config
See the YAML format for train, valid, test, data_src, and data_src_args.
DataOrganizer
See how dataset entries become train, valid, and named test splits.
Data Pipeline
See how local/data.sh work maps to builder.py and dataset.py.
Trainer settings
ESPnet2 top-level training knobs often move into trainer:.
Typical examples:
| ESPnet2 | ESPnet3 |
|---|---|
| `max_epoch` | `trainer.max_epochs` |
| `accum_grad` | `trainer.accumulate_grad_batches` |
| distributed shell vars | `trainer.devices`, `trainer.num_nodes`, `trainer.strategy` |
So in ESPnet3, model-training parallelism belongs mainly to trainer, not to the Dask parallel block.
Trainer
See the ESPnet3 Lightning trainer wrapper and trainer config surface.
Multi-GPU Guide
Compare trainer parallelism with provider/runner parallelism.
Training Config
See the trainer block inside the full training.yaml schema.
Parallel config is new
ESPnet2 cluster behavior is usually shell-driven. ESPnet3 adds an explicit parallel: config block.
Typical example:
```yaml
parallel:
  env: local
  n_workers: 1
```

Or:

```yaml
parallel:
  env: slurm
  n_workers: 16
  options:
    queue: batch
    cores: 4
    memory: 32GB
```

Use this for Dask-backed helper execution such as:
- `collect_stats`
- provider/runner workloads
- inference runner parallelism
Do not confuse it with Lightning DDP settings under trainer.
Parallel Config
See env, n_workers, and backend options for local, GPU, and cluster execution.
Parallel Runtime
See provider/runner execution for collect_stats, inference, and fan-out work.
Cluster Migration
See what replaces nj, run.pl, queue.pl, and shell scheduler options.
Inference is not "decode config" anymore
This is one of the most important differences.
In ESPnet2, decode_asr.yaml is often just decoding hyperparameters:
```yaml
beam_size: 10
ctc_weight: 0.3
lm_weight: 0.1
penalty: 0.0
```

In ESPnet3, inference.yaml is much broader. It is not only beam-search settings.
It also defines:
- dataset test splits
- output directory
- provider
- runner
- input selection
- output formatting
- optional artifact writing
- optional parallel backend
ESPnet3 inference shape
```yaml
recipe_dir: .
exp_tag:
exp_dir: ${recipe_dir}/exp/${exp_tag}
inference_dir: ${exp_dir}/${self_name:}
dataset:
  _target_: espnet3.components.data.data_organizer.DataOrganizer
  recipe_dir: ${recipe_dir}
  test:
    - name: test
      data_src_args:
        split: test
parallel:
  env: local
  n_workers: 1
model:
input_key: speech
output_fn: src.inference.build_output
provider:
  _target_: espnet3.systems.base.inference_provider.InferenceProvider
runner:
  _target_: espnet3.systems.base.inference_runner.InferenceRunner
```

The main migration rule for decoding
Think of ESPnet3 inference as:
- old decode config
- plus dataset selection
- plus output writing contract
- plus runtime execution backend
all combined in one file.
What to do with old decode hyperparameters
Old ESPnet2 decode hyperparameters such as:
- `beam_size`
- `ctc_weight`
- `lm_weight`
- `penalty`
- `maxlenratio`
- `minlenratio`
usually do not disappear.
They normally move under the inference-side model config, because the model in ESPnet3 is often instantiated directly from inference.yaml.
Conceptually:
```yaml
model:
  _target_: ...
  beam_size: 10
  ctc_weight: 0.3
  lm_weight: 0.1
  penalty: 0.0
```

So the beam-search knobs still exist, but they no longer define the whole inference config by themselves.
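The "old decode config plus everything else" rule can be sketched as a function that wraps an ESPnet2 decode dictionary into an ESPnet3-shaped one. This is an illustration, not the way ESPnet3 builds its configs: `build_inference_config` is a hypothetical helper, `my_package.inference.MyDecoder` is a placeholder target, and only sections shown on this page are filled in (provider, runner, and output settings are omitted).

```python
def build_inference_config(decode_cfg, model_target, test_names=("test",)):
    """Wrap ESPnet2 decode hyperparameters into an ESPnet3-shaped dict.

    model_target is whatever class inference should instantiate; this
    sketch does not choose one.
    """
    return {
        # Dataset selection: one named entry per test split.
        "dataset": {
            "test": [
                {"name": name, "data_src_args": {"split": name}}
                for name in test_names
            ],
        },
        # Runtime backend: the local single-worker setting from this page.
        "parallel": {"env": "local", "n_workers": 1},
        # Old decode hyperparameters become model-side arguments.
        "model": {"_target_": model_target, **decode_cfg},
    }


decode_cfg = {"beam_size": 10, "ctc_weight": 0.3, "lm_weight": 0.1, "penalty": 0.0}
inference_cfg = build_inference_config(
    decode_cfg,
    model_target="my_package.inference.MyDecoder",  # placeholder, not a real class
)
```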
Inference Config
See how dataset, provider, runner, model, and outputs fit in inference.yaml.
Inference Stage
See how the infer stage reads inference.yaml and writes outputs.
Inference Provider
See the provider contract used by parallel inference jobs.
decode vs inference
ESPnet2 naming often uses decode. ESPnet3 docs and configs prefer inference.
Use these newer names when porting:
- `inference.yaml`, not `decode_asr.yaml`
- `inference_dir`, not `decode_dir`
- `infer` stage implementation, not ad-hoc decode shell logic
Metrics config is also split out
In ESPnet2, scoring is often a continuation of decoding shell stages.
In ESPnet3, metric computation moves into metrics.yaml.
Typical shape:
```yaml
metrics:
  - metric:
      _target_: espnet3.systems.asr.metrics.wer.WER
      ref_key: ref
      hyp_key: hyp
```

This means:
- inference writes structured outputs first
- metrics read those outputs later
So scoring is no longer just a shell postprocess step.
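The two-step flow (inference writes records, a metric reads them by key) can be illustrated with a small stand-in. `SimpleWER` below is a hypothetical class written for this sketch, not the `espnet3.systems.asr.metrics.wer.WER` implementation, and the record format (a list of dicts with whitespace-tokenized strings) is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(
                min(
                    prev[j] + 1,             # deletion
                    curr[j - 1] + 1,         # insertion
                    prev[j - 1] + (r != h),  # substitution or match
                )
            )
        prev = curr
    return prev[-1]


class SimpleWER:
    """Corpus-level word error rate over records selected by key."""

    def __init__(self, ref_key="ref", hyp_key="hyp"):
        self.ref_key = ref_key
        self.hyp_key = hyp_key

    def __call__(self, records):
        errors = 0
        words = 0
        for record in records:
            ref = record[self.ref_key].split()
            hyp = record[self.hyp_key].split()
            errors += edit_distance(ref, hyp)
            words += len(ref)
        return errors / words if words else 0.0


# Records as an inference step might have written them (assumed format).
records = [
    {"ref": "a b c", "hyp": "a x c"},
    {"ref": "d e", "hyp": "d e f"},
]
wer = SimpleWER(ref_key="ref", hyp_key="hyp")(records)
```

Because the metric only sees stored records and key names, it can be rerun or swapped without repeating inference, which is the practical benefit of splitting scoring into its own config.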
Metrics Config
See the metrics.yaml shape for metric classes and keys.
Metrics Stage
See how ESPnet3 runs metric computation after inference.
Metrics Components
See how metric classes are implemented and configured.
Publication and demo configs are extra
ESPnet3 also adds config surfaces that many ESPnet2 recipes did not expose as first-class YAML files:
- `publication.yaml`
- `demo.yaml`
These exist because ESPnet3 treats:
- model packing
- model upload
- demo packing
- demo upload
as normal named stages with explicit config.
Publication Config
See how model packing and upload settings are configured.
Publish Stage
See how ESPnet3 publishes packed models from stage config.
Demo Config
See the config surface for demo packaging and upload.
A compact checklist
- Copy the old model/task-side config into `training.yaml:model`
- Convert `optim*` and `scheduler*` into Hydra-style blocks
- Move `batch_*` settings into `dataloader`
- Move epoch/accumulation/device settings into `trainer`
- Add explicit path scaffold keys
- Replace shell data assumptions with `dataset:`
- Rewrite decode config into a full `inference.yaml`
- Split scoring into `metrics.yaml`
- Add `parallel:` only when runner/Dask execution is needed
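Parts of this checklist can be turned into a quick self-check on a loaded `training.yaml` dictionary. The sketch below is a hypothetical lint, not an ESPnet3 validator: the lists of expected sections and leftover keys are taken from the examples on this page and are not an authoritative schema.

```python
# Sections and path keys that appear in this page's training.yaml examples.
EXPECTED_SECTIONS = ("model", "optimizer", "scheduler", "dataloader", "trainer", "dataset")
EXPECTED_PATH_KEYS = ("recipe_dir", "exp_tag", "exp_dir", "stats_dir")

# ESPnet2 top-level keys that should have moved elsewhere after porting.
LEFTOVER_KEYS = (
    "encoder", "encoder_conf", "decoder", "decoder_conf", "model_conf",
    "optim", "optim_conf", "scheduler_conf",
    "batch_type", "batch_size", "accum_grad", "max_epoch",
)


def lint_training_config(cfg):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    for key in EXPECTED_SECTIONS + EXPECTED_PATH_KEYS:
        if key not in cfg:
            problems.append(f"missing: {key}")
    for key in LEFTOVER_KEYS:
        if key in cfg:
            problems.append(f"unported ESPnet2 key: {key}")
    # In ESPnet2 `scheduler` is a string name; in ESPnet3 it is a block.
    if isinstance(cfg.get("scheduler"), str):
        problems.append("scheduler is still an ESPnet2 name, not a Hydra-style block")
    return problems


half_ported = {
    "model": {"encoder": "transformer"},
    "optim": "adam",
    "scheduler": "warmuplr",
    "batch_size": 64,
}
for problem in lint_training_config(half_ported):
    print(problem)
```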
Related pages
Training Config
See the full `training.yaml` structure used by create_dataset, stats, and train.
Inference Config
See how ESPnet3 inference combines dataset, provider, runner, and outputs.
Metrics Config
See how scoring moves into `metrics.yaml`.
Dataset Config
See the dataset reference format used by training and inference.
Parallel Config
See the Dask-backed parallel config surface.
Recipe structure
See how the file layout changes when porting a recipe.
