From Hugging Face
If you come from Hugging Face, the main difference is scope.
- Hugging Face often starts from a model and a Trainer
- ESPnet3 often starts from a recipe, a System, and stage configs
Do not try to map every class one-to-one first. Map the workflow first.
Rough mental mapping
| Hugging Face | ESPnet3 |
|---|---|
| model class | model in training.yaml, often src/model.py |
| Trainer | train stage + Lightning trainer wrapper |
| dataset object | dataset.py or builder.py + dataset: config |
| preprocessing function | dataset transform, builder step, or collate function |
| generation config | inference.yaml |
| evaluation script | metrics.yaml + measure stage |
| model repo artifact | publication bundle |
What usually feels familiar
These parts are usually easy to understand:
- normal PyTorch model code still works
- normal `torch.utils.data.Dataset` still works
- optimizer and scheduler config is explicit
- distributed training uses familiar backend settings under `trainer:`
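To make the dataset point concrete, here is a minimal sketch of a map-style dataset. It is illustrative only: the class name, the record layout, and the returned field names are hypothetical, not ESPnet3 conventions. In a real recipe you would normally subclass `torch.utils.data.Dataset`; the base class is left out here only so the snippet runs without PyTorch installed, since a map-style dataset needs nothing beyond `__len__` and `__getitem__`.

```python
class ToyUtteranceDataset:
    """Map-style dataset sketch: implements only __len__ and __getitem__."""

    def __init__(self, records):
        # records: iterable of (utt_id, samples, transcript) tuples.
        # This layout is a hypothetical example, not an ESPnet3 format.
        self.records = list(records)

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        utt_id, samples, text = self.records[index]
        # Return a plain dict; the keys are illustrative names only.
        return {"utt_id": utt_id, "speech": list(samples), "text": text}


dataset = ToyUtteranceDataset(
    [
        ("utt1", [0.0, 0.1, -0.1], "hello"),
        ("utt2", [0.2, 0.0], "world"),
    ]
)
print(len(dataset), dataset[1]["text"])  # prints: 2 world
```

If your Hugging Face code already wraps data in a class of this shape, it should need little change beyond matching whatever fields your model and collate function expect.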
What usually feels different
These are the main shifts:
- config is split by stage, not by one big training script
- inference is a first-class stage, not just `generate()` calls
- recipes are expected to own data preparation too
- publication and demo flows are part of the system design
A practical migration strategy
Move in this order:
- get your dataset loading working
- get your model instantiated from `training.yaml`
- run one `train` stage locally
- add `inference.yaml`
- add `metrics.yaml` only after inference output looks right
Do not start by porting every helper utility.
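The ordering above can be tracked with a small helper that reports the first piece still missing from a recipe directory. This is a hypothetical sketch, not part of ESPnet3: the function name, the step labels, and the assumption that these files sit at exactly these paths under the recipe directory are all illustrative. Adjust the paths to your recipe's actual layout. The "run one train stage" step is not covered because it has no single file to check.

```python
from pathlib import Path

# Ordered checkpoints mirroring the migration list.
# The relative paths are assumptions about the recipe layout.
MIGRATION_STEPS = [
    ("dataset loading", "src/dataset.py"),
    ("model from training.yaml", "training.yaml"),
    ("inference config", "inference.yaml"),
    ("metrics config", "metrics.yaml"),
]


def next_migration_step(recipe_dir):
    """Return the label of the first step whose file is missing, or None."""
    root = Path(recipe_dir)
    for label, relative_path in MIGRATION_STEPS:
        if not (root / relative_path).is_file():
            return label
    return None


# Report what to work on next for the current directory.
print(next_migration_step("."))
```

Checking files in order, rather than all at once, keeps attention on one stage at a time, which matches the advice to add `metrics.yaml` only after inference output looks right.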
When to keep code recipe-local
Keep code under src/ when it is specific to one recipe:
- `src/model.py`
- `src/system.py`
- `src/dataset.py`
- `src/trainer.py`
- `src/lightning_module.py`
Move code into espnet3/ only when it is reusable across recipes.
Good pages to read next
What is a recipe?
See how ESPnet3 organizes one experiment as one recipe directory.
Coming from PyTorch
See the closest mental model if you already understand raw PyTorch training code.
Config overview
See how `training.yaml`, `inference.yaml`, and `metrics.yaml` split the workflow.
Custom dataset
See how to plug in your own dataset without adopting a special dataset base class.
Customize the model
See how to use `src/model.py` and switch away from the task bridge when needed.
