
ESPnet3: a modern major release
Pythonic, end-to-end speech workflows—from dataset creation to training, inference, evaluation, packaging, and demo generation.
Stages
create_dataset
Download/build datasets for your recipe.
collect_stats
Compute feature shapes and global stats.
train
Run Lightning training with `train.yaml`.
infer
Write `.scp` outputs under `infer_dir`.
measure
Compute metrics from inference outputs.
Publish-related
Pack and upload model artifacts (`pack_model` / `upload_model`).
Demo stages
Generate and upload a demo UI.
System-specific stages
Add your own stages in the System class.
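As a rough illustration of how the stages above might chain together, the sketch below runs stage methods on a system object in order. `SketchSystem`, `MySystem`, `run_stages`, and `export_report` are hypothetical names chosen for this example and are not the ESPnet3 API; only the stage names come from the list above. Each stage body just records its name, standing in for the real work.

```python
from typing import List

# Core stage order as listed above (publish and demo stages omitted for brevity).
STAGE_ORDER = ["create_dataset", "collect_stats", "train", "infer", "measure"]


class SketchSystem:
    """Hypothetical stand-in for a System class; NOT the ESPnet3 API."""

    def __init__(self) -> None:
        # Records which stages ran, in order.
        self.log: List[str] = []

    def create_dataset(self) -> None:
        self.log.append("create_dataset")

    def collect_stats(self) -> None:
        self.log.append("collect_stats")

    def train(self) -> None:
        self.log.append("train")

    def infer(self) -> None:
        self.log.append("infer")

    def measure(self) -> None:
        self.log.append("measure")


class MySystem(SketchSystem):
    """A system-specific stage is added as one more method (assumed pattern)."""

    def export_report(self) -> None:
        self.log.append("export_report")


def run_stages(system: SketchSystem, stages: List[str]) -> List[str]:
    """Look up each stage by name on the system and call it in order."""
    for name in stages:
        stage = getattr(system, name, None)
        if not callable(stage):
            raise ValueError(f"unknown stage: {name}")
        stage()
    return system.log


if __name__ == "__main__":
    ran = run_stages(MySystem(), STAGE_ORDER + ["export_report"])
    print(" -> ".join(ran))
```

Looking stages up by name keeps the runner unaware of which stages exist, so a subclass can add one without changing the runner. How ESPnet3 itself dispatches stages may differ.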
How to cite ESPnet
@inproceedings{watanabe18_interspeech,
title = {ESPnet: End-to-End Speech Processing Toolkit},
author = {Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
year = {2018},
booktitle = {Proc. Interspeech},
pages = {2207--2211},
doi = {10.21437/Interspeech.2018-1456},
issn = {2958-1796},
}
To cite individual modules, models, or recipes, please refer to Additional Citations.
