Getting Started with ESPnet3

Masao Someki

This guide provides the fastest way to start using ESPnet3.
Choose the workflow that fits your environment and follow the examples below.

⚡ Quick Start (ASR Example)

1. Install ESPnet3

Pick the system closest to your work. You can then install only the packages that system requires.

Systems: asr, tts, enh, s2st, s2t, spk, speechlm

Automatic Speech Recognition

espnet[asr]

End-to-end ASR with CTC, attention, and hybrid decoding. Includes WER evaluation and CTC segmentation.

Packages: ctc-segmentation, editdistance, opt_einsum, jiwer

pip install "espnet[asr]"

For conda or source installs, see the Installation docs.

2. Clone a recipe

Pick a built-in recipe and clone it into a local project directory. The clone command copies the full recipe, so you can run it immediately without touching the ESPnet3 source tree.

espnet3 clone librispeech/asr --project my_project
cd my_project
python run.py

The cloned project looks like this:

my_project/
├── run.py                  # entry point — run stages from here
├── conf/
│   ├── training.yaml       # model, trainer, optimizer, dataloader
│   ├── inference.yaml      # decoder settings, test sets
│   ...
└── src/
    ├── dataset.py          # dataset class for this recipe
    └── builder.py          # DatasetBuilder called by create_dataset

To see how each file in the cloned project connects to a stage, see How files and stages connect.

See our GitHub recipes for the full list of recipes.

Note

The cloned directory is not just a sample. It is your working project for baseline runs, fine-tuning, inference, and publication.

3. Run with run.py

Each stage maps to a named step in the pipeline. Run them one at a time or all at once.

# Step by step
python run.py --stages create_dataset \
  --training_config conf/training.yaml

python run.py --stages collect_stats train infer measure \
  --training_config conf/training.yaml \
  --inference_config conf/inference.yaml \
  --metrics_config conf/metrics.yaml

python run.py --stages pack_model upload_model \
  --training_config conf/training.yaml \
  --inference_config conf/inference.yaml \
  --metrics_config conf/metrics.yaml \
  --publication_config conf/publication.yaml

Or run everything in one go:

python run.py \
  --stages all \
  --training_config conf/training.yaml \
  --inference_config conf/inference.yaml \
  --metrics_config conf/metrics.yaml \
  --publication_config conf/publication.yaml
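
If you would rather drive these commands from a script, a minimal sketch along the following lines may help. It only assembles the `python run.py` invocations shown above and runs them in order, stopping at the first failure. The helper names `build_command` and `run_groups` are hypothetical and not part of ESPnet3; the flags, stage names, and config paths are the ones used in the commands above.

```python
import shlex
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple

# Config flags from the commands above, mapped to their YAML files.
CONFIGS: Dict[str, str] = {
    "training_config": "conf/training.yaml",
    "inference_config": "conf/inference.yaml",
    "metrics_config": "conf/metrics.yaml",
    "publication_config": "conf/publication.yaml",
}

# The three step-by-step invocations: (stages, config flags they receive).
GROUPS: List[Tuple[List[str], List[str]]] = [
    (["create_dataset"], ["training_config"]),
    (
        ["collect_stats", "train", "infer", "measure"],
        ["training_config", "inference_config", "metrics_config"],
    ),
    (
        ["pack_model", "upload_model"],
        ["training_config", "inference_config", "metrics_config", "publication_config"],
    ),
]


def build_command(stages: Sequence[str], config_keys: Sequence[str]) -> List[str]:
    """Assemble one `python run.py` invocation (hypothetical helper)."""
    cmd = [sys.executable, "run.py", "--stages", *stages]
    for key in config_keys:
        cmd += [f"--{key}", CONFIGS[key]]
    return cmd


def run_groups(groups, dry_run: bool = True) -> None:
    """Print each command; execute it too unless dry_run is set."""
    for stages, config_keys in groups:
        cmd = build_command(stages, config_keys)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, so later groups are skipped.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands it would execute.
    run_groups(GROUPS, dry_run=True)
```

Run it from inside the cloned project directory and pass `dry_run=False` once the printed commands look right.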

All stage settings live in conf/*.yaml — there are no CLI key-value overrides. Outputs go to exp/<exp_tag>/ (checkpoints, logs) and exp/<exp_tag>/inference/ (hypotheses, metrics.json).

Warning

Unlike tools such as Hydra, ESPnet3 does not support overriding config values from the CLI. This is intentional: ESPnet3 enforces a strict one config per experiment policy so that every run is fully reproducible from its YAML files alone. If you need a different setting, edit or copy the config file — do not pass ad-hoc flags on the command line.
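
As one way to follow the "copy the config file" advice, the sketch below derives a new YAML file from an existing one with a single value changed, leaving the original untouched. It is a plain text substitution using only the standard library, not an ESPnet3 feature. The helper name `derive_config` is hypothetical, and so are the key `lr` and the file names in the usage note; substitute keys that actually appear in your `conf/training.yaml`.

```python
import re
from pathlib import Path


def derive_config(src: Path, dst: Path, key: str, value: str) -> Path:
    """Copy `src` to `dst`, replacing the value on the single `key: ...` line.

    Hypothetical helper: a line-level text edit, so the rest of the file
    (comments, ordering) is preserved. Anything after the colon on the
    matched line, including a trailing comment, is replaced.
    """
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")

    # Match "<indent>key: <anything>" on one line; [ \t] keeps the match
    # from spilling across line breaks.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE
    )
    text = src.read_text(encoding="utf-8")
    # A function replacement avoids backslash-escape handling in `value`.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count != 1:
        raise ValueError(f"expected exactly one '{key}:' line, found {count}")

    dst.write_text(new_text, encoding="utf-8")
    return dst
```

For example, `derive_config(Path("conf/training.yaml"), Path("conf/training_lr1e-4.yaml"), "lr", "0.0001")` would write a second config that you then pass through `--training_config`, so each experiment still maps to exactly one file.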

Advanced: import-based execution

ESPnet3 recipes are fully importable, which is useful for programmatic pipelines or MLOps workflows where you want to drive stage execution from Python rather than the CLI.

from argparse import Namespace
from pathlib import Path

from egs3.TEMPLATE.asr.run import main
from espnet3.systems.asr.system import ASRSystem

args = Namespace(
    stages=["create_dataset", "collect_stats", "train", "infer", "measure"],
    training_config=Path("/path/to/training.yaml"),
    inference_config=Path("/path/to/inference.yaml"),
    metrics_config=Path("/path/to/metrics.yaml"),
    publication_config=None,
    demo_config=None,
    dry_run=False,
    write_requirements=False,
)

main(args=args, system_cls=ASRSystem)
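
Building on the example above, a programmatic pipeline might run the same stages over several config directories. The following is a sketch under stated assumptions: `build_args` and `run_experiments` are hypothetical helpers, the `Namespace` fields are copied from the example above, and the file names inside each config directory are assumed to match the `conf/` layout of a cloned recipe. `main` and the system class are passed in as parameters so the sketch itself does not import ESPnet3.

```python
from argparse import Namespace
from pathlib import Path
from typing import Callable, Iterable, List


def build_args(conf_dir: Path, stages: List[str], publication: bool = False) -> Namespace:
    """Build a Namespace with the same fields as the example above."""
    return Namespace(
        stages=list(stages),
        training_config=conf_dir / "training.yaml",
        inference_config=conf_dir / "inference.yaml",
        metrics_config=conf_dir / "metrics.yaml",
        # Only set when the publication stages are wanted.
        publication_config=(conf_dir / "publication.yaml") if publication else None,
        demo_config=None,
        dry_run=False,
        write_requirements=False,
    )


def run_experiments(
    conf_dirs: Iterable[str],
    stages: List[str],
    main: Callable,
    system_cls: type,
) -> None:
    """Call `main` once per config directory, in the order given."""
    for conf_dir in conf_dirs:
        main(args=build_args(Path(conf_dir), stages), system_cls=system_cls)


# Usage, assuming the imports from the example above are available:
#   from egs3.TEMPLATE.asr.run import main
#   from espnet3.systems.asr.system import ASRSystem
#   run_experiments(["conf_a", "conf_b"], ["train", "infer", "measure"], main, ASRSystem)
```

Keeping one directory of YAML files per experiment fits the one-config-per-experiment policy described earlier, since every run stays reproducible from its own files.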

Next steps

What a Recipe Is

Build a minimal recipe from scratch with your own data

Installation Troubleshooting

CUDA mismatch, missing packages, and common errors

System and Stages

Learn how `run.py` and named stages fit together in ESPnet3.

Config Overview

See how training, inference, metrics, and publication configs connect.