# ESPnet3 System Entry Point
ESPnet3 training is driven by System classes. A System is the entry point that binds configs to stage functions (prepare, train, decode, score, publish). This document explains the base interface, how stages are invoked, and how to extend it for custom tasks.
## 🧠 BaseSystem at a glance

`espnet3.systems.base.system.BaseSystem` implements the common stage hooks:
```python
class BaseSystem:
    def __init__(self, train_config, eval_config=None):
        self.train_config = train_config
        self.eval_config = eval_config
        self.exp_dir = Path(train_config.exp_dir)

    def create_dataset(self): ...
    def collect_stats(self): return collect_stats(self.train_config)
    def train(self): return train(self.train_config)
    def evaluate(self): self.decode(); return self.score()
    def decode(self): return inference(self.eval_config)
    def score(self): return score(self.eval_config)
    def publish(self): ...
```

- The constructor creates `exp_dir` and stores the configs.
- `evaluate()` calls `decode()` then `score()` for convenience.
- Defaults delegate to the stage functions in `espnet3.systems.base.{train,inference,score}`.
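A minimal usage sketch of these hooks, assuming OmegaConf/Hydra config objects loaded from the recipe's YAML files (the paths here are placeholders):

```python
from omegaconf import OmegaConf

from espnet3.systems.base.system import BaseSystem

# Placeholder config paths; any config exposing exp_dir (plus whatever the
# stage functions expect) works here.
train_config = OmegaConf.load("conf/train.yaml")
eval_config = OmegaConf.load("conf/eval.yaml")

system = BaseSystem(train_config, eval_config)
system.collect_stats()  # delegates to the base collect_stats stage function
system.train()          # delegates to the base train stage function
system.evaluate()       # decode() followed by score()
```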
## ✅ What you typically override

| Method / area | When you override it | Typical user responsibility |
|---|---|---|
| `create_dataset` | Custom corpus preparation or manifest generation | Call recipe-specific `create_dataset.py` logic |
| `train` | Extra steps before/after training | Tokenizer training, warmup, custom logging |
| `decode` / `score` | Custom decoding pipeline or metrics | Hook into custom inference / scoring scripts |
| `publish` | Packaging models for release | Upload to hubs, export ONNX, create tarballs |
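A task-specific subclass usually overrides only one or two of these hooks, as in the following sketch (the method bodies are placeholders, not real ESPnet3 code):

```python
from espnet3.systems.base.system import BaseSystem


class MyTaskSystem(BaseSystem):
    def create_dataset(self):
        # Corpus-specific preparation goes here, e.g. writing manifests
        # under self.exp_dir / "data" (left as a placeholder).
        ...

    def train_tokenizer(self):
        # Illustrative pre-training step, e.g. SentencePiece training on
        # text produced by create_dataset().
        ...

    def train(self):
        self.train_tokenizer()  # extra step before standard training
        return super().train()  # fall back to the base stage function
```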
## How stages are executed

- `run.py` in each recipe instantiates a System and runs the selected stages from `DEFAULT_STAGES` via `espnet3.utils.stages.run_stages`.
- CLI flags choose stages (`--stage train decode`) and pass per-stage overrides parsed by `parse_stage_unknown_args`.
- Each stage receives the System instance, so overrides can be forwarded to its methods (e.g., `create_dataset` kwargs).
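The dispatch amounts to looking up each requested stage on the System and calling it with any per-stage overrides. The sketch below illustrates that pattern; it is not the actual `espnet3.utils.stages.run_stages` implementation, and the stage list and signature are assumptions:

```python
DEFAULT_STAGES = ("create_dataset", "collect_stats", "train", "decode", "score", "publish")


def run_selected_stages(system, stages, stage_overrides=None):
    """Call the requested stage methods on the System, in order."""
    stage_overrides = stage_overrides or {}
    for name in stages:
        if name not in DEFAULT_STAGES:
            raise ValueError(f"Unknown stage: {name}")
        # Per-stage CLI overrides are forwarded as keyword arguments.
        getattr(system, name)(**stage_overrides.get(name, {}))
```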
## Extending for a task: example ASRSystem

`espnet3.systems.asr.system.ASRSystem` customises a few hooks:

- `create_dataset(func, kwargs)`: dynamically imports and runs a dataset creation function.
- `train()`: calls `BaseSystem.train()`.
- `train_tokenizer(dataset_dir)`: builds the training text and runs SentencePiece training.

You can subclass `BaseSystem` in the same way to add task-specific steps (e.g., TTS vocoder training).
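The dynamic-import hook can be sketched as follows; this illustrates the `create_dataset(func, kwargs)` idea rather than reproducing the real `ASRSystem` code:

```python
import importlib

from espnet3.systems.base.system import BaseSystem


class DynamicDatasetSystem(BaseSystem):
    def create_dataset(self, func, kwargs=None):
        # func is a dotted path such as "my_recipe.create_dataset.main":
        # import the module, resolve the attribute, and call it.
        module_name, _, attr = func.rpartition(".")
        fn = getattr(importlib.import_module(module_name), attr)
        return fn(**(kwargs or {}))
```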
## When to override

- `create_dataset`: add corpus-specific preparation or manifest generation.
- `train`: insert pre-training steps (tokenizer, LM warmup, checkpoint restoration).
- `decode` / `score`: swap in a custom inference or metrics pipeline.
- `publish`: package artifacts, upload to a model zoo, or export ONNX (see the sketch below).

Keep overrides minimal; rely on the base helpers for standard behavior.
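As an example of a small override, `publish()` might simply bundle the experiment directory for upload; the `checkpoints/` layout under `exp_dir` is an assumption for illustration:

```python
import tarfile

from espnet3.systems.base.system import BaseSystem


class ReleasableSystem(BaseSystem):
    def publish(self):
        # Assumed layout: trained checkpoints live under exp_dir/checkpoints.
        archive = self.exp_dir / "release.tar.gz"
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(str(self.exp_dir / "checkpoints"), arcname="checkpoints")
        return archive
```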
## Config expectations

- `train_config.exp_dir` is required; the System will create it.
- `train_config` is passed to training/collect_stats; `eval_config` is passed to decode/score. Keep the train and eval configs separate to avoid accidental overrides.
- Hydra configs can instantiate models, data modules, callbacks, and parallel settings; Systems only orchestrate stages.
## Quick recipe usage

```bash
python run.py \
    --stage train decode score \
    --train_config conf/train.yaml \
    --eval_config conf/eval.yaml
```

This loads the configs, builds the System, and runs the chosen stages in order, using ESPnet3's Provider/Runner and Lightning plumbing under the hood.
