Customize the model
If you want to change the model more seriously, the practical ESPnet3 pattern is usually:
- add src/model.py
- instantiate that model from training.yaml
- keep dataset and inference code aligned with the new model contract
For many recipes, that is all you need to know first.
Important
If the old task-side model path is too restrictive, put your model in egs3/<recipe>/<system>/src/model.py and instantiate it directly from training.yaml.
That is the clean ESPnet3 answer to "I want my own model now."
Typical recipe-local layout
    egs3/<recipe>/<system>/
      conf/
        training.yaml
        inference.yaml
      src/
        model.py
        inference.py
      dataset/
        dataset.py

The important point is that src/ is where recipe-local Python code lives.
Typical responsibilities:
- src/model.py: model class
- src/inference.py: output formatting or inference helpers
The main config change
The key change is in training.yaml.
Old task-bridge style
    task: espnet2.tasks.asr.ASRTask
    model:
      encoder: transformer
      decoder: transformer
      model_conf:
        ctc_weight: 0.3

Direct ESPnet3 recipe-local model
    task:
    model:
      _target_: src.model.MyModel
      hidden_size: 256
      vocab_size: 500

So the practical migration is:
- leave task unset
- make model a direct Hydra instantiation block
That is the main switch from system-driven to recipe-local model ownership.
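To make the `_target_` block concrete: Hydra resolves the dotted path to a class and passes the remaining keys as keyword arguments. The sketch below imitates only that core step with the standard library. `instantiate_target` is a hypothetical helper, not part of ESPnet3 or Hydra, and `types.SimpleNamespace` stands in for `src.model.MyModel` so the snippet runs without the recipe on the import path.

```python
import importlib
from typing import Any, Mapping


def instantiate_target(cfg: Mapping[str, Any]) -> Any:
    """Simplified illustration of `_target_`-style instantiation (hypothetical helper)."""
    # Every key except `_target_` becomes a constructor keyword argument.
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    # Split "package.module.ClassName" into module path and attribute name.
    module_path, _, attr = cfg["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), attr)
    return cls(**kwargs)


# Stand-in config: in a recipe, `_target_` would be src.model.MyModel.
model_cfg = {
    "_target_": "types.SimpleNamespace",
    "hidden_size": 256,
    "vocab_size": 500,
}
model = instantiate_target(model_cfg)
print(model.hidden_size, model.vocab_size)
```

In actual use, `hydra.utils.instantiate` performs this step and additionally handles nested configs and interpolation, so the constructor arguments of the model class must match the keys under `model` in training.yaml.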
A minimal src/model.py
At minimum, this can just be a normal PyTorch module.
    import torch


    class MyModel(torch.nn.Module):
        def __init__(self, hidden_size: int, vocab_size: int):
            super().__init__()
            # Input feature dimension is hard-coded to 80 in this minimal example.
            self.encoder = torch.nn.Linear(80, hidden_size)
            self.head = torch.nn.Linear(hidden_size, vocab_size)

        def forward(self, speech):
            # speech: (..., 80) -> output: (..., vocab_size)
            hidden = self.encoder(speech)
            return self.head(hidden)

Inference alignment
Training is not the only place that changes.
If the model output no longer matches the old inference assumptions, update the recipe-local inference helper too.
Typical location:
    src/inference.py

For example, if the model returns a custom object or a different hypothesis shape, output_fn should convert it into the final dict written by inference.
Conceptually:
    def build_output(data, model_output, idx):
        return {
            "utt_id": data["utt_id"],
            "hyp": ...,
            "ref": data.get("text", ""),
        }

Then inference.yaml points to it:

    output_fn: src.inference.build_output

Reusing an existing checkpoint
This is where the old "load checkpoint" idea usually ends up.
If the recipe-local model still matches an older checkpoint closely enough, you can reuse that checkpoint here.
Typical cases are:
- full reuse when the architecture still matches
- partial reuse when only the backbone matches
- keep the backbone and replace the head
- use a PEFT-style wrapper model that owns the adaptation logic itself
So in practice, checkpoint reuse is usually part of custom-model work, not a separate topic.
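The "keep the backbone and replace the head" case can be sketched in plain PyTorch as follows. This is a minimal illustration, not an ESPnet3 API: `load_matching_weights` is a hypothetical helper, the model is the toy `MyModel` from src/model.py, and the "old checkpoint" is simulated in memory instead of being read from disk.

```python
import torch


class MyModel(torch.nn.Module):
    """Toy model from src/model.py; the head size depends on vocab_size."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.encoder = torch.nn.Linear(80, hidden_size)
        self.head = torch.nn.Linear(hidden_size, vocab_size)

    def forward(self, speech):
        return self.head(self.encoder(speech))


def load_matching_weights(model: torch.nn.Module, state_dict: dict) -> list:
    """Copy only tensors whose name and shape match; return the keys left untouched."""
    own = model.state_dict()
    usable = {
        k: v for k, v in state_dict.items()
        if k in own and v.shape == own[k].shape
    }
    # strict=False tolerates the parameters that were deliberately left out.
    model.load_state_dict(usable, strict=False)
    return sorted(set(own) - set(usable))


# Stand-in for an older checkpoint: same encoder, different vocabulary size.
old_state = MyModel(hidden_size=256, vocab_size=500).state_dict()
new_model = MyModel(hidden_size=256, vocab_size=1000)

skipped = load_matching_weights(new_model, old_state)
print(skipped)  # the head parameters keep their fresh initialization
```

In a real recipe the state dict would typically come from `torch.load(path, map_location="cpu")`. Depending on how the checkpoint was saved, the weights may sit under a nested key or carry a name prefix, which this sketch does not handle.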
A useful rule of thumb
Prefer the espnet2 task path when:
- the model is still basically the same
- only small config changes are needed
Prefer src/model.py when:
- the model itself is now recipe-local logic
- the old task abstraction is in the way
- future work will keep changing the architecture
Related pages
Custom dataset
Make sure the dataset contract matches the recipe-local model.
Data pipeline
See how dataset outputs and collate behavior affect custom models.
Training Config
See where the recipe-local model is instantiated and optimized.
Inference Config
See how the custom model connects to provider, runner, and output_fn.
