ESPnet3 Inference Configuration
This page explains the inference.yaml schema used by the infer stage.
Important
An inference config is built around a provider, a runner, the dataset test splits, and output writing. It is not just a model _target_ plus a dataloader.
Minimum required keys
Typical inference needs:
- dataset.test
- inference_dir
- input_key
- provider
- runner
Common optional keys:
- parallel
- output_fn
- output_keys
- idx_key
- batch_size
- output_artifacts
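A quick pre-flight check can confirm that the required keys are present before the infer stage starts. This is a hypothetical sketch, not an ESPnet3 API (missing_keys and REQUIRED_KEYS are illustrative names). It assumes the YAML has already been loaded into plain nested dicts, and it treats a key that is present but left empty (None) as missing.

```python
from collections.abc import Mapping
from typing import Any

# Hypothetical: the keys listed above as typically required for inference.
# Dotted names such as "dataset.test" refer to nested mappings.
REQUIRED_KEYS = ("dataset.test", "inference_dir", "input_key", "provider", "runner")


def missing_keys(config: Mapping[str, Any], required=REQUIRED_KEYS) -> list[str]:
    """Return the required dotted keys that are absent or set to None."""
    missing = []
    for dotted in required:
        node: Any = config
        for part in dotted.split("."):
            # Stop walking as soon as a level is not a mapping or lacks the key.
            if not isinstance(node, Mapping) or node.get(part) is None:
                node = None
                break
            node = node[part]
        if node is None:
            missing.append(dotted)
    return missing


config = {
    "dataset": {"test": [{"name": "test"}]},
    "inference_dir": "exp/demo/infer",
    "input_key": "speech",
    "provider": {"_target_": "espnet3.systems.base.inference_provider.InferenceProvider"},
}
print(missing_keys(config))  # the runner section is absent in this example
```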
Basic shape
```yaml
recipe_dir: .
exp_tag:
exp_dir: ${recipe_dir}/exp/${exp_tag}
inference_dir: ${exp_dir}/${self_name:}

dataset:
  _target_: espnet3.components.data.data_organizer.DataOrganizer
  recipe_dir: ${recipe_dir}
  test:
    - name: test
      data_src_args:
        split: test
  preprocessor:

parallel:
  env: local
  n_workers: 1

input_key: speech
output_fn: src.inference.build_output

provider:
  _target_: espnet3.systems.base.inference_provider.InferenceProvider

runner:
  _target_: espnet3.systems.base.inference_runner.InferenceRunner
```
Main sections
| Section | Description |
|---|---|
| dataset.test | Test set definitions. Each entry's name becomes the output subdirectory. |
| parallel | Dask backend settings for runner-based execution. |
| input_key | Input field name, or a list of field names, passed into the model. |
| output_fn | Optional formatter that turns model output into a dict for SCP writing. |
| output_keys | Optional explicit list of SCP fields to write. |
| idx_key | Sample ID key in final outputs. Default is utt_id. |
| provider | Builds the dataset, model, and runtime environment. |
| runner | Runs inference over indices and writes shard outputs. |
| output_artifacts | Optional artifact writers for non-scalar outputs. |
provider and runner
The current base inference path requires both.
- provider builds the runtime environment
- runner executes inference and writes shard outputs
The default template uses:
```yaml
provider:
  _target_: espnet3.systems.base.inference_provider.InferenceProvider
runner:
  _target_: espnet3.systems.base.inference_runner.InferenceRunner
```
input_key
input_key can be:
- a single field name, such as speech
- a list of field names, such as [speech, text]

If it is a list, inference builds a dict that maps each listed field name to its value and passes that dict to the model call.
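A minimal sketch of the selection rule above, assuming each dataset sample is a dict keyed by field name. build_model_input is a hypothetical helper, not the ESPnet3 implementation, and it stops at building the input rather than calling a model.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Union


def build_model_input(sample: Mapping[str, Any], input_key: Union[str, Sequence[str]]) -> Any:
    """Select the model input from one sample according to input_key.

    A single field name returns that field's value unchanged; a list of
    field names returns a dict containing only those fields.
    """
    if isinstance(input_key, str):
        return sample[input_key]
    return {key: sample[key] for key in input_key}


sample = {"speech": [0.1, 0.2], "text": "hello", "utt_id": "utt1"}
print(build_model_input(sample, "speech"))
print(build_model_input(sample, ["speech", "text"]))
```

Note that the str check comes first because a string is itself a sequence; without it, "speech" would be treated as a list of single-character keys.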
output_fn
output_fn is optional.
- If set, it must return a dict, or a list of dicts for batched inference
- If omitted, the model output itself must already have the final dict shape
That final dict must contain utt_id unless you override idx_key.
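The output_fn contract above can be checked with a small helper while developing a recipe. This is a sketch, not part of ESPnet3: normalize_outputs is a hypothetical name, and it only enforces the two stated rules (a dict or a list of dicts, and presence of idx_key in every entry).

```python
from typing import Any


def normalize_outputs(result: Any, idx_key: str = "utt_id") -> list[dict]:
    """Validate an output_fn result and return it as a list of dicts.

    Accepts a single dict (single-sample inference) or a list of dicts
    (batched inference). Raises if an entry is not a dict or lacks idx_key.
    """
    items = result if isinstance(result, list) else [result]
    for item in items:
        if not isinstance(item, dict):
            raise TypeError(f"expected dict, got {type(item).__name__}")
        if idx_key not in item:
            raise KeyError(f"output is missing the {idx_key!r} field")
    return items


print(normalize_outputs({"utt_id": "utt1", "hyp": "hello world"}))
```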
Typical output:
```python
{"utt_id": "utt1", "hyp": "hello world"}
```
output_keys and idx_key
- idx_key defaults to utt_id
- output_keys is optional
If output_keys is omitted, ESPnet3 infers output fields from the first sample result, excluding idx_key.
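The inference rule for output_keys amounts to reading the keys of the first result and dropping the ID field. A minimal sketch (infer_output_keys is a hypothetical helper, not an ESPnet3 function):

```python
def infer_output_keys(results: list[dict], idx_key: str = "utt_id") -> list[str]:
    """Derive SCP field names from the first result, excluding the ID field."""
    if not results:
        return []
    # dicts preserve insertion order, so fields keep the order output_fn used.
    return [key for key in results[0] if key != idx_key]


print(infer_output_keys([{"utt_id": "utt1", "hyp": "hello", "score": 0.9}]))
```

Under this rule, a field that appears only in later results would not get its own SCP file, which is one reason to list output_keys explicitly when outputs vary between samples.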
batch_size
If batch_size is set, InferenceRunner.forward receives a list of indices. If it is unset, it receives one index at a time.
Use batch_size: null when your model or output_fn only supports single-sample inference.
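The difference between the two modes can be sketched as the sequence of arguments that forward would receive. iter_forward_args is a hypothetical helper; it ignores how indices are distributed across Dask workers, and keeping a shorter final batch is an assumption here rather than documented behavior.

```python
from typing import Iterator, Optional, Union


def iter_forward_args(num_samples: int, batch_size: Optional[int]) -> Iterator[Union[int, list[int]]]:
    """Yield what a forward call would receive at each step.

    batch_size=None yields single indices; a positive integer yields lists
    of at most batch_size indices (the last list may be shorter).
    """
    if batch_size is None:
        yield from range(num_samples)
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    for start in range(0, num_samples, batch_size):
        yield list(range(start, min(start + batch_size, num_samples)))


print(list(iter_forward_args(5, None)))
print(list(iter_forward_args(5, 2)))
```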
output_artifacts
Primitive values are written directly into SCP files. Non-scalar values can be written as artifacts and referenced from SCP.
Built-in types include:
- wav
- npy
- json
- pickle
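The split between inline SCP values and artifact files can be sketched with the standard library, assuming Kaldi-style SCP lines of the form "<utt_id> <value>". write_field and write_scp are hypothetical helpers, the artifact file naming is an assumption, and only the json and pickle formats are covered (wav and npy would need audio and NumPy libraries).

```python
import json
import pickle
from pathlib import Path
from typing import Any

# Values of these types are written inline into the SCP file.
PRIMITIVES = (str, int, float, bool)


def write_field(utt_id: str, value: Any, artifact_dir: Path, fmt: str = "json") -> str:
    """Return the SCP value for one field, writing an artifact file if needed.

    Primitive values are returned as text; other values are serialized to
    artifact_dir/<utt_id>.<fmt> and that file path is returned instead.
    """
    if isinstance(value, PRIMITIVES):
        return str(value)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / f"{utt_id}.{fmt}"
    if fmt == "json":
        path.write_text(json.dumps(value))
    elif fmt == "pickle":
        path.write_bytes(pickle.dumps(value))
    else:
        raise ValueError(f"format not covered by this sketch: {fmt}")
    return str(path)


def write_scp(scp_path: Path, rows: list[tuple[str, Any]], artifact_dir: Path) -> None:
    """Write one '<utt_id> <value>' line per (utt_id, value) pair."""
    with scp_path.open("w") as f:
        for utt_id, value in rows:
            f.write(f"{utt_id} {write_field(utt_id, value, artifact_dir)}\n")
```

For example, write_scp(Path("hyp.scp"), [("utt1", "hello"), ("utt2", {"scores": [0.1, 0.9]})], Path("artifacts")) would write "utt1 hello" inline and point utt2 at artifacts/utt2.json.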
Output directory layout
Inference writes outputs under:
```
${inference_dir}/<test_name>/
```
Example:
```
${inference_dir}/
  test-clean/
    hyp.scp
    ref.scp
```
When shard-local artifacts are needed, they are written under split-specific subdirectories and then merged.
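Merging shard outputs can be sketched as concatenating per-shard SCP files into the test-set directory. The shard_* directory naming and merge_shard_scps are hypothetical, and sorting by utterance ID is a choice made in this sketch so that the merged file does not depend on the order in which shards finished.

```python
from pathlib import Path


def merge_shard_scps(test_dir: Path, field: str, shard_glob: str = "shard_*") -> Path:
    """Concatenate <test_dir>/<shard>/<field>.scp files into <test_dir>/<field>.scp."""
    lines: list[str] = []
    for shard_scp in sorted(test_dir.glob(f"{shard_glob}/{field}.scp")):
        # Skip blank lines so the merged file has exactly one entry per line.
        lines.extend(line for line in shard_scp.read_text().splitlines() if line)
    # Sort by the utterance ID (the first whitespace-separated token).
    lines.sort(key=lambda line: line.split(maxsplit=1)[0])
    merged = test_dir / f"{field}.scp"
    merged.write_text("".join(f"{line}\n" for line in lines))
    return merged
```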
