ESPnet3 Inference Configuration

Masao Someki

This page explains the inference.yaml schema used by the infer stage.

Important

Inference config is built around provider, runner, dataset test splits, and output writing. It is not just a model _target_ plus a dataloader.

Minimum required keys

Typical inference needs:

dataset.test
inference_dir
input_key
provider
runner

Common optional keys:

parallel
output_fn
output_keys
idx_key
batch_size
output_artifacts
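
As a quick sanity check before launching the infer stage, the required keys above can be verified on the loaded config. The sketch below is a hypothetical helper, not part of ESPnet3: the function name `missing_required_keys` and the plain-dict config are assumptions made for illustration.

```python
# Hypothetical pre-flight check; ESPnet3 does not ship this helper.
REQUIRED_KEYS = ["dataset.test", "inference_dir", "input_key", "provider", "runner"]


def missing_required_keys(config):
    """Return the dotted keys from REQUIRED_KEYS that are absent or set to None."""
    missing = []
    for dotted in REQUIRED_KEYS:
        node = config
        for part in dotted.split("."):
            # Walk one level down; stop as soon as a level is missing.
            if not isinstance(node, dict) or node.get(part) is None:
                missing.append(dotted)
                break
            node = node[part]
    return missing


config = {
    "dataset": {"test": [{"name": "test"}]},
    "inference_dir": "exp/demo/inference",
    "input_key": "speech",
    "provider": {"_target_": "espnet3.systems.base.inference_provider.InferenceProvider"},
}
print(missing_required_keys(config))  # ['runner'], because no runner is defined above
```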

Basic shape

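recipe_dir: .
exp_tag:
exp_dir: ${recipe_dir}/exp/${exp_tag}
inference_dir: ${exp_dir}/${self_name:}

dataset:
  _target_: espnet3.components.data.data_organizer.DataOrganizer
  recipe_dir: ${recipe_dir}
  test:
    - name: test  # becomes the output subdirectory under ${inference_dir}
      data_src_args:
        split: test
  preprocessor:

# Dask backend settings for runner-based execution
parallel:
  env: local
  n_workers: 1

input_key: speech  # field name, or list of field names, passed into the model
output_fn: src.inference.build_output  # optional formatter for SCP writing

# Builds the dataset/model/runtime environment
provider:
  _target_: espnet3.systems.base.inference_provider.InferenceProvider

# Runs inference over indices and writes shard outputs
runner:
  _target_: espnet3.systems.base.inference_runner.InferenceRunner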

Main sections

| Section | Description |
| --- | --- |
| `dataset.test` | Test set definitions. `name` becomes the output subdirectory. |
| `parallel` | Dask backend settings for runner-based execution. |
| `input_key` | Input field name, or a list of field names, passed into the model. |
| `output_fn` | Optional formatter that turns model output into a dict for SCP writing. |
| `output_keys` | Optional explicit list of SCP fields to write. |
| `idx_key` | Sample ID key in final outputs. Default is `utt_id`. |
| `provider` | Builds the dataset/model/runtime environment. |
| `runner` | Runs inference over indices and writes shard outputs. |
| `output_artifacts` | Optional artifact writers for non-scalar outputs. |

provider and runner

The current base inference path requires both a provider and a runner:

provider builds the runtime environment
runner executes inference and writes shard outputs

The default template uses:

provider:
  _target_: espnet3.systems.base.inference_provider.InferenceProvider

runner:
  _target_: espnet3.systems.base.inference_runner.InferenceRunner

input_key

input_key can be:

a single field name, such as speech
a list of field names, such as [speech, text]

If it is a list, inference builds a dict of those values and passes them to the model call.
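
The selection step can be pictured with a minimal sketch. The helper name `select_model_inputs` is hypothetical, and two details are assumptions rather than documented behavior: that the dict is keyed by the field names, and that a single `input_key` yields the bare field value.

```python
def select_model_inputs(sample, input_key):
    """Pick the model inputs from one dataset sample (illustrative only)."""
    if isinstance(input_key, str):
        # Assumption: a single key yields the field value itself.
        return sample[input_key]
    # A list of keys yields a dict restricted to those fields.
    return {key: sample[key] for key in input_key}


sample = {"utt_id": "utt1", "speech": [0.1, -0.2, 0.3], "text": "hello world"}

print(select_model_inputs(sample, "speech"))
# [0.1, -0.2, 0.3]
print(select_model_inputs(sample, ["speech", "text"]))
# {'speech': [0.1, -0.2, 0.3], 'text': 'hello world'}
```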

output_fn

output_fn is optional.

If set, it must return a dict for single-sample inference, or a list of dicts for batched inference.
If omitted, the model output itself must already have the final dict shape.

The final dict must contain the sample ID under idx_key, which is utt_id unless you override it.

Typical output:

{"utt_id": "utt1", "hyp": "hello world"}
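
A formatter matching `output_fn: src.inference.build_output` might look like the sketch below. The keyword arguments `data`, `model_output`, and `idx` are an assumed calling convention, not a signature confirmed on this page, and the sketch also assumes that each sample carries its own `utt_id`. Check the runner you use for the real call before copying this.

```python
# src/inference.py (illustrative sketch; the argument names are assumptions)
def build_output(*, data, model_output, idx):
    """Turn model output into the dict shape expected for SCP writing."""
    if isinstance(idx, (list, tuple)):
        # Batched inference: return one dict per sample, in input order.
        return [
            {"utt_id": sample.get("utt_id", str(i)), "hyp": hyp}
            for sample, hyp, i in zip(data, model_output, idx)
        ]
    # Single-sample inference: return one dict.
    return {"utt_id": data.get("utt_id", str(idx)), "hyp": model_output}


print(build_output(data={"utt_id": "utt1"}, model_output="hello world", idx=0))
# {'utt_id': 'utt1', 'hyp': 'hello world'}
```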

output_keys and idx_key

idx_key defaults to utt_id
output_keys is optional

If output_keys is omitted, ESPnet3 infers output fields from the first sample result, excluding idx_key.
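
The rule can be sketched as follows. `resolve_output_keys` is a hypothetical helper written for illustration, not the ESPnet3 implementation, and raising `KeyError` for a missing ID is an assumption about error handling.

```python
def resolve_output_keys(first_result, output_keys=None, idx_key="utt_id"):
    """Return the SCP fields to write, inferring them when none are given."""
    if idx_key not in first_result:
        raise KeyError(f"result is missing the sample ID key {idx_key!r}")
    if output_keys is not None:
        # An explicit list wins over inference.
        return list(output_keys)
    # Otherwise use every field of the first result except the ID key.
    return [key for key in first_result if key != idx_key]


first = {"utt_id": "utt1", "hyp": "hello world", "ref": "hello world"}
print(resolve_output_keys(first))                       # ['hyp', 'ref']
print(resolve_output_keys(first, output_keys=["hyp"]))  # ['hyp']
```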

batch_size

If batch_size is set, InferenceRunner.forward receives a list of indices. If it is unset, it receives one index at a time.

Use batch_size: null when your model or output_fn only supports single-sample inference.
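
The difference between the two modes can be illustrated with a small generator. `iter_forward_args` is a hypothetical name, and letting the last batch be shorter than `batch_size` is an assumption of this sketch, not documented runner behavior.

```python
def iter_forward_args(indices, batch_size=None):
    """Yield what each forward call would receive for the given indices."""
    if batch_size is None:
        # Unset batch_size: one index per call.
        yield from indices
        return
    # Set batch_size: a list of indices per call.
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


indices = [0, 1, 2, 3, 4]
print(list(iter_forward_args(indices)))                # [0, 1, 2, 3, 4]
print(list(iter_forward_args(indices, batch_size=2)))  # [[0, 1], [2, 3], [4]]
```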

output_artifacts

Primitive values are written directly into SCP files. Non-scalar values can be written as artifacts and referenced from SCP.

Built-in types include:

wav
npy
json
pickle
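
To show what "referenced from SCP" means, here is a sketch for the json type. The helper `write_json_artifact` and the `<out_dir>/<key>/<utt_id>.json` layout are hypothetical; the built-in writers may name and place files differently.

```python
import json
import tempfile
from pathlib import Path


def write_json_artifact(out_dir, utt_id, key, value):
    """Save a non-scalar value as JSON and return the SCP line that points to it."""
    artifact_dir = Path(out_dir) / key  # hypothetical layout: one folder per field
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / f"{utt_id}.json"
    path.write_text(json.dumps(value), encoding="utf-8")
    # The SCP entry holds the path, not the value itself.
    return f"{utt_id} {path}"


with tempfile.TemporaryDirectory() as out_dir:
    line = write_json_artifact(out_dir, "utt1", "nbest", ["hello world", "hello word"])
    artifact_path = Path(line.split(" ", 1)[1])
    print(artifact_path.name)  # utt1.json
    print(json.loads(artifact_path.read_text(encoding="utf-8")))
    # ['hello world', 'hello word']
```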

Output directory layout

Inference writes outputs under:

${inference_dir}/<test_name>/

Example:

${inference_dir}/
  test-clean/
    hyp.scp
    ref.scp

When shard-local artifacts are needed, they are written under split-specific subdirectories and then merged.
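
A minimal sketch of how per-sample results map onto this layout, assuming the usual Kaldi-style SCP convention of one `<id> <value>` line per sample. `write_scp_files` is a hypothetical helper and does not cover sharding or merging.

```python
import tempfile
from pathlib import Path


def write_scp_files(results, test_dir, output_keys, idx_key="utt_id"):
    """Write one <key>.scp file per output key, with one line per sample."""
    test_dir = Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key in output_keys:
        lines = [f"{result[idx_key]} {result[key]}" for result in results]
        path = test_dir / f"{key}.scp"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written


results = [
    {"utt_id": "utt1", "hyp": "hello world", "ref": "hello world"},
    {"utt_id": "utt2", "hyp": "good morning", "ref": "good morning all"},
]

with tempfile.TemporaryDirectory() as inference_dir:
    # "test-clean" plays the role of <test_name> in the layout above.
    paths = write_scp_files(results, Path(inference_dir) / "test-clean", ["hyp", "ref"])
    print([p.name for p in paths])  # ['hyp.scp', 'ref.scp']
    print(paths[0].read_text(encoding="utf-8"), end="")
    # utt1 hello world
    # utt2 good morning
```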

Parallel Config

See how local, local GPU, and HPC backends are configured.

Inference Stage

Read the stage-level behavior behind `infer`.

Provider / Runner

Read the runtime contract used by inference.