# ESPnet3 Inference Configuration
## Minimum required keys

Typical inference runs need:

- `dataset.test`
- `model`
- `provider`
- `runner`
- `input_key`
- `inference_dir`
Optional:

- `output_fn`
- `output_keys`
- `idx_key`
- `batch_size`
- `parallel`
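Put together, a skeleton carrying only the required keys might look like the sketch below. All values are placeholders; `provider` and `runner` can usually stay at the defaults listed later on this page.

```yaml
dataset:
  test:
    - name: test
      data_src: my_dataset   # placeholder
model:
  _target_: espnet2.bin.asr_inference.Speech2Text
provider:
  _target_: espnet3.systems.base.inference_provider.InferenceProvider
runner:
  _target_: espnet3.systems.base.inference_runner.InferenceRunner
input_key: speech
inference_dir: ${exp_dir}/decode   # placeholder path
```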
## Config sections overview
| Section | Description |
|---|---|
| `recipe_dir`, `exp_dir`, `inference_dir`, ... | path scaffold for inference outputs |
| `dataset` | test-set definitions resolved through `DataOrganizer` |
| `model` | inference-time model entrypoint and arguments |
| `input_key` | which dataset field or fields are passed into the model |
| `output_fn` | import path to the formatting function used by `InferenceRunner` |
| `output_keys`, `idx_key` | control SCP output names and the utterance ID key |
| `batch_size` | enables batched runner execution |
| `parallel` | local, multi-worker, or cluster execution settings |
## Default values
| Key | Default value |
|---|---|
| `recipe_dir` | `.` |
| `data_dir` | `${recipe_dir}/data` |
| `exp_tag` | empty |
| `exp_dir` | `${recipe_dir}/exp/${exp_tag}` |
| `inference_dir` | `${exp_dir}/${self_name:}` |
| `parallel.env` | `local` |
| `parallel.n_workers` | `1` |
| `input_key` | `speech` |
| `output_fn` | `src.inference.build_output` |
| `provider._target_` | `espnet3.systems.base.inference_provider.InferenceProvider` |
| `runner._target_` | `espnet3.systems.base.inference_runner.InferenceRunner` |
## Naming behavior
If `training_config` is also loaded in the same `run.py` invocation, inference inherits these two values from training:

- `exp_tag`
- `exp_dir`
If inference runs standalone, `inference.yaml` must carry its own experiment identity, typically through:

```yaml
exp_tag: my_eval
exp_dir: ${recipe_dir}/exp/${exp_tag}
```

## Minimal example
```yaml
exp_tag: inference_beam5
dataset:
  test:
    - name: test
      data_src: mini_an4/asr
      data_src_args:
        split: test
model:
  _target_: espnet2.bin.asr_inference.Speech2Text
  asr_train_config: ${exp_dir}/config.yaml
  asr_model_file: ${exp_dir}/last.ckpt
```

## Model definition
`model` should point to an inference-time callable. For ESPnet2-compatible recipes this is often an `espnet2.bin.*` helper such as `espnet2.bin.asr_inference.Speech2Text`.

When you use a custom inference stack, provide your own `_target_` and arguments. The instantiated object should accept the inputs named by `input_key`, and `output_fn` should know how to interpret the return value.

See Model for implementation details.
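To make that contract concrete, here is a minimal, hypothetical sketch. Neither `ToySpeech2Text` nor the `(result, data, idx)` signature of `build_output` comes from ESPnet3; both are assumptions used only to illustrate how `input_key` and `output_fn` fit together.

```python
import numpy as np


class ToySpeech2Text:
    """Hypothetical stand-in for a custom inference callable.

    It accepts a keyword argument matching input_key ("speech") and
    returns a list of (text, ...) hypotheses, loosely mimicking the
    shape of espnet2.bin.asr_inference.Speech2Text output.
    """

    def __call__(self, speech: np.ndarray):
        # A real model would decode the waveform; we return a fixed n-best.
        return [("HELLO WORLD",)]


def build_output(result, data, idx):
    """Sketch of an output_fn (signature assumed): map the raw model
    return value to a flat dict whose keys become SCP files."""
    return {"utt_id": idx, "hyp": result[0][0]}


model = ToySpeech2Text()
hyp = build_output(
    model(speech=np.zeros(16000, dtype=np.float32)), data=None, idx="utt1"
)
print(hyp)  # {'utt_id': 'utt1', 'hyp': 'HELLO WORLD'}
```

The important point is the division of labor: the model stays free-form, while `output_fn` normalizes its output into writable key/value pairs.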
## Output directory layout
Inference writes SCP outputs under `inference_dir`, one folder per test-set name:

```
${inference_dir}/
  test-clean/
    hyp.scp
    wav.scp
  test-other/
    hyp.scp
```

The exact filenames are determined by:

- `output_keys`, if it is set
- otherwise, the keys returned by `output_fn` for the first sample, excluding `idx_key`
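That selection rule can be sketched as follows; this is assumed, illustrative logic, not the actual `InferenceRunner` source.

```python
def scp_filenames(first_output: dict, output_keys=None, idx_key="utt_id"):
    """Illustrative reimplementation of the SCP naming rule."""
    if output_keys:
        # Explicit output_keys wins.
        return [f"{key}.scp" for key in output_keys]
    # Otherwise derive names from the first sample's output, minus idx_key.
    return [f"{key}.scp" for key in first_output if key != idx_key]


first = {"utt_id": "utt1", "hyp": "HELLO WORLD", "wav": "path/to.wav"}
print(scp_filenames(first))                       # ['hyp.scp', 'wav.scp']
print(scp_filenames(first, output_keys=["hyp"]))  # ['hyp.scp']
```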
## Output control
Useful keys:

- `idx_key`: sample ID key, default `utt_id`
- `output_keys`: explicit SCP filenames to write
- `output_artifacts`: artifact-writing config for non-scalar outputs
Built-in artifact behavior:

- `dict` -> JSON
- `numpy.ndarray` -> NPY
- CPU `torch.Tensor` -> NPY
- unknown Python object -> pickle
Supported artifact types:
| Type | Saved as | Notes |
|---|---|---|
| `wav` | `.wav` | requires `sample_rate` |
| `npy` | `.npy` | good for NumPy arrays or CPU tensors |
| `json` | `.json` | good for `dict` outputs |
| `pickle` | `.pkl` | fallback for custom Python objects |
| `writer` | custom path | uses a user-defined function via `_target_` |
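For orientation only, a WAV entry under `output_artifacts` might look like the fragment below. The key layout here is an assumption, so check the Inference stage documentation for the exact schema your ESPnet3 version supports.

```yaml
output_artifacts:
  wav:                  # output key to treat as an artifact (assumed layout)
    type: wav           # saved as .wav
    sample_rate: 16000  # required for wav artifacts
```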
See Inference stage for:
- WAV output example
- custom writer example
- artifact directory structure
## Batched execution
If `batch_size` is set, `InferenceRunner.forward()` receives a list of indices and passes list-valued inputs to the model.

In that case:

- `data` becomes a list of dataset samples
- `idx` becomes a list of indices
- `output_fn` must return a list of output dicts

If you do not want batched inference, leave `batch_size` unset.
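A batch-aware `output_fn` can then be sketched like this; the `(results, data, idx)` signature is an assumption, and the point is only that the function returns one output dict per element of the batch.

```python
def build_output_batched(results, data, idx):
    """Sketch of a batched output_fn: one output dict per sample."""
    assert isinstance(idx, list), "with batch_size set, idx is a list"
    return [{"utt_id": i, "hyp": r} for i, r in zip(idx, results)]


outs = build_output_batched(
    ["HELLO", "WORLD"], data=[{}, {}], idx=["utt1", "utt2"]
)
print(outs[0])  # {'utt_id': 'utt1', 'hyp': 'HELLO'}
```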
## Parallel execution
Minimal local example:

```yaml
parallel:
  env: local
  n_workers: 1
```

Minimal SLURM example:
```yaml
parallel:
  env: slurm
  n_workers: 8
  options:
    queue: gpu
    cores: 8
    processes: 1
    memory: 16GB
    walltime: "30:00"
    job_extra_directives:
      - "--gres=gpu:1"
```

Parallel execution details are documented here:
