ESPnet3 Metrics
This page describes how to implement custom metrics for the measure stage. It focuses on the metric interface (AbsMetrics) and on the I/O contract between inference outputs (.scp files) and measurement results (measures.json).
What the measure stage passes to a metric
Metrics are instantiated from measure.yaml and must implement espnet3.components.metrics.abs_metric.AbsMetrics.
At runtime, the measure stage loads .scp files under:
<infer_dir>/<test_name>/*.scp
aligns them by utterance ID and passes them to your metric as a dictionary of aligned lists:
- data["utt_id"]: aligned utterance IDs
- data["ref"], data["hyp"], ...: aligned values loaded from the SCPs (strings)
The contract is:
- every list in data has the same length as data["utt_id"]
- alignment is by utterance ID, not by file order
- if any required SCP is missing or the utterance IDs differ across files, measurement fails early
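For illustration, here is what data might look like for a two-utterance test set with ref and hyp inputs, together with a hypothetical helper, check_aligned (not part of ESPnet3), that restates the contract as checks. The measure stage already enforces these rules before your metric runs, so a metric does not need to repeat them; the sketch only makes the shape concrete.

```python
from typing import Dict, List

# Example of the aligned dictionary a metric receives (values are illustrative).
data: Dict[str, List[str]] = {
    "utt_id": ["utt001", "utt002"],
    "ref": ["hello world", "good morning"],
    "hyp": ["hello word", "good morning"],
}


def check_aligned(data: Dict[str, List[str]]) -> int:
    """Hypothetical helper: raise if any list differs in length from data["utt_id"]."""
    n = len(data["utt_id"])
    for key, values in data.items():
        if len(values) != n:
            raise ValueError(f"data[{key!r}] has {len(values)} items, expected {n}")
    return n


# Position i in every list refers to the same utterance, data["utt_id"][i].
n = check_aligned(data)
for i in range(n):
    print(data["utt_id"][i], "|", data["ref"][i], "|", data["hyp"][i])
```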
AbsMetrics interface
```python
from espnet3.components.metrics.abs_metric import AbsMetrics


class MyMetric(AbsMetrics):
    def __call__(self, data, test_name, output_dir):
        ...
```

Arguments:
- data: Dict[str, List[str]] (aligned lists; keys come from metrics[*].inputs, plus utt_id)
- test_name: the current test set name (e.g., test-clean)
- output_dir: the inference output root (typically infer_dir)
Return value:
- must be JSON-serializable (it is written into <infer_dir>/measures.json)
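A minimal sketch of a complete metric under this interface, using a hypothetical sentence-level exact-match accuracy (not an ESPnet3 built-in). It reads only the keys it expects, ignores test_name and output_dir, and returns plain Python numbers so that json.dumps succeeds. The import fallback exists only so the snippet runs without ESPnet3 installed, and the sketch assumes AbsMetrics needs no constructor arguments.

```python
import json
from typing import Dict, List

try:
    from espnet3.components.metrics.abs_metric import AbsMetrics
except ImportError:  # fallback so this sketch runs without ESPnet3 installed
    AbsMetrics = object


class ExactMatch(AbsMetrics):
    """Hypothetical metric: percentage of utterances whose hyp equals ref exactly."""

    def __call__(self, data: Dict[str, List[str]], test_name: str, output_dir: str):
        refs, hyps = data["ref"], data["hyp"]
        # Lists are already aligned by utterance ID; compare pairwise after stripping.
        correct = sum(r.strip() == h.strip() for r, h in zip(refs, hyps))
        total = len(refs)
        accuracy = 100.0 * correct / total if total else 0.0
        # Plain int/float values keep the result JSON-serializable.
        return {"exact_match": round(accuracy, 2), "num_utts": total}


metric = ExactMatch()
result = metric(
    {"utt_id": ["u1", "u2"], "ref": ["a b", "c"], "hyp": ["a b", "d"]},
    test_name="test-clean",
    output_dir="exp/my_exp/infer",
)
print(json.dumps(result))  # {"exact_match": 50.0, "num_utts": 2}
```

Calling json.dumps on your own return value, as in the last line, is a cheap way to catch non-serializable results (such as sets) before they reach measures.json.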
SCP format and alignment
Each SCP file is a text file containing one utterance per line:
```
utt_id VALUE...
```

Only the first whitespace separates the ID from the value, so the value may itself contain spaces (useful for full sentences).
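A minimal sketch of parsing SCP lines under this rule, assuming UTF-8 text and that blank lines can be skipped (both assumptions, not documented behavior). str.split(maxsplit=1) splits at the first run of whitespace, so everything after the ID, including inner spaces, stays in the value. parse_scp_lines and read_scp are hypothetical helpers, not the ESPnet3 loader.

```python
from typing import Dict, Iterable


def parse_scp_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical parser: map utterance ID -> value for 'utt_id VALUE...' lines."""
    entries: Dict[str, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        parts = line.split(maxsplit=1)  # split only at the first whitespace run
        utt_id = parts[0]
        value = parts[1] if len(parts) > 1 else ""  # an ID with no value maps to ""
        entries[utt_id] = value
    return entries


def read_scp(path: str) -> Dict[str, str]:
    """Hypothetical file wrapper around parse_scp_lines."""
    with open(path, encoding="utf-8") as f:
        return parse_scp_lines(f)


entries = parse_scp_lines(["utt001 the cat sat\n", "utt002 on the mat\n"])
print(entries["utt001"])  # the cat sat
```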
Measurement reads the requested SCPs and aligns them by the set of utterance IDs:
- IDs are collected from each SCP and must match across all requested inputs
- IDs are sorted to produce a stable order
- values are gathered in that order to form the data[...] lists
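The alignment steps above can be sketched as follows. align_scps and its arguments are hypothetical (this is not the ESPnet3 implementation, and the exception types are a choice of the sketch). To stay independent of the filesystem, it takes already-parsed SCP contents (one ID-to-value dict per file stem), and its inputs argument mirrors the alias-to-filename mapping used in measure.yaml.

```python
from typing import Dict, List


def align_scps(
    scps: Dict[str, Dict[str, str]], inputs: Dict[str, str]
) -> Dict[str, List[str]]:
    """Hypothetical aligner.

    scps:   file stem (e.g. "hyp0") -> {utt_id: value} parsed from <stem>.scp
    inputs: data key (alias) -> file stem, as in measure.yaml
    """
    if not inputs:
        raise ValueError("no inputs requested")
    # Fail early if a requested SCP is missing.
    missing = [stem for stem in inputs.values() if stem not in scps]
    if missing:
        raise FileNotFoundError(f"missing SCP(s): {missing}")

    # The sets of utterance IDs must match across all requested inputs.
    id_sets = {alias: set(scps[stem]) for alias, stem in inputs.items()}
    expected = next(iter(id_sets.values()))
    for alias, ids in id_sets.items():
        if ids != expected:
            raise ValueError(f"utterance IDs of {alias!r} differ from the other inputs")

    # Sort IDs for a stable order, then gather every value in that order.
    utt_ids = sorted(expected)
    data: Dict[str, List[str]] = {"utt_id": utt_ids}
    for alias, stem in inputs.items():
        data[alias] = [scps[stem][u] for u in utt_ids]
    return data


scps = {
    "ref": {"utt002": "good morning", "utt001": "hello world"},
    "hyp0": {"utt001": "hello word", "utt002": "good morning"},
}
data = align_scps(scps, {"ref": "ref", "hyp": "hyp0"})
print(data["utt_id"])  # ['utt001', 'utt002']
```

Note that file order does not matter: ref lists utt002 first, yet both lists come out in sorted ID order.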
Configure a metric in measure.yaml
Each metric entry has:
- metric: Hydra target for your AbsMetrics subclass
- inputs (optional): which SCP keys to load, and what to name them in data
```yaml
infer_dir: exp/my_exp/infer
dataset:
  _target_: espnet3.components.data.data_organizer.DataOrganizer
  test: []
metrics:
  - metric:
      _target_: my_pkg.metrics.WER
    inputs:
      ref: ref  # reads <infer_dir>/<test_name>/ref.scp into data["ref"]
      hyp: hyp  # reads <infer_dir>/<test_name>/hyp.scp into data["hyp"]
```

Using aliases (mapping data keys to SCP filenames)
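inputs supports aliasing in the form alias: filename, where each key is the name used in data and each value is the SCP file name without the .scp extension (for example, hyp0 refers to hyp0.scp).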
This is useful when inference produces multiple outputs such as hyp0.scp and hyp1.scp, but your metric expects the key name hyp:
```yaml
metrics:
  - metric:
      _target_: my_pkg.metrics.WER
    inputs:
      ref: ref
      hyp: hyp0  # reads hyp0.scp into data["hyp"]
```

Minimal metric example (WER)
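```python
import jiwer

from espnet3.components.metrics.abs_metric import AbsMetrics


class WER(AbsMetrics):
    def __call__(self, data, test_name, output_dir):
        refs = data["ref"]  # aligned reference transcripts
        hyps = data["hyp"]  # aligned hypotheses, same order as refs
        wer = jiwer.wer(refs, hyps)  # corpus-level WER as a fraction
        return {"WER": round(wer * 100, 2)}  # percentage, JSON-serializable
```

Audio/file-based metrics (example)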
For speech generation tasks, SCP values are often file paths (strings). For example, inference can write:
```
utt001 <infer_dir>/<test_name>/audio/utt001.wav
utt002 <infer_dir>/<test_name>/audio/utt002.wav
```

Your metric receives those paths as strings in data["hyp"] (or any key you map via inputs) and can load audio on demand.
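A minimal sketch of a file-based metric, assuming the hyp SCP holds paths to uncompressed PCM WAV files so that Python's standard-library wave module can read them. MeanDuration is a hypothetical metric that reports the average output duration in seconds; a real quality metric would load audio the same way and then score it. Each file is opened only inside the loop, so audio is loaded on demand rather than all at once. As before, the import fallback only lets the snippet run without ESPnet3 installed.

```python
import wave
from typing import Dict, List

try:
    from espnet3.components.metrics.abs_metric import AbsMetrics
except ImportError:  # fallback so this sketch runs without ESPnet3 installed
    AbsMetrics = object


class MeanDuration(AbsMetrics):
    """Hypothetical metric: mean duration (seconds) of the WAV files listed in hyp."""

    def __call__(self, data: Dict[str, List[str]], test_name: str, output_dir: str):
        durations = []
        for path in data["hyp"]:
            # Open each file only when needed; the wave module handles PCM WAV only.
            with wave.open(path, "rb") as wav:
                durations.append(wav.getnframes() / wav.getframerate())
        mean = sum(durations) / len(durations) if durations else 0.0
        return {"mean_duration_sec": round(mean, 3), "num_files": len(durations)}
```

Swapping in another audio library (for example, one that decodes compressed formats) changes only the loading line; the SCP contract stays the same.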
