ESPnet3 Demo Guide
This page explains how to create and configure interactive Gradio demos.
A key advantage is that demos reuse your existing inference code (providers/runners/models), so you do not need to write extra demo-specific Python.
1. Run
```bash
python run.py --stages pack_demo upload_demo --demo_config conf/demo.yaml
```
2. Outputs
After pack_demo, ESPnet3 writes a runnable Gradio app into the output directory set by pack.out_dir (demo/ when pack.out_dir: demo).
```bash
cd demo
python app.py
```
This starts a local Gradio server. Open the printed URL in your browser.
[!IMPORTANT]
gradio is required for local demo execution. It can be installed using pip install gradio.
After packing, the output directory contains the runnable app, configs, and links to your recipe assets. Example:
```text
demo/
├── app.py
├── config
│   └── infer.yaml
├── data -> ../data/
├── demo.yaml
├── exp -> ../exp/
├── README.md
└── requirements.txt
```
3. Configuration
Keep the core settings in demo.yaml. For the full list, see Demo configuration.
| Config section | Description |
|---|---|
| infer_config | Path to the inference config used by the demo runtime. |
| ui | UI layout and component definitions. |
| inputs | Input field definitions and preprocessing mappings. |
| outputs | Output field definitions and postprocessing mappings. |
UI configuration (Gradio)
UI is configured under ui in demo.yaml. The demo app is generated from this config and wires inputs/outputs directly to your inference runner.
Settings in the ui section override the system defaults defined in espnet3.systems.<system>.demo, when such defaults exist. If the target system defines no defaults, ui must be configured manually in demo.yaml.
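The override behavior described above can be pictured as a dictionary merge. The following is only a sketch under assumptions: merge_ui_config is a hypothetical helper, not an ESPnet3 function, the button entry in the defaults is invented for illustration, and the real merge semantics (for example, whether lists are merged or replaced) may differ.

```python
from copy import deepcopy


def merge_ui_config(defaults, overrides):
    """Return defaults updated with overrides (hypothetical helper).

    Nested dicts are merged key by key. Any other value, including
    lists such as inputs/outputs, is replaced wholesale (an assumption).
    """
    merged = deepcopy(defaults)
    for key, value in (overrides or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_ui_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Stand-in for system defaults; the button entry is assumed for illustration.
system_defaults = {
    "title": "ASR Demo",
    "button": {"label": "Run"},
    "inputs": [{"name": "speech", "type": "audio", "sources": ["mic", "upload"]}],
    "outputs": [{"name": "text", "type": "textbox"}],
}
# Stand-in for the ui section loaded from demo.yaml.
demo_yaml_ui = {"title": "My ASR Demo", "button": {"label": "Transcribe"}}

ui = merge_ui_config(system_defaults, demo_yaml_ui)
print(ui["title"], ui["button"]["label"])  # My ASR Demo Transcribe
```

Under this sketch, fields omitted from demo.yaml (here inputs and outputs) keep their default values.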
The UI section contains the following fields:
| Field | Description |
|---|---|
| title | App title shown at the top of the demo page. |
| description | Markdown text shown under the title. |
| article | Optional Markdown section shown at the bottom. |
| article_path | Path to a Markdown file to load as article. If set, it overrides article. |
| button.label | Label for the Run button. |
| inputs | List of input component configs (name, type, and type-specific fields). |
| outputs | List of output component configs (name, type, and type-specific fields). |
Demo README
After pack_demo, the packed demo directory contains a README.md and the demo UI renders it as the page article by default. To change what is shown in the demo screen, edit demo/README.md in the packed directory (or set ui.article_path to a different Markdown file).
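One way to read the article rules above is as a small lookup with a fallback. This is a sketch, not the ESPnet3 implementation: resolve_article is a hypothetical name, resolving article_path relative to the packed directory is an assumption, and so is the precedence of an inline article over the README fallback.

```python
import tempfile
from pathlib import Path


def resolve_article(ui_cfg, demo_dir):
    """Pick the Markdown text shown as the page article (hypothetical helper)."""
    demo_dir = Path(demo_dir)
    article_path = ui_cfg.get("article_path")
    if article_path:
        # article_path overrides an inline article.
        return (demo_dir / article_path).read_text(encoding="utf-8")
    if ui_cfg.get("article"):
        return ui_cfg["article"]
    # Assumed fallback: the README.md written by pack_demo.
    readme = demo_dir / "README.md"
    return readme.read_text(encoding="utf-8") if readme.exists() else ""


# Try the three cases against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "README.md").write_text("# Packed demo\n", encoding="utf-8")
    Path(tmp, "about.md").write_text("# About\n", encoding="utf-8")
    default_article = resolve_article({}, tmp)
    inline_article = resolve_article({"article": "Inline notes"}, tmp)
    file_article = resolve_article(
        {"article": "Inline notes", "article_path": "about.md"}, tmp
    )
```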
Sample UI Config
```yaml
ui:
  title: "ESPnet3 Demo"
  description: "Run inference with your existing provider/runner."
  button:
    label: "Run"
  inputs:
    - name: speech
      type: audio
      sources: [mic, upload]
    - name: lang
      type: dropdown
      choices: [en, ja]
  outputs:
    - name: text
      type: textbox
      lines: 2
```
4. Developer Notes
Supported component types
Each entry in inputs/outputs requires name and type. The following type values and key fields are supported:
| Type | Key Fields |
|---|---|
| audio | sources (mic/upload), audio_type (numpy by default) |
| textbox | lines, placeholder |
| dropdown | choices, value |
| number | value |
| slider | min, max, step, value |
| checkbox | value |
| image | |
| file | |
The component name becomes the key used by the demo runtime, so it must match the expected inference input/output mapping.
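The name and type requirement can be checked before launching the app. The sketch below is illustrative only: validate_components is a hypothetical helper, the supported set is copied from the table above, and rejecting duplicate names is an assumption that follows from names being used as keys.

```python
# Type values taken from the table of supported component types.
SUPPORTED_TYPES = {
    "audio", "textbox", "dropdown", "number",
    "slider", "checkbox", "image", "file",
}


def validate_components(components, section):
    """Check required keys and supported types; return the component names."""
    names = []
    for i, comp in enumerate(components):
        for key in ("name", "type"):
            if key not in comp:
                raise ValueError(f"{section}[{i}] is missing required key '{key}'")
        if comp["type"] not in SUPPORTED_TYPES:
            raise ValueError(f"{section}[{i}] has unsupported type '{comp['type']}'")
        if comp["name"] in names:
            # Assumption: names act as dict keys, so they should be unique.
            raise ValueError(f"{section}[{i}] reuses name '{comp['name']}'")
        names.append(comp["name"])
    return names


inputs = [
    {"name": "speech", "type": "audio", "sources": ["mic", "upload"]},
    {"name": "lang", "type": "dropdown", "choices": ["en", "ja"]},
]
input_names = validate_components(inputs, "inputs")
print(input_names)  # ['speech', 'lang']
```

The returned names are the keys a runner would look up in dataset[0].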
For audio inputs, Gradio returns (sample_rate, np.ndarray). The demo runtime normalizes this to a float32 waveform array before passing it to the runner.
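As a rough picture of that normalization, the sketch below converts a Gradio-style (sample_rate, np.ndarray) tuple into a float32 waveform. It is an assumption about what such a step typically involves, not the ESPnet3 code: to_float32_waveform is a hypothetical name, and the integer scaling, the mono downmix, and returning the sample rate alongside the waveform are all illustrative choices.

```python
import numpy as np


def to_float32_waveform(audio):
    """Convert (sample_rate, ndarray) to (sample_rate, float32 mono waveform)."""
    sample_rate, data = audio
    data = np.asarray(data)
    if np.issubdtype(data.dtype, np.integer):
        # Scale integer PCM (e.g. int16) into [-1.0, 1.0).
        scale = float(np.iinfo(data.dtype).max) + 1.0
        wave = data.astype(np.float32) / scale
    else:
        wave = data.astype(np.float32)
    if wave.ndim == 2:
        # Multi-channel audio arrives as (samples, channels); average to mono.
        wave = wave.mean(axis=1)
    return sample_rate, wave


sr, wave = to_float32_waveform(
    (16000, np.array([0, 16384, -32768], dtype=np.int16))
)
print(sr, wave.dtype, wave.shape)
```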
The demo runtime packages the UI input values into a single-item dataset, and the runner reads them from dataset[0]. This keeps the call pattern consistent with standard inference runners (forward(idx, dataset=..., model=...)) while still letting you pass simple UI fields. Only extra_kwargs from demo.yaml are passed to the runner as keyword arguments.
Minimal demo dataset (conceptual behavior):
```python
class SingleItemDataset:
    def __init__(self, item):
        self._item = item

    def __len__(self):
        return 1

    def __getitem__(self, idx):
        if idx != 0:
            raise IndexError(idx)
        # Returns UI-defined fields (e.g., speech, lang) as a dict.
        return self._item
```
With this dataset, a runner can still use the familiar signature and pull values from dataset[0]. Example runner implementation that matches the UI sample above:
```python
from espnet3.parallel.base_runner import BaseRunner


class DemoRunner(BaseRunner):
    @staticmethod
    def forward(idx, dataset=None, model=None, **_):
        # The demo dataset holds a single item, so idx is 0.
        item = dataset[idx]
        # Keys match the input names in the UI config (speech, lang).
        hyp = model(item["speech"], lang=item.get("lang"))
        return {"hyp": hyp}
```
Output mapping (output_keys)
When outputs are defined, output_keys maps UI output names to keys returned by your runner/model result. This lets you return a structured dict (e.g., {"hyp": ...}) and map it to UI outputs. Example:
```yaml
output_keys:
  text: hyp
```
If output_keys is missing but outputs are defined, the demo runtime raises an error.
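The mapping and the missing-key error can be sketched in a few lines. This is illustrative only: map_outputs is a hypothetical helper, and the exception type and message are assumptions rather than what the demo runtime actually raises.

```python
def map_outputs(result, outputs, output_keys):
    """Order runner results to match the UI outputs (hypothetical helper)."""
    if outputs and not output_keys:
        # Mirrors the documented behavior; the exception type is assumed.
        raise ValueError("outputs are defined but output_keys is missing")
    values = []
    for comp in outputs:
        # UI output name -> key in the runner result.
        key = output_keys[comp["name"]]
        values.append(result[key])
    return values


outputs = [{"name": "text", "type": "textbox"}]
output_keys = {"text": "hyp"}
runner_result = {"hyp": "hello world"}

ui_values = map_outputs(runner_result, outputs, output_keys)
print(ui_values)  # ['hello world']
```

Returning the values in the order of outputs matches how Gradio expects one return value per output component.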
System defaults (ASR example)
For ASR, defaults live in espnet3/systems/asr/demo.py:
- build_ui_default() defines the default input/output components.
- build_ui(demo_cfg) optionally modifies defaults using demo config.
- build_inference_default() defines default output_keys and extra_kwargs.
Minimal ASR UI defaults:
```python
def build_ui_default():
    return {
        "title": "ASR Demo",
        "inputs": [{"name": "speech", "type": "audio", "sources": ["mic", "upload"]}],
        "outputs": [{"name": "text", "type": "textbox"}],
    }
```
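For symmetry, the inference defaults might look like the sketch below. Everything here is an assumption: sketch_inference_default is a hypothetical stand-in for build_inference_default(), whose actual return shape is not shown in this guide. The key names simply mirror the output_keys and extra_kwargs fields described earlier, and the text to hyp mapping is borrowed from the output mapping example.

```python
def sketch_inference_default():
    """Hypothetical stand-in for build_inference_default(); shape is assumed."""
    return {
        # UI output name -> key in the runner result.
        "output_keys": {"text": "hyp"},
        # Extra keyword arguments forwarded to the runner; empty by assumption.
        "extra_kwargs": {},
    }


# Consistency check: every default UI output needs an output_keys entry.
ui_outputs = [{"name": "text", "type": "textbox"}]
inference = sketch_inference_default()
missing = [
    out["name"] for out in ui_outputs
    if out["name"] not in inference["output_keys"]
]
print(missing)  # []
```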