Demo Runtime
This page explains the runtime boundary between DemoSession and InferenceModel.
The important files are:
- espnet3/publication/demo/session.py
- espnet3/publication/inference_model.py
Core idea
The current demo runtime is split into two layers:
- DemoSession: demo-side session and UI wiring
- InferenceModel: packaged-model inference runtime
DemoSession does not rebuild the inference stack itself. It loads one InferenceModel and then calls it.
Responsibility split
DemoSession
DemoSession owns:
- loaded demo.yaml
- resolved title and description
- UI input specs
- UI output specs
- model.call_args
- UI asset registry
Its job is to turn Gradio values into one inference call and map the returned values back to the UI.
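The state listed above can be pictured as a small container. The sketch below is illustrative only: `ToyDemoSession` is a hypothetical class, and the field names `config`, `title`, `description`, and `assets` (and their shapes) are assumptions. Only `input_specs`, `output_specs`, `call_args`, and `model` are attribute names that appear on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyDemoSession:
    """Hypothetical container mirroring the state DemoSession is said to own."""

    config: Dict[str, Any]                # loaded demo.yaml (field name assumed)
    title: str                            # resolved title (field name assumed)
    description: str                      # resolved description (field name assumed)
    input_specs: List[Dict[str, Any]]     # UI input specs
    output_specs: List[Dict[str, Any]]    # UI output specs
    model: Callable[..., Dict[str, Any]]  # the loaded inference model
    call_args: Dict[str, Any] = field(default_factory=dict)  # from model.call_args
    assets: Dict[str, str] = field(default_factory=dict)     # UI asset registry (shape assumed)


session = ToyDemoSession(
    config={"ui": {}},
    title="ASR demo",
    description="Toy example",
    input_specs=[{"key": "speech", "type": "audio"}],
    output_specs=[{"key": "hyp", "type": "text"}],
    model=lambda item, **kwargs: {"hyp": "hello"},  # fake model for illustration
)

# The launcher only needs the specs to know which keys the UI exposes.
input_keys = [spec["key"] for spec in session.input_specs]
```

Keeping the model as just another field reflects the point made earlier: the session holds a loaded model and calls it, rather than rebuilding the inference stack.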
InferenceModel
InferenceModel owns:
- loading the packed model bundle or remote pretrained tag
- reading packed meta.yaml
- finding packed conf/inference.yaml
- rebuilding the backend inference model
- applying input_key
- applying the packed output_fn when configured
Its job is to behave like a small callable inference API.
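To make the "small callable inference API" idea concrete, here is a minimal sketch. It is not the ESPnet3 implementation: `ToyInferenceModel`, its constructor arguments, and the reading of `input_key` as "the name of the item field handed to the backend" are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Optional


class ToyInferenceModel:
    """Hypothetical stand-in for a packaged-model wrapper (not the real class)."""

    def __init__(
        self,
        backend: Callable[..., Any],
        input_key: str,
        output_fn: Optional[Callable[[Any], Dict[str, Any]]] = None,
    ) -> None:
        self.backend = backend
        self.input_key = input_key
        self.output_fn = output_fn

    def __call__(self, item: Dict[str, Any], **call_args: Any) -> Any:
        # Assumption: input_key names the field of `item` passed to the backend.
        raw = self.backend(item[self.input_key], **call_args)
        # Apply the post-processing hook only when one is configured.
        return self.output_fn(raw) if self.output_fn is not None else raw


def toy_backend(text: str, suffix: str = "") -> str:
    """Toy backend that upper-cases its input; a real one would run a model."""
    return text.upper() + suffix


model = ToyInferenceModel(
    backend=toy_backend,
    input_key="speech",
    output_fn=lambda raw: {"hyp": raw},
)
result = model({"speech": "hello"}, suffix="!")
```

The caller only sees a dict going in and a dict coming out, which is what lets the demo layer stay unaware of the backend.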
Runtime flow
The runtime flow is:
- the recipe launcher calls load_demo_session(...)
- load_demo_session(...) loads demo.yaml
- load_demo_session(...) builds InferenceModel
- the launcher builds Gradio components from session.input_specs and session.output_specs
- the launcher calls session.create_inference_fn(...)
- button clicks call that function
- the function calls session.model(item, **session.call_args)
So the call chain is:
Gradio UI
-> DemoSession.create_inference_fn()
-> InferenceModel.__call__()
-> packed inference config / backend model
How DemoSession creates InferenceModel
load_demo_session(...) reads demo_cfg.model.dir_or_tag.
Then:
- if it resolves to a local directory, it calls InferenceModel.from_packed(...)
- otherwise, it calls InferenceModel.from_pretrained(...)
This means DemoSession does not care whether the model came from:
- ../model_pack
- some downloaded pretrained tag
That detail is absorbed by InferenceModel.
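The directory-or-tag decision can be sketched as follows. This is an assumption-laden illustration: `choose_loader` is a hypothetical helper that only returns the name of the constructor that would be chosen, and the real `load_demo_session(...)` may resolve relative paths differently (for example, relative to the demo directory).

```python
import tempfile
from pathlib import Path


def choose_loader(dir_or_tag: str) -> str:
    """Return the name of the constructor that would be picked (illustrative)."""
    # A value that resolves to an existing local directory is treated as a pack.
    if Path(dir_or_tag).expanduser().is_dir():
        return "from_packed"
    # Anything else is treated as a pretrained tag.
    return "from_pretrained"


# An existing directory selects the packed-model path.
with tempfile.TemporaryDirectory() as pack_dir:
    local_choice = choose_loader(pack_dir)

# A string that is not a local directory falls through to the tag path.
tag_choice = choose_loader("some-org/some-model-tag-that-is-not-a-directory")
```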
What InferenceModel loads from the pack
For a local packed model, InferenceModel.from_packed(...) loads:
- meta.yaml
- the packed conf/inference.yaml
- the backend model declared there
- the packed output_fn, if one is configured
So the demo runtime does not need a separate copy of inference.yaml in the demo bundle. It reads inference behavior from the model pack.
What DemoSession passes into InferenceModel
DemoSession.create_inference_fn(...) builds a callback that:
- receives positional values from Gradio
- maps them to ui.inputs[].key
- builds one input dict
- calls self.model(item, **self.call_args)
Example:
ui:
  inputs:
    - key: speech
      type: audio
  outputs:
    - key: hyp
      type: text
This becomes:
item = {"speech": value_from_gradio}
result = session.model(item, **session.call_args)
Then DemoSession picks result["hyp"] and returns it to the output component.
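The callback construction described in this section can be sketched as a closure. This is a minimal sketch, not the actual `DemoSession.create_inference_fn(...)`: `make_inference_fn`, `fake_model`, and the `beam_size` argument are hypothetical, and the rule for returning a single output bare versus several as a tuple is an assumption.

```python
from typing import Any, Callable, Dict, List, Sequence


def make_inference_fn(
    input_keys: Sequence[str],
    output_keys: Sequence[str],
    model: Callable[..., Dict[str, Any]],
    call_args: Dict[str, Any],
) -> Callable[..., Any]:
    """Build a Gradio-style callback (hypothetical helper, not the real method)."""

    def inference_fn(*values: Any) -> Any:
        if len(values) != len(input_keys):
            raise ValueError(f"expected {len(input_keys)} inputs, got {len(values)}")
        # Map positional UI values onto the configured input keys.
        item = dict(zip(input_keys, values))
        result = model(item, **call_args)
        # Pick the configured output keys, in UI order.
        outputs: List[Any] = [result[key] for key in output_keys]
        # Assumption: a single output is returned bare, several as a tuple.
        return outputs[0] if len(outputs) == 1 else tuple(outputs)

    return inference_fn


def fake_model(item: Dict[str, Any], beam_size: int = 1) -> Dict[str, Any]:
    """Fake model standing in for session.model."""
    return {"hyp": f"{item['speech']} (beam={beam_size})"}


fn = make_inference_fn(["speech"], ["hyp"], fake_model, {"beam_size": 4})
text = fn("audio.wav")
```

Because the callback closes over the keys and call arguments, the UI layer can bind it to a button without knowing anything about the model behind it.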
Why this split is useful
This split keeps the demo layer thin.
- DemoSession only understands demo config and UI specs
- InferenceModel only understands packed inference behavior
That means:
- demo UI code can stay recipe-local
- packed model behavior stays reusable outside demos
- the same packed model can be called from demos, tests, or plain Python
