Demo Runtime

Masao Someki

This page explains the runtime boundary between DemoSession and InferenceModel.

The important files are:

espnet3/publication/demo/session.py
espnet3/publication/inference_model.py

Core idea

The current demo runtime is split into two layers:

DemoSession: demo-side session and UI wiring
InferenceModel: packaged-model inference runtime

DemoSession does not rebuild the inference stack itself. It loads one InferenceModel and then calls it.

Responsibility split

DemoSession

DemoSession owns:

loaded demo.yaml
resolved title and description
UI input specs
UI output specs
model.call_args
UI asset registry

Its job is to turn Gradio values into one inference call and map the returned values back to the UI.
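
As a rough illustration of the state listed above, the session can be pictured as a plain dataclass. This is only a sketch: the class name DemoSessionSketch and the field names config, title, description, and assets are assumptions made for the example, and the real class in espnet3/publication/demo/session.py may be organized differently.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class DemoSessionSketch:
    """Hypothetical stand-in for DemoSession; not the ESPnet3 class."""

    config: Dict[str, Any]                # loaded demo.yaml (assumed field name)
    title: str                            # resolved title (assumed field name)
    description: str                      # resolved description (assumed field name)
    input_specs: List[Dict[str, Any]]     # UI input specs
    output_specs: List[Dict[str, Any]]    # UI output specs
    model: Callable[..., Dict[str, Any]]  # the loaded InferenceModel, or any callable
    call_args: Dict[str, Any] = field(default_factory=dict)  # model.call_args
    assets: Dict[str, Any] = field(default_factory=dict)     # UI asset registry (assumed name)


# A stub callable stands in for the real InferenceModel here.
session = DemoSessionSketch(
    config={"ui": {}},
    title="ASR demo",
    description="Transcribe speech",
    input_specs=[{"key": "speech", "type": "audio"}],
    output_specs=[{"key": "hyp", "type": "text"}],
    model=lambda item, **kwargs: {"hyp": "hello"},
)
```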

InferenceModel

InferenceModel owns:

loading the packed model bundle from a local directory or a remote pretrained tag
reading packed meta.yaml
finding packed conf/inference.yaml
rebuilding the backend inference model
applying input_key
applying the packed output_fn when configured

Its job is to behave like a small callable inference API.
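
A minimal sketch of what "a small callable inference API" could look like is shown here. It is not the ESPnet3 implementation: the class name InferenceModelSketch, the constructor signature, and the fallback key "output" are hypothetical, and the way input_key is used (selecting one field of the input dict for the backend) is an assumption about its meaning.

```python
from typing import Any, Callable, Dict, Optional


class InferenceModelSketch:
    """Hypothetical callable wrapper; not the ESPnet3 InferenceModel."""

    def __init__(
        self,
        backend: Callable[..., Any],
        input_key: str,
        output_fn: Optional[Callable[[Any], Dict[str, Any]]] = None,
    ) -> None:
        self.backend = backend
        self.input_key = input_key
        self.output_fn = output_fn

    def __call__(self, item: Dict[str, Any], **call_args: Any) -> Dict[str, Any]:
        # Assumption: input_key names the field of `item` fed to the backend.
        raw = self.backend(item[self.input_key], **call_args)
        # Apply the output function only when one is configured.
        if self.output_fn is not None:
            return self.output_fn(raw)
        # Assumption: without output_fn, wrap the raw result under a default key.
        return {"output": raw}


def fake_backend(speech, beam_size=1):
    """Stub backend that only reports what it received."""
    return f"{len(speech)} samples, beam={beam_size}"


model = InferenceModelSketch(
    fake_backend,
    input_key="speech",
    output_fn=lambda raw: {"hyp": raw},
)
result = model({"speech": [0.0, 0.1, 0.2]}, beam_size=4)
```

The caller only sees a dict going in and a dict coming out, which is what lets the demo layer stay unaware of the backend.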

Runtime flow

The runtime flow is:

1. the recipe launcher calls load_demo_session(...)
2. load_demo_session(...) loads demo.yaml
3. load_demo_session(...) builds InferenceModel
4. the launcher builds Gradio components from session.input_specs and session.output_specs
5. the launcher calls session.create_inference_fn(...)
6. button clicks call that function
7. the function calls session.model(item, **session.call_args)

So the call chain is:

Gradio UI
  -> DemoSession.create_inference_fn()
  -> InferenceModel.__call__()
  -> packed inference config / backend model

How DemoSession creates InferenceModel

load_demo_session(...) reads demo_cfg.model.dir_or_tag.

Then:

if it resolves to a local directory, it calls InferenceModel.from_packed(...)
otherwise, it calls InferenceModel.from_pretrained(...)

This means DemoSession does not care whether the model came from:

a local pack directory such as ../model_pack
a downloaded pretrained tag

That detail is absorbed by InferenceModel.
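
The directory-or-tag decision can be sketched as follows. The helper choose_loader and the tag string are hypothetical, and the sketch assumes that "resolves to a local directory" means a plain existence check on the path; the real load_demo_session(...) may resolve relative paths differently.

```python
from pathlib import Path
import tempfile


def choose_loader(dir_or_tag: str) -> str:
    """Return the name of the InferenceModel constructor the text describes."""
    # A value that points at an existing local directory is treated as a pack.
    if Path(dir_or_tag).expanduser().is_dir():
        return "from_packed"
    # Anything else is treated as a remote pretrained tag.
    return "from_pretrained"


# An existing temporary directory stands in for a local model pack.
with tempfile.TemporaryDirectory() as pack_dir:
    local_choice = choose_loader(pack_dir)

# A made-up tag that does not exist on disk.
remote_choice = choose_loader("some-org/some-model-tag")
```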

What InferenceModel loads from the pack

For a local packed model, InferenceModel.from_packed(...) loads:

meta.yaml
the packed conf/inference.yaml
the backend model declared there
the packed output_fn, if one is configured

So the demo runtime does not need a separate copy of inference.yaml in the demo bundle. It reads inference behavior from the model pack.
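
To make the pack lookup concrete, here is a small sketch that locates the two config files named above. The function locate_pack_files is hypothetical, and the layout (meta.yaml at the pack root, the inference config directly at conf/inference.yaml) is an assumption; InferenceModel.from_packed(...) may search the pack in another way.

```python
from pathlib import Path
import tempfile


def locate_pack_files(pack_dir: str) -> dict:
    """Locate the two config files the text names inside a model pack."""
    root = Path(pack_dir)
    # Assumption: meta.yaml sits at the pack root and the inference config
    # at conf/inference.yaml directly below it.
    paths = {
        "meta": root / "meta.yaml",
        "inference": root / "conf" / "inference.yaml",
    }
    missing = [name for name, path in paths.items() if not path.is_file()]
    if missing:
        raise FileNotFoundError(f"pack {root} is missing: {', '.join(missing)}")
    return paths


# Build a throwaway pack layout so the lookup can run without a real model.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "conf").mkdir()
    (Path(tmp) / "meta.yaml").write_text("{}\n")
    (Path(tmp) / "conf" / "inference.yaml").write_text("{}\n")
    found = locate_pack_files(tmp)
    found_names = sorted(found)
```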

What DemoSession passes into InferenceModel

DemoSession.create_inference_fn(...) builds a callback that:

receives positional values from Gradio
maps them to ui.inputs[].key
builds one input dict
calls self.model(item, **self.call_args)

Example:

ui:
  inputs:
    - key: speech
      type: audio
  outputs:
    - key: hyp
      type: text

This becomes:

item = {"speech": value_from_gradio}
result = session.model(item, **session.call_args)

Then DemoSession picks result["hyp"] and returns it to the output component.
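
The mapping described in this section can be sketched as a standalone function. This is a simplified illustration rather than the actual DemoSession.create_inference_fn(...): the free-function signature, the input-count check, and the stub fake_model are assumptions made so the example runs without Gradio or a packed model.

```python
from typing import Any, Callable, Dict, List


def create_inference_fn(
    model: Callable[..., Dict[str, Any]],
    input_specs: List[Dict[str, Any]],
    output_specs: List[Dict[str, Any]],
    call_args: Dict[str, Any],
) -> Callable[..., Any]:
    """Build a Gradio-style callback; a sketch, not the ESPnet3 method."""
    input_keys = [spec["key"] for spec in input_specs]
    output_keys = [spec["key"] for spec in output_specs]

    def inference_fn(*values: Any) -> Any:
        if len(values) != len(input_keys):
            raise ValueError(f"expected {len(input_keys)} inputs, got {len(values)}")
        # Pair positional UI values with the configured input keys.
        item = dict(zip(input_keys, values))
        result = model(item, **call_args)
        # Pick the configured output keys in UI order.
        picked = tuple(result[key] for key in output_keys)
        # Gradio accepts a bare value when there is a single output component.
        return picked[0] if len(picked) == 1 else picked

    return inference_fn


def fake_model(item, **kwargs):
    """Stub standing in for session.model."""
    return {"hyp": f"heard {item['speech']} with {kwargs}"}


fn = create_inference_fn(
    fake_model,
    input_specs=[{"key": "speech", "type": "audio"}],
    output_specs=[{"key": "hyp", "type": "text"}],
    call_args={"beam_size": 2},
)
text = fn("clip.wav")
```

Because the callback is driven only by the key lists, adding a second input or output in the UI config changes the dict shape without touching the callback code.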

Why this split is useful

This split keeps the demo layer thin.

DemoSession only understands demo config and UI specs
InferenceModel only understands packed inference behavior

That means:

demo UI code can stay recipe-local
packed model behavior stays reusable outside demos
the same packed model can be called from demos, tests, or plain Python

Pack Pipeline

See how `demo.yaml` and `model.dir_or_tag` are prepared.

UI Definition

See how UI specs are defined and turned into components.

Demo Config

See the config fields consumed by `DemoSession`.