ESPnet3 Parallel

Masao Someki

The parallel execution layer for provider/runner workflows — shard planning, local and Dask-backed dispatch, resume, and merge. For YAML settings only, see Parallel config.

01 / overview

Four roles, one pipeline

ESPnet3 parallel processing is built around four components. Each has a clear responsibility.

BaseRunner

The orchestrator (Driver)

Splits data into shards, dispatches them to workers, and merges results when all workers finish. It never processes data itself.

__call__(), _plan_shards(), merge()

forward()

The work unit (Worker)

A pure function that processes one sample or batch. Must be a @staticmethod — no self — so Dask can serialize and send it to remote workers.

@staticmethod, pickle-safe
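
To see why the no-self rule matters, the sketch below compares a static method with a bound method. `DemoRunner`, `big_state`, and `bound_forward` are hypothetical names used only for illustration; they are not part of ESPnet3. A bound method keeps a reference to its instance, so serializing it would pull the whole instance along, whereas a static method is a plain function.

```python
class DemoRunner:
    """Hypothetical runner used only to illustrate the @staticmethod rule."""

    def __init__(self):
        # Stands in for heavy driver-side state that should stay on the driver.
        self.big_state = list(range(1000))

    @staticmethod
    def forward(idx, dataset):
        # No self: the result depends only on the arguments.
        return dataset[idx] * 2

    def bound_forward(self, idx, dataset):
        # Instance method: it cannot be called without the instance.
        return dataset[idx] * 2


runner = DemoRunner()

# A static method accessed through the class is a plain function,
# so it has no attached instance.
static_has_self = hasattr(DemoRunner.forward, "__self__")

# A bound method carries a reference to the instance it came from,
# so shipping it to a worker would ship runner.big_state as well.
bound_carries_instance = runner.bound_forward.__self__ is runner

result = DemoRunner.forward(1, [10, 20, 30])
```

Here `static_has_self` is `False` and `bound_carries_instance` is `True`, which is the practical difference the rule protects against.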

EnvironmentProvider

The initialization blueprint

Defines how and where to build the dataset and model — locally on the driver, or once per worker process. Two methods, two different timings.

build_env_local(), build_worker_setup_fn()

Cluster / Client

The execution environment

Set once with set_parallel(config). Changing env switches from running everything locally to submitting jobs to a SLURM cluster — no other code changes needed.

env: local, env: slurm, env: local_gpu

You don't need to know Dask. ESPnet3 uses Dask internally, but you only interact with set_parallel(config). The BaseRunner handles everything else.

02 / execution modes

Execution modes: who runs where

A single value in parallel_config.env determines where execution happens. The flow below shows local mode.

🖥 Your machine (Driver)

__call__(indices)

Plans shards, writes manifest.json

↓ _run_local()

_run_one_shard()

Processes shards sequentially (no parallelism)

↓

forward(idx, dataset, model)

Runs in the same process as the driver

↓ merge()

merge(shard_dirs)

Combines all shards into final output

📁 Filesystem

output_dir/
  manifest.json
  split.0/ → done
  split.1/ → done
  ...

Local mode: No Dask, no workers. The driver processes all shards in sequence. Progress is shown with tqdm.
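
The local flow above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the BaseRunner implementation: `plan_shards`, `run_local`, the `results.json` file, the `done` marker file, and the manifest layout are all hypothetical stand-ins for whatever ESPnet3 actually writes.

```python
import json
import tempfile
from pathlib import Path


def plan_shards(indices, num_shards):
    """Split indices into num_shards contiguous, nearly equal chunks."""
    indices = list(indices)
    size, extra = divmod(len(indices), num_shards)
    shards, start = [], 0
    for k in range(num_shards):
        end = start + size + (1 if k < extra else 0)
        shards.append(indices[start:end])
        start = end
    return shards


def forward(idx, dataset):
    """Toy work unit: square one sample."""
    return dataset[idx] ** 2


def run_local(indices, dataset, output_dir, num_shards=2):
    output_dir = Path(output_dir)
    shards = plan_shards(indices, num_shards)
    # Assumed manifest layout; the real schema may differ.
    (output_dir / "manifest.json").write_text(json.dumps({"shards": shards}))

    shard_dirs = []
    for k, shard in enumerate(shards):  # sequential, in the driver process
        shard_dir = output_dir / f"split.{k}"
        shard_dir.mkdir()
        results = [forward(i, dataset) for i in shard]
        (shard_dir / "results.json").write_text(json.dumps(results))
        (shard_dir / "done").touch()  # assumed completion marker
        shard_dirs.append(shard_dir)

    # Merge step: concatenate shard outputs in shard order.
    merged = []
    for shard_dir in shard_dirs:
        merged.extend(json.loads((shard_dir / "results.json").read_text()))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    merged = run_local(range(5), dataset=[0, 1, 2, 3, 4], output_dir=tmp)
```

With five indices and two shards, the plan is `[[0, 1, 2], [3, 4]]` and the merged result preserves the original index order.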

# local mode (config.yaml)
parallel:
  env: local
  n_workers: 1    # not used in local mode
  options: {}

03 / provider

Provider: two methods, two timings

The biggest source of confusion with EnvironmentProvider is that the two methods are called at completely different times and places.

build_env_local() (called on the driver, immediately)

1. You call runner(indices)…

2. _run_local() calls provider.build_env_local() directly

3. The returned dict (dataset, model, ...) is used immediately

local mode flow

env = provider.build_env_local()
# dict is returned immediately
forward(idx, **env)

build_worker_setup_fn() (called on the driver; the returned function runs on each worker at startup)

1. Driver calls provider.build_worker_setup_fn() → returns a function

2. That function is wrapped in DictReturnWorkerPlugin and sent to workers

3. When each worker starts, setup() is called and the dataset/model are built on the worker

distributed mode flow

setup_fn = provider.build_worker_setup_fn()
# nothing is initialized yet
# setup_fn is a closure
plugin = DictReturnWorkerPlugin(setup_fn)
# when a worker starts:
env = setup_fn() # runs on the worker

Why return a function instead of the env directly?
SLURM workers run on different machines. If the driver initialized the model and tried to send it, the full model weights would need to be serialized and transferred over the network. Instead, only the lightweight initialization recipe (closure) is sent, and each worker builds its own copy locally — significantly reducing transfer overhead.
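
A rough way to see the size difference is to compare the serialized size of a small config with that of an object built from it. The sketch below uses the standard-library `pickle` as a stand-in for whatever serializer the cluster uses, and `build_model` is a hypothetical toy that returns a nested list of floats rather than a real model.

```python
import pickle


def build_model(config):
    """Stand-in for an expensive model build: a nested list of weights."""
    return [[0.0] * config["hidden"] for _ in range(config["rows"])]


# The "recipe": a handful of values that a setup closure would capture.
config = {"hidden": 256, "rows": 512}

recipe_bytes = len(pickle.dumps(config))
model_bytes = len(pickle.dumps(build_model(config)))

# How many times larger the built object is than the recipe.
ratio = model_bytes // recipe_bytes
```

Even for this toy, the built object is several orders of magnitude larger than the recipe, which is the cost that building on each worker avoids.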

Automatic env injection: `wrap_func_with_worker_env`

When calling forward() on a worker, you don't pass dataset or model explicitly. wrap_func_with_worker_env inspects the function signature and matches argument names against the worker's env keys — injecting them automatically.

Argument names matching env keys are injected automatically

@staticmethod
def forward(idx, dataset, model, device, **env):
    # dataset, model, device → injected from worker env
    # idx → passed by client.map()
    ...

# Worker env (returned by setup_fn()):
env = {"dataset": ds, "model": md, "device": dev, ...}
# ↑ key names match forward's argument names → auto-injected
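
The name-matching idea can be sketched with `inspect.signature`. This is an illustrative re-implementation under assumptions, not the code of wrap_func_with_worker_env: `inject_env` is a hypothetical helper, and the rule that a `**kwargs` parameter receives every env key is an assumption.

```python
import inspect
from functools import wraps


def inject_env(func, env):
    """Return a wrapper that fills func's arguments from env by name."""
    params = inspect.signature(func).parameters
    accepts_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if accepts_kwargs:
        # A **kwargs parameter can take every env key (assumed behavior).
        injected = dict(env)
    else:
        # Otherwise pass only the keys that match a parameter name.
        injected = {k: v for k, v in env.items() if k in params}

    @wraps(func)
    def wrapper(*args, **kwargs):
        # Explicit keyword arguments override injected ones.
        return func(*args, **{**injected, **kwargs})

    return wrapper


def forward(idx, dataset, model):
    return model(dataset[idx])


def forward_all(idx, **env):
    return sorted(env)


env = {"dataset": [10, 20, 30], "model": lambda x: x * 2, "device": "cpu"}

result = inject_env(forward, env)(1)        # "device" is not passed
all_keys = inject_env(forward_all, env)(1)  # every key is passed
```

Only `idx` is supplied by the caller; everything else comes from the env dict by name, which is why the parameter names in forward() must match the env keys.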

Implementation example: subclassing `InferenceProvider`

from abc import ABC, abstractmethod
from typing import Callable


class EnvironmentProvider(ABC):
    """Abstract base: subclasses implement both methods."""

    @abstractmethod
    def build_env_local(self) -> dict:
        """Local execution: called on the driver immediately."""
        ...

    @abstractmethod
    def build_worker_setup_fn(self) -> Callable:
        """Distributed execution: the returned function is called on each worker."""
        ...
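
A concrete subclass might look like the sketch below. `ToyProvider`, its `scale` parameter, and the `_build` helper are hypothetical, and the base class is redefined here as a minimal stand-in so the example runs on its own; a real stage would subclass the ESPnet3 class instead.

```python
from abc import ABC, abstractmethod
from typing import Callable


class EnvironmentProvider(ABC):
    """Minimal stand-in mirroring the contract above."""

    @abstractmethod
    def build_env_local(self) -> dict: ...

    @abstractmethod
    def build_worker_setup_fn(self) -> Callable: ...


class ToyProvider(EnvironmentProvider):
    def __init__(self, scale):
        self.scale = scale

    @staticmethod
    def _build(scale):
        # Shared build logic so both paths produce the same env keys.
        return {"dataset": list(range(5)), "model": lambda x: x * scale}

    def build_env_local(self):
        # Built right away, on the driver.
        return self._build(self.scale)

    def build_worker_setup_fn(self):
        scale = self.scale  # capture only the light config, not self

        def setup_fn():
            # Nothing is built until a worker calls this.
            return ToyProvider._build(scale)

        return setup_fn


provider = ToyProvider(scale=3)

env = provider.build_env_local()             # env dict available now
setup_fn = provider.build_worker_setup_fn()  # only a function so far
worker_env = setup_fn()                      # what a worker would run

local_out = env["model"](env["dataset"][2])
worker_out = worker_env["model"](worker_env["dataset"][2])
```

Routing both methods through one build helper keeps the env keys identical across modes, so the same forward() signature works locally and on workers.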

04 / shard lifecycle

Shard lifecycle

What happens internally between calling runner(range(200)) and getting the result back, stage by stage.

① Indices received

Calling runner(range(200)) triggers BaseRunner.__call__(). All indices are converted to a list. If batch_size is set, they are grouped into batches here.

Where is output_dir set? It is passed as a constructor argument to the Runner. All shard files, manifest.json, and final outputs are written under this directory.

runner = MyRunner(
    provider,
    output_dir="exp/decode",   # ← set here
    shard_subdir="train",       # shards go under output_dir/train/
    resume=True,               # skip already-done shards
)
# When using collect_stats(), output_dir is passed as a function
# argument and forwarded to the Runner automatically.
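
The effect of resume=True can be sketched as a filter over shard directories. This is a guess at the mechanism for illustration only: `pending_shards` is a hypothetical helper, and the `done` marker filename is an assumption based on the "split.N/ → done" listing shown earlier.

```python
import tempfile
from pathlib import Path

DONE_MARKER = "done"  # assumed marker name, not confirmed by ESPnet3


def pending_shards(shard_root, num_shards, resume=True):
    """Return the shard ids that still need to run."""
    pending = []
    for k in range(num_shards):
        shard_dir = Path(shard_root) / f"split.{k}"
        if resume and (shard_dir / DONE_MARKER).exists():
            continue  # already finished in a previous run
        pending.append(k)
    return pending


with tempfile.TemporaryDirectory() as tmp:
    # Mirrors output_dir/shard_subdir from the constructor example.
    root = Path(tmp) / "train"
    for k in range(3):
        (root / f"split.{k}").mkdir(parents=True)
    (root / "split.1" / DONE_MARKER).touch()  # pretend shard 1 finished

    with_resume = pending_shards(root, 3, resume=True)
    without_resume = pending_shards(root, 3, resume=False)
```

With the marker present only in split.1, a resumed run would process shards 0 and 2, while a fresh run would process all three.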

indices = range(200) → 200 indices

…

# First thing BaseRunner.__call__ does
indices = list(indices)
if self.batch_size is not None:
    indices = [
        list(indices[i : i + self.batch_size])
        for i in range(0, len(indices), self.batch_size)
    ]
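
As a worked example of the grouping step, the snippet below applies the same slicing logic to 200 indices. `group_into_batches` is a hypothetical standalone wrapper around the logic shown above, and the batch size of 64 is an arbitrary choice.

```python
def group_into_batches(indices, batch_size=None):
    """Same grouping as the snippet above, as a standalone function."""
    indices = list(indices)
    if batch_size is None:
        return indices  # one work item per index
    return [
        indices[i : i + batch_size]
        for i in range(0, len(indices), batch_size)
    ]


batches = group_into_batches(range(200), batch_size=64)
batch_sizes = [len(b) for b in batches]

unbatched = group_into_batches(range(3))
```

Because 200 is not a multiple of 64, the last batch is shorter: the sizes are 64, 64, 64, and 8. A forward() that receives batches should therefore not assume a fixed batch length.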

05 / config builder

Config builder

Select an execution environment and fill in the values for that mode.

env: execution mode

n_workers: number of workers (SLURM: job count)

parallel_config.yaml

parallel:
  env: local
  n_workers: 4
  options: {}

Other cluster types (PBS, SGE, LSF, HTCondor, ...): Change env to "pbs", "sge", "lsf", or "htcondor". The available options vary by scheduler — see the dask-jobqueue documentation.

Related pages

Provider & Runner API

Full subclass contract — EnvironmentProvider, BaseRunner, open_writers, write_record, merge, and the static forward() rule.

InferenceProvider

The stage-facing provider used by the base inference path. Subclass this when adding a new inference stage.

Data preparation pattern

How the same provider/runner pattern is applied for dataset preparation and collect_stats.

Parallel config reference

All YAML keys for parallel backends — env, n_workers, options — with backend-specific notes.
