Cluster and parallel

If you are coming from ESPnet2, this is one of the biggest mental-model changes.

In ESPnet2, parallel execution is mostly orchestrated by shell scripts such as:

egs2/<recipe>/asr1/asr.sh
egs2/<recipe>/asr1/local/data.sh

In ESPnet3, the shell layer is much thinner. Parallel work moves into Python through:

espnet3/parallel/parallel.py
espnet3/parallel/env_provider.py
espnet3/parallel/base_runner.py
espnet3/systems/base/inference_provider.py
espnet3/systems/base/inference_runner.py

ESPnet2: shell controls the parallelism

In ESPnet2, recipes usually expose shell variables such as:

nj
inference_nj
gpu_inference
train_cmd
cuda_cmd

Then the recipe script splits inputs and fans out jobs itself.

Typical patterns in asr.sh are:

utils/split_scp.pl ...
JOB=1:${_nj} ...
run.pl, queue.pl, slurm.pl

Conceptually, the script does this:

count input lines
choose _nj
split one SCP or text file into _nj shards
submit one shell job per shard
merge the outputs afterward
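The five steps above can be sketched in a few lines of Python. This is only an illustration of the control flow, not ESPnet2 code: `split_keys` and `decode_shard` are hypothetical helpers, the contiguous split is a simplification of what `utils/split_scp.pl` does, and the "jobs" run sequentially instead of being submitted through `run.pl` or a scheduler.

```python
from typing import Dict, List


def split_keys(keys: List[str], nj: int) -> List[List[str]]:
    """Split keys into nj contiguous shards whose sizes differ by at most one."""
    base, extra = divmod(len(keys), nj)
    shards, start = [], 0
    for n in range(nj):
        size = base + (1 if n < extra else 0)
        shards.append(keys[start:start + size])
        start += size
    return shards


def decode_shard(shard: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for one per-shard job (a real recipe runs inference)."""
    return {key: key.upper() for key in shard}


keys = [f"utt{i}" for i in range(1, 8)]        # 1. count input lines
nj = min(4, len(keys))                         # 2. choose _nj
shards = split_keys(keys, nj)                  # 3. split into _nj shards
outputs = [decode_shard(s) for s in shards]    # 4. one job per shard (sequential here)
merged = {k: v for out in outputs for k, v in out.items()}  # 5. merge the outputs
```

Every step here is bookkeeping around `decode_shard`; in ESPnet2 that bookkeeping is repeated in shell for each stage.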

That is why ESPnet2 recipes often feel like:

one big shell controller
many stage-local shell loops
one-off splitting and merging logic per stage

Example: ESPnet2 decoding pattern

The common ESPnet2 decoding flow looks like this:

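# Never launch more jobs than there are input keys (min is a recipe helper function).
_nj=$(min "${inference_nj}" "$(<${key_file} wc -l)")

# Build the list of per-shard key files, then split the key file across them.
split_scps=""
for n in $(seq "${_nj}"); do
    split_scps+=" ${_logdir}/keys.${n}.scp"
done
utils/split_scp.pl "${key_file}" ${split_scps}

# Launch one job per shard; ${_cmd} (run.pl, queue.pl, ...) substitutes JOB with 1.._nj.
${_cmd} JOB=1:"${_nj}" "${_logdir}"/asr_inference.JOB.log \
    python ... --key_file "${_logdir}"/keys.JOB.scp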

The important point is not the exact command. The important point is where the responsibility lives:

the shell script decides the shard count
the shell script writes shard files
the shell script launches one job per shard

Example: ESPnet2 data preparation pattern

local/data.sh usually follows the same style.

One shell stage does:

download data
extract data
run Python helpers
sort files
split train/dev/test

For example, egs2/an4/asr1/local/data.sh:

downloads AN4
runs local/data_prep.py
sorts text, wav.scp, utt2spk
creates train_dev and train_nodev

So in ESPnet2, parallel and cluster behavior is usually defined by the recipe's shell scripts.

ESPnet3: Python owns the execution pattern

ESPnet3 moves that responsibility into Python objects.

The core split is:

EnvironmentProvider: build dataset/model/tokenizer/runtime env
BaseRunner: apply one static compute function over indices

This means the execution pattern is no longer:

"split files in shell, then call Python"

It becomes:

"describe the runtime env in Python, then let the runner execute locally or on a cluster"
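A minimal sketch of that split, assuming nothing about the real ESPnet3 classes: `ToyProvider`, `ToyRunner`, `build_env`, and `forward` are hypothetical names chosen for illustration, and a thread pool stands in for whatever backend the runner would actually use. The point is only that the per-item function depends on an index and an env, so the same code runs serially or in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Iterable, List


class ToyProvider:
    """Hypothetical provider: builds the env that each worker needs."""

    def __init__(self, texts: List[str]):
        self.texts = texts

    def build_env(self) -> Dict[str, Any]:
        # A real provider would build the dataset, model, tokenizer, etc. here.
        return {"dataset": self.texts, "model": str.upper}


class ToyRunner:
    """Hypothetical runner: applies one static function over indices."""

    def __init__(self, provider: ToyProvider, n_workers: int = 1):
        self.provider = provider
        self.n_workers = n_workers

    @staticmethod
    def forward(idx: int, env: Dict[str, Any]) -> str:
        # Per-item computation; it depends only on the index and the env.
        return env["model"](env["dataset"][idx])

    def __call__(self, indices: Iterable[int]) -> List[str]:
        env = self.provider.build_env()
        if self.n_workers <= 1:
            return [self.forward(i, env) for i in indices]
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            # Executor.map returns results in input order.
            return list(pool.map(lambda i: self.forward(i, env), indices))


provider = ToyProvider(["hello", "world", "espnet"])
serial = ToyRunner(provider, n_workers=1)(range(3))
parallel = ToyRunner(provider, n_workers=2)(range(3))
```

Because `forward` is static and receives everything through `env`, switching between the serial and parallel paths changes only the runner's configuration, not the computation.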

Side-by-side comparison

| Topic | ESPnet2 | ESPnet3 |
| --- | --- | --- |
| Parallel control surface | shell vars like `nj`, `inference_nj`, `train_cmd` | YAML `parallel` block plus provider/runner code |
| Work splitting | shell-side SCP splitting | Python runner over indices |
| Worker environment | implicit in shell command line and filesystem | explicit env dict from provider |
| Cluster backend | `run.pl`, `queue.pl`, `slurm.pl` wrappers | Dask client built from `parallel.env` and `parallel.options` |
| Merge behavior | shell concatenation and stage-local scripts | runner hooks such as `merge()` |
| Reuse across local/HPC | often stage-specific shell logic | same Python code can run local or Dask |

What replaces nj and inference_nj

In ESPnet3, the closest replacement is usually:

parallel.n_workers
optional runner batch_size

But the meaning is slightly different.

nj in ESPnet2 usually means:

"how many shell jobs should I split this file into?"

n_workers in ESPnet3 usually means:

"how many worker processes should the runtime create?"

So the old and new knobs are related, but not identical.
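The difference can be made concrete with a small sketch. It uses a standard-library thread pool rather than ESPnet3 or Dask, and threads rather than processes, purely to keep the example portable; `task` is a hypothetical per-item function. With `nj`, the number of work units equals the number of jobs. With `n_workers`, the number of work units is set by the input, and the pool size only bounds how many run at once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

items = list(range(10))

# nj-style: the knob fixes the number of shards, and one job handles each shard.
nj = 3
shards = [items[n::nj] for n in range(nj)]  # strided split, for brevity
num_jobs = len(shards)

# n_workers-style: the knob fixes the pool size; every item is its own task.
n_workers = 3
seen_workers = set()
lock = threading.Lock()


def task(i: int) -> int:
    """Hypothetical per-item computation that records which worker ran it."""
    with lock:
        seen_workers.add(threading.current_thread().name)
    return i * i


with ThreadPoolExecutor(max_workers=n_workers) as pool:
    results = list(pool.map(task, items))

num_tasks = len(results)
```

Here `num_jobs` is 3 while `num_tasks` is 10, even though both knobs were set to 3.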

Parallel Config

See how n_workers and backend settings are expressed in YAML.

Parallel Runtime

See how ESPnet3 maps work locally or through Dask.

Config Diff

See where old nj-style recipe settings usually move.

What replaces run.pl / queue.pl

In ESPnet3, the backend choice is part of config.

Typical examples:

parallel:
  env: local
  n_workers: 8

parallel:
  env: local_gpu
  n_workers: 4

parallel:
  env: slurm
  n_workers: 16
  options:
    queue: batch
    cores: 4
    memory: 32GB

So instead of changing shell wrappers, you change the parallel config and keep the Python execution path the same.
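One way to picture how such a block could select a backend is sketched below. The mapping is an assumption, not ESPnet3's actual implementation: `BackendPlan` and `plan_backend` are hypothetical, and the sketch only names Dask cluster classes (`dask.distributed.LocalCluster`, `dask_jobqueue.SLURMCluster`) without importing them. `local_gpu` is left out because this page does not say what it maps to.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class BackendPlan:
    """Hypothetical description of which cluster to build and how."""
    cluster: str                                   # dotted name of a cluster class
    cluster_kwargs: Dict[str, Any] = field(default_factory=dict)
    scale_to: Optional[int] = None                 # value for cluster.scale(), if any


def plan_backend(parallel: Dict[str, Any]) -> BackendPlan:
    """Turn a `parallel` config block into a backend plan (assumed mapping)."""
    env = parallel["env"]
    n_workers = int(parallel.get("n_workers", 1))
    options = dict(parallel.get("options") or {})
    if env == "local":
        # Local workers are created directly by the cluster constructor.
        return BackendPlan("dask.distributed.LocalCluster",
                           {"n_workers": n_workers, **options})
    if env == "slurm":
        # Scheduler options go to the constructor; workers are requested afterwards.
        return BackendPlan("dask_jobqueue.SLURMCluster", options, scale_to=n_workers)
    raise ValueError(f"unsupported parallel.env in this sketch: {env!r}")


local_plan = plan_backend({"env": "local", "n_workers": 8})
slurm_plan = plan_backend({
    "env": "slurm",
    "n_workers": 16,
    "options": {"queue": "batch", "cores": 4, "memory": "32GB"},
})
```

Whatever the real mapping looks like, the design idea is the same: the caller passes a config dict, and nothing downstream needs to know which backend was chosen.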

Parallel Config

See local, local GPU, and cluster backend examples.

Provider and Runner

See how backend-independent work is written once in Python.

System and Stages

See how stage code receives config and launches stage behavior.

What replaces shell-side shard logic

In ESPnet3, shard logic lives in the runner layer.

That can mean:

iterating plain indices locally
mapping tasks through Dask
using reducer hooks to write shard outputs
merging shard outputs in merge()

The closest current examples are:

espnet3/systems/base/inference_runner.py
espnet3/components/data/collect_stats.py
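The runner-side shard logic listed above might look roughly like the following. This is a self-contained sketch, not code from the files just named: `make_batches`, `run_batch`, and `merge` are hypothetical helpers, the JSON-lines shard format is an assumption, and batches run sequentially where a real runner could map them through Dask.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def make_batches(indices: List[int], batch_size: int) -> List[List[int]]:
    """Group plain indices into consecutive batches of at most batch_size."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


def run_batch(batch: List[int], shard_path: Path) -> Path:
    """Compute one batch and write its results as one shard file."""
    rows = [{"idx": i, "value": i * i} for i in batch]
    shard_path.write_text("".join(json.dumps(r) + "\n" for r in rows))
    return shard_path


def merge(shard_paths: List[Path], out_path: Path) -> List[Dict[str, int]]:
    """Concatenate shard files in name order and return the merged rows."""
    with out_path.open("w") as out:
        for path in sorted(shard_paths):
            out.write(path.read_text())
    return [json.loads(line) for line in out_path.read_text().splitlines()]


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    batches = make_batches(list(range(7)), batch_size=3)
    # Zero-padded shard names keep lexical order equal to numeric order.
    shards = [run_batch(b, out_dir / f"shard.{n:04d}.jsonl")
              for n, b in enumerate(batches)]
    merged = merge(shards, out_dir / "merged.jsonl")
```

The splitting and merging that ESPnet2 repeats per stage in shell becomes two small functions that every stage can share.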

Provider and Runner

See BaseRunner hooks, worker envs, and merge behavior.

Inference Provider

See the provider contract used by parallel inference.

Stats Collection

See how collect_stats uses dataloader and runner-style execution.

Data preparation: what changes the most

This is where ESPnet2 users most often expect to find more shell code.

In ESPnet2:

local/data.sh is often the center of gravity
stage logic is mostly shell + small Python helpers

In ESPnet3:

recipe-local dataset/builder.py owns source preparation and build checks
heavier inner loops can move into provider/runner code

So the mapping is roughly:

| ESPnet2 | ESPnet3 |
| --- | --- |
| `local/data.sh` | `dataset/builder.py` |
| shell stage loop | `build()` plus optional provider/runner helper |
| split files in shell | iterate indices in a runner |
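To make the first two rows concrete, here is a toy builder with a build check. It is a sketch under stated assumptions: `ToyBuilder` and `is_built` are hypothetical names, not the ESPnet3 base class or its required interface, and the manifest rows are made-up placeholders. Only the idea of `build()` guarded by a check comes from this page.

```python
import tempfile
from pathlib import Path


class ToyBuilder:
    """Hypothetical recipe-local builder; not the ESPnet3 base class."""

    def __init__(self, root: Path):
        self.root = root
        self.manifest = root / "manifest.tsv"

    def is_built(self) -> bool:
        # Build check: the manifest is the marker that preparation finished.
        return self.manifest.exists()

    def build(self) -> bool:
        """Write the manifest if needed; return True when work was done."""
        if self.is_built():
            return False
        self.root.mkdir(parents=True, exist_ok=True)
        # Placeholder rows; a real builder would scan the prepared source data.
        rows = [("utt2", "audio/utt2.wav"), ("utt1", "audio/utt1.wav")]
        self.manifest.write_text("".join(f"{u}\t{p}\n" for u, p in sorted(rows)))
        return True


with tempfile.TemporaryDirectory() as tmp:
    builder = ToyBuilder(Path(tmp) / "data")
    first = builder.build()    # does the work
    second = builder.build()   # skipped by the build check
    lines = builder.manifest.read_text().splitlines()
```

The sorting and "already done?" checks that `local/data.sh` handles with shell commands and stage numbers become ordinary Python inside the builder.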

Data Pipeline

See how local/data.sh maps to DatasetBuilder and dataset modules.

Parallel Data Prep

See when data preparation should use provider/runner execution.

Dataset Config

See how prepared data becomes train, valid, and test config.

A good migration rule

When reading an ESPnet2 recipe, ask:

Which part is only stage ordering?
Which part is only config?
Which part is the real per-item computation?

Then convert them like this:

stage ordering -> run.py stage list
config -> training.yaml, inference.yaml, metrics.yaml, ...
per-item computation -> dataset builder, provider, or runner
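As a rough picture of that three-way split, the sketch below keeps stage ordering in a list, config in a dict, and per-item work in a function. All names (`STAGES`, `run`, the stage names, `per_item`) are hypothetical and chosen for illustration; this is not the actual `run.py`.

```python
from typing import Any, Callable, Dict, List


def per_item(idx: int) -> int:
    """Per-item computation: the part that would move to a builder or runner."""
    return idx * 2


def create_dataset(cfg: Dict[str, Any], log: List[str]) -> None:
    log.append(f"create_dataset: {cfg['num_items']} items")


def infer(cfg: Dict[str, Any], log: List[str]) -> None:
    outputs = [per_item(i) for i in range(cfg["num_items"])]
    log.append(f"infer: {outputs}")


# Stage ordering lives in data, not in shell control flow.
STAGES: Dict[str, Callable[[Dict[str, Any], List[str]], None]] = {
    "create_dataset": create_dataset,
    "infer": infer,
}


def run(stage_names: List[str], cfg: Dict[str, Any]) -> List[str]:
    """Run the requested stages in order and return a log of what happened."""
    log: List[str] = []
    for name in stage_names:
        if name not in STAGES:
            raise ValueError(f"unknown stage: {name!r}")
        STAGES[name](cfg, log)
    return log


# Config would normally come from YAML files; a dict stands in here.
log = run(["create_dataset", "infer"], {"num_items": 3})
```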

When you still do not need provider/runner

Do not over-apply the abstraction.

If a step is only:

one archive download
one extraction
one quick manifest rewrite

then plain builder.py code is often enough.

Use provider/runner when the work is actually parallel-shaped:

many files
many utterances
many download targets
one repeated compute kernel over indices

Data pipeline

See how dataset builders and recipe-local dataset modules replace old shell prep flows.

Parallel overview

Read the developer-facing provider and runner architecture.

Provider / Runner

See the core contract for worker env construction and execution.

Parallel config

See how local, local GPU, and HPC backends are configured in YAML.

Task to system

See how ESPnet2 task-level logic maps to ESPnet3 systems and stages.