ESPnet3 Parallel Configuration

Masao Someki

This page explains the parallel block used by Dask-based execution in ESPnet3.

Important

The parallel block is separate from Lightning training parallelism. Use trainer.devices, trainer.num_nodes, and trainer.strategy for model training. Use parallel only for runner-based work built on espnet3.parallel.

Basic shape

Most configs use this shape:

parallel:
  env: local
  n_workers: 4
  options: {}

The three main fields are:

Field	Meaning
`env`	Which Dask backend to use
`n_workers`	How many workers or cluster jobs to request
`options`	Backend-specific arguments forwarded to the Dask cluster class

ESPnet3 reads this block through espnet3.parallel.parallel.set_parallel() and creates a client with build_client().

What env values are supported

These backends are implemented in espnet3/parallel/parallel.py.

`env`	Backend	Typical use
`local`	`dask.distributed.LocalCluster`	CPU work on one machine
`local_gpu`	`dask_cuda.LocalCUDACluster`	Multi-GPU work on one machine
`slurm`	`dask_jobqueue.SLURMCluster`	SLURM-based HPC
`pbs`	`dask_jobqueue.PBSCluster`	PBS-based HPC
`sge`	`dask_jobqueue.SGECluster`	SGE-based HPC
`lsf`	`dask_jobqueue.LSFCluster`	LSF-based HPC
`htcondor`	`dask_jobqueue.HTCondorCluster`	HTCondor-based HPC
`moab`	`dask_jobqueue.MoabCluster`	Moab-based HPC
`oar`	`dask_jobqueue.OARCluster`	OAR-based HPC
`ssh`	`dask.distributed.SSHCluster`	A small set of reachable machines
`kube`	`dask_kubernetes.KubeCluster`	Kubernetes

Note

If no parallel config is registered, ESPnet3 falls back to local with n_workers: 1.
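The table and fallback rule above can be sketched as a small lookup. This is an illustration only, not the code in espnet3/parallel/parallel.py: the helper name resolve_parallel and the constants are hypothetical, and applying the same defaults to a partially filled block is an assumption of this sketch.

```python
# Hypothetical sketch: maps a `parallel` block to the backend class named in
# the table above. It does not import Dask and is not ESPnet3's own code.
BACKENDS = {
    "local": "dask.distributed.LocalCluster",
    "local_gpu": "dask_cuda.LocalCUDACluster",
    "slurm": "dask_jobqueue.SLURMCluster",
    "pbs": "dask_jobqueue.PBSCluster",
    "sge": "dask_jobqueue.SGECluster",
    "lsf": "dask_jobqueue.LSFCluster",
    "htcondor": "dask_jobqueue.HTCondorCluster",
    "moab": "dask_jobqueue.MoabCluster",
    "oar": "dask_jobqueue.OARCluster",
    "ssh": "dask.distributed.SSHCluster",
    "kube": "dask_kubernetes.KubeCluster",
}

# Fallback described in the note: local with a single worker.
DEFAULT_PARALLEL = {"env": "local", "n_workers": 1, "options": {}}


def resolve_parallel(cfg=None):
    """Return (backend_class_path, n_workers, options) for a parallel block."""
    merged = dict(DEFAULT_PARALLEL)
    merged.update(cfg or {})  # assumption: missing keys take the defaults
    env = merged["env"]
    if env not in BACKENDS:
        raise ValueError(f"Unsupported env: {env!r}")
    # An empty `options:` key in YAML loads as None, so normalize it to {}.
    options = dict(merged.get("options") or {})
    return BACKENDS[env], merged["n_workers"], options
```

Calling resolve_parallel() with no argument returns the local, single-worker fallback; an unknown env raises ValueError in this sketch.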

Local CPU example

Use local when you want a simple multi-process setup on one machine.

parallel:
  env: local
  n_workers: 4
  options:
    threads_per_worker: 1

This is a good default for:

local development
CPU-heavy preprocessing
debugging provider / runner logic before moving to a cluster

Local GPU example

Use local_gpu when one worker should run on each local GPU.

parallel:
  env: local_gpu
  n_workers: 2
  options: {}

ESPnet3 uses dask_cuda.LocalCUDACluster for this backend.

Warning

For local_gpu, n_workers must not exceed the number of visible GPUs. ESPnet3 raises an error if it does.

This is a good fit for:

a single multi-GPU workstation
batched inference on one node
debugging GPU-bound runner code before moving to HPC
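The n_workers limit from the warning above can be sketched as a simple pre-flight check. The function name check_gpu_workers is hypothetical, and how ESPnet3 actually counts visible GPUs or words its error is not shown on this page, so treat this as an illustration of the rule rather than the real implementation.

```python
def check_gpu_workers(n_workers, visible_gpus):
    """Raise if a local_gpu config asks for more workers than visible GPUs.

    `visible_gpus` is supplied by the caller; one way to obtain it is
    torch.cuda.device_count(), but that choice is an assumption of this sketch.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_workers > visible_gpus:
        raise ValueError(
            f"n_workers={n_workers} exceeds the {visible_gpus} visible GPU(s)"
        )


# Two workers on a two-GPU machine pass the check silently.
check_gpu_workers(n_workers=2, visible_gpus=2)
```

Running a check like this before submitting a long job surfaces the mismatch early, for example when CUDA_VISIBLE_DEVICES hides some devices.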

HPC example: SLURM

Use a JobQueue backend when workers should be launched through the scheduler.

parallel:
  env: slurm
  n_workers: 4
  options:
    queue: gpu
    account: my-lab
    cores: 8
    processes: 1
    memory: 32GB
    walltime: "04:00:00"
    interface: ib0
    log_directory: logs/dask
    job_script_prologue:
      - "module load cuda"
      - "source .pixi/envs/default/bin/activate"
    job_extra_directives:
      - "--gres=gpu:1"

In this mode, ESPnet3 forwards options directly to dask_jobqueue.SLURMCluster.

The same pattern works for pbs, sge, lsf, htcondor, moab, and oar. Only the scheduler-specific options change.

How to choose a backend

Use this rule of thumb:

Situation	Recommended backend
One laptop or CPU server	`local`
One GPU workstation	`local_gpu`
Managed cluster with SLURM/PBS/SGE/LSF/etc.	matching JobQueue backend
A few fixed machines without a scheduler	`ssh`
Kubernetes cluster	`kube`

The key question is not "CPU or GPU?" It is "who launches the workers?"

Your machine launches them directly: local or local_gpu
The scheduler launches them: slurm, pbs, sge, lsf, ...
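The rule of thumb above can be written down as a tiny decision function. The name suggest_env and its parameters are hypothetical, and the logic only encodes the table in this section; it is not something ESPnet3 provides.

```python
# JobQueue-style env values listed on this page.
JOBQUEUE_ENVS = {"slurm", "pbs", "sge", "lsf", "htcondor", "moab", "oar"}


def suggest_env(scheduler=None, n_gpus=0, has_fixed_hosts=False):
    """Suggest an `env` value by asking who launches the workers."""
    if scheduler:
        # A job scheduler launches the workers: use the matching backend.
        env = scheduler.lower()
        if env not in JOBQUEUE_ENVS:
            raise ValueError(f"No JobQueue backend listed for {scheduler!r}")
        return env
    if has_fixed_hosts:
        # A few reachable machines and no scheduler.
        return "ssh"
    # This machine launches the workers directly.
    return "local_gpu" if n_gpus > 0 else "local"
```

For example, suggest_env(scheduler="SLURM") picks slurm regardless of GPU count, which mirrors the point that the launcher, not the hardware, decides the backend.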

Where this block usually lives

In recipe configs, parallel is usually placed in one of these files:

training.yaml
inference.yaml
metrics.yaml
demo.yaml

Put it next to the stage or helper that will actually use it.

Examples:

inference runner config belongs in inference.yaml
a parallel metrics pipeline belongs in metrics.yaml
a custom data collection helper can keep its parallel block in training.yaml

Note

Not every stage reads parallel. This block matters only when the code path uses espnet3.parallel.

parallel vs. trainer parallelism

These two mechanisms solve different problems.

Use case	Config area
DDP / FSDP / multi-node training of the Lightning model	`trainer`
Parallel runner jobs, sharded inference, provider/runner workflows	`parallel`

If you are configuring Lightning itself, stay in trainer. If you are configuring BaseRunner, InferenceProvider, or Dask-backed utilities, use parallel.

Common mistakes

Mixing trainer settings and Dask settings

Do not put DDP or Lightning strategy settings under parallel.options. Those belong to trainer.

Using cluster-only options with env: local

Keys such as queue, account, walltime, and job_extra_directives are for JobQueue backends. They do not belong under local.
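Both mistakes above can be caught with a small lint pass over the parallel block before a run starts. This is a minimal sketch: lint_parallel_options is a hypothetical helper, and the key sets contain only the names mentioned on this page, so they are not exhaustive.

```python
# Lightning settings named on this page (trainer.devices, trainer.num_nodes,
# trainer.strategy); they belong under `trainer`, not `parallel.options`.
TRAINER_KEYS = {"devices", "num_nodes", "strategy"}
# JobQueue-only keys named on this page.
JOBQUEUE_KEYS = {"queue", "account", "walltime", "job_extra_directives"}


def lint_parallel_options(cfg):
    """Return human-readable problems found in a parallel block."""
    options = set(cfg.get("options") or {})
    problems = []
    for key in sorted(options & TRAINER_KEYS):
        problems.append(f"'{key}' is a trainer setting; move it under trainer")
    if cfg.get("env") == "local":
        for key in sorted(options & JOBQUEUE_KEYS):
            problems.append(f"'{key}' is a JobQueue option; remove it for env: local")
    return problems


bad = {"env": "local", "n_workers": 4, "options": {"queue": "gpu", "strategy": "ddp"}}
for problem in lint_parallel_options(bad):
    print(problem)
```

An empty list means none of the known-misplaced keys were found; it does not prove that every remaining option is accepted by the chosen Dask cluster class.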

Forgetting extra packages

Different backends need different Python packages:

local: dask[distributed]
local_gpu: dask[distributed] + dask-cuda
slurm / pbs / sge / lsf / ...: dask[distributed] + dask-jobqueue
kube: dask[distributed] + dask-kubernetes
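The package list above can be turned into a quick import check. The helper missing_modules is hypothetical; the importable module names (distributed, dask_cuda, dask_jobqueue, dask_kubernetes) are the usual top-level modules for those packages, and ssh is left out because this page does not list its requirements.

```python
from importlib.util import find_spec

# Top-level importable modules per backend, following the list above.
REQUIRED_MODULES = {
    "local": ["distributed"],
    "local_gpu": ["distributed", "dask_cuda"],
    "kube": ["distributed", "dask_kubernetes"],
}
for _env in ("slurm", "pbs", "sge", "lsf", "htcondor", "moab", "oar"):
    REQUIRED_MODULES[_env] = ["distributed", "dask_jobqueue"]


def _is_installed(module):
    # find_spec returns None when a top-level module cannot be found.
    return find_spec(module) is not None


def missing_modules(env, is_installed=_is_installed):
    """Return the required modules that cannot be imported for this env."""
    return [m for m in REQUIRED_MODULES[env] if not is_installed(m)]


print(missing_modules("local"))  # empty list when dask[distributed] is installed
```

The is_installed parameter exists only so the check can be exercised without installing anything; in normal use the default is enough.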

Reusing one config for every environment

Instead of forcing one parallel block to serve every environment, it is often cleaner to keep small environment-specific override files such as:

parallel_local.yaml
parallel_local_gpu.yaml
parallel_slurm.yaml

Then merge the one you need into the recipe config.
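One way to picture the merge step is shown below with plain dictionaries standing in for loaded YAML files. The helper apply_parallel_override is hypothetical, and ESPnet3 recipes may merge configs through a different mechanism. The sketch replaces the parallel block wholesale instead of deep-merging it, so options written for one backend (such as threads_per_worker) do not leak into another.

```python
import copy


def apply_parallel_override(recipe, override):
    """Return a copy of `recipe` whose parallel block comes from `override`."""
    if "parallel" not in override:
        raise KeyError("override does not define a parallel block")
    merged = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    merged["parallel"] = copy.deepcopy(override["parallel"])
    return merged


# Stand-ins for a recipe config and the contents of parallel_slurm.yaml.
recipe = {
    "parallel": {"env": "local", "n_workers": 4, "options": {"threads_per_worker": 1}},
}
slurm_override = {
    "parallel": {"env": "slurm", "n_workers": 4, "options": {"queue": "gpu"}},
}

config = apply_parallel_override(recipe, slurm_override)
print(config["parallel"]["env"])
```

Keys outside parallel are carried over unchanged, so the same recipe can be paired with whichever override matches the machine it runs on.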

Useful references

Dask Cluster Deployment

Official Dask guide for local, HPC, cloud, and Kubernetes deployments.

Dask-Jobqueue

Official JobQueue docs for scheduler-backed clusters such as SLURM and PBS.

LocalCUDACluster

Official dask-cuda reference for one-worker-per-GPU local execution.

Config Overview

Return to the full config map across the pipeline.

Training Config

See where `parallel` fits inside training-oriented configs.

Inference Config

See where `parallel` fits inside inference-oriented configs.

Parallel Overview

Read the developer-facing overview of providers, runners, and clients.

Provider / Runner

See how custom parallel workloads are implemented in code.