ESPnet3 Parallel Configuration
This page explains the parallel block used by Dask-based execution in ESPnet3.
Important
parallel is separate from Lightning training parallelism. Use trainer.devices, trainer.num_nodes, and trainer.strategy for model training. Use parallel for runner-based work built on espnet3.parallel.
Basic shape
Most configs use this shape:
```yaml
parallel:
  env: local
  n_workers: 4
  options: {}
```

The three main fields are:
| Field | Meaning |
|---|---|
| `env` | Which Dask backend to use |
| `n_workers` | How many workers or cluster jobs to request |
| `options` | Backend-specific keyword arguments forwarded to the Dask cluster class |
ESPnet3 reads this block through espnet3.parallel.parallel.set_parallel() and creates a client with build_client().
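As a rough mental model, the three fields behave like a small typed record. The sketch below is a hypothetical stand-in (`ParallelConfig` and `from_dict` are illustrative names, not ESPnet3's API) that checks the block's shape before anything would be handed to Dask. The defaults for missing keys are an assumption chosen to match the documented fallback.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ParallelConfig:
    """Hypothetical typed view of a `parallel:` block (not ESPnet3's own class)."""

    env: str = "local"
    n_workers: int = 1
    options: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, block: Dict[str, Any]) -> "ParallelConfig":
        # Reject unknown top-level keys so typos such as `nworkers` fail early.
        unknown = set(block) - {"env", "n_workers", "options"}
        if unknown:
            raise ValueError(f"unknown parallel keys: {sorted(unknown)}")
        cfg = cls(
            # Assumption: missing keys fall back to local / 1 worker / no options.
            env=block.get("env", "local"),
            n_workers=int(block.get("n_workers", 1)),
            # `options: null` in YAML arrives as None, so normalize it to {}.
            options=dict(block.get("options") or {}),
        )
        if cfg.n_workers < 1:
            raise ValueError("n_workers must be >= 1")
        return cfg


cfg = ParallelConfig.from_dict({"env": "local", "n_workers": 4, "options": {}})
print(cfg)  # ParallelConfig(env='local', n_workers=4, options={})
```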
What env values are supported
These backends are implemented in espnet3/parallel/parallel.py.
| `env` | Backend | Typical use |
|---|---|---|
| `local` | `dask.distributed.LocalCluster` | CPU work on one machine |
| `local_gpu` | `dask_cuda.LocalCUDACluster` | Multi-GPU work on one machine |
| `slurm` | `dask_jobqueue.SLURMCluster` | SLURM-based HPC |
| `pbs` | `dask_jobqueue.PBSCluster` | PBS-based HPC |
| `sge` | `dask_jobqueue.SGECluster` | SGE-based HPC |
| `lsf` | `dask_jobqueue.LSFCluster` | LSF-based HPC |
| `htcondor` | `dask_jobqueue.HTCondorCluster` | HTCondor-based HPC |
| `moab` | `dask_jobqueue.MoabCluster` | Moab-based HPC |
| `oar` | `dask_jobqueue.OARCluster` | OAR-based HPC |
| `ssh` | `dask.distributed.SSHCluster` | A small set of reachable machines |
| `kube` | `dask_kubernetes.KubeCluster` | Kubernetes |
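One way to picture the table is as a lookup from `env` to an import path, with the backend module imported only when it is actually used. This is an illustrative sketch: the mapping is copied by hand from the table, and `cluster_class_path` / `load_cluster_class` are hypothetical helpers, not the code in `espnet3/parallel/parallel.py`.

```python
import importlib

# Hand-copied from the env table; ESPnet3 keeps its own mapping in
# espnet3/parallel/parallel.py, which this does not reproduce.
CLUSTER_CLASSES = {
    "local": "dask.distributed.LocalCluster",
    "local_gpu": "dask_cuda.LocalCUDACluster",
    "slurm": "dask_jobqueue.SLURMCluster",
    "pbs": "dask_jobqueue.PBSCluster",
    "sge": "dask_jobqueue.SGECluster",
    "lsf": "dask_jobqueue.LSFCluster",
    "htcondor": "dask_jobqueue.HTCondorCluster",
    "moab": "dask_jobqueue.MoabCluster",
    "oar": "dask_jobqueue.OARCluster",
    "ssh": "dask.distributed.SSHCluster",
    "kube": "dask_kubernetes.KubeCluster",
}


def cluster_class_path(env: str) -> str:
    """Return the dotted class path for `env`, or raise a readable error."""
    try:
        return CLUSTER_CLASSES[env]
    except KeyError:
        raise ValueError(
            f"unsupported parallel.env {env!r}; expected one of {sorted(CLUSTER_CLASSES)}"
        ) from None


def load_cluster_class(env: str):
    """Import the backend lazily so unused backends need not be installed."""
    module_name, _, class_name = cluster_class_path(env).rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Importing lazily means a machine that only ever uses `local` does not need `dask-jobqueue` or `dask-kubernetes` installed.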
Note
If no parallel config is registered, ESPnet3 falls back to local with n_workers: 1.
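The fallback in the note can be restated as a tiny helper. `effective_parallel` is hypothetical: it only encodes the documented default (local, one worker) and says nothing about how `set_parallel()` stores its state.

```python
from typing import Any, Dict, Optional

# Documented default when no parallel block is registered.
DEFAULT_PARALLEL: Dict[str, Any] = {"env": "local", "n_workers": 1, "options": {}}


def effective_parallel(registered: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the block to use, falling back to the single-worker local default."""
    if registered is None:
        # Fresh copy (including a new options dict) so callers cannot
        # mutate the shared default.
        return {**DEFAULT_PARALLEL, "options": {}}
    return registered
```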
Local CPU example
Use local when you want a simple multi-process setup on one machine.
```yaml
parallel:
  env: local
  n_workers: 4
  options:
    threads_per_worker: 1
```

This is a good default for:
- local development
- CPU-heavy preprocessing
- debugging provider / runner logic before moving to a cluster
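For debugging outside ESPnet3, it can help to see the plain Dask equivalent of the local block. The sketch assumes that `n_workers` and the `options` entries end up as keyword arguments of the same cluster constructor; `build_client()` may differ in detail. `local_cluster_kwargs` and `start_local_client` are hypothetical helpers, while `LocalCluster(n_workers=..., threads_per_worker=...)` and `Client(cluster)` are standard `dask.distributed` calls.

```python
from typing import Any, Dict


def local_cluster_kwargs(block: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten an `env: local` block into LocalCluster keyword arguments.

    Assumption: n_workers and options are forwarded to one constructor.
    """
    if block.get("env") != "local":
        raise ValueError("expected env: local")
    return {"n_workers": block["n_workers"], **(block.get("options") or {})}


def start_local_client(block: Dict[str, Any]):
    """Start a LocalCluster and Client directly (requires dask[distributed])."""
    from dask.distributed import Client, LocalCluster  # imported lazily

    cluster = LocalCluster(**local_cluster_kwargs(block))
    return Client(cluster)


block = {"env": "local", "n_workers": 4, "options": {"threads_per_worker": 1}}
print(local_cluster_kwargs(block))  # {'n_workers': 4, 'threads_per_worker': 1}
```

When calling `start_local_client(block)` from a script, wrap it in an `if __name__ == "__main__":` guard, since `LocalCluster` starts worker processes by default.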
Local GPU example
Use local_gpu when one worker should run on each local GPU.
```yaml
parallel:
  env: local_gpu
  n_workers: 2
  options: {}
```

ESPnet3 uses dask_cuda.LocalCUDACluster for this backend.
Warning
For local_gpu, n_workers must not exceed the number of visible GPUs. ESPnet3 raises an error if it does.
This is a good fit for:
- a single multi-GPU workstation
- batched inference on one node
- debugging GPU-bound runner code before moving to HPC
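The `n_workers` limit from the warning can be checked before any cluster is started. This hypothetical sketch reads the visible devices from `CUDA_VISIBLE_DEVICES` only; when that variable is unset every physical GPU is visible, so the count would have to come from elsewhere (for example a CUDA library). ESPnet3's own check may count GPUs differently.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(environ: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Return the device ids listed in CUDA_VISIBLE_DEVICES, or None if unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Unset: all physical GPUs are visible, but we cannot count them here.
        return None
    return [dev.strip() for dev in raw.split(",") if dev.strip()]


def check_local_gpu_workers(n_workers: int, n_visible: int) -> None:
    """Enforce the documented rule: never more local_gpu workers than GPUs."""
    if n_workers > n_visible:
        raise ValueError(
            f"local_gpu requested {n_workers} workers but only {n_visible} GPU(s) are visible"
        )
```

For example, `CUDA_VISIBLE_DEVICES=0,1` with `n_workers: 2` passes, while `n_workers: 3` would be rejected.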
HPC example: SLURM
Use a JobQueue backend when workers should be launched through the scheduler.
```yaml
parallel:
  env: slurm
  n_workers: 4
  options:
    queue: gpu
    account: my-lab
    cores: 8
    processes: 1
    memory: 32GB
    walltime: "04:00:00"
    interface: ib0
    log_directory: logs/dask
    job_script_prologue:
      - "module load cuda"
      - "source .pixi/envs/default/bin/activate"
    job_extra_directives:
      - "--gres=gpu:1"
```

In this mode, ESPnet3 forwards options directly to dask_jobqueue.SLURMCluster.
The same pattern works for pbs, sge, lsf, htcondor, moab, and oar. Only the scheduler-specific options change.
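Because `options` is forwarded unchanged, the SLURM block corresponds roughly to the hand-written `dask_jobqueue` call below. The sketch assumes dask-jobqueue 0.8 or later (where `job_script_prologue` and `job_extra_directives` exist) and assumes `n_workers` counts Dask workers, converted to SLURM jobs by dividing by `processes`. `start_slurm_client` and `jobs_for_workers` are hypothetical helpers; ESPnet3 may size the cluster differently.

```python
import math
from typing import Any, Dict

# The SLURM example above, written as a Python dict.
SLURM_BLOCK: Dict[str, Any] = {
    "env": "slurm",
    "n_workers": 4,
    "options": {
        "queue": "gpu",
        "account": "my-lab",
        "cores": 8,
        "processes": 1,
        "memory": "32GB",
        "walltime": "04:00:00",
        "interface": "ib0",
        "log_directory": "logs/dask",
        "job_script_prologue": ["module load cuda", "source .pixi/envs/default/bin/activate"],
        "job_extra_directives": ["--gres=gpu:1"],
    },
}


def jobs_for_workers(n_workers: int, processes: int) -> int:
    """Each SLURM job hosts `processes` workers, so round up to cover n_workers."""
    return math.ceil(n_workers / processes)


def start_slurm_client(block: Dict[str, Any]):
    """Hand-rolled equivalent of the block (requires dask-jobqueue >= 0.8)."""
    from dask.distributed import Client
    from dask_jobqueue import SLURMCluster

    cluster = SLURMCluster(**block["options"])  # options forwarded unchanged
    # Assumption: n_workers counts workers. `processes` is required here because
    # dask-jobqueue picks its own default when it is omitted.
    cluster.scale(jobs=jobs_for_workers(block["n_workers"], block["options"]["processes"]))
    return Client(cluster)
```

With `processes: 1`, as in the example, one job equals one worker, so `n_workers: 4` submits four SLURM jobs.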
How to choose a backend
Use this rule of thumb:
| Situation | Recommended backend |
|---|---|
| One laptop or CPU server | local |
| One GPU workstation | local_gpu |
| Managed cluster with SLURM/PBS/SGE/LSF/etc. | matching JobQueue backend |
| A few fixed machines without a scheduler | ssh |
The key question is not "CPU or GPU?" It is "who launches the workers?"
- Your machine launches them directly: local or local_gpu
- The scheduler launches them: slurm, pbs, sge, lsf, ...
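The rule of thumb can be written as a small decision function that asks "who launches the workers?" first. `recommend_env` and its parameters are hypothetical; the thresholds simply mirror the table and are not an ESPnet3 feature.

```python
from typing import Optional

# Schedulers that have a matching JobQueue backend in the env table.
JOBQUEUE_ENVS = {"slurm", "pbs", "sge", "lsf", "htcondor", "moab", "oar"}


def recommend_env(scheduler: Optional[str] = None, local_gpus: int = 0, fixed_hosts: int = 0) -> str:
    """Pick a parallel.env value by first asking who launches the workers."""
    if scheduler is not None:
        # A scheduler launches the workers: use its JobQueue backend.
        env = scheduler.lower()
        if env not in JOBQUEUE_ENVS:
            raise ValueError(f"no JobQueue backend for scheduler {scheduler!r}")
        return env
    if fixed_hosts > 1:
        # A few fixed machines without a scheduler.
        return "ssh"
    # Your own machine launches the workers.
    return "local_gpu" if local_gpus > 0 else "local"
```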
Where this block usually lives
In recipe configs, parallel is usually placed in one of these files:
- training.yaml
- inference.yaml
- metrics.yaml
- demo.yaml
Put it next to the stage or helper that will actually use it.
Examples:
- inference runner config belongs in inference.yaml
- a parallel metrics pipeline belongs in metrics.yaml
- a custom data collection helper can keep its parallel block in training.yaml
Note
Not every stage reads parallel. This block matters only when the code path uses espnet3.parallel.
parallel vs. trainer parallelism
These two mechanisms solve different problems.
| Use case | Config area |
|---|---|
| DDP / FSDP / multi-node training of the Lightning model | trainer |
| Parallel runner jobs, sharded inference, provider/runner workflows | parallel |
If you are configuring Lightning itself, stay in trainer. If you are configuring BaseRunner, InferenceProvider, or Dask-backed utilities, use parallel.
Common mistakes
Mixing trainer settings and Dask settings
Do not put DDP or Lightning strategy settings under parallel.options. Those belong to trainer.
Using cluster-only options with env: local
Keys such as queue, account, walltime, and job_extra_directives are for JobQueue backends. They do not belong under local.
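Both of these mistakes can be caught with a quick lint pass over `parallel.options`. This is a hypothetical checker, not part of ESPnet3; the key lists come from the two paragraphs above and are illustrative rather than exhaustive. It also treats `local_gpu` like `local`, on the assumption that `LocalCUDACluster` does not accept scheduler options either.

```python
from typing import Any, Dict, List

# Key lists taken from the two mistakes above; illustrative, not exhaustive.
JOBQUEUE_ONLY_KEYS = {"queue", "account", "walltime", "job_extra_directives", "job_script_prologue"}
TRAINER_KEYS = {"strategy", "devices", "num_nodes"}
LOCAL_ENVS = {"local", "local_gpu"}


def lint_parallel(block: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for misplaced keys in `parallel.options`."""
    env = block.get("env", "local")
    options = block.get("options") or {}
    warnings = []
    for key in sorted(options):
        if key in TRAINER_KEYS:
            warnings.append(f"options.{key}: Lightning setting, move it to trainer.{key}")
        elif key in JOBQUEUE_ONLY_KEYS and env in LOCAL_ENVS:
            warnings.append(f"options.{key}: JobQueue-only option, not valid for env: {env}")
    return warnings
```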
Forgetting extra packages
Different backends need different Python packages:
- local: dask[distributed]
- local_gpu: dask[distributed] + dask-cuda
- slurm / pbs / sge / lsf / ...: dask[distributed] + dask-jobqueue
- kube: dask[distributed] + dask-kubernetes
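A quick way to catch a missing package before submitting work is to check importability with `importlib.util.find_spec`. The sketch maps the pip names above to their import names (`dask[distributed]` installs `distributed`, `dask-cuda` installs `dask_cuda`, and so on). `required_modules` and `missing_modules` are hypothetical helpers, and envs not listed above are deliberately left uncovered.

```python
import importlib.util
from typing import Callable, List, Optional

# Import names for the packages listed above.
_JOBQUEUE_ENVS = {"slurm", "pbs", "sge", "lsf", "htcondor", "moab", "oar"}
_EXTRA = {"local": [], "local_gpu": ["dask_cuda"], "kube": ["dask_kubernetes"]}


def required_modules(env: str) -> List[str]:
    """Return the top-level modules a backend needs, per the list above."""
    if env in _JOBQUEUE_ENVS:
        extra = ["dask_jobqueue"]
    elif env in _EXTRA:
        extra = _EXTRA[env]
    else:
        raise ValueError(f"package list not covered here for env {env!r}")
    return ["distributed", *extra]


def missing_modules(
    env: str, find: Callable[[str], Optional[object]] = importlib.util.find_spec
) -> List[str]:
    """Return the import names for `env` that cannot be found."""
    return [name for name in required_modules(env) if find(name) is None]
```

Running `print(missing_modules("slurm"))` on a login node, for example, lists anything that still needs to be installed.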
Reusing one config for every environment
It is often cleaner to keep small environment-specific overrides such as:
- parallel_local.yaml
- parallel_local_gpu.yaml
- parallel_slurm.yaml
Then merge the one you need into the recipe config.
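A minimal sketch of that merge, using plain dicts in place of whatever config loader the recipe actually uses. The design choice worth noting is that the whole `parallel` block is replaced rather than deep-merged, so a `queue` key from a SLURM file cannot leak into a local run. `with_parallel` is a hypothetical helper and the `inference` entry is a placeholder.

```python
import copy
from typing import Any, Dict


def with_parallel(recipe: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `recipe` whose `parallel` block is replaced by the override's."""
    merged = copy.deepcopy(recipe)
    # Swap the whole block instead of deep-merging, so stale backend-specific
    # options from the base config do not survive.
    merged["parallel"] = copy.deepcopy(override["parallel"])
    return merged


recipe = {
    "inference": {"batch_size": 8},  # placeholder recipe content
    "parallel": {"env": "slurm", "n_workers": 4, "options": {"queue": "gpu"}},
}
parallel_local = {"parallel": {"env": "local", "n_workers": 4, "options": {"threads_per_worker": 1}}}

cfg = with_parallel(recipe, parallel_local)
print(cfg["parallel"]["options"])  # {'threads_per_worker': 1}
```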
Useful references
- Dask Cluster Deployment: official Dask guide for local, HPC, cloud, and Kubernetes deployments.
- Dask-Jobqueue: official JobQueue docs for scheduler-backed clusters such as SLURM and PBS.
- LocalCUDACluster: official dask-cuda reference for one-worker-per-GPU local execution.
Related pages
- Config Overview: return to the full config map across the pipeline.
- Training Config: see where `parallel` fits inside training-oriented configs.
- Inference Config: see where `parallel` fits inside inference-oriented configs.
- Parallel Overview: read the developer-facing overview of providers, runners, and clients.
- Provider / Runner: see how custom parallel workloads are implemented in code.
