Multi-GPU
If you come from plain PyTorch, multi-GPU code in ESPnet3 usually feels easier because you do not need to write the distributed orchestration yourself.
The main rule is:
Important
ESPnet3 has two different parallel layers.
- model training parallelism -> trainer
- runner/provider parallelism -> parallel
Do not mix them up.
1. Multi-GPU training
If your goal is:
- train one model on multiple GPUs
- use DDP / FSDP / multi-node Lightning strategies
then the main config is trainer.
Typical example:
num_device: 4
num_nodes: 1
trainer:
  accelerator: gpu
  devices: ${num_device}
  num_nodes: ${num_nodes}
  strategy: ddp

That is already the most common multi-GPU training path.
If you know raw PyTorch, this replaces a lot of manual work such as:
- process launching
- DDP setup
- rank/world-size handling
- trainer loop wiring
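For comparison, the rank/world-size handling in the list above often looks something like the following in hand-written code. This is a minimal sketch, assuming a launcher such as torchrun has exported the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The names `DistEnv` and `read_dist_env` are hypothetical and are not part of ESPnet3 or PyTorch.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistEnv:
    rank: int        # global process index across all nodes
    local_rank: int  # process index on this node (usually the GPU index)
    world_size: int  # total number of training processes


def read_dist_env(environ=None) -> DistEnv:
    """Read the variables a launcher such as torchrun exports.

    Falls back to a single-process layout when they are absent.
    `environ` is injectable only to make the sketch easy to try out.
    """
    env = os.environ if environ is None else environ
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )
```

With the trainer config, this bookkeeping is left to Lightning instead of living in your recipe code.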
Training Config
See trainer.devices, num_nodes, strategy, precision, and accelerator settings.
Trainer
See how ESPnet3 delegates training orchestration to Lightning.
Training Loop
See what replaces a hand-written PyTorch training loop.
2. Multi-GPU or multi-worker helper execution
If your goal is instead:
- parallel inference
- runner-based preprocessing
- provider/runner workloads
- Dask-backed helper execution
then the main config is parallel.
Typical local multi-GPU example:
parallel:
  env: local_gpu
  n_workers: 4

This means:
- use one Dask worker per visible GPU
- keep the Python compute path the same
- let ESPnet3 handle the worker setup
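To make "one worker per visible GPU" concrete, here is a rough sketch of the kind of device assignment you would otherwise write yourself. It is not ESPnet3's actual implementation. The helpers `visible_gpus` and `assign_gpus` are hypothetical, and raising an error when there are more workers than GPUs is an assumption of this sketch only.

```python
import os


def visible_gpus(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES (empty list if unset)."""
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def assign_gpus(n_workers, gpus):
    """Map worker index -> GPU id, giving each worker its own GPU."""
    if n_workers > len(gpus):
        # Assumption of this sketch: refuse to share a GPU between workers.
        raise ValueError(
            f"{n_workers} workers requested but only {len(gpus)} GPUs are visible"
        )
    return {worker: gpus[worker] for worker in range(n_workers)}
```

With `n_workers: 4` and four visible GPUs, each worker ends up pinned to a different device.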
Parallel Config
See local_gpu, n_workers, and backend options.
Provider and Runner
See how worker envs and repeated compute functions are defined.
Inference Provider
See the provider pattern used by parallel inference.
Why this is simpler than hand-written multiprocessing
In raw PyTorch code, people often write:
- torch.multiprocessing
- manual worker setup
- custom queue logic
- device assignment code
- ad-hoc cluster wrappers
In ESPnet3, you usually do not need to write that by hand.
For training:
- Lightning handles process orchestration
For runner/provider workloads:
- ESPnet3 handles env construction and worker execution
So the main job becomes config, not orchestration code.
A small training example
For multi-GPU training on one node:
trainer:
  accelerator: gpu
  devices: 4
  strategy: ddp

For multi-node training:
trainer:
  accelerator: gpu
  devices: 8
  num_nodes: 2
  strategy: ddp

This is the part that replaces a lot of manual DDP launcher logic.
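In Lightning, `devices` counts GPUs per node, so the multi-node config above describes 16 training processes in total. The arithmetic can be sketched as follows, assuming `devices` is an integer count rather than a list or `auto`. The function names are illustrative only.

```python
def world_size(devices: int, num_nodes: int = 1) -> int:
    """Total number of training processes: GPUs per node times nodes."""
    return devices * num_nodes


def global_rank(node_rank: int, local_rank: int, devices: int) -> int:
    """Global index of the process using GPU `local_rank` on node `node_rank`."""
    return node_rank * devices + local_rank


# Multi-node example above: devices=8, num_nodes=2
total = world_size(devices=8, num_nodes=2)
# First GPU on the second node (node_rank=1)
first_on_second_node = global_rank(node_rank=1, local_rank=0, devices=8)
```

In hand-written DDP code you would compute and pass these values yourself; here they follow from the two config fields.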
A small provider/runner example
Suppose you have a runner-based workload and want one worker per local GPU.
parallel:
  env: local_gpu
  n_workers: 2

Then your code can stay conceptually simple:
provider = MyProvider(cfg)
runner = MyRunner(provider)
runner(range(len(dataset)))

ESPnet3 handles:
- worker startup
- env setup on each worker
- dispatch of forward(idx, **env)
So you do not need to write your own multi-process wrapper first.
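The division of labour described above can be illustrated with a toy stand-in. This is only a sketch of the pattern, not ESPnet3's API: `ToyProvider`, `ToyRunner`, `build_env`, and the config keys `items` and `scale` are all hypothetical, and a thread pool stands in for the real worker backend.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyProvider:
    """Hypothetical stand-in: builds the env that every forward() call receives."""

    def __init__(self, cfg):
        self.cfg = cfg

    def build_env(self):
        # A real workload would load a dataset and a model here.
        return {"dataset": list(self.cfg["items"]), "scale": self.cfg["scale"]}


class ToyRunner:
    """Hypothetical stand-in: applies forward() to every index in parallel."""

    def __init__(self, provider, n_workers=2):
        self.provider = provider
        self.n_workers = n_workers

    @staticmethod
    def forward(idx, *, dataset, scale):
        # The per-item compute function; it only sees idx and the env.
        return dataset[idx] * scale

    def __call__(self, indices):
        env = self.provider.build_env()
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            # pool.map keeps results in the same order as the input indices.
            return list(pool.map(lambda i: self.forward(i, **env), indices))


runner = ToyRunner(ToyProvider({"items": [1, 2, 3], "scale": 10}))
results = runner(range(3))
```

The point of the pattern is that `forward` stays a plain function of `idx` and the env, so the same code can run under a different backend without changes.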
Local CPU vs local GPU vs cluster
The three common patterns are:
CPU-only local helper execution
parallel:
  env: local
  n_workers: 8
  options:
    threads_per_worker: 1

Multi-GPU local helper execution
parallel:
  env: local_gpu
  n_workers: 4

HPC-backed helper execution
parallel:
  env: slurm
  n_workers: 16
  options:
    queue: gpu
    cores: 4
    memory: 32GB
    walltime: "04:00:00"

The Python code can stay the same across all three. That is one of the main benefits.
Cluster Migration
See what replaces nj, run.pl, queue.pl, and shell scheduler options.
Parallel Runtime
See how local and Dask execution share the same runner path.
Parallel Data Prep
See when helper work should use provider/runner execution.
A common confusion
This is the confusion most PyTorch users hit first:
trainer.devices
Use this for model training.
parallel.n_workers
Use this for runner/provider workloads.
These are related to parallelism, but they do not control the same mechanism.
Related pages
Training Config
See where multi-GPU training is configured through Lightning.
Parallel Config
See how `local`, `local_gpu`, and HPC backends are configured.
Provider / Runner
Read the execution contract behind runner-based parallel workloads.
Inference Provider
See the stage-facing example of dataset/model env construction.
Model and system
See how custom models and custom stage flows fit into recipes.
