# ESPnet3 Optimizer and Scheduler Configuration
ESPnet3 wraps PyTorch Lightning so that optimizers and schedulers can be defined from YAML.
Current implementation:
- `espnet3.components.modeling.lightning_module.ESPnetLightningModule.configure_optimizers`
- `espnet3.components.modeling.optimization_spec`
Two modes are supported:
- single optimizer path: `optimizer` + `scheduler`
- named multi-optimizer path: `optimizers` + `schedulers`
## What lives in `configure_optimizers` vs YAML
| Layer | You control via YAML | ESPnet3 ensures |
|---|---|---|
| `optimizer` / `optimizers` | optimizer classes and hyperparameters | correct parameter grouping and uniqueness |
| `scheduler` / `schedulers` | scheduler classes and decay settings | matching schedulers to optimizers |
| model parameter names | which parameters each optimizer sees | every trainable parameter is assigned exactly once |
## 1. Single optimizer path
Use `optimizer` and `scheduler` when the whole model shares one optimizer.
```yaml
optimizer:
  _target_: torch.optim.AdamW
  lr: 0.001
  weight_decay: 1.0e-2
scheduler:
  _target_: torch.optim.lr_scheduler.CosineAnnealingLR
  T_max: 100000
scheduler_interval: step
scheduler_monitor:
```

Important points:
- use `torch.optim.*`, not `torch.optimizer.*`
- `optimizer` is instantiated with all trainable parameters
- `scheduler` is instantiated with that optimizer
- `scheduler_interval` must be `step` or `epoch`
- `scheduler_monitor` is only needed for monitored epoch schedulers such as `ReduceLROnPlateau`
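For orientation, these YAML keys correspond to Lightning's standard `configure_optimizers` return shape. A minimal hand-written equivalent of the config above, as an illustrative sketch only (not ESPnet3's actual code):

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

def configure_optimizers(self):
    # Hand-written equivalent of the YAML above (illustrative sketch only).
    optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3, weight_decay=1e-2)
    scheduler = CosineAnnealingLR(optimizer, T_max=100000)
    return {
        "optimizer": optimizer,
        "lr_scheduler": {
            "scheduler": scheduler,
            "interval": "step",        # <- scheduler_interval
            # "monitor": "valid/loss", # <- scheduler_monitor, only needed
            #                          #    for monitored schedulers
        },
    }
```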
Example with a monitored scheduler:
```yaml
optimizer:
  _target_: torch.optim.Adam
  lr: 0.001
scheduler:
  _target_: torch.optim.lr_scheduler.ReduceLROnPlateau
  patience: 2
  factor: 0.5
scheduler_interval: epoch
scheduler_monitor: valid/loss
```

## 2. Named multi-optimizer path
Use `optimizers` and `schedulers` when different parameter groups need independent updates.
This is the normal path for GAN training.
```yaml
optimizers:
  generator:
    optimizer:
      _target_: torch.optim.Adam
      lr: 0.0002
    params: generator
    accum_grad_steps: 1
    step_every_n_iters: 1
    gradient_clip_val: 1.0
    gradient_clip_algorithm: norm
  discriminator:
    optimizer:
      _target_: torch.optim.Adam
      lr: 0.0002
    params: discriminator
    accum_grad_steps: 1
    step_every_n_iters: 1
schedulers:
  generator:
    scheduler:
      _target_: torch.optim.lr_scheduler.LinearLR
      start_factor: 1.0
      end_factor: 0.5
      total_iters: 1000
    interval: step
  discriminator:
    scheduler:
      _target_: torch.optim.lr_scheduler.ReduceLROnPlateau
      patience: 2
      factor: 0.5
    interval: epoch
    monitor: valid/discriminator/loss
```

Important rules:
- names under `optimizers` and `schedulers` must match exactly
- every optimizer entry must include `params` and `optimizer`
- every trainable parameter must match exactly one optimizer
- top-level `scheduler_interval` and `scheduler_monitor` are not used here
- per-optimizer grad settings live under `optimizers.<name>`
## Parameter routing
`params` is a dot-boundary-aware selector over parameter names.
That means the YAML decides which part of the model each optimizer updates.
ESPnet3 raises an error if any trainable parameter is:
- missing from all optimizers, or
- matched by more than one optimizer
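A minimal sketch of what "dot-boundary-aware" means, assuming the usual semantics (a selector matches a parameter name exactly, or as a prefix ending at a `.`); the helper below is hypothetical, not ESPnet3's implementation:

```python
# Hypothetical matcher illustrating dot-boundary-aware selection;
# not ESPnet3's actual implementation.
def matches(selector: str, param_name: str) -> bool:
    return param_name == selector or param_name.startswith(selector + ".")

names = ["generator.conv.weight", "generator_aux.bias", "discriminator.fc.weight"]
print([n for n in names if matches("generator", n)])
# ['generator.conv.weight']  <- "generator_aux.bias" is NOT matched,
#                               because "generator" only matches at a dot boundary
```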
## Per-optimizer grad controls
Each named optimizer may define:
- `accum_grad_steps`
- `step_every_n_iters`
- `gradient_clip_val`
- `gradient_clip_algorithm`
These are enforced by ESPnet3's manual optimization path, not by Lightning's global trainer settings.
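As a rough illustration of how these settings interact, here is a sketch under assumed semantics (loss scaled by `accum_grad_steps`, stepping gated by `step_every_n_iters`, clipping applied just before each step); this is not ESPnet3's actual loop:

```python
import torch

# Assumed semantics, for illustration only (not ESPnet3's actual loop):
accum_grad_steps = 2      # accumulate gradients over 2 iterations per step
step_every_n_iters = 1    # this optimizer participates every iteration
gradient_clip_val = 1.0   # clip grad norm to 1.0 ("norm" algorithm)

model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=2e-4)

for it in range(1, 101):
    if it % step_every_n_iters != 0:
        continue  # this optimizer sits the iteration out entirely
    loss = model(torch.randn(8, 4)).pow(2).mean()
    (loss / accum_grad_steps).backward()  # average grads across accumulation
    if it % (step_every_n_iters * accum_grad_steps) == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), gradient_clip_val)
        opt.step()
        opt.zero_grad()
```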
## Scheduler stepping rules
Single optimizer path:
- `scheduler_interval`: `step` | `epoch`
- optional `scheduler_monitor`
Named multi-optimizer path:
- `schedulers.<name>.interval`: `step` | `epoch`
- `schedulers.<name>.monitor`
Step-based schedulers are stepped immediately after that optimizer updates. Epoch-based schedulers are stepped at epoch end.
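A small runnable sketch of the two stepping cadences (illustrative only; in ESPnet3 this is driven by the manual optimization path):

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
step_sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=1000)      # interval: step
epoch_sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, patience=2)  # interval: epoch

for epoch in range(2):
    for _ in range(10):          # training iterations
        opt.step()
        step_sched.step()        # step-based: right after this optimizer updates
    valid_loss = 1.0             # stand-in for the monitored metric
    epoch_sched.step(valid_loss) # epoch-based + monitored: once per epoch
```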
## Model-side contract
Single optimizer path expects the model to return a plain tensor loss.
Named multi-optimizer path expects:
- `OptimizationStep`, or
- `list[OptimizationStep]`
This is how ESPnet3 knows which optimizer should update.
See Multiple optimizers and schedulers for the full model-side contract.
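For illustration, a hypothetical GAN-style `forward` that returns one step per named optimizer. The exact `OptimizationStep` fields are defined in `espnet3.components.modeling.optimization_spec`; the `name`/`loss` keywords and loss helpers below are assumptions made for illustration, not a verified signature:

```python
from espnet3.components.modeling.optimization_spec import OptimizationStep

# Hypothetical method on the model wrapped by ESPnetLightningModule.
# The `name=`/`loss=` keywords are assumptions for illustration,
# not a verified OptimizationStep signature.
def forward(self, batch):
    g_loss = self.generator_loss(batch)      # hypothetical helper
    d_loss = self.discriminator_loss(batch)  # hypothetical helper
    return [
        OptimizationStep(name="generator", loss=g_loss),      # -> optimizers.generator
        OptimizationStep(name="discriminator", loss=d_loss),  # -> optimizers.discriminator
    ]
```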
## What not to mix
Do not mix:
- `optimizer` with `optimizers`
- `scheduler` with `schedulers`
ESPnet3 rejects mixed configurations.
