ESPnet3 Optimizer And Scheduler Configuration
ESPnet3 wraps PyTorch Lightning so that optimizers and schedulers can be defined from YAML.
Current implementation:
- `espnet3.components.modeling.lightning_module.ESPnetLightningModule.configure_optimizers`
- `espnet3.components.modeling.optimization_spec`
Two modes are supported:
- single optimizer path: `optimizer` + `scheduler`
- named multi-optimizer path: `optimizers` + `schedulers`
What lives in configure_optimizers vs YAML
| Layer | You control via YAML | ESPnet3 ensures |
|---|---|---|
| `optimizer` / `optimizers` | optimizer classes and hyperparameters | correct parameter grouping and uniqueness |
| `scheduler` / `schedulers` | scheduler classes and decay settings | matching schedulers to optimizers |
| model parameter names | which parameters each optimizer sees | every trainable parameter is assigned exactly once |
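
The "assigned exactly once" guarantee in the last row can be pictured with a small check. This is only a sketch: it assumes that `params` selects parameters by name prefix (so `generator` would match `generator.conv.weight`), which this page does not state, and `assign_parameters` is a hypothetical helper rather than ESPnet3 code.

```python
from collections import defaultdict


def assign_parameters(param_names, selectors):
    """Map each parameter name to exactly one optimizer name.

    `selectors` maps an optimizer name to a name prefix (assumed matching rule).
    Raises ValueError if a parameter is unassigned or claimed more than once.
    """
    owners = defaultdict(list)
    for name in param_names:
        for opt_name, prefix in selectors.items():
            if name == prefix or name.startswith(prefix + "."):
                owners[name].append(opt_name)

    unassigned = [n for n in param_names if not owners[n]]
    duplicated = {n: o for n, o in owners.items() if len(o) > 1}
    if unassigned:
        raise ValueError(f"parameters without an optimizer: {unassigned}")
    if duplicated:
        raise ValueError(f"parameters claimed by several optimizers: {duplicated}")
    return {name: opts[0] for name, opts in owners.items()}


# Illustrative parameter names, not taken from a real model.
names = ["generator.conv.weight", "discriminator.fc.bias"]
groups = assign_parameters(
    names, {"generator": "generator", "discriminator": "discriminator"}
)
```

A parameter that matches no selector, or matches two, makes the helper raise instead of silently training with a partial or overlapping assignment.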
1. Single optimizer path
Use `optimizer` and `scheduler` when the whole model shares one optimizer.

    optimizer:
      _target_: torch.optim.AdamW
      lr: 0.001
      weight_decay: 1.0e-2

    scheduler:
      _target_: torch.optim.lr_scheduler.CosineAnnealingLR
      T_max: 100000

    scheduler_interval: step
    scheduler_monitor:

Important points:
- use `torch.optim.*`, not `torch.optimizer.*`
- `optimizer` is instantiated with all trainable parameters
- `scheduler` is instantiated with that optimizer
- `scheduler_interval` must be `step` or `epoch`
- `scheduler_monitor` is only needed for monitored epoch schedulers such as `ReduceLROnPlateau`
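
For orientation, PyTorch Lightning's `configure_optimizers` hook can return a dictionary with an `optimizer` entry and an `lr_scheduler` entry that carries `interval` and, for monitored schedulers, `monitor`. The sketch below shows how the YAML keys above could map onto that structure. `to_lightning_config` is a hypothetical helper, and whether ESPnet3 builds exactly this dictionary is an assumption.

```python
def to_lightning_config(optimizer, scheduler, interval="step", monitor=None):
    """Assemble the dictionary format accepted by Lightning's configure_optimizers."""
    if interval not in ("step", "epoch"):
        raise ValueError(f"scheduler_interval must be 'step' or 'epoch', got {interval!r}")
    lr_scheduler = {"scheduler": scheduler, "interval": interval}
    if monitor is not None:
        # Only monitored schedulers such as ReduceLROnPlateau need this key.
        lr_scheduler["monitor"] = monitor
    return {"optimizer": optimizer, "lr_scheduler": lr_scheduler}


# Strings stand in for the instantiated torch objects to keep the sketch dependency-free.
config = to_lightning_config(
    "<AdamW instance>", "<CosineAnnealingLR instance>", interval="step"
)
```

In real use, the first two arguments would be the optimizer and scheduler objects created from the `_target_` entries.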
Example with a monitored scheduler:

    optimizer:
      _target_: torch.optim.Adam
      lr: 0.001

    scheduler:
      _target_: torch.optim.lr_scheduler.ReduceLROnPlateau
      patience: 2
      factor: 0.5

    scheduler_interval: epoch
    scheduler_monitor: valid/loss

2. Named multi-optimizer path
Use optimizers and schedulers when different parameter groups need independent updates.
This is the normal path for GAN training and other cases where one shared loss and one shared optimizer are not enough.
Minimal shape:

    optimizers:
      generator:
        optimizer:
          _target_: torch.optim.Adam
          lr: 0.0002
        params: generator
        accum_grad_steps: 1
        step_every_n_iters: 1
        gradient_clip_val: 1.0
        gradient_clip_algorithm: norm
      discriminator:
        optimizer:
          _target_: torch.optim.Adam
          lr: 0.0002
        params: discriminator
        accum_grad_steps: 1
        step_every_n_iters: 1

    schedulers:
      generator:
        scheduler:
          _target_: torch.optim.lr_scheduler.LinearLR
          start_factor: 1.0
          end_factor: 0.5
          total_iters: 1000
        interval: step
      discriminator:
        scheduler:
          _target_: torch.optim.lr_scheduler.ReduceLROnPlateau
          patience: 2
          factor: 0.5
        interval: epoch
        monitor: valid/discriminator/loss

General rules:
- names under `optimizers` and `schedulers` must match exactly
- every optimizer entry must include `params` and `optimizer`
- top-level `scheduler_interval` and `scheduler_monitor` are not used here
- all detailed per-optimizer routing and runtime rules are documented in Multiple optimizers and `OptimizationStep`
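
The first two rules are structural and can be checked before any training starts. The following is a minimal sketch of such a check on plain dictionaries. `validate_named_config` is a hypothetical helper, not the validation ESPnet3 itself performs, and the error messages are illustrative.

```python
REQUIRED_OPTIMIZER_KEYS = {"params", "optimizer"}


def validate_named_config(optimizers, schedulers):
    """Check required keys and name matching; raise ValueError on failure."""
    for name, entry in optimizers.items():
        missing = REQUIRED_OPTIMIZER_KEYS - set(entry)
        if missing:
            raise ValueError(f"optimizer '{name}' is missing keys: {sorted(missing)}")
    if set(optimizers) != set(schedulers):
        mismatched = sorted(set(optimizers) ^ set(schedulers))
        raise ValueError(f"names differ between optimizers and schedulers: {mismatched}")


# Dictionaries mirroring the shape of the YAML once it has been loaded.
optimizers = {
    "generator": {
        "optimizer": {"_target_": "torch.optim.Adam", "lr": 0.0002},
        "params": "generator",
    },
    "discriminator": {
        "optimizer": {"_target_": "torch.optim.Adam", "lr": 0.0002},
        "params": "discriminator",
    },
}
schedulers = {
    "generator": {"interval": "step"},
    "discriminator": {"interval": "epoch", "monitor": "valid/discriminator/loss"},
}
validate_named_config(optimizers, schedulers)  # passes without raising
```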
What not to mix
Do not mix:
- `optimizer` with `optimizers`
- `scheduler` with `schedulers`
ESPnet3 rejects mixed configuration.
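
As an illustration of the rule, a mode check on the loaded configuration could look like the sketch below. `detect_mode` is a hypothetical helper. The page does not show ESPnet3's actual error type or message, and raising when no optimizer key is present is an assumption.

```python
def detect_mode(config):
    """Return 'single' or 'named' and refuse configurations that mix both paths."""
    single_keys = {"optimizer", "scheduler"} & set(config)
    named_keys = {"optimizers", "schedulers"} & set(config)
    if single_keys and named_keys:
        raise ValueError(
            f"mixed configuration: {sorted(single_keys)} cannot be combined "
            f"with {sorted(named_keys)}"
        )
    if named_keys:
        return "named"
    if single_keys:
        return "single"
    # Assumed behavior: a config with neither path is treated as an error.
    raise ValueError("no optimizer configuration found")


mode = detect_mode({"optimizer": {"_target_": "torch.optim.AdamW"}, "scheduler": {}})
```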
