espnet2.schedulers.cosine_anneal_warmup_restart.CosineAnnealingWarmupRestarts
class espnet2.schedulers.cosine_anneal_warmup_restart.CosineAnnealingWarmupRestarts(optimizer: Optimizer, first_cycle_steps: int, cycle_mult: float = 1.0, max_lr: float = 0.1, min_lr: float = 0.001, warmup_steps: int = 0, gamma: float = 1.0, last_epoch: int = -1)
Bases: _LRScheduler, AbsBatchStepScheduler
Cosine Annealing Warmup Restart.
This scheduler implements a cosine annealing learning rate schedule with warmup and restarts, allowing the learning rate to be adjusted dynamically during training. It is similar to PyTorch's official CosineAnnealingWarmRestarts, but adds a linear warmup phase and the ability to scale the maximum learning rate at each restart.
optimizer
Wrapped optimizer.
- Type: Optimizer
first_cycle_steps
First cycle step size.
- Type: int
cycle_mult
Cycle steps magnification. Default: 1.0.
- Type: float
max_lr
First cycle’s maximum learning rate. Default: 0.1.
- Type: float
min_lr
Minimum learning rate. Default: 0.001.
- Type: float
warmup_steps
Linear warmup step size. Default: 0.
- Type: int
gamma
Decrease rate of maximum learning rate by cycle. Default: 1.0.
- Type: float
last_epoch
The index of last epoch. Default: -1.
- Type: int
Parameters:
- optimizer (torch.optim.Optimizer) – The optimizer for which to adjust the learning rate.
- first_cycle_steps (int) – Number of steps in the first cycle.
- cycle_mult (float, optional) – Factor to increase the cycle length. Default: 1.0.
- max_lr (float, optional) – Maximum learning rate for the first cycle. Default: 0.1.
- min_lr (float, optional) – Minimum learning rate. Default: 0.001.
- warmup_steps (int, optional) – Number of steps for the warmup phase. Default: 0.
- gamma (float, optional) – Factor to decrease the max learning rate each cycle. Default: 1.0.
- last_epoch (int, optional) – The index of the last epoch. Default: -1.
########### Examples
>>> from torch.optim import Adam
>>> optimizer = Adam(model.parameters(), lr=0.1)
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
... first_cycle_steps=2000, warmup_steps=500, cycle_mult=1.0,
... max_lr=0.1, min_lr=0.001)
>>> for epoch in range(10000):
... train(...)
... scheduler.step(epoch)
NOTE
The learning rate will linearly increase from min_lr to max_lr during the warmup phase, then follow a cosine decay pattern until the end of the cycle.
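To make that shape concrete, the short sketch below drives the scheduler with a dummy parameter over two small cycles and records the resulting learning rates. The toy sizes (first_cycle_steps=20, warmup_steps=5) are chosen purely for illustration.
>>> import torch
>>> from espnet2.schedulers.cosine_anneal_warmup_restart import (
...     CosineAnnealingWarmupRestarts)
>>> param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter for the optimizer
>>> optimizer = torch.optim.Adam([param], lr=0.1)
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
...     first_cycle_steps=20, warmup_steps=5, max_lr=0.1, min_lr=0.001)
>>> lrs = []
>>> for _ in range(40):  # two cycles of 20 steps each
...     scheduler.step()
...     lrs.append(optimizer.param_groups[0]["lr"])
>>> # expect roughly a ramp up to max_lr over the first 5 steps, a cosine
>>> # decay back toward min_lr, and then a restart near step 20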
get_lr()
Get the learning rate for the current step in the cycle.
This method calculates the learning rate based on the current step in the cycle, applying a linear warmup for the initial steps and a cosine decay for the remaining steps of the cycle.
base_lrs
The base learning rates for each parameter group.
- Type: list
Returns: A list containing the learning rates for each parameter group.
Return type: list
########### Examples
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer, 10, warmup_steps=5)
>>> scheduler.get_lr()
[0.001, 0.001] # Example output with min_lr = 0.001
>>> for epoch in range(15):
... scheduler.step(epoch)
... print(scheduler.get_lr())
[0.1, 0.1] # Learning rate after warmup and during decay
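For reference, the per-group values above can be approximated with a standalone helper that mirrors the documented rule (linear warmup, then cosine decay). This is an illustrative sketch, not the actual source; approx_lr is a hypothetical name, and each base learning rate plays the role of min_lr.
>>> import math
>>> def approx_lr(step_in_cycle, cycle_steps, warmup_steps, base_lrs, max_lr):
...     if step_in_cycle < warmup_steps:
...         # linear ramp from the base (minimum) rate toward max_lr
...         return [base_lr + (max_lr - base_lr) * step_in_cycle / warmup_steps
...                 for base_lr in base_lrs]
...     # cosine decay from max_lr back to the base rate over the rest of the cycle
...     progress = (step_in_cycle - warmup_steps) / (cycle_steps - warmup_steps)
...     return [base_lr + 0.5 * (max_lr - base_lr) * (1 + math.cos(math.pi * progress))
...             for base_lr in base_lrs]
>>> [round(lr, 3) for lr in approx_lr(0, 10, 5, [0.001, 0.001], 0.1)]
[0.001, 0.001]
>>> [round(lr, 3) for lr in approx_lr(5, 10, 5, [0.001, 0.001], 0.1)]
[0.1, 0.1]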
init_lr()
Initialize the learning rates for all parameter groups in the optimizer to the minimum learning rate.
This method is called during construction of the CosineAnnealingWarmupRestarts class, before any training steps are performed: it sets every parameter group's learning rate to the specified minimum and records these values as the base learning rates.
base_lrs
A list to store the base learning rates for each parameter group.
- Type: list
########### Examples
>>> optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
... first_cycle_steps=10,
... warmup_steps=5)
>>> scheduler.init_lr()
>>> for param_group in optimizer.param_groups:
... print(param_group['lr']) # Outputs: 0.001
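Conceptually, the call has roughly the same effect as resetting each parameter group to the documented min_lr and recording that value as its base rate; the loop below is a sketch of that behavior, not the actual source.
>>> scheduler.base_lrs = []
>>> for param_group in optimizer.param_groups:
...     # every group starts from the documented minimum learning rate
...     param_group["lr"] = scheduler.min_lr
...     scheduler.base_lrs.append(scheduler.min_lr)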
step(epoch=None)
Step the learning rate according to the cosine annealing schedule with warmup restarts.
This method updates the optimizer's learning rate based on the current epoch or step within the cycle, handling both the linear warmup phase and the cosine decay that follows it.
- Parameters: epoch (int, optional) – The current epoch. If None, the last epoch plus one is used.
########### Examples
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
... first_cycle_steps=10, warmup_steps=5)
>>> for epoch in range(30):
... scheduler.step(epoch)
... print(f"Epoch {epoch}: Learning rate: {scheduler.get_lr()}")
NOTE
This method must be called at the beginning of each epoch to update the learning rate based on the current cycle and step.
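The epoch argument may also be omitted, in which case the scheduler simply advances one step from its previous position. A rough usage sketch in the style of the earlier example (train(...) is a placeholder for the training routine):
>>> for step in range(30):
...     train(...)              # placeholder training call
...     scheduler.step()        # no epoch given: advance by one step
...     print(scheduler.get_lr())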