espnet2.schedulers.cosine_anneal_warmup_restart.CosineAnnealingWarmupRestarts
class espnet2.schedulers.cosine_anneal_warmup_restart.CosineAnnealingWarmupRestarts(optimizer: Optimizer, first_cycle_steps: int, cycle_mult: float = 1.0, max_lr: float = 0.1, min_lr: float = 0.001, warmup_steps: int = 0, gamma: float = 1.0, last_epoch: int = -1)
Bases: _LRScheduler, AbsBatchStepScheduler
Cosine Annealing Warmup Restart.
This scheduler implements a cosine annealing learning rate schedule with warmup and restarts, allowing the learning rate to be adjusted dynamically during training. It is similar to PyTorch's official CosineAnnealingWarmRestarts, but adds a linear warmup phase and the ability to scale the maximum learning rate at each restart.
optimizer
Wrapped optimizer.
- Type: Optimizer
first_cycle_steps
First cycle step size.
- Type: int
cycle_mult
Cycle steps magnification. Default: 1.0.
- Type: float
max_lr
First cycle’s maximum learning rate. Default: 0.1.
- Type: float
min_lr
Minimum learning rate. Default: 0.001.
- Type: float
warmup_steps
Linear warmup step size. Default: 0.
- Type: int
gamma
Decrease rate of maximum learning rate by cycle. Default: 1.0.
- Type: float
last_epoch
The index of last epoch. Default: -1.
- Type: int
Parameters:
- optimizer (torch.optim.Optimizer) – The optimizer for which to adjust the learning rate.
- first_cycle_steps (int) – Number of steps in the first cycle.
- cycle_mult (float, optional) – Factor to increase the cycle length. Default: 1.0.
- max_lr (float, optional) – Maximum learning rate for the first cycle. Default: 0.1.
- min_lr (float, optional) – Minimum learning rate. Default: 0.001.
- warmup_steps (int, optional) – Number of steps for the warmup phase. Default: 0.
- gamma (float, optional) – Factor to decrease the max learning rate each cycle. Default: 1.0.
- last_epoch (int, optional) – The index of the last epoch. Default: -1.
########### Examples
>>> from torch.optim import Adam
>>> optimizer = Adam(model.parameters(), lr=0.1)
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
... first_cycle_steps=2000, warmup_steps=500, cycle_mult=1.0,
... max_lr=0.1, min_lr=0.001)
>>> for epoch in range(10000):
... train(...)
... scheduler.step(epoch)
NOTE
The learning rate will linearly increase from min_lr to max_lr during the warmup phase, then follow a cosine decay pattern until the end of the cycle.
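To make that shape concrete, the short sketch below drives the scheduler with a dummy parameter over two small cycles and records the resulting learning rates. The toy sizes (first_cycle_steps=20, warmup_steps=5) are chosen purely for illustration.
>>> import torch
>>> from espnet2.schedulers.cosine_anneal_warmup_restart import (
...     CosineAnnealingWarmupRestarts)
>>> param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter for the optimizer
>>> optimizer = torch.optim.Adam([param], lr=0.1)
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
...     first_cycle_steps=20, warmup_steps=5, max_lr=0.1, min_lr=0.001)
>>> lrs = []
>>> for _ in range(40):  # two cycles of 20 steps each
...     scheduler.step()
...     lrs.append(optimizer.param_groups[0]["lr"])
>>> # expect roughly a ramp up to max_lr over the first 5 steps, a cosine
>>> # decay back toward min_lr, and then a restart near step 20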
get_lr()
Get the learning rate for the current step in the cycle.
This method calculates the learning rate based on the current step in the cycle, applying a linear warmup for the initial steps and a cosine decay for the remaining steps of the cycle.
base_lrs
The base learning rates for each parameter group.
- Type: list
Returns: A list containing the learning rates for each parameter group.
Return type: list
########### Examples
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer, 10, warmup_steps=5)
>>> scheduler.get_lr()
[0.001, 0.001] # Example output with min_lr = 0.001
>>> for epoch in range(15):
... scheduler.step(epoch)
... print(scheduler.get_lr())
[0.1, 0.1] # Learning rate after warmup and during decay
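For reference, the per-group values above can be approximated with a standalone helper that mirrors the documented rule (linear warmup, then cosine decay). This is an illustrative sketch, not the actual source; approx_lr is a hypothetical name, and each base learning rate plays the role of min_lr.
>>> import math
>>> def approx_lr(step_in_cycle, cycle_steps, warmup_steps, base_lrs, max_lr):
...     if step_in_cycle < warmup_steps:
...         # linear ramp from the base (minimum) rate toward max_lr
...         return [base_lr + (max_lr - base_lr) * step_in_cycle / warmup_steps
...                 for base_lr in base_lrs]
...     # cosine decay from max_lr back to the base rate over the rest of the cycle
...     progress = (step_in_cycle - warmup_steps) / (cycle_steps - warmup_steps)
...     return [base_lr + 0.5 * (max_lr - base_lr) * (1 + math.cos(math.pi * progress))
...             for base_lr in base_lrs]
>>> [round(lr, 3) for lr in approx_lr(0, 10, 5, [0.001, 0.001], 0.1)]
[0.001, 0.001]
>>> [round(lr, 3) for lr in approx_lr(5, 10, 5, [0.001, 0.001], 0.1)]
[0.1, 0.1]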
init_lr()
Initialize the learning rates for all parameter groups in the optimizer to the minimum learning rate.
This method is called during construction of the CosineAnnealingWarmupRestarts class, before any training steps are performed: it sets every parameter group's learning rate to the specified minimum and records these values as the base learning rates.
base_lrs
A list to store the base learning rates for each parameter group.
- Type: list
########### Examples
>>> optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
... first_cycle_steps=10,
... warmup_steps=5)
>>> scheduler.init_lr()
>>> for param_group in optimizer.param_groups:
... print(param_group['lr']) # Outputs: 0.001
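Conceptually, the call has roughly the same effect as resetting each parameter group to the documented min_lr and recording that value as its base rate; the loop below is a sketch of that behavior, not the actual source.
>>> scheduler.base_lrs = []
>>> for param_group in optimizer.param_groups:
...     # every group starts from the documented minimum learning rate
...     param_group["lr"] = scheduler.min_lr
...     scheduler.base_lrs.append(scheduler.min_lr)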
step(epoch=None)
Step the learning rate according to the cosine annealing schedule with warmup restarts.
This method updates the optimizer's learning rate based on the current epoch or step within the cycle, handling both the linear warmup phase and the cosine decay that follows it.
- Parameters: epoch (int, optional) – The current epoch. If None, the last epoch plus one is used.
########### Examples
>>> scheduler = CosineAnnealingWarmupRestarts(optimizer,
... first_cycle_steps=10, warmup_steps=5)
>>> for epoch in range(30):
... scheduler.step(epoch)
... print(f"Epoch {epoch}: Learning rate: {scheduler.get_lr()}")
NOTE
This method must be called at the beginning of each epoch to update the learning rate based on the current cycle and step.
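The epoch argument may also be omitted, in which case the scheduler simply advances one step from its previous position. A rough usage sketch in the style of the earlier example (train(...) is a placeholder for the training routine):
>>> for step in range(30):
...     train(...)              # placeholder training call
...     scheduler.step()        # no epoch given: advance by one step
...     print(scheduler.get_lr())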