espnet3.components.multiple_optim.MultipleOptim
class espnet3.components.multiple_optim.MultipleOptim(optimizers: Iterable[Optimizer])
Bases: Optimizer
Wrapper around multiple optimizers that should be stepped together in a single call. This is a hack to avoid PyTorch Lightning calling training_step once for each optimizer, which increases training time and is not always necessary.
Modified from a reply in this GitHub issue thread: https://github.com/Lightning-AI/lightning/issues/3346#issuecomment-1036063687
- Parameters: optimizers (list of optimizers) – the optimizers to be wrapped and stepped together.
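Example (an illustrative sketch only; the encoder/decoder modules and learning rates below are hypothetical and merely show two parameter groups driven by separate optimizers that are stepped in one call):
>>> import torch
>>> from espnet3.components.multiple_optim import MultipleOptim
>>> encoder = torch.nn.Linear(4, 4)
>>> decoder = torch.nn.Linear(4, 4)
>>> opt = MultipleOptim([
...     torch.optim.Adam(encoder.parameters(), lr=1e-3),
...     torch.optim.SGD(decoder.parameters(), lr=1e-2),
... ])
>>> loss = (decoder(encoder(torch.randn(2, 4))) ** 2).mean()
>>> loss.backward()
>>> _ = opt.step()   # steps both wrapped optimizers together
>>> opt.zero_grad()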
property defaults : Dict[str, Tensor]
Default hyper-parameters merged from all optimizers.
- Returns: Combined defaults dictionary from each optimizer.
- Return type: Dict[str, torch.Tensor]
Example
>>> opt = MultipleOptim([torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)])
>>> "lr" in opt.defaults
True
load_state_dict(state_dict: List[Dict[str, Tensor | List[Dict[str, Tensor | float | bool | Any]]]]) → None
Loads the optimizer state.
- Parameters: state_dict (list of dicts) – Optimizer state. Should be an object returned from a call to state_dict().
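Example (a minimal round-trip sketch, assuming state_dict() returns one entry per wrapped optimizer as its signature indicates):
>>> import torch
>>> opt = MultipleOptim([torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)])
>>> saved = opt.state_dict()
>>> opt.load_state_dict(saved)   # restores the state captured above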
property param_groups : List[Dict[str, Tensor | float | bool | Any]]
Parameter groups across all wrapped optimizers.
- Returns: List of parameter group dictionaries, concatenated in the same order as self.optimizers.
- Return type: List[Dict[str, Union[torch.Tensor, float, bool, Any]]]
Example
>>> opt = MultipleOptim([torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)])
>>> isinstance(opt.param_groups, list)
True
property state : Dict[str, Tensor]
Combined state for every wrapped optimizer.
- Returns: Flattened mapping that merges the state dictionaries of each optimizer in self.optimizers.
- Return type: Dict[Any, torch.Tensor]
Example
>>> import torch
>>> opt1 = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
>>> opt2 = torch.optim.Adam([torch.zeros(1, requires_grad=True)], lr=0.1)
>>> wrapper = MultipleOptim([opt1, opt2])
>>> isinstance(wrapper.state, dict)
True
state_dict() → List[Dict[str, Tensor | List[Dict[str, Tensor | float | bool | Any]]]]
Returns the state of each wrapped optimizer as a list of dictionaries.
Each dictionary contains two entries:
- state – a dict holding current optimization state. Its content differs between optimizer classes.
- param_groups – a list containing all parameter groups, where each parameter group is a dict.
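Example (inspecting the returned structure; this assumes each list entry mirrors a standard torch.optim state dict with the two keys described above):
>>> import torch
>>> opt1 = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
>>> opt2 = torch.optim.Adam([torch.zeros(1, requires_grad=True)], lr=0.01)
>>> sd = MultipleOptim([opt1, opt2]).state_dict()
>>> isinstance(sd, list)   # one dict per wrapped optimizer
True
>>> sorted(sd[0].keys())
['param_groups', 'state']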
step(closure: Callable[[], Tensor] = None) → Tensor
Performs a single optimization step (parameter update).
- Parameters: closure (function) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Notes
Unless otherwise specified, this function should not modify the .grad field of the parameters.
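Example (a sketch of passing a closure; this assumes the wrapper forwards the closure to each wrapped optimizer's step, as the signature suggests, and ignores the returned loss):
>>> import torch
>>> p = torch.ones(1, requires_grad=True)
>>> opt = MultipleOptim([torch.optim.SGD([p], lr=0.1)])
>>> def closure():
...     opt.zero_grad()
...     loss = (p ** 2).sum()
...     loss.backward()
...     return loss
>>> _ = opt.step(closure)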
zero_grad(set_to_none: bool = False) → None
Sets the gradients of all optimized torch.Tensors to zero.
- Parameters: set_to_none (bool) – Instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:
  - When the user tries to access a gradient and perform manual ops on it, a None attribute or a torch.Tensor full of 0s will behave differently.
  - If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
  - torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0, and in the other it skips the step altogether).
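Example (a sketch assuming set_to_none is forwarded to every wrapped optimizer):
>>> import torch
>>> p = torch.ones(1, requires_grad=True)
>>> opt = MultipleOptim([torch.optim.SGD([p], lr=0.1)])
>>> (p ** 2).sum().backward()
>>> p.grad is None
False
>>> opt.zero_grad(set_to_none=True)
>>> p.grad is None
True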
