espnet3.components.multiple_optim.MultipleOptim
class espnet3.components.multiple_optim.MultipleOptim(optimizers: Iterable[Optimizer])
Bases: Optimizer
Wrapper around multiple optimizers that should be stepped together in a single call. This is a hack to avoid PyTorch Lightning calling training_step once for each optimizer, which increases training time and is not always necessary.
Modified from a reply in this GitHub issue thread: https://github.com/Lightning-AI/lightning/issues/3346#issuecomment-1036063687
- Parameters: optimizers (list of optimizers) – the optimizers to be wrapped and stepped together.
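Example (an illustrative sketch only; the encoder/decoder modules and learning rates below are hypothetical and merely show two parameter groups driven by separate optimizers that are stepped in one call):
>>> import torch
>>> from espnet3.components.multiple_optim import MultipleOptim
>>> encoder = torch.nn.Linear(4, 4)
>>> decoder = torch.nn.Linear(4, 4)
>>> opt = MultipleOptim([
...     torch.optim.Adam(encoder.parameters(), lr=1e-3),
...     torch.optim.SGD(decoder.parameters(), lr=1e-2),
... ])
>>> loss = (decoder(encoder(torch.randn(2, 4))) ** 2).mean()
>>> loss.backward()
>>> _ = opt.step()   # steps both wrapped optimizers together
>>> opt.zero_grad()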
property defaults : Dict[str, Tensor]
Default hyper-parameters merged from all optimizers.
- Returns: Combined defaults dictionary from each optimizer.
- Return type: Dict[str, torch.Tensor]
Example
>>> opt = MultipleOptim([torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)])
>>> "lr" in opt.defaults
True
load_state_dict(state_dict: List[Dict[str, Tensor | List[Dict[str, Tensor | float | bool | Any]]]]) → None
Loads the optimizer state.
- Parameters: state_dict (list of dicts) – Optimizer state. Should be an object returned from a call to state_dict().
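Example (a minimal round-trip sketch, assuming state_dict() returns one entry per wrapped optimizer as its signature indicates):
>>> import torch
>>> opt = MultipleOptim([torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)])
>>> saved = opt.state_dict()
>>> opt.load_state_dict(saved)   # restores the state captured above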
property param_groups : List[Dict[str, Tensor | float | bool | Any]]
Parameter groups across all wrapped optimizers.
- Returns: List of parameter group dictionaries, concatenated in the same order as self.optimizers.
- Return type: List[Dict[str, Union[torch.Tensor, float, bool, Any]]]
Example
>>> opt = MultipleOptim([torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)])
>>> isinstance(opt.param_groups, list)
True
property state : Dict[str, Tensor]
Combined state for every wrapped optimizer.
- Returns: Flattened mapping that merges the state dictionaries of each optimizer in self.optimizers.
- Return type: Dict[Any, torch.Tensor]
Example
>>> import torch
>>> opt1 = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
>>> opt2 = torch.optim.Adam([torch.zeros(1, requires_grad=True)], lr=0.1)
>>> wrapper = MultipleOptim([opt1, opt2])
>>> isinstance(wrapper.state, dict)
True
state_dict() → List[Dict[str, Tensor | List[Dict[str, Tensor | float | bool | Any]]]]
Returns the state of each wrapped optimizer as a list of dictionaries.
Each dictionary contains two entries:
- state – a dict holding current optimization state. Its content differs between optimizer classes.
- param_groups – a list containing all parameter groups, where each parameter group is a dict.
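Example (inspecting the returned structure; this assumes each list entry mirrors a standard torch.optim state dict with the two keys described above):
>>> import torch
>>> opt1 = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
>>> opt2 = torch.optim.Adam([torch.zeros(1, requires_grad=True)], lr=0.01)
>>> sd = MultipleOptim([opt1, opt2]).state_dict()
>>> isinstance(sd, list)   # one dict per wrapped optimizer
True
>>> sorted(sd[0].keys())
['param_groups', 'state']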
step(closure: Callable[[], Tensor] = None) → Tensor
Performs a single optimization step (parameter update).
- Parameters: closure (function) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Notes
Unless otherwise specified, this function should not modify the .grad field of the parameters.
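Example (a sketch of passing a closure; this assumes the wrapper forwards the closure to each wrapped optimizer's step, as the signature suggests, and ignores the returned loss):
>>> import torch
>>> p = torch.ones(1, requires_grad=True)
>>> opt = MultipleOptim([torch.optim.SGD([p], lr=0.1)])
>>> def closure():
...     opt.zero_grad()
...     loss = (p ** 2).sum()
...     loss.backward()
...     return loss
>>> _ = opt.step(closure)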
zero_grad(set_to_none: bool = False) → None
Sets the gradients of all optimized torch.Tensors to zero.
- Parameters: set_to_none (bool) – Instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:
  - When the user tries to access a gradient and perform manual ops on it, a None attribute or a torch.Tensor full of 0s will behave differently.
  - If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
  - torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0, and in the other it skips the step altogether).
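Example (a sketch assuming set_to_none is forwarded to every wrapped optimizer):
>>> import torch
>>> p = torch.ones(1, requires_grad=True)
>>> opt = MultipleOptim([torch.optim.SGD([p], lr=0.1)])
>>> (p ** 2).sum().backward()
>>> p.grad is None
False
>>> opt.zero_grad(set_to_none=True)
>>> p.grad is None
True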
