espnet2.train.distributed_utils.get_node_rank
Get Node Rank.
This function is used for “multiprocessing distributed” mode. In that mode, the initial RANK equals the node ID, and the real rank used by torch.distributed is computed as (nGPU * NodeID) + LOCAL_RANK.
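For instance, in a hypothetical job with 2 nodes and 4 GPUs per node, the ranks assigned to the spawned processes work out as follows:

>>> ngpu, num_nodes = 4, 2  # hypothetical 2-node, 4-GPU-per-node job
>>> [ngpu * node_id + local_rank
...  for node_id in range(num_nodes)
...  for local_rank in range(ngpu)]
[0, 1, 2, 3, 4, 5, 6, 7]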
- Parameters:
- prior (Optional[int]) – The prior rank to return if provided.
- launcher (Optional[str]) – The launcher type, e.g., “slurm” or “mpi”.
- Returns: The node rank or None if not determined.
- Return type: Optional[int]
- Raises:
- RuntimeError – If launcher is “slurm” but the process was not launched by srun.
- RuntimeError – If the number of Slurm tasks (SLURM_NTASKS) does not equal the number of nodes, i.e., ntasks_per_node is not 1 (see the sketch after this list).
- RuntimeError – If an unsupported launcher is specified.
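As a rough illustration of the Slurm-related checks above (a sketch under assumed behavior, not ESPnet’s exact implementation), the error conditions can be expressed with standard Slurm environment variables, all of which are set by srun:

import os

def node_rank_under_slurm_sketch() -> int:
    # Hypothetical helper approximating get_node_rank(launcher="slurm").
    if "SLURM_PROCID" not in os.environ:
        # No Slurm task context: the process was not launched by srun.
        raise RuntimeError("This process does not seem to be launched by 'srun'")
    # One task per node is assumed, so the total task count must equal
    # the number of nodes in the job step.
    if os.environ["SLURM_STEP_NUM_NODES"] != os.environ["SLURM_NTASKS"]:
        raise RuntimeError("Run with --ntasks-per-node=1")
    # With one task per node, the node ID doubles as the node rank.
    return int(os.environ["SLURM_NODEID"])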
Examples
Returns the node rank of the current process, assuming the proper environment variables are set:

>>> get_node_rank()
0
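A prior rank, if given, is returned unchanged without consulting the environment (per the prior parameter above):

>>> get_node_rank(prior=3)
3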
NOTE
This function assumes that ntasks_per_node is 1. If this assumption is violated, the behavior may be undefined.