espnet2.train.distributed_utils.get_local_rank
espnet2.train.distributed_utils.get_local_rank(prior=None, launcher: str | None = None) → int | None
Get the local rank of the process.
The local rank corresponds to the GPU device ID when using distributed training. This function retrieves the local rank based on the provided prior value or environment variables. It is essential for configuring distributed training, particularly when multiple processes are launched across multiple GPUs.
- Parameters:
- prior (Optional[int]) – The local rank to use if provided. If None, the function will check environment variables for the local rank.
- launcher (Optional[str]) – The type of launcher used to start the process. Supported values include "slurm" and "mpi". If None, only environment variables are checked.
- Returns: The local rank of the current process, or None if it cannot be determined.
- Return type: Optional[int]
- Raises:
- RuntimeError – If the launcher is "slurm" and the process was not launched by srun.
- RuntimeError – If the launcher is "mpi", which is used for 'multiprocessing-distributed' mode.
- RuntimeError – If an unsupported launcher is specified.
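To make the resolution order described by the parameters and errors above concrete, here is a minimal sketch of how such a lookup could be structured. This is an illustrative approximation, not ESPnet's actual implementation; the SLURM_LOCALID variable name and the exact error messages are assumptions.

```python
import os
from typing import Optional


def _resolve_local_rank_sketch(prior: Optional[int] = None,
                               launcher: Optional[str] = None) -> Optional[int]:
    """Illustrative approximation of the lookup order documented above."""
    # 1. An explicitly provided value takes priority.
    if prior is not None:
        return prior
    # 2. A generic LOCAL_RANK variable (set e.g. by torchrun) is checked next.
    if "LOCAL_RANK" in os.environ:
        return int(os.environ["LOCAL_RANK"])
    # 3. Launcher-specific handling.
    if launcher == "slurm":
        # SLURM_LOCALID is an assumed name for the srun-provided local ID.
        if "SLURM_LOCALID" not in os.environ:
            raise RuntimeError("launcher=slurm requires the process to be started by srun")
        return int(os.environ["SLURM_LOCALID"])
    if launcher == "mpi":
        raise RuntimeError("launcher=mpi is used for 'multiprocessing-distributed' mode")
    if launcher is not None:
        raise RuntimeError(f"Unsupported launcher: {launcher}")
    # 4. Nothing found: the local rank cannot be determined.
    return None
```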
Examples
```python
# Example usage in a distributed training setup
from espnet2.train.distributed_utils import get_local_rank

local_rank = get_local_rank()
if local_rank is not None:
    print(f"Local rank is: {local_rank}")
else:
    print("Local rank could not be determined.")
```
NOTE
The LOCAL_RANK environment variable should be set when using distributed training frameworks like PyTorch or when launching via SLURM.
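For instance, assuming the function falls back to the LOCAL_RANK variable as this note suggests, a launcher that exports it (such as torchrun or srun) makes the call return the exported value; setting the variable by hand below is only for illustration.

```python
import os
from espnet2.train.distributed_utils import get_local_rank

# Normally the launcher (e.g. torchrun or srun) exports this variable;
# setting it manually here only illustrates the documented fallback.
os.environ["LOCAL_RANK"] = "0"

print(get_local_rank())  # expected to print 0 under the assumption above
```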