espnet2.train.distributed_utils.get_local_rank
espnet2.train.distributed_utils.get_local_rank(prior=None, launcher: str | None = None) → int | None
Get the local rank of the process.
The local rank corresponds to the GPU device ID when using distributed training. This function retrieves the local rank based on the provided prior value or environment variables. It is essential for configuring distributed training, particularly when multiple processes are launched across multiple GPUs.
- Parameters:
- prior (Optional[int]) – The local rank to use if provided. If None, the function will check environment variables for the local rank.
- launcher (Optional[str]) – The type of launcher used to start the process. Supported values include "slurm" and "mpi". If None, only environment variables are checked.
- Returns: The local rank of the current process, or None if it cannot be determined.
- Return type: Optional[int]
- Raises:
- RuntimeError – If the launcher is "slurm" and the process was not launched by srun.
- RuntimeError – If the launcher is "mpi", which is used for 'multiprocessing-distributed' mode.
- RuntimeError – If an unsupported launcher is specified.
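To make the resolution order described by the parameters and errors above concrete, here is a minimal sketch of how such a lookup could be structured. This is an illustrative approximation, not ESPnet's actual implementation; the SLURM_LOCALID variable name and the exact error messages are assumptions.

```python
import os
from typing import Optional


def _resolve_local_rank_sketch(prior: Optional[int] = None,
                               launcher: Optional[str] = None) -> Optional[int]:
    """Illustrative approximation of the lookup order documented above."""
    # 1. An explicitly provided value takes priority.
    if prior is not None:
        return prior
    # 2. A generic LOCAL_RANK variable (set e.g. by torchrun) is checked next.
    if "LOCAL_RANK" in os.environ:
        return int(os.environ["LOCAL_RANK"])
    # 3. Launcher-specific handling.
    if launcher == "slurm":
        # SLURM_LOCALID is an assumed name for the srun-provided local ID.
        if "SLURM_LOCALID" not in os.environ:
            raise RuntimeError("launcher=slurm requires the process to be started by srun")
        return int(os.environ["SLURM_LOCALID"])
    if launcher == "mpi":
        raise RuntimeError("launcher=mpi is used for 'multiprocessing-distributed' mode")
    if launcher is not None:
        raise RuntimeError(f"Unsupported launcher: {launcher}")
    # 4. Nothing found: the local rank cannot be determined.
    return None
```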
Examples
```python
# Example usage in a distributed training setup
from espnet2.train.distributed_utils import get_local_rank

local_rank = get_local_rank()
if local_rank is not None:
    print(f"Local rank is: {local_rank}")
else:
    print("Local rank could not be determined.")
```
NOTE
The LOCAL_RANK environment variable should be set when using distributed training frameworks like PyTorch or when launching via SLURM.
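For instance, assuming the function falls back to the LOCAL_RANK variable as this note suggests, a launcher that exports it (such as torchrun or srun) makes the call return the exported value; setting the variable by hand below is only for illustration.

```python
import os
from espnet2.train.distributed_utils import get_local_rank

# Normally the launcher (e.g. torchrun or srun) exports this variable;
# setting it manually here only illustrates the documented fallback.
os.environ["LOCAL_RANK"] = "0"

print(get_local_rank())  # expected to print 0 under the assumption above
```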