espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_exp
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_exp(acts: Tensor, denom, rows: int, cols: int, minus: bool, stream)
Helper method to call the Warp Reduction Kernel to perform exp reduction.
This function reduces the input activation matrix along the vocabulary dimension by exponentiating each element and reducing with addition, using CUDA warp-level primitives for efficient parallel execution.
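Conceptually, each (b, t, u) position owns one length-(V+1) column of acts, and the kernel collapses that column into a single value. A minimal pure-PyTorch sketch of this "exp then add" semantics (illustrative only; the per-column max subtraction mirrors standard log-sum-exp stabilization and is an assumption about the kernel's internals, not a guarantee):

import torch

def reduce_exp_reference(acts: torch.Tensor, rows: int, cols: int) -> torch.Tensor:
    # acts: flattened [cols * rows]; one length-`rows` column per (b, t, u).
    mat = acts.view(cols, rows)
    col_max = mat.max(dim=1, keepdim=True).values  # stabilize before exp (assumed)
    return torch.exp(mat - col_max).sum(dim=1)     # one reduced value per column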
NOTE
Warp-level reduction is most efficient when the reduced dimension is a power of two (2^K).
References
- Warp Primitives [https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/]
- Parameters:
- acts – Flattened activation matrix of shape [B * T * U * (V+1)].
- denom – Flattened output matrix of shape [B * T * U * (V+1)]. Data will be overwritten with the reduction results.
- rows – Vocabulary size including the blank token, i.e. V+1. Represents the number of threads per block.
- cols – Flattened size of the activation matrix without the vocabulary dimension (B * T * U). Represents the number of blocks per grid.
- minus – Bool flag selecting the reduction variant. If True, the _reduce_minus kernel is launched; otherwise, the _reduce_rows kernel is launched (see the call-pattern sketch after this list).
- stream – CUDA Stream for managing execution of kernels.
- Returns: Returns True upon successful execution of the reduction.
- Return type: bool
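In the GPU RNNT loss code this function is typically paired with reduce_max from the same module to build the log-softmax denominator. The sketch below shows that assumed call pattern (the exact call sites live in the GPU RNNT implementation, and the dimension variables are illustrative):

# Assumed two-step denominator pattern: denom first receives the
# per-column max, then reduce_exp with minus=True folds it into a
# negative log-sum-exp via the _reduce_minus kernel.
reduce_max(acts, denom, V + 1, B * T * U, False, stream)  # denom <- per-column max
reduce_exp(acts, denom, V + 1, B * T * U, True, stream)   # denom <- -logsumexp per column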
Examples
>>> from numba import cuda
>>> B, T, U, V = 2, 8, 4, 29
>>> acts = torch.randn(B, T, U, V + 1, device="cuda").flatten()
>>> denom = torch.zeros(B, T, U, V + 1, device="cuda").flatten()
>>> stream = cuda.stream()  # Numba CUDA stream
>>> reduce_exp(acts, denom, V + 1, B * T * U, False, stream)