espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_exp
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.reduce.reduce_exp(acts: Tensor, denom, rows: int, cols: int, minus: bool, stream)
Helper method to call the Warp Reduction Kernel to perform exp reduction.
This function reduces the input activation matrix along the vocabulary dimension by exponentiating each element and reducing with addition, using CUDA warp-level primitives for efficient parallel execution.
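Conceptually, each (b, t, u) position owns one length-(V+1) column of acts, and the kernel collapses that column into a single value. A minimal pure-PyTorch sketch of this "exp then add" semantics (illustrative only; the per-column max subtraction mirrors standard log-sum-exp stabilization and is an assumption about the kernel's internals, not a guarantee):

import torch

def reduce_exp_reference(acts: torch.Tensor, rows: int, cols: int) -> torch.Tensor:
    # acts: flattened [cols * rows]; one length-`rows` column per (b, t, u).
    mat = acts.view(cols, rows)
    col_max = mat.max(dim=1, keepdim=True).values  # stabilize before exp (assumed)
    return torch.exp(mat - col_max).sum(dim=1)     # one reduced value per column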
NOTE
Warp-level reduction is most efficient when the reduced dimension is a power of two (2^K).
References
- Warp Primitives [https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/]
- Parameters:
- acts – Flattened activation matrix of shape [B * T * U * (V+1)].
- denom – Flattened output matrix of shape [B * T * U * (V+1)]. Data will be overwritten with the reduction results.
- rows – Vocabulary size including the blank token, i.e. V+1. Represents the number of threads per block.
- cols – Flattened size of the activation matrix without the vocabulary dimension (B * T * U). Represents the number of blocks per grid.
- minus – Bool flag selecting the reduction variant. If True, the _reduce_minus kernel is launched; otherwise, the _reduce_rows kernel is launched (see the call-pattern sketch after this list).
- stream – CUDA Stream for managing execution of kernels.
- Returns: Returns True upon successful execution of the reduction.
- Return type: bool
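In the GPU RNNT loss code this function is typically paired with reduce_max from the same module to build the log-softmax denominator. The sketch below shows that assumed call pattern (the exact call sites live in the GPU RNNT implementation, and the dimension variables are illustrative):

# Assumed two-step denominator pattern: denom first receives the
# per-column max, then reduce_exp with minus=True folds it into a
# negative log-sum-exp via the _reduce_minus kernel.
reduce_max(acts, denom, V + 1, B * T * U, False, stream)  # denom <- per-column max
reduce_exp(acts, denom, V + 1, B * T * U, True, stream)   # denom <- -logsumexp per column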
Examples
>>> from numba import cuda
>>> B, T, U, V = 2, 8, 4, 29
>>> acts = torch.randn(B, T, U, V + 1, device="cuda").flatten()
>>> denom = torch.zeros(B, T, U, V + 1, device="cuda").flatten()
>>> stream = cuda.stream()  # Numba CUDA stream
>>> reduce_exp(acts, denom, V + 1, B * T * U, False, stream)