espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_betas_kernel
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_betas_kernel(acts: Tensor, denom: Tensor, betas: Tensor, llBackward: Tensor, xlen: Tensor, ylen: Tensor, mlabels: Tensor, minibatch: int, maxT: int, maxU: int, alphabet_size: int, blank_: int)
Compute beta (backward variable) probabilities over the transduction step.
This kernel computes the backward variable probabilities (betas) for each sample in the minibatch during the transduction process. It leverages the log probabilities from the activation tensor and the denominator tensor to compute the betas, which are essential for backpropagation in RNN-T models.
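The backward recursion can be illustrated for a single sample with a CPU-only sketch. This is not the kernel itself: `reference_betas` and `logaddexp` are hypothetical helper names, the input is assumed to be an already-normalized nested list of log-probabilities indexed `[t][u][v]`, and the recursion shown is the standard RNN-T backward pass rather than a transcription of the CUDA code.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) for plain floats."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def reference_betas(log_probs, labels, blank):
    """Standard RNN-T backward variables for one sample (illustrative only).

    log_probs: nested list [T][U][V] of normalized log-probabilities,
               where U = len(labels) + 1 (assumption of this sketch).
    labels:    target token ids, without any start token.
    blank:     index of the blank token.
    Returns (betas, log_likelihood), with log_likelihood = betas[0][0].
    """
    T = len(log_probs)
    U = len(labels) + 1
    betas = [[-math.inf] * U for _ in range(T)]

    # Terminal cell: only the final blank emission remains.
    betas[T - 1][U - 1] = log_probs[T - 1][U - 1][blank]

    # Last label column: only blank transitions (advance in time).
    for t in range(T - 2, -1, -1):
        betas[t][U - 1] = betas[t + 1][U - 1] + log_probs[t][U - 1][blank]

    # Last time row: only label transitions (advance in the target).
    for u in range(U - 2, -1, -1):
        betas[T - 1][u] = betas[T - 1][u + 1] + log_probs[T - 1][u][labels[u]]

    # Interior cells: combine the blank and the label transition.
    for t in range(T - 2, -1, -1):
        for u in range(U - 2, -1, -1):
            via_blank = betas[t + 1][u] + log_probs[t][u][blank]
            via_label = betas[t][u + 1] + log_probs[t][u][labels[u]]
            betas[t][u] = logaddexp(via_blank, via_label)

    return betas, betas[0][0]
```

In this sketch, `betas[0][0]` plays the role that `llBackward[b]` plays for sample `b` in the kernel.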
- Parameters:
- acts – Tensor of shape [B, T, U, V+1] flattened. Represents the logprobs activation tensor.
- denom – Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor across the entire vocabulary.
- betas – Zero tensor of shape [B, T, U]. Will be updated inside the kernel with the backward variable probabilities.
- llBackward – Zero tensor of shape [B]. Represents the log-likelihood of the backward pass. Returned as the backward pass loss that is reduced by the optimizer.
- xlen – Vector of length B which contains the actual acoustic sequence lengths in the padded activation tensor.
- ylen – Vector of length B which contains the actual target sequence lengths in the padded activation tensor.
- mlabels – Matrix of shape [B, U+1] (the +1 is due to the <SOS> token, usually the RNNT blank). The matrix contains the padded target transcription that must be predicted.
- minibatch – Int representing the batch size.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- blank_ – Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab.
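Because `acts`, `denom`, and `betas` are passed flattened, each `(b, t, u)` or `(b, t, u, v)` position maps to a single offset. The sketch below shows the usual row-major arithmetic for contiguous tensors. The helper names `cell_offset`, `acts_offset`, and `log_prob` are hypothetical, and the way `log_prob` combines the two tensors assumes that `denom` stores the negative log-sum-exp over the vocabulary, which is not stated on this page.

```python
def cell_offset(b, t, u, maxT, maxU):
    """Offset of lattice cell (b, t, u) in a flattened [B, T, U] tensor."""
    return (b * maxT + t) * maxU + u


def acts_offset(b, t, u, v, maxT, maxU, alphabet_size):
    """Offset of entry (b, t, u, v) in a flattened [B, T, U, V+1] tensor."""
    return cell_offset(b, t, u, maxT, maxU) * alphabet_size + v


def log_prob(acts, denom, b, t, u, v, maxT, maxU, alphabet_size):
    """Normalized log-probability of token v at cell (b, t, u).

    Assumption of this sketch: denom holds -logsumexp(acts[cell, :]),
    so adding it to the raw activation normalizes the value.
    """
    cell = cell_offset(b, t, u, maxT, maxU)
    return denom[cell] + acts[cell * alphabet_size + v]
```

For example, with `maxT=3`, `maxU=2`, and `alphabet_size=4`, the last entry of a batch of two, `(b=1, t=2, u=1, v=3)`, sits at offset 47 of the 48-element flattened `acts`.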
Updates: The kernel updates the following inputs in place:
- betas: backward variable scores.
- llBackward: log-likelihood of backward variable.
Examples
To compute betas in a typical usage scenario, one would first prepare the required tensors and then invoke the kernel:
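```python
# A CUDA kernel needs a launch configuration in square brackets.
# Assumed here: one block per sample and one thread per target position.
compute_betas_kernel[minibatch, maxU](
    acts, denom, betas, llBackward, xlen, ylen,
    mlabels, minibatch, maxT, maxU, alphabet_size, blank_,
)
```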
NOTE
This kernel is intended to be executed on a CUDA-enabled GPU and requires appropriate configuration for thread and block sizes.