espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_alphas_kernel
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_alphas_kernel(acts: Tensor, denom: Tensor, alphas: Tensor, llForward: Tensor, xlen: Tensor, ylen: Tensor, mlabels: Tensor, minibatch: int, maxT: int, maxU: int, alphabet_size: int, blank_: int)
Compute alpha (forward variable) probabilities over the transduction step.
This kernel computes the forward probabilities for the RNNT (Recurrent Neural Network Transducer) model. It updates the alphas tensor, which holds the forward variable scores, and the llForward tensor, which holds the log-likelihood of the forward pass.
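For orientation, the standard RNNT forward recursion in log space is written out here. This is a sketch that assumes the kernel follows the usual transducer formulation. The symbols are notation introduced for illustration: $\varnothing$ is the blank token, $y_u$ is the $u$-th target label, and $T$ and $U$ are the per-sample lattice sizes (the acoustic length and the target length plus one).

```latex
\begin{aligned}
\alpha(0,0) &= 0,\\
\alpha(t,0) &= \alpha(t-1,0) + \log p(\varnothing \mid t-1, 0),\\
\alpha(0,u) &= \alpha(0,u-1) + \log p(y_u \mid 0, u-1),\\
\alpha(t,u) &= \log\!\Bigl( e^{\alpha(t-1,u) + \log p(\varnothing \mid t-1,u)}
               + e^{\alpha(t,u-1) + \log p(y_u \mid t,u-1)} \Bigr),\\
\text{llForward} &= \alpha(T-1,U-1) + \log p(\varnothing \mid T-1,U-1).
\end{aligned}
```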
- Parameters:
- acts – Tensor of shape [B, T, U, V+1] flattened. Represents the logprobs activation tensor.
- denom – Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor across the entire vocabulary.
- alphas – Zero tensor of shape [B, T, U]. Will be updated inside the kernel with the forward variable probabilities.
- llForward – Zero tensor of shape [B]. Represents the log-likelihood of the forward pass. Returned as the forward pass loss that is reduced by the optimizer.
- xlen – Vector of length B which contains the actual acoustic sequence lengths in the padded activation tensor.
- ylen – Vector of length B which contains the actual target sequence lengths in the padded activation tensor.
- mlabels – Matrix of shape [B, U+1] (+1 here is due to the <SOS> token, usually the RNNT blank). The matrix contains the padded target transcription that must be predicted.
- minibatch – Int representing the batch size.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- blank_ – Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab.
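Because acts is passed as a flattened [B, T, U, V+1] tensor, each element is addressed by a single offset. The following is a minimal sketch of that addressing, assuming a contiguous row-major layout. The helper name `flat_index` is hypothetical and is not part of the ESPnet API.

```python
def flat_index(b, t, u, v, maxT, maxU, alphabet_size):
    """Row-major offset of element [b, t, u, v] in a flattened [B, T, U, V+1] tensor.

    Hypothetical helper for illustration; assumes a contiguous row-major layout.
    """
    return ((b * maxT + t) * maxU + u) * alphabet_size + v


# Walking the indices in nested order should visit offsets 0, 1, 2, ... in sequence.
B, maxT, maxU, alphabet_size = 2, 3, 4, 5
offsets = [
    flat_index(b, t, u, v, maxT, maxU, alphabet_size)
    for b in range(B)
    for t in range(maxT)
    for u in range(maxU)
    for v in range(alphabet_size)
]
assert offsets == list(range(B * maxT * maxU * alphabet_size))
```

Dropping the final `* alphabet_size + v` term gives the corresponding offset into a flattened [B, T, U] tensor such as denom or alphas.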
Updates: The kernel updates the following inputs in place:
- alphas: forward variable scores.
- llForward: log-likelihood of forward variable.
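To make the two outputs concrete, here is a small pure-Python reference for a single sample. It is a sketch of the standard RNNT forward pass, not the kernel's actual code. The function names, the nested-list layout `log_probs[t][u][v]`, and the convention that `labels` holds only the target tokens (no start token) are assumptions made for illustration.

```python
import math


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def forward_reference(log_probs, labels, T, U, blank):
    """Reference forward pass for one sample (hypothetical helper).

    log_probs[t][u][v] holds normalized log-probabilities.
    labels holds the U - 1 target tokens, without a start token (assumption).
    Returns (alphas, ll): the T x U forward variables and the log-likelihood.
    """
    alphas = [[-math.inf] * U for _ in range(T)]
    alphas[0][0] = 0.0  # log(1): the empty alignment
    for t in range(T):
        for u in range(U):
            if t == 0 and u == 0:
                continue
            via_blank = -math.inf
            via_label = -math.inf
            if t > 0:  # arrive by emitting blank at (t - 1, u)
                via_blank = alphas[t - 1][u] + log_probs[t - 1][u][blank]
            if u > 0:  # arrive by emitting the u-th label at (t, u - 1)
                via_label = alphas[t][u - 1] + log_probs[t][u - 1][labels[u - 1]]
            alphas[t][u] = logaddexp(via_blank, via_label)
    # A final blank emission closes every alignment.
    ll = alphas[T - 1][U - 1] + log_probs[T - 1][U - 1][blank]
    return alphas, ll


# Uniform two-symbol vocabulary, two frames, one target label (U = 2).
T, U, blank = 2, 2, 0
lp = [[[math.log(0.5)] * 2 for _ in range(U)] for _ in range(T)]
alphas, ll = forward_reference(lp, labels=[1], T=T, U=U, blank=blank)
```

In this toy case there are two alignments of three emissions each, every emission having probability 0.5, so `ll` equals log(0.25).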
Examples
To compute forward variables for a batch of transcriptions, you can call the kernel as follows:
```python
compute_alphas_kernel[blocks_per_grid, threads_per_block](
    acts, denom, alphas, llForward, xlen, ylen, mlabels,
    minibatch, maxT, maxU, alphabet_size, blank_,
)
```
NOTE
This is a CUDA kernel: launch it from host code with an explicit launch configuration, as in the example above, and choose the grid and block dimensions based on the input tensor sizes.
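One plausible way to derive the launch dimensions is sketched below. It assumes the kernel assigns one block per batch element and one thread per target position, which should be checked against the kernel source before relying on it. The helper `alphas_launch_config` is hypothetical and not part of ESPnet.

```python
# CUDA limits a block to 1024 threads on current GPUs.
MAX_THREADS_PER_BLOCK = 1024


def alphas_launch_config(minibatch, maxU):
    """Hypothetical helper returning (blocks_per_grid, threads_per_block).

    Assumes one block per sample and one thread per target position u.
    """
    if minibatch < 1 or maxU < 1:
        raise ValueError("minibatch and maxU must be positive")
    if maxU > MAX_THREADS_PER_BLOCK:
        raise ValueError(
            f"maxU={maxU} exceeds the {MAX_THREADS_PER_BLOCK}-thread block limit"
        )
    return minibatch, maxU


blocks_per_grid, threads_per_block = alphas_launch_config(minibatch=8, maxU=50)
```

Under this assumption, the maximum target length is bounded by the per-block thread limit, so very long transcriptions would need a different mapping.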