espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_betas_kernel

Less than 1 minute

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_betas_kernel

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_betas_kernel(acts: Tensor, denom: Tensor, betas: Tensor, llBackward: Tensor, xlen: Tensor, ylen: Tensor, mlabels: Tensor, minibatch: int, maxT: int, maxU: int, alphabet_size: int, blank_: int)

Compute beta (backward variable) probabilities over the transduction step.

Parameters:
- acts – Tensor of shape [B, T, U, V+1] flattened. Represents the logprobs activation tensor.
- denom – Tensor of shape [B, T, U] flattened. Represents the denominator of the logprobs activation tensor across entire vocabulary.
- betas – Zero tensor of shape [B, T, U]. Will be updated inside the kernel with the backward variable probabilities.
- llBackward – Zero tensor of shape [B]. Represents the log-likelihood of the backward pass. Returned as the backward pass loss that is reduced by the optimizer.
- xlen – Vector of length B which contains the actual acoustic sequence lengths in the padded activation tensor.
- ylen – Vector of length B which contains the actual target sequence lengths in the padded activation tensor.
- mlabels – Matrix of shape [B, U+1] (+1 here is due to <SOS> token
  - usually the RNNT blank). The matrix contains the padded target transcription that must be predicted.
- minibatch – Int representing the batch size.
- maxT – The maximum possible acoustic sequence length. Represents T in the logprobs tensor.
- maxU – The maximum possible target sequence length. Represents U in the logprobs tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- blank – Index of the RNNT blank token in the vocabulary. Generally the first or last token in the vocab.

Updates: : Kernel inplace updates the following inputs:

betas: backward variable scores.
llBackward: log-likelihood of backward variable.