espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel(acts: Tensor, denom: Tensor, sigma: float, betas: Tensor, llBackward: Tensor, xlen: Tensor, ylen: Tensor, mlabels: Tensor, minibatch: int, maxT: int, maxU: int, alphabet_size: int, blank_: int, big_blank_duration: Tensor, num_big_blanks: int)
Compute beta (backward variable) probabilities for multi-blank transducer loss.
This kernel computes the beta values, i.e. the backward variables of the multi-blank transducer model. The computation follows the multi-blank transducer paper (https://arxiv.org/pdf/2211.03541) and applies logit under-normalization.
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.acts
A tensor of shape [B, T, U, V + 1 + num_big_blanks] flattened, representing the logprobs activation tensor.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.denom
A tensor of shape [B, T, U] flattened, representing the denominator of the logprobs activation tensor across the entire vocabulary.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.sigma
Hyper-parameter for logit under-normalization technique for training multi-blank transducers.
- Type: float
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.betas
A zero tensor of shape [B, T, U] which will be updated with the backward variable probabilities.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.llBackward
A zero tensor of shape [B] that receives the log-likelihood of the backward pass. It is returned as the backward pass loss that is reduced by the optimizer.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.xlen
A vector of length B that contains the actual acoustic sequence lengths in the padded activation tensor.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.ylen
A vector of length B that contains the actual target sequence lengths in the padded activation tensor.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.mlabels
A matrix of shape [B, U+1] containing the padded target transcription that must be predicted.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.minibatch
An integer representing the batch size.
- Type: int
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.maxT
The maximum possible acoustic sequence length.
- Type: int
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.maxU
The maximum possible target sequence length.
- Type: int
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.alphabet_size
The vocabulary dimension V+1 (inclusive of RNNT blank).
- Type: int
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.blank_
The index of the RNNT standard blank token in the vocabulary.
- Type: int
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.big_blank_duration
A vector of supported big blank durations of the model.
- Type: torch.Tensor
espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.num_big_blanks
The number of big blanks in the model.
- Type: int
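Several of the tensors above are passed flattened. The following is a minimal sketch of how such buffers can be indexed, assuming a row-major (C-contiguous) layout. The helper names `flat_index_3d` and `flat_index_4d` and the parameter `last_dim` are hypothetical and are not part of ESPnet; `last_dim` stands for the size of the last axis of acts.

```python
def flat_index_3d(b, t, u, maxT, maxU):
    """Index of element [b, t, u] in a flattened [B, maxT, maxU] buffer.

    Assumes row-major layout, as used for denom and betas in this sketch.
    """
    return (b * maxT + t) * maxU + u


def flat_index_4d(b, t, u, v, maxT, maxU, last_dim):
    """Index of element [b, t, u, v] in a flattened [B, maxT, maxU, last_dim] buffer.

    last_dim is a hypothetical name for the size of the final (vocabulary) axis.
    """
    return flat_index_3d(b, t, u, maxT, maxU) * last_dim + v


# Example: position of acts[1, 2, 3, 2] with maxT=4, maxU=5, last_dim=6.
idx = flat_index_4d(1, 2, 3, 2, maxT=4, maxU=5, last_dim=6)
```

With this layout, all entries for one utterance b occupy one contiguous slice of length maxT * maxU (times last_dim for acts), so a per-utterance offset can be computed once and reused.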
Updates: This kernel updates the following inputs in place:
- betas: backward variable scores.
- llBackward: log-likelihood of the backward variable.
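To make the two updated quantities concrete, here is a pure-Python reference sketch of a backward recursion for a single utterance. It is an illustration under stated assumptions, not the kernel's implementation: the log-probabilities are assumed to be already normalized and given as a nested list `logp[t][u][v]`, big blank `i` is assumed to sit at vocabulary index `blank - 1 - i`, sigma is subtracted on every transition, and a big blank of duration greater than 1 is assumed to be allowed to end the sequence exactly at the last frame. The function name `multiblank_betas_reference` is hypothetical.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def multiblank_betas_reference(logp, labels, blank, big_blank_durations, sigma):
    """Backward variables for one utterance (sketch, see assumptions above).

    logp[t][u][v]: normalized log-probabilities, with T = len(logp) frames
    and U = len(labels) + 1 target positions.
    Returns (betas, log_likelihood) where log_likelihood = betas[0][0].
    """
    T = len(logp)
    U = len(labels) + 1
    betas = [[NEG_INF] * U for _ in range(T)]

    for t in range(T - 1, -1, -1):
        for u in range(U - 1, -1, -1):
            if t == T - 1 and u == U - 1:
                # Terminal cell: emit the final standard blank.
                total = logp[t][u][blank] - sigma
            else:
                total = NEG_INF
                if t + 1 < T:  # standard blank: advance one frame
                    total = log_sum_exp(
                        total, betas[t + 1][u] + logp[t][u][blank] - sigma
                    )
                if u + 1 < U:  # emit the next label: advance one target position
                    total = log_sum_exp(
                        total, betas[t][u + 1] + logp[t][u][labels[u]] - sigma
                    )
            for i, d in enumerate(big_blank_durations):
                k = blank - 1 - i  # ASSUMED index of big blank i
                if t + d < T:  # big blank: advance d frames
                    total = log_sum_exp(total, betas[t + d][u] + logp[t][u][k] - sigma)
                elif t + d == T and u == U - 1 and d != 1:
                    # ASSUMED: a big blank may end the sequence at the last frame.
                    total = log_sum_exp(total, logp[t][u][k] - sigma)
            betas[t][u] = total

    return betas, betas[0][0]
```

In this sketch, `betas` plays the role of one utterance's slice of the betas tensor, and the returned log-likelihood corresponds to the value written to llBackward for that utterance.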
Examples
>>> compute_multiblank_betas_kernel(acts, denom, sigma, betas, llBackward,
... xlen, ylen, mlabels, minibatch,
... maxT, maxU, alphabet_size, blank_,
... big_blank_duration, num_big_blanks)
NOTE
This kernel must be launched with B blocks, where each block has U threads.
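One plausible reading of this launch shape is that each block handles one utterance and each thread handles one target position u, with the threads stepping together over anti-diagonals t + u = n of the lattice. The sketch below only illustrates such a wavefront schedule in plain Python; it is an assumption about the scheduling, not the kernel's code, and `wavefront_schedule` is a hypothetical helper.

```python
def wavefront_schedule(T, U):
    """Return a list of steps; each step lists the (t, u) cells that can be
    computed together because they lie on the same anti-diagonal t + u = n.

    Steps run from the last anti-diagonal (n = T + U - 2) down to n = 0, so
    every cell's successors (t + 1, u) and (t, u + 1) are already finished.
    """
    schedule = []
    for n in range(T + U - 2, -1, -1):
        # "Thread" u works on frame t = n - u if that frame exists.
        cells = [(n - u, u) for u in range(U) if 0 <= n - u < T]
        schedule.append(cells)
    return schedule


# Example: a lattice with T=3 frames and U=2 target positions.
steps = wavefront_schedule(3, 2)
```

Because a big-blank transition moves from (t, u) to (t + d, u) with d >= 1, its successor also lies on a later anti-diagonal, so the same schedule covers those transitions in this illustration.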