espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.compute_multiblank_betas_kernel(acts: Tensor, denom: Tensor, sigma: float, betas: Tensor, llBackward: Tensor, xlen: Tensor, ylen: Tensor, mlabels: Tensor, minibatch: int, maxT: int, maxU: int, alphabet_size: int, blank_: int, big_blank_duration: Tensor, num_big_blanks: int)

Compute beta (backward variable) probabilities for multi-blank transducer loss.

This function computes the beta values, i.e. the backward variables of the multi-blank transducer lattice. The computation follows the multi-blank transducer paper (https://arxiv.org/pdf/2211.03541) and applies its logit under-normalization technique, controlled by sigma.

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.acts

A flattened tensor of shape [B, T, U, V + 1 + num_big_blanks] holding the log-probability activations.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.denom

A tensor of shape [B, T, U] flattened, representing the denominator of the logprobs activation tensor across the entire vocabulary.
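Because acts and denom are passed flattened, a lattice cell (b, t, u) has to be mapped to a linear offset before it can be read. The following is a minimal sketch of that mapping, assuming a contiguous row-major layout. The helper names denom_index and acts_index are hypothetical (they are not part of this module), and last_dim stands for the size of the last axis of acts.

```python
def denom_index(b, t, u, maxT, maxU):
    """Offset of cell (b, t, u) in a row-major [B, maxT, maxU] buffer."""
    return (b * maxT + t) * maxU + u


def acts_index(b, t, u, v, maxT, maxU, last_dim):
    """Offset of entry (b, t, u, v) in a row-major [B, maxT, maxU, last_dim] buffer."""
    return denom_index(b, t, u, maxT, maxU) * last_dim + v


# Example: batch item 1, frame 2, label position 3, vocabulary entry 4,
# in buffers with maxT=5, maxU=4 and a last axis of size 6.
cell = denom_index(1, 2, 3, maxT=5, maxU=4)                  # (1*5 + 2)*4 + 3
entry = acts_index(1, 2, 3, 4, maxT=5, maxU=4, last_dim=6)   # cell*6 + 4
```

The same (b, t, u) offset also addresses betas, since it shares the [B, T, U] shape.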

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.sigma

Hyper-parameter for logit under-normalization technique for training multi-blank transducers.

Type: float

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.betas

A zero tensor of shape [B, T, U] which will be updated with the backward variable probabilities.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.llBackward

A zero-initialized tensor of shape [B] that receives the log-likelihood of the backward pass for each sample. This value is returned as the backward pass loss that is reduced by the optimizer.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.xlen

A vector of length B that contains the actual acoustic sequence lengths in the padded activation tensor.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.ylen

A vector of length B that contains the actual target sequence lengths in the padded activation tensor.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.mlabels

A matrix of shape [B, U+1] containing the padded target transcription that must be predicted.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.minibatch

An integer representing the batch size.

Type: int

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.maxT

The maximum possible acoustic sequence length.

Type: int

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.maxU

The maximum possible target sequence length.

Type: int

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.alphabet_size

The vocabulary dimension V+1 (inclusive of RNNT blank).

Type: int

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.blank_

The index of the RNNT standard blank token in the vocabulary.

Type: int

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.big_blank_duration

A vector of supported big blank durations of the model.

Type: torch.Tensor

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.num_big_blanks

The number of big blanks in the model.

Type: int

Updates: This kernel updates the following inputs in place:

betas: backward variable scores.
llBackward: log-likelihood of the backward variable.
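To make the roles of betas, llBackward, sigma and big_blank_duration more concrete, here is a CPU-only sketch of a multi-blank backward recursion for a single utterance. It is an illustration under stated assumptions, not the kernel's code: it takes already-normalized log-probabilities as nested lists, the function name multiblank_betas_reference is hypothetical, the vocabulary id of each big blank is passed explicitly through the hypothetical big_blank_ids argument, and big-blank durations are assumed to be at least 2.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def multiblank_betas_reference(logp, labels, blank, big_blank_ids, durations, sigma):
    """Backward variables for ONE utterance, computed on the CPU.

    logp[t][u][v]  normalized log-probabilities, T x U x vocabulary
    labels         the U - 1 target label ids
    big_blank_ids  hypothetical: vocabulary id of each big blank
    durations      frames consumed by each big blank (assumed >= 2)
    sigma          under-normalization constant subtracted from every log-prob
    """
    T, U = len(logp), len(logp[0])
    betas = [[-math.inf] * U for _ in range(T)]
    for t in range(T - 1, -1, -1):
        for u in range(U - 1, -1, -1):
            terms = []
            # Standard blank: advance one frame, or terminate from the last cell.
            if t + 1 < T:
                terms.append(betas[t + 1][u] + logp[t][u][blank] - sigma)
            elif u == U - 1:
                terms.append(logp[t][u][blank] - sigma)
            # Big blanks: advance d frames, or terminate when landing exactly on T.
            for k, d in zip(big_blank_ids, durations):
                if t + d < T:
                    terms.append(betas[t + d][u] + logp[t][u][k] - sigma)
                elif t + d == T and u == U - 1:
                    terms.append(logp[t][u][k] - sigma)
            # Label emission: stay on the frame, advance one label.
            if u + 1 < U:
                terms.append(betas[t][u + 1] + logp[t][u][labels[u]] - sigma)
            betas[t][u] = log_sum_exp(terms)
    # betas[0][0] plays the role that llBackward[b] plays for batch item b.
    return betas, betas[0][0]


# Two frames, no target labels, uniform distribution over 3 symbols:
# id 0 = a label, id 1 = big blank of duration 2, id 2 = standard blank.
uniform = [[[math.log(1 / 3)] * 3] for _ in range(2)]
betas, ll = multiblank_betas_reference(
    uniform, [], blank=2, big_blank_ids=[1], durations=[2], sigma=0.0
)
```

In this toy case there are two complete paths, two standard blanks or a single big blank, so ll is the log of the sum of their probabilities. A non-zero sigma lowers the score of each path by sigma per emitted symbol, so the two-blank path is penalized twice and the single big blank once.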

Examples

>>> # Assuming a Numba CUDA kernel: launch B blocks of U threads each.
>>> compute_multiblank_betas_kernel[minibatch, maxU](
...     acts, denom, sigma, betas, llBackward,
...     xlen, ylen, mlabels, minibatch,
...     maxT, maxU, alphabet_size, blank_,
...     big_blank_duration, num_big_blanks)

NOTE

This kernel must be launched with B blocks, where each block has U threads.
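One way to see why a block of U threads suffices for one utterance is to sketch an anti-diagonal (wavefront) schedule, in which thread u fills cell (n - u, u) at step n. This page does not spell out the kernel's schedule, so treat the following as an assumption for illustration; wavefront_schedule is a hypothetical helper and runs on the CPU.

```python
def wavefront_schedule(T, U):
    """List, per step, the (t, u) cells that U cooperating threads would fill.

    Thread u handles cell (n - u, u) at step n, so each step covers one
    anti-diagonal of the T x U lattice, walking from the last cell to (0, 0).
    """
    steps = []
    for n in range(T + U - 2, -1, -1):
        cells = [(n - u, u) for u in range(U) if 0 <= n - u < T]
        steps.append(cells)
    return steps


schedule = wavefront_schedule(T=3, U=2)
# schedule[0] == [(2, 1)]          last cell of the lattice
# schedule[1] == [(2, 0), (1, 1)]
# schedule[2] == [(1, 0), (0, 1)]
# schedule[3] == [(0, 0)]          the cell that yields the log-likelihood
```

Every cell a backward step reads, (t + 1, u), (t, u + 1) or (t + d, u) for a big blank of duration d, lies on an anti-diagonal with a larger t + u, so it has already been filled in an earlier step. Threads only need to synchronize between steps.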