espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.logp

espnet2.asr.transducer.rnnt_multi_blank.utils.cuda_utils.gpu_rnnt_kernel.logp(denom: Tensor, acts: Tensor, maxT: int, maxU: int, alphabet_size: int, mb: int, t: int, u: int, v: int)

Compute the log probability of a vocabulary token by summing its activation and the corresponding denominator.

This function calculates the log probability at a given batch index, acoustic timestep, target timestep, and vocabulary token index. It adds the activation at that position to the corresponding denominator, which normalizes it, and returns the result.
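The normalization described above can be illustrated on the host in plain Python. This is a minimal sketch, not the library's kernel. It assumes that `denom` stores the negative log-sum-exp of the raw activations over the vocabulary axis, so that adding it to an activation yields a log-softmax value. That assumption, the helper names `logp_reference` and `neg_logsumexp`, and the sample values are hypothetical and are not taken from this page.

```python
import math


def logp_reference(denom, acts, mb, t, u, v):
    """Host-side illustration: activation plus its normalizing term."""
    return acts[mb][t][u][v] + denom[mb][t][u]


def neg_logsumexp(row):
    """Numerically stable -log(sum(exp(x))) over one vocabulary row."""
    peak = max(row)
    return -(peak + math.log(sum(math.exp(x - peak) for x in row)))


# Hypothetical raw activations with shape [B, T, U, V+1] = [1, 2, 2, 3].
acts = [[[[0.5, -1.0, 2.0], [0.1, 0.2, 0.3]],
         [[1.5, 0.0, -0.5], [0.0, 0.0, 0.0]]]]

# ASSUMPTION: denom holds the negative log-sum-exp over the vocabulary axis.
denom = [[[neg_logsumexp(row) for row in time_step] for time_step in batch]
         for batch in acts]

# Under that assumption, the probabilities at a fixed (mb, t, u) sum to 1.
total = sum(math.exp(logp_reference(denom, acts, 0, 1, 0, v)) for v in range(3))
```

If the assumption holds, exponentiating the returned values over all `v` at one `(mb, t, u)` position gives a proper probability distribution, which is why the two terms are added rather than subtracted.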

Parameters:
- denom – Tensor of shape [B, T, U] flattened. Holds the normalization denominator of the log-probability activations, computed across the entire vocabulary.
- acts – Tensor of shape [B, T, U, V+1] flattened. Represents the log probabilities activation tensor.
- maxT – The maximum possible acoustic sequence length. Represents T in the log probabilities tensor.
- maxU – The maximum possible target sequence length. Represents U in the log probabilities tensor.
- alphabet_size – The vocabulary dimension V+1 (inclusive of RNNT blank).
- mb – Batch indexer.
- t – Acoustic sequence timestep indexer.
- u – Target sequence timestep indexer.
- v – Vocabulary token indexer.
Returns: The sum of log probabilities at the specified indices, calculated as logprobs[mb, t, u, v] + denom[mb, t, u].
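Because `denom` and `acts` are passed flattened, the indices `mb`, `t`, `u`, and `v` have to be mapped to flat offsets. The sketch below shows one way to do that in plain Python. It assumes row-major (C-order) flattening of the [B, T, U] and [B, T, U, V+1] tensors. The name `flat_logp` is hypothetical, and the code is a host-side illustration rather than the kernel's actual implementation.

```python
def flat_logp(denom, acts, maxT, maxU, alphabet_size, mb, t, u, v):
    """Illustrative flat-index lookup; ASSUMES row-major flattening."""
    # Offset of (mb, t, u) in the flattened [B, T, U] tensor.
    col = (mb * maxT + t) * maxU + u
    # Each (mb, t, u) cell owns alphabet_size consecutive entries in acts.
    return denom[col] + acts[col * alphabet_size + v]


# The tensors from the Examples section, flattened in row-major order.
denom = [1.0, 2.0, 3.0, 4.0]                      # shape [1, 2, 2]
acts = [0.5, 0.5, 0.3, 0.7, 0.6, 0.4, 0.2, 0.8]   # shape [1, 2, 2, 2]

# Equivalent to acts[0, 1, 0, 1] + denom[0, 1, 0] = 0.4 + 3.0
value = flat_logp(denom, acts, maxT=2, maxU=2, alphabet_size=2,
                  mb=0, t=1, u=0, v=1)
```

Under the row-major assumption, `maxT`, `maxU`, and `alphabet_size` are needed only as strides, which would explain why the function takes them alongside the indices.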

Examples

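>>> import torch
>>> # denom and acts are passed as flattened 1-D tensors.
>>> denom = torch.tensor([[[1.0, 2.0], [3.0, 4.0]]]).flatten()
>>> acts = torch.tensor([[[[0.5, 0.5], [0.3, 0.7]], [[0.6, 0.4], [0.2, 0.8]]]]).flatten()
>>> # Illustrative call; logp is meant to run inside CUDA kernel code (see NOTE).
>>> result = logp(denom, acts, maxT=2, maxU=2, alphabet_size=2, mb=0, t=0, u=0, v=1)
>>> # result is acts[0, 0, 0, 1] + denom[0, 0, 0] = 0.5 + 1.0 = 1.5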

NOTE

This function is decorated with @cuda.jit for JIT compilation and is intended to be called from within CUDA kernels rather than from host code.