espnet2.gan_tts.jets.alignments.viterbi_decode
espnet2.gan_tts.jets.alignments.viterbi_decode(log_p_attn, text_lengths, feats_lengths)
Extract duration from an attention probability matrix.
This function extracts the most likely duration of each text token from a batched log-probability attention matrix, using the Viterbi algorithm to find the best monotonic alignment between text tokens and feature frames.
- Parameters:
- log_p_attn (Tensor) – Batched log probability of attention matrix (B, T_feats, T_text).
- text_lengths (Tensor) – Text length tensor (B,).
- feats_lengths (Tensor) – Feature length tensor (B,).
- Returns:
- Tensor – Batched token durations extracted from log_p_attn (B, T_text).
- Tensor – Binarization loss tensor.
- Return type: Tuple[Tensor, Tensor]
Examples
>>> import torch
>>> from espnet2.gan_tts.jets.alignments import viterbi_decode
>>> log_p_attn = torch.randn(2, 50, 10).log_softmax(dim=-1)  # (B, T_feats, T_text)
>>> text_lengths = torch.tensor([10, 8])  # valid text length of each utterance
>>> feats_lengths = torch.tensor([50, 40])  # valid feature length of each utterance
>>> durations, bin_loss = viterbi_decode(log_p_attn, text_lengths, feats_lengths)
>>> print(durations.shape)  # (B, T_text)
torch.Size([2, 10])
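Because the Viterbi path assigns every feature frame to exactly one token, the durations of each utterance sum to its feature length. A quick sanity check on the result above (a sketch, assuming the example inputs):

>>> for b in range(2):
...     assert durations[b, :text_lengths[b]].sum() == feats_lengths[b]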
NOTE
The Viterbi algorithm finds the most likely sequence of hidden states in a hidden Markov model; here, the hidden states correspond to text tokens, and the most likely monotonic path through the attention matrix yields the alignment between text and acoustic features.
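For intuition, below is a minimal single-utterance sketch of the monotonic Viterbi dynamic program. `monotonic_viterbi` is a hypothetical name used only for illustration, not ESPnet's API; the library's implementation is batched and additionally returns the binarization loss.

```python
import torch

def monotonic_viterbi(log_p_attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of monotonic Viterbi duration extraction.

    Args:
        log_p_attn: Log attention probabilities (T_feats, T_text).
            Assumes T_feats >= T_text so every token gets >= 1 frame.

    Returns:
        Token durations (T_text,) summing to T_feats.
    """
    T_feats, T_text = log_p_attn.shape
    neg_inf = float("-inf")
    # dp[t, j]: best cumulative score aligning frames 0..t, ending on token j.
    dp = torch.full((T_feats, T_text), neg_inf)
    dp[0, 0] = log_p_attn[0, 0]
    came_from_prev = torch.zeros((T_feats, T_text), dtype=torch.bool)
    for t in range(1, T_feats):
        # Token index can advance at most one per frame, so j <= t.
        for j in range(min(t + 1, T_text)):
            stay = dp[t - 1, j]
            move = dp[t - 1, j - 1] if j > 0 else neg_inf
            if move > stay:
                dp[t, j] = move + log_p_attn[t, j]
                came_from_prev[t, j] = True  # advanced from token j - 1
            else:
                dp[t, j] = stay + log_p_attn[t, j]
    # Backtrack from the last frame/token, counting frames per token.
    durations = torch.zeros(T_text, dtype=torch.long)
    j = T_text - 1
    for t in range(T_feats - 1, 0, -1):
        durations[j] += 1
        if came_from_prev[t, j]:
            j -= 1
    durations[j] += 1  # frame 0 belongs to whichever token the path ends on
    return durations
```

The key design choice is the monotonic transition (stay on the same token or advance by exactly one), which guarantees that each token's frames form a contiguous span and that the durations cover every frame exactly once.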