espnet2.gan_tts.jets.alignments.viterbi_decode
espnet2.gan_tts.jets.alignments.viterbi_decode(log_p_attn, text_lengths, feats_lengths)
Extract duration from an attention probability matrix.
This function extracts the most likely duration of each text token from a batched log-probability attention matrix, using the Viterbi algorithm to find the best monotonic alignment between text tokens and feature frames.
- Parameters:
- log_p_attn (Tensor) – Batched log probability of attention matrix (B, T_feats, T_text).
- text_lengths (Tensor) – Text length tensor (B,).
- feats_lengths (Tensor) – Feature length tensor (B,).
- Returns:
- Tensor – Batched token durations extracted from log_p_attn (B, T_text).
- Tensor – Binarization loss tensor.
- Return type: Tuple[Tensor, Tensor]
Examples
>>> import torch
>>> from espnet2.gan_tts.jets.alignments import viterbi_decode
>>> log_p_attn = torch.randn(2, 50, 10).log_softmax(dim=-1)  # (B, T_feats, T_text)
>>> text_lengths = torch.tensor([10, 8])  # valid text length of each utterance
>>> feats_lengths = torch.tensor([50, 40])  # valid feature length of each utterance
>>> durations, bin_loss = viterbi_decode(log_p_attn, text_lengths, feats_lengths)
>>> print(durations.shape)  # (B, T_text)
torch.Size([2, 10])
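Because the Viterbi path assigns every feature frame to exactly one token, the durations of each utterance sum to its feature length. A quick sanity check on the result above (a sketch, assuming the example inputs):

>>> for b in range(2):
...     assert durations[b, :text_lengths[b]].sum() == feats_lengths[b]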
NOTE
The Viterbi algorithm finds the most likely sequence of hidden states in a hidden Markov model; here, the hidden states correspond to text tokens, and the most likely monotonic path through the attention matrix yields the alignment between text and acoustic features.
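For intuition, below is a minimal single-utterance sketch of the monotonic Viterbi dynamic program. `monotonic_viterbi` is a hypothetical name used only for illustration, not ESPnet's API; the library's implementation is batched and additionally returns the binarization loss.

```python
import torch

def monotonic_viterbi(log_p_attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of monotonic Viterbi duration extraction.

    Args:
        log_p_attn: Log attention probabilities (T_feats, T_text).
            Assumes T_feats >= T_text so every token gets >= 1 frame.

    Returns:
        Token durations (T_text,) summing to T_feats.
    """
    T_feats, T_text = log_p_attn.shape
    neg_inf = float("-inf")
    # dp[t, j]: best cumulative score aligning frames 0..t, ending on token j.
    dp = torch.full((T_feats, T_text), neg_inf)
    dp[0, 0] = log_p_attn[0, 0]
    came_from_prev = torch.zeros((T_feats, T_text), dtype=torch.bool)
    for t in range(1, T_feats):
        # Token index can advance at most one per frame, so j <= t.
        for j in range(min(t + 1, T_text)):
            stay = dp[t - 1, j]
            move = dp[t - 1, j - 1] if j > 0 else neg_inf
            if move > stay:
                dp[t, j] = move + log_p_attn[t, j]
                came_from_prev[t, j] = True  # advanced from token j - 1
            else:
                dp[t, j] = stay + log_p_attn[t, j]
    # Backtrack from the last frame/token, counting frames per token.
    durations = torch.zeros(T_text, dtype=torch.long)
    j = T_text - 1
    for t in range(T_feats - 1, 0, -1):
        durations[j] += 1
        if came_from_prev[t, j]:
            j -= 1
    durations[j] += 1  # frame 0 belongs to whichever token the path ends on
    return durations
```

The key design choice is the monotonic transition (stay on the same token or advance by exactly one), which guarantees that each token's frames form a contiguous span and that the durations cover every frame exactly once.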