espnet2.svs.feats_extract.score_feats_extract.expand_to_frame
espnet2.svs.feats_extract.score_feats_extract.expand_to_frame(expand_len, len_size, label, midi, duration)
Expand the phone-level features to frame-level features.
This function takes the expansion lengths for each phone and replicates the corresponding labels, midi, and duration features to create a frame-level representation. It returns the expanded sequences along with their lengths.
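The replication step described above can be sketched for a single sequence with `torch.repeat_interleave`. This is an illustrative sketch only, not necessarily how ESPnet implements it; the helper name `expand_one` is hypothetical and is not part of the ESPnet API.

```python
import torch


def expand_one(values: torch.Tensor, expand_len: list) -> torch.Tensor:
    """Hypothetical helper: repeat each phone-level value by its frame count."""
    repeats = torch.tensor(expand_len, dtype=torch.long)
    # Only the first len(expand_len) entries are treated as valid phones;
    # anything after that is assumed to be padding and is dropped.
    return torch.repeat_interleave(values[: len(expand_len)], repeats)


label = torch.tensor([1, 2])
frames = expand_one(label, [2, 3])
print(frames.tolist())  # [1, 1, 2, 2, 2]
```

The same call applies unchanged to the midi and duration rows, since all three features share one set of expansion lengths.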
- Parameters:
- expand_len (List[List[int]]) – A list of lists containing the number of frames each phone should be expanded to for each sample in the batch.
- len_size (List[int]) – A list containing the sizes of the phone sequences for each sample in the batch.
- label (torch.Tensor) – A tensor of shape (Batch, Max_Phone_Length) containing the phone labels.
- midi (torch.Tensor) – A tensor of shape (Batch, Max_Phone_Length) containing the MIDI note values corresponding to the phones.
- duration (torch.Tensor) – A tensor of shape (Batch, Max_Phone_Length) containing the duration values corresponding to the phones.
- Returns: Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]: A tuple containing:
- Expanded label tensor of shape (Batch, Expanded_Length).
- Lengths of the expanded labels tensor.
- Expanded midi tensor of shape (Batch, Expanded_Length).
- Lengths of the expanded midi tensor.
- Expanded duration tensor of shape (Batch, Expanded_Length).
- Lengths of the expanded duration tensor.
Examples
>>> import torch
>>> from espnet2.svs.feats_extract.score_feats_extract import expand_to_frame
>>> expand_len = [[2, 3], [1, 4]]
>>> len_size = [2, 2]
>>> label = torch.tensor([[1, 2], [3, 4]])
>>> midi = torch.tensor([[60, 62], [64, 65]])
>>> duration = torch.tensor([[100, 200], [300, 400]])
>>> result = expand_to_frame(expand_len, len_size, label, midi, duration)
>>> print(result)
(tensor([[1, 1, 2, 2, 2],
         [3, 4, 4, 4, 4]]),
 tensor([5, 5]),
 tensor([[60, 60, 62, 62, 62],
         [64, 65, 65, 65, 65]]),
 tensor([5, 5]),
 tensor([[100, 100, 200, 200, 200],
         [300, 400, 400, 400, 400]]),
 tensor([5, 5]))
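
In the example above, both samples expand to five frames, so no padding is visible. When the totals differ, the rows must be brought to a common length before they can form one (Batch, Expanded_Length) tensor. The sketch below assumes zero padding up to the longest expanded sequence; the padding value and the helper name `expand_batch` are assumptions made for illustration and are not taken from the ESPnet source.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def expand_batch(values, expand_len, len_size, pad_value=0):
    """Hypothetical helper: expand each sample, then pad to a common length."""
    expanded = []
    for i, n_phones in enumerate(len_size):
        # Use only the valid phones of sample i and their frame counts.
        repeats = torch.tensor(expand_len[i][:n_phones], dtype=torch.long)
        expanded.append(torch.repeat_interleave(values[i, :n_phones], repeats))
    # Record the true frame count of each sample before padding.
    lengths = torch.tensor([seq.numel() for seq in expanded])
    padded = pad_sequence(expanded, batch_first=True, padding_value=pad_value)
    return padded, lengths


midi = torch.tensor([[60, 62], [64, 0]])  # second sample has one valid phone
padded, lengths = expand_batch(midi, [[2, 3], [4]], [2, 1])
print(padded.tolist())   # [[60, 60, 62, 62, 62], [64, 64, 64, 64, 0]]
print(lengths.tolist())  # [5, 4]
```

Returning the lengths alongside the padded tensor lets downstream code mask out the padded frames.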