espnet2.hubert.espnet_model.TorchAudioHubertPretrainModel
class espnet2.hubert.espnet_model.TorchAudioHubertPretrainModel(vocab_size: int, token_list: Tuple[str, ...] | List[str], frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, preencoder: AbsPreEncoder | None, encoder: AbsEncoder, ignore_id: int = -1, **kwargs)
Bases: AbsESPnetModel
TorchAudio Hubert Pretrain model
Initialize internal Module state, shared by both nn.Module and ScriptModule.
collect_feats(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, **kwargs) → Dict[str, Tensor]
encode(speech: Tensor, speech_lengths: Tensor, y_pad: Tensor, y_pad_length: Tensor) → Tuple[Tensor, Tensor]
Frontend + Encoder. Note that this method is used by asr_inference.py.
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- y_pad – (Batch, Length, …)
- y_pad_length – (Batch,)
forward(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Frontend + Encoder + Calc loss
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- text – (Batch, Length)
- text_lengths – (Batch,)
- kwargs – "utt_id" is among the inputs.
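
The shape contract above (padded `(Batch, Length)` arrays paired with `(Batch,)` length vectors) can be illustrated with a minimal sketch. It uses plain Python lists so that it runs without PyTorch. The helper `pad_batch` and the toy data are hypothetical and not part of ESPnet. Padding `speech` with `0.0` and `text` with `ignore_id` (`-1`, the default in the constructor signature) is an assumption about how batches are typically collated, not something this page states.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def pad_batch(sequences: Sequence[Sequence[T]], pad_value: T) -> Tuple[List[List[T]], List[int]]:
    """Right-pad variable-length sequences to a common length.

    Returns the padded (Batch, Length) nested list and the (Batch,)
    list holding each sequence's original, unpadded length.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Two toy utterances with different numbers of samples (hypothetical data).
waveforms = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.4]]
# Matching toy label sequences (hypothetical label IDs).
labels = [[3, 7, 7], [1]]

# Assumption: speech is padded with 0.0 and text with ignore_id (-1).
speech, speech_lengths = pad_batch(waveforms, pad_value=0.0)
text, text_lengths = pad_batch(labels, pad_value=-1)
```

Here `speech` and `text` both have shape `(2, Length)`, and `speech_lengths` and `text_lengths` have shape `(2,)`. With PyTorch available, each list could be wrapped with `torch.tensor(...)` before being passed as the corresponding argument of `forward`.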
