espnet2.s2t.espnet_model.ESPnetS2TModel
class espnet2.s2t.espnet_model.ESPnetS2TModel(vocab_size: int, token_list: Tuple[str, ...] | List[str], frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, preencoder: AbsPreEncoder | None, encoder: AbsEncoder, postencoder: AbsPostEncoder | None, decoder: AbsDecoder | None, ctc: CTC, ctc_weight: float = 0.5, interctc_weight: float = 0.0, ignore_id: int = -1, lsm_weight: float = 0.0, length_normalized_loss: bool = False, report_cer: bool = True, report_wer: bool = True, sym_space: str = '<space>', sym_blank: str = '<blank>', sym_sos: str = '<sos>', sym_eos: str = '<eos>', sym_sop: str = '<sop>', sym_na: str = '<na>', extract_feats_in_collect_stats: bool = True)
Bases: AbsESPnetModel
CTC-attention hybrid Encoder-Decoder model
Initialize internal Module state, shared by both nn.Module and ScriptModule.
collect_feats(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, text_prev: Tensor, text_prev_lengths: Tensor, text_ctc: Tensor, text_ctc_lengths: Tensor, **kwargs) → Dict[str, Tensor]
encode(speech: Tensor, speech_lengths: Tensor) → Tuple[Tensor, Tensor]
Frontend + Encoder. Note that this method is used by s2t_inference.py.
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
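The (Batch, Length, …) convention means variable-length utterances are padded to the longest one in the batch, with speech_lengths recording each true length. A torch-free sketch of that batching convention (pad_batch is a hypothetical helper, not part of ESPnet):

```python
def pad_batch(utterances, pad_value=0.0):
    """Stack variable-length feature sequences into a (Batch, Length, Dim)
    nested list plus a (Batch,) list of true lengths, mirroring what
    encode() expects as `speech` and `speech_lengths`.

    utterances: list of [T_i x D] feature matrices (lists of lists).
    """
    lengths = [len(u) for u in utterances]
    max_len = max(lengths)
    dim = len(utterances[0][0])
    padded = [
        # Append pad frames so every utterance reaches max_len.
        u + [[pad_value] * dim for _ in range(max_len - len(u))]
        for u in utterances
    ]
    return padded, lengths
```

In the real model these would be torch tensors, but the shape relationship is the same: padded frames beyond each entry of speech_lengths are ignored downstream.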
forced_align(speech, speech_lengths, text, text_lengths)
Calculate frame-wise alignment from CTC probabilities.
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- text – (Batch, Length)
- text_lengths – (Batch,)
- Returns: Tuple(Tensor, Tensor):
  - Label for each time step in the alignment path computed using forced alignment.
  - Log probability scores of the labels for each time step.
- Return type: alignments
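To make the Returns concrete: CTC forced alignment runs Viterbi decoding over the blank-interleaved label sequence, assigning one label (token or blank) to every frame. A minimal stdlib-only sketch of that computation (not ESPnet's implementation, which operates on torch tensors):

```python
import math

NEG_INF = -float("inf")

def ctc_forced_align(log_probs, labels, blank=0):
    """Viterbi alignment of `labels` against T frames of CTC log-probs.

    log_probs: T x V nested list, log_probs[t][c] = log P(token c at frame t)
    labels:    target token ids, without blanks
    Returns (alignment, scores): a per-frame label id and its log-prob.
    """
    T = len(log_probs)
    # Extended label sequence with blanks interleaved: blank y1 blank y2 ... blank
    ext = [blank]
    for y in labels:
        ext += [y, blank]
    S = len(ext)

    alpha = [[NEG_INF] * S for _ in range(T)]  # best path score ending in state s
    back = [[0] * S for _ in range(T)]         # backpointers for path recovery
    alpha[0][0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [(alpha[t - 1][s], s)]          # stay in the same state
            if s >= 1:
                cands.append((alpha[t - 1][s - 1], s - 1))  # advance one state
            # Skip transition, allowed only between distinct non-blank labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((alpha[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            alpha[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path may end on the final label or the final blank.
    s = S - 1
    if S > 1 and alpha[T - 1][S - 2] > alpha[T - 1][S - 1]:
        s = S - 2
    states = [0] * T
    for t in range(T - 1, -1, -1):
        states[t] = s
        s = back[t][s]
    alignment = [ext[i] for i in states]
    scores = [log_probs[t][alignment[t]] for t in range(T)]
    return alignment, scores
```

For a four-frame utterance whose CTC posteriors favor blank, token, token, blank, aligning the single-token text yields the frame labels [blank, token, token, blank] together with their log-probabilities, matching the (labels, scores) pair described above.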
forward(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, text_prev: Tensor, text_prev_lengths: Tensor, text_ctc: Tensor, text_ctc_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Frontend + Encoder + Decoder + Calc loss
- Parameters:
- speech – (Batch, Length, …)
- speech_lengths – (Batch,)
- text – (Batch, Length)
- text_lengths – (Batch,)
- text_prev – (Batch, Length)
- text_prev_lengths – (Batch,)
- text_ctc – (Batch, Length)
- text_ctc_lengths – (Batch,)
- kwargs – "utt_id" is among the input.
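The scalar loss returned by forward blends the CTC and attention branches according to ctc_weight, with interctc_weight optionally mixing in intermediate-layer CTC. A hedged sketch of that weighting, assuming the usual ESPnet hybrid-loss convention (combine_losses is an illustrative helper, not a method of this class):

```python
def combine_losses(loss_att, loss_ctc, loss_interctc=None,
                   ctc_weight=0.5, interctc_weight=0.0):
    """Weighted hybrid CTC/attention loss.

    With interctc_weight > 0, the CTC term is itself a blend of the
    final-layer CTC loss and the intermediate-layer CTC loss before
    being combined with the attention (decoder) loss.
    """
    if interctc_weight > 0.0 and loss_interctc is not None:
        loss_ctc = ((1.0 - interctc_weight) * loss_ctc
                    + interctc_weight * loss_interctc)
    if ctc_weight == 0.0:   # pure attention model
        return loss_att
    if ctc_weight == 1.0:   # pure CTC model
        return loss_ctc
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att
```

With the default ctc_weight = 0.5 the two branches contribute equally; setting it to 0.0 or 1.0 degenerates to a pure attention or pure CTC objective.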
