espnet2.lid.espnet_model.ESPnetLIDModel
class espnet2.lid.espnet_model.ESPnetLIDModel(frontend: AbsFrontend | None, specaug: AbsSpecAug | None, normalize: AbsNormalize | None, encoder: AbsEncoder | None, pooling: AbsPooling | None, projector: AbsProjector | None, loss: AbsLoss | None, extract_feats_in_collect_stats: bool | None = None)
Bases: AbsESPnetModel
ESPnet LID model
Supports language identification (LID) and language embedding extraction.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
collect_feats(speech: Tensor, speech_lengths: Tensor, lid_labels: Tensor | None = None, **kwargs) → Dict[str, Tensor]
encode_frame(feats: Tensor) → Tensor
extract_feats(speech: Tensor, speech_lengths: Tensor) → Tuple[Tensor, Tensor]
forward(speech: Tensor, speech_lengths: Tensor, lid_labels: Tensor | None = None, extract_embd: bool = False, **kwargs) → Tuple[Tensor, Tensor] | Tuple[Tensor, Dict[str, Tensor], Tensor] | Tensor
Forward pass of the LID model.
Processes raw speech through frontend, encoder, pooling, and loss modules.
- Parameters:
  - speech – Input waveform tensor (batch_size, num_samples)
  - speech_lengths – Lengths of each input in the batch (batch_size,)
  - lid_labels – Ground truth language labels (batch_size,)
  - extract_embd – If True, return language embeddings and predictions (inference mode)
- Returns:
  - If extract_embd=True (inference mode): Tuple(lang_embd, pred_lids)
  - If training: Tuple(loss, stats_dict, batch_weight)
- Return type: Tuple[Tensor, Tensor] | Tuple[Tensor, Dict[str, Tensor], Tensor] | Tensor
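The two return shapes of forward() can be illustrated with a small stand-in. The sketch below does not use ESPnet itself: `ToyLIDModel` is a hypothetical class written only to mimic the documented call contract. Its internals (a linear "encoder" on raw samples, masked mean pooling, argmax predictions, a single "loss" entry in the stats dict) are assumptions made for illustration and may differ from what the configured frontend, encoder, pooling, projector, and loss modules actually do. It also shows one way to build `speech` and `speech_lengths` from variable-length waveforms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence


class ToyLIDModel(nn.Module):
    """Hypothetical stand-in mimicking only the documented forward() contract."""

    def __init__(self, num_langs: int = 3, embd_dim: int = 8):
        super().__init__()
        # Assumption: a per-sample linear layer plays the role of the encoder.
        self.encoder = nn.Linear(1, embd_dim)
        self.classifier = nn.Linear(embd_dim, num_langs)

    def forward(self, speech, speech_lengths, lid_labels=None, extract_embd=False):
        frames = self.encoder(speech.unsqueeze(-1))  # (B, T, D)
        # Masked mean pooling: padded positions are zeroed before averaging.
        positions = torch.arange(speech.size(1))[None, :]
        mask = (positions < speech_lengths[:, None]).unsqueeze(-1).to(frames.dtype)
        lang_embd = (frames * mask).sum(dim=1) / speech_lengths[:, None].to(frames.dtype)
        logits = self.classifier(lang_embd)
        if extract_embd:
            # Inference mode: Tuple(lang_embd, pred_lids)
            return lang_embd, logits.argmax(dim=-1)
        # Training mode: Tuple(loss, stats_dict, batch_weight)
        loss = F.cross_entropy(logits, lid_labels)
        stats = {"loss": loss.detach()}
        batch_weight = torch.tensor(speech.size(0))
        return loss, stats, batch_weight


torch.manual_seed(0)

# Two waveforms of different length, zero-padded into one batch.
waves = [torch.randn(160), torch.randn(120)]
speech = pad_sequence(waves, batch_first=True)               # (batch_size, num_samples)
speech_lengths = torch.tensor([w.numel() for w in waves])    # (batch_size,)
lid_labels = torch.tensor([0, 2])                            # (batch_size,)

model = ToyLIDModel()

# Training-style call: three return values.
loss, stats, batch_weight = model(speech, speech_lengths, lid_labels)

# Inference-style call: embeddings and predictions, no labels needed.
model.eval()
with torch.no_grad():
    lang_embd, pred_lids = model(speech, speech_lengths, extract_embd=True)
    # Embedding of the shorter utterance computed without any padding.
    solo_embd, _ = model(waves[1][None, :], speech_lengths[1:], extract_embd=True)
```

Because the caller has to unpack a different tuple in each mode, it helps to keep the `extract_embd` flag explicit at every call site. In this toy version, passing `speech_lengths` also keeps the padded samples out of the pooled embedding.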
project_lang_embd(utt_level_feat: Tensor) → Tensor
