espnet2.asr.encoder.hubert_encoder.FairseqHubertEncoder
class espnet2.asr.encoder.hubert_encoder.FairseqHubertEncoder(input_size: int, hubert_url: str = './', hubert_dir_path: str = './', output_size: int = 256, normalize_before: bool = False, freeze_finetune_updates: int = 0, dropout_rate: float = 0.0, activation_dropout: float = 0.1, attention_dropout: float = 0.0, mask_length: int = 10, mask_prob: float = 0.75, mask_selection: str = 'static', mask_other: int = 0, apply_mask: bool = True, mask_channel_length: int = 64, mask_channel_prob: float = 0.5, mask_channel_other: int = 0, mask_channel_selection: str = 'static', layerdrop: float = 0.1, feature_grad_mult: float = 0.0)
Bases: AbsEncoder
FairSeq Hubert encoder module, used for loading pretrained weights and fine-tuning.
- Parameters:
- input_size – input dimension
- hubert_url – URL to the Hubert pretrained model
- hubert_dir_path – directory to download the Hubert pretrained model
- output_size – dimension of attention
- normalize_before – whether to use layer_norm before the first block
- freeze_finetune_updates – number of steps during which all layers except the output layer are frozen before tuning the whole model (necessary to prevent overfitting)
- dropout_rate – dropout rate
- activation_dropout – dropout rate in the activation function
- attention_dropout – dropout rate in attention
Hubert-specific args: please refer to https://github.com/pytorch/fairseq/blob/master/fairseq/models/hubert/hubert.py
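The freeze_finetune_updates behavior can be illustrated with a minimal sketch: the pretrained encoder body stays frozen for the first N updates and only afterwards is the whole model trained. The class and method names below are illustrative, not taken from the ESPnet source.

```python
class FreezeSchedule:
    """Sketch of the freeze_finetune_updates logic (illustrative only):
    the encoder body is frozen for the first `freeze_finetune_updates`
    optimizer steps, then the whole model becomes trainable."""

    def __init__(self, freeze_finetune_updates: int):
        self.freeze_finetune_updates = freeze_finetune_updates
        self.num_updates = 0

    def step(self) -> bool:
        """Return True if the encoder body should be trained on this update."""
        trainable = self.num_updates >= self.freeze_finetune_updates
        self.num_updates += 1
        return trainable


# With freeze_finetune_updates=2, the body is frozen for the first 2 updates.
schedule = FreezeSchedule(freeze_finetune_updates=2)
flags = [schedule.step() for _ in range(4)]
print(flags)  # [False, False, True, True]
```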
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(xs_pad: Tensor, ilens: Tensor, prev_states: Tensor = None) → Tuple[Tensor, Tensor, Tensor | None]
Forward Hubert ASR Encoder.
- Parameters:
- xs_pad – input tensor (B, L, D)
- ilens – input lengths (B)
- prev_states – not used for now
- Returns: position-embedded tensor and mask
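Since xs_pad is a padded batch, the encoder must derive a mask from ilens marking which positions are real frames. A minimal sketch of that derivation (pure Python, not the ESPnet implementation, which operates on tensors) looks like:

```python
def make_pad_mask(ilens, max_len=None):
    """Sketch of padding-mask construction (illustrative, not ESPnet's code):
    returns a (B, L) boolean matrix that is True at padded positions,
    i.e. at time steps beyond each utterance's length in `ilens`."""
    if max_len is None:
        max_len = max(ilens)
    return [[t >= length for t in range(max_len)] for length in ilens]


# Two utterances of lengths 2 and 3, padded to L=3:
print(make_pad_mask([2, 3]))
# [[False, False, True], [False, False, False]]
```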
output_size() β int
reload_pretrained_parameters()
