espnet2.asr.postencoder.hugging_face_transformers_postencoder.HuggingFaceTransformersPostEncoder
class espnet2.asr.postencoder.hugging_face_transformers_postencoder.HuggingFaceTransformersPostEncoder(input_size: int, model_name_or_path: str, length_adaptor_n_layers: int = 0, lang_token_id: int = -1)
Bases: AbsPostEncoder
Hugging Face Transformers PostEncoder.
This class wraps a Hugging Face transformer model for use as a post-encoder in speech recognition tasks. It loads the pretrained transformer and processes the input through an optional convolutional length adaptor, a projection to the transformer's hidden size, and the transformer itself; a language token embedding can optionally be prepended to the sequence.
transformer
The transformer model used for encoding.
- Type: torch.nn.Module
lang_token_embed
The language token embedding if a language token ID is provided.
- Type: torch.Tensor
pretrained_params
A deep copy of the transformer model’s initial state dictionary for later reloading.
- Type: dict
length_adaptor
A sequence of layers for adapting the input length.
- Type: torch.nn.Sequential
length_adaptor_ratio
The ratio by which the input length is reduced when passing through the length adaptor.
- Type: int
use_inputs_embeds
Indicates whether to use input embeddings.
- Type: bool
extend_attention_mask
Indicates whether to extend the attention mask for certain model types.
- Type: bool
Parameters:
- input_size (int) – The size of the input features.
- model_name_or_path (str) – The name or path of the pretrained model to use.
- length_adaptor_n_layers (int , optional) – The number of layers in the length adaptor. Defaults to 0.
- lang_token_id (int , optional) – The ID of the language token to use. Defaults to -1.
Raises: ImportError – If the transformers library is not available.
Example
>>> import torch
>>> post_encoder = HuggingFaceTransformersPostEncoder(
... input_size=256,
... model_name_or_path='bert-base-uncased',
... length_adaptor_n_layers=2,
... lang_token_id=101
... )
>>> input_tensor = torch.randn(10, 20, 256) # (batch_size, seq_len, features)
>>> input_lengths = torch.tensor([20] * 10) # Lengths of each sequence
>>> output, output_lengths = post_encoder(input_tensor, input_lengths)
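With length_adaptor_n_layers=2 and a language token configured as above, the output lengths are shorter than the input lengths. As a hedged back-of-the-envelope check (assuming each adaptor layer halves the length with floor division and the language token adds one position), the 20-frame sequences above would come out at roughly 6 frames:
>>> ratio = 2 ** 2       # assumed: one halving per length-adaptor layer
>>> 20 // ratio + 1      # assumed: +1 position for the prepended language token
6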
NOTE: Ensure that the transformers library is installed to use this class. You can install it via pip install transformers or by following the ESPnet installation instructions.
Initialize the module.
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Forward pass of the post-encoder.

The input features are passed through the length adaptor (when length_adaptor_n_layers > 0), projected to the transformer's hidden size by linear_in, optionally prepended with the language token embedding, and then encoded by the wrapped transformer.

- Parameters:
- input (torch.Tensor) – Input features of shape (batch_size, seq_len, input_size).
- input_lengths (torch.Tensor) – Length of each sequence, of shape (batch_size,).
- Returns: The encoded output tensor and the corresponding output lengths (adjusted for the length adaptor and language token).
- Return type: Tuple[torch.Tensor, torch.Tensor]
Example
>>> import torch
>>> post_encoder = HuggingFaceTransformersPostEncoder(
... input_size=128,
... model_name_or_path='bert-base-uncased'
... )
>>> input_tensor = torch.rand(2, 128, 128)  # (batch_size, seq_len, input_size)
>>> input_lengths = torch.tensor([128, 128])
>>> output, output_lengths = post_encoder.forward(input_tensor, input_lengths)
NOTE: The forward method expects input tensors of shape (batch_size, seq_len, input_size) and input_lengths of shape (batch_size,).
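Continuing the example above (no length adaptor and no language token by default), the time dimension should pass through unchanged and only the feature dimension should change to the transformer's hidden size. A small sanity check under that assumption:
>>> out, out_lens = post_encoder(torch.rand(2, 50, 128), torch.tensor([50, 42]))
>>> out.shape[-1] == post_encoder.output_size()
True
>>> out_lens.tolist()
[50, 42]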
output_size() → int
Get the output size of the transformer model.
This method retrieves the hidden size of the transformer model, which is defined in its configuration. The output size is crucial for downstream tasks where the model’s output needs to match the expected dimensions.
- Returns: The hidden size of the transformer model.
- Return type: int
Example
>>> post_encoder = HuggingFaceTransformersPostEncoder(
... input_size=256,
... model_name_or_path='bert-base-uncased'
... )
>>> post_encoder.output_size()  # BERT base has a hidden size of 768
768
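In practice, this value is used to size whatever consumes the post-encoder output. For instance, a downstream projection could be built as follows (the vocabulary size below is a made-up placeholder, not something defined by this class):
>>> import torch
>>> vocab_size = 500  # hypothetical target vocabulary size
>>> projection = torch.nn.Linear(post_encoder.output_size(), vocab_size)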
reload_pretrained_parameters()
Reloads the pretrained parameters of the Hugging Face Transformers model.
This method restores the parameters of the transformer model to their initial state as defined when the instance of HuggingFaceTransformersPostEncoder was created. This can be useful when you want to reset the model’s weights to the pretrained values after fine-tuning or any modification.
Example
>>> post_encoder = HuggingFaceTransformersPostEncoder(
... input_size=128,
... model_name_or_path='bert-base-uncased'
... )
>>> # Fine-tuning or modifying the model parameters...
>>> post_encoder.reload_pretrained_parameters() # Reloads pretrained params
NOTE: The pretrained parameters are stored when the instance is created, so calling this method restores the transformer to that stored pretrained state.
- Raises: RuntimeError – If the model is not properly initialized or the pretrained parameters are not available.
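Conceptually, the reload applies the stored copy of the initial state dictionary back to the wrapped transformer. A minimal sketch of that equivalence, assuming pretrained_params is restored with load_state_dict:
>>> _ = post_encoder.transformer.load_state_dict(post_encoder.pretrained_params)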