espnet2.asr.encoder.hugging_face_transformers_encoder.HuggingFaceTransformersEncoder
class espnet2.asr.encoder.hugging_face_transformers_encoder.HuggingFaceTransformersEncoder(input_size: int, model_name_or_path: str, lang_token_id: int = -1)
Bases: AbsEncoder
Hugging Face Transformers Encoder for Automatic Speech Recognition.
This class wraps a pre-trained model from the Hugging Face Transformers library to encode input token sequences for automatic speech recognition. It optionally prepends a language token to each sequence and builds an attention mask from the input lengths so that padding positions are ignored.
transformer
The underlying transformer model for encoding.
- Type: transformers.PreTrainedModel
pretrained_params
A copy of the model’s parameters for reloading purposes.
- Type: dict
lang_token_id
The token ID for the language token, if used.
- Type: int
Parameters:
- input_size (int) – The size of the input feature vector.
- model_name_or_path (str) – The model identifier from Hugging Face’s model hub or a local path to a model.
- lang_token_id (int, optional) – The token ID for the language token to prepend to inputs. Defaults to -1 (disabled).
Raises: ImportError – If the transformers library is not available.
###### Example
>>> import torch
>>> encoder = HuggingFaceTransformersEncoder(
... input_size=512,
... model_name_or_path='bert-base-uncased',
... lang_token_id=101
... )
>>> input_tensor = torch.randint(0, 1000, (3, 512))
>>> input_lengths = torch.tensor([512, 300, 250])
>>> output, lengths = encoder(input_tensor, input_lengths)
>>> print(output.shape) # Output shape will depend on the model
>>> print(lengths) # Adjusted input lengths after processing
######## NOTE
Ensure that the transformers library is installed before using this class. You can install it via pip install transformers.
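For orientation, the wrapping pattern such an encoder typically follows can be sketched as below. This is a simplified, hypothetical stand-in (the class name and constructor details are assumptions), not the ESPnet implementation itself:

>>> import copy
>>> import torch
>>> from transformers import AutoModel
>>> class MinimalTransformersEncoder(torch.nn.Module):
...     # Hypothetical, simplified stand-in for HuggingFaceTransformersEncoder.
...     def __init__(self, model_name_or_path: str, lang_token_id: int = -1):
...         super().__init__()
...         # Load the pretrained Hugging Face model used as the encoder body.
...         self.transformer = AutoModel.from_pretrained(model_name_or_path)
...         # Keep a copy of the initial weights for reload_pretrained_parameters().
...         self.pretrained_params = copy.deepcopy(self.transformer.state_dict())
...         self.lang_token_id = lang_token_id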
Initialize the module.
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Forward pass through the Hugging Face Transformers encoder.
This method processes the input tensor through the transformer model, optionally prepending a language token if lang_token_id is specified. It also generates an attention mask based on the input lengths to ensure that padding tokens are ignored during the attention computation.
- Parameters:
- input (torch.Tensor) – The input tensor of shape (batch_size, sequence_length) containing token IDs.
- input_lengths (torch.Tensor) – A tensor of shape (batch_size,) containing the lengths of each input sequence.
- Returns: A tuple where the first element is the output tensor of shape (batch_size, sequence_length, hidden_size) from the transformer, and the second element is the updated input_lengths tensor.
- Return type: Tuple[torch.Tensor, torch.Tensor]
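The masking and language-token handling described above can be illustrated with a minimal sketch; the helper name and its details below are assumptions for illustration, not the actual ESPnet code:

>>> import torch
>>> def prepend_and_mask(input, input_lengths, lang_token_id=-1):
...     # Hypothetical helper showing the idea behind forward()'s preprocessing.
...     if lang_token_id != -1:
...         # Prepend the language token to every sequence in the batch.
...         lang_col = input.new_full((input.size(0), 1), lang_token_id)
...         input = torch.cat([lang_col, input], dim=1)
...         input_lengths = input_lengths + 1
...     # Attention mask: 1 for real tokens, 0 for padding positions.
...     positions = torch.arange(input.size(1))
...     attention_mask = (positions.unsqueeze(0) < input_lengths.unsqueeze(1)).long()
...     return input, attention_mask, input_lengths
>>> ids = torch.tensor([[5, 6, 7, 0], [8, 9, 0, 0]])
>>> lens = torch.tensor([3, 2])
>>> padded, mask, new_lens = prepend_and_mask(ids, lens, lang_token_id=2)
>>> mask
tensor([[1, 1, 1, 1, 0],
        [1, 1, 1, 0, 0]])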
###### Example
>>> import torch
>>> encoder = HuggingFaceTransformersEncoder(768, "bert-base-uncased")
>>> input_tensor = torch.tensor([[101, 2009, 2003, 102]])
>>> input_lengths = torch.tensor([4])
>>> output, lengths = encoder.forward(input_tensor, input_lengths)
>>> print(output.shape) # Output: torch.Size([1, 4, 768])
>>> print(lengths) # Output: tensor([4])
######## NOTE
Ensure that the transformers library is installed before using this method. If the library is not available, an ImportError will be raised during the initialization of the encoder.
- Raises: ImportError – If the transformers library is not available.
output_size() → int
Get the output size of the transformer model.
This method retrieves the hidden size of the transformer model, which corresponds to the dimensionality of the output embeddings produced by the model.
- Returns: The hidden size of the transformer model.
- Return type: int
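The value corresponds to the hidden_size entry of the underlying model's configuration. For instance, assuming network access to fetch the configuration, the same number can be read directly from Hugging Face:

>>> from transformers import AutoConfig
>>> AutoConfig.from_pretrained("bert-base-uncased").hidden_size
768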
###### Example
>>> encoder = HuggingFaceTransformersEncoder(
... input_size=128,
... model_name_or_path='bert-base-uncased'
... )
>>> encoder.output_size()
768 # For BERT base model
######## NOTE
The output size may vary depending on the specific transformer model architecture being used.
reload_pretrained_parameters()
Reload the pretrained parameters of the transformer model.
This method restores the model’s parameters to their initial state that was saved during the initialization of the encoder. It can be useful for resetting the model’s weights to the pretrained values after fine-tuning or training on a specific task.
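The underlying pattern is simply to snapshot the state dict at construction time and load it back later; a minimal sketch under that assumption (using a plain Linear layer rather than the actual ESPnet code) follows:

>>> import copy
>>> import torch
>>> linear = torch.nn.Linear(4, 4)
>>> saved = copy.deepcopy(linear.state_dict())   # snapshot taken at "init" time
>>> _ = torch.nn.init.zeros_(linear.weight)      # simulate fine-tuning changes
>>> _ = linear.load_state_dict(saved)            # restore the original weights
>>> torch.equal(linear.state_dict()["weight"], saved["weight"])
True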
###### Example
>>> encoder = HuggingFaceTransformersEncoder(input_size=512,
... model_name_or_path='bert-base-uncased')
>>> # After some training or modifications
>>> encoder.reload_pretrained_parameters()
Pretrained Transformers model parameters reloaded!
######## NOTE
Ensure that the transformers library is installed to utilize this functionality. Calling this method restores the model parameters to the pretrained state captured at initialization, regardless of any training or modification performed since.
- Raises: ValueError – If the pretrained parameters are not set or if there is a mismatch between the model architecture and the loaded parameters.