espnet2.lm.huggingface_pretrained_opt_lm.HuggingfaceOPTModel
class espnet2.lm.huggingface_pretrained_opt_lm.HuggingfaceOPTModel(vocab_size: int, opt_name: str)
Bases: AbsLM
HuggingfaceOPTModel is a language model that utilizes the OPT architecture
from Hugging Face’s Transformers library. This model inherits from the abstract class AbsLM and implements methods for forward propagation and scoring of tokens.
pretrained_params
A copy of the pretrained parameters from the OPT model excluding the embedding layer weights.
- Type: dict
decoder
The decoder model based on the OPT architecture.
- Type: OPTModel
lm_head
A linear layer that projects the hidden states to the vocabulary size.
- Type: nn.Linear
Parameters:
- vocab_size (int) – The size of the vocabulary for the language model.
- opt_name (str) – The name of the pretrained OPT model to load.
Raises: Exception – If the transformers library is not properly installed.
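For orientation, a wrapper with the attributes documented above can be assembled roughly as follows. This is a hypothetical sketch built on transformers.OPTModel; the actual ESPnet constructor may differ in detail:

import copy
import torch.nn as nn
from transformers import OPTModel

class HuggingfaceOPTModelSketch(nn.Module):
    def __init__(self, vocab_size: int, opt_name: str):
        super().__init__()
        # Load the pretrained decoder and resize its input embeddings
        # to the task vocabulary.
        self.decoder = OPTModel.from_pretrained(opt_name)
        self.decoder.resize_token_embeddings(vocab_size)
        # Keep a copy of the pretrained weights, excluding the (now
        # task-specific) embedding layer, for later reloading.
        self.pretrained_params = copy.deepcopy(
            {k: v for k, v in self.decoder.state_dict().items()
             if "embed_tokens" not in k}
        )
        # Project decoder hidden states to vocabulary logits.
        self.lm_head = nn.Linear(self.decoder.config.hidden_size, vocab_size)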
############# Examples
Initializing the model
model = HuggingfaceOPTModel(vocab_size=50265, opt_name='facebook/opt-1.3b')
Forward pass
input_ids = torch.tensor([[1, 2, 3], [4, 5, 0]])  # Example input
logits, _ = model(input_ids, None)
Scoring a new token
y = torch.tensor([6])  # New token
state = None  # No previous state
scores, new_state = model.score(y, state, input_ids)
Batch scoring
ys = torch.tensor([[1, 2], [3, 4]])  # Prefix tokens
states = [None, None]  # States for each prefix
xs = torch.randn(2, 10, 768)  # Encoder features of shape (n_batch, xlen, n_feat)
batch_scores, new_states = model.batch_score(ys, states, xs)
Reloading pretrained parameters
model.reload_pretrained_parameters()
batch_score(ys: Tensor, states: List[Any], xs: Tensor) → Tuple[Tensor, List[Any]]
Score new token batch.
This method computes the scores for a batch of new tokens based on the provided input sequences and their corresponding states. It leverages the decoder from the OPT model to perform batch decoding and returns the scores for the next token along with updated states.
- Parameters:
- ys (torch.Tensor) – torch.int64 prefix tokens of shape (n_batch, ylen).
- states (List[Any]) – Scorer states for prefix tokens, with each state corresponding to a batch entry.
- xs (torch.Tensor) – The encoder feature that generates ys, with shape (n_batch, xlen, n_feat).
- Returns: Tuple containing:
  - batchified scores for the next token, with shape (n_batch, vocab_size).
  - next state list for ys.
- Return type: Tuple[torch.Tensor, List[Any]]
############# Examples
>>> model = HuggingfaceOPTModel(vocab_size=50265, opt_name='facebook/opt-1.3b')
>>> ys = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> states = [None, None]
>>> xs = torch.randn(2, 10, 768) # Example encoder features
>>> scores, next_states = model.batch_score(ys, states, xs)
>>> print(scores.shape) # Should output: torch.Size([2, 50265])
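The returned scores rank all vocabulary continuations for each hypothesis, so a beam-search step can select candidates directly. An illustrative continuation of the example above:

>>> topk_scores, topk_ids = scores.topk(5, dim=-1)
>>> topk_ids.shape
torch.Size([2, 5])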
forward(input: Tensor, hidden: None) → Tuple[Tensor, None]
Compute LM logits from input token sequences.
This method takes input token IDs and computes the corresponding logits using the model’s decoder and a linear layer (lm_head).
- Parameters:
- input (torch.Tensor) – Input token IDs of shape (batch, len).
- hidden (None) – Placeholder for future use; currently not used.
- Returns: A tuple containing:
  - logits (torch.Tensor): Output logits of shape (batch, len, vocab_size).
  - None: Placeholder for the hidden state; currently returns None.
- Return type: Tuple[torch.Tensor, None]
############# Examples
>>> model = HuggingfaceOPTModel(vocab_size=1000, opt_name='facebook/opt-125m')
>>> input_tensor = torch.randint(0, 1000, (2, 10)) # (batch, len)
>>> logits, _ = model.forward(input_tensor, None)
>>> logits.shape
torch.Size([2, 10, 1000])
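Although forward returns raw logits rather than a loss, a next-token cross-entropy loss can be derived from them in the usual way. A minimal sketch, assuming standard left-to-right target shifting (ESPnet's trainer may handle padding and shifting differently):

import torch.nn.functional as F

# logits: (batch, len, vocab_size); input_tensor: (batch, len)
# Position t predicts token t + 1, so shift logits and targets by one.
shift_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
shift_targets = input_tensor[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_targets)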
reload_pretrained_parameters()
Reloads the pretrained parameters into the decoder of the model.
This method updates the decoder’s state dictionary with the pretrained parameters stored in self.pretrained_params. It allows for the model to be re-initialized with pretrained weights without the need to re-instantiate the model itself.
- Raises: RuntimeError – If the state_dict cannot be loaded into the decoder.
############# Examples
>>> model = HuggingfaceOPTModel(vocab_size=50265, opt_name="facebook/opt-1.3b")
>>> model.reload_pretrained_parameters()
INFO:root:Pretrained OPT model parameters reloaded!
NOTE
Ensure that the pretrained parameters have been correctly set before calling this method.
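Functionally, the reload amounts to a non-strict state-dict load of the stored parameters. A hypothetical sketch (the exact implementation may differ):

import logging

def reload_pretrained_parameters(model):
    # strict=False because pretrained_params excludes the embedding
    # weights, which keep their current task-specific values.
    model.decoder.load_state_dict(model.pretrained_params, strict=False)
    logging.info("Pretrained OPT model parameters reloaded!")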
score(y: Tensor, state: Any, x: Tensor) → Tuple[Tensor, Any]
Score new token.
This method computes the scores for the next token based on the provided prefix tokens and the current state. It leverages the underlying OPT model to perform this scoring, returning the softmax scores and the updated state for subsequent predictions.
- Parameters:
- y (torch.Tensor) – 1D torch.int64 prefix tokens.
- state – Scorer state for prefix tokens.
- x (torch.Tensor) – The encoder feature that generates y.
- Returns: A tuple containing:
  - torch.float32 scores for the next token, with shape (vocab_size,).
  - Next state for y, which can be used in subsequent calls.
- Return type: Tuple[torch.Tensor, Any]
############# Examples
>>> model = HuggingfaceOPTModel(vocab_size=50265, opt_name="facebook/opt-1.3b")
>>> prefix_tokens = torch.tensor([1, 2, 3]) # Example prefix tokens
>>> state = None # Initial state
>>> encoder_features = torch.randn(1, 3, 768) # Example encoder features
>>> scores, new_state = model.score(prefix_tokens, state, encoder_features)
>>> print(scores.shape) # Should print: torch.Size([50265])
NOTE
Ensure that the state passed is compatible with the model’s caching mechanism for optimal performance.
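For reference, incremental scoring with Hugging Face decoders is typically implemented by threading the attention cache through past_key_values. A minimal sketch of that pattern (an assumption about the internals, not necessarily ESPnet's exact code):

import torch

def score_sketch(decoder, lm_head, y, state):
    # y: 1D int64 prefix tokens; state: Hugging Face past_key_values or None.
    if state is None:
        input_ids = y.unsqueeze(0)       # first call: feed the whole prefix
    else:
        input_ids = y[-1:].unsqueeze(0)  # later calls: feed only the new token
    out = decoder(input_ids=input_ids, past_key_values=state, use_cache=True)
    logits = lm_head(out.last_hidden_state[:, -1, :])  # (1, vocab_size)
    return torch.log_softmax(logits, dim=-1).squeeze(0), out.past_key_values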