espnet2.lm.espnet_model.ESPnetLanguageModel
class espnet2.lm.espnet_model.ESPnetLanguageModel(lm: AbsLM, vocab_size: int, ignore_id: int = 0)
Bases: AbsESPnetModel
The ESPnetLanguageModel class implements a language model using the ESPnet
framework. It is designed to compute the negative log likelihood of sequences of text, enabling the training and evaluation of language models.
lm
An instance of a language model that follows the AbsLM interface.
- Type: AbsLM
sos
The start-of-sequence token index.
- Type: int
eos
The end-of-sequence token index.
- Type: int
ignore_id
The token index ignored during loss computation. Defaults to 0, which may be shared with the CTC blank symbol in ASR.
- Type: int
Parameters:
- lm (AbsLM) – The language model instance to be used.
- vocab_size (int) – The size of the vocabulary.
- ignore_id (int, optional) – The index of the token to ignore. Default is 0.
nll(text: torch.Tensor, text_lengths: torch.Tensor, max_length: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]: Computes the negative log likelihood of the input text.
batchify_nll(text: torch.Tensor, text_lengths: torch.Tensor, batch_size: int = 100) -> Tuple[torch.Tensor, torch.Tensor]: Computes the negative log likelihood in batches to avoid out-of-memory errors.
forward(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]: Computes the forward pass and returns the loss, statistics, and weight.
collect_feats(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Dict[str, torch.Tensor]: Collects features from the input text. Currently returns an empty dictionary.
############# Examples
>>> # Create a language model
>>> lm = ESPnetLanguageModel(lm_model_instance, vocab_size=1000)
>>> # Compute negative log likelihood
>>> nll, lengths = lm.nll(text_tensor, text_lengths_tensor)
>>> # Compute negative log likelihood in batches
>>> nll_batch, lengths_batch = lm.batchify_nll(text_tensor, text_lengths_tensor)
>>> # Forward pass
>>> loss, stats, weight = lm.forward(text_tensor, text_lengths_tensor)
######## NOTE This class assumes that the input tensors are on the same device as the model. It also assumes that the language model provided is compatible with the expected input format.
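As a hedged illustration of consuming the nll output (variable names follow the example above; the zeroing of padded positions is described in the nll section below), per-utterance perplexity can be derived like this:

import torch

# Illustrative sketch only: derive per-utterance perplexity from the
# (Batch, Length) nll output. Assumes `lm`, `text_tensor`, and
# `text_lengths_tensor` are set up as in the example above, and that
# padded positions in `nll` are zeroed.
nll, lengths = lm.nll(text_tensor, text_lengths_tensor)
avg_nll = nll.sum(dim=1) / lengths   # mean NLL per token for each utterance
perplexity = torch.exp(avg_nll)      # (Batch,) per-utterance perplexity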
Initialize internal Module state, shared by both nn.Module and ScriptModule.
#
batchify_nll(text: torch.Tensor, text_lengths: torch.Tensor, batch_size: int = 100) -> Tuple[torch.Tensor, torch.Tensor]
Compute the negative log likelihood (NLL) from the transformer language model.
To avoid Out Of Memory (OOM) errors, this function separates the input into batches. It then calls the nll method for each batch and combines the results before returning them.
- Parameters:
- text – A tensor of shape (Batch, Length) containing the input text data.
- text_lengths – A tensor of shape (Batch,) indicating the lengths of each sequence in the batch.
- batch_size – An integer specifying the number of samples each batch contains when computing nll. Adjust this value to avoid OOM errors or to increase efficiency.
- Returns:
- nll: A tensor of shape (Total, Length) with the computed negative log likelihood for each sequence.
- x_lengths: A tensor of shape (Total,) with the lengths of each processed sequence.
- Return type: Tuple[torch.Tensor, torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=100)
>>> text = torch.randint(0, 100, (250, 50))
>>> text_lengths = torch.randint(1, 51, (250,))
>>> nll, x_lengths = model.batchify_nll(text, text_lengths, batch_size=100)
######## NOTE The method uses the nll function to compute negative log likelihood for each batch of text, which is crucial for language model training.
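A minimal sketch of this batching pattern follows; it restates the documented behavior rather than quoting the ESPnet source. In particular, passing a fixed max_length so that every chunk's nll shares the same Length dimension is an assumption drawn from the nll signature:

import torch

def batchify_nll_sketch(model, text, text_lengths, batch_size=100):
    # Fix max_length so each chunk's nll has the same width and the
    # per-chunk results can be concatenated along the batch dimension.
    max_length = int(text_lengths.max())
    nlls, lengths = [], []
    for start in range(0, text.size(0), batch_size):
        end = min(start + batch_size, text.size(0))
        chunk_nll, chunk_lengths = model.nll(
            text[start:end], text_lengths[start:end], max_length=max_length
        )
        nlls.append(chunk_nll)
        lengths.append(chunk_lengths)
    return torch.cat(nlls), torch.cat(lengths)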
#
collect_feats(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Dict[str, torch.Tensor]
Collect features from the input text tensor.
This method exists to satisfy the feature-collection interface and processes the input text and text lengths to collect features for downstream use. For ESPnetLanguageModel it currently returns an empty dictionary, since the model consumes token IDs directly rather than extracted features.
- Parameters:
- text (torch.Tensor) – A tensor of shape (Batch, Length) containing the input text data.
- text_lengths (torch.Tensor) – A tensor of shape (Batch,) containing the lengths of each input text in the batch.
- **kwargs – Additional keyword arguments that may be used by subclasses.
- Returns: A dictionary containing the collected features.
- Return type: Dict[str, torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=100)
>>> text = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> text_lengths = torch.tensor([3, 3])
>>> features = model.collect_feats(text, text_lengths)
>>> print(features) # Output: {}
#
forward(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Forward pass for the ESPnetLanguageModel.
This method computes the negative log likelihood (NLL) of the input text and returns the loss, statistics, and the weight for the current batch. It first calls the nll method to obtain the NLL and the lengths of the output tokens, then calculates the loss as the sum of the NLL divided by the total number of tokens.
- Parameters:
- text (torch.Tensor) – Input tensor of shape (Batch, Length) representing the sequences of text.
- text_lengths (torch.Tensor) – Tensor of shape (Batch,) containing the lengths of each sequence in the batch.
- **kwargs – Additional keyword arguments (unused in this method).
- Returns: A tuple containing:
- loss (torch.Tensor): The computed loss for the batch.
- stats (Dict[str, torch.Tensor]): A dictionary containing statistics such as the loss.
- weight (torch.Tensor): The number of tokens processed in the batch.
- Return type: Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=1000)
>>> text = torch.randint(0, 999, (32, 10)) # Batch of 32 sequences of length 10
>>> text_lengths = torch.randint(1, 11, (32,)) # Random lengths for each sequence
>>> loss, stats, weight = model.forward(text, text_lengths)
>>> print(loss, stats, weight)
######## NOTE The loss is computed in a way that handles padding and ensures that the model can be used in a data-parallel setting.
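A hedged sketch of the computation described above (names are illustrative, not the verbatim source):

# Sum the token-level NLL and normalize by the total number of output
# tokens; `weight` is the token count, used for weighted averaging of
# the loss across data-parallel replicas.
nll, y_lengths = model.nll(text, text_lengths)
ntokens = y_lengths.sum()
loss = nll.sum() / ntokens
stats = {"loss": loss.detach()}
weight = ntokens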
#
nll(text: torch.Tensor, text_lengths: torch.Tensor, max_length: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]
Compute negative log likelihood (nll).
This method is called by forward and, batch by batch, by batchify_nll to calculate the negative log likelihood for a given batch of text data.
- Parameters:
- text (torch.Tensor) – A tensor of shape (Batch, Length) representing the input text sequences.
- text_lengths (torch.Tensor) – A tensor of shape (Batch,) that contains the lengths of each text sequence in the batch.
- max_length (Optional[int]) – An optional integer limiting the maximum length of the sequences. If None, the maximum length from text_lengths is used.
- Returns: A tuple containing:
- A tensor of shape (Batch, Length) representing the negative log likelihood for each sequence in the batch.
- A tensor of shape (Batch,) representing the lengths of the processed sequences after padding.
- Return type: Tuple[torch.Tensor, torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=100)
>>> text = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> text_lengths = torch.tensor([3, 3])
>>> nll, lengths = model.nll(text, text_lengths)
>>> print(nll.shape) # Output: (2, 4)
>>> print(lengths) # Output: tensor([4, 4])
######## NOTE This method uses padding to ensure that all sequences are of equal length for batch processing. The <sos> and <eos> tokens are used to denote the start and end of the sequences, respectively.
- Raises: ValueError – If the shapes of text and text_lengths do not align, or if max_length is less than the maximum length in text_lengths.
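To make the <sos>/<eos> handling in the NOTE above concrete, here is a minimal sketch of the pattern. It is not the verbatim ESPnet implementation (which uses its own padding utilities); nll_sketch, its arguments, and the explicit mask construction are illustrative assumptions:

import torch
import torch.nn.functional as F

def nll_sketch(lm, text, text_lengths, sos, eos, ignore_id=0):
    # Build '<sos> w1 w2 w3' as input and 'w1 w2 w3 <eos>' as target.
    # text: (Batch, Length) -> x, t: (Batch, Length + 1)
    x = F.pad(text, (1, 0), value=sos)
    t = F.pad(text, (0, 1), value=ignore_id)
    for i, length in enumerate(text_lengths):
        t[i, length] = eos
    x_lengths = text_lengths + 1

    # Score every position: y is (Batch, Length + 1, Vocab).
    y, _ = lm(x, None)
    nll = F.cross_entropy(
        y.reshape(-1, y.size(-1)), t.reshape(-1), reduction="none"
    ).view(text.size(0), -1)

    # Zero out positions beyond each sequence's true length so that
    # padding does not contribute to the loss.
    positions = torch.arange(nll.size(1), device=nll.device)
    mask = positions[None, :] >= x_lengths.to(nll.device)[:, None]
    return nll.masked_fill(mask, 0.0), x_lengths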