espnet2.lm.espnet_model.ESPnetLanguageModel
class espnet2.lm.espnet_model.ESPnetLanguageModel(lm: AbsLM, vocab_size: int, ignore_id: int = 0)
Bases: AbsESPnetModel
The ESPnetLanguageModel class implements a language model using the ESPnet
framework. It is designed to compute the negative log likelihood of sequences of text, enabling the training and evaluation of language models.
lm
An instance of a language model that follows the AbsLM interface.
- Type: AbsLM
sos
The start-of-sequence token index.
- Type: int
eos
The end-of-sequence token index.
- Type: int
ignore_id
The token index ignored during loss computation. Defaults to 0, which may be shared with the CTC blank symbol in ASR.
- Type: int
Parameters:
- lm (AbsLM) – The language model instance to be used.
- vocab_size (int) – The size of the vocabulary.
- ignore_id (int, optional) – The index of the token to ignore. Default is 0.
nll(text: torch.Tensor, text_lengths: torch.Tensor, max_length: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]: Computes the negative log likelihood of the input text.
batchify_nll(text: torch.Tensor, text_lengths: torch.Tensor, batch_size: int = 100) -> Tuple[torch.Tensor, torch.Tensor]: Computes the negative log likelihood in batches to avoid out-of-memory errors.
forward(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]: Computes the forward pass and returns the loss, statistics, and weight.
collect_feats(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Dict[str, torch.Tensor]: Collects features from the input text. Currently returns an empty dictionary.
############# Examples
>>> # Create a language model
>>> lm = ESPnetLanguageModel(lm_model_instance, vocab_size=1000)
>>> # Compute negative log likelihood
>>> nll, lengths = lm.nll(text_tensor, text_lengths_tensor)
>>> # Compute negative log likelihood in batches
>>> nll_batch, lengths_batch = lm.batchify_nll(text_tensor, text_lengths_tensor)
>>> # Forward pass
>>> loss, stats, weight = lm.forward(text_tensor, text_lengths_tensor)
######## NOTE This class assumes that the input tensors are on the same device as the model. It also assumes that the language model provided is compatible with the expected input format.
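As a hedged illustration of consuming the nll output (variable names follow the example above; the zeroing of padded positions is described in the nll section below), per-utterance perplexity can be derived like this:

import torch

# Illustrative sketch only: derive per-utterance perplexity from the
# (Batch, Length) nll output. Assumes `lm`, `text_tensor`, and
# `text_lengths_tensor` are set up as in the example above, and that
# padded positions in `nll` are zeroed.
nll, lengths = lm.nll(text_tensor, text_lengths_tensor)
avg_nll = nll.sum(dim=1) / lengths   # mean NLL per token for each utterance
perplexity = torch.exp(avg_nll)      # (Batch,) per-utterance perplexity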
Initialize internal Module state, shared by both nn.Module and ScriptModule.
#
batchify_nll(text: torch.Tensor, text_lengths: torch.Tensor, batch_size: int = 100) -> Tuple[torch.Tensor, torch.Tensor]
Compute the negative log likelihood (NLL) from the transformer language model.
To avoid Out Of Memory (OOM) errors, this function separates the input into batches. It then calls the nll method for each batch and combines the results before returning them.
- Parameters:
- text – A tensor of shape (Batch, Length) containing the input text data.
- text_lengths – A tensor of shape (Batch,) indicating the lengths of each sequence in the batch.
- batch_size – An integer specifying the number of samples each batch contains when computing nll. Adjust this value to avoid OOM errors or to increase efficiency.
- Returns:
- nll: A tensor of shape (Total, Length) with the computed negative log likelihood for each sequence.
- x_lengths: A tensor of shape (Total,) with the lengths of each processed sequence.
- Return type: Tuple[torch.Tensor, torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=100)
>>> text = torch.randint(0, 100, (250, 50))
>>> text_lengths = torch.randint(1, 51, (250,))
>>> nll, x_lengths = model.batchify_nll(text, text_lengths, batch_size=100)
######## NOTE The method uses the nll function to compute negative log likelihood for each batch of text, which is crucial for language model training.
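A minimal sketch of this batching pattern follows; it restates the documented behavior rather than quoting the ESPnet source. In particular, passing a fixed max_length so that every chunk's nll shares the same Length dimension is an assumption drawn from the nll signature:

import torch

def batchify_nll_sketch(model, text, text_lengths, batch_size=100):
    # Fix max_length so each chunk's nll has the same width and the
    # per-chunk results can be concatenated along the batch dimension.
    max_length = int(text_lengths.max())
    nlls, lengths = [], []
    for start in range(0, text.size(0), batch_size):
        end = min(start + batch_size, text.size(0))
        chunk_nll, chunk_lengths = model.nll(
            text[start:end], text_lengths[start:end], max_length=max_length
        )
        nlls.append(chunk_nll)
        lengths.append(chunk_lengths)
    return torch.cat(nlls), torch.cat(lengths)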
#
collect_feats(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Dict[str, torch.Tensor]
Collect features from the input text tensor.
This method exists to satisfy the feature-collection interface and processes the input text and text lengths to collect features for downstream use. For ESPnetLanguageModel it currently returns an empty dictionary, since the model consumes token IDs directly rather than extracted features.
- Parameters:
- text (torch.Tensor) – A tensor of shape (Batch, Length) containing the input text data.
- text_lengths (torch.Tensor) – A tensor of shape (Batch,) containing the lengths of each input text in the batch.
- **kwargs – Additional keyword arguments that may be used by subclasses.
- Returns: A dictionary containing the collected features.
- Return type: Dict[str, torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=100)
>>> text = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> text_lengths = torch.tensor([3, 3])
>>> features = model.collect_feats(text, text_lengths)
>>> print(features) # Output: {}
#
forward(text: torch.Tensor, text_lengths: torch.Tensor, **kwargs) -> Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Forward pass for the ESPnetLanguageModel.
This method computes the negative log likelihood (NLL) of the input text and returns the loss, statistics, and the weight for the current batch. It first calls the nll method to obtain the NLL and the lengths of the output tokens, then calculates the loss as the sum of the NLL divided by the total number of tokens.
- Parameters:
- text (torch.Tensor) – Input tensor of shape (Batch, Length) representing the sequences of text.
- text_lengths (torch.Tensor) – Tensor of shape (Batch,) containing the lengths of each sequence in the batch.
- **kwargs – Additional keyword arguments (unused in this method).
- Returns: A tuple containing:
- loss (torch.Tensor): The computed loss for the batch.
- stats (Dict[str, torch.Tensor]): A dictionary containing statistics such as the loss.
- weight (torch.Tensor): The number of tokens processed in the batch.
- Return type: Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=1000)
>>> text = torch.randint(0, 999, (32, 10)) # Batch of 32 sequences of length 10
>>> text_lengths = torch.randint(1, 11, (32,)) # Random lengths for each sequence
>>> loss, stats, weight = model.forward(text, text_lengths)
>>> print(loss, stats, weight)
######## NOTE The loss is computed in a way that handles padding and ensures that the model can be used in a data-parallel setting.
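A hedged sketch of the computation described above (names are illustrative, not the verbatim source):

# Sum the token-level NLL and normalize by the total number of output
# tokens; `weight` is the token count, used for weighted averaging of
# the loss across data-parallel replicas.
nll, y_lengths = model.nll(text, text_lengths)
ntokens = y_lengths.sum()
loss = nll.sum() / ntokens
stats = {"loss": loss.detach()}
weight = ntokens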
#
nll(text: torch.Tensor, text_lengths: torch.Tensor, max_length: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]
Compute negative log likelihood (nll).
This method is called by forward and, batch by batch, by batchify_nll to calculate the negative log likelihood for a given batch of text data.
- Parameters:
- text (torch.Tensor) – A tensor of shape (Batch, Length) representing the input text sequences.
- text_lengths (torch.Tensor) – A tensor of shape (Batch,) that contains the lengths of each text sequence in the batch.
- max_length (Optional[int]) – An optional integer limiting the maximum length of the sequences. If None, the maximum length from text_lengths is used.
- Returns: A tuple containing:
- A tensor of shape (Batch, Length) representing the negative log likelihood for each sequence in the batch.
- A tensor of shape (Batch,) representing the lengths of the processed sequences after padding.
- Return type: Tuple[torch.Tensor, torch.Tensor]
############# Examples
>>> model = ESPnetLanguageModel(lm, vocab_size=100)
>>> text = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> text_lengths = torch.tensor([3, 3])
>>> nll, lengths = model.nll(text, text_lengths)
>>> print(nll.shape) # Output: (2, 4)
>>> print(lengths) # Output: tensor([4, 4])
######## NOTE This method uses padding to ensure that all sequences are of equal length for batch processing. The <sos> and <eos> tokens are used to denote the start and end of the sequences, respectively.
- Raises: ValueError – If the shapes of text and text_lengths do not align, or if max_length is less than the maximum length in text_lengths.
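To make the <sos>/<eos> handling in the NOTE above concrete, here is a minimal sketch of the pattern. It is not the verbatim ESPnet implementation (which uses its own padding utilities); nll_sketch, its arguments, and the explicit mask construction are illustrative assumptions:

import torch
import torch.nn.functional as F

def nll_sketch(lm, text, text_lengths, sos, eos, ignore_id=0):
    # Build '<sos> w1 w2 w3' as input and 'w1 w2 w3 <eos>' as target.
    # text: (Batch, Length) -> x, t: (Batch, Length + 1)
    x = F.pad(text, (1, 0), value=sos)
    t = F.pad(text, (0, 1), value=ignore_id)
    for i, length in enumerate(text_lengths):
        t[i, length] = eos
    x_lengths = text_lengths + 1

    # Score every position: y is (Batch, Length + 1, Vocab).
    y, _ = lm(x, None)
    nll = F.cross_entropy(
        y.reshape(-1, y.size(-1)), t.reshape(-1), reduction="none"
    ).view(text.size(0), -1)

    # Zero out positions beyond each sequence's true length so that
    # padding does not contribute to the loss.
    positions = torch.arange(nll.size(1), device=nll.device)
    mask = positions[None, :] >= x_lengths.to(nll.device)[:, None]
    return nll.masked_fill(mask, 0.0), x_lengths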