espnet2.slu.postdecoder.hugging_face_transformers_postdecoder.HuggingFaceTransformersPostDecoder
class espnet2.slu.postdecoder.hugging_face_transformers_postdecoder.HuggingFaceTransformersPostDecoder(model_name_or_path: str, output_size=256)
Bases: AbsPostDecoder
Hugging Face Transformers PostDecoder.
This class is responsible for decoding outputs from a pretrained Hugging Face Transformers model. It utilizes the transformers library to load models and tokenizers, and processes input sequences for downstream tasks in spoken language understanding (SLU).
model
The loaded Hugging Face model.
- Type: transformers.AutoModel
tokenizer
The tokenizer for the model.
- Type: transformers.AutoTokenizer
out_linear
Linear layer for output transformation.
- Type: torch.nn.Linear
output_size_dim
The configured output dimensionality.
- Type: int
Parameters:
- model_name_or_path (str) – The model name or path to the pretrained model.
- output_size (int, optional) – The size of the output layer. Defaults to 256.
Raises: ImportError – If the transformers library is not installed.
Examples
>>> post_decoder = HuggingFaceTransformersPostDecoder(
... model_name_or_path='bert-base-uncased',
... output_size=128
... )
>>> input_ids, attention_mask, token_type_ids, position_ids, input_id_length = (
...     post_decoder.convert_examples_to_features(
...         ["Hello, how are you?"], max_seq_length=20
...     )
... )
>>> outputs = post_decoder.forward(
...     torch.tensor(input_ids), torch.tensor(attention_mask),
...     torch.tensor(token_type_ids), torch.tensor(position_ids)
... )
>>> print(outputs.shape)
torch.Size([1, 20, 128])
Initialize the module.
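Based on the attributes listed above, initialization roughly amounts to the following sketch (illustrative only; the actual ESPnet implementation may differ in details):

import torch
from transformers import AutoModel, AutoTokenizer

class PostDecoderSketch(torch.nn.Module):
    # Hypothetical stand-in for HuggingFaceTransformersPostDecoder;
    # not the actual ESPnet code.
    def __init__(self, model_name_or_path: str, output_size: int = 256):
        super().__init__()
        # Load the pretrained encoder and its matching tokenizer.
        self.model = AutoModel.from_pretrained(model_name_or_path)
        self.tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
        # Project the encoder's hidden size to the requested output size.
        self.out_linear = torch.nn.Linear(
            self.model.config.hidden_size, output_size
        )
        self.output_size_dim = output_size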
convert_examples_to_features(data, max_seq_length)
Converts input text examples into features for model processing.
This method tokenizes input text examples and converts them into a format suitable for input into a transformer model. The output includes input IDs, attention masks, segment IDs, position IDs, and lengths of the input IDs.
- Parameters:
- data (List[str]) – A list of input text examples to be tokenized.
- max_seq_length (int) – The maximum sequence length for the tokenized inputs. Sequences longer than this will be truncated.
- Returns: Tuple[List[List[int]], List[List[int]], List[List[int]], List[List[int]], List[int]]:
- A list of input IDs for each example.
- A list of attention masks for each example.
- A list of segment IDs for each example.
- A list of position IDs for each example.
- A list containing the lengths of the input ID sequences.
- Raises: AssertionError – If the lengths of any of the generated features do not match max_seq_length.
Examples
>>> decoder = HuggingFaceTransformersPostDecoder("bert-base-uncased")
>>> data = ["Hello, world!", "This is a test."]
>>> features = decoder.convert_examples_to_features(data, max_seq_length=10)
>>> len(features)  # input_ids, attention_mask, segment_ids,
...                # position_ids, and input_id_length respectively
5
NOTE
The method prepends the “[CLS]” token and appends the “[SEP]” token to each example. Padding is applied to ensure that all sequences have the same length as specified by max_seq_length.
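A minimal sketch of the conversion described in the note above, assuming a BERT-style tokenizer (the actual method may differ in details):

def convert_one(tokenizer, text, max_seq_length):
    # Tokenize and reserve two slots for the special tokens.
    tokens = ["[CLS]"] + tokenizer.tokenize(text)[: max_seq_length - 2] + ["[SEP]"]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)
    input_id_length = len(input_ids)
    attention_mask = [1] * input_id_length
    # Pad input IDs and attention mask out to max_seq_length.
    padding = [0] * (max_seq_length - input_id_length)
    input_ids = input_ids + padding
    attention_mask = attention_mask + padding
    # Single-segment input: all zeros; positions run 0..max_seq_length-1.
    segment_ids = [0] * max_seq_length
    position_ids = list(range(max_seq_length))
    assert len(input_ids) == max_seq_length
    return input_ids, attention_mask, segment_ids, position_ids, input_id_length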
forward(transcript_input_ids: LongTensor, transcript_attention_mask: LongTensor, transcript_token_type_ids: LongTensor, transcript_position_ids: LongTensor) → Tensor
Perform a forward pass through the model.
This method takes input tensors for the model and processes them through the Hugging Face Transformers model, followed by a linear transformation to produce the final output.
- Parameters:
- transcript_input_ids (torch.LongTensor) – The input token IDs for the transcripts.
- transcript_attention_mask (torch.LongTensor) – The attention mask to avoid attending to padding tokens.
- transcript_token_type_ids (torch.LongTensor) – Token type IDs to distinguish between different segments.
- transcript_position_ids (torch.LongTensor) – Position IDs to indicate the position of tokens in the input sequence.
- Returns: The output tensor after applying the linear transformation to the model’s last hidden state.
- Return type: torch.Tensor
Examples
>>> model = HuggingFaceTransformersPostDecoder("bert-base-uncased")
>>> input_ids = torch.tensor([[101, 2023, 2003, 1037, 3391, 102]])
>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1]])
>>> token_type_ids = torch.tensor([[0, 0, 0, 0, 0, 0]])
>>> position_ids = torch.tensor([[0, 1, 2, 3, 4, 5]])
>>> output = model.forward(input_ids, attention_mask, token_type_ids,
... position_ids)
>>> print(output.shape)
torch.Size([1, 6, 256])  # (batch, sequence length, default output_size)
- Raises: ValueError – If the input tensors are not of compatible shapes.
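The computation described above reduces to roughly the following (a sketch using the standard transformers API for BERT-style models, not the verbatim ESPnet code):

def forward_sketch(self, input_ids, attention_mask, token_type_ids, position_ids):
    # Run the pretrained encoder.
    outputs = self.model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        position_ids=position_ids,
    )
    # Apply the linear projection to the last hidden state,
    # giving shape (batch, sequence length, output_size).
    return self.out_linear(outputs.last_hidden_state)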
output_size() → int
Get the output size of the post-decoder.
Returns the output size of the post-decoder, as specified at initialization. This value gives the last dimension of the tensor produced by the forward pass.
- Returns: The output size specified during the initialization of the HuggingFaceTransformersPostDecoder.
- Return type: int
Examples
>>> decoder = HuggingFaceTransformersPostDecoder("bert-base-uncased", output_size=128)
>>> decoder.output_size()
128
NOTE
The output size can be adjusted according to the requirements of the downstream task, such as classification or regression.
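For instance, a downstream classification head might be sized from this value (num_intents is a hypothetical label count, not part of this class):

>>> decoder = HuggingFaceTransformersPostDecoder("bert-base-uncased", output_size=128)
>>> num_intents = 15  # hypothetical number of intent labels
>>> classifier = torch.nn.Linear(decoder.output_size(), num_intents)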