espnet3.systems.asr.tokenizers.sentencepiece.add_special_tokens
espnet3.systems.asr.tokenizers.sentencepiece.add_special_tokens(tokenizer, converter, embedding, special_tokens, insert_after=None)
Add special tokens to the tokenizer.
For detailed usage, please refer to the demo notebook for espnet3 with SLU task.
Parameters:
- tokenizer – Sentencepiece tokenizer.
- converter – Sentencepiece converter.
- embedding – nn.Embedding object.
- special_tokens (list) – List of special tokens.
- insert_after (str | None) – If provided and found in the current token list, insert new tokens right after it. If None or not found, append new tokens to the end of the list.
Returns: Tuple of (tokenizer, converter, embedding) – the new tokenizer, the new converter, and the new embedding.
Raises: ValueError – If insert_after is not present in the token list.
Example
>>> new_tok, new_conv, new_emb = add_special_tokens(
... tokenizer,
... converter,
... embedding,
... ["<new_token>"],
... )
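The insert_after placement described above can be sketched in plain Python. This is a minimal illustration of the documented positioning semantics only (the helper name place_tokens is hypothetical, and the real function also rebuilds the tokenizer, converter, and embedding):

```python
def place_tokens(tokens, special_tokens, insert_after=None):
    """Sketch of the documented insert_after semantics:
    if insert_after is given and found, new tokens go right after it;
    if it is None or not found, new tokens are appended at the end."""
    tokens = list(tokens)
    if insert_after is not None and insert_after in tokens:
        pos = tokens.index(insert_after) + 1
        tokens[pos:pos] = special_tokens  # insert in place, preserving order
    else:
        tokens.extend(special_tokens)  # fall back to appending
    return tokens


vocab = ["<unk>", "<s>", "</s>", "a", "b"]
# Inserted right after "</s>":
print(place_tokens(vocab, ["<new_token>"], insert_after="</s>"))
# → ['<unk>', '<s>', '</s>', '<new_token>', 'a', 'b']
# No insert_after: appended at the end:
print(place_tokens(vocab, ["<new_token>"]))
# → ['<unk>', '<s>', '</s>', 'a', 'b', '<new_token>']
```

Note that token position matters here because each token's index is also its row in the embedding matrix, so inserting (rather than appending) shifts the ids of all subsequent tokens.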