espnet2.text.phoneme_tokenizer.Phonemizer

class espnet2.text.phoneme_tokenizer.Phonemizer(backend, word_separator: str | None = None, syllable_separator: str | None = None, phone_separator: str | None = ' ', strip=False, split_by_single_token: bool = False, **phonemizer_kwargs)

Bases: object

Phonemizer module for various languages.

This is a wrapper module for the phonemizer library, which provides phonemization for many languages. You can define various grapheme-to-phoneme (g2p) modules by passing options through to phonemizer.

See available options: https://github.com/bootphon/phonemizer/blob/master/phonemizer/phonemize.py#L32

SEE ALSO

https://github.com/bootphon/phonemizer
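The wrapper pattern described above can be sketched in plain Python. This is a minimal sketch, assuming the wrapper looks up a backend class by its name and forwards every extra keyword argument (`**phonemizer_kwargs`) unchanged to that class's constructor. `FakeEspeakBackend`, `BACKEND_REGISTRY`, and `build_backend` are hypothetical stand-ins, not part of ESPnet or phonemizer.

```python
from typing import Any, Dict, Type


class FakeEspeakBackend:
    """Hypothetical stand-in for a phonemizer backend class."""

    def __init__(self, language: str, with_stress: bool = False) -> None:
        self.language = language
        self.with_stress = with_stress


# Hypothetical registry mapping backend names to backend classes.
BACKEND_REGISTRY: Dict[str, Type[Any]] = {"espeak": FakeEspeakBackend}


def build_backend(backend: str, **phonemizer_kwargs: Any) -> Any:
    """Select a backend class by name and forward all extra options to it."""
    try:
        backend_cls = BACKEND_REGISTRY[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    # Options are passed through as-is; the wrapper does not validate them.
    return backend_cls(**phonemizer_kwargs)


engine = build_backend("espeak", language="en-us", with_stress=True)
print(engine.language, engine.with_stress)  # en-us True
```

In this sketch, a misspelled option surfaces as a `TypeError` from the backend constructor, because the wrapper forwards options without checking them.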

backend

The backend to use for phonemization.

Type: str

word_separator

Custom word separator.

Type: Optional[str]

syllable_separator

Custom syllable separator.

Type: Optional[str]

phone_separator

Custom phone separator (default: ' ', a single space).

Type: Optional[str]

strip

If True, omit the trailing word and phone separators that phonemizer otherwise appends to each token.

Type: bool

split_by_single_token

If False, the phonemized string is split on whitespace; if True, it is split into single characters, with spaces replaced by "<space>".

Type: bool

Parameters:
- backend (str) – The phonemizer backend to use (e.g., "espeak").
- word_separator (Optional[str]) – Custom word separator.
- syllable_separator (Optional[str]) – Custom syllable separator.
- phone_separator (Optional[str]) – Custom phone separator.
- strip (bool) – If True, omit the trailing word and phone separators of each token.
- split_by_single_token (bool) – If False, split the output on whitespace; if True, split it into single characters, replacing spaces with "<space>".
- **phonemizer_kwargs – Additional keyword arguments forwarded to the phonemizer backend (e.g., language='en-us' for espeak).
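To show how the separator parameters and `strip` interact, the sketch below models the raw string a backend might produce before the wrapper splits it into tokens. It is a simplified, hypothetical model (`layout_phones` is not a phonemizer function): it assumes phones within a word are joined by `phone_separator`, words are joined by `word_separator` (with `None` treated as an empty string), and that without `strip` each word and the whole string keep a trailing separator. Syllables are ignored.

```python
from typing import List, Optional


def layout_phones(
    words: List[List[str]],
    phone_separator: Optional[str] = " ",
    word_separator: Optional[str] = None,
    strip: bool = False,
) -> str:
    """Join per-word phone lists into one string (simplified, hypothetical model)."""
    phone_sep = phone_separator or ""  # None is treated as an empty separator
    word_sep = word_separator or ""
    pieces = []
    for phones in words:
        word = phone_sep.join(phones)
        if not strip:
            word += phone_sep  # keep the trailing phone separator
        pieces.append(word)
    text = word_sep.join(pieces)
    if not strip:
        text += word_sep  # keep the trailing word separator
    return text


words = [["h", "ə", "l", "oʊ"], ["w", "ɜː", "l", "d"]]
print(repr(layout_phones(words)))
# 'h ə l oʊ w ɜː l d '
print(repr(layout_phones(words, phone_separator="", word_separator=" ", strip=True)))
# 'həloʊ wɜːld'
```

Under this model, the default settings lose word boundaries once the string is split on whitespace, which suits a flat phoneme token list; the second call gives a character-oriented layout in which a single space marks each word boundary.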

Examples

>>> phonemizer = Phonemizer(backend='espeak', language='en-us')
>>> phonemes = phonemizer("Hello, world!")
>>> print(phonemes)  # illustrative; exact symbols depend on the espeak version
['h', 'ɛ', 'l', 'oʊ', 'w', 'ɜ', 'r', 'l', 'd']
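The shape of the returned list depends on `split_by_single_token`. The following is a hedged, pure-Python model of that final step, assuming the wrapper either splits the raw string on whitespace (the default) or breaks it into single characters and replaces each space with a `<space>` token. `split_output` is a hypothetical helper, and the raw strings are made-up inputs, not real espeak output.

```python
from typing import List


def split_output(raw: str, split_by_single_token: bool = False) -> List[str]:
    """Turn a raw phonemized string into a token list (hypothetical model)."""
    if not split_by_single_token:
        # Whitespace splitting drops empty strings and runs of separators.
        return raw.split()
    # One token per character; spaces become an explicit "<space>" token.
    return [c.replace(" ", "<space>") for c in raw]


print(split_output("h ə l oʊ w ɜː l d "))
# ['h', 'ə', 'l', 'oʊ', 'w', 'ɜː', 'l', 'd']
print(split_output("ab c", split_by_single_token=True))
# ['a', 'b', '<space>', 'c']
```

In single-token mode, multi-character phones such as `oʊ` are broken into separate characters and every phone separator that is a space becomes `<space>`, so this mode is most useful with an empty `phone_separator`.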