espnet2.text.phoneme_tokenizer.Jaso
class espnet2.text.phoneme_tokenizer.Jaso(space_symbol=' ', no_space=False)
Bases: object
A class for converting Korean text into Jamo characters.
Hangul syllables in the input are decomposed into their leading, vowel, and tail Jamo characters. Spaces in the output can be replaced with a custom symbol or removed entirely.
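The conversion itself can be illustrated without the library. The following is a minimal sketch, not ESPnet's implementation: it assumes the standard Unicode arithmetic for precomposed Hangul syllables (U+AC00 to U+D7A3), and `decompose_syllable` and `to_jamo` are hypothetical helper names introduced for this example.

```python
# Sketch only: decomposes precomposed Hangul syllables with Unicode arithmetic.
# decompose_syllable and to_jamo are hypothetical names, not part of ESPnet.
from typing import List

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 precomposed syllables


def decompose_syllable(ch: str) -> List[str]:
    """Return the lead, vowel, and optional tail jamo of one character."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return [ch]  # not a Hangul syllable: pass through unchanged
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:  # tail index 0 means the syllable has no final consonant
        jamo.append(chr(T_BASE + tail))
    return jamo


def to_jamo(text: str) -> List[str]:
    """Flatten a string into a list of jamo characters."""
    return [j for ch in text for j in decompose_syllable(ch)]


# Print code points so leads and tails that look alike can be told apart.
print([f"U+{ord(j):04X}" for j in to_jamo("안녕")])
```

Printing code points rather than glyphs makes it visible that a leading ᄋ (U+110B) and a tail ᆼ (U+11BC) are distinct characters even though they render similarly.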
PUNC
A string of punctuation characters.
- Type: str
SPACE
A string representing a space character.
- Type: str
JAMO_LEADS
A string of Jamo leading characters.
- Type: str
JAMO_VOWELS
A string of Jamo vowel characters.
- Type: str
JAMO_TAILS
A string of Jamo tail characters.
- Type: str
VALID_CHARS
A string of valid characters, including Jamo characters, punctuation, and spaces.
- Type: str
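As an illustration of how character sets like these can be used, the sketch below builds lead, vowel, and tail strings from the modern conjoining-jamo ranges in Unicode and filters a token list against them. The exact ranges, the punctuation string, and the `keep_valid` helper are assumptions made for this example, not values read from the ESPnet source.

```python
# Sketch only: the ranges are the modern conjoining jamo in Unicode;
# the PUNC value and keep_valid are assumptions for illustration.
from typing import List

JAMO_LEADS = "".join(chr(c) for c in range(0x1100, 0x1113))   # 19 leading consonants
JAMO_VOWELS = "".join(chr(c) for c in range(0x1161, 0x1176))  # 21 vowels
JAMO_TAILS = "".join(chr(c) for c in range(0x11A8, 0x11C3))   # 27 tail consonants
PUNC = "!'(),-.:;?"  # assumed punctuation set
SPACE = " "
VALID_CHARS = JAMO_LEADS + JAMO_VOWELS + JAMO_TAILS + PUNC + SPACE


def keep_valid(tokens: List[str]) -> List[str]:
    """Drop every token that is not a single character from VALID_CHARS."""
    return [t for t in tokens if len(t) == 1 and t in VALID_CHARS]


tokens = ["\u110b", "\u1161", "\u11ab", "A", "!", " ", "7"]
print(keep_valid(tokens))  # the Latin letter and the digit are dropped
```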
Parameters:
- space_symbol (str) – The symbol to use for spaces in the output. Defaults to a regular space.
- no_space (bool) – If True, spaces are removed from the output. Defaults to False.
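The interaction of the two parameters can be sketched as follows. This is an assumed reading of the parameter descriptions (in particular, that `no_space` takes precedence over `space_symbol`), and `render_spaces` is a hypothetical helper, not a library function.

```python
# Sketch only: render_spaces is a hypothetical helper that mirrors the
# documented parameters; it is not part of ESPnet.
from typing import List


def render_spaces(tokens: List[str], space_symbol: str = " ",
                  no_space: bool = False) -> List[str]:
    """Remove spaces, or replace each one with space_symbol."""
    if no_space:
        # Assumed precedence: removal wins over substitution.
        return [t for t in tokens if t != " "]
    return [space_symbol if t == " " else t for t in tokens]


tokens = ["\u1112", "\u1161", " ", "\u110b", "\u116d"]
print(render_spaces(tokens, space_symbol="<space>"))
print(render_spaces(tokens, no_space=True))
```

With the default arguments the token list is returned unchanged, since each space is replaced by a regular space.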
Examples
>>> from espnet2.text.phoneme_tokenizer import Jaso
>>> jaso = Jaso(space_symbol='<space>', no_space=False)
>>> jaso("안녕하세요")
['ᄋ', 'ᅡ', 'ᆫ', 'ᄂ', 'ᅧ', 'ᆼ', 'ᄒ', 'ᅡ', 'ᄉ', 'ᅦ', 'ᄋ', 'ᅭ']
>>> jaso_no_space = Jaso(no_space=True)
>>> jaso_no_space("안녕하세요")
['ᄋ', 'ᅡ', 'ᆫ', 'ᄂ', 'ᅧ', 'ᆼ', 'ᄒ', 'ᅡ', 'ᄉ', 'ᅦ', 'ᄋ', 'ᅭ']