espnet2.text.korean_cleaner.KoreanCleaner
class espnet2.text.korean_cleaner.KoreanCleaner
Bases: object
KoreanCleaner is a utility class that normalizes Korean text by replacing
digits with Korean number words and English letters with their Korean phonetic equivalents.
It provides the following methods:
- normalize_text
Strips the input text and normalizes both numbers and English text.
- _normalize_numbers
Converts digits to their Korean word representations.
- _normalize_english_text
Converts English letters to their Korean phonetic equivalents.
####### Examples
>>> cleaner = KoreanCleaner()
>>> cleaner.normalize_text("A 123")
'에이 일이삼'
>>> cleaner.normalize_text("  1A  ")
'일에이'
- Parameters: text (str) – The input text to be normalized.
- Returns: The normalized text after converting numbers and English letters.
- Return type: str
classmethod normalize_text(text)
Normalize the input text in three steps: strip leading and trailing whitespace,
convert digits to their Korean equivalents, and convert English letters to their Korean phonetic representations.
Characters that are neither digits nor English letters are left unchanged.
- Parameters: text (str) – The input string that needs to be normalized.
- Returns: The normalized string after processing.
- Return type: str
####### Examples
>>> KoreanCleaner.normalize_text("A1")
'에이일'
>>> KoreanCleaner.normalize_text(" T 123 ")
'티 일이삼'
NOTE
The normalization includes:
- Stripping leading and trailing whitespace.
- Converting digits to Korean words (e.g., ‘1’ to ‘일’).
- Converting uppercase English letters to their Korean phonetic representations (e.g., ‘A’ to ‘에이’).
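The three steps listed above can be sketched as a standalone function. This is a minimal illustration, not the ESPnet implementation: the function names and the two lookup tables are hypothetical, the letter table is deliberately a small subset, and the character-by-character replacement is an assumption based on the description above.

```python
# Hypothetical lookup tables for illustration only; the real class may
# define different or more complete mappings.
DIGIT_TO_KOR = {
    "1": "일", "2": "이", "3": "삼", "4": "사", "5": "오",
    "6": "육", "7": "칠", "8": "팔", "9": "구",
}
LETTER_TO_KOR = {"A": "에이", "H": "에이치", "T": "티"}  # subset only


def normalize_numbers(text: str) -> str:
    """Replace each mapped digit with its Korean word; keep other characters."""
    return "".join(DIGIT_TO_KOR.get(ch, ch) for ch in text)


def normalize_english_text(text: str) -> str:
    """Replace each mapped uppercase letter with its Korean phonetic spelling."""
    return "".join(LETTER_TO_KOR.get(ch, ch) for ch in text)


def normalize_text_sketch(text: str) -> str:
    """Apply the three steps in order: strip, digits, then letters."""
    text = text.strip()
    text = normalize_numbers(text)
    return normalize_english_text(text)


print(normalize_text_sketch("  HAT 123  "))  # 에이치에이티 일이삼
```

Because the sketch maps one character at a time, spaces and punctuation pass through untouched, and any character missing from the tables (including lowercase letters in this reduced version) is returned as-is.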