espnet2.train.preprocessor.SLUPreprocessor
class espnet2.train.preprocessor.SLUPreprocessor(train: bool, token_type: str | None = None, token_list: Path | str | Iterable[str] | None = None, transcript_token_list: Path | str | Iterable[str] | None = None, bpemodel: Path | str | Iterable[str] | None = None, text_cleaner: Collection[str] | None = None, g2p_type: str | None = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Path | str | Iterable[str] | None = None, delimiter: str | None = None, rir_scp: str | None = None, rir_apply_prob: float = 1.0, noise_scp: str | None = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float | None = None, speech_name: str = 'speech', text_name: str = 'text', fs: int = 0, data_aug_effects: List | None = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0)
Bases: CommonPreprocessor
Preprocessor for Spoken Language Understanding (SLU) tasks.
This class processes audio and text data for SLU tasks, handling different tokenization strategies, data augmentation, and normalization. It inherits from CommonPreprocessor and extends its functionality specifically for SLU, most notably by supporting a separate tokenizer and token-ID converter for transcripts.
transcript_tokenizer
Tokenizer for the transcript text.
- Type: Tokenizer
transcript_token_id_converter
Converts transcript tokens to IDs.
- Type: TokenIDConverter
Parameters:
- train (bool) – Whether the preprocessor is used in training mode.
- token_type (Optional[str]) – Type of tokenization (e.g., 'word', 'bpe').
- token_list (Union[Path, str, Iterable[str]]) – Path or list of tokens for the main text.
- transcript_token_list (Union[Path, str, Iterable[str]]) – Path or list of tokens for the transcript text.
- bpemodel (Union[Path, str, Iterable[str]]) – Path to the BPE model.
- text_cleaner (Collection[str]) – Text cleaning strategies to apply.
- g2p_type (Optional[str]) – Type of grapheme-to-phoneme model to use.
- unk_symbol (str) – Symbol for unknown tokens (default: "<unk>").
- space_symbol (str) – Symbol representing spaces (default: "<space>").
- non_linguistic_symbols (Union[Path, str, Iterable[str]]) – Path or list of non-linguistic symbols.
- delimiter (Optional[str]) – Delimiter for tokenization.
- rir_scp (Optional[str]) – Path to the RIR SCP file for reverberation.
- rir_apply_prob (float) – Probability of applying RIR (default: 1.0).
- noise_scp (Optional[str]) – Path to the noise SCP file.
- noise_apply_prob (float) – Probability of applying noise (default: 1.0).
- noise_db_range (str) – Range of noise levels in dB (default: "3_10").
- short_noise_thres (float) – Threshold for short noise (default: 0.5).
- speech_volume_normalize (Optional[float]) – Volume normalization factor.
- speech_name (str) – Key for speech data in the input dictionary (default: "speech").
- text_name (str) – Key for text data in the input dictionary (default: "text").
- fs (int) – Sampling rate of the audio data.
- data_aug_effects (Optional[List]) – List of data augmentation effects to apply.
- data_aug_num (List[int]) – Range of the number of augmentations to apply (default: [1, 1]).
- data_aug_prob (float) – Probability of applying data augmentation (default: 0.0).
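The noise_db_range string above encodes a minimum and maximum SNR in dB, joined by an underscore. A minimal sketch of how such a "3_10" style string can be parsed and a noise level drawn per utterance (the helper name parse_db_range is illustrative, not part of the ESPnet API):

```python
import random

def parse_db_range(noise_db_range: str) -> tuple:
    # "3_10" -> SNR drawn uniformly from [3, 10] dB;
    # a single value like "5" -> fixed 5 dB
    parts = noise_db_range.split("_")
    if len(parts) == 1:
        low = high = float(parts[0])
    elif len(parts) == 2:
        low, high = float(parts[0]), float(parts[1])
    else:
        raise ValueError(f"Unparseable noise_db_range: {noise_db_range}")
    return low, high

low, high = parse_db_range("3_10")
snr_db = random.uniform(low, high)  # noise level for this utterance
```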
Examples
>>> preprocessor = SLUPreprocessor(
... train=True,
... token_type='word',
... token_list='path/to/token_list.txt',
... transcript_token_list='path/to/transcript_token_list.txt',
... bpemodel='path/to/bpemodel',
... text_cleaner=['cleaner1', 'cleaner2'],
... fs=16000
... )
>>> processed_data = preprocessor(uid='sample_uid', data={'speech': audio_data, 'text': 'sample text'})
>>> print(processed_data['text']) # Output: token IDs of the processed text
NOTE
- The transcript_token_list is optional; if provided, a separate tokenizer and ID converter will be initialized for transcripts.
- Make sure the paths provided in token_list and transcript_token_list are valid and accessible.
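Conceptually, the transcript token-ID converter maps each transcript token to an integer ID, falling back to the unknown symbol for out-of-vocabulary tokens. A simplified, self-contained sketch of that behavior (the class SimpleTokenIDConverter is illustrative, not the ESPnet implementation):

```python
class SimpleTokenIDConverter:
    """Minimal stand-in for a token-to-ID converter with <unk> fallback."""

    def __init__(self, token_list, unk_symbol="<unk>"):
        self.token2id = {tok: i for i, tok in enumerate(token_list)}
        self.unk_id = self.token2id[unk_symbol]

    def tokens2ids(self, tokens):
        # Any token missing from the vocabulary maps to the <unk> ID
        return [self.token2id.get(t, self.unk_id) for t in tokens]

converter = SimpleTokenIDConverter(["<blank>", "<unk>", "play", "music"])
ids = converter.tokens2ids(["play", "music", "loudly"])
# "loudly" is out-of-vocabulary, so it maps to the <unk> ID
```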