espnet2.train.preprocessor.SpkPreprocessor


class espnet2.train.preprocessor.SpkPreprocessor(train: bool, target_duration: float, spk2utt: str | None = None, sample_rate: int = 16000, num_eval: int = 10, rir_scp: str | None = None, rir_apply_prob: float = 1.0, noise_info: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]] | None = None, noise_apply_prob: float = 1.0, short_noise_thres: float = 0.5)

Bases: CommonPreprocessor

Preprocessor for Speaker tasks.

This class is responsible for preprocessing audio data for speaker-related tasks, including applying data augmentation techniques such as RIR convolution and noise addition. It also maps speaker labels to integer values for training and evaluation.
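
As an illustration of the label mapping, the sketch below assigns an integer to each speaker listed in a spk2utt file. It assumes the Kaldi-style layout (one speaker per line, with the speaker ID first, followed by that speaker's utterance IDs) and assumes labels follow file order. build_spk2label is a hypothetical helper written for this page, not an ESPnet function.

```python
from typing import Dict, Iterable


def build_spk2label(spk2utt_lines: Iterable[str]) -> Dict[str, int]:
    """Map each speaker ID to an integer label, in order of first appearance.

    Hypothetical helper: assumes each line is "<spk_id> <utt_id> <utt_id> ...".
    """
    spk2label: Dict[str, int] = {}
    for line in spk2utt_lines:
        fields = line.strip().split()
        if not fields:
            continue  # skip blank lines
        spk = fields[0]  # the first field is the speaker ID
        if spk not in spk2label:
            spk2label[spk] = len(spk2label)
    return spk2label


# Example with in-memory lines instead of a real spk2utt file.
lines = ["spkA uttA1 uttA2\n", "spkB uttB1\n"]
mapping = build_spk2label(lines)
print(mapping)
```

With a real file, the same helper could be fed the open file object, since iterating a text file yields its lines.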

train

Whether the preprocessor runs in training mode.

Type: bool

spk2utt

Path to the spk2utt file.

Type: str

target_duration

Target duration in seconds.

Type: float

sample_rate

Sampling rate.

Type: int

num_eval

Number of utterances to be used for evaluation.

Type: int

rir_scp

Path to the RIR scp file.

Type: str

rir_apply_prob

Probability of applying RIR.

Type: float

noise_info

List of tuples of noise information. Each tuple represents a noise type. Each tuple consists of (prob, noise_scp, num_to_mix, db_range).

prob (float) is the probability of applying the noise type.
noise_scp (str) is the path to the noise scp file.
num_to_mix (Tuple[int, int]) is the range of the number of noises to be mixed.
db_range (Tuple[float, float]) is the range of noise levels in dB.

Type: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]

noise_apply_prob

Probability of applying noise.

Type: float

short_noise_thres

Threshold of short noise.

Type: float
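
The target_duration and sample_rate attributes together determine a fixed length in samples. The sketch below shows one way a waveform could be brought to that length: repeat it if it is too short, then cut it to size. This is an assumption-laden illustration, not ESPnet's implementation; fit_to_duration is a hypothetical helper, and the actual class may choose the crop position or padding differently.

```python
from typing import List


def fit_to_duration(
    samples: List[float], target_duration: float, sample_rate: int = 16000
) -> List[float]:
    """Return exactly int(target_duration * sample_rate) samples.

    Hypothetical helper: short inputs are repeated end to end, and the
    result is cut from the start. A real preprocessor may crop elsewhere.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    target_len = int(target_duration * sample_rate)
    if len(samples) < target_len:
        repeats = -(-target_len // len(samples))  # ceiling division
        samples = samples * repeats
    return samples[:target_len]


# A 0.5 s clip at 16 kHz stretched to a 1 s target by repetition.
short_clip = [0.1] * 8000
fixed = fit_to_duration(short_clip, target_duration=1.0, sample_rate=16000)
print(len(fixed))
```
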
Parameters:
- train (bool) – Whether the preprocessor runs in training mode.
- target_duration (float) – Target duration in seconds.
- spk2utt (Optional[str]) – Path to the spk2utt file.
- sample_rate (int) – Sampling rate.
- num_eval (int) – Number of utterances to be used for evaluation.
- rir_scp (Optional[str]) – Path to the RIR scp file.
- rir_apply_prob (float) – Probability of applying RIR.
- noise_info (Optional[List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]]) – List of noise information tuples.
- noise_apply_prob (float) – Probability of applying noise.
- short_noise_thres (float) – Threshold for short noise.
Raises: ValueError – If the noise information is incorrectly formatted.
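
To make the expected shape of noise_info concrete, the sketch below checks a list of (prob, noise_scp, num_to_mix, db_range) tuples and raises ValueError on malformed entries. The specific checks are assumptions chosen for illustration; this page does not state which conditions the class itself verifies, and check_noise_info is a hypothetical helper, not part of ESPnet.

```python
from typing import List, Tuple

NoiseSpec = Tuple[float, str, Tuple[int, int], Tuple[float, float]]


def check_noise_info(noise_info: List[NoiseSpec]) -> None:
    """Raise ValueError if any entry does not look like a valid noise spec.

    Hypothetical validator: the checks below are illustrative assumptions.
    """
    for i, spec in enumerate(noise_info):
        if len(spec) != 4:
            raise ValueError(f"noise_info[{i}]: expected 4 fields, got {len(spec)}")
        prob, noise_scp, num_to_mix, db_range = spec
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"noise_info[{i}]: prob must be in [0, 1], got {prob}")
        if not isinstance(noise_scp, str):
            raise ValueError(f"noise_info[{i}]: noise_scp must be a path string")
        if len(num_to_mix) != 2 or num_to_mix[0] > num_to_mix[1]:
            raise ValueError(f"noise_info[{i}]: num_to_mix must be (min, max)")
        if len(db_range) != 2 or db_range[0] > db_range[1]:
            raise ValueError(f"noise_info[{i}]: db_range must be (low, high)")


# One noise type: applied with probability 0.5, mixing 1 to 3 noises
# at levels between 0 and 5 dB (placeholder path).
noise_info = [(0.5, "path/to/noise.scp", (1, 3), (0.0, 5.0))]
check_noise_info(noise_info)  # passes silently
```

Running a check like this before constructing the preprocessor surfaces a malformed entry with an index, which is easier to trace than a failure inside the augmentation code.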

Examples

Creating an instance of SpkPreprocessor

from espnet2.train.preprocessor import SpkPreprocessor

spk_preprocessor = SpkPreprocessor(
    train=True,
    target_duration=5.0,
    spk2utt='path/to/spk2utt',
    sample_rate=16000,
    num_eval=10,
    rir_scp='path/to/rir.scp',
    noise_info=[(0.5, 'path/to/noise.scp', (1, 3), (0.0, 5.0))],
    noise_apply_prob=0.8,
    short_noise_thres=0.5,
)

Using the __call__ method to preprocess data

processed_data = spk_preprocessor(uid='sample_uid', data={'speech': audio_data, 'spk_labels': 'speaker1'})
