espnet2.train.preprocessor.SpkPreprocessor
class espnet2.train.preprocessor.SpkPreprocessor(train: bool, target_duration: float, spk2utt: str | None = None, sample_rate: int = 16000, num_eval: int = 10, rir_scp: str | None = None, rir_apply_prob: float = 1.0, noise_info: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]] | None = None, noise_apply_prob: float = 1.0, short_noise_thres: float = 0.5)
Bases: CommonPreprocessor
Preprocessor for speaker tasks.
This class preprocesses audio for speaker-related tasks. It can apply data augmentation, namely room impulse response (RIR) convolution and additive noise, and it maps speaker labels to integer indices for training and evaluation.
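The label mapping can be pictured with the minimal sketch below. It is not ESPnet's code: `make_spk2label` is a hypothetical helper, and it assumes the Kaldi-style spk2utt layout (`<speaker-id> <utt-id> <utt-id> ...`) with integer labels assigned in file order.

```python
from typing import Dict, List


def make_spk2label(spk2utt_lines: List[str]) -> Dict[str, int]:
    """Hypothetical helper: assign consecutive integer labels to speakers.

    ASSUMPTION: each line looks like "spk_id utt1 utt2 ..."; only the first
    whitespace-separated field (the speaker ID) is used, in file order.
    """
    spk2label: Dict[str, int] = {}
    for line in spk2utt_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        spk = fields[0]
        if spk not in spk2label:
            spk2label[spk] = len(spk2label)  # next unused integer label
    return spk2label


lines = ["speaker1 utt_a utt_b\n", "speaker2 utt_c\n", "speaker3 utt_d utt_e\n"]
spk2label = make_spk2label(lines)
print(spk2label)  # {'speaker1': 0, 'speaker2': 1, 'speaker3': 2}
```

Assigning labels in file order keeps the mapping deterministic for a given spk2utt file.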
train
Whether to run in training mode.
- Type: bool
spk2utt
Path to the spk2utt file.
- Type: Optional[str]
target_duration
Target duration in seconds.
- Type: float
sample_rate
Sampling rate in Hz.
- Type: int
num_eval
Number of utterances to be used for evaluation.
- Type: int
rir_scp
Path to the RIR scp file.
- Type: Optional[str]
rir_apply_prob
Probability of applying RIR.
- Type: float
noise_info
List of noise-information tuples, one per noise type. Each tuple has the form (prob, noise_scp, num_to_mix, db_range):
- prob (float) is the probability of applying the noise type.
- noise_scp (str) is the path to the noise scp file.
- num_to_mix (Tuple[int, int]) is the range of the number of noises to be mixed.
- db_range (Tuple[float, float]) is the range of noise levels in dB.
- Type: Optional[List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]]
noise_apply_prob
Probability of applying noise.
- Type: float
short_noise_thres
Threshold for short noise.
- Type: float
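The attributes above describe two augmentations, RIR convolution and additive noise. The sketch below shows one plausible way they could combine on a single waveform; it illustrates the parameters under stated assumptions and is not the class's implementation. Assumptions: `apply_rir`, `add_noise`, and `augment` are hypothetical helpers; noises are passed as in-memory NumPy arrays instead of being read from noise_scp; each noise type is applied independently with its own prob; each mixed noise gets an SNR drawn uniformly from db_range; short noises are tiled cyclically, and short_noise_thres is not modeled.

```python
from typing import List, Tuple

import numpy as np


def apply_rir(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve speech with a room impulse response, keeping the input length."""
    rir = rir / np.sqrt(np.sum(rir ** 2) + 1e-8)  # ASSUMPTION: unit-energy RIR
    return np.convolve(speech, rir, mode="full")[: len(speech)]


def add_noise(
    speech: np.ndarray,
    noises: List[np.ndarray],
    db_range: Tuple[float, float],
    rng: np.random.Generator,
) -> np.ndarray:
    """Mix each noise into speech at an SNR (dB) drawn uniformly from db_range.

    The SNR is measured against the `speech` passed to this call (ASSUMPTION).
    """
    speech_power = np.mean(speech ** 2) + 1e-8
    noisy = speech.copy()
    for noise in noises:
        if len(noise) < len(speech):
            noise = np.resize(noise, len(speech))  # repeat short noise cyclically
        else:
            start = int(rng.integers(0, len(noise) - len(speech) + 1))
            noise = noise[start : start + len(speech)]  # random segment of long noise
        snr_db = rng.uniform(db_range[0], db_range[1])
        noise_power = np.mean(noise ** 2) + 1e-8
        # Scale so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
        scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
        noisy = noisy + scale * noise
    return noisy


def augment(
    speech: np.ndarray,
    rirs: List[np.ndarray],
    noise_info: List[Tuple[float, List[np.ndarray], Tuple[int, int], Tuple[float, float]]],
    rir_apply_prob: float,
    noise_apply_prob: float,
    rng: np.random.Generator,
) -> np.ndarray:
    """Hypothetical pipeline: optional RIR convolution, then optional noise."""
    if rirs and rng.random() < rir_apply_prob:
        speech = apply_rir(speech, rirs[int(rng.integers(len(rirs)))])
    if noise_info and rng.random() < noise_apply_prob:
        for prob, noises, (min_mix, max_mix), db_range in noise_info:
            # ASSUMPTION: each noise type is applied independently with its prob.
            if rng.random() < prob:
                num = int(rng.integers(min_mix, max_mix + 1))  # inclusive range
                picks = [noises[int(i)] for i in rng.integers(0, len(noises), size=num)]
                speech = add_noise(speech, picks, db_range, rng)
    return speech
```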
Parameters:
- train (bool) – Whether to run in training mode.
- target_duration (float) – Target duration in seconds.
- spk2utt (Optional[str]) – Path to the spk2utt file.
- sample_rate (int) – Sampling rate in Hz.
- num_eval (int) – Number of utterances to be used for evaluation.
- rir_scp (Optional[str]) – Path to the RIR scp file.
- rir_apply_prob (float) – Probability of applying RIR.
- noise_info (Optional[List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]]) – List of noise information tuples.
- noise_apply_prob (float) – Probability of applying noise.
- short_noise_thres (float) – Threshold for short noise.
Raises:
- ValueError – If the noise information is incorrectly formatted.
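This page only states that a ValueError is raised for malformed noise information, not which checks are made. The function below is a hypothetical example of such validation (`check_noise_info` is not part of ESPnet, and the specific rules, such as requiring at least one noise to mix, are assumptions). It accepts lists as well as tuples because sequences loaded from a YAML config arrive as lists.

```python
from typing import Any, List, Sequence, Tuple

NoiseEntry = Tuple[float, str, Tuple[int, int], Tuple[float, float]]


def _is_pair(x: Any) -> bool:
    """True if x is a 2-item list or tuple."""
    return isinstance(x, (list, tuple)) and len(x) == 2


def check_noise_info(noise_info: Sequence[Any]) -> List[NoiseEntry]:
    """Hypothetical validator for (prob, noise_scp, num_to_mix, db_range) entries.

    Returns normalized tuples; raises ValueError on the first malformed entry.
    """
    checked: List[NoiseEntry] = []
    for i, entry in enumerate(noise_info):
        if not isinstance(entry, (list, tuple)) or len(entry) != 4:
            raise ValueError(f"noise_info[{i}] must be a 4-item sequence, got {entry!r}")
        prob, noise_scp, num_to_mix, db_range = entry
        if not 0.0 <= float(prob) <= 1.0:
            raise ValueError(f"noise_info[{i}]: prob must be in [0, 1], got {prob}")
        if not isinstance(noise_scp, str):
            raise ValueError(f"noise_info[{i}]: noise_scp must be a path string")
        # ASSUMPTION: at least one noise is mixed, and min <= max.
        if not _is_pair(num_to_mix) or not 1 <= int(num_to_mix[0]) <= int(num_to_mix[1]):
            raise ValueError(f"noise_info[{i}]: num_to_mix must be (min, max) with 1 <= min <= max")
        if not _is_pair(db_range) or float(db_range[0]) > float(db_range[1]):
            raise ValueError(f"noise_info[{i}]: db_range must be (low, high) with low <= high")
        checked.append(
            (
                float(prob),
                noise_scp,
                (int(num_to_mix[0]), int(num_to_mix[1])),
                (float(db_range[0]), float(db_range[1])),
            )
        )
    return checked


print(check_noise_info([[0.5, "path/to/noise.scp", [1, 3], [0.0, 5.0]]]))
# [(0.5, 'path/to/noise.scp', (1, 3), (0.0, 5.0))]
```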
Examples
Creating an instance of SpkPreprocessor
import numpy as np
from espnet2.train.preprocessor import SpkPreprocessor
spk_preprocessor = SpkPreprocessor(
    train=True, target_duration=5.0, spk2utt="path/to/spk2utt", sample_rate=16000, num_eval=10,
    rir_scp="path/to/rir.scp", noise_info=[(0.5, "path/to/noise.scp", (1, 3), (0.0, 5.0))],
    noise_apply_prob=0.8, short_noise_thres=0.5,
)
Using the __call__ method to preprocess data
audio_data = np.zeros(16000 * 6, dtype=np.float32)  # placeholder 1-D waveform at sample_rate
processed_data = spk_preprocessor(uid="sample_uid", data={"speech": audio_data, "spk_labels": "speaker1"})
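This page does not describe the shape of processed_data["speech"]. A common scheme for speaker models, assumed here rather than taken from this page, is to cut one random crop of target_duration seconds in training mode and num_eval evenly spaced crops in evaluation mode, repeating waveforms that are too short. `crop_speech` is a hypothetical helper that illustrates that scheme.

```python
import numpy as np


def crop_speech(
    audio: np.ndarray,
    target_duration: float,
    sample_rate: int,
    train: bool,
    num_eval: int,
    rng: np.random.Generator,
) -> np.ndarray:
    """Return fixed-length crops with shape (num_crops, target_len).

    ASSUMED scheme: one random crop when train=True, num_eval evenly
    spaced crops otherwise; short inputs are repeated cyclically.
    """
    target_len = int(target_duration * sample_rate)
    if len(audio) < target_len:
        # np.resize repeats the waveform cyclically up to target_len samples.
        audio = np.resize(audio, target_len)
    max_start = len(audio) - target_len
    if train:
        starts = [int(rng.integers(0, max_start + 1))]  # one random start
    else:
        starts = np.linspace(0, max_start, num=num_eval).astype(int)  # evenly spaced
    return np.stack([audio[s : s + target_len] for s in starts], axis=0)


rng = np.random.default_rng(0)
waveform = np.zeros(16000 * 6, dtype=np.float32)
print(crop_speech(waveform, 5.0, 16000, True, 10, rng).shape)   # (1, 80000)
print(crop_speech(waveform, 5.0, 16000, False, 10, rng).shape)  # (10, 80000)
```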