espnet2.train.preprocessor.DynamicMixingPreprocessor
class espnet2.train.preprocessor.DynamicMixingPreprocessor(train: bool, source_scp: str | None = None, ref_num: int = 2, dynamic_mixing_gain_db: float = 0.0, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', mixture_source_name: str | None = None, utt2spk: str | None = None, categories: List | None = None)
Bases: AbsPreprocessor
Dynamic Mixing Preprocessor for speech data.
This preprocessor builds speech mixtures on the fly instead of reading pre-mixed audio. For each input utterance it combines several speech sources, applies a random gain to each one, and returns the mixture together with the individual reference signals. It is intended for tasks such as speech enhancement and separation.
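The mixing step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper `mix_with_random_gain` is hypothetical, and it assumes each gain is drawn uniformly from `[-max_gain_db, +max_gain_db]` dB and converted to a linear amplitude factor before the sources are summed.

```python
import random

import numpy as np


def mix_with_random_gain(sources, max_gain_db, rng=None):
    """Scale each source by a random gain and sum them into one mixture.

    Assumption: the gain of each source is drawn uniformly from
    [-max_gain_db, +max_gain_db] dB (hypothetical helper, for illustration).
    """
    rng = rng or random.Random()
    scaled = []
    for source in sources:
        gain_db = rng.uniform(-max_gain_db, max_gain_db)
        gain = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude factor
        scaled.append(source * gain)
    # All sources must share the same length so they can be summed sample-wise.
    mixture = np.sum(np.stack(scaled), axis=0)
    return mixture, scaled


# Two hypothetical one-second, single-channel sources at 16 kHz.
np_rng = np.random.default_rng(0)
sources = [np_rng.standard_normal(16000).astype(np.float32) for _ in range(2)]
mixture, refs = mix_with_random_gain(sources, max_gain_db=3.0, rng=random.Random(0))

# Output layout assumed from the default key names in the class signature.
data = {"speech_mix": mixture}
for i, ref in enumerate(refs, start=1):
    data[f"speech_ref{i}"] = ref
```

With `max_gain_db=0.0` every gain factor is 1, so the mixture is the plain sum of the sources.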
source_scp
Path to the source SCP file containing speech files.
- Type: Optional[str]
ref_num
Number of reference utterances to mix.
- Type: int
dynamic_mixing_gain_db
Maximum gain in decibels to apply to each source.
- Type: float
speech_name
Key used to identify the mixed speech in the output data.
- Type: str
speech_ref_name_prefix
Prefix for reference speech names in the output data.
- Type: str
mixture_source_name
Key to select source utterances from the data loader.
- Type: Optional[str]
utt2spk
Path to the mapping of utterances to speakers.
- Type: Optional[str]
categories
List of categories for the utterances.
- Type: Optional[List]
Parameters:
- train (bool) – Indicates whether the preprocessor is used for training.
- source_scp (Optional[str]) – Path to the source SCP file.
- ref_num (int) – Number of reference utterances to mix.
- dynamic_mixing_gain_db (float) – Maximum gain for dynamic mixing.
- speech_name (str) – Name for the mixed speech output.
- speech_ref_name_prefix (str) – Prefix for reference speech output.
- mixture_source_name (Optional[str]) – Key for source utterances.
- utt2spk (Optional[str]) – Path to utterance-to-speaker mapping.
- categories (Optional[List]) – List of categories for utterances.
Raises: ValueError – If source_scp is not provided.
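To make the `source_scp` and `utt2spk` inputs more concrete, the sketch below reads two Kaldi-style files with one `<key> <value>` pair per line and picks additional utterances to mix with a given one. The helpers `read_two_column_file` and `pick_other_sources` are hypothetical, the file contents are invented for illustration, and the rule that every source in a mixture comes from a different speaker is an assumption of this sketch rather than a documented guarantee.

```python
import random
import tempfile
from pathlib import Path


def read_two_column_file(path):
    """Read a Kaldi-style file with '<key> <value>' on each line into a dict."""
    table = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            key, value = line.split(maxsplit=1)
            table[key] = value
    return table


def pick_other_sources(uid, utt2spk, num_extra, rng, max_tries=1000):
    """Pick num_extra utterances whose speakers differ from uid's and each other's.

    Hypothetical helper: the speaker-disjoint rule is an assumption of this sketch.
    """
    picked, speakers = [], {utt2spk[uid]}
    candidates = sorted(utt2spk)
    for _ in range(max_tries):
        if len(picked) == num_extra:
            break
        cand = rng.choice(candidates)
        if utt2spk[cand] not in speakers:
            picked.append(cand)
            speakers.add(utt2spk[cand])
    if len(picked) < num_extra:
        raise RuntimeError("Could not find enough utterances from distinct speakers")
    return picked


with tempfile.TemporaryDirectory() as tmp:
    scp_path = Path(tmp) / "source.scp"
    spk_path = Path(tmp) / "utt2spk"
    # Invented example entries: utterance ID followed by an audio path or speaker ID.
    scp_path.write_text(
        "utt1 /data/utt1.wav\nutt2 /data/utt2.wav\nutt3 /data/utt3.wav\n",
        encoding="utf-8",
    )
    spk_path.write_text("utt1 spkA\nutt2 spkA\nutt3 spkB\n", encoding="utf-8")
    sources = read_two_column_file(scp_path)
    utt2spk = read_two_column_file(spk_path)

# With ref_num=2, one extra utterance is assumed to be needed besides the input.
extra = pick_other_sources("utt1", utt2spk, num_extra=1, rng=random.Random(0))
```

Here `utt2` shares a speaker with `utt1`, so only `utt3` qualifies as the second source.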
Examples
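>>> import numpy as np
>>> from espnet2.train.preprocessor import DynamicMixingPreprocessor
>>> preprocessor = DynamicMixingPreprocessor(
... train=True,
... source_scp="path/to/source.scp",
... ref_num=2,
... dynamic_mixing_gain_db=3.0
... )
>>> # "example_uid" is assumed to be an entry in source.scp; the input is single-channel audio.
>>> mixed_data = preprocessor(uid="example_uid", data={"speech_ref1": np.random.rand(16000)})
>>> print(mixed_data.keys()) # Expected keys: speech_mix, speech_ref1, speech_ref2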