espnet2.train.preprocessor.DynamicMixingPreprocessor

class espnet2.train.preprocessor.DynamicMixingPreprocessor(train: bool, source_scp: str | None = None, ref_num: int = 2, dynamic_mixing_gain_db: float = 0.0, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', mixture_source_name: str | None = None, utt2spk: str | None = None, categories: List | None = None)

Bases: AbsPreprocessor

Dynamic Mixing Preprocessor for speech data.

This preprocessor builds speech mixtures on the fly. It combines ref_num speech utterances, applies a random gain to each source, and returns the mixture together with its reference signals. It is particularly useful for speech enhancement and separation tasks.
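
The mixing step described above can be sketched in plain NumPy. This is a minimal illustration, not the ESPnet implementation: the helper name mix_with_random_gain is hypothetical, and the assumption that each gain is drawn uniformly from [-max_gain_db, +max_gain_db] dB is ours. The output keys follow the defaults of speech_name and speech_ref_name_prefix.

```python
import numpy as np


def mix_with_random_gain(sources, max_gain_db, rng=None,
                         mix_key="speech_mix", ref_prefix="speech_ref"):
    """Scale each equal-length 1-D source by a random gain and sum them.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: one gain per source, uniform in [-max_gain_db, +max_gain_db] dB.
    gains_db = rng.uniform(-max_gain_db, max_gain_db, size=len(sources))
    # Convert decibels to a linear amplitude factor.
    gains = 10.0 ** (gains_db / 20.0)
    scaled = [g * np.asarray(s, dtype=np.float32) for g, s in zip(gains, sources)]
    # Scaled references are stored under speech_ref1, speech_ref2, ...
    out = {f"{ref_prefix}{i + 1}": ref for i, ref in enumerate(scaled)}
    # The mixture is the sample-wise sum of the scaled references.
    out[mix_key] = np.sum(scaled, axis=0)
    return out


rng = np.random.default_rng(0)
sources = [rng.standard_normal(16000), rng.standard_normal(16000)]
mixed = mix_with_random_gain(sources, max_gain_db=3.0, rng=rng)
print(sorted(mixed))
```

With max_gain_db set to 0.0 every gain factor is 1, so the mixture reduces to the plain sum of the sources.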

source_scp

Path to the source SCP file containing speech files.

Type: Optional[str]

ref_num

Number of reference utterances to mix.

Type: int

dynamic_mixing_gain_db

Maximum gain in decibels to apply to each source.

Type: float

speech_name

Key used to identify the mixed speech in the output data.

Type: str

speech_ref_name_prefix

Prefix for reference speech names in the output data.

Type: str

mixture_source_name

Key to select source utterances from the data loader.

Type: Optional[str]

utt2spk

Path to the mapping of utterances to speakers.

Type: Optional[str]
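
To show how source_scp and utt2spk might work together, the sketch below parses two-column "key value" lines and picks extra utterances from speakers other than the target's. It rests on assumptions not stated above: that both files use the Kaldi-style two-column layout, and that the speaker mapping is used to avoid mixing a speaker with themselves. The helpers read_two_column and pick_other_sources and the file paths are hypothetical.

```python
import random


def read_two_column(lines):
    """Parse 'key value' lines (assumed Kaldi-style layout) into a dict."""
    table = {}
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            table[parts[0]] = parts[1]
    return table


def pick_other_sources(uid, utt2spk, num_extra, rng=random):
    """Pick up to num_extra utterances whose speakers differ from uid's
    speaker and from each other (hypothetical selection rule)."""
    used_speakers = {utt2spk[uid]}
    candidates = [u for u in utt2spk if u != uid]
    rng.shuffle(candidates)
    picked = []
    for cand in candidates:
        if len(picked) == num_extra:
            break
        if utt2spk[cand] not in used_speakers:
            picked.append(cand)
            used_speakers.add(utt2spk[cand])
    return picked


# Placeholder entries; real files would be read with open(...).
sources = read_two_column([
    "utt1 /path/to/utt1.wav",
    "utt2 /path/to/utt2.wav",
    "utt3 /path/to/utt3.wav",
])
utt2spk = read_two_column(["utt1 spkA", "utt2 spkA", "utt3 spkB"])
print(pick_other_sources("utt1", utt2spk, num_extra=1))  # ['utt3']
```

Here utt2 is skipped because it shares a speaker with utt1, so utt3 is the only valid choice. With ref_num=2, one extra source is needed in addition to the utterance supplied by the data loader.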

Examples

>>> import numpy as np
>>> from espnet2.train.preprocessor import DynamicMixingPreprocessor
>>> preprocessor = DynamicMixingPreprocessor(
...     train=True,
...     source_scp="path/to/source.scp",
...     ref_num=2,
...     dynamic_mixing_gain_db=3.0,
... )
>>> mixed_data = preprocessor(uid="example_uid", data={"speech_ref1": np.random.rand(16000)})
>>> print(sorted(mixed_data.keys()))
['speech_mix', 'speech_ref1', 'speech_ref2']