espnet2.train.dataset.multi_columns_sound_loader



espnet2.train.dataset.multi_columns_sound_loader(path, float_dtype=None, allow_multi_rates=False)

Loads audio from a multi-column scp file, in which each utterance ID is followed by one or more audio file paths.

Use this loader when an utterance has several audio files associated with it. Audio with different sampling rates is accepted only when allow_multi_rates is True.
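Each line of a multi-column scp file is assumed here to hold an utterance ID followed by one or more whitespace-separated audio paths. The sketch below parses that layout into a dict to make the format concrete. The helper `parse_multi_columns_scp` is hypothetical and is not part of ESPnet; the real loader reads audio lazily rather than returning raw paths.

```python
from typing import Dict, List


def parse_multi_columns_scp(text: str) -> Dict[str, List[str]]:
    """Map each utterance ID to its list of audio paths (hypothetical helper)."""
    table: Dict[str, List[str]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected an ID and at least one path")
        utt_id, paths = fields[0], fields[1:]
        if utt_id in table:
            raise ValueError(f"line {lineno}: duplicated utterance ID {utt_id!r}")
        table[utt_id] = paths
    return table


scp_text = """\
utterance_id_A /some/where/a1.wav /some/where/a2.wav /some/where/a3.wav
utterance_id_B /some/where/b1.flac /some/where/b2.flac
"""
table = parse_multi_columns_scp(scp_text)
print(table["utterance_id_B"])  # ['/some/where/b1.flac', '/some/where/b2.flac']
```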

Parameters:
- path (str) – Path to the multi-column scp file that lists the audio files for each utterance.
- float_dtype (str, optional) – Data type of the loaded audio samples (default: None).
- allow_multi_rates (bool, optional) – Whether to allow audio files with different sampling rates (default: False).
Returns: An adapter that maps each utterance ID to its audio data.
Return type: AdapterForSoundScpReader
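Because the returned adapter is indexed by utterance ID, it behaves like a read-only mapping. The sketch below shows that idea with `collections.abc.Mapping`, assuming audio is read only when a key is requested and that indexing yields the samples without the sampling rate. `LazyAudioMapping` and `fake_reader` are hypothetical names, and this is not the implementation of AdapterForSoundScpReader.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# A reader takes the paths of one utterance and returns (sampling_rate, samples).
Reader = Callable[[List[str]], Tuple[int, Sequence[float]]]


class LazyAudioMapping(Mapping):
    """Read-only mapping from utterance ID to audio samples (hypothetical sketch)."""

    def __init__(self, table: Dict[str, List[str]], reader: Reader):
        self._table = table
        self._reader = reader

    def __getitem__(self, utt_id: str) -> Sequence[float]:
        # Audio is read only when an utterance is requested;
        # a missing ID raises KeyError from the underlying dict.
        _rate, samples = self._reader(self._table[utt_id])
        return samples  # assumed: only the samples are returned, not the rate

    def __iter__(self) -> Iterator[str]:
        return iter(self._table)

    def __len__(self) -> int:
        return len(self._table)


def fake_reader(paths: List[str]) -> Tuple[int, Sequence[float]]:
    # Stand-in for real audio I/O: one zero sample per path.
    return 16000, [0.0] * len(paths)


mapping = LazyAudioMapping({"utt1": ["a.wav", "b.wav"], "utt2": ["c.wav"]}, fake_reader)
print(mapping["utt1"])  # [0.0, 0.0]
print(list(mapping))    # ['utt1', 'utt2']
```

Subclassing `Mapping` supplies `keys()`, `items()`, `get()`, and `in` for free once `__getitem__`, `__iter__`, and `__len__` are defined.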

Examples

>>> from espnet2.train.dataset import multi_columns_sound_loader
>>> loader = multi_columns_sound_loader("path/to/multi_columns.scp")
>>> audio_data = loader["utterance_id_A"]  # Load the audio for one utterance

NOTE

The audio signal is normalized to the range [-1, 1].
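For 16-bit PCM audio, a common way to obtain this range is to divide each integer sample by 32768, which maps the int16 range onto [-1, 1). The sketch below illustrates that scaling with the standard-library `wave` module. It is not taken from ESPnet, which may rely on a different audio backend, and `read_wav_normalized` is a hypothetical helper that only handles 16-bit mono files.

```python
import io
import struct
import wave
from typing import List, Tuple


def read_wav_normalized(fileobj) -> Tuple[int, List[float]]:
    """Read 16-bit mono PCM WAV and scale samples to [-1, 1) (hypothetical helper)."""
    with wave.open(fileobj, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    ints = struct.unpack(f"<{count}h", frames)  # little-endian signed 16-bit
    return rate, [x / 32768.0 for x in ints]


# Build a tiny in-memory WAV holding the extreme int16 values and zero.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(16000)
    wav.writeframes(struct.pack("<3h", -32768, 0, 32767))
buffer.seek(0)

rate, samples = read_wav_normalized(buffer)
print(rate, samples[0], samples[1])  # 16000 -1.0 0.0
```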

Raises: RuntimeError – If the audio files cannot be loaded, or if the sampling rates do not match while allow_multi_rates is False.
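The mismatch error implies that the loader remembers a sampling rate and compares later ones against it. The sketch below shows one way such a check could work; the class `SamplingRateTracker` is hypothetical, and this page does not say whether ESPnet compares rates within one utterance's files, across utterances, or both.

```python
from typing import Optional


class SamplingRateTracker:
    """Remember the last sampling rate seen and flag mismatches (hypothetical)."""

    def __init__(self, allow_multi_rates: bool = False):
        self.allow_multi_rates = allow_multi_rates
        self.rate: Optional[int] = None

    def check(self, rate: int) -> None:
        if not self.allow_multi_rates and self.rate is not None and self.rate != rate:
            raise RuntimeError(f"Sampling rates are mismatched: {self.rate} != {rate}")
        self.rate = rate  # remember the most recent accepted rate


strict = SamplingRateTracker()
strict.check(16000)
try:
    strict.check(8000)
except RuntimeError as err:
    message = str(err)
    print(message)  # Sampling rates are mismatched: 16000 != 8000

lenient = SamplingRateTracker(allow_multi_rates=True)
lenient.check(16000)
lenient.check(8000)  # no error when multiple rates are allowed
```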