espnet2.train.dataset.multi_columns_sound_loader
espnet2.train.dataset.multi_columns_sound_loader(path, float_dtype=None, allow_multi_rates=False)
Loads audio from a multi-column sound scp file.
Each line of the file holds an utterance ID followed by one or more audio file paths, so a single utterance can be associated with several audio files. Different sampling rates are tolerated only when allow_multi_rates is True.
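The sketch below shows how such a file could be parsed into a mapping from utterance ID to audio paths. It is a minimal illustration and assumes whitespace-separated columns with no spaces inside paths. `parse_multi_column_scp` is a hypothetical helper, not part of ESPnet, and it reads only the text. No audio is loaded.

```python
from typing import Dict, List


def parse_multi_column_scp(text: str) -> Dict[str, List[str]]:
    """Map each utterance ID to the audio paths listed after it (hypothetical helper)."""
    table: Dict[str, List[str]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: columns are whitespace-separated
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected an ID and at least one path")
        utt_id, paths = fields[0], fields[1:]
        if utt_id in table:
            raise ValueError(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        table[utt_id] = paths
    return table


# Illustrative file contents: one utterance with three files, one with two.
scp = """\
utterance_id_A /some/where/a1.wav /some/where/a2.wav /some/where/a3.wav
utterance_id_B /some/where/b1.flac /some/where/b2.flac
"""
print(parse_multi_column_scp(scp)["utterance_id_B"])
# ['/some/where/b1.flac', '/some/where/b2.flac']
```

Rejecting duplicate IDs is a design choice made for this sketch. It keeps one line per utterance.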
- Parameters:
- path (str) – Path to the multi-column scp file.
- float_dtype (str, optional) – Data type to which the loaded audio is converted, e.g. "float32" (default: None).
- allow_multi_rates (bool, optional) – Whether to allow audio files with different sampling rates (default: False).
- Returns: A dict-like adapter that maps each utterance ID to its audio data.
- Return type: AdapterForSoundScpReader
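The adapter's role can be pictured with a small stand-in. This sketch assumes the underlying reader yields `(sampling_rate, samples)` pairs. The hypothetical `RateDroppingAdapter` exposes those pairs as a read-only mapping from utterance ID to samples, which matches the `loader["utterance_id_A"]` access pattern in the example. Plain lists stand in for NumPy arrays, and `float_dtype` casting is omitted. This is not the `AdapterForSoundScpReader` source.

```python
from collections.abc import Mapping
from typing import Dict, Iterator, List, Optional, Tuple


class RateDroppingAdapter(Mapping):
    """Hypothetical sketch: view a reader of (rate, samples) pairs as ID -> samples."""

    def __init__(self, reader: Dict[str, Tuple[int, List[float]]]) -> None:
        self.reader = reader
        self.last_rate: Optional[int] = None  # rate of the most recently loaded item

    def __getitem__(self, key: str) -> List[float]:
        rate, samples = self.reader[key]  # assumed reader output: (rate, samples)
        self.last_rate = rate
        return samples  # callers see only the waveform

    def __iter__(self) -> Iterator[str]:
        return iter(self.reader)

    def __len__(self) -> int:
        return len(self.reader)


adapter = RateDroppingAdapter({"utterance_id_A": (16000, [0.0, 0.5, -0.5])})
print(adapter["utterance_id_A"], adapter.last_rate)
# [0.0, 0.5, -0.5] 16000
```

Subclassing `collections.abc.Mapping` requires only `__getitem__`, `__iter__`, and `__len__`. Methods such as `keys()`, `get()`, and `in` come for free.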
Examples
>>> loader = multi_columns_sound_loader("path/to/multi_columns.scp")
>>> audio_data = loader["utterance_id_A"]  # Load audio for a specific utterance
NOTE
The audio signal is normalized to the range of [-1, 1].
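The following worked example shows one way this normalization can be done. It assumes 16-bit PCM input. Dividing each integer sample by 2^15 = 32768 maps the int16 range onto [-1, 1). This is a common convention, and the sketch does not claim to reproduce the exact conversion path the loader uses. `int16_to_unit_float` is a hypothetical helper.

```python
from typing import List

INT16_SCALE = 2 ** 15  # 32768: magnitude of the most negative int16 value


def int16_to_unit_float(samples: List[int]) -> List[float]:
    """Scale 16-bit PCM integers into floats in [-1, 1) (hypothetical helper)."""
    for s in samples:
        if not -INT16_SCALE <= s < INT16_SCALE:
            raise ValueError(f"{s} is outside the int16 range")
    return [s / INT16_SCALE for s in samples]


print(int16_to_unit_float([-32768, 0, 16384, 32767]))
# [-1.0, 0.0, 0.5, 0.999969482421875]
```

Because int16 is asymmetric, the most negative sample maps exactly to -1.0. The largest positive sample stays just below 1.0.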
- Raises: RuntimeError – If an audio file cannot be loaded, or if sampling rates differ while allow_multi_rates is False.
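The sampling-rate rule can be sketched as a small check over `(name, rate)` pairs. `check_sampling_rates` is a hypothetical helper. This page does not say which files the real loader compares: the columns of one line, successive utterances, or both. Treat the sketch as an illustration of the rule only.

```python
from typing import Optional, Sequence, Tuple


def check_sampling_rates(
    items: Sequence[Tuple[str, int]], allow_multi_rates: bool = False
) -> Optional[int]:
    """Return the common sampling rate of (name, rate) pairs (hypothetical helper).

    Raises RuntimeError on the first mismatch unless allow_multi_rates is True,
    in which case None is returned because there is no single shared rate.
    """
    if not items:
        return None
    ref_name, ref_rate = items[0]
    for name, rate in items[1:]:
        if rate != ref_rate:
            if allow_multi_rates:
                return None  # mixed rates are tolerated
            raise RuntimeError(
                f"{name} has sampling rate {rate}, but {ref_name} has {ref_rate}"
            )
    return ref_rate


print(check_sampling_rates([("a1.wav", 16000), ("a2.wav", 16000)]))  # 16000
print(check_sampling_rates([("a1.wav", 16000), ("b1.flac", 8000)], allow_multi_rates=True))  # None
```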