espnet2.train.dataset.sound_loader

espnet2.train.dataset.sound_loader(path, float_dtype=None, multi_columns=False, allow_multi_rates=False)

Load sound data from a specified file path.

This function reads a sound file list where each line contains an utterance ID and the corresponding audio file path. The audio signal is normalized to the range [-1, 1]. It uses the SoundScpReader class to handle the file loading and returns an adapter for sound data.
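
The [-1, 1] normalization mentioned above can be illustrated with a minimal, standard-library-only sketch. It assumes 16-bit PCM WAV input and the common convention of dividing integer samples by 32768; whether the underlying reader uses exactly this scaling is an assumption, and `read_pcm16_normalized` is a hypothetical helper, not part of ESPnet.

```python
import struct
import tempfile
import wave
from pathlib import Path


def read_pcm16_normalized(path):
    """Read a 16-bit PCM WAV file and scale its samples into [-1, 1).

    Hypothetical helper for illustration only; not ESPnet code.
    """
    with wave.open(str(path), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_values = w.getnframes() * w.getnchannels()
        raw = w.readframes(w.getnframes())
    ints = struct.unpack(f"<{n_values}h", raw)  # little-endian int16 samples
    return [s / 32768.0 for s in ints]


with tempfile.TemporaryDirectory() as tmp:
    wav_path = Path(tmp) / "a.wav"
    # Write a tiny mono file holding the extreme and zero sample values.
    with wave.open(str(wav_path), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(16000)
        w.writeframes(struct.pack("<3h", -32768, 0, 32767))
    samples = read_pcm16_normalized(wav_path)

print(samples)
```

Under this convention the most negative integer maps to exactly -1.0, while the most positive one lands just below 1.0.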

The expected format of the input file is as follows:

    utterance_id_A /some/where/a.wav
    utterance_id_B /some/where/a.flac
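
To make the list format concrete, here is a minimal sketch that parses such a file into a dictionary. It is not ESPnet's implementation: `parse_scp` is a hypothetical helper, and it assumes whitespace-separated fields with no spaces inside paths.

```python
import tempfile
from pathlib import Path


def parse_scp(path):
    """Map each utterance ID to the list of file paths on its line.

    Hypothetical helper for illustration only; not ESPnet code.
    """
    entries = {}
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split()
            if len(parts) < 2:
                raise RuntimeError(
                    f"{path}:{lineno}: expected '<utt_id> <path>', got {line!r}"
                )
            entries[parts[0]] = parts[1:]
    return entries


with tempfile.TemporaryDirectory() as tmp:
    scp = Path(tmp) / "wav.scp"
    scp.write_text(
        "utterance_id_A /some/where/a.wav\n"
        "utterance_id_B /some/where/a.flac\n",
        encoding="utf-8",
    )
    entries = parse_scp(scp)

print(entries["utterance_id_A"])  # ['/some/where/a.wav']
```

Storing a list of paths per key keeps the same sketch usable for lines that carry more than one path.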

Parameters:
- path (str) – The path to the sound file list.
- float_dtype (str, optional) – The data type to which the audio array should be cast (e.g., 'float32'). If None, no casting is performed.
- multi_columns (bool, optional) – If True, allows each line of the list to contain multiple audio file paths (columns) after the utterance ID.
- allow_multi_rates (bool, optional) – If True, allows loading of audio files with different sampling rates. Otherwise, an error is raised when mismatched rates are detected.
Returns: An adapter that allows access to the loaded sound data.
Return type: AdapterForSoundScpReader
Raises:
- RuntimeError – If the audio file paths are not correctly formatted, or if mismatched sampling rates are detected while allow_multi_rates is set to False.
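
The interaction between allow_multi_rates and the RuntimeError can be sketched as follows. This is only an illustration of the documented behavior; `check_single_rate` and the rate table are hypothetical and do not reflect where or how ESPnet performs the check.

```python
def check_single_rate(rates, allow_multi_rates=False):
    """Return the set of distinct sampling rates, or raise on a mismatch.

    `rates` maps utterance ID -> sampling rate in Hz.
    Hypothetical helper for illustration only; not ESPnet code.
    """
    distinct = set(rates.values())
    if len(distinct) > 1 and not allow_multi_rates:
        raise RuntimeError(f"Mismatched sampling rates: {sorted(distinct)}")
    return distinct


rates = {"utterance_id_A": 16000, "utterance_id_B": 8000}

try:
    check_single_rate(rates)
    raised = False
except RuntimeError as err:
    raised = True
    print(err)  # Mismatched sampling rates: [8000, 16000]

# With the flag enabled, mixed rates are accepted.
mixed = check_single_rate(rates, allow_multi_rates=True)
```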

Examples

>>> from espnet2.train.dataset import sound_loader
>>> sound_data = sound_loader('path/to/wav.scp', float_dtype='float32')
>>> audio_array = sound_data['utterance_id_A']
>>> print(audio_array.shape)
(N,)

NOTE

The SoundScpReader does not support pipe-fashion inputs, such as “cat a.wav |”.