espnet2.train.dataset.variable_columns_sound_loader

espnet2.train.dataset.variable_columns_sound_loader(path, float_dtype=None, allow_multi_rates=False)

Load audio from a list file in which each utterance can have a variable number of audio file columns.

The function reads a file whose lines look like this:

    utterance_id_A /some/where/a1.wav /some/where/a2.wav /some/where/a3.wav
    utterance_id_B /some/where/b1.flac /some/where/b2.flac
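To make the line format concrete, here is a minimal sketch that splits each line on whitespace into an utterance ID and its list of audio paths. It is not ESPnet's own parser: the helper name `parse_variable_columns` is hypothetical, and whitespace splitting assumes that no path contains spaces.

```python
import io
from typing import Dict, List, TextIO


def parse_variable_columns(f: TextIO) -> Dict[str, List[str]]:
    """Map each utterance ID to the audio paths listed on its line.

    Hypothetical helper for illustration only; not part of ESPnet.
    """
    table: Dict[str, List[str]] = {}
    for lineno, line in enumerate(f, start=1):
        fields = line.split()  # assumes paths contain no whitespace
        if not fields:
            continue  # skip blank lines
        utt_id, *paths = fields
        if not paths:
            raise ValueError(f"line {lineno}: {utt_id!r} has no audio paths")
        if utt_id in table:
            raise ValueError(f"line {lineno}: duplicate utterance ID {utt_id!r}")
        table[utt_id] = paths
    return table


example = io.StringIO(
    "utterance_id_A /some/where/a1.wav /some/where/a2.wav /some/where/a3.wav\n"
    "utterance_id_B /some/where/b1.flac /some/where/b2.flac\n"
)
table = parse_variable_columns(example)
print(table["utterance_id_B"])  # ['/some/where/b1.flac', '/some/where/b2.flac']
```

Because the number of columns is not fixed, each value is a list whose length can differ between utterances (three paths for `utterance_id_A`, two for `utterance_id_B`).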

Each line holds one utterance ID followed by one or more audio file paths, and the number of paths can differ from line to line. The loaded audio signals are normalized to the range [-1, 1].
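The [-1, 1] range follows the usual convention for integer PCM: a signed B-bit sample is divided by 2^(B-1), so 16-bit samples are divided by 32768. The sketch below illustrates that convention with a hypothetical `normalize_pcm` helper; it is not ESPnet code, and in practice the scaling is presumably done by the underlying audio reader when it decodes the file.

```python
import numpy as np


def normalize_pcm(samples: np.ndarray) -> np.ndarray:
    """Scale signed integer PCM samples into the range [-1.0, 1.0).

    Hypothetical helper illustrating the convention; not ESPnet code.
    """
    if np.issubdtype(samples.dtype, np.floating):
        # Float samples are assumed to be normalized already.
        return samples
    if not np.issubdtype(samples.dtype, np.signedinteger):
        raise TypeError(f"unsupported sample dtype: {samples.dtype}")
    # For B-bit signed PCM, -iinfo.min == 2 ** (B - 1), e.g. 32768 for int16.
    full_scale = -int(np.iinfo(samples.dtype).min)
    return samples.astype(np.float64) / full_scale


pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
scaled = normalize_pcm(pcm)
# -32768 maps to exactly -1.0 and 16384 to 0.5; 32767 lands just below 1.0
```

Dividing by 2^(B-1) rather than by the largest positive value keeps the mapping symmetric in step size, at the cost of the upper bound being slightly below 1.0.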

Parameters:
- path (str) – The path to the input file containing audio file paths.
- float_dtype (Optional[str]) – The desired data type for the audio data, such as "float32". Default is None, which keeps the original data type.
- allow_multi_rates (bool) – Whether to allow audio with different sampling rates. If False, a RuntimeError is raised when a loaded sampling rate differs from one seen earlier. Default is False.
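Returns: An adapter that provides dictionary-style access to the loaded audio data, keyed by utterance ID.
Return type: AdapterForSoundScpReader
Raises: RuntimeError – If the input file format is incorrect, if the audio files cannot be loaded, or if sampling rates differ while allow_multi_rates is False.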
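How `float_dtype` and `allow_multi_rates` interact can be sketched with a small wrapper. `RateCheckingLoader` and its `read_fn` argument are hypothetical stand-ins for the returned adapter and the underlying reader; the sketch assumes that the rate check compares each loaded item against the previously loaded one and that the dtype cast happens after the check.

```python
from typing import Callable, Dict, Optional, Tuple

import numpy as np


class RateCheckingLoader:
    """Hypothetical stand-in for the adapter's rate and dtype handling."""

    def __init__(
        self,
        read_fn: Callable[[str], Tuple[int, np.ndarray]],
        float_dtype: Optional[str] = None,
        allow_multi_rates: bool = False,
    ):
        self.read_fn = read_fn
        self.float_dtype = float_dtype
        self.allow_multi_rates = allow_multi_rates
        self.rate: Optional[int] = None  # rate of the previously loaded item

    def __getitem__(self, key: str) -> np.ndarray:
        rate, array = self.read_fn(key)
        # Reject a rate change unless mixed rates are explicitly allowed.
        if not self.allow_multi_rates and self.rate is not None and rate != self.rate:
            raise RuntimeError(f"Sampling rates are mismatched: {self.rate} != {rate}")
        self.rate = rate
        if self.float_dtype is not None:
            array = array.astype(self.float_dtype)  # e.g. float64 -> float32
        return array


# Fake reader: utterance ID -> (sampling rate, samples)
fake: Dict[str, Tuple[int, np.ndarray]] = {
    "utt_16k": (16000, np.zeros(4)),
    "utt_8k": (8000, np.zeros(2)),
}
loader = RateCheckingLoader(fake.__getitem__, float_dtype="float32")
print(loader["utt_16k"].dtype)  # float32
try:
    loader["utt_8k"]
except RuntimeError as err:
    print(err)  # Sampling rates are mismatched: 16000 != 8000
```

Passing `allow_multi_rates=True` disables the check, so both fake utterances load without error.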

Examples

>>> from espnet2.train.dataset import variable_columns_sound_loader
>>> loader = variable_columns_sound_loader("path/to/your/file.scp")
>>> audio_data = loader["utterance_id_A"]
>>> print(audio_data)  # a numpy array with the samples loaded for utterance_id_A

NOTE

Every audio file listed must exist and be readable, and each line of the input file must follow the format shown above.
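Since loading fails on unreadable files, it can help to check the list file up front. The following is a minimal sketch using only the standard library; `find_missing_audio` is a hypothetical helper, not an ESPnet function, and it assumes whitespace-separated columns with no spaces inside paths.

```python
import os
import tempfile
from typing import Dict, List


def find_missing_audio(scp_path: str) -> Dict[str, List[str]]:
    """Return, per utterance ID, the listed paths that are not readable files.

    Hypothetical pre-flight check; not part of ESPnet.
    """
    missing: Dict[str, List[str]] = {}
    with open(scp_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            utt_id, *paths = fields
            if not paths:
                raise ValueError(f"line {lineno}: {utt_id!r} has no audio paths")
            bad = [p for p in paths if not (os.path.isfile(p) and os.access(p, os.R_OK))]
            if bad:
                missing[utt_id] = bad
    return missing


# Demo with a throwaway directory: one audio file exists, one does not.
with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "a1.wav")
    open(present, "wb").close()  # empty placeholder; only existence is checked
    absent = os.path.join(tmp, "a2.wav")
    scp = os.path.join(tmp, "wav.scp")
    with open(scp, "w", encoding="utf-8") as f:
        f.write(f"utterance_id_A {present} {absent}\n")
        f.write(f"utterance_id_B {present}\n")
    result = find_missing_audio(scp)
print(result)
```

Only `utterance_id_A` is reported, with just the missing path; this check says nothing about whether an existing file decodes as valid audio.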