espnet2.train.dataset.variable_columns_sound_loader
espnet2.train.dataset.variable_columns_sound_loader(path, float_dtype=None, allow_multi_rates=False)
Load audio files with variable numbers of columns from a specified path.
The function reads a file containing lines formatted as:
    utterance_id_A /some/where/a1.wav /some/where/a2.wav /some/where/a3.wav
    utterance_id_B /some/where/b1.flac /some/where/b2.flac
Each line corresponds to an utterance identifier followed by paths to audio files. The audio signals are normalized to a range of [-1, 1].
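To make the expected file layout concrete, the following sketch writes a small list file in this format and parses it back using only the Python standard library. It does not call ESPnet and is not how the loader is implemented; the helper names `write_multi_column_scp` and `parse_multi_column_scp` are hypothetical, and the audio paths are the placeholder paths from the description above.

```python
import os
import tempfile


def write_multi_column_scp(path, entries):
    """Write one line per utterance: the ID followed by its audio paths."""
    with open(path, "w", encoding="utf-8") as f:
        for utt_id, wav_paths in entries.items():
            f.write(" ".join([utt_id, *wav_paths]) + "\n")


def parse_multi_column_scp(path):
    """Return {utterance_id: [audio_path, ...]} from a whitespace-separated file."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) < 2:
                # Hypothetical validation: an ID with no audio path is malformed.
                raise RuntimeError(
                    f"line {line_no}: expected an ID and at least one path"
                )
            table[fields[0]] = fields[1:]
    return table


# Placeholder entries; the number of columns may differ per utterance.
entries = {
    "utterance_id_A": [
        "/some/where/a1.wav",
        "/some/where/a2.wav",
        "/some/where/a3.wav",
    ],
    "utterance_id_B": ["/some/where/b1.flac", "/some/where/b2.flac"],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    scp_path = os.path.join(tmp_dir, "wav.scp")
    write_multi_column_scp(scp_path, entries)
    parsed = parse_multi_column_scp(scp_path)

print(parsed["utterance_id_B"])
```

Note that this format splits on whitespace, so audio paths containing spaces would not survive the round trip in this sketch.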
- Parameters:
- path (str) – The path to the input file containing audio file paths.
- float_dtype (Optional[str]) – The desired data type for the audio data. Default is None, which keeps the original data type.
- allow_multi_rates (bool) – Flag to allow multiple sampling rates. If False, an error is raised if different sampling rates are detected. Default is False.
- Returns: An adapter that provides access to the loaded audio data.
- Return type: AdapterForSoundScpReader
- Raises: RuntimeError – If the input file format is incorrect or if the audio files cannot be loaded.
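The effect of `allow_multi_rates` can be illustrated with a standalone check over the audio files of one utterance. The sketch below uses only the standard-library `wave` module on generated WAV files; it is not ESPnet's implementation. The helper names `write_silence` and `check_sampling_rates` are hypothetical, and raising `RuntimeError` on a mismatch is an assumption made for the example.

```python
import os
import tempfile
import wave


def write_silence(path, sample_rate, num_frames=160):
    """Write a short mono 16-bit PCM WAV file containing silence."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * num_frames)


def check_sampling_rates(wav_paths, allow_multi_rates=False):
    """Return each file's sampling rate; raise if they differ and mixing is disallowed."""
    rates = []
    for p in wav_paths:
        with wave.open(p, "rb") as w:
            rates.append(w.getframerate())
    if not allow_multi_rates and len(set(rates)) > 1:
        # Assumed error type for this sketch only.
        raise RuntimeError(f"mixed sampling rates: {sorted(set(rates))}")
    return rates


with tempfile.TemporaryDirectory() as tmp_dir:
    path_16k = os.path.join(tmp_dir, "a1.wav")
    path_8k = os.path.join(tmp_dir, "a2.wav")
    write_silence(path_16k, 16000)
    write_silence(path_8k, 8000)

    # Permissive mode: differing rates are reported, not rejected.
    rates = check_sampling_rates([path_16k, path_8k], allow_multi_rates=True)

    # Strict mode (the default): differing rates raise an error.
    try:
        check_sampling_rates([path_16k, path_8k])
        strict_error = None
    except RuntimeError as err:
        strict_error = str(err)

print(rates)
print(strict_error)
```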
Examples
>>> from espnet2.train.dataset import variable_columns_sound_loader
>>> loader = variable_columns_sound_loader("path/to/your/file.scp")
>>> audio_data = loader["utterance_id_A"]
>>> print(audio_data) # Output will be a numpy array of audio samples
NOTE
The audio files must be accessible and correctly formatted as specified above.