espnet2.train.dataset.kaldi_loader

espnet2.train.dataset.kaldi_loader(path, float_dtype=None, max_cache_fd: int = 0, allow_multi_rates=False)

Load audio data from a Kaldi-style SCP file.

This function reads audio data from the given Kaldi SCP file and returns an adapter that provides access to the loaded audio by utterance ID. The audio signal is normalized to the range [-1, 1]. The output data type can be set with float_dtype, and utterances with differing sampling rates are allowed only when allow_multi_rates is True.
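
As a rough illustration of the normalization mentioned above, the following sketch scales 16-bit signed PCM samples into [-1, 1) by dividing by 2**15. The helper name normalize_int16 and the choice of divisor are assumptions made for illustration; the scaling actually applied by the adapter may differ in detail.

```python
def normalize_int16(samples):
    """Scale 16-bit signed PCM samples to floats in [-1.0, 1.0).

    Hypothetical helper: the divisor 2**15 is an assumption about how
    integer PCM is mapped to floats, not taken from the ESPnet source.
    """
    scale = float(2 ** 15)  # 32768, the magnitude of the most negative int16
    return [s / scale for s in samples]


pcm = [-32768, -16384, 0, 16384, 32767]
normalized = normalize_int16(pcm)
print(min(normalized), max(normalized))
```

Dividing by 2**15 rather than 32767 keeps the most negative sample at exactly -1.0, at the cost of the largest positive sample falling just short of 1.0.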

Parameters:
- path (str) – The path to the Kaldi SCP file.
- float_dtype (str, optional) – The desired data type for the audio array. Defaults to None, which means no type conversion will be applied.
- max_cache_fd (int, optional) – The maximum number of file descriptors to cache. Defaults to 0 (no caching).
- allow_multi_rates (bool, optional) – If True, allows audio samples with different sampling rates. Defaults to False.
Returns: An adapter that provides access to the loaded sound data.
Return type: AdapterForSoundScpReader
Raises:
- RuntimeError – If there are issues loading the data or if the data format is unexpected.
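
One plausible way to picture how allow_multi_rates relates to the RuntimeError listed above is a check that remembers the first sampling rate it sees and rejects any later mismatch. This is only a sketch under that assumption: RateChecker is a hypothetical class, not part of ESPnet, and the real adapter may raise under other conditions as well.

```python
class RateChecker:
    """Hypothetical stand-in for a per-reader sampling-rate check."""

    def __init__(self, allow_multi_rates=False):
        self.allow_multi_rates = allow_multi_rates
        self.rate = None  # most recently accepted sampling rate, if any

    def check(self, rate):
        # Reject a rate that differs from the stored one unless mixing is allowed.
        if (
            not self.allow_multi_rates
            and self.rate is not None
            and rate != self.rate
        ):
            raise RuntimeError(
                f"Sampling rates are mismatched: {self.rate} != {rate}"
            )
        self.rate = rate
        return rate


strict = RateChecker()
strict.check(16000)
try:
    strict.check(8000)  # differs from the first rate, so this raises
except RuntimeError as err:
    print(err)

lenient = RateChecker(allow_multi_rates=True)
lenient.check(16000)
lenient.check(8000)  # accepted because mixed rates are allowed
```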

Examples

>>> from espnet2.train.dataset import kaldi_loader
>>> adapter = kaldi_loader("path/to/scp_file.scp")
>>> audio_data = adapter["utterance_id_A"]
>>> print(audio_data.shape)
(num_samples,)

NOTE

This function is intended for use with audio data in Kaldi’s format, which may include various audio file types and sampling rates.
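
For readers unfamiliar with the format, an SCP file is a plain-text index in which each line pairs an utterance ID with a location, commonly a file path or an archive path followed by a byte offset. The sketch below parses such a listing with the standard library only. The helper parse_scp and the paths in the sample text are made up for illustration, and the sketch does not read any audio.

```python
def parse_scp(text):
    """Map each utterance ID to its entry in a Kaldi-style SCP listing.

    Hypothetical helper for illustration; each non-empty line is assumed
    to have the form "<utterance_id> <path-or-archive-offset>".
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first run of whitespace only, so the value stays intact.
        utt_id, value = line.split(maxsplit=1)
        table[utt_id] = value
    return table


# Sample listing with made-up paths and offsets.
scp_text = """\
utterance_id_A /data/feats.ark:17
utterance_id_B /data/feats.ark:20543
"""
entries = parse_scp(scp_text)
print(entries["utterance_id_A"])
```

The keys of the resulting dictionary correspond to the utterance IDs used to index the adapter in the example above.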