espnet2.train.dataset.AdapterForSoundScpReader
class espnet2.train.dataset.AdapterForSoundScpReader(loader, dtype: str | None = None, allow_multi_rates: bool = False)
Bases: Mapping
Adapter for SoundScpReader that provides a mapping interface to access audio data.
This adapter is designed to handle the output of SoundScpReader, allowing for flexible retrieval of audio samples while managing data types and sampling rates.
loader
The underlying loader instance, typically a SoundScpReader.
dtype
Desired data type for the audio samples. If None, no conversion is done.
rate
The current sampling rate of the audio data.
allow_multi_rates
If True, allows audio samples with different sampling rates.
- Parameters:
- loader – An instance of SoundScpReader or similar loader.
- dtype (Union[None, str]) – Data type to which the audio samples should be cast.
- allow_multi_rates (bool) – Whether to allow multiple sampling rates in the data.
- Returns: The audio data corresponding to the requested key.
- Return type: np.ndarray
- Raises:
- RuntimeError – If there is a mismatch in sampling rates when allow_multi_rates is set to False, or if an unexpected data type is encountered.
####### Examples
>>> from espnet2.fileio.sound_scp import SoundScpReader
>>> loader = SoundScpReader('path/to/wav.scp')
>>> adapter = AdapterForSoundScpReader(loader, dtype='float32')
>>> audio_data = adapter['utterance_id_A']
>>> print(audio_data.shape)
(NSample, Channel)
NOTE
This adapter assumes that the underlying loader returns either a tuple of (sampling rate, audio array) or just the audio array directly.
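The behaviour described above (tuple or bare array from the loader, optional dtype cast, sampling-rate check) can be sketched as follows. This is a minimal illustration and not the ESPnet source: `SketchSoundAdapter` and `fake_loader` are hypothetical names, the in-memory dict stands in for a SoundScpReader, and initialising `rate` to `None` until the first item is read is an assumption.

```python
from collections.abc import Mapping

import numpy as np


class SketchSoundAdapter(Mapping):
    """Hypothetical stand-in that mimics the documented behaviour."""

    def __init__(self, loader, dtype=None, allow_multi_rates=False):
        self.loader = loader
        self.dtype = dtype
        self.rate = None  # assumption: unknown until an item with a rate is read
        self.allow_multi_rates = allow_multi_rates

    def __len__(self):
        return len(self.loader)

    def __iter__(self):
        return iter(self.loader)

    def __getitem__(self, key):
        value = self.loader[key]
        if isinstance(value, tuple):
            # (sampling rate, audio array) case
            rate, array = value
            if not isinstance(array, np.ndarray):
                raise RuntimeError(f"Unexpected type: {type(array)}")
            if (
                not self.allow_multi_rates
                and self.rate is not None
                and self.rate != rate
            ):
                raise RuntimeError(
                    f"Sampling rates are mismatched: {self.rate} != {rate}"
                )
            self.rate = rate
        elif isinstance(value, np.ndarray):
            # Bare audio array case: no rate information to check
            array = value
        else:
            raise RuntimeError(f"Unexpected type: {type(value)}")

        # Cast only when a dtype was requested
        if self.dtype is not None:
            array = array.astype(self.dtype)
        return array


# Hypothetical in-memory loader used in place of a real wav.scp reader
fake_loader = {
    "utt_a": (16000, np.zeros((4, 1), dtype=np.int16)),
    "utt_b": (8000, np.zeros((2, 1), dtype=np.int16)),
    "utt_c": np.ones(3, dtype=np.int16),
}
adapter = SketchSoundAdapter(fake_loader, dtype="float32")
first = adapter["utt_a"]  # cast to float32; rate recorded as 16000
```

With these sample entries, reading `utt_b` after `utt_a` raises RuntimeError in the sketch because the rates differ. Constructing the adapter with `allow_multi_rates=True` lets both be read.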
keys()
AdapterForSoundScpReader class to adapt the SoundScpReader for use as a mapping.
This class acts as a wrapper around the SoundScpReader, enabling it to behave like a dictionary. It provides access to audio data stored in a sound SCP file, handling potential issues with different sampling rates and data types.
loader
An instance of the loader that provides access to audio data.
dtype
The desired data type for the audio arrays (e.g., 'float32').
rate
The sampling rate of the audio data.
allow_multi_rates
A flag indicating if multiple sampling rates are allowed.
- Parameters:
- loader – The loader instance that retrieves audio data.
- dtype (Union[None, str]) – Optional; desired data type for audio arrays.
- allow_multi_rates (bool) – Optional; if True, allows multiple sampling rates.
- Returns: The audio data corresponding to the provided key.
- Return type: np.ndarray
####### Examples
>>> adapter = AdapterForSoundScpReader(loader)
>>> keys = adapter.keys() # Access keys in the loader
>>> audio_data = adapter['utterance_id_A'] # Retrieve audio data for an ID
- Raises:
- RuntimeError – If the retrieved data is of an unexpected type or if there is a mismatch in sampling rates when allow_multi_rates is False.
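Because the class derives from Mapping, dictionary-style helpers beyond `keys()` are available. The sketch below illustrates this with the standard library only. `DictBackedAdapter` is a hypothetical wrapper, the plain dict stands in for a real loader, and delegating `keys()` to the loader is an assumption about how such a wrapper might be written.

```python
from collections.abc import Mapping


class DictBackedAdapter(Mapping):
    """Hypothetical wrapper used only to illustrate the Mapping protocol."""

    def __init__(self, loader):
        self.loader = loader

    def __getitem__(self, key):
        return self.loader[key]

    def __len__(self):
        return len(self.loader)

    def __iter__(self):
        return iter(self.loader)

    def keys(self):
        # Assumption: keys are taken straight from the wrapped loader
        return self.loader.keys()


# Hypothetical loader contents; real entries would be audio arrays
loader = {"utterance_id_A": [0.0, 0.1], "utterance_id_B": [0.2]}
adapter = DictBackedAdapter(loader)

key_list = list(adapter.keys())          # keys exposed by the loader
has_a = "utterance_id_A" in adapter      # __contains__ comes from Mapping
missing = adapter.get("utterance_id_Z")  # get() returns None for absent keys
pairs = list(adapter.items())            # items() is built on __iter__/__getitem__
```

Only `__getitem__`, `__len__`, and `__iter__` need to be defined. `Mapping` supplies `get`, `items`, `values`, and membership tests on top of them.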