espnet2.train.dataset.AdapterForSoundScpReader


class espnet2.train.dataset.AdapterForSoundScpReader(loader, dtype: str | None = None, allow_multi_rates: bool = False)

Bases: Mapping

Adapter for SoundScpReader that provides a mapping interface to access audio data.

This adapter is designed to handle the output of SoundScpReader, allowing for flexible retrieval of audio samples while managing data types and sampling rates.

loader

The underlying loader instance, typically a SoundScpReader.

dtype

Desired data type for the audio samples. If None, no conversion is done.

rate

The current sampling rate of the audio data.

allow_multi_rates

If True, allows audio samples with different sampling rates.

Parameters:
- loader – An instance of SoundScpReader or similar loader.
- dtype (Union[None, str]) – Data type to which the audio samples should be cast.
- allow_multi_rates (bool) – Whether to allow multiple sampling rates in the data.
Returns: The audio data corresponding to the requested key.
Return type: np.ndarray
Raises:
- RuntimeError – If there is a mismatch in sampling rates when allow_multi_rates is set to False, or if an unexpected data type is encountered.
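The sampling-rate check described above can be sketched with a small stand-in class. This is a minimal illustration, not the ESPnet implementation: `RateCheckingAdapter` and `fake_loader` are hypothetical names, the loader is a plain dict of `(rate, array)` tuples, and it is an assumption here that the rate seen on the previous access is the one each new item is compared against.

```python
import numpy as np


class RateCheckingAdapter:
    """Hypothetical stand-in that mimics the documented rate check."""

    def __init__(self, loader, allow_multi_rates=False):
        self.loader = loader
        self.rate = None  # unknown until the first item is read
        self.allow_multi_rates = allow_multi_rates

    def __getitem__(self, key):
        rate, array = self.loader[key]
        # Assumption: compare against the rate remembered from earlier reads.
        if (
            not self.allow_multi_rates
            and self.rate is not None
            and self.rate != rate
        ):
            raise RuntimeError(
                f"Sampling rates are mismatched: {self.rate} != {rate}"
            )
        self.rate = rate
        return array


# A dict stands in for SoundScpReader: key -> (sampling rate, samples).
fake_loader = {
    "utt_16k": (16000, np.zeros(160, dtype=np.int16)),
    "utt_8k": (8000, np.zeros(80, dtype=np.int16)),
}

strict = RateCheckingAdapter(fake_loader)
strict["utt_16k"]
try:
    strict["utt_8k"]  # different rate while allow_multi_rates is False
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

lenient = RateCheckingAdapter(fake_loader, allow_multi_rates=True)
lenient["utt_16k"]
lenient["utt_8k"]  # accepted; the tracked rate is simply updated
```

In this sketch the strict adapter keeps its first rate after the failed access, while the lenient one ends up tracking the rate of the last item read.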

####### Examples

>>> from espnet2.fileio.sound_scp import SoundScpReader
>>> from espnet2.train.dataset import AdapterForSoundScpReader
>>> loader = SoundScpReader('path/to/wav.scp')
>>> adapter = AdapterForSoundScpReader(loader, dtype='float32')
>>> audio_data = adapter['utterance_id_A']
>>> print(audio_data.shape)
(NSample, Channel)

NOTE

This adapter assumes that the underlying loader returns either a tuple of (sampling rate, audio array) or just the audio array directly.
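The two return shapes mentioned in the note can be handled along the following lines. This is a sketch under stated assumptions rather than the library's code: `to_array` is a hypothetical helper, and only the two documented cases (a `(sampling rate, audio array)` tuple, or a bare array) are covered.

```python
import numpy as np


def to_array(retval, dtype=None):
    """Hypothetical helper: normalise a loader result to (rate, array)."""
    if isinstance(retval, tuple):
        # Documented tuple case: (sampling rate, audio array).
        rate, array = retval
    else:
        # Bare-array case: no sampling rate is available.
        rate, array = None, retval
    if not isinstance(array, np.ndarray):
        raise RuntimeError(f"Unexpected type: {type(array)}")
    if dtype is not None:
        # Cast only when a dtype was requested; otherwise leave as loaded.
        array = array.astype(dtype)
    return rate, array


rate, mono = to_array((16000, np.zeros(160, dtype=np.int16)), dtype="float32")
_, stereo = to_array(np.zeros((160, 2), dtype=np.int16))
```

With `dtype=None` the array keeps whatever type the loader produced, which matches the "no conversion is done" wording in the attribute description.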

keys()

Returns the keys of the underlying loader, so that AdapterForSoundScpReader can be used as a mapping over a SoundScpReader.

This class acts as a wrapper around the SoundScpReader, enabling it to behave like a dictionary. It provides access to audio data stored in a sound SCP file, handling potential issues with different sampling rates and data types.

loader

An instance of the loader that provides access to audio data.

dtype

The desired data type for the audio arrays (e.g., 'float32').

rate

The sampling rate of the audio data.

allow_multi_rates

A flag indicating if multiple sampling rates are allowed.

Parameters:
- loader – The loader instance that retrieves audio data.
- dtype (Union[None, str]) – Optional; desired data type for audio arrays.
- allow_multi_rates (bool) – Optional; if True, allows multiple sampling rates.
Returns: The audio data corresponding to the provided key.
Return type: np.ndarray

####### Examples

>>> adapter = AdapterForSoundScpReader(loader)
>>> keys = adapter.keys()  # Access keys in the loader
>>> audio_data = adapter['utterance_id_A']  # Retrieve audio data for an ID

Raises:
- RuntimeError – If the retrieved data is of an unexpected type, or if there is a mismatch in sampling rates when allow_multi_rates is False.
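Because the class derives from Mapping, dictionary-style helpers follow from a small set of methods. The sketch below illustrates that general pattern with `collections.abc.Mapping`; `MappingAdapter` and `fake_loader` are hypothetical stand-ins, and the real class may define more methods than shown here.

```python
from collections.abc import Mapping

import numpy as np


class MappingAdapter(Mapping):
    """Hypothetical wrapper that delegates to an underlying loader."""

    def __init__(self, loader):
        self.loader = loader

    def __getitem__(self, key):
        return self.loader[key]

    def __iter__(self):
        return iter(self.loader)

    def __len__(self):
        return len(self.loader)


# A dict stands in for the loader: utterance ID -> samples.
fake_loader = {
    "utterance_id_A": np.zeros(160),
    "utterance_id_B": np.zeros(320),
}
adapter = MappingAdapter(fake_loader)

# keys(), items() and the `in` operator come from the Mapping base class.
ids = list(adapter.keys())
lengths = {utt_id: len(wave) for utt_id, wave in adapter.items()}
has_a = "utterance_id_A" in adapter
```

Defining only `__getitem__`, `__iter__` and `__len__` is enough for `Mapping` to supply the remaining read-only dictionary methods.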