espnet2.fileio.multi_sound_scp.MultiSoundScpReader

class espnet2.fileio.multi_sound_scp.MultiSoundScpReader(fname, dtype=None, always_2d: bool = False, stack_axis=0, pad=nan)

Bases: Mapping

Reader class for ‘wav.scp’ containing multiple sounds.

This class is useful when loading variable numbers of audio files for different samples. It reads a ‘wav.scp’ file where each line maps a unique key to multiple audio file paths. The audio files associated with a key are loaded and can be stacked along a specified axis, with support for padding to ensure uniform length.

fname

Path to the ‘wav.scp’ file.

Type: str

dtype

Data type for audio arrays, defaults to None.

Type: str or None

always_2d

If True, ensures that audio arrays are always 2D.

Type: bool

stack_axis

Axis along which to stack the audio arrays.

Type: int

pad

Value used for padding shorter arrays.

Type: float
Parameters:
- fname (str) – Path to the ‘wav.scp’ file.
- dtype (str or None) – Data type for audio arrays.
- always_2d (bool) – If True, ensures that audio arrays are always 2D.
- stack_axis (int) – Axis along which to stack the audio arrays.
- pad (float) – Value used for padding shorter arrays.
Returns: When the reader is indexed with a key (reader[key]), a tuple containing the sampling rate and the stacked audio arrays.
Return type: Tuple[int, np.ndarray]
Raises: KeyError – If the specified key does not exist in the data.

Example

wav.scp is a text file that looks like the following:

key1 /some/path/a1.wav /another/path/a2.wav /yet/another/path/a3.wav
key2 /some/path/b1.wav /another/path/b2.wav
key3 /some/path/c1.wav /another/path/c2.wav /yet/another/path/c3.wav
key4 /some/path/d1.wav
…

>>> reader = MultiSoundScpReader('wav.scp', stack_axis=0)
>>> rate, stacked_arrays = reader['key1']
>>> assert stacked_arrays.shape[0] == 3

Note

All audio files in a sample must have the same sampling rate. Within a sample, audio files of different lengths are right-padded to the same length with the pad value (np.nan by default).
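
The file layout described above can be illustrated with a small parser. This is a minimal sketch rather than ESPnet's own loader: `parse_multi_column_scp` is a hypothetical helper, and it assumes whitespace-separated fields, so paths containing spaces are not supported.

```python
def parse_multi_column_scp(text):
    """Map each key to its list of paths (hypothetical helper, not ESPnet code)."""
    data = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # assumption: no whitespace inside a path
        if not fields:
            continue  # skip blank lines
        key, paths = fields[0], fields[1:]
        if not paths:
            raise ValueError(f"line {line_number}: key {key!r} has no paths")
        if key in data:
            raise ValueError(f"line {line_number}: duplicate key {key!r}")
        data[key] = paths
    return data


scp_text = """\
key1 /some/path/a1.wav /another/path/a2.wav /yet/another/path/a3.wav
key2 /some/path/b1.wav /another/path/b2.wav
key4 /some/path/d1.wav
"""
table = parse_multi_column_scp(scp_text)
print(table["key1"])  # the three paths listed for key1
```

Each key may carry a different number of paths, which is why the loaded arrays need padding before they can be stacked.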

get_path(key)

Retrieve the file paths associated with a given key.

This method looks up the specified key in the loaded data and returns the corresponding list of audio file paths. It is useful for accessing the raw paths without loading the audio data.

Parameters: key (str) – The key for which to retrieve the file paths.
Returns: A list of file paths associated with the specified key.
Return type: List[str]
Raises: KeyError – If the key is not found in the data.

Example

>>> reader = MultiSoundScpReader('wav.scp')
>>> paths = reader.get_path('key1')
>>> assert len(paths) == 3
>>> assert paths == ['/some/path/a1.wav',
...                  '/another/path/a2.wav',
...                  '/yet/another/path/a3.wav']

Note

The returned paths correspond to the audio files listed under the specified key in the ‘wav.scp’ file.

keys()

Returns the keys in the data.

This class is useful for loading variable numbers of audio files associated with different samples. Each key in the ‘wav.scp’ file corresponds to a list of audio file paths that can be read and processed together.

The ‘wav.scp’ file should be formatted as follows:

key1 /some/path/a1.wav /another/path/a2.wav /yet/another/path/a3.wav
key2 /some/path/b1.wav /another/path/b2.wav
key3 /some/path/c1.wav /another/path/c2.wav /yet/another/path/c3.wav
key4 /some/path/d1.wav
…

Example

>>> reader = MultiSoundScpReader('wav.scp', stack_axis=0)
>>> rate, stacked_arrays = reader['key1']
>>> assert stacked_arrays.shape[0] == 3

Note

All audio files in a sample must have the same sampling rate. Audio files of different lengths are right-padded to the same length with the pad value (np.nan by default).

fname

The filename of the ‘wav.scp’ file.

Type: str

dtype

Data type of the audio arrays (e.g., ‘float32’).

Type: str or None

always_2d

If True, ensures all arrays are at least 2D.

Type: bool

stack_axis

The axis along which to stack the arrays.

Type: int

pad

Value used for padding shorter arrays.

Type: float

data

A dictionary mapping keys to lists of audio file paths.

Type: dict
Parameters:
- fname (str) – The filename of the ‘wav.scp’ file.
- dtype (str or None, optional) – Data type of the audio arrays. Defaults to None.
- always_2d (bool, optional) – If True, ensures all arrays are at least 2D. Defaults to False.
- stack_axis (int, optional) – The axis along which to stack the arrays. Defaults to 0.
- pad (float, optional) – Value used for padding shorter arrays. Defaults to np.nan.
Raises:
- KeyError – If the requested key is not found in the data.
- AssertionError – If the sampling rates of audio files do not match.
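
The sampling-rate requirement behind the AssertionError can be sketched with the standard library alone. This is an illustration under assumptions, not the class's actual code path: `write_silence` and `common_rate` are hypothetical helpers, and the `wave` module stands in for whatever audio backend the reader uses.

```python
import os
import tempfile
import wave


def write_silence(path, rate, n_frames):
    """Write a mono 16-bit WAV file of silence (test fixture for this sketch)."""
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(rate)
        f.writeframes(b"\x00\x00" * n_frames)


def common_rate(paths):
    """Return the shared sampling rate, or fail if the files disagree."""
    prev_rate = None
    for path in paths:
        with wave.open(path, "rb") as f:
            rate = f.getframerate()
        assert prev_rate is None or rate == prev_rate, (
            f"{path}: rate {rate} differs from {prev_rate}"
        )
        prev_rate = rate
    return prev_rate


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.wav")
    b = os.path.join(tmp, "b.wav")
    c = os.path.join(tmp, "c.wav")
    write_silence(a, 16000, 100)
    write_silence(b, 16000, 50)   # same rate, different length: allowed
    write_silence(c, 8000, 50)    # different rate: rejected
    same_rate = common_rate([a, b])
    try:
        common_rate([a, c])
        mismatch_detected = False
    except AssertionError:
        mismatch_detected = True
```

Differing lengths are acceptable because padding handles them; differing rates are not, since a single rate is returned for the whole stack.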

__getitem__(key)

Retrieves the audio data for the given key.

pad_to_same_length(arrays, pad=np.nan, axis=0)

Pads arrays to the same length.

get_path(key)

Returns the list of audio file paths for the given key.

__contains__(item)

Checks if the item is a key in the data.

__len__()

Returns the number of keys in the data.

__iter__()

Returns an iterator over the keys in the data.

keys()

Returns the keys in the data.
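
Because the class derives from Mapping, the dictionary-style methods listed above fit together in a familiar way. The following is a rough sketch using a hypothetical stand-in class, `PathTable`, that returns paths instead of loading audio. It relies on the mixin methods that `collections.abc.Mapping` derives from `__getitem__`, `__len__`, and `__iter__`; whether the real class defines `keys()` and `__contains__` itself or inherits them is not shown here.

```python
from collections.abc import Mapping


class PathTable(Mapping):
    """Hypothetical stand-in covering only the path-lookup side of the reader."""

    def __init__(self, data):
        self.data = dict(data)

    def __getitem__(self, key):
        # The real reader loads and stacks audio here; this sketch returns paths.
        return self.data[key]

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        return iter(self.data)

    def get_path(self, key):
        return self.data[key]


table = PathTable(
    {
        "key1": ["/some/path/a1.wav", "/another/path/a2.wav"],
        "key2": ["/some/path/b1.wav"],
    }
)
key_list = list(table.keys())   # keys() comes from the Mapping mixin
has_key1 = "key1" in table      # so does __contains__
```

A missing key raises KeyError from the underlying dictionary, both for indexing and for `get_path`.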

pad_to_same_length(arrays, pad=nan, axis=0)

Right-pad arrays to the same length.

This method takes a list of numpy arrays and pads them to the length of the longest array along the specified axis. The padding is done using the specified value.

Parameters:
- arrays (List[np.ndarray]) – List of arrays to pad.
- pad (float) – Value to pad with. Defaults to np.nan.
- axis (int) – Axis along which to pad the arrays. Defaults to 0.
Returns: Padded array containing the input arrays stacked along the specified axis.
Return type: np.ndarray

Example

>>> import numpy as np
>>> reader = MultiSoundScpReader('wav.scp')
>>> a1 = np.array([1, 2, 3])
>>> a2 = np.array([4, 5])
>>> padded = reader.pad_to_same_length([a1, a2], pad=0, axis=0)
>>> print(padded)
[[1 2 3]
 [4 5 0]]

Note

This method assumes that every input array is at least 1D.

Raises: ValueError – If any array in the input list has a length of 0 along the specified axis.
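
The right-padding behaviour described above can be approximated in a few lines of NumPy. This is a minimal sketch, not the method's actual implementation: `right_pad_and_stack` is a hypothetical function, and it assumes each array is padded along its first (time) axis and then stacked along `axis`. Inputs are converted to float so that np.nan is a valid fill value.

```python
import numpy as np


def right_pad_and_stack(arrays, pad=np.nan, axis=0):
    """Pad each array at the end of axis 0, then stack (illustrative only)."""
    if not arrays:
        raise ValueError("need at least one array")
    # Float dtype so that NaN can be stored as padding.
    arrays = [np.asarray(a, dtype=float) for a in arrays]
    max_len = max(a.shape[0] for a in arrays)
    padded = []
    for a in arrays:
        # Pad only at the end of the first axis; leave other axes untouched.
        width = [(0, max_len - a.shape[0])] + [(0, 0)] * (a.ndim - 1)
        padded.append(np.pad(a, width, mode="constant", constant_values=pad))
    return np.stack(padded, axis=axis)


stacked = right_pad_and_stack([np.array([1, 2, 3]), np.array([4, 5])])
print(stacked.shape)
```

With NaN padding, downstream code can recover each original length by locating the first NaN in a row, provided the audio itself contains no NaN values.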