espnet2.train.dataset.AbsDataset

class espnet2.train.dataset.AbsDataset

Bases: Dataset, ABC

Abstract base class for datasets in ESPnet.

This class defines the basic interface for datasets used in ESPnet. It requires implementation of methods for checking dataset names, retrieving dataset names, and accessing individual items by unique ID.

has_name(name)

Checks if the dataset contains a specific name.

names()

Returns a tuple of all dataset names.

__getitem__(uid)

Retrieves the data corresponding to the given unique ID.

Raises: NotImplementedError – If an abstract method is called without an implementation.

######### Examples

This is an abstract class and cannot be instantiated directly. Derived classes must implement the abstract methods.

class MyDataset(AbsDataset):
    def has_name(self, name):
        ...

    def names(self):
        ...

    def __getitem__(self, uid):
        ...
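As a fuller illustration of the skeleton above, the following sketch implements the three methods for an in-memory table. To stay runnable without ESPnet or PyTorch installed, it uses a hypothetical stand-in base class, AbsDatasetSketch, that only mirrors the interface described here. InMemoryDataset and the (uid, dict) return value of __getitem__ are likewise assumptions made for the example, not part of the documented API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class AbsDatasetSketch(ABC):
    """Hypothetical stand-in that mirrors the AbsDataset interface."""

    @abstractmethod
    def has_name(self, name: str) -> bool:
        raise NotImplementedError

    @abstractmethod
    def names(self) -> Tuple[str, ...]:
        raise NotImplementedError

    @abstractmethod
    def __getitem__(self, uid: str) -> Tuple[str, Dict[str, Any]]:
        raise NotImplementedError


class InMemoryDataset(AbsDatasetSketch):
    """Hypothetical subclass storing {name: {uid: value}} in memory."""

    def __init__(self, tables: Dict[str, Dict[str, List[float]]]):
        self._tables = tables

    def has_name(self, name: str) -> bool:
        return name in self._tables

    def names(self) -> Tuple[str, ...]:
        return tuple(self._tables)

    def __getitem__(self, uid: str) -> Tuple[str, Dict[str, Any]]:
        # Collect the entry stored under `uid` for every data name.
        # Returning (uid, dict) is an assumption of this sketch.
        return uid, {name: table[uid] for name, table in self._tables.items()}


dataset = InMemoryDataset({"input": {"utt1": [0.1, 0.2]}, "text": {"utt1": [3.0]}})
uid, sample = dataset["utt1"]

try:
    AbsDatasetSketch()  # the abstract methods are not implemented
except TypeError as err:
    instantiation_error = err
```

With the real library, the subclass would inherit from AbsDataset instead. Instantiating the abstract stand-in raises TypeError, which is Python's standard behaviour for a class that still has unimplemented abstract methods.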

abstract has_name(name) → bool

Checks if a dataset has a specific name.

Parameters:name (str) – The name to check for existence in the dataset.
Returns: True if the name exists in the dataset, False otherwise.
Return type: bool

######### Examples

>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound')])
>>> dataset.has_name('input')
True
>>> dataset.has_name('output')
False
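One plausible use of has_name is to verify, before training starts, that a dataset provides every data name a task expects. The sketch below assumes a hypothetical helper, missing_names, and a hypothetical TinyDataset stand-in that exposes only has_name() and names(); neither is part of ESPnet.

```python
from typing import Iterable, List, Tuple


class TinyDataset:
    """Hypothetical object exposing only has_name() and names()."""

    def __init__(self, names: Tuple[str, ...]):
        self._names = names

    def has_name(self, name: str) -> bool:
        return name in self._names

    def names(self) -> Tuple[str, ...]:
        return self._names


def missing_names(dataset, required: Iterable[str]) -> List[str]:
    """Return the required names that the dataset does not provide."""
    return [name for name in required if not dataset.has_name(name)]


dataset = TinyDataset(("input",))
missing = missing_names(dataset, ["input", "output"])
```

Because the helper relies only on has_name(), it should work with any object that follows the interface described here.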

abstract names() → Tuple[str, ...]

Returns a tuple of all dataset names.

The AdapterForSoundScpReader class provides a mapping interface for accessing audio data stored in the Sound SCP file format. The adapter gives uniform access to the audio data and guards against inconsistent sampling rates across files.

loader

The underlying loader that retrieves audio data.

dtype

The desired data type of the audio array (e.g., 'float32').

Type: str or None

rate

The sampling rate of the audio.

Type: int or None

allow_multi_rates

Flag to allow different sampling rates.

Type: bool

Parameters:
- loader – A loader instance that implements the required interface.
- dtype – Optional; the data type for the audio array.
- allow_multi_rates – Optional; whether to allow audio files with different rates.
Returns: The audio data corresponding to the provided key.
Return type: np.ndarray

######### Examples

>>> audio_reader = AdapterForSoundScpReader(loader)
>>> audio_data = audio_reader['utterance_id_A']
>>> print(audio_data.shape)
(Nsample, Channel) or (Nsample,)

Raises: RuntimeError – If the data format is unexpected or if sampling rates are mismatched.
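The sampling-rate check described above can be sketched roughly as follows. This is not the ESPnet implementation: RateCheckingAdapter is a hypothetical class, the loader is assumed to be a plain dict returning (rate, samples) tuples, samples are plain lists rather than np.ndarray, and dtype conversion is omitted.

```python
from collections.abc import Mapping
from typing import Dict, List, Optional, Tuple


class RateCheckingAdapter(Mapping):
    """Hypothetical mapping adapter that remembers the last sampling rate."""

    def __init__(
        self,
        loader: Dict[str, Tuple[int, List[float]]],
        allow_multi_rates: bool = False,
    ):
        self.loader = loader
        self.rate: Optional[int] = None  # unknown until the first access
        self.allow_multi_rates = allow_multi_rates

    def __len__(self) -> int:
        return len(self.loader)

    def __iter__(self):
        return iter(self.loader)

    def __getitem__(self, key: str) -> List[float]:
        rate, samples = self.loader[key]  # assumed (rate, samples) layout
        if (
            not self.allow_multi_rates
            and self.rate is not None
            and rate != self.rate
        ):
            raise RuntimeError(
                f"Sampling rates are mismatched: {self.rate} != {rate}"
            )
        self.rate = rate
        return samples


loader = {"utt_a": (16000, [0.0, 0.1]), "utt_b": (8000, [0.2])}

strict = RateCheckingAdapter(loader)
first = strict["utt_a"]
try:
    strict["utt_b"]  # 8000 Hz after 16000 Hz
except RuntimeError as err:
    mismatch = str(err)

relaxed = RateCheckingAdapter(loader, allow_multi_rates=True)
both = [relaxed["utt_a"], relaxed["utt_b"]]
```

In this sketch the strict adapter rejects the second utterance because its rate differs from the first one seen, while the adapter created with allow_multi_rates=True returns both.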