espnet2.train.dataset.AbsDataset
class espnet2.train.dataset.AbsDataset
Bases: Dataset, ABC
Abstract base class for datasets in ESPnet.
This class defines the basic interface for datasets used in ESPnet. It requires implementation of methods for checking dataset names, retrieving dataset names, and accessing individual items by unique ID.
has_name(name)
Checks if the dataset contains a specific name.
names()
Returns a tuple of all dataset names.
__getitem__(uid)
Retrieves the data corresponding to the given unique ID.
- Raises: NotImplementedError – If any of the abstract methods are called without an implementation.
######### Examples
This is an abstract class and cannot be instantiated directly. Derived classes must implement the abstract methods.
class MyDataset(AbsDataset):
    def has_name(self, name):
        …

    def names(self):
        …

    def __getitem__(self, uid):
        …
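The skeleton above can be filled in along the following lines. This is a minimal sketch, not ESPnet code: `AbsDatasetSketch` is a local stand-in that mirrors the three abstract methods so the example runs without ESPnet installed, and `InMemoryDataset`, its dict-of-dicts storage, and the `(uid, data)` return shape of `__getitem__` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Tuple


class AbsDatasetSketch(ABC):
    """Local stand-in for the interface described above (not the ESPnet class)."""

    @abstractmethod
    def has_name(self, name) -> bool:
        raise NotImplementedError

    @abstractmethod
    def names(self) -> Tuple[str, ...]:
        raise NotImplementedError

    @abstractmethod
    def __getitem__(self, uid):
        raise NotImplementedError


class InMemoryDataset(AbsDatasetSketch):
    """Hypothetical dataset backed by a dict of the form {name: {uid: value}}."""

    def __init__(self, tables: Dict[str, Dict[str, Any]]):
        self._tables = tables

    def has_name(self, name) -> bool:
        return name in self._tables

    def names(self) -> Tuple[str, ...]:
        return tuple(self._tables)

    def __getitem__(self, uid):
        # Assumed return shape: the uid plus one value per data name.
        return uid, {name: table[uid] for name, table in self._tables.items()}


dataset = InMemoryDataset(
    {"input": {"utt1": [0.1, 0.2]}, "text": {"utt1": "hello"}}
)
print(dataset.has_name("input"))
print(dataset.names())
print(dataset["utt1"])

# The abstract stand-in itself cannot be instantiated.
try:
    AbsDatasetSketch()
except TypeError as err:
    print("cannot instantiate:", type(err).__name__)
```

With Python's `abc` machinery, instantiating a class that still has unimplemented abstract methods fails with `TypeError` at construction time, before any method body is reached.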
abstract has_name(name) → bool
Checks if a dataset has a specific name.
- Parameters: name (str) – The name to check for existence in the dataset.
- Returns: True if the name exists in the dataset, False otherwise.
- Return type: bool
######### Examples
>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound')])
>>> dataset.has_name('input')
True
>>> dataset.has_name('output')
False
abstract names() → Tuple[str, ...]
The AdapterForSoundScpReader class provides a mapping interface for accessing audio data read from a sound SCP file. The adapter lets audio data be accessed in a uniform way and guards against inconsistent sampling rates across files.
loader
The underlying loader that retrieves audio data.
dtype
The desired data type of the audio array (e.g., ‘float32’).
- Type: str or None
rate
The sampling rate of the audio.
- Type: int or None
allow_multi_rates
Flag to allow different sampling rates.
- Type: bool
Parameters:
- loader – A loader instance that implements the required interface.
- dtype – Optional; the data type for the audio array.
- allow_multi_rates – Optional; whether to allow audio files with different rates.
Returns: The audio data corresponding to the provided key.
Return type: np.ndarray
######### Examples
>>> audio_reader = AdapterForSoundScpReader(loader)
>>> audio_data = audio_reader['utterance_id_A']
>>> print(audio_data.shape)
(NSample, Channel) or (NSample,)
- Raises: RuntimeError – If the data format is unexpected or if sampling rates are mismatched.
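The behavior described above (dtype conversion, remembering the sampling rate, and rejecting mismatched rates unless `allow_multi_rates` is set) can be sketched as follows. This is an illustrative sketch, not the ESPnet implementation: the class name `RateCheckingAdapter` is hypothetical, the loader is assumed to be a plain mapping from key to a `(rate, array)` pair, and the `"float32"` default for `dtype` is an assumption.

```python
import collections.abc

import numpy as np


class RateCheckingAdapter(collections.abc.Mapping):
    """Hypothetical sketch of a rate-checking audio adapter."""

    def __init__(self, loader, dtype="float32", allow_multi_rates=False):
        self.loader = loader
        self.dtype = dtype
        self.rate = None  # filled in from the first item that is read
        self.allow_multi_rates = allow_multi_rates

    def __len__(self):
        return len(self.loader)

    def __iter__(self):
        return iter(self.loader)

    def __getitem__(self, key):
        item = self.loader[key]
        # Assumed loader format: a (rate, array) pair.
        if not (isinstance(item, tuple) and len(item) == 2):
            raise RuntimeError(f"Unexpected data format for {key!r}")
        rate, array = item
        if not isinstance(rate, int) or not isinstance(array, np.ndarray):
            raise RuntimeError(f"Unexpected data format for {key!r}")
        # Reject a rate that differs from the one seen earlier.
        if (
            not self.allow_multi_rates
            and self.rate is not None
            and self.rate != rate
        ):
            raise RuntimeError(
                f"Sampling rates are mismatched: {self.rate} != {rate}"
            )
        self.rate = rate
        if self.dtype is not None:
            array = array.astype(self.dtype)
        return array


# Usage with an in-memory stand-in for the loader.
loader = {
    "utt_a": (16000, np.zeros(4, dtype=np.int16)),
    "utt_b": (8000, np.zeros((4, 2), dtype=np.int16)),
}
reader = RateCheckingAdapter(loader)
print(reader["utt_a"].dtype)
try:
    reader["utt_b"]
except RuntimeError as err:
    print(err)
```

Deriving from `collections.abc.Mapping` means only `__getitem__`, `__len__`, and `__iter__` need to be written; `keys()`, `items()`, and `in` checks come from the base class.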