espnet2.train.dataset.ESPnetDataset


class espnet2.train.dataset.ESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Callable[[str, Dict[str, ndarray]], Dict[str, ndarray]] | None = None, float_dtype: str = 'float32', int_dtype: str = 'long', max_cache_size: float | int | str = 0.0, max_cache_fd: int = 0, allow_multi_rates: bool = False)

Bases: AbsDataset

PyTorch Dataset class for ESPnet.

This class loads and manages datasets in the ESPnet framework. It supports several data loader types and can apply an optional preprocessing function to each loaded sample.

loader_dict

A dictionary mapping data names to their respective loaders.

Type: Dict[str, Mapping[str, Union[np.ndarray, torch.Tensor, str, numbers.Number]]]

preprocess

An optional preprocessing function that can be applied to the data after loading.

Type: Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]

float_dtype

The data type for floating point values.

Type: str

int_dtype

The data type for integer values.

Type: str

max_cache_size

The maximum size of the cache.

Type: Union[float, int, str]

max_cache_fd

The maximum number of file descriptors to cache.

Type: int

allow_multi_rates

Whether to allow audio data with different sampling rates.

Type: bool

Parameters:
- path_name_type_list (Collection[Tuple[str, str, str]]) – A list of tuples, each containing the path to the data file, the name for that data, and the type of the data.
- preprocess (Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]) – Optional preprocessing function.
- float_dtype (str) – Data type for floating point values (default: “float32”).
- int_dtype (str) – Data type for integer values (default: “long”).
- max_cache_size (Union[float, int, str]) – Maximum cache size (default: 0.0).
- max_cache_fd (int) – Maximum number of cached file descriptors (default: 0).
- allow_multi_rates (bool) – Allow multiple sampling rates for audio (default: False).

Raises:
- ValueError – If path_name_type_list is empty.
- RuntimeError – If a name is duplicated in path_name_type_list, if any path has no samples, or if a loader type is not supported.
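
The empty-list and duplicate-name conditions above can be illustrated with a stand-alone check. This is a minimal sketch, not ESPnet's own code: the function name `check_path_name_type_list` and its error messages are hypothetical, and it never opens the listed files, so the "path has no samples" case is left out.

```python
from typing import Collection, Tuple


def check_path_name_type_list(
    path_name_type_list: Collection[Tuple[str, str, str]],
) -> Tuple[str, ...]:
    """Hypothetical helper mirroring two of the documented constructor checks.

    Returns the data names in their original order if the list is acceptable.
    """
    if len(path_name_type_list) == 0:
        # Documented behaviour: an empty list raises ValueError.
        raise ValueError("path_name_type_list must contain at least one entry")

    names = []
    for _path, name, _data_type in path_name_type_list:
        if name in names:
            # Documented behaviour: a duplicated name raises RuntimeError.
            raise RuntimeError(f"name '{name}' is duplicated")
        names.append(name)
    return tuple(names)


print(check_path_name_type_list(
    [("wav.scp", "input", "sound"), ("token_int", "output", "text_int")]
))
```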

######### Examples

>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')],
...                         )
>>> uttid, data = dataset['uttid']
>>> print(data)
{'input': per_utt_array, 'output': per_utt_array}

NOTE

Ensure that the data types specified are compatible with the data being loaded.
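
According to its type signature, the preprocess argument receives the utterance ID and the dictionary of loaded data, and returns a dictionary of the same shape. The sketch below shows one possible callable of that form. The names `make_cropper` and `crop_input` are hypothetical, and plain lists stand in for NumPy arrays so that the example runs without extra dependencies.

```python
from typing import Any, Callable, Dict


def make_cropper(
    name: str, max_len: int
) -> Callable[[str, Dict[str, Any]], Dict[str, Any]]:
    """Build a preprocess-style callable that crops data[name] to max_len items."""

    def preprocess(uid: str, data: Dict[str, Any]) -> Dict[str, Any]:
        # uid is the utterance ID; unused here, but part of the expected signature.
        if name in data:
            # Slicing works the same way for NumPy arrays and plain sequences.
            data[name] = data[name][:max_len]
        return data

    return preprocess


crop_input = make_cropper("input", max_len=3)
sample = {"input": [0.1, 0.2, 0.3, 0.4, 0.5], "output": [7, 8]}
print(crop_input("utt1", sample))
```

A callable like this would presumably be passed as `ESPnetDataset(..., preprocess=crop_input)`; entries other than the named one pass through unchanged.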

has_name(name) → bool

Checks if a given name exists in the dataset.

This method verifies whether the specified name is present among the dataset’s loaders.

Parameters: name (str) – The name to check for existence in the dataset.
Returns: True if the name exists in the dataset, False otherwise.
Return type: bool

######### Examples

>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')])
>>> dataset.has_name('input')
True
>>> dataset.has_name('output')
True
>>> dataset.has_name('non_existent')
False

names() → Tuple[str, ...]

Returns the names of all data entries in the dataset.

Each name is the second element of a tuple in path_name_type_list, and the names are the keys of loader_dict.

Returns: The data names held by the dataset.
Return type: Tuple[str, ...]

######### Examples

>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')])
>>> dataset.names()
('input', 'output')
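
To make the relationship between data names, loaders, and utterance lookup concrete without needing real data files, here is a minimal stand-in. It is a sketch under stated assumptions, not the ESPnet implementation: the class `TinyDataset` is hypothetical, and each loader is assumed to behave like a mapping from utterance ID to value, as the loader_dict type suggests.

```python
from typing import Any, Dict, Mapping, Tuple


class TinyDataset:
    """Hypothetical stand-in that keeps one mapping-like loader per data name."""

    def __init__(self, loader_dict: Dict[str, Mapping[str, Any]]):
        self.loader_dict = loader_dict

    def has_name(self, name: str) -> bool:
        # A name exists if a loader was registered under it.
        return name in self.loader_dict

    def names(self) -> Tuple[str, ...]:
        # Data names in the order in which the loaders were registered.
        return tuple(self.loader_dict)

    def __getitem__(self, uid: str) -> Tuple[str, Dict[str, Any]]:
        # Collect the value for this utterance ID from every loader.
        data = {name: loader[uid] for name, loader in self.loader_dict.items()}
        return uid, data


tiny = TinyDataset({
    "input": {"utt1": [0.1, 0.2, 0.3]},
    "output": {"utt1": [5, 9]},
})
print(tiny.names())
print(tiny.has_name("input"), tiny.has_name("non_existent"))
print(tiny["utt1"])
```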