espnet2.train.dataset.ESPnetDataset
class espnet2.train.dataset.ESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Callable[[str, Dict[str, ndarray]], Dict[str, ndarray]] | None = None, float_dtype: str = 'float32', int_dtype: str = 'long', max_cache_size: float | int | str = 0.0, max_cache_fd: int = 0, allow_multi_rates: bool = False)
Bases: AbsDataset
PyTorch Dataset class for ESPnet.
This class provides an interface for loading and managing datasets in the ESPnet framework. It supports data loaders for various kinds of data, such as audio, text, and numerical arrays, and can apply an optional preprocessing function to the loaded data.
loader_dict
A dictionary mapping data names to their respective loaders.
- Type: Dict[str, Mapping[str, Union[np.ndarray, torch.Tensor, str, numbers.Number]]]
preprocess
An optional preprocessing function applied to the data after loading.
- Type: Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]
float_dtype
The data type for floating point values.
- Type: str
int_dtype
The data type for integer values.
- Type: str
max_cache_size
The maximum size of the cache.
- Type: Union[float, int, str]
max_cache_fd
The maximum number of file descriptors to cache.
- Type: int
allow_multi_rates
Whether to allow audio data with different sampling rates.
- Type: bool
Parameters:
- path_name_type_list (Collection[Tuple[str, str, str]]) – A list of tuples, each containing the path to the data file, the name for that data, and the type of the data.
- preprocess (Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]) – Optional preprocessing function applied to the loaded data (default: None).
- float_dtype (str) – Data type for floating point values (default: "float32").
- int_dtype (str) – Data type for integer values (default: "long").
- max_cache_size (Union[float, int, str]) – Maximum cache size (default: 0.0).
- max_cache_fd (int) – Maximum number of cached file descriptors (default: 0).
- allow_multi_rates (bool) – Allow audio data with different sampling rates (default: False).
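The preprocess callable receives the utterance ID and the dictionary of loaded arrays, and returns a dictionary of the same form, as its type annotation indicates. Below is a minimal sketch that assumes a hypothetical peak-normalization step on the "input" entry. The function name and the normalization are illustrative only and not part of ESPnet. In practice this role is usually filled by ESPnet's own preprocessors, such as CommonPreprocessor in espnet2.train.preprocessor.

```python
from typing import Dict

import numpy as np


def peak_normalize_preprocess(uid: str, data: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Hypothetical preprocess callable: scale "input" so its peak magnitude is 1."""
    out = dict(data)  # shallow copy so the caller's dict is left untouched
    if "input" in out:
        wav = out["input"].astype(np.float64)  # astype returns a new array
        peak = np.max(np.abs(wav)) if wav.size > 0 else 0.0
        if peak > 0:
            wav = wav / peak
        out["input"] = wav
    # Other entries (e.g. "output") are passed through unchanged, still as np.ndarray.
    return out


data = {"input": np.array([0.5, -2.0, 1.0]), "output": np.array([3, 7, 2])}
processed = peak_normalize_preprocess("utt1", data)
```

Copying the dictionary rather than mutating it keeps the sketch side-effect free. Whether in-place mutation is acceptable for a real preprocessor depends on how the loaded data is cached, which this sketch does not model.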
- Raises:
- ValueError – If path_name_type_list is empty.
- RuntimeError – If a name is duplicated in path_name_type_list, if a data type is not supported, or if any path has no samples.
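The construction-time checks listed under Raises can be pictured with the sketch below. It is a simplified, hypothetical re-creation, not ESPnet's code. build_loader_dict, fake_build_loader, and the in-memory FAKE_FILES mapping are illustrative names. The stand-in loader factory raises RuntimeError for any type other than 'sound' or 'text_int' only so that this path can be exercised.

```python
from typing import Callable, Collection, Dict, Mapping, Tuple


def build_loader_dict(
    path_name_type_list: Collection[Tuple[str, str, str]],
    build_loader: Callable[[str, str], Mapping[str, object]],
) -> Dict[str, Mapping[str, object]]:
    """Hypothetical sketch of the validation performed at construction time."""
    if len(path_name_type_list) == 0:
        raise ValueError("path_name_type_list must contain at least one entry")
    loader_dict: Dict[str, Mapping[str, object]] = {}
    for path, name, data_type in path_name_type_list:
        if name in loader_dict:  # each data name may appear only once
            raise RuntimeError(f'"{name}" is duplicated')
        loader = build_loader(path, data_type)  # may raise for unsupported types
        if len(loader) == 0:  # a path with no utterances is an error
            raise RuntimeError(f"{path} has no samples")
        loader_dict[name] = loader
    return loader_dict


# Hypothetical in-memory "files" standing in for wav.scp / token_int.
FAKE_FILES = {
    "wav.scp": {"utt1": [0.1, 0.2], "utt2": [0.3]},
    "token_int": {"utt1": [5, 9], "utt2": [4]},
    "empty.scp": {},
}


def fake_build_loader(path: str, data_type: str) -> Mapping[str, object]:
    """Hypothetical loader factory that only knows two data types."""
    if data_type not in ("sound", "text_int"):
        raise RuntimeError(f"Not supported: loader_type={data_type}")
    return FAKE_FILES[path]


loaders = build_loader_dict(
    [("wav.scp", "input", "sound"), ("token_int", "output", "text_int")],
    fake_build_loader,
)
```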
######### Examples
>>> from espnet2.train.dataset import ESPnetDataset
>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')])
>>> uttid, data = dataset['uttid']
>>> print(data)
{'input': per_utt_array, 'output': per_utt_array}
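Indexing by an utterance ID returns that ID together with a dictionary that holds one entry per data name. The in-memory stand-in below illustrates this lookup. ToyDataset and its plain-dict loaders are hypothetical simplifications: they skip real file loading, preprocessing, caching, and dtype casting, and reproduce only the (uid, data) return shape shown in the example.

```python
from typing import Dict, Mapping, Tuple


class ToyDataset:
    """Hypothetical stand-in mirroring the (uid, data) return shape."""

    def __init__(self, loader_dict: Dict[str, Mapping[str, list]]):
        self.loader_dict = loader_dict

    def __getitem__(self, uid: str) -> Tuple[str, Dict[str, list]]:
        # Ask every named loader for the same utterance ID.
        data = {name: loader[uid] for name, loader in self.loader_dict.items()}
        return uid, data


dataset = ToyDataset(
    {
        "input": {"utt1": [0.1, 0.2], "utt2": [0.3]},
        "output": {"utt1": [5, 9], "utt2": [4]},
    }
)
uttid, data = dataset["utt1"]
print(uttid, data)  # utt1 {'input': [0.1, 0.2], 'output': [5, 9]}
```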
NOTE
Ensure that float_dtype and int_dtype name data types that NumPy understands (for example "float32" or "long") and that suit the data being loaded.
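The parameter names suggest that floating-point arrays are cast to float_dtype and integer arrays to int_dtype. Below is a minimal sketch of that rule, assuming the choice is made by NumPy's dtype kind ('f' for floating point, 'i' for signed integers). cast_by_kind is a hypothetical helper. Rejecting other kinds with TypeError is a choice made for this sketch, not documented ESPnet behavior.

```python
import numpy as np


def cast_by_kind(value: np.ndarray, float_dtype: str = "float32", int_dtype: str = "long") -> np.ndarray:
    """Hypothetical helper: cast an array according to its dtype kind."""
    if value.dtype.kind == "f":  # any floating-point array
        return value.astype(float_dtype)
    if value.dtype.kind == "i":  # any signed-integer array
        return value.astype(int_dtype)
    # Assumption for this sketch: bool, unsigned, and string arrays are rejected.
    raise TypeError(f"Unsupported dtype: {value.dtype}")


speech = cast_by_kind(np.array([0.1, 0.2], dtype=np.float64))
tokens = cast_by_kind(np.array([5, 9], dtype=np.int32))
print(speech.dtype, tokens.dtype)
```

The width of "long" is platform dependent in NumPy (for example, 64 bits on typical Linux builds), so the exact integer dtype printed may vary between machines.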
has_name(name) → bool
Checks if a given name exists in the dataset.
This method verifies whether the specified name is present among the dataset’s loaders.
- Parameters: name (str) – The name to check for existence in the dataset.
- Returns: True if the name exists in the dataset, False otherwise.
- Return type: bool
######### Examples
>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
... ('token_int', 'output', 'text_int')])
>>> dataset.has_name('input')
True
>>> dataset.has_name('output')
True
>>> dataset.has_name('non_existent')
False
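has_name is useful for code that must work whether or not an optional entry was configured. The sketch below shows that pattern with a hypothetical minimal class. Following the description above, its has_name is a membership test on the registered loaders. The class, the describe helper, and the optional "speaker" entry are illustrative only.

```python
from typing import Dict, Mapping


class ToyDataset:
    """Hypothetical stand-in: names map to per-utterance loaders."""

    def __init__(self, loader_dict: Dict[str, Mapping[str, list]]):
        self.loader_dict = loader_dict

    def has_name(self, name: str) -> bool:
        # A name exists if a loader was registered under it.
        return name in self.loader_dict


def describe(dataset: ToyDataset) -> str:
    # Only rely on the optional "speaker" entry when it was configured.
    if dataset.has_name("speaker"):
        return "with speaker labels"
    return "without speaker labels"


plain = ToyDataset({"input": {}, "output": {}})
tagged = ToyDataset({"input": {}, "output": {}, "speaker": {}})
print(describe(plain), "/", describe(tagged))  # without speaker labels / with speaker labels
```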
names() → Tuple[str, ...]
Returns the names of all data entries in the dataset, that is, the name given in each tuple of path_name_type_list.
- Returns: A tuple of data names, one per loader.
- Return type: Tuple[str, ...]
######### Examples
>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')])
>>> dataset.names()
('input', 'output')
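Because the names are the keys under which the loaders were registered, names() can be pictured as the tuple of a dictionary's keys. Python keeps dictionary keys in insertion order. The sketch below assumes that model. loader_dict is a plain dictionary standing in for the dataset's loaders, and names_of is a hypothetical helper.

```python
from typing import Dict, Mapping, Tuple


def names_of(loader_dict: Dict[str, Mapping[str, list]]) -> Tuple[str, ...]:
    """Hypothetical helper: the data names are the loader dictionary's keys."""
    # tuple() over a dict yields its keys in insertion order (guaranteed since Python 3.7).
    return tuple(loader_dict)


loader_dict: Dict[str, Mapping[str, list]] = {}
for path, name, data_type in [("wav.scp", "input", "sound"), ("token_int", "output", "text_int")]:
    loader_dict[name] = {}  # placeholder loader; path and data_type are unused here
print(names_of(loader_dict))  # ('input', 'output')
```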