espnet3.components.data.dataset.DatasetWithTransform
class espnet3.components.data.dataset.DatasetWithTransform(dataset, transform, preprocessor, use_espnet_preprocessor=False)
Bases: object
Lightweight wrapper for applying a transform function to dataset items.
This class wraps a dataset and applies a user-defined transform followed by a preprocessor function. It also supports ESPnet-style UID handling, where the preprocessor receives both a UID and the sample.
- Parameters:
  - dataset (Any) – A dataset implementing __getitem__ and __len__.
  - transform (Callable) – A function applied to each sample before the preprocessor.
  - preprocessor (Callable) – A function applied after the transform. If use_espnet_preprocessor is True, it must accept (uid, sample) as arguments. Otherwise, it must accept a single sample.
  - use_espnet_preprocessor (bool) – Whether to include the UID when calling the preprocessor. Required for ESPnet’s AbsPreprocessor compatibility.
Example
>>> def transform(sample):
... return {
... "text": sample["text"].upper()
... }
>>>
>>> def preprocess(uid, sample):
... return {
... "text": f"[uid={uid}] " + sample["text"]
... }
>>>
>>> wrapped = DatasetWithTransform(
... my_dataset,
... transform,
... preprocess,
... use_espnet_preprocessor=True
... )
>>> uid_sample = wrapped[0]
>>> print(uid_sample["text"])
[uid=0] HELLO
- Raises:
  - TypeError – If preprocessor is not callable.
  - TypeError – If transform is not callable.
Initialize DatasetWithTransform.
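
The behavior described above can be illustrated with a small self-contained sketch. This is not the ESPnet implementation: SketchDatasetWithTransform is a hypothetical stand-in written only from the documented contract, and it makes two assumptions that the page does not confirm, namely that the item index is passed as the UID (consistent with the [uid=0] output in the example) and that the callable checks run at construction time. The helper functions and the my_dataset list are likewise illustrative.

```python
from typing import Any, Callable


class SketchDatasetWithTransform:
    """Hypothetical stand-in mirroring the documented behavior (not ESPnet code)."""

    def __init__(
        self,
        dataset: Any,
        transform: Callable,
        preprocessor: Callable,
        use_espnet_preprocessor: bool = False,
    ) -> None:
        # Documented contract: a non-callable transform or preprocessor
        # raises TypeError. Checking here, at construction, is an assumption.
        if not callable(transform):
            raise TypeError("transform must be callable")
        if not callable(preprocessor):
            raise TypeError("preprocessor must be callable")
        self.dataset = dataset
        self.transform = transform
        self.preprocessor = preprocessor
        self.use_espnet_preprocessor = use_espnet_preprocessor

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int) -> Any:
        # The transform always runs first, then the preprocessor.
        sample = self.transform(self.dataset[idx])
        if self.use_espnet_preprocessor:
            # Assumption: the index doubles as the UID.
            return self.preprocessor(idx, sample)
        return self.preprocessor(sample)


# Illustrative data and callables.
my_dataset = [{"text": "hello"}, {"text": "world"}]


def transform(sample):
    return {"text": sample["text"].upper()}


def preprocess_plain(sample):
    # Single-argument form, used when use_espnet_preprocessor is False.
    return {"text": sample["text"] + "!"}


def preprocess_with_uid(uid, sample):
    # Two-argument form, used when use_espnet_preprocessor is True.
    return {"text": f"[uid={uid}] " + sample["text"]}


plain = SketchDatasetWithTransform(my_dataset, transform, preprocess_plain)
with_uid = SketchDatasetWithTransform(
    my_dataset, transform, preprocess_with_uid, use_espnet_preprocessor=True
)

print(plain[1]["text"])     # WORLD!
print(with_uid[0]["text"])  # [uid=0] HELLO

try:
    SketchDatasetWithTransform(my_dataset, transform, preprocessor=None)
except TypeError as err:
    print(f"TypeError: {err}")
```

The sketch covers the case the example above leaves out: with use_espnet_preprocessor left at its default of False, the preprocessor receives only the transformed sample, so a two-argument ESPnet-style preprocessor would fail with a missing-argument error in that mode.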
