espnet3.components.dataset.DatasetWithTransform
class espnet3.components.dataset.DatasetWithTransform(dataset, transform, preprocessor, use_espnet_preprocessor=False)
Bases: object
Lightweight wrapper for applying a transform function to dataset items.
This class wraps a dataset and applies a user-defined transform followed by a preprocessor function. It also supports ESPnet-style UID handling, where the preprocessor receives both a UID and the sample.
- Parameters:
- dataset (Any) – A dataset implementing __getitem__ and __len__.
- transform (Callable) – A function applied to each sample before the preprocessor.
- preprocessor (Callable) – A function applied after the transform. If use_espnet_preprocessor is True, it must accept (uid, sample) as arguments. Otherwise, it must accept a single sample.
- use_espnet_preprocessor (bool) – Whether to include the UID when calling the preprocessor. Required for compatibility with ESPnet's AbsPreprocessor.
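The two calling conventions described for preprocessor can be sketched as follows. This is a minimal illustration that does not import ESPnet: plain_preprocessor, espnet_style_preprocessor, call_preprocessor, and the UID "utt1" are hypothetical names chosen for this example, and the dispatch shown is an assumption based on the parameter descriptions above, not the library's actual code.

```python
from typing import Any, Callable, Dict

Sample = Dict[str, Any]


def plain_preprocessor(sample: Sample) -> Sample:
    # Shape expected when use_espnet_preprocessor=False: one argument, the sample.
    return {"text": sample["text"].strip()}


def espnet_style_preprocessor(uid: str, sample: Sample) -> Sample:
    # Shape expected when use_espnet_preprocessor=True: (uid, sample).
    return {"text": f"[uid={uid}] " + sample["text"].strip()}


def call_preprocessor(
    preprocessor: Callable[..., Sample],
    uid: str,
    sample: Sample,
    use_espnet_preprocessor: bool,
) -> Sample:
    # Hypothetical dispatch helper (not part of ESPnet): pass the UID only
    # when the ESPnet-style convention is requested.
    if use_espnet_preprocessor:
        return preprocessor(uid, sample)
    return preprocessor(sample)


sample = {"text": "  hello "}
plain_result = call_preprocessor(plain_preprocessor, "utt1", sample, False)
espnet_result = call_preprocessor(espnet_style_preprocessor, "utt1", sample, True)
print(plain_result["text"])
print(espnet_result["text"])
```

Passing a one-argument function together with use_espnet_preprocessor=True would fail under this sketch, because the call supplies two positional arguments; the flag and the preprocessor signature have to agree.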
Example
>>> from espnet3.components.dataset import DatasetWithTransform
>>>
>>> # Illustrative stand-in dataset: any object with __getitem__ and __len__.
>>> my_dataset = [{"text": "hello"}]
>>>
>>> def transform(sample):
...     return {
...         "text": sample["text"].upper()
...     }
>>>
>>> def preprocessor(uid, sample):
...     return {
...         "text": f"[uid={uid}] " + sample["text"]
...     }
>>>
>>> wrapped = DatasetWithTransform(
...     my_dataset,
...     transform,
...     preprocessor,
...     use_espnet_preprocessor=True
... )
>>> uid_sample = wrapped[0]
>>> print(uid_sample["text"])
[uid=0] HELLO
- Raises:
- TypeError – If preprocessor is not callable.
- TypeError – If transform is not callable.
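The documented errors suggest that both callables are validated up front. A minimal sketch of such a check is shown here, assuming the built-in callable() is used; check_callables and identity are hypothetical names for this example, and the real class may word its messages or order its checks differently.

```python
from typing import Any


def check_callables(transform: Any, preprocessor: Any) -> None:
    # Hypothetical helper mirroring the documented TypeError conditions.
    if not callable(transform):
        raise TypeError(
            f"transform must be callable, got {type(transform).__name__}"
        )
    if not callable(preprocessor):
        raise TypeError(
            f"preprocessor must be callable, got {type(preprocessor).__name__}"
        )


def identity(sample: Any) -> Any:
    # Pass-through function used as a valid transform and preprocessor.
    return sample


# Two callables pass the check without raising.
check_callables(identity, identity)

# A non-callable preprocessor is rejected before any sample is read.
try:
    check_callables(identity, "not a function")
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

Validating at construction time, rather than on first item access, surfaces a misconfigured pipeline before a data loader starts iterating.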
