espnet2.iterators.category_iter_factory.CategoryIterFactory


class espnet2.iterators.category_iter_factory.CategoryIterFactory(dataset, batches: AbsSampler | Sequence[Sequence[Any]], num_iters_per_epoch: int = None, seed: int = 0, sampler_args: dict = None, shuffle: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False)

Bases: AbsIterFactory

Build iterator for each epoch.

This class creates a PyTorch DataLoader with the following features:

- The random seed is determined by the epoch number, ensuring reproducibility when resuming training.
- The number of samples for each epoch can be restricted, controlling the interval between training and evaluation.
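The epoch-dependent seeding described above can be sketched in plain Python. This is a minimal illustration, not ESPnet's implementation: the helper name `shuffle_for_epoch` and the choice of `seed + epoch` as the combined seed are assumptions made for the example.

```python
import random
from typing import Any, List, Sequence


def shuffle_for_epoch(
    batches: Sequence[Sequence[Any]], epoch: int, seed: int = 0
) -> List[Sequence[Any]]:
    """Return the batches in an order that depends only on (seed, epoch)."""
    order = list(batches)  # copy so the caller's sequence is not mutated
    # A dedicated RNG keeps the result independent of global random state.
    random.Random(seed + epoch).shuffle(order)
    return order


batches = [["utt1", "utt2"], ["utt3"], ["utt4", "utt5"], ["utt6"]]

# Rebuilding epoch 3 after a restart gives the same order as before.
first_run = shuffle_for_epoch(batches, epoch=3, seed=42)
resumed_run = shuffle_for_epoch(batches, epoch=3, seed=42)
```

Because the order is a pure function of the seed and the epoch, a training job that resumes at epoch 3 sees the same batch order as a job that reached epoch 3 without interruption.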

dataset

The dataset to be used by the DataLoader.

Type: Any

sampler

An instance of a sampler that defines how to sample data from the dataset.

Type: AbsSampler

num_iters_per_epoch

The number of iterations to run per epoch.

Type: int

sampler_args

Arguments to configure the sampler.

Type: dict

shuffle

Whether to shuffle the data at the beginning of each epoch.

Type: bool

seed

The seed for random number generation.

Type: int

num_workers

The number of subprocesses to use for data loading.

Type: int

collate_fn

Function to merge a list of samples to form a mini-batch.

Type: callable

pin_memory

If True, the data loader will copy Tensors into CUDA pinned memory before returning them.

Type: bool
Parameters:
- dataset (Any) – The dataset from which to load the data.
- batches (Union[AbsSampler, Sequence[Sequence[Any]]]) – Either a sampler or a sequence of batches to use for sampling.
- num_iters_per_epoch (int, optional) – The number of iterations to run per epoch. Defaults to None.
- seed (int, optional) – The random seed. Defaults to 0.
- sampler_args (dict, optional) – Additional arguments for the sampler. Defaults to None.
- shuffle (bool, optional) – Whether to shuffle the dataset. Defaults to False.
- num_workers (int, optional) – Number of worker processes for data loading. Defaults to 0.
- collate_fn (callable, optional) – Function to merge a list of samples to form a mini-batch. Defaults to None.
- pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory. Defaults to False.
Returns: A PyTorch DataLoader instance configured for the dataset and sampler.
Return type: DataLoader
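To make the role of `collate_fn` concrete, here is a hypothetical collate function in plain Python. It assumes each sample is a `(key, token_ids)` pair, which is an assumption for illustration only; the sample format and the name `pad_collate` are not taken from ESPnet.

```python
from typing import Dict, List, Sequence, Tuple


def pad_collate(
    samples: Sequence[Tuple[str, List[int]]], pad_value: int = 0
) -> Tuple[List[str], Dict[str, List]]:
    """Merge (key, token_ids) samples into one padded mini-batch."""
    keys = [key for key, _ in samples]
    lengths = [len(ids) for _, ids in samples]
    max_len = max(lengths)
    # Right-pad every sequence to the longest one in the batch.
    padded = [ids + [pad_value] * (max_len - len(ids)) for _, ids in samples]
    return keys, {"tokens": padded, "lengths": lengths}


keys, batch = pad_collate([("utt1", [5, 6, 7]), ("utt2", [8])])
```

A function of this shape receives the list of samples that make up one batch and returns a single merged object; a real collate function would typically return tensors rather than nested lists.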

####### Examples

Create an instance of CategoryIterFactory:

iter_factory = CategoryIterFactory(
    dataset=my_dataset,
    batches=my_batches,
    num_iters_per_epoch=100,
    seed=42,
    shuffle=True,
)

Build the DataLoader for epoch 1:

data_loader = iter_factory.build_iter(epoch=1)

NOTE

This class is designed to work with PyTorch DataLoader and provides additional control over the sampling process.

Raises:
- RuntimeError – If the batch size is less than the world size during distributed training.
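The reason for this error can be illustrated with a small sketch. It assumes that each batch is divided among the processes by strided slicing, which is one common way to shard a batch; the function name `split_batch_for_rank` is hypothetical and the code is not ESPnet's own.

```python
from typing import Any, List, Sequence


def split_batch_for_rank(
    batch: Sequence[Any], rank: int, world_size: int
) -> List[Any]:
    """Give each rank a strided slice of the batch."""
    if len(batch) < world_size:
        # With fewer samples than processes, some rank would get nothing.
        raise RuntimeError(
            f"batch size {len(batch)} is smaller than world size {world_size}"
        )
    return list(batch[rank::world_size])


batch = ["utt1", "utt2", "utt3", "utt4", "utt5"]
shards = [split_batch_for_rank(batch, rank, world_size=2) for rank in range(2)]
```

Under this scheme every sample goes to exactly one rank, and the check guarantees that no rank ends up with an empty shard.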

build_iter(epoch: int, shuffle: bool | None = None) → DataLoader

Build iterator for each epoch.

This method creates a PyTorch DataLoader with the following features:

- The random seed is determined by the epoch number, ensuring reproducibility when resuming training.
- The number of samples per epoch can be restricted, controlling the interval between training and evaluation.
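One way to restrict the number of iterations per epoch without dropping data is to treat successive shuffled passes over the batches as one long stream and give each epoch a fixed-size window of it. The sketch below illustrates that idea; it is an assumption about how such a restriction can be realized, not a copy of ESPnet's code, and `shuffled_pass` and `window_for_epoch` are hypothetical names.

```python
import random
from typing import Any, List, Sequence


def shuffled_pass(
    batches: Sequence[Any], pass_index: int, seed: int = 0
) -> List[Any]:
    """Return one full pass over the batches, shuffled per pass index."""
    order = list(batches)
    random.Random(seed + pass_index).shuffle(order)
    return order


def window_for_epoch(
    batches: Sequence[Any], epoch: int, num_iters_per_epoch: int, seed: int = 0
) -> List[Any]:
    """Return the batches for a 1-based epoch as a slice of the stream."""
    n = len(batches)
    position = (epoch - 1) * num_iters_per_epoch
    stop = position + num_iters_per_epoch
    window: List[Any] = []
    while position < stop:
        # Locate the current position inside its full pass over the data.
        pass_index, offset = divmod(position, n)
        take = min(n - offset, stop - position)
        window.extend(shuffled_pass(batches, pass_index, seed)[offset:offset + take])
        position += take
    return window


batches = [[i] for i in range(5)]
windows = [window_for_epoch(batches, epoch, 2, seed=7) for epoch in range(1, 6)]
```

With five batches and two iterations per epoch, five short epochs cover exactly two full passes over the data, so evaluation can run more often without any batch being skipped.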

dataset

The dataset to be loaded.

Type: Any

sampler

The sampler to sample batches from the dataset.

Type: AbsSampler

num_iters_per_epoch

The number of iterations per epoch.

Type: int, optional

sampler_args

Arguments for the sampler.

Type: dict, optional

shuffle

Whether to shuffle the data at every epoch.

Type: bool

seed

Random seed for reproducibility.

Type: int

num_workers

Number of workers for data loading.

Type: int

collate_fn

Function to collate data samples.

Type: callable, optional

pin_memory

Whether to pin memory for faster data transfer.

Type: bool
Parameters:
- dataset (Any) – The dataset to be used for the iterator.
- batches (Union[AbsSampler, Sequence[Sequence[Any]]]) – Batches of data or an instance of AbsSampler.
- num_iters_per_epoch (int, optional) – Number of iterations per epoch.
- seed (int, optional) – Seed for random number generation. Default is 0.
- sampler_args (dict, optional) – Additional arguments for the sampler.
- shuffle (bool, optional) – Whether to shuffle the dataset. Default is False.
- num_workers (int, optional) – Number of worker processes for loading data.
- collate_fn (callable, optional) – Function to merge a list of samples into a batch.
- pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory before returning them.
Returns: A PyTorch DataLoader for the specified dataset and batches.
Return type: DataLoader

####### Examples

factory = CategoryIterFactory(dataset=my_dataset, batches=my_batches, num_iters_per_epoch=100, seed=42)

dataloader = factory.build_iter(epoch=1, shuffle=True)
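The `shuffle: bool | None = None` default in the signature suggests that passing nothing falls back to the value given to the constructor, while an explicit `True` or `False` overrides it for that call. The toy class below sketches only that fallback logic under this assumption; `ToyIterFactory` is a hypothetical stand-in that returns a list instead of a DataLoader.

```python
import random
from typing import Any, List, Optional, Sequence


class ToyIterFactory:
    """Hypothetical stand-in that mimics only the shuffle-override logic."""

    def __init__(
        self, batches: Sequence[Any], seed: int = 0, shuffle: bool = False
    ) -> None:
        self.batches = list(batches)
        self.seed = seed
        self.shuffle = shuffle

    def build_iter(self, epoch: int, shuffle: Optional[bool] = None) -> List[Any]:
        # None means "use the value given to the constructor".
        if shuffle is None:
            shuffle = self.shuffle
        order = list(self.batches)
        if shuffle:
            random.Random(self.seed + epoch).shuffle(order)
        return order


toy = ToyIterFactory([[0], [1], [2], [3]], seed=42)
default_order = toy.build_iter(epoch=1)  # constructor default: no shuffle
forced_order = toy.build_iter(epoch=1, shuffle=True)  # per-call override
```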

Raises: RuntimeError – If the batch size is less than the world size in distributed training.