espnet2.iterators.category_iter_factory.CategoryIterFactory
class espnet2.iterators.category_iter_factory.CategoryIterFactory(dataset, batches: AbsSampler | Sequence[Sequence[Any]], num_iters_per_epoch: int = None, seed: int = 0, sampler_args: dict = None, shuffle: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False)
Bases: AbsIterFactory
Build iterator for each epoch.
This class creates a PyTorch DataLoader with the following features:
- The random seed is determined by the epoch number, ensuring reproducibility when resuming training.
- The number of samples for each epoch can be restricted, controlling the interval between training and evaluation.
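The two features above can be illustrated with a minimal, standard-library sketch. This is not the ESPnet implementation: `batches_for_epoch` is a hypothetical helper, the seed is assumed to be combined with the epoch number by simple addition, and the per-epoch limit is modeled as plain truncation of the batch list.

```python
import random
from typing import List, Optional, Sequence


def batches_for_epoch(
    batches: Sequence[Sequence[str]],
    epoch: int,
    seed: int = 0,
    shuffle: bool = False,
    num_iters_per_epoch: Optional[int] = None,
) -> List[List[str]]:
    """Return the mini-batches one epoch would consume (illustrative only)."""
    selected = [list(batch) for batch in batches]
    if shuffle:
        # Assumption: the RNG depends only on (seed, epoch), so a run resumed
        # at the same epoch reproduces the same batch order.
        random.Random(seed + epoch).shuffle(selected)
    if num_iters_per_epoch is not None:
        # Assumption: cap the number of mini-batches by simple truncation.
        selected = selected[:num_iters_per_epoch]
    return selected


batches = [["utt1", "utt2"], ["utt3", "utt4"], ["utt5", "utt6"], ["utt7", "utt8"]]
first = batches_for_epoch(batches, epoch=3, seed=42, shuffle=True, num_iters_per_epoch=2)
again = batches_for_epoch(batches, epoch=3, seed=42, shuffle=True, num_iters_per_epoch=2)
```

Calling the helper twice with the same `epoch` and `seed` yields the same selection, which is the property that makes resumed training reproducible.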
dataset
The dataset to be used by the DataLoader.
- Type: Any
sampler
An instance of a sampler that defines how to sample data from the dataset.
- Type: AbsSampler
num_iters_per_epoch
The number of iterations to run per epoch.
- Type: int
sampler_args
Arguments to configure the sampler.
- Type: dict
shuffle
Whether to shuffle the data at the beginning of each epoch.
- Type: bool
seed
The seed for random number generation.
- Type: int
num_workers
The number of subprocesses to use for data loading.
- Type: int
collate_fn
Function to merge a list of samples to form a mini-batch.
- Type: callable
pin_memory
If True, the data loader will copy Tensors into CUDA pinned memory before returning them.
- Type: bool
Parameters:
- dataset (Any) – The dataset from which to load the data.
- batches (Union[AbsSampler, Sequence[Sequence[Any]]]) – Either a sampler or a sequence of batches to use for sampling.
- num_iters_per_epoch (int, optional) – The number of iterations to run per epoch. Defaults to None.
- seed (int, optional) – The random seed. Defaults to 0.
- sampler_args (dict, optional) – Additional arguments for the sampler. Defaults to None.
- shuffle (bool, optional) – Whether to shuffle the dataset. Defaults to False.
- num_workers (int, optional) – Number of worker processes for data loading. Defaults to 0.
- collate_fn (callable, optional) – Function to merge a list of samples to form a mini-batch. Defaults to None.
- pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory. Defaults to False.
Returns: A PyTorch DataLoader instance configured for the dataset and sampler.
Return type: DataLoader
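Because `batches` may be either a sampler or a plain sequence of batches, a factory has to bring both forms to a common interface. The sketch below shows one way this could be done. `ListBatchSampler` and `as_sampler` are hypothetical names introduced for illustration, and the real class may normalize its input differently.

```python
from typing import Any, Iterator, List, Sequence, Tuple, Union


class ListBatchSampler:
    """Hypothetical stand-in for a sampler that yields pre-built batches."""

    def __init__(self, batches: Sequence[Sequence[Any]]) -> None:
        # Freeze each batch as a tuple so later code cannot mutate it.
        self.batches: List[Tuple[Any, ...]] = [tuple(b) for b in batches]

    def __len__(self) -> int:
        return len(self.batches)

    def __iter__(self) -> Iterator[Tuple[Any, ...]]:
        return iter(self.batches)


def as_sampler(
    batches: Union[ListBatchSampler, Sequence[Sequence[Any]]]
) -> ListBatchSampler:
    """Wrap a plain sequence of batches; pass an existing sampler through."""
    if isinstance(batches, ListBatchSampler):
        return batches
    return ListBatchSampler(batches)


sampler = as_sampler([["utt1", "utt2"], ["utt3"]])
same_sampler = as_sampler(sampler)
```

Either form can then be iterated batch by batch, so downstream code does not need to know which one the caller supplied.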
####### Examples
Create an instance of CategoryIterFactory
iter_factory = CategoryIterFactory(
dataset=my_dataset, batches=my_batches, num_iters_per_epoch=100, seed=42, shuffle=True
)
Build the DataLoader for epoch 1
data_loader = iter_factory.build_iter(epoch=1)
NOTE
This class is designed to work with PyTorch DataLoader and provides additional control over the sampling process.
Raises: RuntimeError – If the batch size is less than the world size during distributed training.
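The error condition can be pictured with the following sketch. It assumes that each mini-batch is divided across processes by strided slicing, so a batch with fewer samples than processes would leave some process with nothing to load. `shard_batches` is a hypothetical helper, and `rank` and `world_size` are passed in explicitly rather than queried from `torch.distributed`.

```python
from typing import Any, List, Sequence


def shard_batches(
    batches: Sequence[Sequence[Any]], rank: int, world_size: int
) -> List[List[Any]]:
    """Give each rank its share of every mini-batch (illustrative only)."""
    sharded = []
    for batch in batches:
        if len(batch) < world_size:
            # With fewer samples than processes, at least one rank
            # would receive an empty mini-batch.
            raise RuntimeError(
                f"batch size {len(batch)} is less than world size {world_size}"
            )
        # Rank r takes samples r, r + world_size, r + 2 * world_size, ...
        sharded.append(list(batch[rank::world_size]))
    return sharded


rank1_batches = shard_batches([["a", "b", "c", "d"]], rank=1, world_size=2)
```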
build_iter(epoch: int, shuffle: bool | None = None) → DataLoader
Build iterator for each epoch.
This method creates a PyTorch DataLoader with the following features:
- The random seed is determined according to the number of epochs, ensuring reproducibility when resuming training.
- It allows restriction on the number of samples for one epoch, controlling the interval between training and evaluation.
dataset
The dataset to be loaded.
- Type: Any
sampler
The sampler to sample batches from the dataset.
- Type: AbsSampler
num_iters_per_epoch
The number of iterations per epoch.
- Type: int, optional
sampler_args
Arguments for the sampler.
- Type: dict, optional
shuffle
Whether to shuffle the data at every epoch.
- Type: bool
seed
Random seed for reproducibility.
- Type: int
num_workers
Number of workers for data loading.
- Type: int
collate_fn
Function to collate data samples.
- Type: callable, optional
pin_memory
Whether to pin memory for faster data transfer.
- Type: bool
Parameters:
- dataset (Any) – The dataset to be used for the iterator.
- batches (Union[AbsSampler, Sequence[Sequence[Any]]]) – Batches of data or an instance of AbsSampler.
- num_iters_per_epoch (int, optional) – Number of iterations per epoch.
- seed (int, optional) – Seed for random number generation. Default is 0.
- sampler_args (dict, optional) – Additional arguments for the sampler.
- shuffle (bool, optional) – Whether to shuffle the dataset. Default is False.
- num_workers (int, optional) – Number of worker subprocesses for loading data.
- collate_fn (callable, optional) – Function to merge a list of samples into a batch.
- pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory before returning them.
Returns: A PyTorch DataLoader for the specified dataset and batches.
Return type: DataLoader
####### Examples
factory = CategoryIterFactory(dataset=my_dataset, batches=my_batches, num_iters_per_epoch=100, seed=42)
dataloader = factory.build_iter(epoch=1, shuffle=True)
Raises: RuntimeError – If the batch size is less than the world size in distributed training.