espnet2.iterators.sequence_iter_factory.SequenceIterFactory
class espnet2.iterators.sequence_iter_factory.SequenceIterFactory(dataset, batches: AbsSampler | Sequence[Sequence[Any]], num_iters_per_epoch: int | None = None, seed: int = 0, shuffle: bool = False, shuffle_within_batch: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False)
Bases: AbsIterFactory
Build iterator for each epoch.
This class creates a PyTorch DataLoader with additional features for reproducibility and control over the number of samples per epoch. The following points are noteworthy:
- The random seed is determined based on the epoch number, ensuring reproducibility when resuming training.
- The number of samples per epoch can be restricted, which controls the interval between training and evaluation (see the sketch after this list).
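A minimal sketch of the reproducibility behaviour, assuming `dataset` and `batches` have already been prepared (both names are placeholders, not part of the API):
>>> factory = SequenceIterFactory(dataset, batches, seed=0, shuffle=True)
>>> # The batch order is derived from (seed, epoch), so building the
>>> # iterator for the same epoch twice yields the same order.
>>> loader_a = factory.build_iter(epoch=3)
>>> loader_b = factory.build_iter(epoch=3)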
dataset
The dataset to be used for the DataLoader.
- Type: Any
sampler
An instance of a sampler that defines the order in which data is drawn.
- Type: AbsSampler
num_iters_per_epoch
The number of iterations to perform in one epoch.
- Type: Optional[int]
seed
The seed for random number generation.
- Type: int
shuffle
Whether to shuffle the data at the beginning of each epoch.
- Type: bool
shuffle_within_batch
Whether to shuffle the data within each batch.
- Type: bool
num_workers
The number of worker processes for data loading.
- Type: int
collate_fn
Function to merge a list of samples into a batch.
- Type: Optional[callable]
pin_memory
If True, the data loader will copy Tensors into pinned memory before returning them.
- Type: bool
Parameters:
- dataset (Any) – The dataset from which to load the data.
- batches (Union[AbsSampler, Sequence[Sequence[Any]]]) – The batches of data to be sampled.
- num_iters_per_epoch (Optional[int]) – The number of iterations per epoch (default is None).
- seed (int) – Random seed (default is 0).
- shuffle (bool) – Whether to shuffle the data (default is False).
- shuffle_within_batch (bool) – Whether to shuffle data within each batch (default is False).
- num_workers (int) – Number of subprocesses to use for data loading (default is 0).
- collate_fn (Optional[callable]) – Function to merge a list of samples into a batch (default is None).
- pin_memory (bool) – If True, the data loader will copy Tensors into pinned memory (default is False).
Returns: build_iter() returns a PyTorch DataLoader instance configured with the specified parameters.
Return type: DataLoader
####### Examples
>>> factory = SequenceIterFactory(dataset, batches, num_iters_per_epoch=10)
>>> data_loader = factory.build_iter(epoch=1)
NOTE
The build_iter method can be called multiple times with different epoch values to generate iterators for different training phases.
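As a sketch of typical per-epoch usage (the dataset, batches, and training step are placeholder assumptions, not part of the API):
>>> factory = SequenceIterFactory(
...     dataset, batches, num_iters_per_epoch=10, seed=0, shuffle=True
... )
>>> for epoch in range(1, 4):
...     for batch in factory.build_iter(epoch):
...         pass  # a training step on `batch` would go here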
build_iter(epoch: int, shuffle: bool | None = None) → DataLoader
Build iterator for each epoch.
This method simply creates a PyTorch DataLoader with the following features:
- The random seed is determined based on the epoch number, ensuring reproducibility when resuming from a training session.
- Allows restriction on the number of samples per epoch, controlling the interval between training and evaluation.
dataset
The dataset to load data from.
- Type: Any
sampler
The sampler used to sample batches of data.
- Type: AbsSampler
num_iters_per_epoch
Maximum number of iterations per epoch.
- Type: Optional[int]
shuffle
Whether to shuffle the data.
- Type: bool
shuffle_within_batch
Whether to shuffle data within each batch.
- Type: bool
seed
Seed for random number generation.
- Type: int
num_workers
Number of subprocesses to use for data loading.
- Type: int
collate_fn
Function to merge a list of samples.
- Type: Optional[callable]
pin_memory
If True, the data loader will copy Tensors into CUDA pinned memory.
- Type: bool
Parameters:
- epoch (int) – The current epoch number.
- shuffle (Optional[bool]) – If provided, overrides the default shuffle setting.
Returns: A PyTorch DataLoader object configured for the current epoch.
Return type: DataLoader
Raises:
- AssertionError – If the length of batches does not match num_iters_per_epoch when it is specified.
####### Examples
>>> factory = SequenceIterFactory(dataset=my_dataset, batches=my_batches)
>>> data_loader = factory.build_iter(epoch=1, shuffle=True)
NOTE
Ensure that the dataset and batches are compatible with the DataLoader’s requirements.
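For example, the per-call shuffle override can be used to build a fixed-order iterator for validation while keeping shuffling enabled for training (a sketch; the factory setup shown above is assumed):
>>> train_loader = factory.build_iter(epoch=5)                 # uses the factory's shuffle setting
>>> valid_loader = factory.build_iter(epoch=5, shuffle=False)  # force a fixed batch order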