espnet2.iterators.sequence_iter_factory.SequenceIterFactory
class espnet2.iterators.sequence_iter_factory.SequenceIterFactory(dataset, batches: AbsSampler | Sequence[Sequence[Any]], num_iters_per_epoch: int | None = None, seed: int = 0, shuffle: bool = False, shuffle_within_batch: bool = False, num_workers: int = 0, collate_fn=None, pin_memory: bool = False)
Bases: AbsIterFactory
Build iterator for each epoch.
This class creates a PyTorch DataLoader with additional features for reproducibility and control over the number of samples per epoch. The following points are noteworthy:
- The random seed is determined based on the epoch number, ensuring reproducibility when resuming training.
- The number of samples per epoch can be restricted, which controls the interval between training and evaluation (see the sketch after this list).
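A minimal sketch of the reproducibility behaviour, assuming `dataset` and `batches` have already been prepared (both names are placeholders, not part of the API):
>>> factory = SequenceIterFactory(dataset, batches, seed=0, shuffle=True)
>>> # The batch order is derived from (seed, epoch), so building the
>>> # iterator for the same epoch twice yields the same order.
>>> loader_a = factory.build_iter(epoch=3)
>>> loader_b = factory.build_iter(epoch=3)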
dataset
The dataset to be used for the DataLoader.
- Type: Any
sampler
An instance of a sampler that defines the order in which data is drawn.
- Type: AbsSampler
num_iters_per_epoch
The number of iterations to perform in one epoch.
- Type: Optional[int]
seed
The seed for random number generation.
- Type: int
shuffle
Whether to shuffle the data at the beginning of each epoch.
- Type: bool
shuffle_within_batch
Whether to shuffle the data within each batch.
- Type: bool
num_workers
The number of worker processes for data loading.
- Type: int
collate_fn
Function to merge a list of samples into a batch.
- Type: Optional[callable]
pin_memory
If True, the data loader will copy Tensors into pinned memory before returning them.
- Type: bool
Parameters:
- dataset (Any) – The dataset from which to load the data.
- batches (Union[AbsSampler, Sequence[Sequence[Any]]]) – The batches of data to be sampled.
- num_iters_per_epoch (Optional[int]) – The number of iterations per epoch (default is None).
- seed (int) – Random seed (default is 0).
- shuffle (bool) – Whether to shuffle the data (default is False).
- shuffle_within_batch (bool) – Whether to shuffle data within each batch (default is False).
- num_workers (int) – Number of subprocesses to use for data loading (default is 0).
- collate_fn (Optional[callable]) – Function to merge a list of samples into a batch (default is None).
- pin_memory (bool) – If True, the data loader will copy Tensors into pinned memory (default is False).
Returns: build_iter() returns a PyTorch DataLoader instance configured with the specified parameters.
Return type: DataLoader
####### Examples
>>> factory = SequenceIterFactory(dataset, batches, num_iters_per_epoch=10)
>>> data_loader = factory.build_iter(epoch=1)
NOTE
The build_iter method can be called multiple times with different epoch values to generate iterators for different training phases.
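As a sketch of typical per-epoch usage (the dataset, batches, and training step are placeholder assumptions, not part of the API):
>>> factory = SequenceIterFactory(
...     dataset, batches, num_iters_per_epoch=10, seed=0, shuffle=True
... )
>>> for epoch in range(1, 4):
...     for batch in factory.build_iter(epoch):
...         pass  # a training step on `batch` would go here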
build_iter(epoch: int, shuffle: bool | None = None) → DataLoader
Build iterator for each epoch.
This method simply creates a PyTorch DataLoader with the following features:
- The random seed is determined based on the epoch number, ensuring reproducibility when resuming from a training session.
- Allows restriction on the number of samples per epoch, controlling the interval between training and evaluation.
dataset
The dataset to load data from.
- Type: Any
sampler
The sampler used to sample batches of data.
- Type: AbsSampler
num_iters_per_epoch
Maximum number of iterations per epoch.
- Type: Optional[int]
shuffle
Whether to shuffle the data.
- Type: bool
shuffle_within_batch
Whether to shuffle data within each batch.
- Type: bool
seed
Seed for random number generation.
- Type: int
num_workers
Number of subprocesses to use for data loading.
- Type: int
collate_fn
Function to merge a list of samples.
- Type: Optional[callable]
pin_memory
If True, the data loader will copy Tensors into CUDA pinned memory.
- Type: bool
Parameters:
- epoch (int) – The current epoch number.
- shuffle (Optional[bool]) – If provided, overrides the default shuffle setting.
Returns: A PyTorch DataLoader object configured for the current epoch.
Return type: DataLoader
Raises:
- AssertionError – If the length of batches does not match num_iters_per_epoch when it is specified.
####### Examples
>>> factory = SequenceIterFactory(dataset=my_dataset, batches=my_batches)
>>> data_loader = factory.build_iter(epoch=1, shuffle=True)
NOTE
Ensure that the dataset and batches are compatible with the DataLoader’s requirements.
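For example, the per-call shuffle override can be used to build a fixed-order iterator for validation while keeping shuffling enabled for training (a sketch; the factory setup shown above is assumed):
>>> train_loader = factory.build_iter(epoch=5)                 # uses the factory's shuffle setting
>>> valid_loader = factory.build_iter(epoch=5, shuffle=False)  # force a fixed batch order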