espnet2.samplers.length_batch_sampler.LengthBatchSampler
class espnet2.samplers.length_batch_sampler.LengthBatchSampler(batch_bins: int, shape_files: Tuple[str, ...] | List[str], min_batch_size: int = 1, sort_in_batch: str = 'descending', sort_batch: str = 'ascending', drop_last: bool = False, padding: bool = True)
Bases: AbsSampler
LengthBatchSampler creates mini-batches based on sequence length. It groups sequences so that the total number of bins in each batch does not exceed batch_bins. Bins count either the padded size of the batch or the total sequence length, depending on padding. This allows efficient processing of variable-length inputs.
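The grouping described above can be sketched as a greedy pass over keys sorted by length. This is a minimal sketch under stated assumptions: each key maps to a single integer length, and bins are counted as batch size × longest length (padding) or as the sum of lengths (no padding). The function name `greedy_length_batches` is hypothetical. The rule used here, which starts a new batch before a sequence would push the count past `batch_bins`, is one way to honor the limit and is not necessarily ESPnet's exact rule.

```python
from typing import Dict, List, Tuple


def greedy_length_batches(
    lengths: Dict[str, int],
    batch_bins: int,
    min_batch_size: int = 1,
    padding: bool = True,
) -> List[Tuple[str, ...]]:
    """Hypothetical sketch: group keys into batches under a bin budget."""

    def bins(keys: List[str]) -> int:
        if padding:
            # Padded size: every sequence is padded to the longest one.
            return len(keys) * max(lengths[k] for k in keys)
        # Without padding, only the real elements count.
        return sum(lengths[k] for k in keys)

    # Sort by length so similar lengths share a batch (less padding waste).
    keys = sorted(lengths, key=lambda k: lengths[k])
    batches: List[Tuple[str, ...]] = []
    current: List[str] = []
    for key in keys:
        candidate = current + [key]
        # Close the current batch if this key would exceed the budget, unless
        # the batch is still smaller than min_batch_size. A single sequence
        # longer than batch_bins still gets a batch of its own.
        if (
            current
            and bins(candidate) > batch_bins
            and len(current) >= min_batch_size
        ):
            batches.append(tuple(current))
            current = [key]
        else:
            current = candidate
    # This sketch keeps a short final batch as-is (no redistribution).
    if current:
        batches.append(tuple(current))
    return batches
```

For lengths `{"a": 100, "b": 120, "c": 300, "d": 310}` and `batch_bins=650`, the padded count yields `[("a", "b"), ("c", "d")]`, while `padding=False` yields `[("a", "b", "c"), ("d",)]`: counting padding makes each batch of long sequences more expensive, so fewer sequences fit.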
Attributes:
batch_bins
The maximum number of bins allowed in each batch.
- Type: int
shape_files
A list or tuple of file paths that contain the shape information of the sequences.
- Type: Union[Tuple[str, …], List[str]]
sort_in_batch
Determines the sorting order of sequences within each batch (‘ascending’ or ‘descending’).
- Type: str
sort_batch
Determines the order in which batches are yielded, by sequence length (‘ascending’ or ‘descending’).
- Type: str
drop_last
If True, drop the last incomplete batch.
- Type: bool
batch_list
A list of tuples, each holding the keys of the sequences in one batch.
- Type: List[Tuple[str, …]]
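The effect of sort_in_batch and sort_batch on batch_list can be illustrated with a small reordering step. This sketch assumes the incoming batches already have their keys in ascending length order, both inside each batch and from one batch to the next; `order_batches` is a hypothetical helper, not part of ESPnet, and it does not model drop_last.

```python
from typing import List, Sequence, Tuple

VALID_ORDERS = ("ascending", "descending")


def order_batches(
    batches: Sequence[Tuple[str, ...]],
    sort_in_batch: str = "descending",
    sort_batch: str = "ascending",
) -> List[Tuple[str, ...]]:
    """Hypothetical sketch: reorder keys inside and across batches.

    Assumes keys are already in ascending length order, both inside each
    batch and from one batch to the next.
    """
    for name, value in (
        ("sort_in_batch", sort_in_batch),
        ("sort_batch", sort_batch),
    ):
        if value not in VALID_ORDERS:
            raise ValueError(f"{name} must be ascending or descending: {value}")

    # Within a batch: reversing puts the longest sequence first.
    if sort_in_batch == "descending":
        result = [tuple(reversed(b)) for b in batches]
    else:
        result = [tuple(b) for b in batches]

    # Across batches: reversing yields the batches of longest sequences first.
    if sort_batch == "descending":
        result.reverse()
    return result
```

With the defaults, `[("a", "b"), ("c", "d")]` becomes `[("b", "a"), ("d", "c")]`; with `sort_in_batch="ascending", sort_batch="descending"` it becomes `[("c", "d"), ("a", "b")]`. Because the keys are sorted once up front, both options reduce to a cheap reversal here.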
Parameters:
- batch_bins (int) – Maximum number of bins allowed in each batch.
- shape_files (Union[Tuple[str, ...], List[str]]) – Paths to files containing shape information.
- min_batch_size (int, optional) – Minimum number of sequences in a batch. Defaults to 1.
- sort_in_batch (str, optional) – Sorting order of sequences within each batch. Defaults to ‘descending’.
- sort_batch (str, optional) – Sorting order of the batches. Defaults to ‘ascending’.
- drop_last (bool, optional) – If True, drop the last incomplete batch. Defaults to False.
- padding (bool, optional) – If True, bins are counted on the padded batch (batch size × longest sequence length); if False, bins are the sum of the sequence lengths. Defaults to True.
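When several shape files are given (for example, one for input features and one for output tokens), a plausible reading is that the bin counts of all streams are added together. The sketch below assumes that, with `utt2shapes` as a hypothetical list of per-file dictionaries mapping each key to its length, and shows how padding changes the count for one batch. `count_bins` is illustrative only.

```python
from typing import Dict, List, Sequence


def count_bins(
    keys: Sequence[str],
    utt2shapes: List[Dict[str, int]],
    padding: bool = True,
) -> int:
    """Hypothetical sketch: total bins of one batch over all shape files."""
    total = 0
    for shapes in utt2shapes:
        lengths = [shapes[k] for k in keys]
        if padding:
            # Padded size for this stream: batch size x longest length.
            total += len(lengths) * max(lengths)
        else:
            # Sum of the real lengths, ignoring padding.
            total += sum(lengths)
    return total
```

For speech lengths `{"u1": 100, "u2": 160}` and text lengths `{"u1": 10, "u2": 14}`, the batch `["u1", "u2"]` counts 348 bins with padding (2 × 160 + 2 × 14) and 284 without (260 + 24).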
Returns: None
Raises:
- ValueError – If sort_batch or sort_in_batch is not ‘ascending’ or ‘descending’.
- RuntimeError – If keys are mismatched between shape files or if no batches can be created.
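The key-mismatch condition can be illustrated with a small loader. This sketch assumes each shape file line holds a key followed by comma-separated integers, the first being the sequence length (for example `uttA 100,80`); `load_shape_file` and `load_and_check` are hypothetical names, not ESPnet APIs, and the empty-file check is an assumed extra guard.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple


def load_shape_file(path: str) -> Dict[str, Tuple[int, ...]]:
    """Hypothetical parser for lines like 'uttA 100,80'."""
    shapes: Dict[str, Tuple[int, ...]] = {}
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(maxsplit=1)
        shapes[key] = tuple(int(v) for v in value.split(","))
    return shapes


def load_and_check(paths: Sequence[str]) -> List[Dict[str, Tuple[int, ...]]]:
    """Load every shape file and require that all share the same keys."""
    utt2shapes = [load_shape_file(p) for p in paths]
    first = utt2shapes[0]
    for path, shapes in zip(paths, utt2shapes):
        if set(shapes) != set(first):
            raise RuntimeError(f"keys are mismatched between {path} and {paths[0]}")
    if not first:
        # Nothing to batch: no batches can be created.
        raise RuntimeError(f"no entries found in {paths[0]}")
    return utt2shapes
```

Comparing key sets rather than line order means two shape files may list the same utterances in different orders without triggering an error.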
Examples
# Create a LengthBatchSampler instance
from espnet2.samplers.length_batch_sampler import LengthBatchSampler

sampler = LengthBatchSampler(
    batch_bins=1000,
    shape_files=("shape1.txt", "shape2.txt"),
    min_batch_size=2,
    sort_in_batch="ascending",
    sort_batch="descending",
)
# Iterate through the batches; each batch is a tuple of keys
for batch in sampler:
    print(batch)
NOTE
This sampler is particularly useful for variable-length sequences, such as those in speech processing tasks.