espnet2.samplers.build_batch_sampler.build_batch_sampler
espnet2.samplers.build_batch_sampler.build_batch_sampler(type: str, batch_size: int, batch_bins: int, shape_files: Tuple[str, ...] | List[str], sort_in_batch: str = 'descending', sort_batch: str = 'ascending', drop_last: bool = False, min_batch_size: int = 1, fold_lengths: Sequence[int] = (), padding: bool = True, utt2category_file: str | None = None) → AbsSampler
Helper function to instantiate various types of BatchSampler.
This function creates and returns an instance of a specified batch sampler based on the provided parameters. It supports multiple sampler types, each designed for different batching strategies.
espnet2.samplers.build_batch_sampler.type
The mini-batch type. Options include “unsorted”, “sorted”, “folded”, “numel”, “length”, or “catbel”.
- Type: str
espnet2.samplers.build_batch_sampler.batch_size
The mini-batch size. Used for “unsorted”, “sorted”, “folded”, and “catbel” modes.
- Type: int
espnet2.samplers.build_batch_sampler.batch_bins
Number of bins per mini-batch. Used for “numel” mode.
- Type: int
espnet2.samplers.build_batch_sampler.shape_files
Text files describing the length and dimension of each feature, e.g., “uttA 1330,80”.
- Type: Union[Tuple[str, …], List[str]]
espnet2.samplers.build_batch_sampler.sort_in_batch
Sorting order for samples within each batch.
- Type: str
espnet2.samplers.build_batch_sampler.sort_batch
Sorting order for batches.
- Type: str
espnet2.samplers.build_batch_sampler.drop_last
Whether to drop the last incomplete batch.
- Type: bool
espnet2.samplers.build_batch_sampler.min_batch_size
Minimum batch size used for “numel” or “folded” mode.
- Type: int
espnet2.samplers.build_batch_sampler.fold_lengths
Used for “folded” mode to specify fold lengths.
- Type: Sequence[int]
espnet2.samplers.build_batch_sampler.padding
Whether sequences are input as a padded tensor (used for “numel” mode).
- Type: bool
espnet2.samplers.build_batch_sampler.utt2category_file
Optional file to categorize utterances.
- Type: Optional[str]
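The shape-file format described for shape_files (one utterance per line, e.g. “uttA 1330,80”) can be read with a few lines of Python. The sketch below is illustrative only: `parse_shape_lines` is a hypothetical helper, not part of ESPnet, and the second sample line (“uttB 920,80”) is made up for the example.

```python
from typing import Dict, List


def parse_shape_lines(lines: List[str]) -> Dict[str, List[int]]:
    """Parse lines such as "uttA 1330,80" into {"uttA": [1330, 80]}."""
    shapes: Dict[str, List[int]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The utterance ID comes first, then a comma-separated shape.
        utt_id, dims = line.split(maxsplit=1)
        shapes[utt_id] = [int(d) for d in dims.split(",")]
    return shapes


shapes = parse_shape_lines(["uttA 1330,80", "uttB 920,80"])
print(shapes["uttA"])  # [1330, 80]
```

Here the first number is read as the sequence length and the second as the feature dimension, matching the description of shape_files.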
Parameters:
- type – Mini-batch type. Options: “unsorted”, “sorted”, “folded”, “numel”, “length”, or “catbel”.
- batch_size – The mini-batch size.
- batch_bins – Number of bins used for “numel” mode.
- shape_files – Text files describing the length and dimension of each feature.
- sort_in_batch – Sorting order for samples in each batch.
- sort_batch – Sorting order for batches.
- drop_last – Whether to drop the last incomplete batch.
- min_batch_size – Minimum batch size for “numel” or “folded” mode.
- fold_lengths – Lengths for folding (used in “folded” mode).
- padding – Whether sequences are input as a padded tensor (used for “numel” mode).
- utt2category_file – Optional file for categorizing utterances.
Returns: An instance of a batch sampler.
Return type: AbsSampler
Raises: ValueError – If no shape files are provided or if the number of fold_lengths does not match the number of shape_files.
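The two ValueError conditions can be checked on the caller's side before building a sampler. The following is a minimal sketch, not ESPnet code: `check_sampler_args` is a hypothetical helper, and applying the fold_lengths check only in “folded” mode is an assumption based on fold_lengths defaulting to an empty tuple for the other modes.

```python
from typing import Sequence


def check_sampler_args(
    batch_type: str,
    shape_files: Sequence[str],
    fold_lengths: Sequence[int] = (),
) -> None:
    """Raise ValueError for the two conditions listed under Raises."""
    if len(shape_files) == 0:
        raise ValueError("No shape files were given")
    # Assumption: the length check only matters in "folded" mode,
    # because fold_lengths defaults to () for the other modes.
    if batch_type == "folded" and len(fold_lengths) != len(shape_files):
        raise ValueError(
            f"Expected {len(shape_files)} fold_lengths, got {len(fold_lengths)}"
        )


check_sampler_args("unsorted", ["shapes.txt"])  # passes silently
check_sampler_args("folded", ["shapes1.txt", "shapes2.txt"], [10, 20])  # passes
```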
Examples
Create an unsorted batch sampler
from espnet2.samplers.build_batch_sampler import build_batch_sampler
sampler = build_batch_sampler(
    type="unsorted", batch_size=32, batch_bins=0, shape_files=["shapes.txt"],
)
Create a sorted batch sampler
sampler = build_batch_sampler(
    type="sorted", batch_size=32, batch_bins=0, shape_files=["shapes.txt"],
    sort_in_batch="ascending", sort_batch="descending",
)
Create a folded batch sampler
sampler = build_batch_sampler(
    type="folded", batch_size=64, batch_bins=0,
    shape_files=["shapes1.txt", "shapes2.txt"], fold_lengths=[10, 20],
)
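The examples above cover modes driven by batch_size. For “numel” mode, batch_bins and padding matter instead. The sketch below shows one plausible reading of how a bin budget could group utterances. It is a simplified illustration, not ESPnet's implementation: `group_by_bins`, the greedy strategy, and the sample lengths are all assumptions made for this example.

```python
from typing import Dict, List


def group_by_bins(
    lengths: Dict[str, int],
    batch_bins: int,
    padding: bool = True,
    min_batch_size: int = 1,
) -> List[List[str]]:
    """Greedily group utterances so each batch stays within batch_bins."""
    batches: List[List[str]] = []
    current: List[tuple] = []
    # Visit utterances from shortest to longest.
    for utt_id, length in sorted(lengths.items(), key=lambda kv: kv[1]):
        candidate = current + [(utt_id, length)]
        if padding:
            # Padded tensor: every row is as long as the longest sequence.
            cost = len(candidate) * max(n for _, n in candidate)
        else:
            # Unpadded: cost is the plain sum of sequence lengths.
            cost = sum(n for _, n in candidate)
        if current and cost > batch_bins and len(current) >= min_batch_size:
            # Budget exceeded: close the current batch, start a new one.
            batches.append([u for u, _ in current])
            current = [(utt_id, length)]
        else:
            current = candidate
    if current:
        batches.append([u for u, _ in current])
    return batches


lengths = {"a": 10, "b": 20, "c": 30, "d": 40}
print(group_by_bins(lengths, batch_bins=60, padding=True))
# [['a', 'b'], ['c'], ['d']]
print(group_by_bins(lengths, batch_bins=60, padding=False))
# [['a', 'b', 'c'], ['d']]
```

With padding enabled, longer utterances consume the budget faster because every sequence in the batch is counted at the longest length, so batches of long utterances end up smaller.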