espnet3.components.data.collect_stats.collect_stats

espnet3.components.data.collect_stats.collect_stats(model_config, dataset_config, dataloader_config, mode: str, output_dir: Path, task: str | None = None, parallel_config: DictConfig | None = None, write_collected_feats: bool = False, batch_size: int = 4)

Entry point for collecting dataset statistics used for feature normalization.

Runs a single runner-based collection pass. When parallel_config is provided, parallel execution is configured via espnet3.parallel.set_parallel().
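As a rough illustration of what "statistics used for feature normalization" typically means, the sketch below accumulates a per-dimension count, sum, and sum of squares over batches and derives mean and standard deviation from them. This is a generic pattern written in plain Python, not the espnet3 implementation; the names accumulate_stats and mean_and_std are hypothetical, and the actual quantities and file format written by collect_stats may differ.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def accumulate_stats(
    batches: Iterable[Sequence[Sequence[float]]],
) -> Tuple[int, List[float], List[float]]:
    """Accumulate per-dimension count, sum, and sum of squares.

    Hypothetical helper: each batch is a sequence of feature frames,
    and each frame is a sequence of floats of the same length.
    """
    count = 0
    total: List[float] = []
    total_sq: List[float] = []
    for batch in batches:
        for frame in batch:
            if not total:
                # Lazily size the accumulators from the first frame.
                total = [0.0] * len(frame)
                total_sq = [0.0] * len(frame)
            for i, x in enumerate(frame):
                total[i] += x
                total_sq[i] += x * x
            count += 1
    return count, total, total_sq


def mean_and_std(
    count: int, total: Sequence[float], total_sq: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Derive per-dimension mean and (population) standard deviation."""
    mean = [s / count for s in total]
    # Clamp at zero to guard against tiny negative values from rounding.
    var = [max(sq / count - m * m, 0.0) for sq, m in zip(total_sq, mean)]
    return mean, [math.sqrt(v) for v in var]


# Two batches holding three 2-dimensional frames in total.
batches = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
count, total, total_sq = accumulate_stats(batches)
mean, std = mean_and_std(count, total, total_sq)
```

Keeping only running sums means the result does not depend on how frames are grouped into batches, which is why a small batch_size affects throughput rather than the statistics themselves in this kind of scheme.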

Parameters:
- model_config – Configuration object used to instantiate the model that extracts features from the input examples.
- dataset_config – Configuration of the dataset organizer providing the split specified by mode.
- dataloader_config – Dataloader configuration.
- mode – Name of the dataset split to process (train or valid).
- output_dir – Directory where aggregated statistics and optionally collected features are written.
- task – Name of the ESPnet task. If None, model_config should be directly instantiable.
- parallel_config – Configuration for parallel execution, applied via espnet3.parallel.set_parallel(). If None, no parallel setup is performed.
- write_collected_feats – Whether to persist the raw collected features.
- batch_size – Number of dataset items processed per batch.
Returns: None. Aggregated statistics are saved under output_dir / mode.
Return type: None
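Because nothing is returned, callers need to locate the results on disk. The sketch below shows how the output_dir / mode convention resolves with pathlib; stats_dir is a hypothetical helper for illustration, and the names of the files written inside that directory are not specified here.

```python
from pathlib import Path


def stats_dir(output_dir: Path, mode: str) -> Path:
    """Return the directory expected to hold statistics for one split.

    Hypothetical helper mirroring the documented output_dir / mode layout.
    """
    return output_dir / mode


# Resolve the per-split directories for the two documented modes.
train_dir = stats_dir(Path("exp/stats"), "train")
valid_dir = stats_dir(Path("exp/stats"), "valid")
```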

Example

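>>> from pathlib import Path
>>> from espnet3.components.data.collect_stats import collect_stats
>>> # model_cfg, dataset_cfg, and dataloader_cfg are assumed to be already-loaded configs
>>> collect_stats(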
...     model_config=model_cfg,
...     dataset_config=dataset_cfg,
...     dataloader_config=dataloader_cfg,
...     mode="train",
...     output_dir=Path("exp/stats"),
...     task=None,
...     parallel_config=None,
...     batch_size=4,
... )