espnet3.components.data.collect_stats.collect_stats
espnet3.components.data.collect_stats.collect_stats(model_config, dataset_config, dataloader_config, mode: str, output_dir: Path, task: str | None = None, parallel_config: DictConfig | None = None, write_collected_feats: bool = False, batch_size: int = 4)
Entry point for collecting dataset statistics used for feature normalization.
Runs the runner-based collection once, optionally configuring parallel execution via espnet3.parallel.set_parallel() when parallel_config is provided.
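This reference does not say which statistics are aggregated or how they are stored. As a rough illustration of what "statistics used for feature normalization" often means, the sketch below accumulates a per-dimension count, sum, and sum of squares over batches, then derives a mean and standard deviation. The function names `accumulate_stats` and `mean_and_std` are hypothetical and are not part of ESPnet3; the choice of statistics is an assumption.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Frame = Sequence[float]


def accumulate_stats(
    batches: Iterable[Sequence[Frame]],
) -> Tuple[int, List[float], List[float]]:
    """Accumulate frame count, per-dimension sum, and sum of squares.

    Hypothetical helper for illustration only; not ESPnet3 code.
    """
    count = 0
    total: List[float] = []
    total_sq: List[float] = []
    for batch in batches:
        for frame in batch:
            if not total:
                # Size the accumulators from the first frame seen.
                total = [0.0] * len(frame)
                total_sq = [0.0] * len(frame)
            for i, value in enumerate(frame):
                total[i] += value
                total_sq[i] += value * value
            count += 1
    if count == 0:
        raise ValueError("no frames were provided")
    return count, total, total_sq


def mean_and_std(
    count: int, total: Sequence[float], total_sq: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Turn the accumulated sums into per-dimension mean and std."""
    mean = [s / count for s in total]
    # Population variance E[x^2] - E[x]^2, clamped at zero against rounding.
    std = [
        math.sqrt(max(sq / count - m * m, 0.0))
        for sq, m in zip(total_sq, mean)
    ]
    return mean, std


# Two batches of two 2-dimensional frames each.
batches = [
    [[1.0, 10.0], [3.0, 10.0]],
    [[5.0, 10.0], [7.0, 10.0]],
]
count, total, total_sq = accumulate_stats(batches)
mean, std = mean_and_std(count, total, total_sq)
```

Keeping only running sums means each batch can be discarded after it is processed, which is also what makes this kind of aggregation easy to split across parallel workers and merge afterwards.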
- Parameters:
- model_config – Configuration object used to instantiate the model that extracts features from the input examples.
- dataset_config – Configuration of the dataset organizer providing the split specified by mode.
- dataloader_config – Dataloader configuration.
- mode – Name of the dataset split to process (train or valid).
- output_dir – Directory where aggregated statistics and optionally collected features are written.
- task – Name of the ESPnet task. If None, model_config should be directly instantiable.
- parallel_config – Configuration for parallel execution.
- write_collected_feats – Whether to persist the raw collected features.
- batch_size – Number of dataset items processed per batch.
- Returns: Aggregated statistics are saved under output_dir / mode.
- Return type: None
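Since the function returns None, the only result is what ends up on disk. A minimal sketch of where to look, assuming the output_dir / mode layout stated above; the names of the files inside each directory are not given in this reference, so none are shown.

```python
from pathlib import Path

output_dir = Path("exp/stats")

# One statistics directory per split, following the output_dir / mode rule.
stats_dirs = {mode: output_dir / mode for mode in ("train", "valid")}
```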
Example
>>> from pathlib import Path
>>> from espnet3.components.data.collect_stats import collect_stats
>>> collect_stats(
... model_config=model_cfg,
... dataset_config=dataset_cfg,
... dataloader_config=dataloader_cfg,
... mode="train",
... output_dir=Path("exp/stats"),
... task=None,
... parallel_config=None,
... batch_size=4,
... )
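Because each call processes a single split, statistics for both train and valid need two calls. The sketch below shows one way to do that with a small wrapper. `collect_for_splits` is a hypothetical helper, and `fake_collect_stats` is a stand-in used only so the snippet runs without ESPnet3 installed; in practice the real collect_stats would be passed instead.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List, Sequence


def collect_for_splits(
    collect_fn: Callable[..., None],
    modes: Sequence[str] = ("train", "valid"),
    **common_kwargs: Any,
) -> None:
    """Call collect_fn once per split, forwarding the shared keyword arguments.

    Hypothetical helper for illustration only; not part of ESPnet3.
    """
    for mode in modes:
        collect_fn(mode=mode, **common_kwargs)


# Stand-in that records its arguments instead of collecting statistics.
calls: List[Dict[str, Any]] = []


def fake_collect_stats(**kwargs: Any) -> None:
    calls.append(kwargs)


collect_for_splits(
    fake_collect_stats,
    model_config={},        # placeholder configs for the sketch
    dataset_config={},
    dataloader_config={},
    output_dir=Path("exp/stats"),
    batch_size=4,
)
```

Passing the same output_dir to every call keeps the results side by side, since each split is written to its own subdirectory named after mode.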