espnet3.components.data.collect_stats.CollectStatsInferenceProvider
class espnet3.components.data.collect_stats.CollectStatsInferenceProvider(model_config, dataset_config, dataloader_config, mode: str, task: str | None = None, shard_idx: int | None = None, params: Dict[str, Any] | None = None)
Bases: EnvironmentProvider
Build collect-stats execution environments.
This provider prepares the dataset, collate function, device, and model for CollectStatsRunner. It supports both local execution and worker setup for parallel jobs.
Examples
provider = CollectStatsInferenceProvider(
    model_config=model_config,
    dataset_config=dataset_config,
    dataloader_config=dataloader_config,
    mode="train",
    task="asr",
)
Initialize the provider configuration.
- Parameters:
  - model_config – Config used to instantiate the model.
  - dataset_config – Config used to instantiate the dataset organizer.
  - dataloader_config – Dataloader config, including optional collate_fn settings.
  - mode – Dataset split name such as train or valid.
  - task – ESPnet task name. When set, the model is resolved through the espnet2 task bridge.
  - shard_idx – Optional shard index applied to shardable datasets.
  - params – Extra config values merged into the provider config. This is typically used for flags such as write_collected_feats.
build_env_local() → Dict[str, Any]
Build the local execution environment once on the driver.
- Returns: Environment mapping with keys collate_fn, dataset, device, model, and write_collected_feats.
- Return type: dict
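A caller that consumes this mapping may want to check that the documented keys are present before using it. The following is a minimal sketch, not part of espnet3: check_env and EXPECTED_KEYS are hypothetical names, and the dict passed in is a stand-in for the value returned by build_env_local(), with placeholder values instead of a real dataset or model.

```python
from typing import Any, Dict

# Keys listed in the build_env_local() return description above.
EXPECTED_KEYS = {"collate_fn", "dataset", "device", "model", "write_collected_feats"}


def check_env(env: Dict[str, Any]) -> Dict[str, Any]:
    """Return env unchanged, or raise KeyError if a documented key is missing.

    Hypothetical helper for illustration; it is not an espnet3 function.
    """
    missing = EXPECTED_KEYS - env.keys()
    if missing:
        raise KeyError(f"environment is missing keys: {sorted(missing)}")
    return env


# Stand-in for provider.build_env_local(); the values are placeholders.
env = check_env(
    {
        "collate_fn": None,
        "dataset": [],
        "device": "cpu",
        "model": None,
        "write_collected_feats": False,
    }
)
```

With a real provider, the dict literal would be replaced by the result of provider.build_env_local().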
build_worker_setup_fn()
Return a worker setup function for parallel collect-stats jobs.
- Returns: A zero-argument function that builds and returns the same environment dict as build_env_local(). Called once per parallel worker process.
- Return type: Callable
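The relationship between the two methods can be illustrated with a toy stand-in. This sketch assumes nothing about espnet3 internals: ToyProvider is a hypothetical class that mimics only the documented method shapes, and its setup function simply delegates to build_env_local(), which may differ from how the real provider builds the worker environment.

```python
from typing import Any, Callable, Dict, Optional


class ToyProvider:
    """Hypothetical stand-in; mirrors only the documented method shapes."""

    def __init__(self, mode: str, params: Optional[Dict[str, Any]] = None):
        self.mode = mode
        self.params = dict(params or {})

    def build_env_local(self) -> Dict[str, Any]:
        # Placeholder values; a real provider would build a dataset and model.
        return {
            "collate_fn": None,
            "dataset": [f"{self.mode}-utt{i}" for i in range(3)],
            "device": "cpu",
            "model": None,
            "write_collected_feats": self.params.get("write_collected_feats", False),
        }

    def build_worker_setup_fn(self) -> Callable[[], Dict[str, Any]]:
        # Return a zero-argument function so the environment is built
        # where the function is called, not where the provider was created.
        def setup() -> Dict[str, Any]:
            return self.build_env_local()

        return setup


provider = ToyProvider("train", {"write_collected_feats": True})
setup_fn = provider.build_worker_setup_fn()  # nothing is built yet
worker_env = setup_fn()                      # built on call, once per worker
```

Returning a function instead of a ready-made dict lets each worker construct its own dataset and model after it starts, so those objects do not have to be transferred from the driver.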
