espnet2.speechlm.dataloader.multimodal_loader.LhotseAudioReader
class espnet2.speechlm.dataloader.multimodal_loader.LhotseAudioReader(manifest_dir: str, valid_ids: list = None)
Bases: object
Dict-like lazy audio reader using Lhotse manifests.
This reader supports both single-channel and multi-channel audio data:
- Single-channel audio (MonoCut): Returns shape [1, num_samples]
- Multi-channel audio (MultiCut): Returns shape [num_channels, num_samples]
Whatever the input type, the output is always a 2D array of shape [num_channels, num_samples].
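The shape convention above can be illustrated with a minimal sketch. This is not the reader's actual code: `as_channels_first` and `shape` are hypothetical helpers, and nested Python lists stand in for the arrays the reader returns.

```python
from typing import List, Tuple


def as_channels_first(audio) -> List[List[float]]:
    """Return audio as a list of channels, each a list of samples.

    Hypothetical helper: a flat sequence of samples is treated as one
    channel; a sequence of sequences is treated as already multi-channel.
    """
    if len(audio) > 0 and isinstance(audio[0], (int, float)):
        # Flat sequence of samples: wrap it so it becomes [1, num_samples].
        return [list(audio)]
    # Already [num_channels, num_samples]: copy each channel.
    return [list(channel) for channel in audio]


def shape(audio_2d: List[List[float]]) -> Tuple[int, int]:
    """Return (num_channels, num_samples) for a channels-first nested list."""
    return (len(audio_2d), len(audio_2d[0]) if audio_2d else 0)


mono = [0.0, 0.1, -0.1, 0.2]
stereo = [[0.0, 0.1, -0.1, 0.2], [0.0, -0.1, 0.1, -0.2]]

print(shape(as_channels_first(mono)))    # (1, 4)
print(shape(as_channels_first(stereo)))  # (2, 4)
```

Because both cases end up channels-first, downstream code can index the channel axis without checking whether the source was mono or multi-channel.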
- Parameters:
  - manifest_dir – Directory containing Lhotse manifest files (recordings.jsonl.gz and optionally cuts.jsonl.gz)
  - valid_ids – List of valid IDs to keep (optional; keeps all if None)
items()
Return iterator over (id, item) pairs.
keys()
Return iterator over IDs.
values()
Return iterator over items.
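As a rough illustration of what "dict-like lazy reader" can mean, the sketch below builds a small mapping whose items are produced only on access and whose IDs can be restricted by `valid_ids`. It is an assumption-laden stand-in, not the ESPnet implementation: `LazyReaderSketch`, its `loaders` argument, and the `load_count` counter are all hypothetical, and whether the real class supports `reader[id]` or loads audio inside `items()` is not stated in this page.

```python
from collections.abc import Mapping
from typing import Callable, Dict, Iterator, List, Optional


class LazyReaderSketch(Mapping):
    """Hypothetical stand-in: maps IDs to items that are loaded on access."""

    def __init__(
        self,
        loaders: Dict[str, Callable[[], object]],
        valid_ids: Optional[List[str]] = None,
    ) -> None:
        if valid_ids is not None:
            # Keep only the requested IDs; None keeps everything.
            keep = set(valid_ids)
            loaders = {k: v for k, v in loaders.items() if k in keep}
        self._loaders = loaders
        self.load_count = 0  # number of items actually loaded so far

    def __getitem__(self, key: str) -> object:
        loader = self._loaders[key]  # raises KeyError for unknown IDs
        self.load_count += 1
        return loader()

    def __iter__(self) -> Iterator[str]:
        return iter(self._loaders)

    def __len__(self) -> int:
        return len(self._loaders)


# Each loader stands in for reading one recording from disk.
loaders = {
    "utt1": lambda: [[0.0, 0.1]],
    "utt2": lambda: [[0.0, 0.1], [0.2, 0.3]],
    "utt3": lambda: [[0.5]],
}
reader = LazyReaderSketch(loaders, valid_ids=["utt1", "utt2"])

print(list(reader.keys()))  # ['utt1', 'utt2']
print(reader.load_count)    # 0: listing IDs loads nothing

for utt_id, audio in reader.items():
    print(utt_id, len(audio), len(audio[0]))

print(reader.load_count)    # 2: one load per item iterated
```

Subclassing `collections.abc.Mapping` supplies `keys()`, `values()`, and `items()` from `__getitem__`, `__iter__`, and `__len__`, which is one common way to get this interface. In this sketch, iterating over IDs stays cheap and the loading cost is paid only when an item is accessed.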
