espnet2.speechlm.dataloader.multimodal_loader.TextReader
class espnet2.speechlm.dataloader.multimodal_loader.TextReader(text_file: str, valid_ids: list = None)
Bases: object
Dict-like text reader supporting plain and JSONL formats.
Plain format: <id> <text content>
JSONL format: {"id": "<id>", "text": "<text content>"}
Format is determined by file suffix (.jsonl for JSONL, otherwise plain).
- Parameters:
  - text_file – Path to the text file (plain or JSONL format)
  - valid_ids – List of valid IDs to keep (optional; keeps all entries if None)
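
The two input formats and the valid_ids filter can be illustrated with a minimal standard-library sketch. This is not ESPnet's implementation: `load_text` is a hypothetical helper written for this example, the sample IDs and file names are invented, and the plain-format rule (the ID ends at the first space) is an assumption, since the description above only gives the layout `<id> <text content>`.

```python
import json
import tempfile
from pathlib import Path


def load_text(text_file, valid_ids=None):
    """Hypothetical stand-in for the loading behavior described above."""
    keep = None if valid_ids is None else set(valid_ids)
    # Format is chosen by suffix: .jsonl means JSONL, anything else is plain.
    is_jsonl = str(text_file).endswith(".jsonl")
    data = {}
    with open(text_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if is_jsonl:
                record = json.loads(line)
                uid, text = record["id"], record["text"]
            else:
                # Assumption: the ID ends at the first space.
                uid, _, text = line.partition(" ")
                text = text.strip()
            # Keep everything when valid_ids is None, else only listed IDs.
            if keep is None or uid in keep:
                data[uid] = text
    return data


# Write the same two utterances in both formats (sample data is invented).
tmp = Path(tempfile.mkdtemp())
plain = tmp / "text"
plain.write_text("utt1 hello world\nutt2 good morning\n", encoding="utf-8")
jsonl = tmp / "text.jsonl"
jsonl.write_text(
    '{"id": "utt1", "text": "hello world"}\n'
    '{"id": "utt2", "text": "good morning"}\n',
    encoding="utf-8",
)

plain_data = load_text(plain)                      # no filtering
jsonl_data = load_text(jsonl, valid_ids=["utt2"])  # keep only utt2
```

With ESPnet installed, the equivalent call would presumably be `TextReader(str(jsonl), valid_ids=["utt2"])`, following the signature above.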
items()
Return iterator over (id, text) pairs.
keys()
Return iterator over IDs.
values()
Return iterator over texts.
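
The three accessors mirror the usual dict protocol. The sketch below uses a hypothetical dict-backed class, `TextTable`, to show the expected call pattern. It is not the ESPnet class, and whether the real methods return reusable views or one-shot iterators is not stated here, so the example materialises each result with `list()`.

```python
class TextTable:
    """Hypothetical dict-backed stand-in, not ESPnet's TextReader."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        # Iterator over IDs.
        return iter(self._data.keys())

    def values(self):
        # Iterator over texts.
        return iter(self._data.values())

    def items(self):
        # Iterator over (id, text) pairs.
        return iter(self._data.items())


table = TextTable({"utt1": "hello world", "utt2": "good morning"})

ids = list(table.keys())
texts = list(table.values())
pairs = list(table.items())

for uid, text in table.items():
    print(f"{uid}: {text}")
```

Because an iterator can be consumed only once, calling the method again (or wrapping the result in `list()`) is the safe pattern when the entries are needed more than once.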
