espnet2.fileio.rttm.RttmReader
class espnet2.fileio.rttm.RttmReader(fname: str)
Bases: Mapping
Reader class for ‘rttm.scp’.
This class reads RTTM (Rich Transcription Time Marked) files in the variant used by the ESPnet framework. This variant extends the standard format by using sample numbers instead of absolute time and by adding an END label that represents the duration of a recording.
The standard RTTM format can be found at: https://catalog.ldc.upenn.edu/docs/LDC2004T12/RTTM-format-v13.pdf
fname
The filename of the RTTM file to be read.
- Type: str
data
Parsed RTTM data where keys are utterance IDs and values are tuples containing speaker list, speaker events, and maximum duration.
- Type: Dict[str, List[Tuple[str, float, float]]]
- Parameters:fname (str) – The path to the RTTM file.
####### Examples
>>> reader = RttmReader('rttm')
>>> spk_label = reader["file1"]
This example shows how to instantiate the reader and access the speaker labels for a given file. The RTTM file may contain lines such as:
SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>
SPEAKER file1 2 4000 3023 <NA> <NA> spk2 <NA>
SPEAKER file1 3 500 4023 <NA> <NA> spk1 <NA>
END file1 <NA> 4023 <NA> <NA> <NA> <NA>
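Because the class derives from Mapping, lookups, `keys()`, `len()` and membership tests are available without extra code. The sketch below illustrates that behaviour with a hypothetical `ToyReader` class wrapping a plain dict; it is a stand-in for illustration only and does not reproduce the ESPnet implementation.

```python
from collections.abc import Mapping


class ToyReader(Mapping):
    """Hypothetical stand-in for a Mapping-based reader (not the ESPnet class)."""

    def __init__(self, data):
        self._data = dict(data)

    # Mapping requires these three methods; everything else is derived from them.
    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


reader = ToyReader({"file1": "labels for file1", "file2": "labels for file2"})
utt_ids = list(reader.keys())   # keys() is provided by Mapping
has_file1 = "file1" in reader   # so is membership testing
```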
NOTE
The reader currently supports only speaker information. Ensure that the RTTM file is formatted correctly to avoid assertion errors.
- Raises:AssertionError – If the RTTM line does not have exactly 9 fields or if the label type is not “SPEAKER” or “END”.
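The two documented checks can be sketched as a small standalone helper. `split_rttm_line` is a hypothetical name introduced here, and splitting on arbitrary whitespace is an assumption; the reader's actual parsing code may differ.

```python
def split_rttm_line(line: str) -> list:
    """Split one RTTM-style line and apply the two documented checks."""
    fields = line.split()
    # Documented check 1: exactly 9 whitespace-separated fields.
    assert len(fields) == 9, f"expected 9 fields, got {len(fields)}: {line!r}"
    # Documented check 2: only SPEAKER and END labels are accepted.
    assert fields[0] in ("SPEAKER", "END"), f"unsupported label {fields[0]!r}"
    return fields


fields = split_rttm_line("SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>")

# A label other than SPEAKER or END trips the second assertion.
try:
    split_rttm_line("LEXEME file1 1 0 1023 <NA> <NA> spk1 <NA>")
    rejected = False
except AssertionError:
    rejected = True
```

Running such a check over a file before handing it to the reader is one way to locate a malformed line, since the assertion message can include the offending text.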
keys()
Read and parse RTTM (Rich Transcription Time Marked) files.
This module provides functionality to read RTTM files and extract speaker information. The load_rttm_text function reads the file and returns a dictionary containing speaker events associated with each utterance.
Note: This implementation currently only supports speaker information.
- RttmReader
A class for reading RTTM files.
- Parameters:path (Union[Path, str]) – The file path to the RTTM file to be read.
- Returns: A dictionary keyed by utterance ID. In the printed example in this section, each value is a tuple of the speaker list, a list of (speaker ID, start time, end time) events, and the maximum duration.
- Return type: Dict[str, List[Tuple[str, float, float]]]
- Raises:AssertionError – If the RTTM line does not contain exactly 9 fields or if the label type is not "SPEAKER" or "END".
####### Examples
>>> data = load_rttm_text("path/to/rttm/file.rttm")
>>> print(data)
{'file1': (['spk1', 'spk2'], [(spk1, start1, end1), (spk2, start2, end2)], max_duration)}
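A value shaped like the printed example can be unpacked into its three parts and summarised per speaker. The sketch below uses hand-written data rather than real output, `samples_per_speaker` is a hypothetical helper, and treating the end value as exclusive when measuring segment length is an assumption.

```python
from collections import defaultdict

# Hand-written stand-in shaped like the printed example (not real output).
data = {
    "file1": (
        ["spk1", "spk2"],
        [("spk1", 0, 1023), ("spk2", 1024, 3023), ("spk1", 3024, 4023)],
        4023,
    )
}


def samples_per_speaker(entry):
    """Sum (end - start) over each speaker's events."""
    spk_list, events, _max_duration = entry
    totals = defaultdict(int)
    for spk_id, start, end in events:
        totals[spk_id] += end - start
    # Report speakers in the order they appear in the speaker list.
    return {spk: totals[spk] for spk in spk_list}


totals = samples_per_speaker(data["file1"])
```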
RttmReader class: Reader class for ‘rttm.scp’.
Examples:
SPEAKER file1 1 0 1023 <NA> <NA> spk1 <NA>
SPEAKER file1 2 4000 3023 <NA> <NA> spk2 <NA>
SPEAKER file1 3 500 4023 <NA> <NA> spk1 <NA>
END file1 <NA> 4023 <NA> <NA> <NA> <NA>
This is an extended version of the standard RTTM format for ESPnet. The differences are:
1. Sample numbers are used instead of absolute time.
2. An END label is included to represent the duration of a recording.
3. The duration (5th field) is replaced with the end time.
(For standard RTTM, see https://catalog.ldc.upenn.edu/docs/LDC2004T12/RTTM-format-v13.pdf)
Examples:
>>> reader = RttmReader('path/to/rttm/file.rttm')
>>> spk_label = reader["file1"]
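The differences from standard RTTM imply a conversion when preparing files from standard annotations: an onset and duration in seconds become a start and end in samples. The sketch below shows one way to do this. The helper name `to_sample_span`, the 8000 Hz sampling rate, and rounding to the nearest sample are all assumptions, not part of ESPnet.

```python
def to_sample_span(onset_sec: float, duration_sec: float, sample_rate: int):
    """Convert a standard-RTTM (onset, duration) pair in seconds to (start, end) samples."""
    start = int(round(onset_sec * sample_rate))
    # The extended format stores the end position, not the duration.
    end = int(round((onset_sec + duration_sec) * sample_rate))
    return start, end


# Hypothetical values: a 0.5 s segment starting at 1.25 s, sampled at 8000 Hz.
start, end = to_sample_span(1.25, 0.5, 8000)
line = f"SPEAKER file1 1 {start} {end} <NA> <NA> spk1 <NA>"
```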