espnet2.speechlm.definitions.SpeechLMTask
class espnet2.speechlm.definitions.SpeechLMTask(encoder_entries: List[Tuple[str, str, str]], decoder_entries: List[Tuple[str, str, str]], target_entries: List[Tuple[str, str, str]], use_task_identifier: bool = True)
Bases: object
Dataclass representing a speech language model task in the ESPnet2 framework.
The SpeechLMTask defines the structure of a task that involves encoding and decoding entries for speech language models. It contains information about the encoder and decoder entries as well as the target entries that are used during the training process. This dataclass also provides the option to include a task identifier.
encoder_entries
A list of tuples where each tuple contains the file name, entry modality, and data type for the encoder.
- Type: List[Tuple[str, str, str]]
decoder_entries
A list of tuples similar to encoder_entries, but for the decoder.
- Type: List[Tuple[str, str, str]]
target_entries
A list of tuples that represent the entries used to compute the loss, typically the same as decoder_entries.
- Type: List[Tuple[str, str, str]]
use_task_identifier
A flag indicating whether to use a task identifier. Defaults to True.
- Type: bool
find_modality_type
A property that returns a single string combining all encoder and decoder entries. It is used in the shell data preparation scripts.
####### Examples
Create a speech language model task for text-to-speech (TTS):
```python
from espnet2.speechlm.definitions import SpeechLMTask

tts_task = SpeechLMTask(
    encoder_entries=[("text", "g2p", "text"), ("utt2spk", "spk", "text")],
    decoder_entries=[("wav.scp", "codec", "kaldi_ark")],
    target_entries=[("wav.scp", "codec", "kaldi_ark")],
)
```
Access the modality types:
```python
# find_modality_type is a property, so it is read without parentheses
modality_types = tts_task.find_modality_type
```
NOTE
The find_modality_type property concatenates the encoder and decoder entries into a single string, which can be useful for data preparation and debugging.
property find_modality_type
Combines the encoder and decoder entries into a single string representation.
This property is used in the shell data preparation script to gather all modality information from the encoder and decoder entries of the SpeechLMTask instance. The three fields of each entry are joined with commas, and the resulting entries are joined with single spaces, encoder entries first.
encoder_entries
A list of tuples representing the encoder entries. Each tuple contains:
- file_name: The name of the file.
- entry_modality: The modality type of the entry.
- data_type: The data type for loading the entry.
- Type: List[Tuple[str, str, str]]
decoder_entries
A list of tuples representing the decoder entries, formatted similarly to encoder_entries.
- Type: List[Tuple[str, str, str]]
target_entries
A list of tuples representing the target entries, formatted similarly to encoder_entries and decoder_entries.
- Type: List[Tuple[str, str, str]]
use_task_identifier
A flag indicating whether to use the task identifier.
- Type: bool
Returns: A string containing all encoder and decoder entries, where the fields of each entry are comma-separated and the entries are space-separated.
Return type: str
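The joining behavior described above can be sketched as follows. This is a minimal illustration, not the ESPnet source: `TaskSketch` and the `Entry` alias are hypothetical stand-ins introduced here so the snippet runs without ESPnet installed, and the join logic is inferred from the documented output format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (file_name, entry_modality, data_type); alias introduced for this sketch only
Entry = Tuple[str, str, str]


@dataclass
class TaskSketch:
    """Hypothetical stand-in for SpeechLMTask; not the ESPnet class."""

    encoder_entries: List[Entry]
    decoder_entries: List[Entry]
    target_entries: List[Entry]
    use_task_identifier: bool = True

    @property
    def find_modality_type(self) -> str:
        # Encoder entries come first, then decoder entries.
        # Target entries are not part of the returned string.
        all_entries = self.encoder_entries + self.decoder_entries
        # Comma-join the fields of each entry, then space-join the entries.
        return " ".join(",".join(entry) for entry in all_entries)


task = TaskSketch(
    encoder_entries=[("text", "g2p", "text")],
    decoder_entries=[("wav.scp", "codec", "kaldi_ark")],
    target_entries=[("wav.scp", "codec", "kaldi_ark")],
)
print(task.find_modality_type)  # text,g2p,text wav.scp,codec,kaldi_ark
```

Because the value is exposed as a property, it is read as an attribute; adding parentheses would attempt to call the returned string.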
####### Examples
>>> task = SpeechLMTask(
... encoder_entries=[("text", "g2p", "text")],
... decoder_entries=[("wav.scp", "codec", "kaldi_ark")],
... target_entries=[("wav.scp", "codec", "kaldi_ark")]
... )
>>> print(task.find_modality_type)
text,g2p,text wav.scp,codec,kaldi_ark
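A consumer of this string, such as a data preparation step, may need to recover the individual entries. The sketch below shows one way to do that in Python. `parse_modality_string` is a hypothetical helper, not part of ESPnet, and it assumes that no field contains a comma or whitespace.

```python
from typing import List, Tuple


def parse_modality_string(joined: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: split the joined string back into 3-tuples."""
    entries = []
    # Entries are separated by whitespace.
    for chunk in joined.split():
        # Each entry holds exactly three comma-separated fields.
        file_name, modality, data_type = chunk.split(",")
        entries.append((file_name, modality, data_type))
    return entries


entries = parse_modality_string("text,g2p,text wav.scp,codec,kaldi_ark")
# Keep only the middle field (the entry modality) of each entry.
modalities = [modality for _, modality, _ in entries]
print(modalities)  # ['g2p', 'codec']
```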