espnet2.tasks.tts2.TTS2Task

class espnet2.tasks.tts2.TTS2Task

Bases: AbsTask

Text-to-speech (TTS) task class for ESPnet2.

This class is responsible for managing the TTS task, including setting up arguments, building models, and handling data processing. It extends the abstract base task class AbsTask.

num_optimizers

Number of optimizers to use. Default is 1.

Type: int

class_choices_list

List of class choices for various components in the TTS task.

Type: List[ClassChoices]

trainer

Trainer class used for training procedures.

Type: Trainer

- add_task_arguments(parser: argparse.ArgumentParser): Adds command-line arguments specific to the TTS task.
- build_collate_fn(args: argparse.Namespace, train: bool) -> Callable: Builds a collate function for batching data.
- build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]: Builds a preprocessing function for input data.
- required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns the names of the required data for the task.
- optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns the names of the optional data for the task.
- build_model(args: argparse.Namespace) -> ESPnetTTS2Model: Constructs the TTS model based on the provided arguments.
- build_vocoder_from_file(vocoder_config_file: Union[Path, str] = None, vocoder_file: Union[Path, str] = None, model: Optional[ESPnetTTS2Model] = None, device: str = "cpu"): Builds a vocoder from the specified configuration and model files.

################### Examples

To add task-specific arguments to a parser:

>>> parser = argparse.ArgumentParser()
>>> TTS2Task.add_task_arguments(parser)

To build a model:

>>> args = ...  # Namespace with the necessary arguments
>>> model = TTS2Task.build_model(args)

########## NOTE Ensure that the necessary configuration files and data are available when using this class.

classmethod add_task_arguments(parser: ArgumentParser)

Add task-related arguments to the argument parser.

This method defines and adds various arguments related to the TTS task that are necessary for model configuration and data preprocessing. The arguments include options for specifying source and target token lists, model configurations, and preprocessing preferences.

Parameters: parser (argparse.ArgumentParser) – The argument parser instance to which the task-related arguments will be added.

########## NOTE Use ‘_’ instead of ‘-’ to avoid confusion in argument names.

################### Examples

To use this method, you can create an argument parser and call add_task_arguments:

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> TTS2Task.add_task_arguments(parser)
>>> args = parser.parse_args()

Raises: RuntimeError – If there is an issue with argument parsing or required arguments are missing.
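
The pattern described above, a classmethod that registers task options on an existing parser, can be sketched with the standard library alone. This is a minimal illustration, not the ESPnet implementation: `SketchTask` is a hypothetical stand-in class, and the three options are assumptions chosen because their names appear elsewhere on this page.

```python
import argparse


class SketchTask:
    """Hypothetical stand-in for a task class; not the ESPnet implementation."""

    @classmethod
    def add_task_arguments(cls, parser: argparse.ArgumentParser) -> None:
        # Group the task options so they appear together in --help output.
        group = parser.add_argument_group(description="Task related")
        # Option names use '_' rather than '-', following the note above.
        group.add_argument("--src_token_list", type=str, default=None)
        group.add_argument("--tgt_token_list", type=str, default=None)
        group.add_argument(
            "--use_preprocessor",
            # Assumed boolean parsing; accepts a few common truthy spellings.
            type=lambda s: s.lower() in ("1", "true", "yes"),
            default=True,
        )


parser = argparse.ArgumentParser()
SketchTask.add_task_arguments(parser)
# Pass an explicit list so the example does not read sys.argv.
args = parser.parse_args(["--src_token_list", "src_tokens.txt"])
print(args.src_token_list, args.tgt_token_list, args.use_preprocessor)
```

Because the option names already use underscores, each one maps directly to an attribute of the same name on the parsed namespace.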

classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]

Build a collate function for batching data during training or evaluation.

This method constructs a collate function that is used to combine multiple data samples into a single batch. The collate function will pad sequences to the maximum length in the batch and convert the data into appropriate tensor formats.

Parameters:
- args (argparse.Namespace) – The command-line arguments parsed into a namespace object.
- train (bool) – A flag indicating whether the collate function is being built for training or evaluation.
Returns: A callable that takes a collection of data samples and returns a tuple containing a list of keys and a dictionary of tensorized data.
Return type: Callable[[Collection[Tuple[str, Dict[str, np.ndarray]]]], Tuple[List[str], Dict[str, torch.Tensor]]]

################### Examples

>>> collate_fn = TTS2Task.build_collate_fn(args, train=True)
>>> batch = collate_fn(data_samples)
>>> print(batch)

########## NOTE The function uses the CommonCollateFn for the actual implementation, which handles padding and tensor conversion.
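
To make the padding step concrete, here is a minimal pure-Python sketch of a collate function with the same input and output shape as described above. It is not CommonCollateFn: `pad_collate` is a hypothetical helper, nested lists stand in for tensors, and the `<name>_lengths` entries are an assumption added so the unpadded lengths stay available.

```python
from typing import Dict, List, Sequence, Tuple

# One sample is (utterance key, {field name: integer sequence}).
Sample = Tuple[str, Dict[str, List[int]]]


def pad_collate(
    data: Sequence[Sample], pad_value: int = 0
) -> Tuple[List[str], Dict[str, List]]:
    """Pad every field to the longest sequence in the batch (illustrative only)."""
    keys = [key for key, _ in data]
    batch: Dict[str, List] = {}
    # Assumes every sample carries the same field names as the first one.
    for name in data[0][1]:
        seqs = [sample[name] for _, sample in data]
        max_len = max(len(seq) for seq in seqs)
        batch[name] = [seq + [pad_value] * (max_len - len(seq)) for seq in seqs]
        # Keep the unpadded lengths so padding can be masked later.
        batch[name + "_lengths"] = [len(seq) for seq in seqs]
    return keys, batch


data_samples = [
    ("utt1", {"text": [3, 7, 2]}),
    ("utt2", {"text": [5]}),
]
keys, batch = pad_collate(data_samples)
print(keys)
print(batch["text"])
print(batch["text_lengths"])
```

Here the shorter sequence is right-padded with `pad_value` to the length of the longest one; a real collate function would additionally convert each padded field to a `torch.Tensor`.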

classmethod build_model(args: Namespace) → ESPnetTTS2Model

Builds and returns an instance of the ESPnetTTS2Model using the provided arguments.

This method constructs a text-to-speech model by first processing the source and target token lists, extracting discrete features, and configuring various components like pitch and energy extraction. The resulting model is an instance of ESPnetTTS2Model.

Parameters: args (argparse.Namespace) – The parsed command-line arguments containing configurations for model building, including paths to token lists and various extraction settings.
Returns: An instance of the ESPnetTTS2Model configured with the specified parameters.
Return type: ESPnetTTS2Model
Raises: RuntimeError – If the source or target token lists are not provided in the expected format (must be str or dict).

################### Examples

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("--src_token_list", type=str, default="src_tokens.txt")
>>> parser.add_argument("--tgt_token_list", type=str, default="tgt_tokens.txt")
>>> args = parser.parse_args()
>>> model = TTS2Task.build_model(args)

########## NOTE Ensure that the token lists specified in the arguments exist and are properly formatted to avoid runtime errors.

classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None

Build a preprocessing function based on the provided arguments.

This function constructs a preprocessing function that will be used to process the input data for the TTS task. It leverages the CommonPreprocessor if preprocessing is enabled through the arguments.

Parameters:
- args (argparse.Namespace) – The parsed arguments containing configurations for preprocessing.
- train (bool) – A flag indicating whether the function is being built for training or not.
Returns: A preprocessing function that takes a string and a dictionary of numpy arrays as input and returns a dictionary of numpy arrays. Returns None if preprocessing is not enabled.
Return type: Optional[Callable[[str, Dict[str, np.array]], Dict[str, np.ndarray]]]

################### Examples

To create a preprocessing function for training:

>>> args = argparse.Namespace(use_preprocessor=True, src_token_type='phn', ...)
>>> preprocess_fn = TTS2Task.build_preprocess_fn(args, train=True)

To create a preprocessing function for evaluation:

>>> args = argparse.Namespace(use_preprocessor=False, ...)
>>> preprocess_fn = TTS2Task.build_preprocess_fn(args, train=False)

########## NOTE Ensure that the use_preprocessor argument is set to True to enable preprocessing; otherwise, the function will return None.
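
Since the builder may return None, callers need to check the result before applying it. The sketch below illustrates that contract only: `build_preprocess_fn_sketch` is a hypothetical stand-in, and its whitespace tokenization is a toy replacement for what CommonPreprocessor actually does.

```python
import argparse
from typing import Callable, Dict, List, Optional

PreprocessFn = Callable[[str, Dict[str, str]], Dict[str, List[str]]]


def build_preprocess_fn_sketch(
    args: argparse.Namespace, train: bool
) -> Optional[PreprocessFn]:
    """Return a preprocessing callable, or None when preprocessing is disabled."""
    # `train` is accepted to match the documented signature but unused here.
    if not args.use_preprocessor:
        return None

    def preprocess(uid: str, data: Dict[str, str]) -> Dict[str, List[str]]:
        # Toy stand-in for real preprocessing: split each field on whitespace.
        return {name: value.split() for name, value in data.items()}

    return preprocess


enabled = build_preprocess_fn_sketch(
    argparse.Namespace(use_preprocessor=True), train=True
)
sample = {"text": "HH AH L OW"}
# Guard against None before calling the returned function.
processed = enabled("utt1", sample) if enabled is not None else sample
print(processed)

disabled = build_preprocess_fn_sketch(
    argparse.Namespace(use_preprocessor=False), train=False
)
print(disabled is None)
```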

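classmethod build_vocoder_from_file(vocoder_config_file: Union[Path, str] = None, vocoder_file: Union[Path, str] = None, model: Optional[ESPnetTTS2Model] = None, device: str = 'cpu')

Builds a vocoder from a given configuration file and model file.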

This method is responsible for constructing a vocoder instance based on the provided vocoder configuration and model file. It currently supports vocoder models trained with Parallel WaveGAN.

Parameters:
- vocoder_config_file (Union[Path, str], optional) – Path to the vocoder configuration file. Defaults to None.
- vocoder_file (Union[Path, str]) – Path to the vocoder model file. This argument is required and must not be None.
- model (Optional[ESPnetTTS2Model], optional) – An instance of the TTS2 model to be used with the vocoder. Defaults to None.
- device (str, optional) – The device to which the vocoder should be moved (e.g., 'cpu' or 'cuda'). Defaults to 'cpu'.
Returns: An instance of the vocoder model ready for inference.
Return type: ParallelWaveGANPretrainedVocoder
Raises:
- AssertionError – If vocoder_file is None.
- ValueError – If the file format of vocoder_file is not supported.

################### Examples

Building a vocoder from a vocoder config and model file:

>>> vocoder = TTS2Task.build_vocoder_from_file(
...     vocoder_config_file='path/to/vocoder_config.yaml',
...     vocoder_file='path/to/vocoder_model.pkl',
... )

Building a vocoder on a specified device:

>>> vocoder = TTS2Task.build_vocoder_from_file(
...     vocoder_config_file='path/to/vocoder_config.yaml',
...     vocoder_file='path/to/vocoder_model.pkl',
...     device='cuda',
... )
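
The two documented failure modes (AssertionError for a missing file, ValueError for an unsupported format) can be illustrated without loading any model. This is a sketch under stated assumptions: `check_vocoder_file` is a hypothetical helper, and treating `.pkl` as the only supported suffix is an assumption taken from the example paths above, not a confirmed rule.

```python
from pathlib import Path
from typing import Optional, Union


def check_vocoder_file(vocoder_file: Optional[Union[Path, str]]) -> Path:
    """Validate a vocoder path the way the Raises section describes."""
    # A missing path fails fast with AssertionError.
    assert vocoder_file is not None, vocoder_file
    vocoder_file = Path(vocoder_file)
    # Assumption: only '.pkl' checkpoints are treated as supported here.
    if vocoder_file.suffix != ".pkl":
        raise ValueError(f"{vocoder_file} is not a supported format.")
    return vocoder_file


checked = check_vocoder_file("path/to/vocoder_model.pkl")
print(checked.name)

try:
    check_vocoder_file("path/to/vocoder_model.pth")
except ValueError as err:
    rejected = True
    print("rejected:", err)
else:
    rejected = False
```

Validating the path before constructing the vocoder keeps the error close to the caller's mistake instead of surfacing it during checkpoint loading.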

class_choices_list

*: List[[ClassChoices](../train/ClassChoices.md#espnet2.train.class_choices.ClassChoices)]* *= [<espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>]*

num_optimizers

*: int* *= 1*

classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Text-to-speech task.

This class implements the TTS2Task for handling text-to-speech tasks, including argument parsing, data processing, and model building.

num_optimizers

Number of optimizers to use.

Type: int

class_choices_list

List of class choices for various components.

Type: list

trainer

The trainer class used for training and evaluation.

Type: Trainer
Parameters: parser (argparse.ArgumentParser) – Argument parser to add task-specific arguments.
Returns: A function that collates data for training or evaluation.
Return type: Callable
Yields: Optional[Callable] – A function that preprocesses data based on the provided arguments.
Raises: RuntimeError – If the source or target token list is not a string or dictionary.

################### Examples

Adding task arguments:

>>> parser = argparse.ArgumentParser()
>>> TTS2Task.add_task_arguments(parser)

Building a collate function:

>>> collate_fn = TTS2Task.build_collate_fn(args, train=True)

Building a preprocess function:

>>> preprocess_fn = TTS2Task.build_preprocess_fn(args, train=True)

Required data names:

>>> required_data = TTS2Task.required_data_names(train=True)

Optional data names:

>>> optional_data = TTS2Task.optional_data_names(train=True)

Building the model:

>>> model = TTS2Task.build_model(args)

Building a vocoder from a file:

>>> vocoder = TTS2Task.build_vocoder_from_file(
...     vocoder_config_file='path/to/config',
...     vocoder_file='path/to/vocoder',
... )

classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Defines the required data names for the TTS2Task class.

This method specifies the necessary data names that are required for training and inference. The data names differ based on whether the function is called in training mode or inference mode.

Parameters:
- train (bool) – Indicates if the function is called in training mode. Default is True.
- inference (bool) – Indicates if the function is called in inference mode. Default is False.
Returns: A tuple of required data names. The data names include:
- For training (inference=False): ("text", "speech", "discrete_speech")
- For inference (inference=True): ("text",)
Return type: Tuple[str, ...]

########## NOTE

The "speech" data is used for on-the-fly feature extraction like pitch and energy.

The "discrete_speech" data is mainly used for predicting the target.

################### Examples

>>> TTS2Task.required_data_names()
('text', 'speech', 'discrete_speech')

>>> TTS2Task.required_data_names(inference=True)
('text',)
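
One way the required names might be used is to check a sample before it enters the pipeline. The sketch below reimplements the documented return values in a standalone function so it runs without ESPnet; `missing_names` is a hypothetical helper, not part of TTS2Task.

```python
from typing import Dict, Tuple


def required_data_names(
    train: bool = True, inference: bool = False
) -> Tuple[str, ...]:
    # Mirrors the return values documented above; `train` is unused here.
    if not inference:
        return ("text", "speech", "discrete_speech")
    return ("text",)


def missing_names(
    sample: Dict[str, object], inference: bool = False
) -> Tuple[str, ...]:
    """Hypothetical helper: list the required names absent from a sample."""
    required = required_data_names(inference=inference)
    return tuple(name for name in required if name not in sample)


sample = {"text": "hello"}
missing_infer = missing_names(sample, inference=True)
missing_train = missing_names(sample, inference=False)
print(missing_infer)
print(missing_train)
```

A text-only sample satisfies the inference requirements but lacks "speech" and "discrete_speech" for training.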

trainer

alias of Trainer