espnet2.tasks.tts2.TTS2Task
class espnet2.tasks.tts2.TTS2Task
Bases: AbsTask
Text-to-speech (TTS) task class for ESPnet2.
This class is responsible for managing the TTS task, including setting up arguments, building models, and handling data processing. It extends the abstract base task class AbsTask.
num_optimizers
Number of optimizers to use. Default is 1.
- Type: int
class_choices_list
List of class choices for various components in the TTS task.
- Type: List[ClassChoices]
trainer
Trainer class used for training procedures.
- Type: Trainer
add_task_arguments(parser: argparse.ArgumentParser)
Adds command-line arguments specific to the TTS task.
build_collate_fn(args: argparse.Namespace, train: bool) -> Callable
Builds a collate function for batching data.
build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]
Builds a preprocessing function for input data.
required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]
Returns the names of the required data for the task.
optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]
Returns the names of the optional data for the task.
build_model(args: argparse.Namespace) -> ESPnetTTS2Model
Constructs the TTS model based on the provided arguments.
build_vocoder_from_file(vocoder_config_file: Union[Path, str] = None, vocoder_file: Union[Path, str] = None, model: Optional[ESPnetTTS2Model] = None, device: str = "cpu")
Builds a vocoder from the specified configuration and model files.
################### Examples
To add task-specific arguments to a parser:
>>> parser = argparse.ArgumentParser()
>>> TTS2Task.add_task_arguments(parser)
To build a model:
>>> args = ...  # Namespace with the necessary arguments
>>> model = TTS2Task.build_model(args)
########## NOTE Ensure that the necessary configuration files and data are available when using this class.
classmethod add_task_arguments(parser: ArgumentParser)
Add task-related arguments to the argument parser.
This method defines and adds various arguments related to the TTS task that are necessary for model configuration and data preprocessing. The arguments include options for specifying source and target token lists, model configurations, and preprocessing preferences.
- Parameters: parser (argparse.ArgumentParser) – The argument parser instance to which the task-related arguments will be added.
########## NOTE Use ‘_’ instead of ‘-’ to avoid confusion in argument names.
################### Examples
To use this method, you can create an argument parser and call add_task_arguments:
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> TTS2Task.add_task_arguments(parser)
>>> args = parser.parse_args()
- Raises: RuntimeError – If there is an issue with argument parsing or required arguments are missing.
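The note about preferring `_` over `-` can be illustrated with plain `argparse`. This is a minimal sketch, not the argument list that `add_task_arguments` registers: the two option names are borrowed from the `build_model` example in this reference, and the argument group name is an assumption made for illustration.

```python
import argparse

# Hypothetical stand-in parser; the real TTS2Task.add_task_arguments registers
# more options than the two shown here.
parser = argparse.ArgumentParser()
group = parser.add_argument_group("Task related")  # group name is assumed
group.add_argument("--src_token_list", type=str, default=None)
group.add_argument("--tgt_token_list", type=str, default=None)

# Pass an explicit argument list so the snippet does not read sys.argv.
args = parser.parse_args(["--src_token_list", "src_tokens.txt"])

# With underscores, the command-line flag and the Namespace attribute
# share the same spelling.
print(args.src_token_list)
print(args.tgt_token_list)
```

Had the flag been written as `--src-token-list`, `argparse` would still expose it as `args.src_token_list`, so the flag and the attribute would be spelled differently.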
classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]
Build a collate function for batching data during training or evaluation.
This method constructs a collate function that is used to combine multiple data samples into a single batch. The collate function will pad sequences to the maximum length in the batch and convert the data into appropriate tensor formats.
- Parameters:
- args (argparse.Namespace) – The command-line arguments parsed into a namespace object.
- train (bool) – A flag indicating whether the collate function is being built for training or evaluation.
- Returns: A callable that takes a collection of data samples and returns a tuple containing a list of keys and a dictionary of tensorized data.
- Return type: Callable[[Collection[Tuple[str, Dict[str, np.ndarray]]]], Tuple[List[str], Dict[str, torch.Tensor]]]
################### Examples
>>> collate_fn = TTS2Task.build_collate_fn(args, train=True)
>>> batch = collate_fn(data_samples)
>>> print(batch)
########## NOTE The function uses the CommonCollateFn for the actual implementation, which handles padding and tensor conversion.
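The padding behavior described above can be sketched without ESPnet or PyTorch. The function below is a hypothetical illustration that works on plain Python lists rather than tensors. The name `pad_collate`, the pad value of 0, and the extra `<name>_lengths` entries are assumptions of this sketch, not a description of how `CommonCollateFn` is implemented.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, Dict[str, List[int]]]


def pad_collate(samples: Sequence[Sample], pad_value: int = 0) -> Tuple[List[str], Dict[str, list]]:
    """Hypothetical collate function: pad each field to the batch maximum."""
    keys = [key for key, _ in samples]
    batch: Dict[str, list] = {}
    # Use the field names of the first sample; all samples are assumed to
    # carry the same fields.
    for name in samples[0][1]:
        seqs = [data[name] for _, data in samples]
        max_len = max(len(seq) for seq in seqs)
        # Right-pad every sequence to the longest one in the batch.
        batch[name] = [seq + [pad_value] * (max_len - len(seq)) for seq in seqs]
        # Keep the original lengths so padding can be masked out later.
        batch[name + "_lengths"] = [len(seq) for seq in seqs]
    return keys, batch


samples = [("utt1", {"text": [1, 2, 3]}), ("utt2", {"text": [4]})]
keys, batch = pad_collate(samples)
print(keys)
print(batch)
```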
classmethod build_model(args: Namespace) → ESPnetTTS2Model
Builds and returns an instance of the ESPnetTTS2Model using the provided arguments.
This method constructs a text-to-speech model by first processing the source and target token lists, extracting discrete features, and configuring various components like pitch and energy extraction. The resulting model is an instance of ESPnetTTS2Model.
- Parameters: args (argparse.Namespace) – The parsed command-line arguments containing configurations for model building, including paths to token lists and various extraction settings.
- Returns: An instance of ESPnetTTS2Model configured with the specified parameters.
- Return type: ESPnetTTS2Model
- Raises: RuntimeError – If the source or target token lists are not provided in the expected format (must be str or dict).
################### Examples
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("--src_token_list", type=str, default="src_tokens.txt")
>>> parser.add_argument("--tgt_token_list", type=str, default="tgt_tokens.txt")
>>> args = parser.parse_args()
>>> model = TTS2Task.build_model(args)
########## NOTE Ensure that the token lists specified in the arguments exist and are properly formatted to avoid runtime errors.
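As a rough illustration of the token-list handling mentioned above, the sketch below reads a token list from a text file and derives a vocabulary size. The helper name `load_token_list`, the one-token-per-line layout, and the sample tokens are assumptions made for this example. Only the string-path case is covered, and any other input type raises `RuntimeError`.

```python
import tempfile
from pathlib import Path
from typing import List


def load_token_list(token_list) -> List[str]:
    """Hypothetical helper: read one token per line from a path string."""
    if not isinstance(token_list, str):
        # This sketch handles only the path-string case.
        raise RuntimeError("token_list must be a path string in this sketch")
    with open(token_list, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "src_tokens.txt"
    # Sample tokens chosen for illustration only.
    path.write_text("<blank>\n<unk>\na\nb\n", encoding="utf-8")
    tokens = load_token_list(str(path))

vocab_size = len(tokens)
print(tokens, vocab_size)
```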
classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None
Build a preprocessing function based on the provided arguments.
This function constructs a preprocessing function that will be used to process the input data for the TTS task. It leverages the CommonPreprocessor if preprocessing is enabled through the arguments.
- Parameters:
- args (argparse.Namespace) – The parsed arguments containing configurations for preprocessing.
- train (bool) – A flag indicating whether the function is being built for training or not.
- Returns: A preprocessing function that takes a string and a dictionary of numpy arrays as input and returns a dictionary of numpy arrays. Returns None if preprocessing is not enabled.
- Return type: Optional[Callable[[str, Dict[str, np.array]], Dict[str, np.ndarray]]]
################### Examples
To create a preprocessing function for training:
>>> args = argparse.Namespace(use_preprocessor=True, src_token_type='phn', ...)
>>> preprocess_fn = TTS2Task.build_preprocess_fn(args, train=True)
To create a preprocessing function for evaluation:
>>> args = argparse.Namespace(use_preprocessor=False, ...)
>>> preprocess_fn = TTS2Task.build_preprocess_fn(args, train=False)
########## NOTE Ensure that the use_preprocessor argument is set to True to enable preprocessing; otherwise, the function will return None.
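Because the return value may be None, callers need to guard before applying it. The sketch below mimics that contract with a hypothetical builder, `build_preprocess_sketch`, whose inner transform is a placeholder that only copies its input. It does not reproduce what `CommonPreprocessor` does.

```python
import argparse
from typing import Callable, Dict, List, Optional

Preprocess = Callable[[str, Dict[str, List[int]]], Dict[str, List[int]]]


def build_preprocess_sketch(args: argparse.Namespace, train: bool) -> Optional[Preprocess]:
    """Hypothetical builder: return a callable only when preprocessing is on."""
    if not args.use_preprocessor:
        return None

    def preprocess(uid: str, data: Dict[str, List[int]]) -> Dict[str, List[int]]:
        # Placeholder transform: copy each field unchanged.
        return {name: list(values) for name, values in data.items()}

    return preprocess


enabled = build_preprocess_sketch(argparse.Namespace(use_preprocessor=True), train=True)
disabled = build_preprocess_sketch(argparse.Namespace(use_preprocessor=False), train=False)

sample = {"text": [1, 2, 3]}
# Guard against None before calling the preprocessing function.
processed = enabled("utt1", sample) if enabled is not None else sample
print(processed, disabled)
```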
classmethod build_vocoder_from_file(vocoder_config_file: Path | str | None = None, vocoder_file: Path | str | None = None, model: ESPnetTTS2Model | None = None, device: str = 'cpu')
Builds a vocoder from a given configuration file and model file.
This method is responsible for constructing a vocoder instance based on the provided vocoder configuration and model file. It currently supports vocoder models trained with Parallel WaveGAN.
- Parameters:
- vocoder_config_file (Union[Path, str], optional) – Path to the vocoder configuration file. Defaults to None.
- vocoder_file (Union[Path, str]) – Path to the vocoder model file. This argument is required and must not be None.
- model (Optional[ESPnetTTS2Model], optional) – An instance of the TTS2 model to be used with the vocoder. Defaults to None.
- device (str, optional) – The device to which the vocoder should be moved (e.g., ‘cpu’ or ‘cuda’). Defaults to ‘cpu’.
- Returns: An instance of the vocoder model ready for inference.
- Return type: ParallelWaveGANPretrainedVocoder
- Raises:
- AssertionError – If vocoder_file is None.
- ValueError – If the file format of vocoder_file is not supported.
################### Examples
Building a vocoder from a vocoder config and model file:
>>> vocoder = TTS2Task.build_vocoder_from_file(
...     vocoder_config_file='path/to/vocoder_config.yaml',
...     vocoder_file='path/to/vocoder_model.pkl',
... )
Building a vocoder on a specified device:
>>> vocoder = TTS2Task.build_vocoder_from_file(
...     vocoder_config_file='path/to/vocoder_config.yaml',
...     vocoder_file='path/to/vocoder_model.pkl',
...     device='cuda',
... )
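The two documented failure modes (AssertionError for a missing file argument, ValueError for an unsupported format) can be sketched as a small validation step. The helper `check_vocoder_file` is hypothetical, and treating `.pkl` as the only supported suffix is an assumption based on the example paths above, not a statement about the library's checks.

```python
from pathlib import Path
from typing import Optional, Union


def check_vocoder_file(vocoder_file: Optional[Union[Path, str]]) -> Path:
    """Hypothetical validation mirroring the documented error conditions."""
    # The model file is required.
    assert vocoder_file is not None, "vocoder_file must not be None"
    path = Path(vocoder_file)
    # Assumption: only ".pkl" checkpoints are accepted in this sketch.
    if path.suffix != ".pkl":
        raise ValueError(f"{path} is not a supported format.")
    return path


print(check_vocoder_file("path/to/vocoder_model.pkl").name)

try:
    check_vocoder_file("path/to/vocoder_model.bin")
except ValueError as err:
    print("rejected:", err)
```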
class_choices_list
num_optimizers
classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]
Text-to-speech task.
This class implements the TTS2Task for handling text-to-speech tasks, including argument parsing, data processing, and model building.
num_optimizers
Number of optimizers to use.
- Type: int
class_choices_list
List of class choices for various components.
- Type: list
trainer
The trainer class used for training and evaluation.
- Type: Trainer
- Parameters: parser (argparse.ArgumentParser) – Argument parser to add task-specific arguments.
- Returns: A function that collates data for training or evaluation.
- Return type: Callable
- Yields: Optional[Callable] – A function that preprocesses data based on the provided arguments.
- Raises: RuntimeError – If the source or target token list is not a string or dictionary.
################### Examples
Adding task arguments:
>>> parser = argparse.ArgumentParser()
>>> TTS2Task.add_task_arguments(parser)
Building a collate function:
>>> collate_fn = TTS2Task.build_collate_fn(args, train=True)
Building a preprocess function:
>>> preprocess_fn = TTS2Task.build_preprocess_fn(args, train=True)
Required data names:
>>> required_data = TTS2Task.required_data_names(train=True)
Optional data names:
>>> optional_data = TTS2Task.optional_data_names(train=True)
Building the model:
>>> model = TTS2Task.build_model(args)
Building a vocoder from a file:
>>> vocoder = TTS2Task.build_vocoder_from_file(
...     vocoder_config_file='path/to/config',
...     vocoder_file='path/to/vocoder',
... )
classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]
Defines the required data names for the TTS2Task class.
This method specifies the necessary data names that are required for training and inference. The data names differ based on whether the function is called in training mode or inference mode.
- Parameters:
- train (bool) – Indicates if the function is called in training mode. Default is True.
- inference (bool) – Indicates if the function is called in inference mode. Default is False.
- Returns: A tuple of required data names:
- For training (inference=False): ("text", "speech", "discrete_speech")
- For inference (inference=True): ("text",)
- Return type: Tuple[str, ...]
########## NOTE
- The “speech” data is used for on-the-fly feature extraction, such as pitch and energy.
- The “discrete_speech” data is mainly used as the target to predict.
################### Examples
>>> TTS2Task.required_data_names()
('text', 'speech', 'discrete_speech')
>>> TTS2Task.required_data_names(inference=True)
('text',)
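One way such a tuple might be used is to check a set of data files before training starts. The sketch below reimplements the documented return values in a stand-in function so that it runs without ESPnet. The names `required_names_sketch` and `missing_names`, and the file paths, are hypothetical.

```python
from typing import Dict, Tuple


def required_names_sketch(train: bool = True, inference: bool = False) -> Tuple[str, ...]:
    """Stand-in returning the values documented for required_data_names."""
    if inference:
        return ("text",)
    return ("text", "speech", "discrete_speech")


def missing_names(available: Dict[str, str], inference: bool = False) -> Tuple[str, ...]:
    """Hypothetical check: list required names absent from the given data."""
    required = required_names_sketch(inference=inference)
    return tuple(name for name in required if name not in available)


# Hypothetical mapping from data name to file path.
data_files = {"text": "dump/text", "speech": "dump/wav.scp"}

print(missing_names(data_files))
print(missing_names(data_files, inference=True))
```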
trainer
alias of Trainer