espnet2.tasks.gan_svs.GANSVSTask


class espnet2.tasks.gan_svs.GANSVSTask

Bases: AbsTask

GAN-based Singing-voice-synthesis task.

This class implements a task for singing voice synthesis using Generative Adversarial Networks (GANs). It manages the configurations for various components including feature extraction, normalization, and model building.

num_optimizers

The number of optimizers required for GAN training.

Type: int

class_choices_list

List of available class choices for various components in the task.

Type: List[ClassChoices]

trainer

The trainer class used for this task.

Type: Type[GANTrainer]

add_task_arguments(parser: argparse.ArgumentParser)

Adds task-specific arguments to the provided argument parser.

build_collate_fn(args: argparse.Namespace, train: bool) -> Callable

Builds a collate function for processing batches of data.

build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]

Builds a preprocessing function based on the provided arguments.

required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]

Returns a tuple of required data names based on training or inference mode.

optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]

Returns a tuple of optional data names based on training or inference mode.

build_model(args: argparse.Namespace) -> ESPnetGANSVSModel

Builds and returns an ESPnetGANSVSModel instance based on the provided arguments.

build_optimizers(args: argparse.Namespace, model: ESPnetGANSVSModel) -> List[torch.optim.Optimizer]

Builds and returns a list of optimizers for training the model.

Examples:

To add task arguments:

>>> parser = argparse.ArgumentParser()
>>> GANSVSTask.add_task_arguments(parser)

To build a model:

>>> args = parser.parse_args()
>>> model = GANSVSTask.build_model(args)

To build optimizers:

>>> optimizers = GANSVSTask.build_optimizers(args, model)

Note: This task is designed to work with various feature extraction methods, normalization techniques, and SVS models.

classmethod add_task_arguments(parser: ArgumentParser)

Adds task-related arguments to the provided argument parser.

This method defines command-line arguments specific to the GAN-based Singing-voice-synthesis (SVS) task, including configurations for model parameters, preprocessing options, and feature extraction settings.

Parameters:
- cls – The class itself, used for adding class-specific arguments.
- parser (argparse.ArgumentParser) – The argument parser to which the task-related arguments will be added.

Examples:

To add task arguments to a parser:

>>> import argparse
>>> from espnet2.tasks.gan_svs import GANSVSTask
>>> parser = argparse.ArgumentParser()
>>> GANSVSTask.add_task_arguments(parser)
>>> args = parser.parse_args()

Note: This method automatically appends argument groups for configurable components such as postfrontend, score extractor, and normalization options.

Raises: ValueError – If there is an issue with the provided argument configurations.
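
The argument-group pattern described here can be sketched with the standard library alone. This is a minimal illustration and not the ESPnet implementation: the helper names `str2bool` and `add_demo_task_arguments`, the group description, and the choice of options shown are assumptions made for the example.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse common true/false spellings (hypothetical helper for this sketch)."""
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {value!r}")


def add_demo_task_arguments(parser: argparse.ArgumentParser) -> None:
    """Register illustrative task options on an existing parser."""
    # Grouping keeps task options together in the --help output.
    group = parser.add_argument_group(description="Task related (sketch)")
    group.add_argument("--token_type", choices=["phn"], default="phn")
    group.add_argument("--use_preprocessor", type=str2bool, default=True)


parser = argparse.ArgumentParser()
add_demo_task_arguments(parser)
args = parser.parse_args(["--use_preprocessor", "false"])
print(args.token_type, args.use_preprocessor)
```

Because the function mutates the parser it receives, several components can each append their own group to the same parser before `parse_args` is called.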

classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]

Builds a collate function for processing batches of data during training or evaluation.

This method constructs a callable that takes a collection of tuples, where each tuple consists of a string (usually the file path or identifier) and a dictionary containing NumPy arrays representing various features. The callable returns a tuple containing a list of strings and a dictionary of PyTorch tensors, appropriately padded for batch processing.

Parameters:
- args (argparse.Namespace) – The argument namespace containing configurations for the task.
- train (bool) – A flag indicating whether the collate function is being built for training or evaluation.
Returns: Callable[[Collection[Tuple[str, Dict[str, np.ndarray]]], Tuple[List[str], Dict[str, torch.Tensor]]]: A collate function that can be used to process batches of data.

Examples:

>>> collate_fn = GANSVSTask.build_collate_fn(args, train=True)
>>> batch = [
...     ("file1", {"feature1": np.array([1, 2]), "feature2": np.array([3])}),
...     ("file2", {"feature1": np.array([4]), "feature2": np.array([5, 6])}),
... ]
>>> file_ids, tensors = collate_fn(batch)
>>> print(file_ids)  # Output: ['file1', 'file2']
>>> print(tensors)   # Output: {'feature1': tensor(...), 'feature2': tensor(...)}

Note: This function uses CommonCollateFn to pad sequences and make the other adjustments needed for batching.
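
To make the padding step concrete, here is a dependency-free sketch of what a padding collate function does. Plain Python lists stand in for NumPy arrays and PyTorch tensors, and `pad_collate` is a hypothetical name. The zero pad value and the extra `<key>_lengths` entries are assumptions about typical behaviour, not a statement of what CommonCollateFn returns.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, Dict[str, List[int]]]


def pad_collate(
    batch: Sequence[Sample], pad_value: int = 0
) -> Tuple[List[str], Dict[str, List]]:
    """Pad every feature to the longest sequence in the batch."""
    ids = [uttid for uttid, _ in batch]
    output: Dict[str, List] = {}
    for key in batch[0][1]:
        sequences = [features[key] for _, features in batch]
        max_len = max(len(seq) for seq in sequences)
        # Right-pad shorter sequences so all rows have the same length.
        output[key] = [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]
        # Keep the original lengths so padding can be masked out later.
        output[key + "_lengths"] = [len(seq) for seq in sequences]
    return ids, output


batch = [
    ("file1", {"feature1": [1, 2], "feature2": [3]}),
    ("file2", {"feature1": [4], "feature2": [5, 6]}),
]
file_ids, padded = pad_collate(batch)
print(file_ids)
print(padded["feature1"], padded["feature1_lengths"])
```

The sketch assumes every sample carries the same feature keys; a production collate function would normally verify that before padding.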

classmethod build_model(args: Namespace) → ESPnetGANSVSModel

Builds the ESPnet GANSVS model based on the provided arguments.

This method configures the model by creating necessary components such as feature extractors, normalization layers, and the main SVS model. It reads the token list from a file or directly from the arguments and initializes the model components according to the specified configurations.

Parameters: args (argparse.Namespace) – The arguments containing model configurations, feature extractor types, and other relevant parameters.
Returns: An instance of the ESPnet GANSVS model configured with the specified components.
Return type: ESPnetGANSVSModel
Raises: RuntimeError – If token_list is neither a string nor a valid list.
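
The token-list handling described above (a file path or an in-memory list, with a RuntimeError otherwise) might look roughly like the following. `load_token_list` is a hypothetical helper written for this sketch, and the one-token-per-line file layout is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def load_token_list(token_list: Union[str, Sequence[str]]) -> List[str]:
    """Return tokens from a file path (one token per line) or from a sequence."""
    if isinstance(token_list, str):
        with open(token_list, encoding="utf-8") as f:
            return [line.rstrip() for line in f]
    if isinstance(token_list, (list, tuple)):
        return list(token_list)
    raise RuntimeError("token_list must be a file path or a list of tokens")


# Write a small token file to demonstrate the path-based branch.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokens.txt"
    path.write_text("<blank>\na\nb\n", encoding="utf-8")
    from_file = load_token_list(str(path))

from_list = load_token_list(["<blank>", "a", "b"])
vocab_size = len(from_file)
print(from_file == from_list, vocab_size)
```

The length of the resulting list is what would serve as the vocabulary size when the model components are sized.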

Examples:

>>> from argparse import Namespace
>>> args = Namespace(
...     token_list="path/to/token_list.txt",
...     odim=None,
...     feats_extract="linear_spectrogram",
...     feats_extract_conf={"hop_length": 256},
...     postfrontend="s3prl",
...     postfrontend_conf={"some_param": "value"},
...     normalize="global_mvn",
...     model_conf={"additional_param": "value"}
... )
>>> model = GANSVSTask.build_model(args)
>>> print(model)

classmethod build_optimizers(args: Namespace, model: ESPnetGANSVSModel) → List[Optimizer]

Builds the optimizers for the GAN-based Singing-voice-synthesis model.

This method creates two optimizers: one for the generator and one for the discriminator of the GAN model. It retrieves the optimizer classes based on the specified arguments and initializes them with the model’s parameters.

Parameters:
- args (argparse.Namespace) – The arguments containing optimizer configurations and settings.
- model (ESPnetGANSVSModel) – The ESPnet GAN-based Singing-voice-synthesis model for which the optimizers are to be created.
Returns: A list containing the generator and discriminator optimizers.
Return type: List[torch.optim.Optimizer]
Raises:
- ValueError – If the specified optimizer class is not recognized.
- RuntimeError – If fairscale is required but not installed.
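
The lookup-then-construct flow can be illustrated without PyTorch. Everything in this sketch is a stand-in: `ToyOptimizer`, `ToyAdam`, `ToySGD`, the `OPTIM_CLASSES` registry, and the argument names `optim`, `optim_conf`, `optim2`, and `optim2_conf` are assumptions chosen for illustration, and the parameter lists are plain strings.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Type


@dataclass
class ToyOptimizer:
    """Stand-in for an optimizer: it only records its parameters and settings."""

    params: List[str]
    conf: Dict[str, Any] = field(default_factory=dict)


class ToyAdam(ToyOptimizer):
    """Hypothetical optimizer class used only in this sketch."""


class ToySGD(ToyOptimizer):
    """Hypothetical optimizer class used only in this sketch."""


# Hypothetical name-to-class registry.
OPTIM_CLASSES: Dict[str, Type[ToyOptimizer]] = {"adam": ToyAdam, "sgd": ToySGD}


def build_two_optimizers(
    generator_params: List[str],
    discriminator_params: List[str],
    optim: str,
    optim_conf: Dict[str, Any],
    optim2: str,
    optim2_conf: Dict[str, Any],
) -> List[ToyOptimizer]:
    """Build [generator optimizer, discriminator optimizer] from class names."""
    optimizers = []
    for name, conf, params in (
        (optim, optim_conf, generator_params),
        (optim2, optim2_conf, discriminator_params),
    ):
        optim_class = OPTIM_CLASSES.get(name)
        if optim_class is None:
            # Unknown names are rejected before anything is constructed.
            raise ValueError(f"must be one of {list(OPTIM_CLASSES)}: {name}")
        optimizers.append(optim_class(params, dict(conf)))
    return optimizers


optimizers = build_two_optimizers(
    ["generator.weight"], ["discriminator.weight"],
    "adam", {"lr": 2e-4}, "sgd", {"lr": 1e-3},
)
print([type(o).__name__ for o in optimizers])
```

Keeping the generator optimizer first and the discriminator optimizer second gives callers a fixed order to rely on, which matches the `num_optimizers = 2` attribute of the task.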

Examples:

>>> from espnet2.gan_svs.espnet_model import ESPnetGANSVSModel
>>> args = ...  # Arguments containing optimizer configurations
>>> model = ESPnetGANSVSModel(...)  # Model initialization
>>> optimizers = GANSVSTask.build_optimizers(args, model)
>>> assert len(optimizers) == 2  # Ensure two optimizers are created

classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None

Builds a preprocessing function for the GAN-based Singing-voice-synthesis task.

This function creates a callable that preprocesses input data based on the provided arguments. If preprocessing is enabled, it initializes an instance of the SVSPreprocessor with the specified configurations. If preprocessing is not enabled, it returns None.

Parameters:
- cls – The class reference.
- args (argparse.Namespace) – Command-line arguments containing configuration options for preprocessing.
- train (bool) – A flag indicating whether the function is being called for training or not.
Returns: A callable that preprocesses the input data, or None if preprocessing is not enabled.
Return type: Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]

Examples:

>>> args = argparse.Namespace()
>>> args.use_preprocessor = True
>>> args.token_type = "phn"
>>> args.token_list = "path/to/token_list.txt"
>>> preprocess_fn = GANSVSTask.build_preprocess_fn(args, train=True)
>>> processed_data = preprocess_fn("input_text", {"feature": np.array([])})

Note: The SVSPreprocessor requires specific configurations to function correctly. Ensure that the necessary arguments are provided.
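
Since the builder may return None, callers need to handle both outcomes. The sketch below shows that contract with a toy preprocessor that only maps space-separated tokens to integer ids. `build_toy_preprocess_fn` is a hypothetical name, and the real SVSPreprocessor is assumed to do considerably more than this.

```python
from argparse import Namespace
from typing import Any, Callable, Dict, Optional

PreprocessFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def build_toy_preprocess_fn(args: Namespace) -> Optional[PreprocessFn]:
    """Return a preprocessing callable, or None when preprocessing is disabled."""
    if not args.use_preprocessor:
        return None
    token_to_id = {token: index for index, token in enumerate(args.token_list)}

    def preprocess(uid: str, data: Dict[str, Any]) -> Dict[str, Any]:
        # Toy behaviour: replace space-separated tokens with integer ids.
        output = dict(data)
        output["text"] = [token_to_id[token] for token in data["text"].split()]
        return output

    return preprocess


enabled = build_toy_preprocess_fn(
    Namespace(use_preprocessor=True, token_list=["<blank>", "a", "b"])
)
disabled = build_toy_preprocess_fn(Namespace(use_preprocessor=False, token_list=[]))
result = enabled("utt1", {"text": "a b a"})
print(result, disabled)
```

A caller would typically check `if preprocess_fn is not None` before applying it to each sample.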

class_choices_list

*: List[[ClassChoices](../train/ClassChoices.md#espnet2.train.class_choices.ClassChoices)]* *= [<espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>]*

num_optimizers

*: int* *= 2*

classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Returns the optional data names used in the GAN-based Singing-voice-synthesis task.

The returned names depend on whether the task is in training or inference mode. In training mode, optional data names include speaker embeddings, durations, pitch, energy, speaker IDs, language IDs, features, and ying. In inference mode, the optional data names include speaker embeddings, singing, pitch, durations, speaker IDs, and language IDs.

Parameters:
- train (bool) – Indicates whether the task is in training mode. Default is True.
- inference (bool) – Indicates whether the task is in inference mode. Default is False.
Returns: A tuple containing the names of optional data.
Return type: Tuple[str, ...]

Examples:

>>> GANSVSTask.optional_data_names(train=True, inference=False)
('spembs', 'durations', 'pitch', 'energy', 'sids', 'lids', 'feats', 'ying')

>>> GANSVSTask.optional_data_names(train=False, inference=True)
('spembs', 'singing', 'pitch', 'durations', 'sids', 'lids')

classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Returns the required data names for the GAN-based Singing-voice-synthesis task.

The method returns a tuple of required data names that depends on the mode. The two modes differ only in ‘singing’, which is required for training but not for inference.

Parameters:
- train (bool) – A flag indicating if the task is in training mode. Default is True.
- inference (bool) – A flag indicating if the task is in inference mode. Default is False.
Returns: A tuple containing the names of the required data. The names will be:
- In training mode: ("text", "singing", "score", "label")
- In inference mode: ("text", "score", "label")
Return type: Tuple[str, ...]

Examples:

>>> GANSVSTask.required_data_names(train=True, inference=False)
('text', 'singing', 'score', 'label')

>>> GANSVSTask.required_data_names(train=False, inference=True)
('text', 'score', 'label')
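
One way to use the required and optional name tuples together is to validate the keys of a dataset before training or inference. The sketch below hard-codes the tuples shown in the examples on this page so that it runs without ESPnet installed; `check_data_names` is a hypothetical helper, not part of the task class.

```python
from typing import Iterable, List, Tuple

# Tuples copied from the documented return values.
REQUIRED_TRAIN = ("text", "singing", "score", "label")
REQUIRED_INFER = ("text", "score", "label")
OPTIONAL_TRAIN = (
    "spembs", "durations", "pitch", "energy", "sids", "lids", "feats", "ying",
)
OPTIONAL_INFER = ("spembs", "singing", "pitch", "durations", "sids", "lids")


def check_data_names(
    names: Iterable[str], inference: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (missing required names, names that are neither required nor optional)."""
    required = REQUIRED_INFER if inference else REQUIRED_TRAIN
    optional = OPTIONAL_INFER if inference else OPTIONAL_TRAIN
    present = list(names)
    missing = [name for name in required if name not in present]
    unknown = [
        name for name in present if name not in required and name not in optional
    ]
    return missing, unknown


print(check_data_names(["text", "score", "label"]))
print(check_data_names(["text", "score", "label", "pitch"], inference=True))
```

With ESPnet available, the hard-coded tuples could be replaced by the values returned from `GANSVSTask.required_data_names` and `GANSVSTask.optional_data_names`.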

trainer

alias of GANTrainer