espnet2.tasks.gan_svs.GANSVSTask
class espnet2.tasks.gan_svs.GANSVSTask
Bases: AbsTask
GAN-based Singing-voice-synthesis task.
This class implements a task for singing voice synthesis using Generative Adversarial Networks (GANs). It manages the configurations for various components including feature extraction, normalization, and model building.
num_optimizers
The number of optimizers required for GAN training.
- Type: int
class_choices_list
List of available class choices for various components in the task.
- Type: List[ClassChoices]
trainer
The trainer class used for this task.
- Type: Type[GANTrainer]
add_task_arguments(parser: argparse.ArgumentParser): Adds task-specific arguments to the provided argument parser.
build_collate_fn(args: argparse.Namespace, train: bool) -> Callable: Builds a collate function for processing batches of data.
build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]: Builds a preprocessing function based on the provided arguments.
required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns a tuple of required data names based on training or inference mode.
optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns a tuple of optional data names based on training or inference mode.
build_model(args: argparse.Namespace) -> ESPnetGANSVSModel: Builds and returns an instance of the ESPnetGANSVSModel based on the provided arguments.
build_optimizers(args: argparse.Namespace, model: ESPnetGANSVSModel) -> List[torch.optim.Optimizer]: Builds and returns a list of optimizers for training the model.
################### Examples
To add task arguments
parser = argparse.ArgumentParser()
GANSVSTask.add_task_arguments(parser)
To build a model
args = parser.parse_args()
model = GANSVSTask.build_model(args)
To build optimizers
optimizers = GANSVSTask.build_optimizers(args, model)
######## NOTE This task is designed to work with various feature extraction methods, normalization techniques, and SVS models.
classmethod add_task_arguments(parser: ArgumentParser)
Adds task-related arguments to the provided argument parser.
This method defines command-line arguments specific to the GAN-based Singing-voice-synthesis (SVS) task, including configurations for model parameters, preprocessing options, and feature extraction settings.
- Parameters:
- cls – The class itself, used for adding class-specific arguments.
- parser (argparse.ArgumentParser) – The argument parser to which the task-related arguments will be added.
################### Examples
To add task arguments to a parser:
```python
import argparse

from espnet2.tasks.gan_svs import GANSVSTask

parser = argparse.ArgumentParser()
GANSVSTask.add_task_arguments(parser)
args = parser.parse_args()
```
######## NOTE This method automatically appends argument groups for various configurable components like postfrontend, score extractor, and normalization options.
- Raises: ValueError – If there is an issue with the provided argument configurations.
classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]
Builds a collate function for processing batches of data during training or evaluation.
This method constructs a callable that takes a collection of tuples, where each tuple consists of a string (usually the file path or identifier) and a dictionary containing NumPy arrays representing various features. The callable returns a tuple containing a list of strings and a dictionary of PyTorch tensors, appropriately padded for batch processing.
- Parameters:
- args (argparse.Namespace) – The argument namespace containing configurations for the task.
- train (bool) – A flag indicating whether the collate function is being built for training or evaluation.
- Returns: Callable[[Collection[Tuple[str, Dict[str, np.ndarray]]]], Tuple[List[str], Dict[str, torch.Tensor]]]: A collate function that can be used to process batches of data.
################### Examples
>>> collate_fn = GANSVSTask.build_collate_fn(args, train=True)
>>> batch = [
... ("file1", {"feature1": np.array([1, 2]), "feature2": np.array([3])}),
... ("file2", {"feature1": np.array([4]), "feature2": np.array([5, 6])}),
... ]
>>> file_ids, tensors = collate_fn(batch)
>>> print(file_ids) # Output: ['file1', 'file2']
>>> print(tensors) # Output: {'feature1': tensor(...), 'feature2': tensor(...)}
######## NOTE This function utilizes the CommonCollateFn for handling padding of sequences and other necessary adjustments for batching.
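The padding step described above can be illustrated with a minimal, dependency-free sketch. It uses plain Python lists in place of NumPy arrays and PyTorch tensors. `pad_collate` and its `pad_value` parameter are hypothetical names chosen for illustration, not the CommonCollateFn API, and the extra `<name>_lengths` entries are an assumption of this sketch.

```python
from typing import Dict, List, Sequence, Tuple


def pad_collate(
    batch: Sequence[Tuple[str, Dict[str, List[int]]]],
    pad_value: int = 0,
) -> Tuple[List[str], Dict[str, list]]:
    """Hypothetical stand-in for a padding collate function."""
    # Separate the identifiers from the feature dictionaries.
    ids = [uid for uid, _ in batch]
    feature_names = batch[0][1].keys()

    output: Dict[str, list] = {}
    for name in feature_names:
        sequences = [features[name] for _, features in batch]
        max_len = max(len(seq) for seq in sequences)
        # Right-pad every sequence to the longest one in the batch.
        output[name] = [
            list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences
        ]
        # Keep the original lengths so padding can be masked later
        # (an assumption of this sketch).
        output[name + "_lengths"] = [len(seq) for seq in sequences]
    return ids, output


batch = [
    ("file1", {"feature1": [1, 2], "feature2": [3]}),
    ("file2", {"feature1": [4], "feature2": [5, 6]}),
]
file_ids, padded = pad_collate(batch)
print(file_ids)
print(padded["feature1"])
```

The same idea carries over to tensors: each feature is stacked into one rectangular array per batch, which is why variable-length sequences have to be padded first.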
classmethod build_model(args: Namespace) → ESPnetGANSVSModel
Builds the ESPnet GANSVS model based on the provided arguments.
This method configures the model by creating necessary components such as feature extractors, normalization layers, and the main SVS model. It reads the token list from a file or directly from the arguments and initializes the model components according to the specified configurations.
- Parameters: args (argparse.Namespace) – The arguments containing model configurations, feature extractor types, and other relevant parameters.
- Returns: An instance of the ESPnet GANSVS model configured with the specified components.
- Return type: ESPnetGANSVSModel
- Raises: RuntimeError – If token_list is neither a string nor a valid list.
################### Examples
>>> from argparse import Namespace
>>> args = Namespace(
... token_list="path/to/token_list.txt",
... odim=None,
... feats_extract="linear_spectrogram",
... feats_extract_conf={"hop_length": 256},
... postfrontend="s3prl",
... postfrontend_conf={"some_param": "value"},
... normalize="global_mvn",
... model_conf={"additional_param": "value"}
... )
>>> model = GANSVSTask.build_model(args)
>>> print(model)
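The token-list handling described above (a path string or an in-memory list, with RuntimeError otherwise) might look roughly like the following. This is a sketch under assumptions: `load_token_list` is a hypothetical helper, and the one-token-per-line file format is assumed, not taken from the source.

```python
import os
import tempfile
from typing import List, Sequence, Union


def load_token_list(token_list: Union[str, Sequence[str]]) -> List[str]:
    """Hypothetical helper: accept a file path or an in-memory token list."""
    if isinstance(token_list, str):
        # Assumed file format: one token per line.
        with open(token_list, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    if isinstance(token_list, (list, tuple)):
        return list(token_list)
    raise RuntimeError("token_list must be a path string or a list of tokens")


# Write a small token file to demonstrate the path-based branch.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tokens.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("<blank>\na\nb\n")
    tokens_from_file = load_token_list(path)

tokens_from_list = load_token_list(["<blank>", "a", "b"])
vocab_size = len(tokens_from_file)
```

Accepting both forms lets the same configuration work whether the vocabulary is stored on disk or was already embedded in a saved config.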
classmethod build_optimizers(args: Namespace, model: ESPnetGANSVSModel) → List[Optimizer]
Builds the optimizers for the GAN-based Singing-voice-synthesis model.
This method creates two optimizers: one for the generator and one for the discriminator of the GAN model. It retrieves the optimizer classes based on the specified arguments and initializes them with the model’s parameters.
- Parameters:
- args (argparse.Namespace) – The arguments containing optimizer configurations and settings.
- model (ESPnetGANSVSModel) – The ESPnet GAN-based Singing-voice-synthesis model for which the optimizers are to be created.
- Returns: A list containing the generator and discriminator optimizers.
- Return type: List[torch.optim.Optimizer]
- Raises:
- ValueError – If the specified optimizer class is not recognized.
- RuntimeError – If fairscale is required but not installed.
################### Examples
>>> from espnet2.gan_svs.espnet_model import ESPnetGANSVSModel
>>> args = ... # Arguments containing optimizer configurations
>>> model = ESPnetGANSVSModel(...) # Model initialization
>>> optimizers = GANSVSTask.build_optimizers(args, model)
>>> assert len(optimizers) == 2 # Ensure two optimizers are created
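The two-optimizer construction can be sketched without PyTorch. Everything below is a stand-in: `SketchOptimizer`, `OPTIMIZER_CLASSES`, `build_one`, and `build_two_optimizers` are hypothetical names, the `optim`/`optim2` parameter names are illustrative, and the generator and discriminator are represented by plain parameter lists. Only the overall shape follows the description above: look up a class by name, reject unknown names with ValueError, and return one optimizer per sub-network.

```python
from typing import Dict, List, Type


class SketchOptimizer:
    """Stand-in for an optimizer class; it only records its inputs."""

    def __init__(self, params: List[float], lr: float = 1e-3) -> None:
        self.params = params
        self.lr = lr


class SketchAdam(SketchOptimizer):
    pass


class SketchSGD(SketchOptimizer):
    pass


# Hypothetical name-to-class registry.
OPTIMIZER_CLASSES: Dict[str, Type[SketchOptimizer]] = {
    "adam": SketchAdam,
    "sgd": SketchSGD,
}


def build_one(name: str, params: List[float], conf: dict) -> SketchOptimizer:
    optim_class = OPTIMIZER_CLASSES.get(name)
    if optim_class is None:
        raise ValueError(f"must be one of {list(OPTIMIZER_CLASSES)}: {name}")
    return optim_class(params, **conf)


def build_two_optimizers(
    gen_params: List[float],
    dis_params: List[float],
    optim: str,
    optim_conf: dict,
    optim2: str,
    optim2_conf: dict,
) -> List[SketchOptimizer]:
    # One optimizer per sub-network, so each can be stepped independently.
    return [
        build_one(optim, gen_params, optim_conf),
        build_one(optim2, dis_params, optim2_conf),
    ]


optimizers = build_two_optimizers(
    [0.1, 0.2], [0.3], "adam", {"lr": 2e-4}, "sgd", {"lr": 1e-2}
)
```

Keeping the generator and discriminator on separate optimizers is what allows a GAN trainer to alternate updates between the two networks with different learning rates.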
classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None
Builds a preprocessing function for the GAN-based Singing-voice-synthesis task.
This function creates a callable that preprocesses input data based on the provided arguments. If preprocessing is enabled, it initializes an instance of the SVSPreprocessor with the specified configurations. If preprocessing is not enabled, it returns None.
- Parameters:
- cls – The class reference.
- args (argparse.Namespace) – Command-line arguments containing configuration options for preprocessing.
- train (bool) – A flag indicating whether the function is being called for training or not.
- Returns: A callable that preprocesses the input data, or None if preprocessing is not enabled.
- Return type: Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]
################### Examples
>>> args = argparse.Namespace()
>>> args.use_preprocessor = True
>>> args.token_type = "phn"
>>> args.token_list = "path/to/token_list.txt"
>>> preprocess_fn = GANSVSTask.build_preprocess_fn(args, train=True)
>>> processed_data = preprocess_fn("input_text", {"feature": np.array([])})
######## NOTE The SVSPreprocessor requires specific configurations to function correctly. Ensure that the necessary arguments are provided.
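Because the return value may be None, callers need to handle both cases. The sketch below shows that pattern with a stand-in preprocessor. `make_preprocess_fn`, `uppercase_text`, and `apply_preprocess` are hypothetical and say nothing about what SVSPreprocessor actually does; only the `use_preprocessor` flag is taken from the example above.

```python
from argparse import Namespace
from typing import Callable, Dict, Optional

PreprocessFn = Callable[[str, Dict[str, str]], Dict[str, str]]


def uppercase_text(uid: str, data: Dict[str, str]) -> Dict[str, str]:
    """Stand-in transformation; a real preprocessor does far more."""
    return {key: value.upper() for key, value in data.items()}


def make_preprocess_fn(args: Namespace, train: bool) -> Optional[PreprocessFn]:
    # Return a callable only when preprocessing is enabled.
    if args.use_preprocessor:
        return uppercase_text
    return None


def apply_preprocess(
    fn: Optional[PreprocessFn], uid: str, data: Dict[str, str]
) -> Dict[str, str]:
    # Pass the data through unchanged when no preprocessor was built.
    return data if fn is None else fn(uid, data)


enabled = make_preprocess_fn(Namespace(use_preprocessor=True), train=True)
disabled = make_preprocess_fn(Namespace(use_preprocessor=False), train=True)
sample = {"text": "la la"}
```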
class_choices_list
num_optimizers
classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]
Returns the optional data names used in the GAN-based Singing-voice-synthesis task.
The returned names depend on whether the task is in training or inference mode. In training mode, optional data names include speaker embeddings, durations, pitch, energy, speaker IDs, language IDs, features, and ying. In inference mode, the optional data names include speaker embeddings, singing, pitch, durations, speaker IDs, and language IDs.
- Parameters:
- train (bool) – Indicates whether the task is in training mode. Default is True.
- inference (bool) – Indicates whether the task is in inference mode. Default is False.
- Returns: A tuple containing the names of optional data.
- Return type: Tuple[str, …]
################### Examples
>>> GANSVSTask.optional_data_names(train=True, inference=False)
('spembs', 'durations', 'pitch', 'energy', 'sids', 'lids', 'feats', 'ying')
>>> GANSVSTask.optional_data_names(train=False, inference=True)
('spembs', 'singing', 'pitch', 'durations', 'sids', 'lids')
classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]
Returns the required data names for the GAN-based Singing-voice-synthesis task.
The method returns a tuple of required data names that depends on the mode. In inference mode the ‘singing’ data is not required, so it is omitted from the tuple.
- Parameters:
- train (bool) – A flag indicating if the task is in training mode. Default is True.
- inference (bool) – A flag indicating if the task is in inference mode. Default is False.
- Returns: A tuple containing the names of the required data. The names will be:
- In training mode: ("text", "singing", "score", "label")
- In inference mode: ("text", "score", "label")
- Return type: Tuple[str, …]
################### Examples
>>> GANSVSTask.required_data_names(train=True, inference=False)
('text', 'singing', 'score', 'label')
>>> GANSVSTask.required_data_names(train=False, inference=True)
('text', 'score', 'label')
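One way to use these tuples is to validate a dataset entry before it reaches the model. The sketch below hard-codes the tuples documented above instead of importing ESPnet. `data_names` and `check_entry` are hypothetical helpers, and selecting on the `inference` flag alone is an assumption based on the examples.

```python
from typing import Dict, Tuple


def data_names(inference: bool = False) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return (required, optional) names as documented above."""
    if inference:
        required = ("text", "score", "label")
        optional = ("spembs", "singing", "pitch", "durations", "sids", "lids")
    else:
        required = ("text", "singing", "score", "label")
        optional = (
            "spembs", "durations", "pitch", "energy",
            "sids", "lids", "feats", "ying",
        )
    return required, optional


def check_entry(entry: Dict[str, object], inference: bool = False) -> None:
    """Raise ValueError if keys are missing or not recognized."""
    required, optional = data_names(inference)
    missing = [name for name in required if name not in entry]
    unknown = [name for name in entry if name not in required + optional]
    if missing:
        raise ValueError(f"missing required data: {missing}")
    if unknown:
        raise ValueError(f"unrecognized data: {unknown}")


# An inference-time entry needs no "singing" waveform.
check_entry({"text": 0, "score": 0, "label": 0, "pitch": 0}, inference=True)
```

The same entry without "singing" would be rejected in training mode, which mirrors the difference between the two tuples.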
trainer
alias of GANTrainer