espnet2.tasks.svs.SVSTask

class espnet2.tasks.svs.SVSTask

Bases: AbsTask

Singing Voice Synthesis (SVS) task class for managing the training and evaluation of singing voice synthesis models.

This class extends the abstract task class AbsTask and provides methods for adding task-specific arguments, building models, collating data, and preprocessing inputs.

num_optimizers

Number of optimizers to be used in training.

Type: int

class_choices_list

List of class choices for various components in the SVS task.

Type: list

trainer

Trainer class used for training the models.

Type: Trainer

add_task_arguments(parser: argparse.ArgumentParser)

Adds task-specific arguments to the provided argument parser.

build_collate_fn(args: argparse.Namespace, train: bool) -> Callable

Builds a collate function for batching data during training or evaluation.

build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]

Builds a preprocessing function for input data based on the provided arguments.

required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]

Returns the names of the required data for training or inference.

optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]

Returns the names of the optional data for training or inference.

build_model(args: argparse.Namespace) -> ESPnetSVSModel

Constructs the SVS model based on the provided arguments.

build_vocoder_from_file(vocoder_config_file: Union[Path, str] = None, vocoder_file: Union[Path, str] = None, model: Optional[ESPnetSVSModel] = None, device: str = "cpu")

Builds a vocoder from the provided configuration and model.

################### Examples

Example usage of adding task arguments:

>>> import argparse
>>> from espnet2.tasks.svs import SVSTask
>>> parser = argparse.ArgumentParser()
>>> SVSTask.add_task_arguments(parser)

Example of building a model:

>>> args = parser.parse_args()
>>> model = SVSTask.build_model(args)

######### NOTE Ensure that the necessary dependencies for SVS are installed and properly configured.

classmethod add_task_arguments(parser: ArgumentParser)

Add task-specific arguments to the argument parser.

This method adds various command-line arguments related to the singing voice synthesis (SVS) task to the provided argument parser. The arguments include configurations for tokenization, feature extraction, normalization, and the model.

Parameters: parser (argparse.ArgumentParser) – The argument parser to which the task-specific arguments will be added.

######### NOTE Argument names registered by this method use an underscore (_) rather than a hyphen (-) to avoid confusion in argument naming.

################### Examples

>>> import argparse
>>> from espnet2.tasks.svs import SVSTask
>>> parser = argparse.ArgumentParser()
>>> SVSTask.add_task_arguments(parser)
>>> args = parser.parse_args()

Raises: ValueError – If the argument configurations are invalid.
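
The underscore naming convention described above can be illustrated with a minimal standard-library sketch. The function name `add_example_task_arguments` and the two options with their defaults are hypothetical stand-ins chosen for illustration; they are not the set of arguments that SVSTask actually registers.

```python
import argparse


def add_example_task_arguments(parser: argparse.ArgumentParser) -> None:
    """Register a few illustrative options (hypothetical, not ESPnet's real set)."""
    group = parser.add_argument_group(description="Task related")
    # Option names use underscores, e.g. --token_type rather than --token-type.
    group.add_argument(
        "--token_type",
        type=str,
        default="phn",
        choices=["bpe", "char", "word", "phn"],
    )
    # Parse "true"/"false" strings into a real boolean.
    group.add_argument(
        "--use_preprocessor",
        type=lambda s: s.lower() == "true",
        default=True,
    )


parser = argparse.ArgumentParser()
add_example_task_arguments(parser)
args = parser.parse_args(["--token_type", "char", "--use_preprocessor", "false"])
print(args.token_type, args.use_preprocessor)  # char False
```

Because the option names already use underscores, the attribute names on the parsed namespace match the command-line spelling exactly.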

classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]

Builds a collate function for the SVSTask.

This method constructs a callable that collates a batch of data during training or evaluation. It pads sequences appropriately and prepares tensors for input into the model.

Parameters:
- args (argparse.Namespace) – The parsed command line arguments.
- train (bool) – A flag indicating whether the function is being used for training or evaluation.
Returns: A collate function that processes a batch of data.
Return type: Callable[[Collection[Tuple[str, Dict[str, np.ndarray]]]], Tuple[List[str], Dict[str, torch.Tensor]]]

################### Examples

>>> from argparse import Namespace
>>> import numpy as np
>>> from espnet2.tasks.svs import SVSTask
>>> args = Namespace()
>>> collate_fn = SVSTask.build_collate_fn(args, train=True)
>>> batch = [("file1", {"feature": np.array([1, 2, 3])}),
...          ("file2", {"feature": np.array([4, 5])})]
>>> collated_data = collate_fn(batch)
>>> print(collated_data)
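
To make the padding step concrete, here is a minimal sketch of what a collate function of this shape might do. It is not the collate function ESPnet returns: `simple_collate` is a hypothetical name, plain Python lists stand in for arrays and tensors, and the extra `<key>_lengths` entry is an assumption about how original sequence lengths could be reported alongside the padded data.

```python
from typing import Dict, List, Sequence, Tuple


def simple_collate(
    batch: Sequence[Tuple[str, Dict[str, List[int]]]],
    pad_value: int = 0,
) -> Tuple[List[str], Dict[str, List]]:
    """Pad every feature to the longest sequence in the batch (illustrative only)."""
    utt_ids = [utt_id for utt_id, _ in batch]
    collated: Dict[str, List] = {}
    for key in batch[0][1]:
        seqs = [data[key] for _, data in batch]
        max_len = max(len(s) for s in seqs)
        # Right-pad shorter sequences so all rows have the same length.
        collated[key] = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
        # Keep the original lengths so padded positions can be masked later.
        collated[key + "_lengths"] = [len(s) for s in seqs]
    return utt_ids, collated


batch = [("file1", {"feature": [1, 2, 3]}), ("file2", {"feature": [4, 5]})]
utt_ids, collated = simple_collate(batch)
print(utt_ids)   # ['file1', 'file2']
print(collated)  # {'feature': [[1, 2, 3], [4, 5, 0]], 'feature_lengths': [3, 2]}
```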

classmethod build_model(args: Namespace) → ESPnetSVSModel

Build and return an instance of the ESPnetSVSModel based on the provided arguments.

This method constructs a singing voice synthesis model by setting up various components such as feature extraction, normalization, and the model architecture itself. The model is built using configurations specified in the args parameter.

Parameters: args (argparse.Namespace) – The command line arguments containing model configuration settings, including token list, output dimension, feature extractors, normalization methods, and SVS architecture parameters.

Returns: An instance of the ESPnet singing voice synthesis model configured according to the specified arguments.
Return type: ESPnetSVSModel
Raises: RuntimeError – If token_list is neither a string nor a list/tuple.

################### Examples

Example of building a model with specified arguments:

>>> args = argparse.Namespace(
...     token_list='path/to/token_list.txt',
...     odim=80,
...     feats_extract='fbank',
...     feats_extract_conf={'hop_length': 256},
...     normalize='global_mvn',
...     svs='naive_rnn',
...     model_conf={},
... )
>>> model = SVSTask.build_model(args)
>>> print(model)

######### NOTE The token_list can be provided as a path to a file or directly as a list of tokens. The method handles both cases and will read the token list from the file if a string path is provided.
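
The two accepted forms of token_list, and the RuntimeError for anything else, can be sketched as follows. The helper name `resolve_token_list`, the one-token-per-line file layout, and the error message are assumptions made for this illustration; only the str versus list/tuple distinction and the RuntimeError come from the description above.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def resolve_token_list(token_list: Union[str, Sequence[str]]) -> List[str]:
    """Return tokens from a file path or an in-memory sequence (illustrative)."""
    if isinstance(token_list, str):
        # A string is treated as a path to a file with one token per line.
        with open(token_list, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    if isinstance(token_list, (list, tuple)):
        return list(token_list)
    raise RuntimeError("token_list must be str or list")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokens.txt"
    path.write_text("<blank>\na\nb\n", encoding="utf-8")
    from_file = resolve_token_list(str(path))

from_list = resolve_token_list(("<blank>", "a", "b"))
vocab_size = len(from_file)
print(from_file, vocab_size)  # ['<blank>', 'a', 'b'] 3

try:
    resolve_token_list(42)  # neither a path nor a list/tuple
except RuntimeError as exc:
    error = str(exc)
    print(error)
```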

classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None

Builds a preprocessing function for the singing voice synthesis task.

This method creates a preprocessing function based on the specified arguments. If preprocessing is enabled, it initializes an instance of SVSPreprocessor with the given configuration parameters. If preprocessing is not enabled, it returns None.

Parameters:
- cls – The class reference.
- args (argparse.Namespace) – The arguments namespace containing configuration parameters.
- train (bool) – Indicates whether the function is being built for training or evaluation.
Returns: A preprocessing function if args.use_preprocessor is True, otherwise None.
Return type: Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]

################### Examples

>>> args = argparse.Namespace(use_preprocessor=True, token_type='phn',
...                            token_list='path/to/token_list.txt',
...                            bpemodel=None, non_linguistic_symbols=None,
...                            cleaner=None, g2p=None, fs=24000,
...                            feats_extract_conf={"hop_length": 256})
>>> preprocess_fn = SVSTask.build_preprocess_fn(args, train=True)
>>> print(preprocess_fn)
<function SVSPreprocessor at 0x...>
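
The "return a callable or None" contract can be sketched without ESPnet. The builder `build_example_preprocess_fn` and its toy tokenizer are hypothetical and far simpler than SVSPreprocessor; the `train` flag is accepted only to mirror the signature and is unused here.

```python
from argparse import Namespace
from typing import Callable, Dict, List, Optional


def build_example_preprocess_fn(
    args: Namespace, train: bool
) -> Optional[Callable[[str, Dict[str, str]], Dict[str, List[int]]]]:
    """Return a toy preprocessing function, or None when preprocessing is off."""
    if not args.use_preprocessor:
        return None
    token2id = {tok: i for i, tok in enumerate(args.token_list)}

    def preprocess(uid: str, data: Dict[str, str]) -> Dict[str, List[int]]:
        # Map whitespace-separated tokens to integer ids; unknown tokens map to <unk>.
        unk = token2id["<unk>"]
        return {"text": [token2id.get(t, unk) for t in data["text"].split()]}

    return preprocess


args = Namespace(use_preprocessor=True, token_list=["<unk>", "a", "i"])
preprocess_fn = build_example_preprocess_fn(args, train=True)
encoded = preprocess_fn("utt1", {"text": "a i u"})
print(encoded)  # {'text': [1, 2, 0]}

disabled = build_example_preprocess_fn(Namespace(use_preprocessor=False), train=True)
print(disabled)  # None
```

Callers therefore need to handle the None case, for example by passing raw data through unchanged.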

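classmethod build_vocoder_from_file(vocoder_config_file: Union[Path, str] = None, vocoder_file: Union[Path, str] = None, model: Optional[ESPnetSVSModel] = None, device: str = "cpu")

Builds a vocoder from the specified configuration and model files.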

This method allows for the construction of a vocoder based on a provided configuration file and an optional vocoder file. If the vocoder file is not provided, it defaults to using the Griffin-Lim algorithm for vocoding.

Parameters:
- vocoder_config_file (Union[Path, str], optional) – Path to the vocoder configuration file in YAML format. If provided, it will be used to initialize the vocoder parameters.
- vocoder_file (Union[Path, str], optional) – Path to the vocoder model file. If this is a .pkl file, it is expected to be a trained model using Parallel WaveGAN.
- model (Optional[ESPnetSVSModel], optional) – An instance of the SVS model from which to extract features if vocoder_file is not specified.
- device (str, optional) – The device on which to load the vocoder model. Defaults to "cpu".
Returns: An instance of the vocoder (either Spectrogram2Waveform or ParallelWaveGANPretrainedVocoder) if successfully built; otherwise None.
Return type: Union[None, Spectrogram2Waveform, ParallelWaveGANPretrainedVocoder]
Raises: ValueError – If the provided vocoder_file format is not supported.

################### Examples

To build a vocoder using a configuration file:

>>> vocoder = SVSTask.build_vocoder_from_file(
...     vocoder_config_file='path/to/vocoder_config.yaml',
...     model=my_svs_model,
... )

To build a vocoder using a pre-trained model file:

>>> vocoder = SVSTask.build_vocoder_from_file(
...     vocoder_file='path/to/vocoder_model.pkl',
...     vocoder_config_file='path/to/vocoder_config.yaml',
... )

######### NOTE If no vocoder file is provided, Griffin-Lim will be used as a fallback vocoder.
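
The selection rules described in this section (no file means Griffin-Lim, a .pkl file means Parallel WaveGAN, anything else raises ValueError) can be sketched as a small dispatch function. `choose_vocoder_backend`, the string labels it returns, and the error message are hypothetical; the real method constructs vocoder objects rather than returning names.

```python
from pathlib import Path
from typing import Optional, Union


def choose_vocoder_backend(vocoder_file: Optional[Union[Path, str]] = None) -> str:
    """Return a label for the vocoder that would be selected (illustrative only)."""
    if vocoder_file is None:
        # No trained vocoder given: fall back to Griffin-Lim.
        return "griffin_lim"
    if str(vocoder_file).endswith(".pkl"):
        # A .pkl file is expected to be a Parallel WaveGAN checkpoint.
        return "parallel_wavegan"
    raise ValueError(f"{vocoder_file} is not a supported format.")


print(choose_vocoder_backend())                                    # griffin_lim
print(choose_vocoder_backend(Path("path/to/vocoder_model.pkl")))   # parallel_wavegan

try:
    choose_vocoder_backend("path/to/vocoder_model.onnx")
except ValueError as exc:
    unsupported_message = str(exc)
    print(unsupported_message)
```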

class_choices_list: List[[ClassChoices](../train/ClassChoices.md#espnet2.train.class_choices.ClassChoices)] = [<espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>, <espnet2.train.class_choices.ClassChoices object>]

num_optimizers: int = 1

classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Optional data names for the singing voice synthesis task.

This method provides a list of optional data names that can be used during training or inference for the singing voice synthesis task. The returned names depend on the mode (training or inference) and can include various features and attributes related to the synthesis process.

Parameters:
- train (bool) – A flag indicating whether the task is in training mode. Defaults to True.
- inference (bool) – A flag indicating whether the task is in inference mode. Defaults to False.
Returns: A tuple containing the names of optional data.
Return type: Tuple[str, …]

################### Examples

In training mode:

>>> optional_data = SVSTask.optional_data_names(train=True)
>>> print(optional_data)
('spembs', 'durations', 'pitch', 'energy', 'sids', 'lids', 'feats', 'ying')

In inference mode:

>>> optional_data = SVSTask.optional_data_names(inference=True)
>>> print(optional_data)
('spembs', 'singing', 'pitch', 'durations', 'sids', 'lids')

classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Get the required data names for the singing voice synthesis task.

This method returns a tuple of required data names based on whether the task is in training or inference mode. The data names vary depending on the mode.

Parameters:
- train (bool) – Indicates if the task is in training mode. Default is True.
- inference (bool) – Indicates if the task is in inference mode. Default is False.
Returns: A tuple containing the names of the required data.
Return type: Tuple[str, …]

################### Examples

>>> SVSTask.required_data_names(train=True, inference=False)
('text', 'singing', 'score', 'label')

>>> SVSTask.required_data_names(train=False, inference=True)
('text', 'score', 'label')

######### NOTE The required data names differ when the task is in inference mode, where ‘singing’ is not required.
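
One way these name tuples might be used is to validate the keys of a dataset before training or inference. The sketch below copies the tuples from the examples on this page; the helper `check_data_names` is hypothetical and is not part of SVSTask.

```python
from typing import Iterable, List, Tuple

# Name tuples copied from the examples on this page.
REQUIRED_TRAIN = ("text", "singing", "score", "label")
REQUIRED_INFER = ("text", "score", "label")
OPTIONAL_TRAIN = ("spembs", "durations", "pitch", "energy", "sids", "lids", "feats", "ying")
OPTIONAL_INFER = ("spembs", "singing", "pitch", "durations", "sids", "lids")


def check_data_names(
    names: Iterable[str], inference: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (missing required names, names that are neither required nor optional)."""
    names = list(names)
    required = REQUIRED_INFER if inference else REQUIRED_TRAIN
    optional = OPTIONAL_INFER if inference else OPTIONAL_TRAIN
    allowed = set(required) | set(optional)
    missing = [n for n in required if n not in names]
    unknown = sorted(n for n in names if n not in allowed)
    return missing, unknown


# Inference: 'singing' is not required, and 'pitch' is an allowed optional name.
print(check_data_names(["text", "score", "label", "pitch"], inference=True))
# ([], [])

# Training: 'singing' and 'label' are missing, and 'foo' is not a known name.
print(check_data_names(["text", "score", "pitch", "foo"]))
# (['singing', 'label'], ['foo'])
```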

trainer

alias of Trainer