espnet2.tasks.svs.SVSTask
class espnet2.tasks.svs.SVSTask
Bases: AbsTask
Singing Voice Synthesis (SVS) task class for managing the training and evaluation of singing voice synthesis models.
This class extends the abstract task class AbsTask and provides methods for adding task-specific arguments, building models, collating data, and preprocessing inputs.
num_optimizers
Number of optimizers to be used in training.
- Type: int
class_choices_list
List of class choices for various components in the SVS task.
- Type: list
trainer
Trainer class used for training the models.
- Type: Trainer
add_task_arguments(parser: argparse.ArgumentParser): Adds task-specific arguments to the provided argument parser.
build_collate_fn(args: argparse.Namespace, train: bool) -> Callable: Builds a collate function for batching data during training or evaluation.
build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]: Builds a preprocessing function for input data based on the provided arguments.
required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns the names of the required data for training or inference.
optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns the names of the optional data for training or inference.
build_model(args: argparse.Namespace) -> ESPnetSVSModel: Constructs the SVS model based on the provided arguments.
build_vocoder_from_file(vocoder_config_file: Union[Path, str] = None, vocoder_file: Union[Path, str] = None, model: Optional[ESPnetSVSModel] = None, device: str = "cpu"): Builds a vocoder from the provided configuration and model.
################### Examples
Example usage of adding task arguments
>>> import argparse
>>> from espnet2.tasks.svs import SVSTask
>>> parser = argparse.ArgumentParser()
>>> SVSTask.add_task_arguments(parser)
Example of building a model
>>> args = parser.parse_args()
>>> model = SVSTask.build_model(args)
######### NOTE Ensure that the necessary dependencies for SVS are installed and properly configured.
classmethod add_task_arguments(parser: ArgumentParser)
Add task-specific arguments to the argument parser.
This method adds various command-line arguments related to the singing voice synthesis (SVS) task to the provided argument parser. The arguments include configurations for tokenization, feature extraction, normalization, and the model.
- Parameters: parser (argparse.ArgumentParser) – The argument parser to which the task-specific arguments will be added.
######### NOTE Argument names use an underscore (_) rather than a hyphen (-) as the word separator, to avoid confusion in argument naming.
################### Examples
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> SVSTask.add_task_arguments(parser)
>>> args = parser.parse_args()
- Raises: ValueError – If the argument configurations are invalid.
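The pattern described above can be sketched without ESPnet installed. In the block below, `ToyTask` and its two options are hypothetical stand-ins chosen for illustration; the real method registers many more arguments. It only shows how a classmethod can attach an argument group to an existing parser and why underscore-separated names keep the flag and the `Namespace` attribute spelled the same.

```python
import argparse


class ToyTask:
    """Hypothetical stand-in for a task class; not part of ESPnet."""

    @classmethod
    def add_task_arguments(cls, parser: argparse.ArgumentParser) -> None:
        # Group the task options so they appear together in --help output.
        group = parser.add_argument_group(description="Task related")
        # Option names use underscores, so the flag and the Namespace
        # attribute are spelled identically.
        group.add_argument(
            "--token_type",
            type=str,
            default="phn",
            choices=["phn", "char"],
            help="Tokenization unit (illustrative choices)",
        )
        group.add_argument(
            "--use_preprocessor",
            action="store_true",
            help="Apply preprocessing to the input data",
        )


parser = argparse.ArgumentParser()
ToyTask.add_task_arguments(parser)
# Pass an explicit argv list so the example does not read sys.argv.
args = parser.parse_args(["--token_type", "char"])
print(args.token_type, args.use_preprocessor)  # prints: char False
```

Passing an explicit list to `parse_args` is a convenient way to exercise the parser in tests or notebooks, where `sys.argv` may contain unrelated arguments.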
classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]
Builds a collate function for the SVSTask.
This method constructs a callable that collates a batch of data during training or evaluation. It pads sequences appropriately and prepares tensors for input into the model.
Parameters:
- args (argparse.Namespace) – The parsed command line arguments.
- train (bool) – A flag indicating whether the function is being used for training or evaluation.
Returns: A collate function that processes a batch of data.
Return type: Callable[[Collection[Tuple[str, Dict[str, np.ndarray]]]], Tuple[List[str], Dict[str, torch.Tensor]]]
################### Examples
>>> from argparse import Namespace
>>> import numpy as np
>>> args = Namespace()
>>> collate_fn = SVSTask.build_collate_fn(args, train=True)
>>> batch = [("file1", {"feature": np.array([1, 2, 3])}),
...          ("file2", {"feature": np.array([4, 5])})]
>>> collated_data = collate_fn(batch)
>>> print(collated_data)
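To make the padding step concrete, here is a minimal dependency-free sketch of what a collate function of this shape does. `toy_collate` is a hypothetical helper, it works on plain Python lists instead of NumPy arrays and tensors, and the `<key>_lengths` entries are an assumption of this sketch rather than a statement about the function returned by `build_collate_fn`.

```python
from typing import Dict, List, Sequence, Tuple

Batch = Sequence[Tuple[str, Dict[str, List[float]]]]


def toy_collate(
    batch: Batch, pad_value: float = 0.0
) -> Tuple[List[str], Dict[str, List]]:
    """Pad every feature to the longest sequence in the batch."""
    uids = [uid for uid, _ in batch]
    output: Dict[str, List] = {}
    # Assume every sample carries the same feature keys as the first one.
    for key in batch[0][1]:
        seqs = [data[key] for _, data in batch]
        max_len = max(len(s) for s in seqs)
        # Right-pad each sequence so all rows have the same length.
        output[key] = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
        # Keep the original lengths so padded positions can be masked later.
        output[key + "_lengths"] = [len(s) for s in seqs]
    return uids, output


batch = [
    ("file1", {"feature": [1.0, 2.0, 3.0]}),
    ("file2", {"feature": [4.0, 5.0]}),
]
uids, collated = toy_collate(batch)
print(uids)                         # ['file1', 'file2']
print(collated["feature"])          # [[1.0, 2.0, 3.0], [4.0, 5.0, 0.0]]
print(collated["feature_lengths"])  # [3, 2]
```

Returning the utterance IDs alongside the padded data mirrors the `Tuple[List[str], Dict[...]]` return type documented above.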
classmethod build_model(args: Namespace) → ESPnetSVSModel
Build and return an instance of the ESPnetSVSModel based on the provided arguments.
This method constructs a singing voice synthesis model by setting up various components such as feature extraction, normalization, and the model architecture itself. The model is built using configurations specified in the args parameter.
- Parameters: args (argparse.Namespace) – The command line arguments containing model configuration settings, including the token list, output dimension, feature extractors, normalization methods, and SVS architecture parameters.
- Returns: An instance of the ESPnet singing voice synthesis model configured according to the specified arguments.
- Return type: ESPnetSVSModel
- Raises: RuntimeError – If token_list is neither a string nor a list/tuple.
################### Examples
Example of building a model with specified arguments
>>> args = argparse.Namespace(
...     token_list='path/to/token_list.txt',
...     odim=80,
...     feats_extract='fbank',
...     feats_extract_conf={'hop_length': 256},
...     normalize='global_mvn',
...     svs='naive_rnn',
...     model_conf={},
... )
>>> model = SVSTask.build_model(args)
>>> print(model)
######### NOTE The token_list can be provided as a path to a file or directly as a list of tokens. The method handles both cases and will read the token list from the file if a string path is provided.
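The two accepted forms of token_list, and the RuntimeError for anything else, can be illustrated with a small standalone helper. `load_token_list` is a hypothetical name, and the one-token-per-line file layout is an assumption of this sketch; it is not the code inside `build_model`.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence, Union


def load_token_list(token_list: Union[str, Sequence[str]]) -> List[str]:
    """Return tokens from a file path (one token per line) or a list/tuple."""
    # Check for str first: a str is also a Sequence, but it denotes a path here.
    if isinstance(token_list, str):
        with open(token_list, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
    if isinstance(token_list, (list, tuple)):
        return list(token_list)
    raise RuntimeError("token_list must be str or list")


# Case 1: the tokens are stored in a text file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tokens.txt"
    path.write_text("<blank>\n<unk>\na\n", encoding="utf-8")
    tokens_from_file = load_token_list(str(path))

# Case 2: the tokens are passed directly.
tokens_from_list = load_token_list(("<blank>", "<unk>", "a"))

print(tokens_from_file)                      # ['<blank>', '<unk>', 'a']
print(tokens_from_file == tokens_from_list)  # True
```

In a sketch like this the vocabulary size is simply `len(tokens)`, which is typically the quantity a model constructor needs from the token list.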
classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None
Builds a preprocessing function for the singing voice synthesis task.
This method creates a preprocessing function based on the specified arguments. If preprocessing is enabled, it initializes an instance of SVSPreprocessor with the given configuration parameters. If preprocessing is not enabled, it returns None.
- Parameters:
- cls – The class reference.
- args (argparse.Namespace) – The arguments namespace containing configuration parameters.
- train (bool) – Indicates whether the function is being built for training or evaluation.
- Returns: A preprocessing function if args.use_preprocessor is True, otherwise None.
- Return type: Optional[Callable[[str, Dict[str, np.ndarray]], Dict[str, np.ndarray]]]
################### Examples
>>> import argparse
>>> args = argparse.Namespace(use_preprocessor=True, token_type='phn',
... token_list='path/to/token_list.txt',
... bpemodel=None, non_linguistic_symbols=None,
... cleaner=None, g2p=None, fs=24000,
... feats_extract_conf={"hop_length": 256})
>>> preprocess_fn = SVSTask.build_preprocess_fn(args, train=True)
>>> print(preprocess_fn)
<espnet2.train.preprocessor.SVSPreprocessor object at 0x...>
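The "callable or None" contract can be sketched independently of ESPnet. Everything named below (`ToyPreprocessor`, the module-level `build_preprocess_fn`, and the whitespace tokenization) is hypothetical and far simpler than SVSPreprocessor; the point is only that the caller must handle a None return when use_preprocessor is False.

```python
from argparse import Namespace
from typing import Callable, Dict, List, Optional


class ToyPreprocessor:
    """Hypothetical preprocessor: maps space-separated tokens to integer ids."""

    def __init__(self, token_list: List[str], train: bool):
        self.token2id = {tok: i for i, tok in enumerate(token_list)}
        self.train = train

    def __call__(self, uid: str, data: Dict[str, object]) -> Dict[str, object]:
        out = dict(data)  # copy so the caller's dict is left untouched
        out["text"] = [self.token2id[t] for t in str(data["text"]).split()]
        return out


def build_preprocess_fn(args: Namespace, train: bool) -> Optional[Callable]:
    # Return a callable only when preprocessing is enabled.
    if args.use_preprocessor:
        return ToyPreprocessor(token_list=args.token_list, train=train)
    return None


enabled = build_preprocess_fn(
    Namespace(use_preprocessor=True, token_list=["<unk>", "a", "b"]), train=True
)
disabled = build_preprocess_fn(
    Namespace(use_preprocessor=False, token_list=[]), train=True
)

sample = {"text": "a b a"}
# Callers guard against None before applying the function.
processed = enabled("utt1", sample) if enabled is not None else sample
print(processed["text"])  # [1, 2, 1]
print(disabled)           # None
```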
classmethod build_vocoder_from_file(vocoder_config_file: Path | str | None = None, vocoder_file: Path | str | None = None, model: ESPnetSVSModel | None = None, device: str = 'cpu')
Builds a vocoder from the specified configuration and model files.
This method allows for the construction of a vocoder based on a provided configuration file and an optional vocoder file. If the vocoder file is not provided, it defaults to using the Griffin-Lim algorithm for vocoding.
- Parameters:
- vocoder_config_file (Union[Path, str], optional) – Path to the vocoder configuration file in YAML format. If provided, it will be used to initialize the vocoder parameters.
- vocoder_file (Union[Path, str], optional) – Path to the vocoder model file. If this is a .pkl file, it is expected to be a trained model using Parallel WaveGAN.
- model (Optional[ESPnetSVSModel], optional) – An instance of the SVS model from which to extract features if vocoder_file is not specified.
- device (str, optional) – The device on which to load the vocoder model. Defaults to "cpu".
- Returns: Returns an instance of the vocoder (either Spectrogram2Waveform or ParallelWaveGANPretrainedVocoder) if successfully built; otherwise returns None if the vocoder could not be constructed.
- Return type: Union[None, Spectrogram2Waveform, ParallelWaveGANPretrainedVocoder]
- Raises: ValueError – If the provided vocoder_file format is not supported.
################### Examples
To build a vocoder using a configuration file:
>>> vocoder = SVSTask.build_vocoder_from_file(
...     vocoder_config_file='path/to/vocoder_config.yaml',
...     model=my_svs_model,
... )
To build a vocoder using a pre-trained model file:
>>> vocoder = SVSTask.build_vocoder_from_file(
...     vocoder_file='path/to/vocoder_model.pkl',
...     vocoder_config_file='path/to/vocoder_config.yaml',
... )
######### NOTE If no vocoder file is provided, Griffin-Lim will be used as a fallback vocoder.
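The selection rule described in this section (no file means Griffin-Lim, a .pkl file means Parallel WaveGAN, anything else is rejected) can be written as a small dispatch function. `choose_vocoder_backend` and the string labels it returns are hypothetical; the sketch deliberately stops short of loading any model and should not be read as the library's implementation.

```python
from pathlib import Path
from typing import Optional, Union


def choose_vocoder_backend(vocoder_file: Optional[Union[Path, str]] = None) -> str:
    """Pick a vocoder backend label from the vocoder file argument."""
    if vocoder_file is None:
        # No trained vocoder supplied: fall back to Griffin-Lim.
        return "griffin_lim"
    if Path(vocoder_file).suffix == ".pkl":
        # A .pkl checkpoint is treated as a Parallel WaveGAN model.
        return "parallel_wavegan"
    raise ValueError(f"{vocoder_file} is not a supported format.")


print(choose_vocoder_backend())                             # griffin_lim
print(choose_vocoder_backend("path/to/vocoder_model.pkl"))  # parallel_wavegan

try:
    choose_vocoder_backend("path/to/vocoder_model.onnx")
except ValueError as err:
    print(f"rejected: {err}")
```

Only the file name is inspected here, so the paths do not need to exist for the sketch to run.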
class_choices_list
num_optimizers
classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]
Optional data names for the singing voice synthesis task.
This method returns a tuple of optional data names that may be supplied during training or inference for the singing voice synthesis task. The returned names depend on the mode (training or inference).
- Parameters:
- train (bool) – A flag indicating whether the task is in training mode. Defaults to True.
- inference (bool) – A flag indicating whether the task is in inference mode. Defaults to False.
- Returns: A tuple containing the names of optional data.
- Return type: Tuple[str, ...]
################### Examples
In training mode
>>> optional_data = SVSTask.optional_data_names(train=True)
>>> print(optional_data)
('spembs', 'durations', 'pitch', 'energy', 'sids', 'lids', 'feats', 'ying')
In inference mode
>>> optional_data = SVSTask.optional_data_names(inference=True)
>>> print(optional_data)
('spembs', 'singing', 'pitch', 'durations', 'sids', 'lids')
classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]
Get the required data names for the singing voice synthesis task.
This method returns a tuple of required data names based on whether the task is in training or inference mode. The data names vary depending on the mode.
- Parameters:
- train (bool) – Indicates if the task is in training mode. Default is True.
- inference (bool) – Indicates if the task is in inference mode. Default is False.
- Returns: A tuple containing the names of the required data.
- Return type: Tuple[str, ...]
################### Examples
>>> SVSTask.required_data_names(train=True, inference=False)
('text', 'singing', 'score', 'label')
>>> SVSTask.required_data_names(train=False, inference=True)
('text', 'score', 'label')
######### NOTE In inference mode, 'singing' is not required, so the returned names differ from those in training mode.
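One way to see how the required and optional names work together is a small validation helper. The tuples below are copied from the example outputs on this page; `check_data_names` and the two lookup functions are hypothetical helpers written for this sketch, and the actual checks performed by the training pipeline may differ.

```python
from typing import Iterable, Tuple


def required_names(inference: bool = False) -> Tuple[str, ...]:
    # Values taken from the documented required_data_names outputs.
    if inference:
        return ("text", "score", "label")
    return ("text", "singing", "score", "label")


def optional_names(inference: bool = False) -> Tuple[str, ...]:
    # Values taken from the documented optional_data_names outputs.
    if inference:
        return ("spembs", "singing", "pitch", "durations", "sids", "lids")
    return ("spembs", "durations", "pitch", "energy", "sids", "lids", "feats", "ying")


def check_data_names(names: Iterable[str], inference: bool = False) -> None:
    """Raise ValueError if required names are missing or unknown names appear."""
    given = set(names)
    missing = [n for n in required_names(inference) if n not in given]
    if missing:
        raise ValueError(f"missing required data: {missing}")
    allowed = set(required_names(inference)) | set(optional_names(inference))
    unknown = sorted(given - allowed)
    if unknown:
        raise ValueError(f"unknown data names: {unknown}")


# Training needs 'singing'; inference accepts the same input without it.
check_data_names(["text", "singing", "score", "label", "pitch"])
check_data_names(["text", "score", "label"], inference=True)

try:
    check_data_names(["text", "score", "label"])  # training mode, no 'singing'
except ValueError as err:
    print(err)  # missing required data: ['singing']
```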
trainer
alias of Trainer