espnet2.svs.abs_svs.AbsSVS


class espnet2.svs.abs_svs.AbsSVS(*args, **kwargs)

Bases: Module, ABC

Singing Voice Synthesis (SVS) abstract class.

This class serves as an abstract base for singing voice synthesis models. It defines the essential methods that any concrete implementation must override, ensuring a consistent interface for SVS functionalities.

require_raw_singing

Indicates whether raw singing data is required for the synthesis process. Defaults to False.

Type: bool

require_vocoder

Indicates whether a vocoder is required for the synthesis process. Defaults to True.

Type: bool

forward()

Calculates outputs and returns the loss tensor.

inference()

Returns predicted output as a dictionary.

######### Examples

To create a concrete implementation of the AbsSVS class, one must subclass it and implement the abstract methods:

```python
from espnet2.svs.abs_svs import AbsSVS


class MySVS(AbsSVS):
    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        # Implementation of the forward method
        pass

    def inference(self, text, **kwargs):
        # Implementation of the inference method
        pass
```
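The abstract contract can be illustrated without installing ESPnet or PyTorch. The sketch below assumes that `forward` and `inference` are declared with `abc.abstractmethod`, as the "abstract" labels in the signatures suggest. `AbsModel`, `Incomplete`, `Complete`, `can_instantiate`, and the `"output"` key are hypothetical stand-ins for illustration, not part of the library.

```python
from abc import ABC, abstractmethod


class AbsModel(ABC):
    """Hypothetical stand-in for AbsSVS (standard library only, no torch)."""

    @abstractmethod
    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        raise NotImplementedError

    @abstractmethod
    def inference(self, text, **kwargs):
        raise NotImplementedError


class Incomplete(AbsModel):
    # Only forward is implemented, so inference is still abstract.
    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        return 0.0, {}, 1.0  # placeholder values, not real tensors


class Complete(Incomplete):
    # Implementing the remaining abstract method makes the class concrete.
    def inference(self, text, **kwargs):
        return {"output": list(text)}  # "output" is an illustrative key


def can_instantiate(cls):
    """Return True if cls() succeeds, False if abstract methods block it."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Under this assumption, a subclass that leaves either method unimplemented fails at instantiation with a `TypeError`, before any method is called.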

NOTE

This class uses PyTorch as the underlying framework and inherits from torch.nn.Module to ensure compatibility with PyTorch’s neural network components.

Initialize internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(text: Tensor, text_lengths: Tensor, feats: Tensor, feats_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]

Calculate outputs and return the loss tensor.

This method is responsible for processing the input tensors and generating the output tensors along with the corresponding loss. The implementation of this method must be provided in subclasses of the AbsSVS class.

Parameters:
- text (torch.Tensor) – The input text represented as a tensor.
- text_lengths (torch.Tensor) – The lengths of the input text sequences.
- feats (torch.Tensor) – The feature representations of the audio.
- feats_lengths (torch.Tensor) – The lengths of the feature sequences.
- **kwargs – Additional keyword arguments for specific implementations.
Returns: A tuple containing:
- The loss tensor.
- A dictionary of auxiliary outputs.
- A tensor representing any additional output information.
Return type: Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Raises: NotImplementedError – If the method is not implemented in a subclass.
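The pairing of `text` with `text_lengths` (and `feats` with `feats_lengths`) can be sketched with plain Python lists. This assumes the common convention that variable-length sequences are padded to one length per batch and that the lengths record how much of each row is real data; the page itself does not state the padding scheme. `pad_batch` and its `pad_value` parameter are hypothetical helpers, and lists stand in for `torch.Tensor`.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to a common length.

    Returns (batch, lengths): batch is a list of equal-length rows and
    lengths holds the original length of each sequence.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    batch = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return batch, lengths


# Two token-ID sequences of different lengths (illustrative values).
text, text_lengths = pad_batch([[5, 9, 2], [7, 1]])
```

Here `text` becomes a rectangular batch and `text_lengths` lets a model ignore the padded positions.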

######### Examples

Example of using the forward method in a subclass:

    class MySVS(AbsSVS):
        def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
            # Implementation goes here
            pass

abstract inference(text: Tensor, **kwargs) → Dict[str, Tensor]

Return predicted output as a dictionary.

This method generates predictions from the input text. The implementation of this method must be provided in subclasses of the AbsSVS class.

Parameters:
- text (torch.Tensor) – The input text represented as a tensor.
- **kwargs – Additional keyword arguments for specific implementations.
Returns: A dictionary of predicted outputs.
Return type: Dict[str, torch.Tensor]
Raises: NotImplementedError – If the method is not implemented in the derived class.

######### Examples

    import torch

    from espnet2.svs.abs_svs import AbsSVS

    class MySVS(AbsSVS):
        def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
            # Implement forward logic here
            pass

        def inference(self, text, **kwargs):
            # Implement inference logic here
            return {}  # placeholder: a real model returns its predicted outputs

    my_svs = MySVS()
    output = my_svs.inference(torch.tensor([1, 2, 3]))

property require_raw_singing

Return whether or not raw_singing is required.

property require_vocoder

Return whether or not vocoder is required.
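How the two flags might be overridden and consumed can be sketched as follows. The defaults (`False` for `require_raw_singing`, `True` for `require_vocoder`) come from the attribute descriptions above; `BaseSVS`, `WaveformSVS`, `describe_pipeline`, and the step names are hypothetical and use only the standard library, so this shows the pattern and not how ESPnet itself reads these properties.

```python
class BaseSVS:
    """Hypothetical stand-in mirroring the documented property defaults."""

    @property
    def require_raw_singing(self):
        return False

    @property
    def require_vocoder(self):
        return True


class WaveformSVS(BaseSVS):
    """Hypothetical model that emits waveforms directly, so no vocoder."""

    @property
    def require_vocoder(self):
        return False


def describe_pipeline(model):
    """List the extra pipeline pieces a model asks for via its flags."""
    steps = []
    if model.require_raw_singing:
        steps.append("load raw singing")
    if model.require_vocoder:
        steps.append("run vocoder")
    return steps
```

A subclass overrides a property only when its needs differ from the defaults, and the surrounding code can branch on the flags without knowing the concrete model type.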