espnet2.svs.abs_svs.AbsSVS
class espnet2.svs.abs_svs.AbsSVS(*args, **kwargs)
Bases: Module, ABC
Singing Voice Synthesis (SVS) abstract class.
This class serves as an abstract base for singing voice synthesis models. It defines the essential methods that any concrete implementation must override, ensuring a consistent interface for SVS functionalities.
require_raw_singing
Indicates whether raw singing data is required for the synthesis process. Defaults to False.
- Type: bool
require_vocoder
Indicates whether a vocoder is required for the synthesis process. Defaults to True.
- Type: bool
forward()
Calculates outputs and returns the loss tensor.
inference()
Returns predicted output as a dictionary.
######### Examples
To create a concrete implementation of the AbsSVS class, one must subclass it and implement the abstract methods:
```python
from espnet2.svs.abs_svs import AbsSVS


class MySVS(AbsSVS):
    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        # Implementation of the forward method
        pass

    def inference(self, text, **kwargs):
        # Implementation of the inference method
        pass
```
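What "must override" means in practice can be sketched without PyTorch or ESPnet installed. The block below is a minimal stand-in, not ESPnet's code: `SVSInterface`, `IncompleteSVS`, `CompleteSVS`, and `can_instantiate` are hypothetical names, and the base class uses only `abc.ABC`, whereas the real `AbsSVS` also inherits from `torch.nn.Module`. It shows that Python refuses to instantiate a subclass that leaves an abstract method unimplemented.

```python
from abc import ABC, abstractmethod


class SVSInterface(ABC):
    """Hypothetical stand-in mirroring the two abstract methods of AbsSVS."""

    @abstractmethod
    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        raise NotImplementedError

    @abstractmethod
    def inference(self, text, **kwargs):
        raise NotImplementedError


class IncompleteSVS(SVSInterface):
    """Implements forward() only, so the class is still abstract."""

    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        return None


class CompleteSVS(SVSInterface):
    """Implements both abstract methods, so it can be instantiated."""

    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        return None

    def inference(self, text, **kwargs):
        return {}


def can_instantiate(cls):
    """Return True if cls() succeeds, False if ABC blocks it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Under this sketch, `can_instantiate(CompleteSVS)` succeeds while `can_instantiate(IncompleteSVS)` does not. Because `AbsSVS` also lists `ABC` among its bases, the same check is expected to apply to real subclasses.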
NOTE
This class uses PyTorch as the underlying framework and inherits from torch.nn.Module to ensure compatibility with PyTorch’s neural network components.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(text: Tensor, text_lengths: Tensor, feats: Tensor, feats_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Calculate outputs and return the loss tensor.
This method is responsible for processing the input tensors and generating the output tensors along with the corresponding loss. The implementation of this method must be provided in subclasses of the AbsSVS class.
- Parameters:
- text (torch.Tensor) – The input text represented as a tensor.
- text_lengths (torch.Tensor) – The lengths of the input text sequences.
- feats (torch.Tensor) – The feature representations of the audio.
- feats_lengths (torch.Tensor) – The lengths of the feature sequences.
- **kwargs – Additional keyword arguments for specific implementations.
- Returns: A tuple containing:
- The loss tensor.
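- A dictionary of auxiliary outputs (in ESPnet's concrete models, typically statistics to monitor during training).
- A tensor representing any additional output information (in ESPnet's concrete models, typically a weight such as the batch size, used when aggregating the loss).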
- Return type: Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
- Raises: NotImplementedError – If the method is not implemented in a subclass.
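The relationship between a sequence tensor and its `*_lengths` companion can be illustrated with plain Python lists standing in for tensors. This is a sketch under an assumption: `AbsSVS` itself does not prescribe a padding convention, so the block assumes the common layout in which variable-length sequences are padded to the longest one in the batch and the true lengths are passed alongside. `pad_batch` is a hypothetical helper and the token IDs are made up.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to a common length.

    Returns (padded, lengths): padded is a list of equal-length lists,
    and lengths holds the original length of each sequence.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths) if lengths else 0
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


# Two token-ID sequences of different lengths (made-up values).
text, text_lengths = pad_batch([[5, 2, 9], [7, 1]])
```

With real tensors, `text` would be a 2-D batch and `text_lengths` a 1-D tensor with one entry per batch item; `feats` and `feats_lengths` follow the same pattern, with an extra feature dimension.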
######### Examples
Example of using the forward method in a subclass:

    class MySVS(AbsSVS):
        def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
            # Implementation goes here
            pass
abstract inference(text: Tensor, **kwargs) → Dict[str, Tensor]
Return predicted output as a dictionary.
This method runs synthesis on the input text and returns the predictions keyed by name. The implementation of this method must be provided in subclasses of the AbsSVS class.
- Parameters:
- text (torch.Tensor) – The input text represented as a tensor.
- **kwargs – Additional keyword arguments for specific implementations.
- Returns: A dictionary of predicted outputs.
- Return type: Dict[str, torch.Tensor]
- Raises: NotImplementedError – If the method is not implemented in the derived class.
######### Examples
    import torch
    from espnet2.svs.abs_svs import AbsSVS

    class MySVS(AbsSVS):
        def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
            # Implement forward logic here
            pass

        def inference(self, text, **kwargs):
            # Implement inference logic here
            return

    my_svs = MySVS()
    output = my_svs.inference(torch.tensor([1, 2, 3]))
property require_raw_singing
Return whether or not raw_singing is required.
property require_vocoder
Return whether or not vocoder is required.
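How a subclass might override these flags, and how calling code might consult them, can be sketched as follows. The block is a stand-in that runs without PyTorch or ESPnet: `SVSFlags`, `WaveformSVS`, and `needs_vocoder` are hypothetical names, and only the default values (False for `require_raw_singing`, True for `require_vocoder`) are taken from the documentation above.

```python
from abc import ABC


class SVSFlags(ABC):
    """Hypothetical stand-in exposing the two flags with the documented defaults."""

    @property
    def require_raw_singing(self):
        return False

    @property
    def require_vocoder(self):
        return True


class WaveformSVS(SVSFlags):
    """Hypothetical model that emits waveforms directly, so no vocoder is needed."""

    @property
    def require_vocoder(self):
        return False


def needs_vocoder(model):
    """Caller-side check: should a vocoder be run after this model?"""
    return model.require_vocoder
```

Because the flags are read-only properties, a subclass changes them by overriding the property rather than by assigning to an attribute.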