espnet2.tts.abs_tts.AbsTTS
class espnet2.tts.abs_tts.AbsTTS(*args, **kwargs)
Bases: Module, ABC
Abstract base class for Text-to-Speech (TTS) models.
This class defines the essential methods and properties that any TTS implementation should have. It inherits from torch.nn.Module and provides an interface for forward processing and inference of TTS models.
require_raw_speech
Indicates whether raw speech is required for the TTS model. Default is False.
- Type: bool
require_vocoder
Indicates whether a vocoder is required for the TTS model. Default is True.
- Type: bool
forward(text: torch.Tensor, text_lengths: torch.Tensor, feats: torch.Tensor, feats_lengths: torch.Tensor, **kwargs) -> Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Abstract method that must be implemented to calculate outputs and return the loss tensor.
inference(text: torch.Tensor, **kwargs) -> Dict[str, torch.Tensor]
Abstract method that must be implemented to return the predicted output as a dictionary.
- Raises: NotImplementedError – If the abstract methods are not implemented in a subclass.
Examples
To create a concrete TTS model, subclass AbsTTS and implement the abstract methods. Here is an example:
```python
class MyTTS(AbsTTS):
    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        # Implementation of the forward method
        pass

    def inference(self, text, **kwargs):
        # Implementation of the inference method
        pass
```
Initialize internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(text: Tensor, text_lengths: Tensor, feats: Tensor, feats_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Calculate outputs and return the loss tensor.
This method is responsible for processing the input text and its corresponding features to compute the model’s outputs and the associated loss. It is an abstract method that must be implemented by any subclass of AbsTTS.
- Parameters:
- text (torch.Tensor) – A tensor representing the input text.
- text_lengths (torch.Tensor) – A tensor containing the lengths of the input text sequences.
- feats (torch.Tensor) – A tensor representing the feature inputs for the model.
- feats_lengths (torch.Tensor) – A tensor containing the lengths of the feature sequences.
- **kwargs – Additional keyword arguments for flexibility in implementation.
- Returns: A tuple containing:
  - The loss tensor.
  - A dictionary of statistics, where keys are statistic names and values are the corresponding tensors.
  - A weight tensor used when aggregating the loss (typically the batch size).
- Return type: Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
- Raises: NotImplementedError – If this method is called without being implemented in a subclass.
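As a concrete, hedged illustration of this contract, the sketch below implements forward in a toy subclass. DummyTTS, its embedding/projection layers, and the plain L1 loss are hypothetical stand-ins chosen for brevity (a real model would also use text_lengths and feats_lengths for masking and alignment); the (loss, stats, weight) ordering follows the return description above.

```python
from typing import Dict, Tuple

import torch

from espnet2.tts.abs_tts import AbsTTS


class DummyTTS(AbsTTS):
    """Hypothetical minimal TTS used only to illustrate the forward contract."""

    def __init__(self, vocab_size: int = 50, odim: int = 80):
        super().__init__()
        # Toy text encoder: embedding followed by a projection to the feature dim.
        self.embed = torch.nn.Embedding(vocab_size, 32)
        self.proj = torch.nn.Linear(32, odim)

    def forward(
        self,
        text: torch.Tensor,
        text_lengths: torch.Tensor,
        feats: torch.Tensor,
        feats_lengths: torch.Tensor,
        **kwargs,
    ) -> Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]:
        # Predict one feature frame per input token (a deliberately naive model).
        pred = self.proj(self.embed(text))  # (B, T_text, odim)
        # Compare against the first T_text target frames; a real model aligns properly.
        t = min(pred.size(1), feats.size(1))
        loss = torch.nn.functional.l1_loss(pred[:, :t], feats[:, :t])
        # Statistics to report; values are detached so they are not backpropagated.
        stats = {"loss": loss.detach(), "l1_loss": loss.detach()}
        # Weight used to aggregate the loss, here simply the batch size.
        weight = torch.tensor(float(text.size(0)))
        return loss, stats, weight

    def inference(self, text: torch.Tensor, **kwargs) -> Dict[str, torch.Tensor]:
        # Return the predicted features as a dictionary keyed by output name.
        return {"feat_gen": self.proj(self.embed(text))}
```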
abstract inference(text: Tensor, **kwargs) → Dict[str, Tensor]
Return predicted output as a dict.
This abstract method must be implemented by any subclass of AbsTTS. At inference time it converts the input text into the model's predicted outputs (for example, generated feature sequences) and returns them as a dictionary keyed by output name.
- Parameters:
  - text (torch.Tensor) – A tensor representing the input text.
  - **kwargs – Additional keyword arguments for flexibility in implementation.
- Returns: A dictionary where keys are output names and values are the corresponding predicted tensors.
- Return type: Dict[str, torch.Tensor]
- Raises: NotImplementedError – If this method is called without being implemented in a subclass.
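To show how the returned dictionary might be consumed, here is a minimal, hedged sketch of a caller. The key names "wav" and "feat_gen" are assumptions used for illustration; the actual keys depend on the concrete model.

```python
import torch

from espnet2.tts.abs_tts import AbsTTS


def synthesize(model: AbsTTS, text: torch.Tensor) -> torch.Tensor:
    """Run inference and pick an output from the returned dict.

    The key names below ("wav", "feat_gen") are assumptions; consult the
    concrete model for the keys it actually returns.
    """
    model.eval()
    with torch.no_grad():
        output = model.inference(text)
    if "wav" in output:
        # Some models generate waveforms directly.
        return output["wav"]
    # Otherwise the generated features would be passed to a separate vocoder.
    return output["feat_gen"]
```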
property require_raw_speech
Return whether raw speech is required by this model. Defaults to False.
property require_vocoder
Return whether a vocoder is required to synthesize waveforms from the model output. Defaults to True.
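The sketch below is a hypothetical example of overriding these properties; the class name and the choice of flag values (an end-to-end model that trains on raw speech and produces waveforms without an external vocoder) are assumptions for illustration only.

```python
from espnet2.tts.abs_tts import AbsTTS


class HypotheticalEndToEndTTS(AbsTTS):
    """Hypothetical model that outputs waveforms directly, so no vocoder is needed."""

    @property
    def require_raw_speech(self) -> bool:
        # Raw speech is needed during training (e.g., for waveform-level losses).
        return True

    @property
    def require_vocoder(self) -> bool:
        # The model generates waveforms itself, so no external vocoder is required.
        return False

    def forward(self, text, text_lengths, feats, feats_lengths, **kwargs):
        raise NotImplementedError  # omitted in this sketch

    def inference(self, text, **kwargs):
        raise NotImplementedError  # omitted in this sketch
```

Pipeline code can then branch on model.require_raw_speech when preparing training data and on model.require_vocoder when deciding whether to attach a vocoder at synthesis time.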