espnet2.s2st.synthesizer.abs_synthesizer.AbsSynthesizer
class espnet2.s2st.synthesizer.abs_synthesizer.AbsSynthesizer(*args, **kwargs)
Bases: Module
, ABC
Abstract base class for text-to-speech (TTS) synthesizers used in the speech-to-speech translation (S2ST) pipeline.
This class defines the core interface that concrete synthesizers must implement: forward for computing outputs and the training loss, and inference for generating predicted outputs from input states.
require_raw_speech
Indicates whether raw speech input is required.
- Type: bool
require_vocoder
Indicates whether a vocoder is required for synthesis.
- Type: bool
forward(input_states, input_states_lengths, feats, feats_lengths, **kwargs)
Calculate outputs and return the loss tensor.
inference(input_states, **kwargs)
Return predicted output as a dict.
- Raises: NotImplementedError – If a subclass does not implement the required methods.
######### Examples
class MySynthesizer(AbsSynthesizer):
    def forward(self, input_states, input_states_lengths,
                feats, feats_lengths, **kwargs):
        # Implementation here
        pass

    def inference(self, input_states, **kwargs):
        # Implementation here
        pass

synthesizer = MySynthesizer()
print(synthesizer.require_raw_speech)  # Output: False
print(synthesizer.require_vocoder)     # Output: True
Initialize internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(input_states: Tensor, input_states_lengths: Tensor, feats: Tensor, feats_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Calculate outputs and return the loss tensor.
This method is responsible for processing the input states and features to produce the model outputs, which include the loss tensor and any additional information as a dictionary. The input tensors must adhere to specific dimensions that correspond to the expected data format.
- Parameters:
- input_states (torch.Tensor) – A tensor containing the input states.
- input_states_lengths (torch.Tensor) – A tensor indicating the lengths of the input states.
- feats (torch.Tensor) – A tensor containing the features for synthesis.
- feats_lengths (torch.Tensor) – A tensor indicating the lengths of the features.
- **kwargs – Additional keyword arguments for specific configurations.
- Returns: A tuple containing the loss tensor, a dictionary of outputs, and an additional tensor required for further processing.
- Return type: Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
- Raises: NotImplementedError – If the method is not implemented in a subclass.
######### Examples
>>> model = MySynthesizer() # MySynthesizer should inherit from AbsSynthesizer
>>> input_states = torch.randn(1, 10, 256) # Example input tensor
>>> input_states_lengths = torch.tensor([10])
>>> feats = torch.randn(1, 20, 80) # Example feature tensor
>>> feats_lengths = torch.tensor([20])
>>> loss, outputs, additional = model.forward(
... input_states, input_states_lengths, feats, feats_lengths)
abstract inference(input_states: Tensor, **kwargs) → Dict[str, Tensor]
Return the predicted output as a dict.
This method generates synthesized outputs from the given input states. Concrete subclasses implement the decoding logic and return the predicted tensors keyed by name.
- Parameters:
- input_states (torch.Tensor) – A tensor containing the input states.
- **kwargs – Additional keyword arguments for specific configurations.
- Returns: A dictionary mapping output names to predicted tensors.
- Return type: Dict[str, torch.Tensor]
- Raises: NotImplementedError – If the method is not implemented in a subclass.
######### Examples
class MySynthesizer(AbsSynthesizer):
    def forward(self, input_states, input_states_lengths,
                feats, feats_lengths, **kwargs):
        # Implement forward logic
        pass

    def inference(self, input_states, **kwargs):
        # Implement inference logic
        return {}

synthesizer = MySynthesizer()
output = synthesizer.inference(torch.tensor([[0.0]]))
print(output)
NOTE
This class cannot be instantiated directly and must be subclassed.
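The note above can be demonstrated directly: instantiation fails until both abstract methods are implemented. The sketch below uses a simplified stand-in built on Python's abc module alone (the real class also inherits from torch.nn.Module, omitted here so the example runs without PyTorch); MySynthesizer and its trivial return values are hypothetical.

```python
from abc import ABC, abstractmethod


# Simplified stand-in for the real base class; the abstract-method
# machinery behaves the same way as in the torch.nn.Module version.
class AbsSynthesizer(ABC):
    @abstractmethod
    def forward(self, input_states, input_states_lengths,
                feats, feats_lengths, **kwargs):
        raise NotImplementedError

    @abstractmethod
    def inference(self, input_states, **kwargs):
        raise NotImplementedError


# Direct instantiation fails because the abstract methods are unimplemented.
try:
    AbsSynthesizer()
except TypeError as e:
    print(f"TypeError: {e}")


# A subclass that implements both methods can be instantiated.
class MySynthesizer(AbsSynthesizer):
    def forward(self, input_states, input_states_lengths,
                feats, feats_lengths, **kwargs):
        return None, {}, None

    def inference(self, input_states, **kwargs):
        return {"feat_gen": input_states}


synthesizer = MySynthesizer()  # OK
```

The same TypeError is raised by any subclass that overrides only one of the two methods, so both must be provided.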
property require_raw_speech
Return whether raw speech input is required.
property require_vocoder
Return whether a vocoder is required.
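Both properties can be overridden per subclass. Below is a minimal sketch of the pattern using a plain-Python stand-in for the base class (the torch.nn.Module/ABC machinery is omitted); the default values of False and True follow the Examples section above, and WaveformSynthesizer is a hypothetical subclass that produces waveforms directly and therefore needs no vocoder.

```python
# Simplified stand-in; the real base class is a torch.nn.Module + ABC.
class AbsSynthesizer:
    @property
    def require_raw_speech(self):
        """Return whether raw speech input is required."""
        return False

    @property
    def require_vocoder(self):
        """Return whether a vocoder is required."""
        return True


# A subclass generating waveforms directly can override the default.
class WaveformSynthesizer(AbsSynthesizer):
    @property
    def require_vocoder(self):
        return False


print(WaveformSynthesizer().require_vocoder)  # Output: False
```

Downstream code can branch on these flags, e.g. skipping vocoder construction when require_vocoder is False.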