espnet2.speechlm.core_lm.abs_core_lm.AbsCoreLM
class espnet2.speechlm.core_lm.abs_core_lm.AbsCoreLM(*args, **kwargs)
Bases: Module, ABC
The abstract CoreLM class for SpeechLM, the major component of SpeechLM.

It supports, or is going to support, several styles of SpeechLM:

Auto-Regressive (AR):
- SpearTTS: https://arxiv.org/abs/2302.03540 (TODO)
- MusicGen: https://arxiv.org/abs/2306.05284 (TODO)
- UniAudio: https://arxiv.org/abs/2310.00704

Non-Auto-Regressive (NAR):
- SoundStorm: https://arxiv.org/abs/2305.09636 (TODO)

Auto-Regressive + Non-Auto-Regressive (AR + NAR), a hybrid of both:
- Vall-E: https://arxiv.org/abs/2301.02111

For developers: to build a new core_lm model, follow one of these approaches:
1. Build with ESPnet internal modules: use modules from espnet2.speechlm.module.transformer.py. If some modules are specific to your model, put them under espnet2.speechlm.module.<model_name>.py.
2. Build with HuggingFace models/modules: put everything in espnet2.speechlm.core_lm.<model_name>.py. Usually, this is just a wrapper that bridges HF models into ESPnet SpeechLM.

Reminder: avoid any model dependency beyond espnet2.speechlm.
forward()
Abstract method for model forward pass.
inference()
Method for performing inference with the model.
- Raises: NotImplementedError – If the method is not implemented in a subclass.
####### Examples
Example subclass implementation:

```python
class MyCoreLM(AbsCoreLM):
    def forward(self, dec_seq, dec_seq_lengths=None, enc_seq=None,
                enc_seq_lengths=None, prefix_len=None):
        # Implementation here
        pass

    def inference(self, prefix, opts, enc_seq=None, suffix=None):
        # Implementation here
        pass
```
Initialize internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(dec_seq: Tensor, dec_seq_lengths: Tensor | None = None, enc_seq: Tensor | None = None, enc_seq_lengths: Tensor | None = None, prefix_len: Tensor | None = None) → Tuple[Tensor, Dict, Tensor]
Abstract method for the model forward pass. See the class docstring above for the supported model styles and developer guidance.
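The signature above returns Tuple[Tensor, Dict, Tensor]; a plausible reading, following the common ESPnet (loss, stats, weight) convention, is sketched below with a toy auto-regressive objective. The class name, layer sizes, and the exact return layout are illustrative assumptions, not the official API.

```python
import torch
import torch.nn as nn


class ToyCoreLM(nn.Module):
    """Minimal sketch of a concrete forward(). In practice this would
    subclass AbsCoreLM; the (loss, stats, weight) return layout is an
    assumption based on the Tuple[Tensor, Dict, Tensor] signature."""

    def __init__(self, vocab_size=32, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, dec_seq, dec_seq_lengths=None, enc_seq=None,
                enc_seq_lengths=None, prefix_len=None):
        # Toy auto-regressive objective: predict token t+1 from token t.
        hidden = self.embed(dec_seq[:, :-1])           # (B, T-1, D)
        logits = self.head(hidden)                     # (B, T-1, V)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            dec_seq[:, 1:].reshape(-1),
        )
        stats = {"loss": float(loss)}
        weight = torch.tensor(float(dec_seq.size(0)))  # batch size
        return loss, stats, weight
```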
inference(prefix: Tensor, opts: SpeechLMInferenceOptions, enc_seq: Tensor | None = None, suffix: Tensor | None = None)
Method for performing inference with the model. See the class docstring above for the supported model styles and developer guidance.
- Raises: NotImplementedError – If the method is not implemented by a subclass.
NOTE
This class serves as a base for implementing specific CoreLM models.
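As an illustration of what a concrete inference() might do internally, the sketch below shows a generic greedy auto-regressive decoding loop. The helper names and the shape conventions are assumptions for demonstration only; real subclasses receive decoding settings through SpeechLMInferenceOptions, whose fields are not shown here.

```python
import torch


def greedy_inference(step_fn, prefix, max_new_tokens):
    """Greedy auto-regressive loop: repeatedly score the running
    sequence and append the argmax token. `step_fn` stands in for a
    model call returning logits of shape (batch, length, vocab)."""
    seq = prefix
    for _ in range(max_new_tokens):
        logits = step_fn(seq)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_tok], dim=-1)
    return seq


def step_fn(seq):
    # Toy scorer that always prefers token 2, for demonstration.
    batch, length = seq.shape
    logits = torch.zeros(batch, length, 4)
    logits[..., 2] = 1.0
    return logits
```

Starting from a prefix `[[0]]`, three greedy steps with this toy scorer append token 2 each time.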