espnet2.speechlm.core_lm.abs_core_lm.AbsCoreLM
class espnet2.speechlm.core_lm.abs_core_lm.AbsCoreLM(*args, **kwargs)
Bases: Module, ABC
The abstract CoreLM class for SpeechLM, the major component of SpeechLM.

It supports, or is going to support, several styles of SpeechLM:

Auto-Regressive (AR):
- SpearTTS: https://arxiv.org/abs/2302.03540 (TODO)
- MusicGen: https://arxiv.org/abs/2306.05284 (TODO)
- UniAudio: https://arxiv.org/abs/2310.00704

Non-Auto-Regressive (NAR):
- SoundStorm: https://arxiv.org/abs/2305.09636 (TODO)

Auto-Regressive + Non-Auto-Regressive (AR + NAR), a hybrid of both:
- Vall-E: https://arxiv.org/abs/2301.02111

For developers: to build a new core_lm model, follow one of these approaches:
1. Build with ESPnet internal modules: use modules from espnet2.speechlm.module.transformer.py. If some modules are specific to your model, put them under espnet2.speechlm.module.<model_name>.py.
2. Build with HuggingFace models/modules: put everything in espnet2.speechlm.core_lm.<model_name>.py. Usually, this is just a wrapper that bridges HF models into ESPnet SpeechLM.

Reminder: avoid any model dependency beyond espnet2.speechlm.
forward()
Abstract method for model forward pass.
inference()
Method for performing inference with the model.
- Raises: NotImplementedError – If the method is not implemented in a subclass.
####### Examples
Example subclass implementation:

```python
class MyCoreLM(AbsCoreLM):
    def forward(self, dec_seq, dec_seq_lengths=None, enc_seq=None,
                enc_seq_lengths=None, prefix_len=None):
        # Implementation here
        pass

    def inference(self, prefix, opts, enc_seq=None, suffix=None):
        # Implementation here
        pass
```
Initialize internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(dec_seq: Tensor, dec_seq_lengths: Tensor | None = None, enc_seq: Tensor | None = None, enc_seq_lengths: Tensor | None = None, prefix_len: Tensor | None = None) → Tuple[Tensor, Dict, Tensor]
Abstract method for the model forward pass. See the class docstring above for the supported model styles and developer guidance.
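The signature above returns Tuple[Tensor, Dict, Tensor]; a plausible reading, following the common ESPnet (loss, stats, weight) convention, is sketched below with a toy auto-regressive objective. The class name, layer sizes, and the exact return layout are illustrative assumptions, not the official API.

```python
import torch
import torch.nn as nn


class ToyCoreLM(nn.Module):
    """Minimal sketch of a concrete forward(). In practice this would
    subclass AbsCoreLM; the (loss, stats, weight) return layout is an
    assumption based on the Tuple[Tensor, Dict, Tensor] signature."""

    def __init__(self, vocab_size=32, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, dec_seq, dec_seq_lengths=None, enc_seq=None,
                enc_seq_lengths=None, prefix_len=None):
        # Toy auto-regressive objective: predict token t+1 from token t.
        hidden = self.embed(dec_seq[:, :-1])           # (B, T-1, D)
        logits = self.head(hidden)                     # (B, T-1, V)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            dec_seq[:, 1:].reshape(-1),
        )
        stats = {"loss": float(loss)}
        weight = torch.tensor(float(dec_seq.size(0)))  # batch size
        return loss, stats, weight
```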
inference(prefix: Tensor, opts: SpeechLMInferenceOptions, enc_seq: Tensor | None = None, suffix: Tensor | None = None)
Method for performing inference with the model. See the class docstring above for the supported model styles and developer guidance.
- Raises: NotImplementedError – If the method is not implemented by a subclass.
NOTE
This class serves as a base for implementing specific CoreLM models.
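As an illustration of what a concrete inference() might do internally, the sketch below shows a generic greedy auto-regressive decoding loop. The helper names and the shape conventions are assumptions for demonstration only; real subclasses receive decoding settings through SpeechLMInferenceOptions, whose fields are not shown here.

```python
import torch


def greedy_inference(step_fn, prefix, max_new_tokens):
    """Greedy auto-regressive loop: repeatedly score the running
    sequence and append the argmax token. `step_fn` stands in for a
    model call returning logits of shape (batch, length, vocab)."""
    seq = prefix
    for _ in range(max_new_tokens):
        logits = step_fn(seq)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        seq = torch.cat([seq, next_tok], dim=-1)
    return seq


def step_fn(seq):
    # Toy scorer that always prefers token 2, for demonstration.
    batch, length = seq.shape
    logits = torch.zeros(batch, length, 4)
    logits[..., 2] = 1.0
    return logits
```

Starting from a prefix `[[0]]`, three greedy steps with this toy scorer append token 2 each time.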