espnet2.asr_transducer.beam_search_transducer.Hypothesis


class espnet2.asr_transducer.beam_search_transducer.Hypothesis(score: float, yseq: List[int], dec_state: Tuple[Tensor, Tensor | None] | None = None, lm_state: Dict[str, Any] | List[Any] | None = None)

Bases: object

Search algorithms for Transducer models.

This module implements search algorithms for Transducer models, including the Beam Search algorithm. The Hypothesis class defines the default hypothesis structure used in these search algorithms.

Classes:
- Hypothesis: Represents a single hypothesis with its associated score, label sequence, and states.
- ExtendedHypothesis: An extension of the Hypothesis class that includes decoder output and language model scores.
- BeamSearchTransducer: Implements beam search for transducer models.

Hypothesis

- score: Total log-probability of the hypothesis.
- yseq: Label sequence represented as a list of integer IDs.
- dec_state: RNN/MEGA decoder state (None if stateless).
- lm_state: RNNLM state, can be a tuple of (N, D_lm) or None.
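The fields above map naturally onto a small dataclass. The following is a minimal sketch, not the ESPnet source: `SketchHypothesis` is a hypothetical name, and `Any` stands in for the tensor types so that the snippet runs without PyTorch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple, Union


@dataclass
class SketchHypothesis:
    """Hypothetical stand-in mirroring the documented Hypothesis fields."""

    score: float  # total log-probability of the hypothesis
    yseq: List[int]  # label IDs emitted so far
    dec_state: Optional[Tuple[Any, Optional[Any]]] = None  # None if stateless
    lm_state: Optional[Union[Dict[str, Any], List[Any]]] = None


# Extending a hypothesis builds a new object: the label is appended
# and its log-probability is added to the running score.
start = SketchHypothesis(score=0.0, yseq=[0])
extended = SketchHypothesis(score=start.score + (-0.5), yseq=start.yseq + [7])
```

Building a new object for each extension, rather than mutating the parent, keeps sibling hypotheses in the beam independent of one another.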

ExtendedHypothesis

- dec_out: Decoder output sequence of shape (B, D_dec).
- lm_score: Log-probabilities of the language model for given labels, of shape (vocab_size).
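To illustrate how per-label LM log-probabilities such as `lm_score` can enter a hypothesis score, here is a sketch of weighted additive fusion using the `lm_weight` parameter of BeamSearchTransducer. This is an assumption about the general technique, not a copy of the library's scoring code; `fused_score` and the numeric values are hypothetical, and plain lists replace tensors.

```python
from typing import List


def fused_score(
    hyp_score: float,
    joint_logp: List[float],
    lm_logp: List[float],
    label: int,
    lm_weight: float,
) -> float:
    """Return the score of a hypothesis extended with `label` (sketch)."""
    # Transducer log-probability plus the weighted LM log-probability.
    return hyp_score + joint_logp[label] + lm_weight * lm_logp[label]


joint_logp = [-2.0, -0.5, -3.0]  # illustrative values, not real model output
lm_logp = [-1.0, -2.0, -0.5]  # one entry per vocabulary item
new_score = fused_score(-1.0, joint_logp, lm_logp, label=1, lm_weight=0.5)
```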

BeamSearchTransducer

- decoder: Decoder module used in the beam search.
- joint_network: Joint network module for scoring.
- beam_size: Size of the beam for search.
- lm: Language model module for scoring.
- lm_weight: Weight for the language model during scoring.
- search_type: Type of search algorithm to use.
- max_sym_exp: Maximum symbol expansions at each time step.
- u_max: Maximum expected target sequence length.
- nstep: Maximum expansion steps at each time step.
- expansion_gamma: Allowed log-probability difference for pruning.
- expansion_beta: Additional candidates for hypothesis selection.
- score_norm: Whether to normalize final scores by length.
- nbest: Number of final hypotheses to return.
- streaming: Whether to perform chunk-by-chunk beam search.
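One plausible reading of `expansion_gamma` is a threshold relative to the best candidate: expansions whose log-probability falls more than `expansion_gamma` below the best one are dropped. The sketch below shows that idea only; the exact rule used by the library is not stated here, and `prune_expansions` and the candidate values are hypothetical.

```python
from typing import List, Tuple


def prune_expansions(
    candidates: List[Tuple[int, float]], expansion_gamma: float
) -> List[Tuple[int, float]]:
    """Keep (label, score) pairs within `expansion_gamma` of the best score."""
    best = max(score for _, score in candidates)
    kept = [(k, s) for k, s in candidates if s >= best - expansion_gamma]
    # Return the survivors best-first.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = [(3, -0.2), (8, -1.9), (5, -4.0)]  # illustrative values
kept = prune_expansions(candidates, expansion_gamma=2.3)
```

Under this reading, a larger `expansion_gamma` keeps more candidates alive and a smaller one prunes more aggressively.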

Parameters:
- Hypothesis –
  - score (float): Total log-probability of the hypothesis.
  - yseq (List[int]): Sequence of label IDs.
  - dec_state (Optional[Tuple[torch.Tensor, Optional[torch.Tensor]]]): Decoder state.
  - lm_state (Optional[Union[Dict[str, Any], List[Any]]]): Language model state.
- ExtendedHypothesis –
  - dec_out (torch.Tensor, optional): Decoder output sequence.
  - lm_score (torch.Tensor, optional): Log-probabilities of LM.
- BeamSearchTransducer –
  - decoder (AbsDecoder): Decoder module for the transducer.
  - joint_network (JointNetwork): Joint network module for scoring.
  - beam_size (int): Size of the beam.
  - lm (Optional[torch.nn.Module]): Language model module.
  - lm_weight (float): Weight for language model in scoring.
  - search_type (str): Algorithm to use during inference.
  - max_sym_exp (int): Maximum symbol expansions at each time step.
  - u_max (int): Maximum expected target sequence length.
  - nstep (int): Maximum expansion steps at each time step.
  - expansion_gamma (float): Allowed log-probability difference for pruning.
  - expansion_beta (int): Additional candidates for selection.
  - score_norm (bool): Whether to normalize scores by length.
  - nbest (int): Number of final hypotheses to return.
  - streaming (bool): Whether to perform chunk-by-chunk beam search.
Returns: A list of n-best hypotheses after performing beam search.
Return type: List[Hypothesis]
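The interaction between `score_norm` and `nbest` can be sketched as a sort followed by truncation. This is a simplified illustration under the assumption that length normalization means dividing the score by the number of labels; `select_nbest` is a hypothetical helper, and hypotheses are reduced to `(score, yseq)` pairs.

```python
from typing import List, Tuple

Hyp = Tuple[float, List[int]]  # (score, yseq), a stand-in for Hypothesis


def select_nbest(hyps: List[Hyp], nbest: int, score_norm: bool) -> List[Hyp]:
    """Sort hypotheses best-first and keep the top `nbest` (sketch)."""

    def sort_key(hyp: Hyp) -> float:
        score, yseq = hyp
        # With normalization, compare average log-probability per label.
        return score / len(yseq) if score_norm else score

    return sorted(hyps, key=sort_key, reverse=True)[:nbest]


hyps = [(-3.0, [0, 4]), (-4.0, [0, 4, 9, 2])]  # illustrative values
raw_best = select_nbest(hyps, nbest=1, score_norm=False)
norm_best = select_nbest(hyps, nbest=1, score_norm=True)
```

Without normalization the shorter hypothesis wins because raw log-probabilities only decrease as labels are added; with normalization the longer hypothesis ranks first in this example.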

Examples

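>>> from espnet2.asr_transducer.beam_search_transducer import (
...     BeamSearchTransducer,
...     Hypothesis,
... )
>>> hyp = Hypothesis(score=0.0, yseq=[1, 2, 3])
>>> hyp.score
0.0
>>> hyp.yseq
[1, 2, 3]

>>> # decoder, joint_network and enc_out are assumed to exist already.
>>> # nbest, not beam_size, sets how many hypotheses are returned.
>>> transducer = BeamSearchTransducer(decoder, joint_network, beam_size=5, nbest=5)
>>> results = transducer(enc_out)
>>> len(results) <= 5
True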

NOTE

The Hypothesis class is designed to store and manage the state of hypotheses during the beam search process.

dec_state : Tuple[Tensor, Tensor | None] | None = None

lm_state : Dict[str, Any] | List[Any] | None = None

score : float

yseq : List[int]