espnet2.asr.decoder.transformer_decoder.BaseTransformerDecoder
class espnet2.asr.decoder.transformer_decoder.BaseTransformerDecoder(vocab_size: int, encoder_output_size: int, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, input_layer: str = 'embed', use_output_layer: bool = True, pos_enc_class=<class 'espnet2.legacy.nets.pytorch_backend.transformer.embedding.PositionalEncoding'>, normalize_before: bool = True, gradient_checkpoint_layers: List[int] = [])
Bases: AbsDecoder, BatchScorerInterface, MaskParallelScorerInterface
Base class of the Transformer decoder module.
- Parameters:
- vocab_size – output dim
- encoder_output_size – dimension of attention
- attention_heads – the number of heads of multi-head attention
- linear_units – the number of units of position-wise feed forward
- num_blocks – the number of decoder blocks
- dropout_rate – dropout rate
- self_attention_dropout_rate – dropout rate for attention
- input_layer – input layer type
- use_output_layer – whether to use output layer
- pos_enc_class – PositionalEncoding or ScaledPositionalEncoding
- normalize_before – whether to use layer_norm before the first block
- concat_after – whether to concat the attention layer's input and output. If True, an additional linear layer is applied, i.e. x -> x + linear(concat(x, att(x))). If False, no additional linear layer is applied, i.e. x -> x + att(x).
Initialize internal Module state, shared by both nn.Module and ScriptModule.
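The two residual forms given for concat_after can be sketched in plain Python on a single feature vector. This is only an illustration under stated assumptions, not ESPnet code: `residual_block`, `toy_att` and `toy_linear` are hypothetical names, with `toy_att` standing in for the attention sublayer and `toy_linear` for the additional projection.

```python
from typing import Callable, List, Optional

Vector = List[float]


def residual_block(
    x: Vector,
    att: Callable[[Vector], Vector],
    concat_after: bool,
    linear: Optional[Callable[[Vector], Vector]] = None,
) -> Vector:
    """Return x + att(x), or x + linear(concat(x, att(x))) if concat_after."""
    a = att(x)
    if concat_after:
        if linear is None:
            raise ValueError("concat_after=True needs a linear projection")
        # concat(x, att(x)) doubles the feature size; linear maps it back.
        projected = linear(x + a)  # "+" on lists is concatenation
        return [xi + pi for xi, pi in zip(x, projected)]
    return [xi + ai for xi, ai in zip(x, a)]


# Hypothetical stand-ins: the "attention" doubles each feature, and the
# projection averages the two halves of the concatenated vector.
def toy_att(v: Vector) -> Vector:
    return [2.0 * e for e in v]


def toy_linear(v: Vector) -> Vector:
    half = len(v) // 2
    return [(v[i] + v[i + half]) / 2.0 for i in range(half)]


plain = residual_block([1.0, 2.0], toy_att, concat_after=False)
concat = residual_block([1.0, 2.0], toy_att, concat_after=True, linear=toy_linear)
```

In both cases the output keeps the feature size of x; the concat variant only adds a learned mixing step before the residual sum.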
batch_score(ys: Tensor, states: List[Any], xs: Tensor, return_hs: bool = False) → Tuple[Tensor, List[Any]]
Score new token batch.
- Parameters:
- ys (torch.Tensor) – torch.int64 prefix tokens (n_batch, ylen).
- states (List[Any]) – Scorer states for prefix tokens.
- xs (torch.Tensor) – The encoder feature that generates ys (n_batch, xlen, n_feat).
- Returns: Tuple of batchified scores for the next token, with shape (n_batch, n_vocab), and the next state list for ys.
- Return type: tuple[torch.Tensor, List[Any]]
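One way to picture the `states` argument: each hypothesis carries its own state, while a batched decoder step works on one batch per layer. The sketch below shows that regrouping with plain lists. The layout (one cache entry per decoder layer inside each hypothesis state) and both helper names are assumptions made for illustration; this page does not specify them.

```python
from typing import Any, List


def to_layer_major(states: List[List[Any]]) -> List[List[Any]]:
    """Regroup [hypothesis][layer] states into [layer][hypothesis]."""
    return [list(per_layer) for per_layer in zip(*states)]


def to_hypothesis_major(batch_state: List[List[Any]]) -> List[List[Any]]:
    """Inverse regrouping: [layer][hypothesis] back to [hypothesis][layer]."""
    return [list(per_hyp) for per_hyp in zip(*batch_state)]


# Two hypotheses, three decoder layers; strings stand in for cache tensors.
states = [["h0_l0", "h0_l1", "h0_l2"], ["h1_l0", "h1_l1", "h1_l2"]]
layer_major = to_layer_major(states)
round_trip = to_hypothesis_major(layer_major)
```

With real tensors, the per-layer groups would be stacked along a new batch dimension before the decoder step and split again afterwards.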
batch_score_partially_AR(ys: Tensor, states: List[Any], xs: Tensor, yseq_lengths: Tensor) → Tuple[Tensor, List[Any]]
forward(hs_pad: Tensor, hlens: Tensor, ys_in_pad: Tensor, ys_in_lens: Tensor, return_hs: bool = False, return_all_hs: bool = False) → Tuple[Tensor, Tensor]
Forward decoder.
Parameters:
- hs_pad – encoded memory, float32 (batch, maxlen_in, feat)
- hlens – (batch)
- ys_in_pad – input token ids, int64 (batch, maxlen_out) if input_layer == "embed"; input tensor (batch, maxlen_out, #mels) in the other cases
- ys_in_lens – (batch)
- return_hs – (bool) whether to return the last hidden output before output layer
- return_all_hs – (bool) whether to return all the hidden intermediates
Returns: tuple containing:
x: decoded token scores before softmax (batch, maxlen_out, token) if use_output_layer is True
olens: (batch, )
Return type: (tuple)
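Because ys_in_pad is padded to maxlen_out, the per-sequence lengths in ys_in_lens are what separate real tokens from padding. A minimal sketch of deriving a padding mask from lengths, using plain lists instead of tensors; `lengths_to_mask` is a hypothetical helper written for this example, not an ESPnet function.

```python
from typing import List, Optional


def lengths_to_mask(lengths: List[int], maxlen: Optional[int] = None) -> List[List[bool]]:
    """Return a (batch, maxlen) mask that is True on real tokens, False on padding."""
    if maxlen is None:
        maxlen = max(lengths)
    return [[t < n for t in range(maxlen)] for n in lengths]


# Batch of two sequences with 3 and 1 valid tokens, padded to length 3.
mask = lengths_to_mask([3, 1])
```

The same construction applies to hlens for the encoder memory, with maxlen_in in place of maxlen_out.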
forward_one_step(tgt: Tensor, tgt_mask: Tensor, memory: Tensor, memory_mask: Tensor = None, *, cache: List[Tensor] = None, return_hs: bool = False) → Tuple[Tensor, List[Tensor]]
Forward one step.
- Parameters:
- tgt – input token ids, int64 (batch, maxlen_out)
- tgt_mask – input token mask (batch, maxlen_out); dtype=torch.uint8 in PyTorch versions before 1.2, dtype=torch.bool in PyTorch 1.2 and later
- memory – encoded memory, float32 (batch, maxlen_in, feat)
- memory_mask – encoded memory mask (batch, 1, maxlen_in)
- cache – cached output list of (batch, max_time_out-1, size)
- return_hs – dec hidden state corresponding to ys, used for searchable hidden ints
- Returns: NN output value and cache per self.decoders. y.shape is (batch, maxlen_out, token)
- Return type: y, cache
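In step-by-step decoding, the target mask generally keeps a position from attending to later positions, and a cache covering the first max_time_out-1 positions means only the newest position has to be computed. Both ideas are sketched below with plain lists. `causal_mask` and `step_with_cache` are hypothetical helpers, and the toy "layer" (a sum over the prefix) is an assumption standing in for a real decoder block.

```python
from typing import Callable, List, Optional, Sequence


def causal_mask(size: int) -> List[List[bool]]:
    """Row i is True for positions j <= i, so no step sees future tokens."""
    return [[j <= i for j in range(size)] for i in range(size)]


def step_with_cache(
    prefix: Sequence[int],
    newest_output: Callable[[Sequence[int]], float],
    cache: Optional[List[float]] = None,
) -> List[float]:
    """Reuse cached outputs for the first len(prefix) - 1 positions."""
    cache = [] if cache is None else cache
    if len(cache) != len(prefix) - 1:
        raise ValueError("cache must cover all positions except the newest")
    # Only the newest position is computed; earlier outputs come from cache.
    return cache + [newest_output(prefix)]


mask = causal_mask(3)

# Toy "layer": the output at the newest position is the sum of the prefix.
cache = None
tokens = [5, 7, 9]
for t in range(1, len(tokens) + 1):
    cache = step_with_cache(tokens[:t], sum, cache)
```

After the loop, the cache holds one output per decoded position, which matches the growth from max_time_out-1 to max_time_out entries described for the cache argument.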
forward_partially_AR(tgt: Tensor, tgt_mask: Tensor, tgt_lengths: Tensor, memory: Tensor, cache: List[Tensor] = None) → Tuple[Tensor, List[Tensor]]
Forward one step.
- Parameters:
- tgt – input token ids, int64 (n_mask * n_beam, maxlen_out)
- tgt_mask – input token mask (n_mask * n_beam, maxlen_out); dtype=torch.uint8 in PyTorch versions before 1.2, dtype=torch.bool in PyTorch 1.2 and later
- tgt_lengths – (n_mask * n_beam, )
- memory – encoded memory, float32 (batch, maxlen_in, feat)
- cache – cached output list of (batch, max_time_out-1, size)
- Returns: NN output value and cache per self.decoders. y.shape is (batch, maxlen_out, token)
- Return type: y, cache
score(ys, state, x, return_hs=False)
Score.
