espnet2.asr_transducer.decoder.stateless_decoder.StatelessDecoder
class espnet2.asr_transducer.decoder.stateless_decoder.StatelessDecoder(vocab_size: int, embed_size: int = 256, embed_dropout_rate: float = 0.0, embed_pad: int = 0)
Bases: AbsDecoder
Stateless decoder definition for Transducer models.
This class implements a stateless Transducer decoder module, which is designed to process input label sequences and generate corresponding output embeddings. It inherits from the abstract base class AbsDecoder.
embed
The embedding layer for converting label IDs to embeddings.
- Type: torch.nn.Embedding
embed_dropout
Dropout layer for the embedding output.
- Type: torch.nn.Dropout
output_size
The size of the output embeddings.
- Type: int
vocab_size
The size of the vocabulary.
- Type: int
device
The device on which the decoder is located.
- Type: torch.device
score_cache
Cache for storing computed embeddings for label sequences to avoid redundant calculations.
- Type: dict
Parameters:
- vocab_size (int) – Output size, representing the number of unique label IDs.
- embed_size (int , optional) – Size of the embedding vector. Defaults to 256.
- embed_dropout_rate (float , optional) – Dropout rate for the embedding layer. Defaults to 0.0.
- embed_pad (int , optional) – ID for the padding/blank symbol. Defaults to 0.
Examples
Initialize the decoder:
decoder = StatelessDecoder(vocab_size=1000, embed_size=256)
Forward pass with label sequences:
labels = torch.tensor([[1, 2, 3], [4, 5, 6]])  # Example label IDs
output = decoder(labels)
Scoring a single label sequence:
score, _ = decoder.score([1, 2, 3])
Batch scoring for multiple hypotheses:
from espnet2.asr_transducer.beam_search_transducer import Hypothesis
hyps = [Hypothesis(yseq=[1, 2, 3]), Hypothesis(yseq=[4, 5, 6])]
batch_output, _ = decoder.batch_score(hyps)
Setting the device:
decoder.set_device(torch.device('cuda'))
Initializing states for a batch:
decoder.init_state(batch_size=2)
Selecting a specific state:
decoder.select_state(None, idx=0)
Creating batch states:
decoder.create_batch_states([None, None])
Construct a StatelessDecoder object.
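As a rough sketch, the constructor assembles the two layers listed in the attributes above; the snippet below mirrors that setup under the documented defaults (illustrative only, not the verbatim implementation):

import torch

# Embedding lookup over the vocabulary, with embed_pad as the padding index,
# followed by dropout at embed_dropout_rate (defaults shown).
embed = torch.nn.Embedding(1000, 256, padding_idx=0)  # vocab_size, embed_size, embed_pad
embed_dropout = torch.nn.Dropout(p=0.0)               # embed_dropout_rate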
batch_score(hyps: List[Hypothesis]) → Tuple[Tensor, None]
One-step forward hypotheses.
This method computes the output sequences for a batch of hypotheses by using the last label of each hypothesis. It processes the input in parallel to enhance efficiency.
Parameters:hyps – A list of Hypothesis objects containing the label sequences.
Returns:
out: Decoder output sequences. Shape (B, D_dec), where B is the batch size and D_dec is the dimension of the decoder output.
states: Decoder hidden states. Always None, as this implementation does not maintain hidden states.
Examples
>>> decoder = StatelessDecoder(vocab_size=100)
>>> hyps = [Hypothesis(yseq=[1]), Hypothesis(yseq=[2])]
>>> output, _ = decoder.batch_score(hyps)
>>> print(output.shape) # Output: torch.Size([2, 256])
NOTE: The method assumes that the Hypothesis objects are well-formed and contain valid label sequences.
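For intuition, the parallel one-step computation described above can be sketched as follows; batch_score_sketch is a hypothetical stand-in, and the dropout layer is assumed to be stored as embed_dropout:

import torch

# Hypothetical sketch: embed the last label of every hypothesis in one
# batched lookup, yielding a (B, D_dec) output and no hidden states.
def batch_score_sketch(decoder, hyps):
    labels = torch.tensor([[h.yseq[-1]] for h in hyps], device=decoder.device)  # (B, 1)
    out = decoder.embed_dropout(decoder.embed(labels))  # (B, 1, D_dec)
    return out.squeeze(1), None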
create_batch_states(new_states: List[Tensor | None]) → None
Create decoder hidden states.
In stateful decoders, this method assembles per-hypothesis hidden states into a batch for the next decoding step. Because this decoder is stateless, there are no states to assemble.
- Parameters:new_states – Decoder hidden states to assemble, given as a list of Optional[torch.Tensor] entries. For this decoder the list is simply [N x None], where N is the number of hypotheses.
- Returns: None, as the stateless decoder keeps no hidden states.
- Return type: None
Examples
>>> decoder = StatelessDecoder(vocab_size=1000)
>>> states = [None] * 5 # Create 5 new states
>>> decoder.create_batch_states(states)
>>> # Returns None; the stateless decoder keeps no state.
NOTE: Because the decoder is stateless, this method is effectively a no-op and always returns None; it is provided for interface compatibility with stateful decoders.
forward(labels: Tensor, states: Any | None = None) → Tensor
Encode source label sequences.
This method takes a batch of label ID sequences and returns their corresponding embedded representations. The embeddings are obtained from an embedding layer followed by a dropout layer.
- Parameters:
- labels – A tensor of shape (B, L) containing label ID sequences, where B is the batch size and L is the sequence length.
- states – Optional; Decoder hidden states. Currently unused and defaults to None.
- Returns: A tensor of shape (B, U, D_emb) containing the embedded output sequences, where U equals the input sequence length L and D_emb is the embedding dimension.
Examples
>>> decoder = StatelessDecoder(vocab_size=100)
>>> labels = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> output = decoder.forward(labels)
>>> output.shape
torch.Size([2, 3, 256]) # Assuming embed_size is 256
NOTE: Embedding dropout is applied to the embedded sequences to help prevent overfitting.
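Functionally, the forward pass amounts to an embedding lookup followed by dropout; a minimal sketch, assuming the layers are stored as embed and embed_dropout:

import torch

decoder = StatelessDecoder(vocab_size=100, embed_size=256)
labels = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Equivalent computation: look up label embeddings, then apply dropout.
manual = decoder.embed_dropout(decoder.embed(labels))
assert manual.shape == (2, 3, 256)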
init_state(batch_size: int) → None
Initialize decoder states.
Because this decoder is stateless, there are no hidden states to initialize; the method exists for interface compatibility with stateful decoders.
- Parameters:batch_size – Batch size.
- Returns: None, as the stateless decoder maintains no hidden states.
- Return type: None
Examples
>>> decoder = StatelessDecoder(vocab_size=1000)
>>> states = decoder.init_state(batch_size=32)
>>> states is None
True
NOTE: The decoder does not maintain state across calls, hence it is stateless; init_state always returns None.
score(label_sequence: List[int], states: Any | None = None) → Tuple[Tensor, None]
One-step forward hypothesis.
This method computes the decoder output for a single hypothesis from its label sequence. Computed embeddings are stored in score_cache, keyed by the label sequence, so repeated calls for the same prefix avoid redundant computation.
- Parameters:
- label_sequence – Current label sequence, given as a list of label IDs.
- states – Optional; Decoder hidden states. Currently unused and defaults to None.
Returns:
out: Decoder output sequence. Shape (D_emb), where D_emb is the embedding dimension.
states: Decoder hidden states. Always None, as this implementation does not maintain hidden states.
Examples
>>> decoder = StatelessDecoder(vocab_size=1000, embed_size=256)
>>> output, _ = decoder.score([1, 2, 3])
>>> output.shape
torch.Size([256])
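Because computed embeddings are cached in score_cache, scoring the same label sequence twice reuses the cached result:

>>> score1, _ = decoder.score([1, 2, 3])  # computed and stored in score_cache
>>> score2, _ = decoder.score([1, 2, 3])  # served from the cache
>>> torch.equal(score1, score2)
True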
select_state(states: Any | None, idx: int = 0) → None
Get the specified state from the decoder hidden states.
In stateful decoders, this method extracts the hidden state for a given hypothesis index from batched states. Because this decoder is stateless, there is no state to select and the method always returns None.
- Parameters:
- states – Decoder hidden states. Always None for this decoder.
- idx – Index of the state to extract.
- Returns: None, as the stateless decoder maintains no hidden states.
- Return type: None
Examples
>>> decoder = StatelessDecoder(vocab_size=1000)
>>> state = decoder.select_state(None, idx=0)
>>> state is None
True
set_device(device: device) → None
Set GPU device to use.
This method allows you to specify the device (CPU or GPU) on which the decoder should operate. This is particularly useful for models that may need to be switched between devices during training or inference.
- Parameters:device – The device to use (e.g., torch.device('cuda:0') for the first GPU or torch.device('cpu') for the CPU).
Examples
>>> decoder = StatelessDecoder(vocab_size=100)
>>> decoder.set_device(torch.device('cuda:0'))
NOTE: Ensure that the device is available and that the model's parameters are moved to the same device to avoid runtime errors.
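In practice, recording the device with set_device is typically paired with moving the module's parameters to that device; a minimal sketch:

import torch

device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
decoder = StatelessDecoder(vocab_size=100)

decoder.to(device)          # move parameters (standard torch.nn.Module call)
decoder.set_device(device)  # record the device the decoder should use internally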