espnet2.mt.frontend.embedding.Embedding
class espnet2.mt.frontend.embedding.Embedding(input_size: int = 400, embed_dim: int = 400, pos_enc_class=<class 'espnet.nets.pytorch_backend.transformer.embedding.PositionalEncoding'>, positional_dropout_rate: float = 0.1)
Bases: AbsFrontend
Embedding frontend for text-based inputs.
This class provides an embedding layer for processing text inputs, utilizing positional encoding to enhance the representation of input tokens.
embed_dim
The dimension of the embedding space.
- Type: int
embed
A sequential model combining embedding and positional encoding.
- Type: torch.nn.Sequential
Parameters:
- input_size (int) – Number of input tokens.
- embed_dim (int) – Embedding size.
- pos_enc_class – Class for positional encoding (e.g., PositionalEncoding or ScaledPositionalEncoding).
- positional_dropout_rate (float) – Dropout rate after adding positional encoding.
Returns: None
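Internally, the embed attribute is a token embedding followed by positional encoding. A minimal sketch of an equivalent module, assuming the default arguments (see the source for the exact construction):

>>> import torch
>>> from espnet.nets.pytorch_backend.transformer.embedding import PositionalEncoding
>>> embed = torch.nn.Sequential(
...     torch.nn.Embedding(400, 400),  # input_size tokens -> embed_dim vectors
...     PositionalEncoding(400, 0.1),  # adds positional encoding, applies dropout
... )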
######### Examples
>>> import torch
>>> from espnet2.mt.frontend.embedding import Embedding
>>> embedding = Embedding(input_size=1000, embed_dim=256)
>>> input_tensor = torch.randint(0, 1000, (32, 50)) # (batch_size, seq_len)
>>> input_lengths = torch.full((32,), 50) # All sequences are of length 50
>>> output, output_lengths = embedding(input_tensor, input_lengths)
>>> output.shape # Should be (32, 50, 256)
torch.Size([32, 50, 256])
>>> output_lengths.shape # Should be (32,)
torch.Size([32])
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Embed the input token sequence.
This method applies the embedding layer followed by positional encoding to the input tensor, returning the embedded output along with the unchanged input lengths.
- Parameters:
- input – A tensor of shape (B, T) or (B, T, D), where B is the batch size, T is the sequence length, and D is the feature dimension.
- input_lengths – A tensor containing the lengths of the input sequences within the batch.
- Returns: A tuple containing:
  - A tensor of shape (B, T, D) representing the embedded output.
  - A tensor containing the output lengths within the batch.
- Return type: Tuple[torch.Tensor, torch.Tensor]
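Conceptually, forward is just the embed module applied to the token ids, with the lengths passed through untouched. A hedged sketch of the logic (not the verbatim source):

>>> def forward_sketch(embed, input, input_lengths):
...     x = embed(input)  # (B, T) token ids -> (B, T, D) with positional encoding
...     return x, input_lengths  # lengths are unchanged by the embedding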
######### Examples
>>> import torch
>>> from espnet2.mt.frontend.embedding import Embedding
>>> embedding = Embedding(input_size=1000, embed_dim=256)
>>> input_tensor = torch.randint(0, 1000, (32, 10)) # Batch of 32, seq len 10
>>> input_lengths = torch.full((32,), 10) # All sequences have length 10
>>> output, output_lengths = embedding(input_tensor, input_lengths)
>>> output.shape  # Output shape matches (B, T, D)
torch.Size([32, 10, 256])
>>> output_lengths
tensor([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
NOTE
Ensure that the input tensor contains valid token indices in the range [0, input_size).
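A simple way to enforce this before calling the frontend (a hypothetical guard, not part of the API):

>>> vocab_size = 1000  # must match input_size of the Embedding
>>> assert input_tensor.min() >= 0 and input_tensor.max() < vocab_size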
output_size() → int
Return the size of the output feature dimension D, i.e., the embedding dimension.
This method provides the size of the output feature dimension D, which is equivalent to the embedding dimension of the layer. It is useful for determining the output shape of the embedding layer, particularly when constructing models that depend on the embedding size.
- Returns: The embedding dimension size.
- Return type: int
######### Examples
>>> embedding = Embedding(input_size=500, embed_dim=256)
>>> embedding.output_size()
256
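A common use is sizing whatever consumes the frontend output; for example (hypothetical downstream layer):

>>> import torch
>>> projection = torch.nn.Linear(embedding.output_size(), 512)  # 256 -> 512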