espnet2.mt.frontend.embedding.Embedding
class espnet2.mt.frontend.embedding.Embedding(input_size: int = 400, embed_dim: int = 400, pos_enc_class=<class 'espnet.nets.pytorch_backend.transformer.embedding.PositionalEncoding'>, positional_dropout_rate: float = 0.1)
Bases: AbsFrontend
Embedding frontend for text-based inputs.
This class provides an embedding layer for processing text inputs, utilizing positional encoding to enhance the representation of input tokens.
embed_dim
The dimension of the embedding space.
- Type: int
embed
A sequential model combining embedding and positional encoding.
- Type: torch.nn.Sequential
Parameters:
- input_size (int) – Number of input tokens.
- embed_dim (int) – Embedding size.
- pos_enc_class – Class for positional encoding (e.g., PositionalEncoding or ScaledPositionalEncoding).
- positional_dropout_rate (float) – Dropout rate after adding positional encoding.
Returns: None
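Internally, the embed attribute is a token embedding followed by positional encoding. A minimal sketch of an equivalent module, assuming the default arguments (see the source for the exact construction):

>>> import torch
>>> from espnet.nets.pytorch_backend.transformer.embedding import PositionalEncoding
>>> embed = torch.nn.Sequential(
...     torch.nn.Embedding(400, 400),  # input_size tokens -> embed_dim vectors
...     PositionalEncoding(400, 0.1),  # adds positional encoding, applies dropout
... )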
######### Examples
>>> import torch
>>> from espnet2.mt.frontend.embedding import Embedding
>>> embedding = Embedding(input_size=1000, embed_dim=256)
>>> input_tensor = torch.randint(0, 1000, (32, 50)) # (batch_size, seq_len)
>>> input_lengths = torch.full((32,), 50) # All sequences are of length 50
>>> output, output_lengths = embedding(input_tensor, input_lengths)
>>> output.shape # Should be (32, 50, 256)
torch.Size([32, 50, 256])
>>> output_lengths.shape # Should be (32,)
torch.Size([32])
forward(input: Tensor, input_lengths: Tensor) → Tuple[Tensor, Tensor]
Embed the input token sequence.
This method applies the embedding layer followed by positional encoding to the input tensor, returning the embedded output along with the unchanged input lengths.
- Parameters:
- input – A tensor of shape (B, T) or (B, T, D), where B is the batch size, T is the sequence length, and D is the feature dimension.
- input_lengths – A tensor containing the lengths of the input sequences within the batch.
- Returns: A tuple containing:
  - A tensor of shape (B, T, D) representing the embedded output.
  - A tensor containing the output lengths within the batch.
- Return type: Tuple[torch.Tensor, torch.Tensor]
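Conceptually, forward is just the embed module applied to the token ids, with the lengths passed through untouched. A hedged sketch of the logic (not the verbatim source):

>>> def forward_sketch(embed, input, input_lengths):
...     x = embed(input)  # (B, T) token ids -> (B, T, D) with positional encoding
...     return x, input_lengths  # lengths are unchanged by the embedding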
######### Examples
>>> import torch
>>> from espnet2.mt.frontend.embedding import Embedding
>>> embedding = Embedding(input_size=1000, embed_dim=256)
>>> input_tensor = torch.randint(0, 1000, (32, 10)) # Batch of 32, seq len 10
>>> input_lengths = torch.full((32,), 10) # All sequences have length 10
>>> output, output_lengths = embedding(input_tensor, input_lengths)
>>> output.shape  # Output shape matches (B, T, D)
torch.Size([32, 10, 256])
>>> output_lengths
tensor([10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
NOTE
Ensure that the input tensor contains valid token indices in the range [0, input_size).
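A simple way to enforce this before calling the frontend (a hypothetical guard, not part of the API):

>>> vocab_size = 1000  # must match input_size of the Embedding
>>> assert input_tensor.min() >= 0 and input_tensor.max() < vocab_size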
output_size() → int
Return the size of the output feature dimension D, i.e., the embedding dimension.
This method provides the size of the output feature dimension D, which is equivalent to the embedding dimension of the layer. It is useful for determining the output shape of the embedding layer, particularly when constructing models that depend on the embedding size.
- Returns: The embedding dimension size.
- Return type: int
######### Examples
>>> embedding = Embedding(input_size=500, embed_dim=256)
>>> embedding.output_size()
256
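A common use is sizing whatever consumes the frontend output; for example (hypothetical downstream layer):

>>> import torch
>>> projection = torch.nn.Linear(embedding.output_size(), 512)  # 256 -> 512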