espnet2.uasr.generator.conv_generator.SamePad


class espnet2.uasr.generator.conv_generator.SamePad(kernel_size, causal=False)

Bases: Module

Trims trailing elements so that a padded convolution keeps the input length.

This module does not add padding itself. It is intended to follow a convolution that has already been padded, and it drops the last remove elements along the length dimension so that the output length matches the convolution's input length. The number of elements dropped depends on the kernel size and on whether causal padding is used.

remove

The number of elements removed from the end of the input tensor: kernel_size - 1 if causal is True; otherwise 1 for an even kernel_size and 0 for an odd one.

Type: int
Parameters:
- kernel_size (int) – The size of the convolution kernel.
- causal (bool) – If True, applies causal padding. Defaults to False.
Returns: The input tensor with its last remove elements along the length dimension dropped (unchanged when remove is 0).
Return type: torch.Tensor

Examples

>>> import torch
>>> from espnet2.uasr.generator.conv_generator import SamePad
>>> same_pad = SamePad(kernel_size=3)
>>> input_tensor = torch.randn(1, 3, 10)  # (batch_size, channels, length)
>>> output_tensor = same_pad(input_tensor)
>>> output_tensor.shape  # odd kernel, non-causal: nothing is removed
torch.Size([1, 3, 10])

>>> causal_pad = SamePad(kernel_size=3, causal=True)
>>> output_tensor_causal = causal_pad(input_tensor)
>>> output_tensor_causal.shape  # kernel_size - 1 = 2 trailing elements are removed
torch.Size([1, 3, 8])
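
The length arithmetic behind the trimming can be checked without PyTorch. The sketch below is illustrative only and rests on assumptions that are not stated in this page: a stride-1, dilation-1 convolution, symmetric padding of kernel_size // 2 in the non-causal case and kernel_size - 1 in the causal case, and a remove value of kernel_size - 1 (causal), 1 (even kernel) or 0 (odd kernel). The helper names are hypothetical and are not part of ESPnet.

```python
def conv1d_output_length(length: int, kernel_size: int, padding: int) -> int:
    """Output length of a stride-1, dilation-1 1D convolution."""
    return length + 2 * padding - kernel_size + 1


def same_pad_remove(kernel_size: int, causal: bool = False) -> int:
    """Assumed rule for the number of trailing elements to drop."""
    if causal:
        return kernel_size - 1
    return 1 if kernel_size % 2 == 0 else 0


def trimmed_length(length: int, kernel_size: int, causal: bool = False) -> int:
    """Length after an (assumed) padded convolution followed by trimming."""
    # Assumed padding on both sides: kernel_size - 1 if causal, else kernel_size // 2.
    padding = kernel_size - 1 if causal else kernel_size // 2
    conv_len = conv1d_output_length(length, kernel_size, padding)
    return conv_len - same_pad_remove(kernel_size, causal)


# Under these assumptions, every combination restores the input length of 100.
for k in (3, 4, 5):
    for causal in (False, True):
        print(k, causal, trimmed_length(100, k, causal))
```

Applied on its own, as in the examples above, the module only shortens its input; the length is preserved only in combination with a suitably padded convolution.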

Initialize internal Module state, shared by both nn.Module and ScriptModule.

forward(x)

Trim trailing elements from the input tensor.

This method drops the last remove elements along the final (length) dimension of x. If remove is 0, x is returned unchanged.

Parameters:
- x (torch.Tensor) – Input tensor of shape (batch_size, channels, length).
Returns: The trimmed tensor of shape (batch_size, channels, length - remove).
Return type: torch.Tensor

Examples

>>> import torch
>>> from espnet2.uasr.generator.conv_generator import SamePad
>>> same_pad = SamePad(kernel_size=4)
>>> x = torch.randn(32, 256, 101)
>>> same_pad(x).shape  # even kernel: one trailing element is removed
torch.Size([32, 256, 100])

NOTE

This method assumes a 3-dimensional input and slices only its last dimension.
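
Trimming by the remove attribute described above needs a guard for the case where nothing is removed. The following is a minimal sketch on plain nested lists rather than tensors, not the library's implementation; trim_last_dim and Tensor3D are hypothetical names introduced for illustration.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # (batch, channels, length) as nested lists


def trim_last_dim(x: Tensor3D, remove: int) -> Tensor3D:
    """Drop the last `remove` elements of every innermost list."""
    # Guard needed: row[:-0] is row[:0], which is empty, not the full row.
    if remove > 0:
        return [[row[:-remove] for row in channels] for channels in x]
    return x


x = [[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]]  # shape (1, 2, 4)
print(trim_last_dim(x, 2))       # [[[1.0, 2.0], [5.0, 6.0]]]
print(trim_last_dim(x, 0) is x)  # True
print(x[0][0][:-0])              # [] (what an unguarded slice would return)
```

The same pitfall applies to tensor slicing, which is why an explicit check on remove is preferable to slicing unconditionally.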