espnet2.asr.encoder.avhubert_encoder.GradMultiply

class espnet2.asr.encoder.avhubert_encoder.GradMultiply(*args, **kwargs)

Bases: Function

Applies a gradient multiplication operation to a tensor.

This class implements a custom autograd function that leaves a tensor unchanged in the forward pass and multiplies its gradient by a specified scale factor in the backward pass. This is useful for controlling the flow of gradients, for example to damp the updates reaching an upstream module such as a feature extractor without altering the values it produces.
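
In chain-rule terms, the behaviour described above can be written as follows. The symbols are introduced here for illustration only: $x$ is the input, $y$ the output, $s$ the scale factor, and $\mathcal{L}$ the loss.

```latex
y = x,
\qquad
\frac{\partial \mathcal{L}}{\partial x} = s \cdot \frac{\partial \mathcal{L}}{\partial y}
```

With $s = 1$ the function is a plain identity; with $0 < s < 1$ everything upstream of $x$ receives a proportionally smaller gradient.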

scale

The scale factor by which to multiply the gradients.

Type: float

forward(ctx, x, scale)

Returns x with its values unchanged and records scale for the backward pass.

backward(ctx, grad)

Multiplies the incoming gradient by scale and returns it as the gradient for x.

Parameters:
- x (torch.Tensor) – The input tensor whose gradients will be scaled.
- scale (float) – The factor to multiply the gradients by.
Returns: The input tensor, unchanged during the forward pass.
Return type: torch.Tensor

######### Examples

>>> import torch
>>> from espnet2.asr.encoder.avhubert_encoder import GradMultiply
>>> x = torch.tensor([1.0, 2.0], requires_grad=True)
>>> scale = 0.5
>>> output = GradMultiply.apply(x, scale)
>>> output.backward(torch.tensor([1.0, 1.0]))
>>> print(x.grad)  # tensor([0.5000, 0.5000]): each upstream gradient of 1.0 is scaled by 0.5
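
The feature-extraction scenario mentioned in the class description can be sketched as below. This is a minimal illustration under stated assumptions, not ESPnet's code: `ScaleGrad` is a hypothetical stand-in that mirrors the documented behaviour so the snippet runs without ESPnet installed, and `extractor`, `head`, and `extractor_grad` are made-up names for a toy two-stage model.

```python
import torch
from torch import nn


class ScaleGrad(torch.autograd.Function):
    """Hypothetical stand-in: identity in forward, scaled gradient in backward."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale   # remember the factor for the backward pass
        return x.clone()    # values pass through unchanged

    @staticmethod
    def backward(ctx, grad):
        # One return value per forward input: scaled grad for x, None for scale.
        return grad * ctx.scale, None


torch.manual_seed(0)
extractor = nn.Linear(4, 3)  # stands in for a feature extractor
head = nn.Linear(3, 1)       # stands in for the rest of the model
inputs = torch.randn(5, 4)


def extractor_grad(scale):
    """Return the extractor weight gradient obtained with the given scale."""
    extractor.zero_grad()
    head.zero_grad()
    feats = ScaleGrad.apply(extractor(inputs), scale)
    head(feats).sum().backward()
    return extractor.weight.grad.clone()


full = extractor_grad(1.0)
damped = extractor_grad(0.1)
# The extractor's gradient shrinks by the scale factor.
print(torch.allclose(damped, 0.1 * full, atol=1e-6))
```

Only modules upstream of the scaling call are affected; `head` sees the same features and receives the same gradient in both runs.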

static backward(ctx, grad)

Compute the gradient by scaling the input gradient.

This method is the backward half of GradMultiply. It multiplies the gradient arriving from downstream by the scale stored during forward, which controls how strongly the layers that produced the input are updated.

Parameters:
- ctx – The context object that can be used to stash information for backward computation. This is automatically provided by PyTorch.
- grad – The gradient of the loss with respect to the output of this function. This is a tensor containing the gradients from the subsequent layer.
Returns: A tuple containing:
- The scaled gradient for the input tensor.
- None, because scale is not a differentiable input and receives no gradient.
Return type: Tuple[torch.Tensor, None]

######### Examples

>>> import torch
>>> from espnet2.asr.encoder.avhubert_encoder import GradMultiply
>>> input_tensor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> scale = 0.1
>>> output = GradMultiply.apply(input_tensor, scale)
>>> output.backward(torch.tensor([1.0, 1.0, 1.0]))
>>> print(input_tensor.grad)  # tensor([0.1000, 0.1000, 0.1000]): upstream gradient of 1.0 times 0.1
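
The gradient that reaches the input depends on the upstream gradient, not on the input values. The sketch below makes this visible by passing a non-uniform upstream gradient. `ScaleGrad` is again a hypothetical stand-in, defined here so the snippet runs without ESPnet; it mirrors the behaviour documented above and is not claimed to be the library's source.

```python
import torch


class ScaleGrad(torch.autograd.Function):
    """Hypothetical stand-in: identity in forward, scaled gradient in backward."""

    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale   # stash the factor on the context object
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        # Scaled gradient for x; None for the non-differentiable scale argument.
        return grad * ctx.scale, None


x = torch.tensor([10.0, 20.0, 30.0], requires_grad=True)
upstream = torch.tensor([1.0, 2.0, 3.0])  # gradient arriving from downstream

out = ScaleGrad.apply(x, 0.1)
out.backward(upstream)

# x.grad is upstream * 0.1 element-wise; the values of x play no role.
print(x.grad)
```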

NOTE

This method is not meant to be called directly. Autograd invokes it during backpropagation through a GradMultiply.apply call, and it returns the incoming gradient multiplied by the scale factor recorded in forward.


static forward(ctx, x, scale)

Forward pass of the gradient-scaling function.

This method passes the input tensor through with its values unchanged and stores scale on ctx so that backward can multiply the incoming gradient by it.

Parameters:
- ctx – The context object that can be used to stash information for backward computation. This is automatically provided by PyTorch.
- x (torch.Tensor) – The input tensor whose gradients will be scaled.
- scale (float) – The factor to multiply the gradients by.
Returns: The input tensor, unchanged during the forward pass.
Return type: torch.Tensor

######### Examples

>>> import torch
>>> from espnet2.asr.encoder.avhubert_encoder import GradMultiply
>>> x = torch.tensor([1.0, 2.0], requires_grad=True)
>>> output = GradMultiply.apply(x, 0.5)
>>> print(torch.equal(output, x))  # Values are unchanged in the forward pass
True