espnet2.asr_transducer.encoder.modules.convolution.ConformerConvolution
class espnet2.asr_transducer.encoder.modules.convolution.ConformerConvolution(channels: int, kernel_size: int, activation: Module = ReLU(), norm_args: Dict = {}, causal: bool = False)
Bases: Module
ConformerConvolution module definition.
- Parameters:
- channels – The number of channels.
- kernel_size – Size of the convolving kernel.
- activation – Activation function.
- norm_args – Normalization module arguments.
- causal – Whether to use causal convolution (set to True if streaming).
Construct a ConformerConvolution object.
Compute convolution module.
This method applies a series of convolution operations on the input tensor x, potentially using a source mask and a cache for efficient processing. The convolution is performed using a pointwise, depthwise, and another pointwise convolution layer, with optional activation and normalization.
- Parameters:
- x – ConformerConvolution input sequences. Shape (B, T, D_hidden), where B is the batch size, T is the sequence length, and D_hidden is the number of hidden units.
- mask – Source mask. Shape (B, T_2). This mask is applied to zero out certain positions in the input tensor x.
- cache – ConformerConvolution input cache. Shape (1, D_hidden, conv_kernel). This cache is used to store previous outputs for causal convolutions.
- Returns:
  - x – ConformerConvolution output sequences. Shape (B, ?, D_hidden), where the second dimension may vary depending on the operations performed.
  - cache – ConformerConvolution output cache. Shape (1, D_hidden, conv_kernel). This cache can be used in subsequent calls to maintain state.
- Return type: Tuple[torch.Tensor, torch.Tensor]
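The pointwise → depthwise → pointwise pipeline described above can be sketched in plain NumPy. This is an illustrative sketch only, not the espnet2 implementation: normalization is omitted, the GLU gating after the first pointwise convolution follows the standard Conformer design, and all function and weight names here are hypothetical.

```python
import numpy as np

def conformer_conv_sketch(x, w_pw1, w_dw, w_pw2):
    """Illustrative Conformer convolution pipeline (normalization omitted).

    x:     (T, D)     input frames
    w_pw1: (D, 2*D)   first pointwise conv, expands to 2*D for GLU gating
    w_dw:  (D, K)     depthwise conv, one kernel per channel (K odd)
    w_pw2: (D, D)     second pointwise conv
    Returns output frames of shape (T, D).
    """
    # Pointwise conv 1 (a 1x1 conv is a per-frame matrix multiply): (T, 2*D)
    h = x @ w_pw1
    # GLU: first half gated by sigmoid of second half -> (T, D)
    a, b = h[:, : h.shape[1] // 2], h[:, h.shape[1] // 2 :]
    h = a * (1.0 / (1.0 + np.exp(-b)))
    # Depthwise conv with "same" padding so T is preserved
    T, D = h.shape
    K = w_dw.shape[1]
    pad = K // 2
    padded = np.pad(h.T, ((0, 0), (pad, pad)))          # (D, T + K - 1)
    dw = np.stack(
        [np.sum(padded[:, t : t + K] * w_dw, axis=1) for t in range(T)],
        axis=1,
    )                                                   # (D, T)
    # Activation (ReLU, matching the default in the signature above)
    h = np.maximum(dw.T, 0.0)
    # Pointwise conv 2: (T, D)
    return h @ w_pw2
```

The depthwise stage convolves each channel independently (one kernel per channel), which is what keeps this block cheap compared with a full 1-D convolution mixing all channels.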
Examples
>>> model = ConformerConvolution(channels=64, kernel_size=3)
>>> input_tensor = torch.randn(32, 10, 64)  # batch of 32, 10 time steps, 64 hidden units
>>> output, new_cache = model(input_tensor)
>>> print(output.shape)  # (32, 10, 64) for the non-causal default
NOTE
Ensure that the input tensor x is properly shaped and the kernel size is odd to maintain symmetry in convolution operations.
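In the causal (streaming) case, the cache holds the trailing input frames of the previous chunk so that chunk-by-chunk processing reproduces the full-sequence output. A minimal NumPy sketch of that mechanism for a single depthwise layer (illustrative only; the function and variable names are not part of espnet2):

```python
import numpy as np

def causal_depthwise_conv(x, weight, cache):
    """Causal depthwise 1-D convolution with a left-context cache.

    x:      (D, T)      current chunk, one row per channel
    weight: (D, K)      one kernel per channel
    cache:  (D, K - 1)  last K-1 input frames of the previous chunk
    Returns (y, new_cache) with y of shape (D, T).
    """
    D, T = x.shape
    K = weight.shape[1]
    # Left-pad the chunk with cached context instead of zeros
    padded = np.concatenate([cache, x], axis=1)         # (D, T + K - 1)
    y = np.empty((D, T))
    for t in range(T):
        # Each output frame only sees current and past inputs (causal)
        y[:, t] = np.sum(padded[:, t : t + K] * weight, axis=1)
    # Keep the last K-1 frames as context for the next chunk
    new_cache = padded[:, -(K - 1):]
    return y, new_cache
```

Feeding the sequence in two chunks, threading the cache between calls, yields the same output as one pass over the whole sequence, which is the property that makes causal convolution usable for streaming recognition.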