espnet2.asr_transducer.encoder.modules.convolution.ConformerConvolution
class espnet2.asr_transducer.encoder.modules.convolution.ConformerConvolution(channels: int, kernel_size: int, activation: Module = ReLU(), norm_args: Dict = {}, causal: bool = False)
Bases: Module
ConformerConvolution module definition.
- Parameters:
- channels – The number of channels.
- kernel_size – Size of the convolving kernel.
- activation – Activation function.
- norm_args – Normalization module arguments.
- causal – Whether to use causal convolution (set to True if streaming).
Construct a ConformerConvolution object.
Compute convolution module.
This method applies a series of convolution operations on the input tensor x, potentially using a source mask and a cache for efficient processing. The convolution is performed using a pointwise, depthwise, and another pointwise convolution layer, with optional activation and normalization.
- Parameters:
- x – ConformerConvolution input sequences. Shape (B, T, D_hidden), where B is the batch size, T is the sequence length, and D_hidden is the number of hidden units.
- mask – Source mask. Shape (B, T_2). This mask is applied to zero out certain positions in the input tensor x.
- cache – ConformerConvolution input cache. Shape (1, D_hidden, conv_kernel). This cache is used to store previous outputs for causal convolutions.
- Returns:
  - x – ConformerConvolution output sequences. Shape (B, ?, D_hidden), where the second dimension may vary depending on the operations performed.
  - cache – ConformerConvolution output cache. Shape (1, D_hidden, conv_kernel). This cache can be used in subsequent calls to maintain state.
- Return type: Tuple[torch.Tensor, torch.Tensor]
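The pointwise → depthwise → pointwise pipeline described above can be sketched in plain NumPy. This is an illustrative sketch only, not the espnet2 implementation: normalization is omitted, the GLU gating after the first pointwise convolution follows the standard Conformer design, and all function and weight names here are hypothetical.

```python
import numpy as np

def conformer_conv_sketch(x, w_pw1, w_dw, w_pw2):
    """Illustrative Conformer convolution pipeline (normalization omitted).

    x:     (T, D)     input frames
    w_pw1: (D, 2*D)   first pointwise conv, expands to 2*D for GLU gating
    w_dw:  (D, K)     depthwise conv, one kernel per channel (K odd)
    w_pw2: (D, D)     second pointwise conv
    Returns output frames of shape (T, D).
    """
    # Pointwise conv 1 (a 1x1 conv is a per-frame matrix multiply): (T, 2*D)
    h = x @ w_pw1
    # GLU: first half gated by sigmoid of second half -> (T, D)
    a, b = h[:, : h.shape[1] // 2], h[:, h.shape[1] // 2 :]
    h = a * (1.0 / (1.0 + np.exp(-b)))
    # Depthwise conv with "same" padding so T is preserved
    T, D = h.shape
    K = w_dw.shape[1]
    pad = K // 2
    padded = np.pad(h.T, ((0, 0), (pad, pad)))          # (D, T + K - 1)
    dw = np.stack(
        [np.sum(padded[:, t : t + K] * w_dw, axis=1) for t in range(T)],
        axis=1,
    )                                                   # (D, T)
    # Activation (ReLU, matching the default in the signature above)
    h = np.maximum(dw.T, 0.0)
    # Pointwise conv 2: (T, D)
    return h @ w_pw2
```

The depthwise stage convolves each channel independently (one kernel per channel), which is what keeps this block cheap compared with a full 1-D convolution mixing all channels.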
Examples
>>> model = ConformerConvolution(channels=64, kernel_size=3)
>>> input_tensor = torch.randn(32, 10, 64)  # batch of 32, 10 time steps, 64 hidden units
>>> output, new_cache = model(input_tensor)
>>> print(output.shape)  # (32, 10, 64) for the non-causal default
NOTE
Ensure that the input tensor x is properly shaped and the kernel size is odd to maintain symmetry in convolution operations.
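In the causal (streaming) case, the cache holds the trailing input frames of the previous chunk so that chunk-by-chunk processing reproduces the full-sequence output. A minimal NumPy sketch of that mechanism for a single depthwise layer (illustrative only; the function and variable names are not part of espnet2):

```python
import numpy as np

def causal_depthwise_conv(x, weight, cache):
    """Causal depthwise 1-D convolution with a left-context cache.

    x:      (D, T)      current chunk, one row per channel
    weight: (D, K)      one kernel per channel
    cache:  (D, K - 1)  last K-1 input frames of the previous chunk
    Returns (y, new_cache) with y of shape (D, T).
    """
    D, T = x.shape
    K = weight.shape[1]
    # Left-pad the chunk with cached context instead of zeros
    padded = np.concatenate([cache, x], axis=1)         # (D, T + K - 1)
    y = np.empty((D, T))
    for t in range(T):
        # Each output frame only sees current and past inputs (causal)
        y[:, t] = np.sum(padded[:, t : t + K] * weight, axis=1)
    # Keep the last K-1 frames as context for the next chunk
    new_cache = padded[:, -(K - 1):]
    return y, new_cache
```

Feeding the sequence in two chunks, threading the cache between calls, yields the same output as one pass over the whole sequence, which is the property that makes causal convolution usable for streaming recognition.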