espnet2.asr_transducer.utils.make_chunk_mask

espnet2.asr_transducer.utils.make_chunk_mask(size: int, chunk_size: int, num_left_chunks: int = 0, device: device | None = None) → Tensor

Create a chunk mask for the subsequent steps.

This function generates a boolean mask tensor indicating which frames can be attended to based on the chunking strategy defined by the chunk_size and num_left_chunks. The resulting mask tensor has a shape of (size, size), where size is the total number of frames.

Reference: https://github.com/k2-fsa/icefall/blob/master/icefall/utils.py

Parameters:
- size – Size of the source mask.
- chunk_size – Number of frames in each chunk.
- num_left_chunks – Number of chunks to the left of the current chunk that the attention module can see. Zero (the default) or a negative value means full left context.
- device – The device on which to create the mask tensor (e.g., 'cpu' or 'cuda').
Returns: A boolean tensor of shape (size, size) representing the chunk mask, where True indicates frames that can be attended to and False indicates masked frames.
Return type: Tensor
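
The chunking rule described above can be sketched in plain Python. This is a minimal illustration, not the ESPnet implementation: the helper names `visible_window` and `chunk_mask_rows` are hypothetical, the windowing rule (a frame sees its own chunk plus up to `num_left_chunks` chunks to the left) is an assumption based on the parameter descriptions, and the True-means-visible polarity is taken from the Returns description. It is worth checking the values returned by the installed version before passing the mask to an attention module.

```python
def visible_window(i, size, chunk_size, num_left_chunks=0):
    """Return the half-open range [start, end) of frames visible to frame i.

    Hypothetical helper for illustration only.
    """
    chunk_idx = i // chunk_size  # index of the chunk that contains frame i
    if num_left_chunks <= 0:
        # Zero or negative: the whole left context is visible.
        start = 0
    else:
        # Otherwise only num_left_chunks chunks before the current one.
        start = max((chunk_idx - num_left_chunks) * chunk_size, 0)
    # The current chunk is always fully visible; clamp at the sequence end.
    end = min((chunk_idx + 1) * chunk_size, size)
    return start, end


def chunk_mask_rows(size, chunk_size, num_left_chunks=0):
    """Build a (size, size) nested list; True marks a visible frame (assumed polarity)."""
    rows = []
    for i in range(size):
        start, end = visible_window(i, size, chunk_size, num_left_chunks)
        rows.append([start <= j < end for j in range(size)])
    return rows


if __name__ == "__main__":
    # Print one row per frame, using 1 for visible and 0 for masked.
    for row in chunk_mask_rows(size=10, chunk_size=3, num_left_chunks=1):
        print("".join("1" if v else "0" for v in row))
```

Under these assumptions, with size=10, chunk_size=3 and num_left_chunks=1, frame 4 lies in the second chunk and sees frames 0 to 5 (its own chunk plus one chunk to the left), while frame 9 sees frames 6 to 9.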

Examples

>>> mask = make_chunk_mask(size=10, chunk_size=3, num_left_chunks=1)
>>> mask.shape
torch.Size([10, 10])
>>> mask.dtype
torch.bool