espnet2.enh.layers.dptnet.ImprovedTransformerLayer
class espnet2.enh.layers.dptnet.ImprovedTransformerLayer(rnn_type, input_size, att_heads, hidden_size, dropout=0.0, activation='relu', bidirectional=True, norm='gLN')
Bases: Module
Container module of the (improved) Transformer proposed in [1].
This class implements the improved Transformer layer used in the Dual-Path Transformer Network (DPTNet) architecture. It applies multi-head self-attention followed by an RNN-based feed-forward network, where the RNN type (RNN, LSTM, or GRU) is configurable. The layer is designed for applications such as end-to-end monaural speech separation.
Reference: Chen, J., Mao, Q., & Liu, D. (2020). Dual-path transformer network: Direct context-aware modeling for end-to-end monaural speech separation. In Proc. ISCA Interspeech (pp. 2642–2646).
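Conceptually, the layer computes a residual self-attention block followed by a residual RNN-based feed-forward block. The sketch below illustrates this data flow with a plain LayerNorm standing in for the configurable normalization and with illustrative sizes; it is a simplified approximation, not the actual espnet2 implementation:
>>> import torch
>>> import torch.nn as nn
>>> attn = nn.MultiheadAttention(256, 4, batch_first=True)
>>> rnn = nn.LSTM(256, 128, batch_first=True, bidirectional=True)
>>> proj = nn.Linear(2 * 128, 256)  # project RNN output back to input_size
>>> norm1, norm2 = nn.LayerNorm(256), nn.LayerNorm(256)  # stand-in for the layer's configurable norm
>>> x = torch.randn(10, 20, 256)  # (batch, seq, input_size)
>>> mid = norm1(x + attn(x, x, x)[0])       # self-attention + residual
>>> ff = proj(torch.relu(rnn(mid)[0]))      # RNN-based feed-forward
>>> out = norm2(mid + ff)                   # second residual + norm
>>> out.shape
torch.Size([10, 20, 256])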
rnn_type
Type of RNN used (‘RNN’, ‘LSTM’, or ‘GRU’).
- Type: str
att_heads
Number of attention heads.
- Type: int
self_attn
Multi-head self-attention layer.
- Type: nn.MultiheadAttention
dropout
Dropout layer for regularization.
- Type: nn.Dropout
norm_attn
Normalization layer for attention output.
rnn
RNN layer based on specified rnn_type.
- Type: nn.Module
feed_forward
Feed-forward network following the RNN.
- Type: nn.Sequential
norm_ff
Normalization layer for feed-forward output.
- Parameters:
- rnn_type (str) – Select from ‘RNN’, ‘LSTM’, and ‘GRU’.
- input_size (int) – Dimension of the input feature.
- att_heads (int) – Number of attention heads.
- hidden_size (int) – Dimension of the hidden state.
- dropout (float) – Dropout ratio. Default is 0.
- activation (str) – Activation function applied to the output of the RNN. Default is ‘relu’.
- bidirectional (bool , optional) – True for a bidirectional inter-chunk RNN (the intra-chunk RNN is always bidirectional). Default is True.
- norm (str , optional) – Type of normalization to use. Default is ‘gLN’.
####### Examples
>>> import torch
>>> from espnet2.enh.layers.dptnet import ImprovedTransformerLayer
>>> layer = ImprovedTransformerLayer(
... rnn_type='LSTM',
... input_size=256,
... att_heads=4,
... hidden_size=128,
... dropout=0.1,
... activation='relu'
... )
>>> input_tensor = torch.randn(10, 20, 256) # (batch, seq_len, input_size)
>>> output_tensor = layer(input_tensor)
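A unidirectional GRU variant with a different normalization type can be constructed the same way. The values below are illustrative, and ‘cLN’ is assumed to be among the supported norm types described by the norm parameter:
>>> gru_layer = ImprovedTransformerLayer(
...     rnn_type='GRU',
...     input_size=256,
...     att_heads=4,
...     hidden_size=128,
...     bidirectional=False,
...     norm='cLN',  # assumed to be a supported value of the norm parameter
... )
>>> gru_layer(input_tensor).shape
torch.Size([10, 20, 256])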
- Raises: AssertionError – If rnn_type is not one of ‘RNN’, ‘LSTM’, or ‘GRU’.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x, attn_mask=None)
Forward pass through the Improved Transformer Layer.
This method takes the input tensor x, applies self-attention, a feed-forward neural network, and normalization, returning the transformed output.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (batch, seq, input_size).
- attn_mask (torch.Tensor , optional) – Attention mask to prevent attention to certain positions. Default is None.
- Returns: Output tensor of the same shape as input x after applying the transformer layer.
- Return type: torch.Tensor
####### Examples
>>> import torch
>>> layer = ImprovedTransformerLayer('LSTM', 128, 4, 64)
>>> input_tensor = torch.randn(32, 10, 128) # (batch, seq, input_size)
>>> output_tensor = layer(input_tensor)
>>> print(output_tensor.shape) # Should output: torch.Size([32, 10, 128])
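Since the mask is handed to a multi-head self-attention module, a standard (seq, seq) boolean mask in the PyTorch nn.MultiheadAttention convention should apply; the causal mask below is shown as an assumed usage, not a documented guarantee:
>>> causal_mask = torch.triu(torch.ones(10, 10, dtype=torch.bool), diagonal=1)  # True = do not attend
>>> masked_output = layer(input_tensor, attn_mask=causal_mask)
>>> masked_output.shape
torch.Size([32, 10, 128])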
NOTE
The input tensor x should have dimensions corresponding to (batch size, sequence length, input size).