espnet2.enh.layers.dprnn.DPRNN
class espnet2.enh.layers.dprnn.DPRNN(rnn_type, input_size, hidden_size, output_size, dropout=0, num_layers=1, bidirectional=True)
Bases: Module
Deep dual-path RNN for efficient long sequence modeling.
This module implements a dual-path RNN as proposed in Luo et al. “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation.” The dual-path RNN applies RNN layers in both row and column directions to effectively model long sequences while maintaining efficiency.
- Parameters:
- rnn_type (str) – Select from ‘RNN’, ‘LSTM’, and ‘GRU’.
- input_size (int) – Dimension of the input feature. The input should have shape (batch, input_size, dim1, dim2).
- hidden_size (int) – Dimension of the hidden state.
- output_size (int) – Dimension of the output feature.
- dropout (float) – Dropout ratio. Default is 0.
- num_layers (int) – Number of stacked RNN layers. Default is 1.
- bidirectional (bool) – Whether the RNN layers are bidirectional. Default is True.
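How `rnn_type` and `bidirectional` translate into a recurrent layer can be illustrated with plain PyTorch. The helper `build_rnn` below is hypothetical and is not part of ESPnet. It only assumes that the three accepted strings match the class names in `torch.nn`, and it mirrors the documented assertion on unsupported types.

```python
import torch
import torch.nn as nn


def build_rnn(rnn_type: str, input_size: int, hidden_size: int,
              bidirectional: bool = True) -> nn.Module:
    """Hypothetical helper: map an rnn_type string to a torch.nn RNN class."""
    assert rnn_type in ("RNN", "LSTM", "GRU"), f"unsupported rnn_type: {rnn_type}"
    rnn_cls = getattr(nn, rnn_type)  # nn.RNN, nn.LSTM, or nn.GRU
    return rnn_cls(input_size, hidden_size, batch_first=True,
                   bidirectional=bidirectional)


rnn = build_rnn("GRU", input_size=16, hidden_size=8)
out, _ = rnn(torch.randn(4, 20, 16))  # (batch, seq_len, input_size)
# A bidirectional RNN concatenates both directions, so the last
# dimension of `out` is 2 * hidden_size.
print(out.shape)  # torch.Size([4, 20, 16])
```

Because a bidirectional layer doubles the feature dimension of its output, a projection back to the model's feature size is usually needed before the result can be combined with the layer input.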
####### Examples
>>> import torch
>>> from espnet2.enh.layers.dprnn import DPRNN
>>> dprnn = DPRNN('LSTM', input_size=256, hidden_size=128,
...               output_size=256, num_layers=2)
>>> input_tensor = torch.randn(32, 256, 10, 5)  # (batch_size, input_size, dim1, dim2)
>>> output = dprnn(input_tensor)
>>> print(output.shape)  # (32, 256, 10, 5), i.e. (batch_size, output_size, dim1, dim2)
- Returns: The output tensor with shape (batch_size, output_size, dim1, dim2).
- Return type: torch.Tensor
NOTE
The dual-path RNN consists of row RNNs and column RNNs that are applied in sequence. The output is then processed through a linear layer for final predictions.
- Raises: AssertionError – If rnn_type is not one of ‘RNN’, ‘LSTM’, or ‘GRU’.
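The row-then-column pattern described in the note above can be sketched with plain PyTorch modules. This is a minimal illustration, not ESPnet's implementation: the class name `TinyDualPathBlock`, the choice of `nn.LSTM` followed by a linear projection, and the plain residual connections are all assumptions made to keep the example short.

```python
import torch
import torch.nn as nn


class TinyDualPathBlock(nn.Module):
    """Hypothetical one-layer dual-path block (illustration only)."""

    def __init__(self, feat_dim: int, hidden_size: int):
        super().__init__()
        # One bidirectional LSTM per path, each projected back to feat_dim.
        self.row_rnn = nn.LSTM(feat_dim, hidden_size, batch_first=True,
                               bidirectional=True)
        self.col_rnn = nn.LSTM(feat_dim, hidden_size, batch_first=True,
                               bidirectional=True)
        self.row_proj = nn.Linear(2 * hidden_size, feat_dim)
        self.col_proj = nn.Linear(2 * hidden_size, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim, dim1, dim2)
        b, n, d1, d2 = x.shape

        # Row path: fold dim2 into the batch and run the RNN along dim1.
        row_in = x.permute(0, 3, 2, 1).reshape(b * d2, d1, n)
        row_out, _ = self.row_rnn(row_in)
        row_out = self.row_proj(row_out)  # (b * d2, d1, n)
        row_out = row_out.reshape(b, d2, d1, n).permute(0, 3, 2, 1)
        x = x + row_out  # residual connection

        # Column path: fold dim1 into the batch and run the RNN along dim2.
        col_in = x.permute(0, 2, 3, 1).reshape(b * d1, d2, n)
        col_out, _ = self.col_rnn(col_in)
        col_out = self.col_proj(col_out)  # (b * d1, d2, n)
        col_out = col_out.reshape(b, d1, d2, n).permute(0, 3, 1, 2)
        return x + col_out  # same shape as the input


block = TinyDualPathBlock(feat_dim=16, hidden_size=8)
y = block(torch.randn(2, 16, 10, 5))
print(y.shape)  # torch.Size([2, 16, 10, 5])
```

In this sketch each RNN only ever processes sequences of length dim1 or dim2, never dim1 * dim2, which is the reason the dual-path layout stays tractable for long inputs.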
forward(input)
Perform a forward pass through the dual-path RNN.
The input is passed through each stacked layer in turn: a row RNN is applied along dim1, then a column RNN is applied along dim2. The result is then projected to output_size features by the output layer.
- Parameters:
- input (torch.Tensor) – Input tensor of shape (batch, input_size, dim1, dim2).
- Returns: The output tensor of shape (batch, output_size, dim1, dim2).
- Return type: torch.Tensor
####### Examples
>>> dprnn = DPRNN('LSTM', input_size=10, hidden_size=20, output_size=10)
>>> input_tensor = torch.randn(5, 10, 15, 4)  # (batch, input_size, dim1, dim2)
>>> output = dprnn(input_tensor)
NOTE
The input tensor must be four-dimensional. The output keeps batch, dim1, and dim2 and has output_size features in dimension 1, so it matches the input shape only when output_size equals input_size.