espnet2.enh.layers.dprnn.DPRNN

class espnet2.enh.layers.dprnn.DPRNN(rnn_type, input_size, hidden_size, output_size, dropout=0, num_layers=1, bidirectional=True)

Bases: Module

Deep dual-path RNN for efficient long sequence modeling.

This module implements the dual-path RNN proposed in Luo et al., “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation.” The input is organized as a grid of short chunks, and RNN layers are applied alternately along the row (within-chunk) and column (across-chunk) directions. Each RNN therefore processes only short sequences, while the stacked layers still cover the whole input.
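Before a dual-path model can be applied, a long feature sequence has to be cut into short chunks that form the rows and columns. The following is a minimal sketch of that step, assuming a (batch, N, T) input and 50% chunk overlap; the helper `segment_sequence` and its padding scheme are hypothetical, and ESPnet's own segmentation utilities may differ.

```python
import torch
import torch.nn.functional as F


def segment_sequence(x: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Hypothetical helper: split (batch, N, T) features into 50%-overlapping chunks.

    Returns (batch, N, chunk_size, num_chunks): dim1 indexes frames within a
    chunk (rows) and dim2 indexes chunks (columns). Assumes chunk_size >= 2.
    """
    hop = chunk_size // 2
    t = x.shape[-1]
    # Zero-pad the end so that whole chunks tile the padded sequence.
    if t < chunk_size:
        pad = chunk_size - t
    else:
        pad = (hop - (t - chunk_size) % hop) % hop
    x = F.pad(x, (0, pad))
    # Tensor.unfold yields (batch, N, num_chunks, chunk_size); swap the last two axes.
    chunks = x.unfold(2, chunk_size, hop)
    return chunks.transpose(2, 3).contiguous()


feats = torch.randn(2, 64, 1000)  # (batch, N, T)
segments = segment_sequence(feats, chunk_size=50)
print(segments.shape)  # torch.Size([2, 64, 50, 39])
```

With chunk_size=50 on T=1000 frames, each row RNN sees sequences of 50 frames and each column RNN sees sequences of 39 chunks, instead of a single RNN pass over all 1000 frames.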

Parameters:
- rnn_type (str) – Select from ‘RNN’, ‘LSTM’, and ‘GRU’.
- input_size (int) – Dimension of the input feature (the channel dimension). The input to forward should have shape (batch, input_size, dim1, dim2).
- hidden_size (int) – Dimension of the hidden state of each RNN.
- output_size (int) – Dimension of the output feature (number of output channels).
- dropout (float) – Dropout ratio. Default is 0.
- num_layers (int) – Number of stacked RNN layers. Default is 1.
- bidirectional (bool) – Whether the RNN layers are bidirectional. Default is True.
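The parameters above fit together as in the minimal sketch below of one RNN layer of the kind a dual-path model stacks: the RNN maps input_size features to hidden_size features per direction, and a linear layer projects the output (2 * hidden_size when bidirectional) back to input_size so that layers can be chained. `ProjectedRNN` is a hypothetical name; this illustrates the pattern and is not the ESPnet source.

```python
import torch
import torch.nn as nn


class ProjectedRNN(nn.Module):
    """Hypothetical single RNN layer with dropout and a projection back to input_size."""

    def __init__(self, rnn_type, input_size, hidden_size, dropout=0.0, bidirectional=True):
        super().__init__()
        rnn_type = rnn_type.upper()
        assert rnn_type in ("RNN", "LSTM", "GRU"), f"unsupported rnn_type: {rnn_type}"
        # nn.RNN, nn.LSTM and nn.GRU share this constructor signature.
        self.rnn = getattr(nn, rnn_type)(
            input_size,
            hidden_size,
            num_layers=1,
            batch_first=True,
            bidirectional=bidirectional,
        )
        self.dropout = nn.Dropout(dropout)
        num_directions = 2 if bidirectional else 1
        # Project the concatenated directions back to the input feature size.
        self.proj = nn.Linear(hidden_size * num_directions, input_size)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, _ = self.rnn(x)  # (batch, seq_len, hidden_size * num_directions)
        return self.proj(self.dropout(out))  # (batch, seq_len, input_size)


layer = ProjectedRNN("LSTM", input_size=256, hidden_size=128)
y = layer(torch.randn(8, 100, 256))
print(y.shape)  # torch.Size([8, 100, 256])
```

Projecting back to input_size keeps every layer's input and output the same width, which is what allows the row and column RNNs to be stacked num_layers times.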

Examples:

>>> import torch
>>> from espnet2.enh.layers.dprnn import DPRNN
>>> dprnn = DPRNN('LSTM', input_size=256, hidden_size=128,
...               output_size=256, num_layers=2)
>>> input_tensor = torch.randn(32, 256, 100, 10)  # (batch, input_size, dim1, dim2)
>>> output = dprnn(input_tensor)
>>> print(output.shape)
torch.Size([32, 256, 100, 10])

Returns: Output tensor of shape (batch_size, output_size, dim1, dim2).
Return type: torch.Tensor
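After the network, the (batch_size, output_size, dim1, dim2) output usually has to be turned back into a sequence. A minimal sketch, assuming the 4-D tensor came from overlapping chunks taken with a fixed hop (dim1 = chunk length, dim2 = number of chunks): overlap-add with `torch.nn.functional.fold`, averaging wherever chunks overlap. The helper `merge_chunks` and the averaging are illustrative assumptions, not part of the DPRNN class.

```python
import torch
import torch.nn.functional as F


def merge_chunks(chunks: torch.Tensor, hop: int, length: int) -> torch.Tensor:
    """Hypothetical helper: overlap-add (batch, C, chunk_size, num_chunks) to (batch, C, length).

    `length` must equal (num_chunks - 1) * hop + chunk_size.
    """
    b, c, k, num_chunks = chunks.shape
    # fold expects (batch, C * kernel_elems, num_blocks); each block is one chunk.
    cols = chunks.reshape(b, c * k, num_chunks)
    summed = F.fold(cols, output_size=(1, length), kernel_size=(1, k), stride=(1, hop))
    # Count how many chunks cover each time step, then average the summed values.
    ones = torch.ones(1, k, num_chunks, dtype=chunks.dtype, device=chunks.device)
    counts = F.fold(ones, output_size=(1, length), kernel_size=(1, k), stride=(1, hop))
    return (summed / counts).squeeze(2)


# Round trip: chunk a sequence with Tensor.unfold, then merge it back.
x = torch.randn(2, 8, 100)                    # (batch, C, T)
chunks = x.unfold(2, 20, 10).transpose(2, 3)  # (batch, C, 20, 9)
restored = merge_chunks(chunks, hop=10, length=100)
print(torch.allclose(restored, x))  # True
```

Averaging makes the round trip exact for unmodified chunks; a model that learns its own overlap weighting could instead keep the plain sum.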

NOTE

Each of the num_layers layers applies a row RNN followed by a column RNN. After the last layer, a final output layer maps the features from input_size to output_size channels.
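The row-then-column order described in the note can be sketched with plain tensor reshapes. This is a minimal, hypothetical `DualPathBlock`, assuming the (batch, N, dim1, dim2) layout, bidirectional LSTMs with a linear projection, and a GroupNorm plus residual connection after each pass; the normalization and residual choices are assumptions, not taken from this page.

```python
import torch
import torch.nn as nn


class DualPathBlock(nn.Module):
    """Hypothetical dual-path layer: RNN along dim1 (rows), then along dim2 (columns)."""

    def __init__(self, n_feats, hidden_size):
        super().__init__()
        self.row_rnn = nn.LSTM(n_feats, hidden_size, batch_first=True, bidirectional=True)
        self.col_rnn = nn.LSTM(n_feats, hidden_size, batch_first=True, bidirectional=True)
        self.row_proj = nn.Linear(2 * hidden_size, n_feats)
        self.col_proj = nn.Linear(2 * hidden_size, n_feats)
        # Assumption: GroupNorm with one group, followed by a residual add.
        self.row_norm = nn.GroupNorm(1, n_feats)
        self.col_norm = nn.GroupNorm(1, n_feats)

    def forward(self, x):
        b, n, d1, d2 = x.shape
        # Row pass: each of the b * d2 columns becomes a sequence of length d1.
        rows = x.permute(0, 3, 2, 1).reshape(b * d2, d1, n)
        rows = self.row_proj(self.row_rnn(rows)[0])
        rows = rows.reshape(b, d2, d1, n).permute(0, 3, 2, 1)  # back to (b, n, d1, d2)
        x = x + self.row_norm(rows)
        # Column pass: each of the b * d1 rows becomes a sequence of length d2.
        cols = x.permute(0, 2, 3, 1).reshape(b * d1, d2, n)
        cols = self.col_proj(self.col_rnn(cols)[0])
        cols = cols.reshape(b, d1, d2, n).permute(0, 3, 1, 2)  # back to (b, n, d1, d2)
        return x + self.col_norm(cols)


block = DualPathBlock(n_feats=64, hidden_size=32)
x = torch.randn(2, 64, 50, 39)  # (batch, N, dim1, dim2)
print(block(x).shape)  # torch.Size([2, 64, 50, 39])
```

Folding one axis into the batch dimension is what keeps each RNN call short: the row pass runs batch * dim2 sequences of length dim1, and the column pass runs batch * dim1 sequences of length dim2.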

Raises: AssertionError – If rnn_type is not one of ‘RNN’, ‘LSTM’, or ‘GRU’.


forward(input)

Apply the stacked dual-path RNN layers to a chunked input tensor.

Within each of the num_layers layers, the row RNN runs along dim1 and the column RNN along dim2; the final output layer then maps the input_size channels to output_size channels.

Parameters:
- input (torch.Tensor) – Input tensor of shape (batch, input_size, dim1, dim2).
Returns: Output tensor of shape (batch, output_size, dim1, dim2).
Return type: torch.Tensor

Examples:

>>> import torch
>>> from espnet2.enh.layers.dprnn import DPRNN
>>> dprnn = DPRNN('GRU', input_size=64, hidden_size=32, output_size=128,
...               bidirectional=False)
>>> x = torch.randn(4, 64, 50, 8)  # (batch, input_size, dim1, dim2)
>>> dprnn(x).shape
torch.Size([4, 128, 50, 8])

NOTE

The input must be a 4-D tensor. A 3-D feature sequence of shape (batch, seq_len, input_size) must first be arranged into chunks of shape (batch, input_size, dim1, dim2), with dim1 indexing frames within a chunk and dim2 indexing chunks. The output keeps dim1 and dim2 unchanged; only the channel dimension changes from input_size to output_size.