espnet2.s2st.synthesizer.translatotron2.Prenet
class espnet2.s2st.synthesizer.translatotron2.Prenet(idim, units=128, num_layers=2, dropout=0.5)
Bases: Module
Non-Attentive Tacotron (NAT) Prenet.
The Prenet is a feedforward module that preprocesses the inputs to the NAT synthesizer. It stacks several linear layers, each combined with a ReLU activation and dropout, and learns a representation of the input features before they are passed to the main synthesizer.
layers
A list of linear layers for processing input.
- Type: ModuleList
dropout
Dropout layer to prevent overfitting.
- Type: Dropout
activation
Activation function applied after each layer.
- Type: ReLU
Parameters:
- idim (int) – The dimension of the input features.
- units (int, optional) – The number of units in each layer. Default is 128.
- num_layers (int, optional) – The number of layers in the Prenet. Default is 2.
- dropout (float, optional) – The dropout rate applied after each layer. Default is 0.5.
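A minimal sketch of a prenet with this interface, built from plain torch.nn layers. The class name SketchPrenet is illustrative only; the actual ESPnet implementation may differ in details such as dropout handling.

import torch
import torch.nn.functional as F

class SketchPrenet(torch.nn.Module):
    """Illustrative prenet: num_layers blocks of Linear -> ReLU -> dropout."""

    def __init__(self, idim, units=128, num_layers=2, dropout=0.5):
        super().__init__()
        self.dropout = dropout
        self.layers = torch.nn.ModuleList(
            torch.nn.Linear(idim if i == 0 else units, units)
            for i in range(num_layers)
        )

    def forward(self, x):
        # Each step: linear projection, ReLU activation, then dropout.
        for layer in self.layers:
            x = F.dropout(F.relu(layer(x)), p=self.dropout)
        return x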
####### Examples
>>> prenet = Prenet(idim=256, units=128, num_layers=2, dropout=0.5)
>>> input_tensor = torch.randn(10, 256) # Batch size of 10
>>> output = prenet(input_tensor)
>>> output.shape
torch.Size([10, 128]) # Output shape after processing
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Pass the input through the Prenet layers.
The forward method passes the input tensor x through each Prenet layer in sequence; every layer applies a linear transformation together with ReLU activation and dropout, as defined in the constructor.
- Parameters: x (torch.Tensor) – Input tensor of shape [batch_size, idim].
- Returns: Output tensor after passing through the Prenet layers with shape [batch_size, units].
- Return type: torch.Tensor
####### Examples
>>> prenet = Prenet(idim=256, units=128, num_layers=2, dropout=0.5)
>>> input_tensor = torch.randn(10, 256) # Example batch of size 10
>>> output_tensor = prenet(input_tensor)
>>> output_tensor.shape
torch.Size([10, 128])
NOTE
The last dimension of the input tensor must equal idim, the input dimension given when the Prenet was constructed.
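Because the linear layers act only on the last dimension, a sequence-shaped input is expected to work as well. The 3-D case below is an assumption for illustration, not part of the documented interface:

>>> prenet = Prenet(idim=80, units=128, num_layers=2, dropout=0.5)
>>> seq = torch.randn(4, 50, 80)  # (batch, time, idim); 3-D input is an assumption
>>> prenet(seq).shape             # only the last dimension changes
torch.Size([4, 50, 128])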