espnet2.tts.fastspeech2.variance_predictor.VariancePredictor

class espnet2.tts.fastspeech2.variance_predictor.VariancePredictor(idim: int, n_layers: int = 2, n_chans: int = 384, kernel_size: int = 3, bias: bool = True, dropout_rate: float = 0.5)

Bases: Module

Variance predictor module.

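This module implements the variance predictor described in FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. It maps a sequence of hidden states of shape (B, Tmax, idim) to one scalar per time step, such as a pitch or energy value.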

conv

List of n_layers convolutional blocks. Each block applies Conv1d, ReLU, LayerNorm over the channel dimension, and Dropout.

Type: torch.nn.ModuleList

linear

Linear layer that projects the n_chans channels of each frame to a single output value.

Type: torch.nn.Linear

Parameters:
- idim (int) – Input dimension.
- n_layers (int) – Number of convolutional layers.
- n_chans (int) – Number of channels of convolutional layers.
- kernel_size (int) – Kernel size of convolutional layers.
- bias (bool) – Whether to use bias in convolutional layers.
- dropout_rate (float) – Dropout rate.
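The attributes and parameters above can be tied together in a minimal PyTorch sketch of how the layer stack might be assembled. It assumes each block is Conv1d, ReLU, LayerNorm over the channel dimension, and Dropout, with padding (kernel_size - 1) // 2; this mirrors the ESPnet source at the time of writing but is not the library's code. `build_layers` and `ChannelLayerNorm` are hypothetical names.

```python
import torch


class ChannelLayerNorm(torch.nn.LayerNorm):
    """Hypothetical stand-in: LayerNorm over dim 1 of a (B, C, T) tensor."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # torch.nn.LayerNorm normalizes the last dim, so move channels there and back.
        return super().forward(x.transpose(1, -1)).transpose(1, -1)


def build_layers(idim, n_layers=2, n_chans=384, kernel_size=3, bias=True, dropout_rate=0.5):
    """Hypothetical helper mirroring the `conv` and `linear` attributes (assumed block order)."""
    conv = torch.nn.ModuleList()
    for idx in range(n_layers):
        in_chans = idim if idx == 0 else n_chans  # only the first block sees idim channels
        conv.append(
            torch.nn.Sequential(
                torch.nn.Conv1d(
                    in_chans,
                    n_chans,
                    kernel_size,
                    padding=(kernel_size - 1) // 2,  # assumed "same"-style padding
                    bias=bias,
                ),
                torch.nn.ReLU(),
                ChannelLayerNorm(n_chans),
                torch.nn.Dropout(dropout_rate),
            )
        )
    linear = torch.nn.Linear(n_chans, 1)  # one scalar per frame
    return conv, linear


conv, linear = build_layers(idim=256)
```

Only the first block maps idim to n_chans; every later block maps n_chans to n_chans, which is why n_chans alone sets the input width of the linear layer.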

Examples:

>>> import torch
>>> from espnet2.tts.fastspeech2.variance_predictor import VariancePredictor
>>> vp = VariancePredictor(idim=256)
>>> input_tensor = torch.rand(8, 100, 256)  # (B, Tmax, idim)
>>> masks = torch.zeros(8, 100, 1, dtype=torch.bool)  # (B, Tmax, 1); all False, so no padding
>>> output = vp(input_tensor, masks)
>>> print(output.shape)
torch.Size([8, 100, 1])

Raises: TypeError – If any of the arguments is of the wrong type.

NOTE

This module is designed for text-to-speech models. In ESPnet's FastSpeech2, for example, it is used for both the pitch predictor and the energy predictor.

Initialize variance predictor module.

Parameters:
- idim (int) – Input dimension.
- n_layers (int) – Number of convolutional layers.
- n_chans (int) – Number of channels of convolutional layers.
- kernel_size (int) – Kernel size of convolutional layers.
- bias (bool) – Whether to use bias in convolutional layers.
- dropout_rate (float) – Dropout rate.
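Because every convolutional block pads its input, kernel_size decides whether the sequence length survives the convolutions. The sketch below applies the standard Conv1d output-length formula, assuming a padding of (kernel_size - 1) // 2 per block (as in the ESPnet source at the time of writing; verify against your installed version). Both helper names are hypothetical.

```python
def conv1d_out_len(t_in: int, kernel_size: int, padding: int, stride: int = 1, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (formula from the torch.nn.Conv1d docs)."""
    return (t_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def assumed_padding(kernel_size: int) -> int:
    """Assumed padding rule for each conv block: (kernel_size - 1) // 2."""
    return (kernel_size - 1) // 2


# Output length of one conv block for Tmax = 100 and several kernel sizes.
out_lens = {k: conv1d_out_len(100, k, assumed_padding(k)) for k in (1, 3, 4, 5)}
# Odd kernels keep all 100 frames; kernel_size=4 drops one frame per block.
```

Under this assumption an even kernel_size shortens the output by one frame per layer, so the predictions would no longer line up with Tmax or with the masks; odd kernel sizes avoid the problem.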

forward(xs: Tensor, x_masks: Tensor | None = None) → Tensor

Calculate forward propagation.

This method passes the input sequences through the convolutional blocks and the linear layer and returns one predicted value per time step. If x_masks is given, the predictions at padded positions are set to 0.

Parameters:
- xs (Tensor) – Batch of input sequences with shape (B, Tmax, idim).
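- x_masks (BoolTensor, optional) – Batch of masks in which True marks padded frames. The mask is applied to the (B, Tmax, 1) output with masked_fill, so pass it with shape (B, Tmax, 1); a (B, Tmax) mask generally fails to broadcast against the output, and current PyTorch expects boolean masks in masked_fill. Default is None.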
Returns: Batch of predicted sequences with shape (B, Tmax, 1).
Return type: Tensor
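The forward pass can be traced step by step with plain PyTorch tensors. This is a sketch with hypothetical stand-ins (a single conv block and a linear layer), not the module itself; it shows the channel-first transpose that Conv1d requires, the projection back to (B, Tmax, 1), and zeroing of padded frames with masked_fill.

```python
import torch

torch.manual_seed(0)
B, T, idim, n_chans = 2, 5, 4, 8

# Hypothetical stand-ins for the `conv` and `linear` attributes (one conv block here).
conv_block = torch.nn.Sequential(
    torch.nn.Conv1d(idim, n_chans, kernel_size=3, padding=1),
    torch.nn.ReLU(),
)
linear = torch.nn.Linear(n_chans, 1)

xs = torch.rand(B, T, idim)  # (B, Tmax, idim)
# True marks padding: the second sequence has only 3 valid frames.
x_masks = torch.tensor([[False] * 5, [False] * 3 + [True] * 2]).unsqueeze(-1)  # (B, Tmax, 1)

h = conv_block(xs.transpose(1, 2))  # Conv1d wants channels first: (B, n_chans, Tmax)
out = linear(h.transpose(1, 2))  # back to (B, Tmax, n_chans), then project to (B, Tmax, 1)
out = out.masked_fill(x_masks, 0.0)  # predictions at padded frames become 0.0
```

Keeping the mask's trailing dimension of size 1 lets masked_fill line it up with the output directly, without any reshaping inside the forward pass.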

Examples:

>>> import torch
>>> from espnet2.tts.fastspeech2.variance_predictor import VariancePredictor
>>> vp = VariancePredictor(idim=80)
>>> input_tensor = torch.rand(32, 100, 80)  # (B, Tmax, idim)
>>> mask_tensor = torch.zeros(32, 100, 1, dtype=torch.bool)  # (B, Tmax, 1); no padding
>>> output = vp(input_tensor, mask_tensor)  # calling the module runs forward()
>>> print(output.shape)
torch.Size([32, 100, 1])

NOTE

Ensure that xs has shape (B, Tmax, idim) and that x_masks, if provided, is a boolean tensor of shape (B, Tmax, 1) whose first two dimensions match those of xs.
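A mask that fits the (B, Tmax, 1) output can be built from per-utterance lengths. The helper `pad_mask_from_lengths` below is hypothetical (ESPnet's own `make_pad_mask` utility serves a similar purpose for the (B, Tmax) part); the sketch also shows that dropping the trailing dimension breaks broadcasting in masked_fill when B differs from Tmax.

```python
import torch


def pad_mask_from_lengths(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Hypothetical helper: boolean mask of shape (B, max_len, 1), True at padded frames."""
    steps = torch.arange(max_len, device=lengths.device)
    return (steps[None, :] >= lengths[:, None]).unsqueeze(-1)


preds = torch.ones(3, 4, 1)  # stand-in for a (B, Tmax, 1) prediction
lengths = torch.tensor([4, 2, 1])
masks = pad_mask_from_lengths(lengths, max_len=4)
filled = preds.masked_fill(masks, 0.0)  # padded frames -> 0.0

# Dropping the trailing dimension gives a (B, Tmax) mask, which does not broadcast
# against (B, Tmax, 1) here (B=3, Tmax=4), so PyTorch raises a RuntimeError.
try:
    preds.masked_fill(masks.squeeze(-1), 0.0)
    squeezed_mask_failed = False
except RuntimeError:
    squeezed_mask_failed = True
```

If B happens to equal Tmax, the (B, Tmax) mask broadcasts without error to a wrong (B, Tmax, Tmax) result, which is harder to spot than the exception above.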