espnet2.s2st.losses.tacotron_loss.S2STTacotron2Loss
class espnet2.s2st.losses.tacotron_loss.S2STTacotron2Loss(weight: float = 1.0, loss_type: str = 'L1+L2', use_masking: str2bool = True, use_weighted_masking: str2bool = False, bce_pos_weight: float = 20.0)
Bases: AbsS2STLoss
Tacotron-based loss for S2ST.
This class implements the loss function for the speech-to-speech translation (S2ST) model based on the Tacotron 2 architecture. It combines L1, L2 (mean squared error), and binary cross entropy (BCE) losses and allows for masking options during loss computation; illustrative sketches of these terms are included further down this page.
weight
Weighting factor for the loss. Default is 1.0.
- Type: float
loss_type
Type of loss to compute. Options are “L1+L2”, “L1”, or “L2”. Default is “L1+L2”.
- Type: str
loss
Instance of the Tacotron2Loss class.
- Type: Tacotron2Loss
Parameters:
- weight (float) – Weighting factor for the loss. Default is 1.0.
- loss_type (str) – Type of loss to compute. Options are “L1+L2”, “L1”, or “L2”. Default is “L1+L2”.
- use_masking (str2bool) – Flag to enable masking. Default is True.
- use_weighted_masking (str2bool) – Flag to enable weighted masking. Default is False.
- bce_pos_weight (float) – Positive weight for BCE loss. Default is 20.0.
Returns: Tuple of (loss, l1_loss, mse_loss, bce_loss) – the total loss, L1 loss, mean squared error loss, and binary cross entropy loss values.
Return type: Tuple[Tensor, Tensor, Tensor, Tensor]
Raises: ValueError – If an unknown loss type is specified.
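The role of bce_pos_weight is easiest to see in isolation. The following standalone sketch (plain PyTorch, not ESPnet code; treating the stop-token term as a torch.nn.BCEWithLogitsLoss is an assumption about the underlying criterion) shows how a positive weight of 20.0 up-weights the rare frames whose stop label is 1:

```python
import torch

# Illustrative only (not ESPnet code): how a pos_weight > 1 emphasizes the
# rare positive stop labels when scoring stop logits with BCE-with-logits.
logits = torch.randn(4, 100)          # stop logits (B, Lmax)
labels = torch.zeros(4, 100)          # stop labels: 1.0 only at the final frame
labels[:, -1] = 1.0

plain_bce = torch.nn.BCEWithLogitsLoss()
weighted_bce = torch.nn.BCEWithLogitsLoss(pos_weight=torch.tensor(20.0))

print(plain_bce(logits, labels))      # positive frames weighted like any other
print(weighted_bce(logits, labels))   # positive-frame terms scaled by 20.0
```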
####### Examples
>>> import torch
>>> from espnet2.s2st.losses.tacotron_loss import S2STTacotron2Loss
>>> loss_fn = S2STTacotron2Loss(weight=1.0, loss_type="L1")
>>> after_outs = torch.randn(4, 100, 80)   # outputs after postnet (B, Lmax, odim)
>>> before_outs = torch.randn(4, 100, 80)  # outputs before postnet (B, Lmax, odim)
>>> logits = torch.randn(4, 100)           # stop logits (B, Lmax)
>>> ys = torch.randn(4, 100, 80)           # padded target features (B, Lmax, odim)
>>> labels = torch.randint(0, 2, (4, 100)).float()  # float stop labels for BCE
>>> olens = torch.tensor([100, 90, 80, 70])         # target lengths (B,)
>>> loss, l1_loss, mse_loss, bce_loss = loss_fn(
...     after_outs, before_outs, logits, ys, labels, olens)
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(after_outs: Tensor, before_outs: Tensor, logits: Tensor, ys: Tensor, labels: Tensor, olens: Tensor)
Forward pass for computing the loss based on the outputs and targets.
This method calculates the loss values for a given batch of outputs from the Tacotron model and the corresponding target values. The total loss is assembled according to the configured loss type (L1, L2, or their combination); a minimal sketch of this selection rule follows the parameter list below.
- Parameters:
- after_outs (Tensor) – Batch of outputs after postnets (B, Lmax, odim).
- before_outs (Tensor) – Batch of outputs before postnets (B, Lmax, odim).
- logits (Tensor) – Batch of stop logits (B, Lmax).
- ys (Tensor) – Batch of padded target features (B, Lmax, odim).
- labels (LongTensor) – Batch of the sequences of stop token labels (B, Lmax).
- olens (LongTensor) – Batch of the lengths of each target (B,).
- Returns: Tuple of (loss, l1_loss, mse_loss, bce_loss) – the total loss, L1 loss, mean squared error loss, and binary cross entropy loss values.
- Return type: Tuple[Tensor, Tensor, Tensor, Tensor]
- Raises: ValueError – If an unknown loss type is specified.
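As referenced above, here is a minimal sketch of the selection rule implied by the documented loss_type options and the ValueError. It is illustrative only; the function name combine_total_loss and the assumption that the BCE term is always added on top are not taken from the ESPnet source:

```python
import torch

def combine_total_loss(loss_type: str, l1_loss: torch.Tensor,
                       mse_loss: torch.Tensor,
                       bce_loss: torch.Tensor) -> torch.Tensor:
    # Illustrative selection rule for the documented loss_type options.
    if loss_type == "L1+L2":
        spec_loss = l1_loss + mse_loss
    elif loss_type == "L1":
        spec_loss = l1_loss
    elif loss_type == "L2":
        spec_loss = mse_loss
    else:
        raise ValueError(f"unknown loss_type: {loss_type}")
    # Assumption in this sketch: the stop-token BCE term is always added.
    return spec_loss + bce_loss
```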
####### Examples
>>> loss_fn = S2STTacotron2Loss()
>>> after_outs = torch.randn(2, 10, 80)
>>> before_outs = torch.randn(2, 10, 80)
>>> logits = torch.randn(2, 10)
>>> ys = torch.randn(2, 10, 80)
>>> labels = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0, 0, 0],
...                        [0, 1, 0, 0, 1, 1, 0, 0, 0, 0]],
...                       dtype=torch.float)  # float stop labels for BCE
>>> olens = torch.tensor([10, 10])
>>> total_loss, l1_loss, mse_loss, bce_loss = loss_fn(
... after_outs, before_outs, logits, ys, labels, olens)
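For intuition about the use_masking option and the olens argument, the following standalone sketch (plain PyTorch, not ESPnet code; the inline mask construction is an assumption about what masking means here) shows how frames beyond each target length can be excluded before computing a per-frame loss:

```python
import torch
import torch.nn.functional as F

# Illustrative: drop padded frames beyond each sequence length before
# averaging a loss, which is what a masked reduction typically does.
B, Lmax, odim = 2, 10, 80
after_outs = torch.randn(B, Lmax, odim)
ys = torch.randn(B, Lmax, odim)
olens = torch.tensor([10, 7])

# Non-pad mask: True for valid frames, False for padding.
mask = torch.arange(Lmax).unsqueeze(0) < olens.unsqueeze(1)  # (B, Lmax)
feat_mask = mask.unsqueeze(-1)                               # (B, Lmax, 1)

l1_masked = F.l1_loss(
    after_outs.masked_select(feat_mask), ys.masked_select(feat_mask)
)
print(l1_masked)
```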