espnet2.asvspoof.loss.oc_softmax_loss.ASVSpoofOCSoftmaxLoss
class espnet2.asvspoof.loss.oc_softmax_loss.ASVSpoofOCSoftmaxLoss(weight: float = 1.0, enc_dim: int = 128, m_real: float = 0.5, m_fake: float = 0.2, alpha: float = 20.0)
Bases: AbsASVSpoofLoss
Implementation of the One-Class Softmax Loss for ASV Spoofing.
This loss function separates real from spoofed audio samples in anti-spoofing systems. It uses a one-class softmax approach: each encoder embedding is compared with a learned center vector by cosine similarity, and separate margins are applied to real and spoofed samples.
weight
The weight of the loss. Default is 1.0.
- Type: float
feat_dim
The dimension of the encoder’s output features, as given by the enc_dim constructor argument. Default is 128.
- Type: int
m_real
Margin for real embeddings. Default is 0.5.
- Type: float
m_fake
Margin for fake embeddings. Default is 0.2.
- Type: float
alpha
Scaling factor for the loss. Default is 20.0.
- Type: float
center
Learnable parameter representing the center of the embedding space.
- Type: torch.nn.Parameter
softplus
Softplus activation function.
- Type: torch.nn.Softplus
Parameters:
- weight (float , optional) – Weight of the loss function. Defaults to 1.0.
- enc_dim (int , optional) – Dimension of the encoder’s output. Defaults to 128.
- m_real (float , optional) – Margin for real embeddings. Defaults to 0.5.
- m_fake (float , optional) – Margin for fake embeddings. Defaults to 0.2.
- alpha (float , optional) – Scaling factor for the loss. Defaults to 20.0.
Returns: None
Raises: ValueError – If the input dimensions do not match the expected sizes.
######### Examples
>>> import torch
>>> from espnet2.asvspoof.loss.oc_softmax_loss import ASVSpoofOCSoftmaxLoss
>>> loss_fn = ASVSpoofOCSoftmaxLoss()
>>> labels = torch.tensor([[1], [0]])
>>> embeddings = torch.randn(2, 10, 128) # Batch of 2, 10 frames, 128 dim
>>> loss = loss_fn(labels, embeddings)
>>> print(loss)
####### NOTE The forward method computes the loss based on the provided labels and embeddings. The score method can be used to obtain the prediction scores for the embeddings.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(label: Tensor, emb: Tensor, **kwargs)
Compute the forward pass for the ASVSpoofOCSoftmaxLoss.
This method calculates the loss based on the ground truth labels and the encoder embedding output. The embeddings are normalized, and scores are computed using the learned center and the embeddings.
- Parameters:
- label (torch.Tensor) – Ground truth label tensor of shape [Batch, 1]. It indicates whether the input is real or spoofed.
- emb (torch.Tensor) – Encoder embedding output tensor of shape [Batch, T, enc_dim]. This is the output from the encoder.
- Returns: The computed loss value for the input batch.
- Return type: torch.Tensor
- Raises: ValueError – If the dimensions of label and emb do not match.
######### Examples
>>> import torch
>>> from espnet2.asvspoof.loss.oc_softmax_loss import ASVSpoofOCSoftmaxLoss
>>> loss_fn = ASVSpoofOCSoftmaxLoss()
>>> labels = torch.tensor([[1], [0]]) # Real and spoofed
>>> embeddings = torch.randn(2, 10, 128) # Example embeddings
>>> loss = loss_fn.forward(labels, embeddings)
>>> print(loss)
####### NOTE The loss computation involves several steps that include normalizing the embeddings and center, calculating scores, applying a bias, and using the Softplus function.
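The steps listed in the note can be sketched as a stand-alone function. This is a minimal illustration based on the general one-class softmax formulation, not the ESPnet implementation: the function name `oc_softmax_loss_sketch`, the mean pooling over the time axis, the label convention (1 = real, 0 = spoofed), and the direction of the margin bias are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def oc_softmax_loss_sketch(label, emb, center, m_real=0.5, m_fake=0.2, alpha=20.0):
    """Hypothetical stand-alone OC-Softmax loss (not the ESPnet code).

    label:  [Batch, 1], assumed 1 = real, 0 = spoofed
    emb:    [Batch, T, enc_dim]
    center: [1, enc_dim]
    """
    # Assumption: frames are averaged over time, giving one vector per sample.
    x = F.normalize(emb.mean(dim=1), p=2, dim=1)  # [Batch, enc_dim]
    w = F.normalize(center, p=2, dim=1)           # [1, enc_dim]

    # Cosine similarity between each embedding and the center.
    scores = (x @ w.t()).squeeze(1)               # [Batch]

    # Margin bias: real samples are penalized when score < m_real,
    # spoofed samples when score > m_fake.
    is_real = label.view(-1) == 1
    biased = torch.where(is_real, m_real - scores, scores - m_fake)

    # softplus(z) = log(1 + exp(z)); alpha sharpens the penalty.
    return F.softplus(alpha * biased).mean()


torch.manual_seed(0)
center = torch.randn(1, 128)
labels = torch.tensor([[1], [0]])
embeddings = torch.randn(2, 10, 128)
loss = oc_softmax_loss_sketch(labels, embeddings, center)

# An embedding aligned with the center is cheap when labeled real
# and expensive when labeled spoofed.
aligned = center.unsqueeze(1).expand(1, 10, 128)
loss_as_real = oc_softmax_loss_sketch(torch.tensor([[1]]), aligned, center)
loss_as_fake = oc_softmax_loss_sketch(torch.tensor([[0]]), aligned, center)
```

Under these assumptions, the two margins define a band: real samples are pushed toward a cosine similarity above `m_real`, and spoofed samples toward one below `m_fake`.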
score(emb: Tensor)
Compute the scores based on the encoder embeddings.
This method calculates the similarity scores between the input embeddings and the learned center vector. The scores can be used to evaluate the confidence of the model’s predictions regarding whether the input is real or spoofed.
- Parameters: emb (torch.Tensor) – Encoder embedding output of shape [Batch, T, enc_dim], where Batch is the number of samples, T is the sequence length, and enc_dim is the dimensionality of the embeddings.
- Returns: A tensor of shape [Batch] containing the computed similarity scores for each input embedding.
- Return type: torch.Tensor
######### Examples
>>> import torch
>>> from espnet2.asvspoof.loss.oc_softmax_loss import ASVSpoofOCSoftmaxLoss
>>> loss_fn = ASVSpoofOCSoftmaxLoss()
>>> embeddings = torch.randn(32, 10, 128) # 32 samples, 10 time steps
>>> scores = loss_fn.score(embeddings)
>>> print(scores.shape) # Output: torch.Size([32])
####### NOTE The method normalizes both the input embeddings and the learned center vector before computing the scores to ensure that the scores are computed based on cosine similarity.
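To illustrate why normalizing both vectors yields cosine similarity, the sketch below computes the score as a dot product of unit vectors and compares it with `torch.nn.functional.cosine_similarity`. The helper name `cosine_score_sketch`, the mean pooling over T, and the threshold value are assumptions for illustration only and are not taken from the ESPnet source.

```python
import torch
import torch.nn.functional as F


def cosine_score_sketch(emb, center):
    """Hypothetical scoring helper: dot product of L2-normalized vectors."""
    pooled = emb.mean(dim=1)                 # assumed mean pooling over T
    x = F.normalize(pooled, p=2, dim=1)      # [Batch, enc_dim], unit length
    w = F.normalize(center, p=2, dim=1)      # [1, enc_dim], unit length
    return (x @ w.t()).squeeze(1)            # [Batch]


torch.manual_seed(0)
center = torch.randn(1, 128)
embeddings = torch.randn(32, 10, 128)

scores = cosine_score_sketch(embeddings, center)

# The same quantity via the built-in cosine similarity.
reference = F.cosine_similarity(embeddings.mean(dim=1), center, dim=1)

# Illustrative decision rule; the threshold is an arbitrary example value
# that would normally be tuned on held-out data.
threshold = 0.35
predicted_real = scores > threshold
```

Because the scores are cosine similarities, they lie in [-1, 1] regardless of the embedding magnitude, which makes a single fixed threshold meaningful across samples.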