espnet2.asvspoof.loss.am_softmax_loss.ASVSpoofAMSoftmaxLoss
class espnet2.asvspoof.loss.am_softmax_loss.ASVSpoofAMSoftmaxLoss(weight: float = 1.0, enc_dim: int = 128, s: float = 20, m: float = 0.5)
Bases: AbsASVSpoofLoss
Additive Margin Softmax (AM-Softmax) loss for ASV spoofing.
This class implements an Additive Margin Softmax loss for spoofing detection in Automatic Speaker Verification (ASV). The loss treats the task as binary classification: the model learns to separate genuine audio samples from spoofed ones.
weight
A scaling factor for the loss. Default is 1.0.
- Type: float
enc_dim
Dimensionality of the encoder output. Default is 128.
- Type: int
s
Scaling factor for the logits. Default is 20.
- Type: float
m
Margin added to the logits for improved separation. Default is 0.5.
- Type: float
centers
Learnable parameters representing class centers.
- Type: torch.nn.Parameter
sigmoid
Sigmoid activation function.
- Type: torch.nn.Sigmoid
loss
Binary cross-entropy loss function.
- Type: torch.nn.BCELoss
Parameters:
- weight (float, optional) – Weighting factor for the loss computation.
- enc_dim (int, optional) – Dimensionality of the encoder output.
- s (float, optional) – Scaling factor for logits.
- m (float, optional) – Margin for the softmax loss.
Returns: Computed loss value.
Return type: torch.Tensor
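To make the roles of `s` and `m` concrete, here is a minimal plain-Python sketch. It assumes the usual AM-Softmax form, in which the margin is subtracted from the target-class cosine similarity before scaling; the helper names `margin_logit` and `sigmoid` are hypothetical and are not part of ESPnet.

```python
import math


def margin_logit(cos_sim: float, s: float = 20.0, m: float = 0.5,
                 is_target: bool = True) -> float:
    """Return s * (cos_sim - m) for the target class, s * cos_sim otherwise.

    Assumption: this mirrors the textbook AM-Softmax formulation, not
    necessarily the exact ESPnet implementation.
    """
    margin = m if is_target else 0.0
    return s * (cos_sim - margin)


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


cos_sim = 0.6  # hypothetical cosine similarity between an embedding and a center
plain = margin_logit(cos_sim, is_target=False)       # scaled cosine, no margin
with_margin = margin_logit(cos_sim, is_target=True)  # margin applied before scaling

# The margin lowers the target-class probability for the same cosine similarity,
# so the model has to push the embedding closer to its center to compensate.
print(sigmoid(plain), sigmoid(with_margin))
```

With the default values, a larger `m` demands a larger cosine similarity for the same confidence, while a larger `s` sharpens the sigmoid around the decision boundary.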
######### Examples
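>>> import torch
>>> from espnet2.asvspoof.loss.am_softmax_loss import ASVSpoofAMSoftmaxLoss
>>> loss_fn = ASVSpoofAMSoftmaxLoss()
>>> labels = torch.tensor([[1], [0]])  # One label per sample, shape [Batch, 1]
>>> embeddings = torch.randn(2, 10, 128)  # Batch of 2, 10 time steps, 128 features
>>> loss = loss_fn(labels, embeddings)
>>> print(loss)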
####### NOTE The input embeddings should be in the shape [Batch, T, enc_dim], where Batch is the number of samples, T is the sequence length, and enc_dim is the dimension of the encoder output.
- Raises: ValueError – If the dimensions of label and embedding do not match.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(label: Tensor, emb: Tensor, **kwargs)
Compute the forward pass of the ASVSpoofAMSoftmaxLoss.
This method computes the loss for the given input embeddings and labels using an additive-margin softmax approach. The embeddings are averaged over time, L2-normalized, and compared against the normalized learned class centers to produce logits, which are then used to compute the binary cross-entropy loss.
- Parameters:
- label (torch.Tensor) – Ground truth labels with shape [Batch, 1], where each label is either 0 or 1.
- emb (torch.Tensor) – Encoder embedding output with shape [Batch, T, enc_dim], where T is the sequence length and enc_dim is the dimension of the embeddings.
- Returns: The computed loss value as a tensor.
- Return type: torch.Tensor
######### Examples
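>>> import torch
>>> from espnet2.asvspoof.loss.am_softmax_loss import ASVSpoofAMSoftmaxLoss
>>> loss_fn = ASVSpoofAMSoftmaxLoss()
>>> labels = torch.tensor([[1], [0]])  # One label per sample, shape [Batch, 1]
>>> embeddings = torch.rand(2, 10, 128)  # Batch of 2, 10 time steps, 128 dims
>>> loss = loss_fn(labels, embeddings)
>>> print(loss)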
####### NOTE The input emb is averaged across the time dimension before normalization. The learned class centers are also normalized prior to computing the logits.
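The steps described above (time averaging, L2 normalization, comparison with normalized centers, binary cross-entropy) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the ESPnet code: all helper names and toy values are hypothetical, the first logit column is assumed to be the one passed through the sigmoid, and the per-class handling of the margin `m` is omitted.

```python
import math


def mean_pool(frames):
    """Average a [T, enc_dim] nested list over time into one enc_dim vector."""
    t = len(frames)
    return [sum(col) / t for col in zip(*frames)]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_logits(frames, centers):
    """Cosine similarity between the pooled embedding and each class center."""
    emb = l2_normalize(mean_pool(frames))
    return [sum(e * c for e, c in zip(emb, l2_normalize(center)))
            for center in centers]


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    return -(target * math.log(prob + eps)
             + (1 - target) * math.log(1 - prob + eps))


# Hypothetical toy input: T = 2 frames, enc_dim = 2, two class centers.
frames = [[1.0, 0.0], [3.0, 0.0]]
centers = [[2.0, 0.0], [0.0, 5.0]]
s = 20.0  # scale factor, matching the documented default

logits = cosine_logits(frames, centers)
prob = 1.0 / (1.0 + math.exp(-s * logits[0]))  # sigmoid of the scaled first column
loss_if_label_1 = bce(prob, 1)
loss_if_label_0 = bce(prob, 0)
print(logits, loss_if_label_1, loss_if_label_0)
```

Because both the pooled embedding and the centers are unit-normalized, each logit is a cosine similarity in [-1, 1]; only the direction of the embedding matters, not its magnitude.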
score(emb: Tensor)
Compute the prediction scores for the given embeddings.
This method takes the encoder embeddings, normalizes them, and computes the logits by performing a matrix multiplication with the normalized centers. The first column of the logits is returned as the prediction scores.
- Parameters: emb (torch.Tensor) – Encoder embedding output of shape [Batch, T, enc_dim]. The embeddings are averaged across the time dimension (T).
- Returns: The prediction scores of shape [Batch], one score per input sample.
- Return type: torch.Tensor
######### Examples
>>> import torch
>>> from espnet2.asvspoof.loss.am_softmax_loss import ASVSpoofAMSoftmaxLoss
>>> loss_fn = ASVSpoofAMSoftmaxLoss()
>>> embeddings = torch.randn(32, 10, 128)  # Batch of 32 samples
>>> scores = loss_fn.score(embeddings)
>>> print(scores.shape)  # Output: torch.Size([32])
####### NOTE The embeddings must be computed using the same model and settings as the one used during training for the scores to be meaningful.
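As a rough illustration of how such scores might be used downstream, the sketch below computes one cosine score per sample against the first center and applies a decision threshold. The helper `score_one`, the toy inputs, and the threshold value are all hypothetical; which class a high score indicates depends on the label convention used in training, and a real threshold would be tuned on development data.

```python
import math


def score_one(frames, first_center):
    """Cosine similarity between mean-pooled frames and the first center."""
    t = len(frames)
    pooled = [sum(col) / t for col in zip(*frames)]
    dot = sum(p * c for p, c in zip(pooled, first_center))
    norm_p = math.sqrt(sum(p * p for p in pooled))
    norm_c = math.sqrt(sum(c * c for c in first_center))
    return dot / (norm_p * norm_c)


# Hypothetical batch of 2 samples, each with T = 2 frames and enc_dim = 2.
batch = [
    [[1.0, 0.0], [1.0, 0.2]],     # pooled vector points toward the center
    [[-1.0, 0.1], [-0.8, 0.0]],   # pooled vector points away from the center
]
first_center = [1.0, 0.0]
threshold = 0.0  # assumed value for illustration only

scores = [score_one(frames, first_center) for frames in batch]  # one per sample
decisions = [score > threshold for score in scores]
print(scores, decisions)
```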