espnet2.enh.layers.dnsmos.DNSMOS_local

class espnet2.enh.layers.dnsmos.DNSMOS_local(primary_model_path, p808_model_path, use_gpu=False, convert_to_torch=False)

Bases: object

A class for estimating Mean Opinion Scores (MOS) of audio signals with locally run DNSMOS models. It uses pre-trained deep-learning models for audio quality assessment. The models are loaded from ONNX files and executed either with onnxruntime or, after conversion, with PyTorch.

convert_to_torch

Flag indicating whether to convert the ONNX models to PyTorch modules (requires onnx2torch).

Type: bool

use_gpu

Flag indicating whether to use GPU for computations.

Type: bool

primary_model

Primary DNSMOS model, which predicts the raw signal (SIG), background (BAK), and overall (OVRL) scores.

Type: torch.nn.Module or ort.InferenceSession

p808_model

Model that estimates the ITU-T P.808 MOS (reported as P808_MOS).

Type: torch.nn.Module or ort.InferenceSession

spectrogram

Spectrogram transformation module.

Type: torch.nn.Module

to_db

Transformation module to convert amplitude to decibels.

Type: torch.nn.Module

Parameters:
- primary_model_path (str) – Path to the primary model file (ONNX format).
- p808_model_path (str) – Path to the P.808 model file (ONNX format).
- use_gpu (bool, optional) – Flag to enable GPU usage. Default is False.
- convert_to_torch (bool, optional) – Flag to convert the models to PyTorch. Default is False.
Raises: RuntimeError – If onnx2torch (needed when convert_to_torch=True) or onnxruntime (needed when convert_to_torch=False) is not installed.
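The two flags decide which runtime loads the ONNX files. The sketch below models that decision, assuming the documented behaviour: onnx2torch is used when convert_to_torch=True, and onnxruntime otherwise. `select_backend` and `require_module` are hypothetical helpers, not part of ESPnet. The provider strings are onnxruntime's standard CPU and CUDA execution-provider names, and it is an assumption that these are the ones the class selects.

```python
import importlib.util


def require_module(name: str) -> None:
    """Hypothetical helper: raise RuntimeError if an optional dependency is missing."""
    if importlib.util.find_spec(name) is None:
        raise RuntimeError(f"{name} is required but not installed")


def select_backend(use_gpu: bool = False, convert_to_torch: bool = False) -> dict:
    """Hypothetical helper: describe how the ONNX models would be loaded."""
    if convert_to_torch:
        # onnx2torch turns the ONNX graphs into torch.nn.Module objects,
        # which are then placed on the GPU if requested.
        return {"module": "onnx2torch", "device": "cuda" if use_gpu else "cpu"}
    # Otherwise onnxruntime executes the ONNX files directly (assumed provider choice).
    provider = "CUDAExecutionProvider" if use_gpu else "CPUExecutionProvider"
    return {"module": "onnxruntime", "providers": [provider]}


if __name__ == "__main__":
    plan = select_backend(use_gpu=False, convert_to_torch=False)
    print(plan)
    try:
        require_module(plan["module"])
    except RuntimeError as err:
        print(err)
```

Checking for the module up front, rather than failing deep inside model loading, is what makes the documented RuntimeError possible.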

Examples:

>>> import numpy as np
>>> from espnet2.enh.layers.dnsmos import DNSMOS_local
>>> dnsmos = DNSMOS_local('path/to/primary/model', 'path/to/p808/model',
...                        use_gpu=True, convert_to_torch=True)
>>> audio_signal = np.random.rand(16000 * 9)  # simulated 9-second signal at 16 kHz
>>> result = dnsmos(audio_signal, input_fs=16000, is_personalized_MOS=True)
>>> sorted(result)  # the score values depend on the input
['BAK', 'BAK_raw', 'OVRL', 'OVRL_raw', 'P808_MOS', 'SIG', 'SIG_raw']

NOTE: The input audio signal should be a 1D NumPy array or a torch tensor.
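Callers that receive audio in mixed forms may want to normalise it to a single 1D array before scoring. A minimal sketch: `to_1d_float_array` is a hypothetical helper, and both the float32 dtype and the flattening of (1, N) or (N, 1) inputs are assumptions rather than documented behaviour of DNSMOS_local.

```python
import numpy as np


def to_1d_float_array(audio) -> np.ndarray:
    """Hypothetical helper: turn a NumPy array or torch tensor into a 1D float32 array."""
    # torch tensors expose .detach() and .cpu(); duck typing avoids importing torch here.
    if hasattr(audio, "detach") and hasattr(audio, "cpu"):
        audio = audio.detach().cpu().numpy()
    arr = np.asarray(audio, dtype=np.float32)
    # Accept single-channel 2D shapes such as (1, N) or (N, 1).
    if arr.ndim == 2 and 1 in arr.shape:
        arr = arr.reshape(-1)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1D signal, got shape {arr.shape}")
    return arr


if __name__ == "__main__":
    sig = to_1d_float_array(np.random.rand(1, 16000 * 9))
    print(sig.shape)  # (144000,)
```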

audio_melspec(audio, n_mels=120, frame_size=320, hop_length=160, sr=16000, to_db=True)

Compute the Mel spectrogram of the given audio signal.

This method calculates the Mel spectrogram of the input audio signal using either PyTorch or librosa, depending on whether the DNSMOS_local instance was created with convert_to_torch=True.

Parameters:
- audio (torch.Tensor or np.ndarray) – The input audio signal.
- n_mels (int, optional) – Number of Mel bands to generate. Defaults to 120.
- frame_size (int, optional) – Size of the FFT window. Defaults to 320.
- hop_length (int, optional) – Number of samples between successive frames. Defaults to 160.
- sr (int, optional) – Sampling rate of the audio signal in Hz. Defaults to 16000.
- to_db (bool, optional) – Whether to convert the Mel spectrogram to a dB scale. Defaults to True.
Returns: The computed Mel spectrogram, transposed to shape (n_frames, n_mels).
Return type: np.ndarray or torch.Tensor

Examples:

>>> import torch
>>> audio = torch.randn(16000)  # simulated 1-second signal at 16 kHz
>>> # a torch.Tensor input assumes dnsmos_local was created with convert_to_torch=True
>>> mel_spec = dnsmos_local.audio_melspec(audio)
>>> print(mel_spec.shape)
(n_frames, n_mels)

NOTE: If self.convert_to_torch is True, the method computes the spectrogram with PyTorch; otherwise, it uses librosa. In both cases the output is transposed so that frames lie along the first axis.

Raises: ValueError – If the audio input is not a valid tensor or ndarray.
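The pipeline behind audio_melspec can be approximated without librosa or PyTorch. This NumPy sketch frames the signal, applies a Hann window, takes the power spectrum, projects it onto triangular Mel filters, and optionally converts to dB. It is not the method's own code: the HTK Mel scale, the absence of centre padding, and dB referenced to the maximum are all assumptions, so the values will differ from audio_melspec. The (n_frames, n_mels) layout matches the documented output. `melspec_sketch`, `mel_filterbank`, `hz_to_mel`, and `mel_to_hz` are hypothetical names.

```python
import numpy as np


def hz_to_mel(freq):
    # HTK-style Mel scale; librosa defaults to the Slaney variant, so this is an approximation.
    return 2595.0 * np.log10(1.0 + np.asarray(freq, dtype=float) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    bin_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, bin_freqs.size))
    for i in range(n_mels):
        left, center, right = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def melspec_sketch(audio, n_mels=120, frame_size=320, hop_length=160, sr=16000, to_db=True):
    """Return a (n_frames, n_mels) Mel spectrogram, i.e. already in the transposed layout."""
    audio = np.asarray(audio, dtype=float)
    if audio.ndim != 1 or audio.size < frame_size:
        raise ValueError("expected a 1D signal at least frame_size samples long")
    n_frames = 1 + (audio.size - frame_size) // hop_length
    window = np.hanning(frame_size)
    # Slice overlapping frames (no centre padding, unlike librosa's default).
    frames = np.stack([audio[t * hop_length:t * hop_length + frame_size] for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2  # (n_frames, frame_size // 2 + 1)
    mel = power @ mel_filterbank(sr, frame_size, n_mels).T     # (n_frames, n_mels)
    if to_db:
        # dB relative to the loudest bin, floored to avoid log(0).
        mel = np.maximum(mel, 1e-10)
        mel = 10.0 * np.log10(mel / mel.max())
    return mel
```

For one second of audio at 16 kHz with the default arguments, the sketch yields 1 + (16000 - 320) // 160 = 99 frames, so the result has shape (99, 120).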

get_polyfit_val(sig, bak, ovr, is_personalized_MOS)

Maps raw DNSMOS scores through fitted polynomials.

This method evaluates polynomials with fixed, pre-fitted coefficients to compute adjusted signal (SIG), background (BAK), and overall (OVRL) scores from the raw model outputs. A different set of coefficients is applied when a personalized Mean Opinion Score (MOS) is requested.

Parameters:
- sig (float) – The signal metric value to be adjusted.
- bak (float) – The background metric value to be adjusted.
- ovr (float) – The overall metric value to be adjusted.
- is_personalized_MOS (bool) – Flag indicating if the calculation is for personalized MOS. If True, personalized coefficients are used.
Returns: A tuple containing the adjusted signal, background, and overall metrics:
- sig_poly (float): Adjusted signal metric.
- bak_poly (float): Adjusted background metric.
- ovr_poly (float): Adjusted overall metric.
Return type: tuple

Examples:

>>> dnsmos_local = DNSMOS_local(...)
>>> sig_adjusted, bak_adjusted, ovr_adjusted = dnsmos_local.get_polyfit_val(
...     sig=1.5, bak=0.5, ovr=1.0, is_personalized_MOS=True
... )

NOTE: The polynomial coefficients are hard-coded in the method and selected according to the is_personalized_MOS flag.
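Below is a minimal sketch of the mapping described above: one polynomial per score, with the flag choosing the coefficient set. The coefficients are placeholders invented for illustration, not the values hard-coded in DNSMOS_local, and `polyfit_sketch` is a hypothetical stand-in for get_polyfit_val. Coefficients are listed highest power first, which is the convention `np.polyval` expects.

```python
import numpy as np

# Placeholder coefficients (highest power first), NOT the real DNSMOS values.
PLACEHOLDER_COEFFS = {
    True: {"sig": [0.1, 1.0, 0.0], "bak": [0.1, 1.0, 0.0], "ovr": [0.1, 1.0, 0.0]},
    False: {"sig": [1.0, 0.0], "bak": [1.0, 0.0], "ovr": [1.0, 0.0]},
}


def polyfit_sketch(sig, bak, ovr, is_personalized_MOS, coeffs=PLACEHOLDER_COEFFS):
    """Evaluate one polynomial per score; the flag selects the coefficient set."""
    table = coeffs[bool(is_personalized_MOS)]
    sig_poly = np.polyval(table["sig"], sig)
    bak_poly = np.polyval(table["bak"], bak)
    ovr_poly = np.polyval(table["ovr"], ovr)
    return sig_poly, bak_poly, ovr_poly
```

With these placeholders, the non-personalized set leaves the scores unchanged, while the personalized set maps sig=1.5 to 0.1 · 1.5² + 1.5 = 1.725. Keeping the coefficients in a lookup table makes it easy to swap in a different fitted set without touching the evaluation logic.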