espnet2.train.preprocessor.detect_non_silence

espnet2.train.preprocessor.detect_non_silence(x: ndarray, threshold: float = 0.01, frame_length: int = 1024, frame_shift: int = 512, window: str = 'boxcar') → ndarray

Power-based voice activity detection.

This function detects non-silent frames in an audio signal based on the power of the signal. It frames the input signal, applies a windowing function, and computes the mean power to determine which frames contain non-silent segments based on a specified threshold.

Parameters:
- x (np.ndarray) – The input audio signal represented as a numpy array. It should have a shape of (Channel, Time).
- threshold (float) – The power threshold for detecting non-silence. Defaults to 0.01.
- frame_length (int) – The length of each frame for analysis. Defaults to 1024.
- frame_shift (int) – The number of samples to shift between frames. Defaults to 512.
- window (str) – The type of window to apply to each frame. Defaults to “boxcar”.
Returns: A boolean array of the same shape as x, where True marks samples that fall in non-silent frames and False marks samples in silent frames.
Return type: np.ndarray

Examples

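>>> import numpy as np
>>> from espnet2.train.preprocessor import detect_non_silence
>>> x = np.random.randn(1000)
>>> detect = detect_non_silence(x)
>>> assert x.shape == detect.shape
>>> assert detect.dtype == bool  # the np.bool alias was removed in NumPy 1.24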

Raises:
- ValueError – If the input array is empty, or if frame_length is less than 1 or greater than the input length, or if frame_shift is less than or equal to 0.
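
The frame-and-threshold procedure described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the name sketch_non_silence is hypothetical, it handles only a 1-D signal with a boxcar (rectangular) window, and it assumes that the threshold is compared against each frame's power relative to the average frame power and that inputs shorter than one frame are treated as entirely non-silent.

```python
import numpy as np


def sketch_non_silence(x, threshold=0.01, frame_length=1024, frame_shift=512):
    """Hypothetical sketch of power-based VAD for a 1-D signal (boxcar window)."""
    x = np.asarray(x, dtype=np.float64)
    if x.shape[-1] < frame_length:
        # Assumption: a signal too short to frame is marked all non-silent.
        return np.ones(x.shape, dtype=bool)

    # Start index of every full frame that fits inside the signal.
    starts = np.arange(0, x.shape[-1] - frame_length + 1, frame_shift)
    # Mean power per frame; a boxcar window leaves the samples unweighted.
    power = np.array([np.mean(x[s:s + frame_length] ** 2) for s in starts])

    mean_power = power.mean()
    if mean_power == 0:
        # Assumption: an all-zero signal is marked all non-silent.
        return np.ones(x.shape, dtype=bool)

    # Assumption: the threshold applies to power relative to the average frame.
    active = power / mean_power > threshold

    # Expand frame decisions to samples (frame_shift samples per frame),
    # then extend the last decision so the mask matches the input length.
    mask = np.repeat(active, frame_shift)
    if mask.shape[0] < x.shape[-1]:
        mask = np.pad(mask, (0, x.shape[-1] - mask.shape[0]), mode="edge")
    return mask[: x.shape[-1]]


# Usage: 4096 silent samples followed by 4096 samples of noise.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(4096), rng.standard_normal(4096)])
mask = sketch_non_silence(signal)
trimmed = signal[mask]  # keep only the samples flagged as non-silent
```

Because the mask has the same shape as the input, it can index the signal directly, as in the last line. Comparing against relative rather than absolute power makes the decision independent of the overall recording level, which is the design choice assumed in this sketch.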