espnet.transform.spec_augment.SpecAugment
class espnet.transform.spec_augment.SpecAugment(**kwargs)
Bases: FuncTrans
Spec augment.
Apply random time warping and time/frequency masking. The default settings are based on LD (LibriSpeech double) in Table 2 of the SpecAugment paper.
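The time-warping step can be pictured with a minimal NumPy sketch. It assumes the resize-based approach that the "PIL" mode suggests: a center frame is picked at random and moved by at most `max_time_warp` (W) frames, and the segments on either side are resampled so that the total length stays the same. Here `numpy.interp` linear interpolation stands in for PIL image resizing. The names `time_warp_sketch` and `_resize_time` are hypothetical, and this is not ESPnet's implementation.

```python
import numpy as np


def _resize_time(segment, new_len):
    """Linearly resample a (time, freq) segment to new_len frames (hypothetical helper)."""
    old_len = segment.shape[0]
    # Positions of the output frames expressed in input-frame coordinates.
    src = np.linspace(0, old_len - 1, new_len)
    idx = np.arange(old_len)
    return np.stack(
        [np.interp(src, idx, segment[:, f]) for f in range(segment.shape[1])],
        axis=1,
    )


def time_warp_sketch(x, max_time_warp=80, rng=None):
    """Warp a (time, freq) spectrogram along the time axis (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    t = x.shape[0]
    w = max_time_warp
    # Skip warping when W is zero or the input is too short for the window.
    if w == 0 or t - w <= w:
        return x
    center = int(rng.integers(w, t - w))                    # frame to move
    warped = int(rng.integers(center - w, center + w)) + 1  # its new position, in 1..t-1
    # Stretch or squeeze each side so the output keeps t frames.
    left = _resize_time(x[:center], warped)
    right = _resize_time(x[center:], t - warped)
    return np.concatenate([left, right], axis=0)


spec = np.random.default_rng(0).standard_normal((200, 80))
warped_spec = time_warp_sketch(spec, max_time_warp=5, rng=np.random.default_rng(1))
print(warped_spec.shape)  # (200, 80)
```

Because `np.linspace` includes both endpoints, the first and last frames of the input are kept exactly; only the frames in between are shifted along the time axis.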
- Parameters:
- x (numpy.ndarray) – input spectrogram of shape (time, freq)
- resize_mode (str) – “PIL” (fast, nondifferentiable) or “sparse_image_warp” (slow, differentiable)
- max_time_warp (int) – maximum number of frames by which the center frame of the spectrogram can be warped (W)
- freq_mask_width (int) – maximum width of each random frequency mask (F)
- n_freq_mask (int) – number of random frequency masks (m_F)
- time_mask_width (int) – maximum width of each random time mask (T)
- n_time_mask (int) – number of random time masks (m_T)
- inplace (bool) – if true, overwrite the intermediate array in place
- replace_with_zero (bool) – if true, fill the masked regions with zeros; otherwise, fill them with the mean value
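The masking parameters can be illustrated with another hedged NumPy sketch, again not ESPnet's code. Following the SpecAugment paper, each of the m_F frequency masks gets a width drawn uniformly from [0, F], and each of the m_T time masks gets a width drawn from [0, T]. Start positions are drawn so that every mask fits inside the spectrogram. The function name `mask_sketch`, the `rng` argument, the illustrative defaults, and computing the mean fill value once before any masking are all assumptions made for illustration.

```python
import numpy as np


def mask_sketch(
    x,
    freq_mask_width,
    n_freq_mask,
    time_mask_width,
    n_time_mask,
    inplace=False,           # illustrative default
    replace_with_zero=True,  # illustrative default
    rng=None,
):
    """Apply random frequency and time masks to a (time, freq) array (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    # inplace=True writes into x itself; otherwise a copy is modified.
    out = x if inplace else x.copy()
    n_time, n_freq = out.shape
    # Fill value: zero, or the mean of the spectrogram before any masking (assumption).
    fill = 0.0 if replace_with_zero else float(out.mean())

    for _ in range(n_freq_mask):
        # Width f in [0, F], capped at the number of frequency channels.
        f = int(rng.integers(0, min(freq_mask_width, n_freq) + 1))
        f0 = int(rng.integers(0, n_freq - f + 1))  # start chosen so the mask fits
        out[:, f0:f0 + f] = fill

    for _ in range(n_time_mask):
        # Width t in [0, T], capped at the number of frames.
        t = int(rng.integers(0, min(time_mask_width, n_time) + 1))
        t0 = int(rng.integers(0, n_time - t + 1))
        out[t0:t0 + t, :] = fill

    return out


spec = np.random.default_rng(0).standard_normal((200, 80))
masked = mask_sketch(spec, freq_mask_width=10, n_freq_mask=2,
                     time_mask_width=20, n_time_mask=2,
                     rng=np.random.default_rng(1))
```

Computing the fill value once keeps earlier masks from shifting the mean used by later ones. An implementation that recomputes the mean for each mask would behave slightly differently when `replace_with_zero` is false.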