Speech Recognition with Speech Enhancement
This is the common recipe for ESPnet2 joint tasks with a speech enhancement front-end. The speech enhancement and joint-task recipes have the following directory structure:
egs2/
├── chime4/
│   ├── enh1/
│   ├── enh_asr1/
│   └── asr1/
├── l3das22/
│   └── enh1/
│       ├── conf/
│       ├── local/
│       │   ├── data.sh
│       │   ├── metric.sh
│       │   └── ...
│       ├── enh.sh -> ../../TEMPLATE/enh1/enh.sh
│       ├── run.sh
│       └── ...
├── lt_slurp_spatialized/
│   └── enh1/
├── slurp_spatialized/
│   ├── enh_asr1/
│   │   ├── enh_asr.sh -> ../../TEMPLATE/enh_asr1/enh_asr.sh
│   │   ├── run.sh
│   │   └── ...
│   └── asr1/
├── ...
└── TEMPLATE/
    ├── enh1/
    │   └── enh.sh
    ├── enh_asr1/
    │   └── enh_asr.sh
    ├── enh_diar1/
    │   └── enh_diar.sh
    ├── enh_st1/
    │   └── enh_st.sh
    └── ...

Introduction to enh_asr.sh
egs2/TEMPLATE/enh_asr1/enh_asr.sh contains 17 stages, most of which are similar to those in asr.sh and enh.sh.
stage 1 to stage 5: data preparation stages
- Stage 1: Data preparation
- Stage 2: Speech perturbation
- Stage 3: Format wav.scp
- Stage 4: Remove short data
- Stage 5: Generate token_list using BPE
stage 6 to stage 9: language model training steps
- Stage 6: LM collect stats
- Stage 7: LM Training
- Stage 8: Calc perplexity
- Stage 9: Ngram Training
stage 10 to stage 11: joint-task training steps
- Stage 10: Collect stats for the joint task
- Stage 11: Joint-task training
We have created EnhS2TTask in espnet2/tasks/enh_s2t.py, which is used to train the ESPnetEnhS2TModel(AbsESPnetModel) defined in espnet2/enh/espnet_enh_s2t_model.py. The ESPnetEnhS2TModel takes a front-end enh_model and a back-end s2t_model (such as an ASR, SLU, ST, or SD model) as inputs and combines them into a joint model.
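The front-end/back-end composition described above can be pictured with a small, self-contained sketch. It is not the ESPnet implementation: ToyJointModel, toy_enhance, and toy_recognize are hypothetical stand-ins that operate on plain Python lists, and they only illustrate the idea that the back-end consumes the output of the enhancement front-end.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins: these are NOT the ESPnet classes, only toy callables
# used to show how a front-end and a back-end are chained.
Waveform = List[float]


@dataclass
class ToyJointModel:
    """Chains an enhancement front-end with a speech-to-text back-end."""

    enh_model: Callable[[Waveform], Waveform]  # front-end: noisy -> enhanced
    s2t_model: Callable[[Waveform], str]       # back-end: enhanced -> output

    def __call__(self, noisy: Waveform) -> str:
        enhanced = self.enh_model(noisy)   # step 1: enhance the input
        return self.s2t_model(enhanced)    # step 2: run the downstream task


def toy_enhance(wave: Waveform) -> Waveform:
    # Toy "enhancement": subtract the mean (DC offset) from every sample.
    mean = sum(wave) / len(wave)
    return [x - mean for x in wave]


def toy_recognize(wave: Waveform) -> str:
    # Toy "recognizer": report the sample count and the peak amplitude.
    peak = max(abs(x) for x in wave)
    return f"{len(wave)} samples, peak {peak:.2f}"


joint = ToyJointModel(enh_model=toy_enhance, s2t_model=toy_recognize)
print(joint([1.0, 2.0, 3.0]))  # prints "3 samples, peak 1.00"
```

Swapping toy_recognize for another callable changes the downstream task without touching the front-end, which mirrors how one enhancement model can be paired with different back-ends.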
Related Python files:
espnet2/
├── bin/
│   └── enh_s2t_train.py
├── enh/
│   └── espnet_enh_s2t_model.py
├── tasks/
│   └── enh_s2t.py
└── ...

stage 12 to stage 13: Inference stages: Decoding and enhancing
- Stage 12: Decoding for the downstream task (ASR)
- Stage 13: Enhance Speech
Related Python files:
espnet2/
├── bin/
│   ├── asr_inference.py
│   ├── diar_inference.py
│   ├── enh_inference.py
│   └── st_inference.py
└── ...

stage 14 to stage 15: Scoring recognition and SSE results
- Stage 14: Scoring ASR
- Stage 15: Scoring Enhancement
stage 16 to stage 17: model uploading steps
- Stage 16: Pack model
- Stage 17: Upload model to Hugging Face
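
As a compact summary, the stage list above can be written down as data. The sketch below assumes the usual ESPnet recipe convention that a run is bounded by an inclusive start stage and stop stage; the helper names stages_to_run and group_of are hypothetical and are not part of enh_asr.sh.

```python
from typing import List

# Stage names taken from the list above.
STAGES = {
    1: "Data preparation",
    2: "Speech perturbation",
    3: "Format wav.scp",
    4: "Remove short data",
    5: "Generate token_list using BPE",
    6: "LM collect stats",
    7: "LM Training",
    8: "Calc perplexity",
    9: "Ngram Training",
    10: "Collect stats for the joint task",
    11: "Joint task Training",
    12: "downstream tasks (ASR) Decoding",
    13: "Enhance Speech",
    14: "Scoring ASR",
    15: "Scoring Enhancement",
    16: "Pack model",
    17: "Upload model to Hugging Face",
}

# (first stage, last stage, group label), mirroring the headings above.
GROUPS = [
    (1, 5, "data preparation"),
    (6, 9, "language model training"),
    (10, 11, "joint-task training"),
    (12, 13, "inference"),
    (14, 15, "scoring"),
    (16, 17, "model uploading"),
]


def stages_to_run(stage: int, stop_stage: int) -> List[str]:
    """Return the labelled stages in the inclusive range [stage, stop_stage]."""
    return [
        f"Stage {n}: {name}"
        for n, name in sorted(STAGES.items())
        if stage <= n <= stop_stage
    ]


def group_of(stage: int) -> str:
    """Return the group label that a stage number belongs to."""
    for first, last, label in GROUPS:
        if first <= stage <= last:
            return label
    raise ValueError(f"unknown stage: {stage}")


# Example: which stages make up the joint-task training group?
for line in stages_to_run(10, 11):
    print(line)
```

For instance, stages_to_run(12, 15) lists the two inference stages followed by the two scoring stages, which is the range to rerun after changing only decoding settings.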
