Abstract
This study proposes a context-aware dynamic pruning method for multilingual and multitask speech foundation models. Unlike conventional pruning, in which the pruned structure is fixed during training, our method prunes modules flexibly at inference time based on contextual cues such as language, speaker, and task. It reduces inference cost by up to 30% while maintaining model accuracy.
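For illustration only, and not the authors' implementation, the sketch below shows one way context-conditioned module-level pruning could be wired up: a small gate network scores each prunable module from a context embedding (e.g., language, speaker, or task), and modules whose scores fall below a threshold are skipped at inference. The names `ContextGate`, `GatedEncoder`, `context_dim`, and the 0.5 threshold are assumptions introduced for this example.

```python
# Illustrative sketch only: a hypothetical context-conditioned module gate.
# ContextGate, GatedEncoder, context_dim, and the 0.5 threshold are assumptions,
# not details taken from the paper.
import torch
import torch.nn as nn


class ContextGate(nn.Module):
    """Predicts a keep/skip score per module from a context embedding."""

    def __init__(self, context_dim: int, num_modules: int):
        super().__init__()
        self.scorer = nn.Linear(context_dim, num_modules)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # One score in (0, 1) per prunable module.
        return torch.sigmoid(self.scorer(context))


class GatedEncoder(nn.Module):
    """Layer stack in which low-scoring layers are skipped at inference."""

    def __init__(self, layers: nn.ModuleList, gate: ContextGate, threshold: float = 0.5):
        super().__init__()
        self.layers = layers
        self.gate = gate
        self.threshold = threshold

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        scores = self.gate(context)  # shape: (num_modules,)
        for layer, score in zip(self.layers, scores):
            if self.training or score >= self.threshold:
                # Scale by the gate score so the decision stays differentiable
                # during training; at inference the score acts as a soft weight.
                x = x + score * layer(x)
            # else: module pruned for this context; its computation is skipped.
        return x


if __name__ == "__main__":
    layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])
    gate = ContextGate(context_dim=8, num_modules=4)
    model = GatedEncoder(layers, gate).eval()
    x = torch.randn(2, 16)       # dummy acoustic features
    context = torch.randn(8)     # dummy language/speaker/task embedding
    with torch.no_grad():
        y = model(x, context)
    print(y.shape)
```

Skipping a module entirely, rather than merely zeroing its output, is what would yield the compute savings: pruned layers are never executed for that context.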