Paper Title
H_eval: A new hybrid evaluation metric for automatic speech recognition tasks
Paper Authors
Paper Abstract
Many studies have examined the shortcomings of word error rate (WER) as an evaluation metric for automatic speech recognition (ASR) systems. Since WER considers only literal word-level correctness, new evaluation metrics based on semantic similarity, such as semantic distance (SD) and BERTScore, have been developed. However, we found that these metrics have their own limitations, such as a tendency to overly prioritise keywords. We propose H_eval, a new hybrid evaluation metric for ASR systems that considers both semantic correctness and error rate and performs well in scenarios where WER and SD perform poorly. Owing to its lighter computation compared with BERTScore, it reduces metric computation time by a factor of 49. Furthermore, we show that H_eval correlates strongly with downstream NLP tasks. In addition, to further reduce metric computation time, we built several fast and lightweight models using distillation techniques.
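The abstract does not give H_eval's exact formulation, so the sketch below only illustrates the general idea of a hybrid ASR metric: blending word-level accuracy (1 - WER) with an embedding-based semantic similarity score. The mixing weight `alpha`, the `all-MiniLM-L6-v2` encoder, and the function name `h_eval_sketch` are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of a hybrid ASR evaluation metric in the spirit of H_eval.
# NOTE: the weighting scheme and encoder below are assumptions for
# illustration; the paper's actual formula and distilled models differ.
from jiwer import wer
from sentence_transformers import SentenceTransformer, util

# Lightweight sentence encoder (assumed; the paper distils its own models
# to cut metric computation time).
_encoder = SentenceTransformer("all-MiniLM-L6-v2")

def h_eval_sketch(reference: str, hypothesis: str, alpha: float = 0.5) -> float:
    """Blend word-level accuracy with semantic similarity.

    alpha is a hypothetical mixing weight between the error-rate term
    and the semantic term; it is not taken from the paper.
    """
    # WER can exceed 1.0 when the hypothesis has many insertions,
    # so clamp word accuracy to the [0, 1] range.
    word_accuracy = max(0.0, 1.0 - wer(reference, hypothesis))

    # Cosine similarity between sentence embeddings as a proxy for
    # semantic correctness.
    emb_ref, emb_hyp = _encoder.encode([reference, hypothesis])
    semantic_sim = float(util.cos_sim(emb_ref, emb_hyp))

    return alpha * word_accuracy + (1.0 - alpha) * semantic_sim

# Usage: a meaning-preserving substitution scores higher than WER alone
# would suggest, while a keyword-only match does not dominate the score.
print(h_eval_sketch("turn on the lights", "switch on the lights"))
```

A design point this sketch tries to mirror: keeping the semantic term as a single sentence-embedding cosine (rather than token-level matching as in BERTScore) is what makes such a hybrid metric cheap to compute, consistent with the reported speedup over BERTScore.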