Paper Title
Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model
Paper Authors
Paper Abstract
Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent interest as ASR use becomes more widespread. We are releasing contextual biasing lists to accompany the Earnings21 dataset, creating a public benchmark for this task. We present baseline results on this benchmark using a pretrained end-to-end ASR model from the WeNet toolkit. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. Our baseline results confirm observations that end-to-end models struggle in particular with words that are rarely or never seen during training, and that existing shallow fusion techniques do not adequately address this problem. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative and of out-of-vocabulary words by 97.2% relative, compared to contextual biasing without alternate spellings. This model is conceptually similar to ones used in prior work, but is simpler to implement as it does not rely on either a pronunciation dictionary or an existing text-to-speech system.
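To make the shallow fusion contextual biasing mentioned above concrete, here is a minimal, illustrative sketch (not the paper's or WeNet's implementation): bias phrases are stored in a character-level trie, and each decoding hypothesis that completes a bias phrase receives an additive score bonus before hypotheses are ranked. The `bias_bonus` value, the toy phrases, and the character-level matching are assumptions for illustration; real systems typically apply the bonus incrementally on subword units inside the beam search.

```python
# Minimal sketch of shallow-fusion contextual biasing via a bias-phrase trie.
# Hypotheses containing a bias term get an additive bonus to their score.
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    is_end: bool = False


def build_bias_trie(bias_phrases):
    """Build a character-level trie over the bias phrases."""
    root = TrieNode()
    for phrase in bias_phrases:
        node = root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True
    return root


def biased_score(base_score, hypothesis, trie, bias_bonus=2.0):
    """Add a bonus for every bias phrase found in the hypothesis text."""
    bonus = 0.0
    for start in range(len(hypothesis)):
        node = trie
        for ch in hypothesis[start:]:
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_end:
                bonus += bias_bonus
    return base_score + bonus


if __name__ == "__main__":
    trie = build_bias_trie(["earnings", "wenet"])
    # Two toy hypotheses with equal base scores; after biasing, the one
    # containing the bias term "wenet" is ranked first.
    hyps = [("the wenet toolkit", -5.0), ("the when it toolkit", -5.0)]
    ranked = sorted(hyps, key=lambda h: biased_score(h[1], h[0], trie), reverse=True)
    print(ranked[0][0])  # -> "the wenet toolkit"
```

In this framing, the paper's alternate spelling prediction model would expand each bias term into additional plausible spellings before they are inserted into the trie, so that rare or out-of-vocabulary terms can still be matched without a pronunciation dictionary or text-to-speech system.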