Paper Title

Pre-Training for Query Rewriting in A Spoken Language Understanding System

Authors

Zheng Chen, Xing Fan, Yuan Ling, Lambert Mathias, Chenlei Guo

Abstract

Query rewriting (QR) is an increasingly important technique to reduce customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition errors, language understanding errors, or entity resolution errors. In this work, we first propose a neural-retrieval-based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. In addition, we propose to use the NLU hypotheses generated by the language understanding system to augment the pre-training. Our experiments show that pre-training provides rich prior information and helps the QR task achieve strong performance. We also show that joint pre-training with NLU hypotheses has further benefit. Finally, after pre-training, we find that a small set of rewrite pairs is enough to fine-tune the QR model to outperform a strong baseline fully trained on all QR training data.
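The retrieval formulation described in the abstract (encode a possibly erroneous query, then retrieve a rewrite from past successful queries by embedding similarity) can be illustrated with a minimal sketch. Note this is not the paper's architecture: the encoder below is a hypothetical hashed character-trigram stand-in for the pre-trained LM query embedding, and the candidate utterances are invented for illustration.

```python
import numpy as np

def encode(query: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in encoder: hashed character-trigram counts, L2-normalized.
    (The paper instead pre-trains an LM-based query embedding on historical
    user conversation data; this placeholder only illustrates the interface.)"""
    vec = np.zeros(dim)
    padded = f"#{query.lower()}#"
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[sum(ord(ch) for ch in trigram) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_rewrite(query: str, candidates: list[str]) -> str:
    """Return the candidate rewrite whose embedding has the highest cosine
    similarity with the embedding of the (possibly erroneous) input query."""
    q = encode(query)
    sims = [float(q @ encode(c)) for c in candidates]
    return candidates[int(np.argmax(sims))]

if __name__ == "__main__":
    # Hypothetical rewrite candidates drawn from past successful queries.
    candidates = ["play maroon five", "play the frozen soundtrack", "turn off the lights"]
    print(retrieve_rewrite("play marone five", candidates))
```

In the paper's setting, the encoder would be the pre-trained (and optionally NLU-hypothesis-augmented) query embedding fine-tuned on rewrite pairs, and retrieval would run over an index of historical successful utterances rather than a short in-memory list.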
