Paper Title

Pre-Training for Query Rewriting in A Spoken Language Understanding System

Authors

Zheng Chen, Xing Fan, Yuan Ling, Lambert Mathias, Chenlei Guo

Abstract

Query rewriting (QR) is an increasingly important technique to reduce customer friction caused by errors in a spoken language understanding pipeline, where the errors originate from various sources such as speech recognition errors, language understanding errors, or entity resolution errors. In this work, we first propose a neural-retrieval-based approach for query rewriting. Then, inspired by the wide success of pre-trained contextual language embeddings, and also as a way to compensate for insufficient QR training data, we propose a language-modeling (LM) based approach to pre-train query embeddings on historical user conversation data with a voice assistant. In addition, we propose to use the NLU hypotheses generated by the language understanding system to augment the pre-training. Our experiments show that pre-training provides rich prior information and helps the QR task achieve strong performance. We also show that joint pre-training with NLU hypotheses has further benefit. Finally, after pre-training, we find that a small set of rewrite pairs is enough to fine-tune the QR model to outperform a strong baseline fully trained on all QR training data.
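The retrieval formulation described in the abstract (encode a possibly erroneous query, then retrieve a rewrite from past successful queries by embedding similarity) can be illustrated with a minimal sketch. Note this is not the paper's architecture: the encoder below is a hypothetical hashed character-trigram stand-in for the pre-trained LM query embedding, and the candidate utterances are invented for illustration.

```python
import numpy as np

def encode(query: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in encoder: hashed character-trigram counts, L2-normalized.
    (The paper instead pre-trains an LM-based query embedding on historical
    user conversation data; this placeholder only illustrates the interface.)"""
    vec = np.zeros(dim)
    padded = f"#{query.lower()}#"
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        vec[sum(ord(ch) for ch in trigram) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def retrieve_rewrite(query: str, candidates: list[str]) -> str:
    """Return the candidate rewrite whose embedding has the highest cosine
    similarity with the embedding of the (possibly erroneous) input query."""
    q = encode(query)
    sims = [float(q @ encode(c)) for c in candidates]
    return candidates[int(np.argmax(sims))]

if __name__ == "__main__":
    # Hypothetical rewrite candidates drawn from past successful queries.
    candidates = ["play maroon five", "play the frozen soundtrack", "turn off the lights"]
    print(retrieve_rewrite("play marone five", candidates))
```

In the paper's setting, the encoder would be the pre-trained (and optionally NLU-hypothesis-augmented) query embedding fine-tuned on rewrite pairs, and retrieval would run over an index of historical successful utterances rather than a short in-memory list.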
