Paper Title

KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

Paper Authors

Xiao Yu, Qingyang Wu, Kun Qian, Zhou Yu

Paper Abstract

In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize response for task-related metrics. However, RL needs to perform exploration, which can be time-consuming due to the slow auto-regressive sequence generation process. We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting. First, we use a faster generation procedure that samples from independent next-word distributions after training the language model (LM) with supervised learning. We then introduce a fine-grained reward function to help the model focus on learning key information in a dialog, by measuring the importance and semantic closeness of each generated token. Experiments on the MultiWoZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance on the end-to-end response generation task, with a 15% training time reduction compared to a standard RL algorithm using auto-regressive generation.
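
The abstract compresses two technical ideas: exploration without slow auto-regressive decoding (each response token is sampled from the per-position next-word distribution obtained in a single teacher-forced forward pass) and a fine-grained, per-token reward based on token importance and semantic closeness. The sketch below is not the authors' released code; it only illustrates how these two pieces might fit together, assuming a GPT-2 backbone from Hugging Face Transformers. The dialog strings, the `keyword_ids` set, and the importance/closeness weights are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation) of:
# (1) sampling each response token independently from per-position next-word
#     distributions computed in one teacher-forced forward pass, and
# (2) a fine-grained per-token reward weighted by keyword importance.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = "the restaurant phone number is"        # dialog context (assumed example)
target = " 01223 356354 , anything else ?"        # ground-truth response (assumed example)
ctx_ids = tokenizer(context, return_tensors="pt").input_ids
tgt_ids = tokenizer(target, return_tensors="pt").input_ids

# One forward pass over context + ground-truth response (teacher forcing):
# the logits at every position already give independent next-word
# distributions, so exploration needs no step-by-step auto-regressive loop.
input_ids = torch.cat([ctx_ids, tgt_ids], dim=-1)
with torch.no_grad():
    logits = model(input_ids).logits

# Positions that predict the response tokens, one distribution per position.
resp_logits = logits[0, ctx_ids.size(-1) - 1 : -1]
probs = torch.softmax(resp_logits, dim=-1)
sampled = torch.multinomial(probs, num_samples=1).squeeze(-1)

# Illustrative per-token reward: keyword tokens (e.g. slot values) get a
# higher importance weight, scaled by a closeness proxy to the gold token.
keyword_ids = set(tokenizer(" 01223 356354").input_ids)   # assumed keyword set
rewards = []
for pos, (samp, gold) in enumerate(zip(sampled.tolist(), tgt_ids[0].tolist())):
    importance = 2.0 if gold in keyword_ids else 1.0      # assumed weights
    closeness = probs[pos, gold].item()                    # assumed closeness proxy
    rewards.append(importance * (closeness if samp == gold else closeness - 1.0))

print(list(zip(tokenizer.convert_ids_to_tokens(sampled.tolist()), rewards)))
```

Because all positions are scored in a single forward pass, the exploration step scales with one batched model call rather than with response length, which is consistent with the reported training-time reduction over auto-regressive RL exploration.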
