Paper Title

Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration

Paper Authors

Weiyan Shi, Yu Li, Saurav Sahay, Zhou Yu

Abstract

Persuasion dialogue systems reflect the machine's ability to make strategic moves beyond verbal communication, and therefore differentiate themselves from task-oriented or open-domain dialogue systems and have their own unique value. However, the repetition and inconsistency problems still persist in dialogue response generation and can substantially impact user experience and impede the persuasion outcome. In addition, although reinforcement learning (RL) approaches have achieved great success in strategic tasks such as games, they require a sophisticated user simulator to provide real-time feedback to the dialogue system, which limits the application of RL to persuasion dialogues. To address these issues and build a better persuasion dialogue system, we apply RL to refine a language model baseline without user simulators, and distill sentence-level information about repetition, inconsistency, and task relevance through rewards. Moreover, to better accomplish the persuasion task, the model learns from human demonstration to imitate human persuasion behavior and select the most persuasive responses. Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation results on a donation persuasion task, and generates more diverse, consistent, and persuasive conversations according to user feedback.
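
To make the method description more concrete, the sketch below illustrates the two ideas in the abstract: combining sentence-level reward signals for repetition, inconsistency, and task relevance, and using a scorer trained on human demonstrations to pick the most persuasive candidate response. All function names, weights, and heuristics here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of sentence-level reward shaping for RL refinement of a
# persuasion dialogue model, plus demonstration-based response selection.
# Everything here (names, weights, the n-gram heuristic) is an illustrative
# assumption, not the paper's implementation.
from collections import Counter


def ngrams(text: str, n: int = 2) -> Counter:
    """Count word n-grams in a sentence."""
    tokens = text.lower().split()
    return Counter(zip(*[tokens[i:] for i in range(n)]))


def repetition_penalty(sentence: str, history: list[str]) -> float:
    """Fraction of the candidate's n-grams already seen in the dialogue history."""
    cand = ngrams(sentence)
    if not cand:
        return 0.0
    seen = Counter()
    for turn in history:
        seen.update(ngrams(turn))
    overlap = sum((cand & seen).values())
    return overlap / sum(cand.values())


def sentence_reward(sentence, history, is_consistent, is_on_task,
                    w_rep=1.0, w_inc=1.0, w_task=1.0):
    """Combine the three sentence-level signals into a scalar reward.

    `is_consistent` and `is_on_task` stand in for classifier outputs
    (e.g., an NLI model flagging contradictions, a task-relevance
    classifier); the linear combination and weights are assumptions.
    """
    reward = 0.0
    reward -= w_rep * repetition_penalty(sentence, history)   # penalize repetition
    reward -= w_inc * (0.0 if is_consistent else 1.0)         # penalize inconsistency
    reward += w_task * (1.0 if is_on_task else 0.0)           # reward task relevance
    return reward


def select_response(candidates, history, scorer):
    """Imitation step: rank sampled candidates with a persuasiveness scorer
    trained on human demonstrations, and return the highest-scoring one."""
    return max(candidates, key=lambda s: scorer(s, history))
```

In a training loop, `sentence_reward` would score each generated sentence for the RL update, while `select_response` would rank candidates sampled from the refined model at inference time; the exact reward weighting and scorer architecture are design choices the paper specifies in detail.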
