I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Paper title

I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Authors

Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam

Abstract

Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the divide between these two domains in the setting of a rich multi-player text-based fantasy environment where agents and humans engage in both actions and dialogue. Specifically, we train a goal-oriented model with reinforcement learning against an imitation-learned "chit-chat" model with two approaches: the policy either learns to pick a topic or learns to pick an utterance given the top-K utterances from the chit-chat model. We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.
