Paper Title

Simulated Chats for Building Dialog Systems: Learning to Generate Conversations from Instructions

Authors

Biswesh Mohapatra, Gaurav Pandey, Danish Contractor, Sachindra Joshi

Abstract

Popular dialog datasets such as MultiWOZ are created by providing crowd workers with an instruction, expressed in natural language, that describes the task to be accomplished. Crowd workers play the roles of user and agent to generate dialogs that accomplish tasks such as booking a restaurant table or calling a taxi. In this paper, we present a data creation strategy that uses the pre-trained language model GPT2 to simulate the interaction between crowd workers by creating a user bot and an agent bot. We train the simulators using a smaller percentage of actual crowd-generated conversations and their corresponding instructions. We demonstrate that by using the simulated data, we achieve significant improvements in low-resource settings on two publicly available datasets: the MultiWOZ dataset and the Persona chat dataset.
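
The user-bot and agent-bot interaction described in the abstract can be pictured as a simple alternating loop. The sketch below is illustrative only and is not the authors' implementation: the function `simulate_dialog`, the `[END]` stop marker, the turn limit, and the two toy bots are hypothetical names introduced here, and the choice to show the instruction only to the user bot is an assumption. In the paper the bots are built on GPT2; here they are plain Python callables so the example runs without any model.

```python
from typing import Callable, List

# Hypothetical interfaces: a user bot sees the instruction and the history,
# an agent bot sees only the history (an assumption made for this sketch).
UserBot = Callable[[str, List[str]], str]
AgentBot = Callable[[List[str]], str]


def simulate_dialog(instruction: str,
                    user_bot: UserBot,
                    agent_bot: AgentBot,
                    max_turns: int = 10,
                    end_token: str = "[END]") -> List[str]:
    """Alternate user and agent turns until the user ends or max_turns is hit."""
    history: List[str] = []
    for _ in range(max_turns):
        user_utterance = user_bot(instruction, history)
        history.append("USER: " + user_utterance)
        if end_token in user_utterance:
            break  # the user bot signalled that the task is finished
        agent_utterance = agent_bot(history)
        history.append("AGENT: " + agent_utterance)
    return history


# Toy stand-ins for the learned bots (placeholders, not GPT2).
def scripted_user_bot(instruction: str, history: List[str]) -> str:
    script = [
        "I need help with this: " + instruction,
        "Yes, please book it.",
        "Thanks, bye. [END]",
    ]
    turn = len(history) // 2  # two utterances are added per completed turn
    return script[min(turn, len(script) - 1)]


def fixed_agent_bot(history: List[str]) -> str:
    return "Understood. Anything else?"


if __name__ == "__main__":
    for line in simulate_dialog("Book a table for two at an Italian restaurant.",
                                scripted_user_bot, fixed_agent_bot):
        print(line)
```

In a setup like the one the abstract describes, each generated dialog would be paired with its instruction and added to the training data for the downstream dialog system; replacing the toy callables with trained language-model samplers would not change the loop itself.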
