Paper Title
Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
Paper Authors
Paper Abstract
Many studies have applied reinforcement learning to train a dialog policy and have shown great promise in recent years. One common approach is to employ a user simulator to obtain a large number of simulated user experiences for reinforcement learning algorithms. However, modeling a realistic user simulator is challenging. A rule-based simulator requires heavy domain expertise for complex tasks, and a data-driven simulator requires considerable data; it is even unclear how to evaluate such a simulator. To avoid explicitly building a user simulator beforehand, we propose Multi-Agent Dialog Policy Learning, which regards both the system and the user as dialog agents. The two agents interact with each other and are jointly learned simultaneously. The method uses the actor-critic framework to facilitate pretraining and improve scalability. We also propose a Hybrid Value Network for role-aware reward decomposition, which integrates the role-specific domain knowledge of each agent in the task-oriented dialog. Results show that our method can successfully build a system policy and a user policy simultaneously, and that the two agents can achieve a high task success rate through conversational interaction.
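The abstract names a Hybrid Value Network that decomposes the critic's value estimate into a shared, task-level component and role-specific components for the system agent and the user agent. The sketch below only illustrates that decomposition idea in PyTorch; the module name, layer sizes, the shared encoder, and the way the components are combined into each agent's critic target are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class HybridValueNetwork(nn.Module):
    """Illustrative role-aware value decomposition (assumed structure).

    The value of a dialog state is split into a global component shared by
    both agents plus role-specific components for the system and the user,
    so each agent's critic can mix the shared task reward with its own
    role-specific reward signal.
    """

    def __init__(self, state_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Shared state encoder feeding all value heads (an assumption here).
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        # One head per reward component: global (task-level), system, user.
        self.v_global = nn.Linear(hidden_dim, 1)
        self.v_system = nn.Linear(hidden_dim, 1)
        self.v_user = nn.Linear(hidden_dim, 1)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        return self.v_global(h), self.v_system(h), self.v_user(h)


if __name__ == "__main__":
    # Toy usage: estimate decomposed values for a batch of dialog states.
    net = HybridValueNetwork(state_dim=64)
    states = torch.randn(8, 64)
    v_g, v_sys, v_usr = net(states)
    # Each agent's critic target could combine the global value with its own
    # role-specific value, e.g. V_system = v_g + v_sys (illustrative only).
    print(v_g.shape, v_sys.shape, v_usr.shape)
```

With such a decomposition, the system and user policies can be trained in an actor-critic loop against their own combined value estimates while still sharing the task-level signal; the exact combination rule above is a hypothetical choice for the sketch.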