Paper Title
Taming Continuous Posteriors for Latent Variational Dialogue Policies
Paper Authors
Paper Abstract
Utilizing amortized variational inference for latent-action reinforcement learning (RL) has been shown to be an effective approach in Task-oriented Dialogue (ToD) systems for optimizing dialogue success. Until now, categorical posteriors have been argued to be one of the main drivers of performance. In this work, we revisit Gaussian variational posteriors for latent-action RL and show that they can yield even better performance than categorical posteriors. We achieve this by simplifying the training procedure and by proposing ways to regularize the latent dialogue policy so that it retains good response coherence. Using continuous latent representations, our model achieves a state-of-the-art dialogue success rate on the MultiWOZ benchmark, and also compares well to categorical latent methods in response coherence.
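To make the core mechanism concrete, below is a minimal PyTorch sketch of the general idea the abstract refers to: an amortized Gaussian posterior over a continuous latent action, sampled with the reparameterization trick and regularized by a closed-form KL term toward a standard normal prior. All names and sizes here (GaussianLatentPolicy, hidden_dim, latent_dim) are hypothetical illustrations, not the paper's actual architecture or training procedure.

```python
import torch
import torch.nn as nn


class GaussianLatentPolicy(nn.Module):
    """Illustrative sketch of an amortized Gaussian posterior q(z | context)
    over a continuous latent action z, as used in latent-action RL for ToD.
    Dimensions and names are hypothetical."""

    def __init__(self, hidden_dim: int = 256, latent_dim: int = 32):
        super().__init__()
        # Encoder heads producing the mean and log-variance of q(z | context).
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.log_var = nn.Linear(hidden_dim, latent_dim)

    def forward(self, context: torch.Tensor):
        mu, log_var = self.mu(context), self.log_var(context)
        # Reparameterization trick: z = mu + sigma * eps keeps the sampling
        # step differentiable, so the latent policy trains end-to-end.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        # Closed-form KL(q(z | context) || N(0, I)), summed over latent dims;
        # this is the standard regularizer keeping the latent space well-behaved.
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)
        return z, kl
```

A downstream response decoder would then condition on the sampled z, with the KL term added to the training objective; the continuous parameterization is what the paper contrasts against categorical latent posteriors.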