Counterfactual Off-Policy Training for Neural Response Generation

Paper Title

Counterfactual Off-Policy Training for Neural Response Generation

Authors

Qingfu Zhu, Weinan Zhang, Ting Liu, William Yang Wang

Abstract

Open-domain dialogue generation suffers from the data insufficiency problem due to the vast size of potential responses. In this paper, we propose to explore potential responses by counterfactual reasoning. Given an observed response, the counterfactual reasoning model automatically infers the outcome of an alternative policy that could have been taken. The resulting counterfactual response synthesized in hindsight is of higher quality than the response synthesized from scratch. Training on the counterfactual responses under the adversarial learning framework helps to explore the high-reward area of the potential response space. An empirical study on the DailyDialog dataset shows that our approach significantly outperforms the HRED model as well as the conventional adversarial learning approaches.
