Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning

Paper Title

Post-processing Networks: Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning

Authors

Atsumoto Ohashi, Ryuichiro Higashinaka

Abstract

Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline task-oriented dialogue system by jointly training modules in the system using reinforcement learning. However, these methods are limited in that they can only be applied to modules implemented using trainable neural-based methods. To solve this problem, we propose a method for optimizing a pipeline system composed of modules implemented with arbitrary methods for dialogue performance. With our method, neural-based components called post-processing networks (PPNs) are installed inside such a system to post-process the output of each module. All PPNs are updated to improve the overall dialogue performance of the system by using reinforcement learning, not necessitating each module to be differentiable. Through dialogue simulation and human evaluation on the MultiWOZ dataset, we show that our method can improve the dialogue performance of pipeline systems consisting of various modules.
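
The idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the module, the reward function, the class `ToyPPN`, and every parameter below are hypothetical, and plain REINFORCE with a running baseline stands in for whatever reinforcement-learning algorithm the paper actually uses. The sketch only shows that a small trainable component placed after a fixed, non-differentiable module can learn to correct that module's output from a scalar reward alone.

```python
import math
import random


def rule_based_module(user_acts):
    """Hypothetical hand-written module: echoes the acts but never emits act 0."""
    out = list(user_acts)
    out[0] = 0  # systematic flaw; the module has no gradients to fix it with
    return out


def simulated_reward(module_out, final_out):
    """Toy stand-in for dialogue success (an assumption, not the paper's reward):
    act 0 should always be on, and the other acts should be left unchanged."""
    target = [1] + list(module_out[1:])
    return sum(t == o for t, o in zip(target, final_out)) / len(target)


class ToyPPN:
    """Toy post-processing network: one Bernoulli policy per output bit,
    conditioned on the bit value the upstream module produced."""

    def __init__(self, n_bits, lr=0.5, seed=0):
        # logits[i][b]: logit of emitting 1 for bit i when the module emitted b.
        # Initialised close to the identity mapping (keep the module's output).
        self.logits = [[-2.0, 2.0] for _ in range(n_bits)]
        self.lr = lr
        self.rng = random.Random(seed)
        self.baseline = 0.0  # running mean of rewards, used to reduce variance

    def prob_one(self, i, b):
        return 1.0 / (1.0 + math.exp(-self.logits[i][b]))

    def sample(self, module_out):
        return [1 if self.rng.random() < self.prob_one(i, b) else 0
                for i, b in enumerate(module_out)]

    def greedy(self, module_out):
        return [1 if self.prob_one(i, b) > 0.5 else 0
                for i, b in enumerate(module_out)]

    def update(self, module_out, action, reward):
        # REINFORCE: d/dz log Bernoulli(a; sigmoid(z)) = a - sigmoid(z)
        advantage = reward - self.baseline
        self.baseline += 0.05 * (reward - self.baseline)
        for i, (b, a) in enumerate(zip(module_out, action)):
            self.logits[i][b] += self.lr * advantage * (a - self.prob_one(i, b))


def train(ppn, episodes=3000):
    """Run simulated episodes; only the PPN is updated, never the module."""
    for _ in range(episodes):
        user_acts = [ppn.rng.randint(0, 1) for _ in range(len(ppn.logits))]
        module_out = rule_based_module(user_acts)
        action = ppn.sample(module_out)
        ppn.update(module_out, action, simulated_reward(module_out, action))
    return ppn
```

Calling `train(ToyPPN(n_bits=3))` and then `greedy(rule_based_module([1, 1, 0]))` on the trained object shows whether the post-processor has learned to restore the act that the fixed module drops. In a real pipeline, one such component would sit after each module, and the reward would come from a user simulator or human evaluation rather than a hand-written target.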
