KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Paper Title

KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation

Authors

Hao Zhou, Chujie Zheng, Kaili Huang, Minlie Huang, Xiaoyan Zhu

Abstract

Research on knowledge-driven conversational systems is largely limited by the lack of dialogue data that consists of multi-turn conversations on multiple topics and carries knowledge annotations. In this paper, we propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs. Our corpus contains 4.5K conversations from three domains (film, music, and travel) and 86K utterances, with an average turn number of 19.0. These conversations contain in-depth discussions on related topics and natural transitions between multiple topics. To facilitate further research on this corpus, we provide several benchmark models. Comparative results show that the models can be enhanced by introducing background knowledge, yet there is still much room for further research on leveraging knowledge to model multi-turn conversations. Results also show obvious performance differences between domains, indicating that transfer learning and domain adaptation are worth exploring further. The corpus and benchmark models are publicly available.
