Paper Title
CrossWOZ: A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset
Authors
Abstract
To advance multi-domain (cross-domain) dialogue modeling and to alleviate the shortage of Chinese task-oriented datasets, we propose CrossWOZ, the first large-scale Chinese Cross-Domain Wizard-of-Oz task-oriented dataset. It contains 6K dialogue sessions and 102K utterances across 5 domains: hotel, restaurant, attraction, metro, and taxi. Moreover, the corpus contains rich annotation of dialogue states and dialogue acts on both the user and system sides. About 60% of the dialogues have cross-domain user goals that favor inter-domain dependency and encourage natural transitions across domains in conversation. We also provide a user simulator and several benchmark models for pipelined task-oriented dialogue systems, which will help researchers compare and evaluate their models on this corpus. The large size and rich annotation of CrossWOZ make it suitable for investigating a variety of tasks in cross-domain dialogue modeling, such as dialogue state tracking, policy learning, and user simulation.
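As a rough illustration of what dialogue state tracking over several domains involves, the following minimal Python sketch accumulates user dialogue acts into a per-domain state. Only the five domain names come from the abstract. The act layout `(intent, domain, slot, value)`, the slot names, the helper functions, and the toy dialogue are all assumptions made for illustration; they are not the CrossWOZ annotation schema or any of its benchmark models.

```python
# Hypothetical sketch: rule-based dialogue state tracking over five domains.
# The act tuple layout and slot names are illustrative assumptions, not the
# actual CrossWOZ annotation format.

DOMAINS = ("hotel", "restaurant", "attraction", "metro", "taxi")


def empty_state():
    """Create a state with one empty slot dictionary per domain."""
    return {domain: {} for domain in DOMAINS}


def update_state(state, user_acts):
    """Return a copy of the state with every 'inform' act written into it.

    Each act is an assumed (intent, domain, slot, value) tuple. Acts with
    other intents, such as 'request', do not add constraints to the state.
    """
    new_state = {domain: dict(slots) for domain, slots in state.items()}
    for intent, domain, slot, value in user_acts:
        if domain not in new_state:
            raise ValueError(f"unknown domain: {domain}")
        if intent == "inform":
            new_state[domain][slot] = value
    return new_state


def active_domains(state):
    """List the domains that hold at least one constraint, in DOMAINS order."""
    return [domain for domain in DOMAINS if state[domain]]


# Toy user turns: the restaurant constraint refers back to the attraction
# chosen earlier, which is the kind of inter-domain dependency the abstract
# describes. The values are made up.
toy_dialogue = [
    [("inform", "attraction", "rating", "4.5 or higher")],
    [("request", "attraction", "address", "")],
    [("inform", "restaurant", "nearby_attraction", "the chosen attraction")],
]

state = empty_state()
for user_acts in toy_dialogue:
    state = update_state(state, user_acts)

print(active_domains(state))
```

A dialogue whose final state has more than one active domain corresponds to what the abstract calls a cross-domain user goal. A learned tracker would predict these updates from the utterance text instead of reading them from annotated acts.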