FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Paper Title

FaithDial: A Faithful Benchmark for Information-Seeking Dialogue

Authors

Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, Edoardo M. Ponti, Siva Reddy

Abstract

The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts the performance by 12.8 F1 score on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.
