Improving Wikipedia Verifiability with AI

Paper title

Improving Wikipedia Verifiability with AI

Authors

Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

Abstract

Verifiability is a core content policy of Wikipedia: claims that are likely to be challenged need to be backed by citations. There are millions of articles available online and thousands of new articles are released each month. For this reason, finding relevant sources is a difficult task: many claims do not have any references that support them. Furthermore, even existing citations might not support a given claim or become obsolete once the original source is updated or deleted. Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort. Here, we show that the process of improving references can be tackled with the help of artificial intelligence (AI). We develop a neural network based system, called Side, to identify Wikipedia citations that are unlikely to support their claims, and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, therefore learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowd-sourcing, we observe that for the top 10% most likely citations to be tagged as unverifiable by our system, humans prefer our system's suggested alternatives compared to the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and find that Side's first citation recommendation collects over 60% more preferences than existing Wikipedia citations for the same top 10% most likely unverifiable claims according to Side. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia. More generally, we hope that our work can be used to assist fact checking efforts and increase the general trustworthiness of information online.
