Paper title
Scrutinizer: A Mixed-Initiative Approach to Large-Scale, Data-Driven Claim Verification
Paper authors
Paper abstract
Organizations such as the International Energy Agency (IEA) spend significant amounts of time and money to manually fact-check text documents summarizing data. The goal of the Scrutinizer system is to reduce verification overheads by supporting human fact checkers in translating text claims into SQL queries on an associated database. Scrutinizer coordinates teams of human fact checkers. It reduces verification time by proposing queries or query fragments to the users. Those proposals are based on claim text classifiers that gradually improve during the verification of a large document. In addition, Scrutinizer uses tentative execution of query candidates to narrow down the set of alternatives. The verification process is controlled by a cost-based optimizer. It optimizes the interaction with users and prioritizes claim verifications. For the latter, it considers expected verification overheads as well as the expected claim utility as training samples for the classifiers. We evaluate the Scrutinizer system using simulations and a user study, based on actual claims and data and using professional fact checkers employed by IEA. Our experiments consistently demonstrate significant savings in verification time, without reducing result accuracy.
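The abstract does not specify how tentative execution is implemented. As a minimal sketch of the general idea, assuming each candidate query returns a single numeric value and that a candidate survives when its result lies within a relative tolerance of the value stated in the claim, the narrowing step might look as follows. The function name `narrow_candidates`, the `energy` table, its figures, and the 5% tolerance are all hypothetical and are not taken from the paper.

```python
import sqlite3


def narrow_candidates(conn, candidates, claimed_value, rel_tol=0.05):
    """Run each candidate query and keep those whose result is close to the claim.

    Hypothetical helper: the matching rule (relative tolerance on a single
    numeric result) is an assumption made for illustration only.
    """
    kept = []
    for sql in candidates:
        try:
            row = conn.execute(sql).fetchone()
        except sqlite3.Error:
            continue  # a candidate that fails to execute is discarded
        if row is None or row[0] is None:
            continue  # no result to compare against the claim
        result = float(row[0])
        if abs(result - claimed_value) <= rel_tol * abs(claimed_value):
            kept.append((sql, result))
    return kept


# Toy in-memory database with made-up figures (not IEA data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE energy (country TEXT, year INTEGER, demand REAL)")
conn.executemany(
    "INSERT INTO energy VALUES (?, ?, ?)",
    [("A", 2016, 100.0), ("A", 2017, 103.0), ("B", 2016, 200.0), ("B", 2017, 190.0)],
)

# Candidate translations of a claim such as "demand in A reached 103 in 2017".
candidates = [
    "SELECT demand FROM energy WHERE country = 'A' AND year = 2017",
    "SELECT demand FROM energy WHERE country = 'B' AND year = 2017",
    "SELECT SUM(demand) FROM energy WHERE year = 2017",
]

survivors = narrow_candidates(conn, candidates, claimed_value=103.0)
for sql, value in survivors:
    print(value, sql)
```

In this toy setting only the first candidate survives, so a fact checker would be shown one query to confirm instead of three. A real system would also have to handle claims that are false, where no candidate matches the stated value.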