Paper Title
Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions
Paper Authors
Paper Abstract
The use of language-model-based question-answering systems to aid humans in completing difficult tasks is limited, in part, by the unreliability of the text these systems generate. Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human judges to perform more accurately, even when one of the arguments is unreliable and deceptive. If this is helpful, we may be able to increase our justified trust in language-model-based systems by asking them to produce these arguments where needed. Previous research has shown that just a single turn of arguments in this format is not helpful to humans. However, as debate settings are characterized by a back-and-forth dialogue, we follow up on previous results to test whether adding a second round of counter-arguments is helpful to humans. We find that, regardless of whether they have access to arguments or not, humans perform similarly on our task. These findings suggest that, in the case of answering reading comprehension questions, debate is not a helpful format.