ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Paper Title

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Authors

Ahmed Masry, Do Xuan Long, Jia Qing Tan, Shafiq Joty, Enamul Hoque

Abstract

Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions. However, most existing datasets do not focus on such complex reasoning questions, as their questions are template-based and their answers come from a fixed vocabulary. In this work, we present a large-scale benchmark covering 9.6K human-written questions as well as 23.1K questions generated from human-written chart summaries. To address the unique challenges in our benchmark involving visual and logical reasoning over charts, we present two transformer-based models that combine visual features and the data table of the chart in a unified way to answer questions. While our models achieve state-of-the-art results on the previous datasets as well as on our benchmark, the evaluation also reveals several challenges in answering complex reasoning questions.
