Paper Title
OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering
Paper Authors
Paper Abstract
The information in tables can be an important complement to text, making table-based question answering (QA) systems of great value. The intrinsic complexity of handling tables often adds an extra burden to both model design and data annotation. In this paper, we aim to develop a simple table-based QA model with minimal annotation effort. Motivated by the fact that table-based QA requires both alignment between questions and tables and the ability to perform complicated reasoning over multiple table elements, we propose an omnivorous pretraining approach that consumes both natural and synthetic data to endow models with these respective abilities. Specifically, given freely available tables, we leverage retrieval to pair them with relevant natural sentences for mask-based pretraining, and synthesize NL questions by converting SQL sampled from tables for pretraining with a QA loss. We perform extensive experiments in both few-shot and full settings, and the results clearly demonstrate the superiority of our model OmniTab, with the best multitasking approach achieving an absolute gain of 16.2% and 2.7% in 128-shot and full settings respectively, also establishing a new state-of-the-art on WikiTableQuestions. Detailed ablations and analyses reveal different characteristics of natural and synthetic data, shedding light on future directions in omnivorous pretraining. Code, pretraining data, and pretrained models are available at https://github.com/jzbjyb/OmniTab.
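Below is a minimal sketch of how a released OmniTab checkpoint might be queried for table-based QA with Hugging Face Transformers. The model id "neulab/omnitab-large-finetuned-wtq", the TAPEX-style tokenizer interface (which linearizes a pandas table together with the natural-language question), and the toy table are assumptions for illustration, not details stated in the abstract.

```python
# A minimal sketch, assuming the released checkpoints are hosted on Hugging Face
# under an id such as "neulab/omnitab-large-finetuned-wtq" and use a TAPEX-style
# tokenizer that accepts a pandas table plus a natural-language question.
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "neulab/omnitab-large-finetuned-wtq"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Toy table; the tokenizer flattens it together with the question into one sequence.
table = pd.DataFrame({
    "year": [1896, 1900, 2008, 2012],
    "city": ["athens", "paris", "beijing", "london"],
})
question = "in which year did beijing host the olympic games?"

encoding = tokenizer(table=table, query=question, return_tensors="pt")
answer_ids = model.generate(**encoding, max_length=32)
print(tokenizer.batch_decode(answer_ids, skip_special_tokens=True))
# The gold answer for this toy table is "2008".
```

Because the model is a standard seq2seq generator over a flattened (table, text) input, the natural retrieved sentences and the synthetic SQL-derived questions described in the abstract can both be consumed as additional training pairs in the same format.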