Paper Title
TableFormer: Robust Transformer Modeling for Table-Text Encoding
Paper Authors
Paper Abstract
Understanding tables is an important aspect of natural language understanding. Existing models for table understanding require linearization of the table structure, where row or column order is encoded as an unwanted bias. Such spurious biases make the model vulnerable to row and column order perturbations. Additionally, prior work has not thoroughly modeled table structures or table-text alignments, hindering table-text understanding ability. In this work, we propose TableFormer, a robust and structurally aware table-text encoding architecture in which tabular structural biases are incorporated entirely through learnable attention biases. TableFormer (1) is strictly invariant to row and column order, and (2) can understand tables better due to its tabular inductive biases. Our evaluations show that TableFormer outperforms strong baselines in all settings on the SQA, WTQ, and TabFact table reasoning datasets, and achieves state-of-the-art performance on SQA, especially under answer-invariant row and column order perturbations (a 6% improvement over the best baseline): previous SOTA models' performance drops by 4% to 6% under such perturbations, while TableFormer is unaffected.
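To make the core idea concrete, below is a minimal sketch (not the authors' code) of self-attention in which learnable, per-relation bias scalars are added to the attention logits, as the abstract describes. The relation vocabulary (e.g., "same row", "same column", "cell-to-header") and names such as `num_relations` and `relation_ids` are hypothetical placeholders for illustration only.

```python
# Sketch of attention with learnable table-text relation biases.
# Assumption: pairwise structural relations between tokens are precomputed
# as integer ids; the actual relation set used by TableFormer may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasedSelfAttention(nn.Module):
    def __init__(self, d_model: int, num_relations: int):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # One learnable scalar bias per relation type between token pairs.
        self.relation_bias = nn.Embedding(num_relations, 1)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor, relation_ids: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        # relation_ids: (batch, seq, seq) integer ids encoding the
        # structural relation between each pair of tokens.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = torch.einsum("bid,bjd->bij", q, k) * self.scale
        # Add the learnable structural bias to each pairwise logit. Because
        # the bias depends only on the relation type, not on absolute row or
        # column positions, the attention pattern is unchanged when rows or
        # columns are reordered.
        logits = logits + self.relation_bias(relation_ids).squeeze(-1)
        return torch.einsum("bij,bjd->bid", F.softmax(logits, dim=-1), v)
```

In this sketch, order invariance follows from replacing absolute row/column position encodings with pair-wise relation biases: permuting rows or columns permutes the tokens and `relation_ids` consistently, so each token receives the same attention output up to that permutation.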