Paper Title

TabTransformer: Tabular Data Modeling Using Contextual Embeddings

Authors

Xin Huang, Ashish Khetan, Milan Cvitkovic, Zohar Karnin

Abstract

We propose TabTransformer, a novel deep tabular data modeling architecture for supervised and semi-supervised learning. The TabTransformer is built upon self-attention based Transformers. The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Through extensive experiments on fifteen publicly available datasets, we show that the TabTransformer outperforms the state-of-the-art deep learning methods for tabular data by at least 1.0% on mean AUC, and matches the performance of tree-based ensemble models. Furthermore, we demonstrate that the contextual embeddings learned from TabTransformer are highly robust against both missing and noisy data features, and provide better interpretability. Lastly, for the semi-supervised setting we develop an unsupervised pre-training procedure to learn data-driven contextual embeddings, resulting in an average 2.1% AUC lift over the state-of-the-art methods.
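
The abstract describes the core architecture: each categorical feature is mapped to an embedding, the per-column embeddings are passed through self-attention Transformer layers to produce contextual embeddings, and these are combined with the continuous features for prediction. Below is a minimal PyTorch sketch of that idea; all names, dimensions, and hyperparameters (`TabTransformerSketch`, `dim=32`, `depth=6`, etc.) are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of the TabTransformer idea from the abstract (PyTorch).
# Names, dimensions, and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class TabTransformerSketch(nn.Module):
    def __init__(self, cat_cardinalities, num_continuous,
                 dim=32, depth=6, heads=8, num_classes=2):
        super().__init__()
        # One embedding table per categorical column.
        self.embeds = nn.ModuleList(
            [nn.Embedding(card, dim) for card in cat_cardinalities]
        )
        # Self-attention Transformer layers turn the per-column
        # embeddings into contextual embeddings.
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(num_continuous)
        # Final MLP consumes the flattened contextual embeddings
        # concatenated with the normalized continuous features.
        mlp_in = dim * len(cat_cardinalities) + num_continuous
        self.mlp = nn.Sequential(
            nn.Linear(mlp_in, 4 * mlp_in), nn.ReLU(),
            nn.Linear(4 * mlp_in, num_classes),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_cat) integer codes; x_cont: (batch, n_cont) floats.
        tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.embeds)], dim=1
        )                                   # (batch, n_cat, dim)
        ctx = self.transformer(tokens)      # contextual embeddings
        flat = ctx.flatten(1)               # (batch, n_cat * dim)
        return self.mlp(torch.cat([flat, self.norm(x_cont)], dim=1))


# Tiny smoke test with random data.
model = TabTransformerSketch(cat_cardinalities=[5, 10, 3], num_continuous=4)
x_cat = torch.randint(0, 3, (8, 3))
x_cont = torch.randn(8, 4)
print(model(x_cat, x_cont).shape)  # torch.Size([8, 2])
```

Note that only the categorical embeddings pass through the attention stack, so the sequence length equals the number of categorical columns; the continuous features bypass the Transformer and join at the final MLP, matching the description in the abstract.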
