Paper Title

Deformable Graph Transformer

Authors

Jinyoung Park, Seongjun Yun, Hyeonjin Park, Jaewoo Kang, Jisu Jeong, Kyung-Min Kim, Jung-woo Ha, Hyunwoo J. Kim

Abstract

Transformer-based models have recently shown success in representation learning on graph-structured data beyond natural language processing and computer vision. However, this success has been limited to small-scale graphs due to the drawbacks of full dot-product attention on graphs, such as quadratic complexity with respect to the number of nodes and message aggregation from a large number of irrelevant nodes. To address these issues, we propose the Deformable Graph Transformer (DGT), which performs sparse attention over dynamically sampled relevant nodes, efficiently handling large-scale graphs with linear complexity in the number of nodes. Specifically, our framework first constructs multiple node sequences under various criteria to capture both structural and semantic proximity. Then, combined with our learnable Katz Positional Encoding, sparse attention is applied to the node sequences to learn node representations at a significantly reduced computational cost. Extensive experiments demonstrate that DGT achieves state-of-the-art performance on 7 graph benchmark datasets with 2.5 to 449 times less computational cost than transformer-based graph models with full attention.
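The abstract combines two ingredients: a Katz-based positional encoding and sparse attention in which each query node attends only to a small set of sampled relevant nodes. Below is a minimal, illustrative sketch of how these pieces could fit together. The sampling rule (top-k nodes by Katz proximity), the fixed (non-learnable) Katz bias, and all dimensions are assumptions made for illustration, not the paper's exact design; in particular, the paper's node sampling and Katz encoding are learnable, and the dense matrix-inverse Katz computation here is only suitable for toy-scale graphs.

```python
# Illustrative sketch (assumed details, not the authors' implementation):
# (1) Katz proximity scores and (2) sparse attention over k sampled
# key nodes per query, so attention costs O(N * k) instead of O(N^2).
import torch
import torch.nn.functional as F


def katz_matrix(adj: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Closed-form Katz index: K = (I - beta * A)^{-1} - I.
    Dense O(N^3) inverse; fine for a toy example, not for large graphs."""
    n = adj.size(0)
    eye = torch.eye(n, device=adj.device)
    return torch.linalg.inv(eye - beta * adj) - eye


def sparse_node_attention(x: torch.Tensor, adj: torch.Tensor,
                          k: int = 8, beta: float = 0.1) -> torch.Tensor:
    """Each node attends only to its k highest-Katz-proximity nodes
    (an assumed stand-in for the paper's learned dynamic sampling)."""
    n, d = x.shape
    katz = katz_matrix(adj, beta)                  # (N, N) proximity scores
    idx = katz.topk(k, dim=-1).indices             # (N, k) sampled key nodes
    keys = x[idx]                                  # (N, k, d) gathered keys
    vals = x[idx]                                  # (N, k, d) gathered values
    # Katz scores of the sampled nodes serve as a fixed positional bias
    # here; the paper's Katz Positional Encoding is learnable.
    pe = katz.gather(1, idx)                       # (N, k)
    logits = (keys @ x.unsqueeze(-1)).squeeze(-1) / d ** 0.5 + pe
    attn = F.softmax(logits, dim=-1)               # (N, k) per-query weights
    return (attn.unsqueeze(-1) * vals).sum(dim=1)  # (N, d) node representations


# Toy usage: a 5-node path graph with random 16-dim node features.
A = torch.zeros(5, 5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = torch.randn(5, 16)
out = sparse_node_attention(X, A, k=3)
print(out.shape)  # torch.Size([5, 16])
```

The key point the sketch illustrates is the complexity claim: because every query attends to a fixed number k of sampled keys rather than all N nodes, the attention step scales linearly in the number of nodes, which is what makes full-graph training on large-scale graphs tractable.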
