Graph Perceiver IO: A General Architecture for Graph Structured Data

Paper Title

Graph Perceiver IO: A General Architecture for Graph Structured Data

Authors

Bae, Seyun, Byun, Hoyoon, Oh, Changdae, Cho, Yoon-Sik, Song, Kyungwoo

Abstract

Multimodal machine learning has been widely studied for the development of general intelligence. Recently, the Perceiver and Perceiver IO have shown competitive results across diverse dataset domains and tasks. However, these works have focused on heterogeneous modalities such as images and text, and there is little research on graph-structured datasets. Unlike text and images, a graph comes with an adjacency matrix, and handling its topological information is not trivial. In this study, we propose Graph Perceiver IO (GPIO), a Perceiver IO for graph-structured datasets. We keep the main structure of GPIO the same as that of Perceiver IO, because Perceiver IO already handles diverse datasets well, except for graph-structured data. GPIO is a general method that handles diverse datasets, such as graph-structured data, text, and images, by leveraging positional encoding and output query smoothing. Compared to graph neural networks (GNNs), GPIO has lower complexity and can efficiently incorporate both global and local information, which we also validate empirically. Furthermore, we propose GPIO+ for multimodal few-shot classification, which incorporates images and graphs simultaneously. GPIO achieves higher benchmark accuracy than GNNs across multiple tasks, including graph classification, node classification, and multimodal text classification, and also attains higher AP and AUC in link prediction. Additionally, GPIO+ outperforms GNNs in multimodal few-shot classification. Our GPIO(+) can serve as a general architecture for handling various modalities and tasks.
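The following is a minimal NumPy sketch of the general Perceiver IO pattern applied to a graph: nodes are encoded into a small latent array by cross-attention, and per-node output queries attend back to the latents. Several details are assumptions, not taken from the abstract. The positional encoding is Laplacian eigenvectors. "Output query smoothing" is modeled as repeated multiplication by the symmetrically normalized adjacency matrix with self-loops. All weights are random rather than trained, and latent self-attention blocks are omitted. All function names are hypothetical. The sketch illustrates only the data flow, not the authors' implementation.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Assumed PE: k smallest non-trivial eigenvectors of the normalized Laplacian."""
    deg = adj.sum(axis=1)
    # Isolated nodes (degree 0) get a zero scaling factor instead of dividing by zero.
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(adj.shape[0]) - adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return vecs[:, 1:k + 1]        # drop the first (trivial) eigenvector


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, inputs, w_q, w_k, w_v):
    """Single-head cross-attention; the score matrix has shape (len(queries), len(inputs))."""
    q, k, v = queries @ w_q, inputs @ w_k, inputs @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def gpio_like_forward(x, adj, num_latents=4, d=8, pe_dim=2, hops=1, seed=0):
    """Hypothetical encode/decode pass; random weights stand in for learned ones."""
    rng = np.random.default_rng(seed)
    pe = laplacian_pe(adj, pe_dim)
    inputs = np.concatenate([x, pe], axis=1)  # node features + positional encoding
    f_in = inputs.shape[1]

    # Encode: M latents attend to all N nodes, so scores are (M, N).
    latents = rng.normal(size=(num_latents, d))
    z = cross_attention(latents, inputs, rng.normal(size=(d, d)),
                        rng.normal(size=(f_in, d)), rng.normal(size=(f_in, d)))

    # Output queries: one per node, built from the PE, then smoothed over neighbours.
    out_q = pe @ rng.normal(size=(pe_dim, d))
    a_norm = normalized_adjacency(adj)
    for _ in range(hops):
        out_q = a_norm @ out_q

    # Decode: each node query attends to the M latents, so scores are (N, M).
    return cross_attention(out_q, z, rng.normal(size=(d, d)),
                           rng.normal(size=(d, d)), rng.normal(size=(d, d)))


if __name__ == "__main__":
    # 5-node path graph with 3 random features per node.
    adj = np.zeros((5, 5))
    for i in range(4):
        adj[i, i + 1] = adj[i + 1, i] = 1.0
    feats = np.random.default_rng(1).normal(size=(5, 3))
    print(gpio_like_forward(feats, adj).shape)  # (5, 8): one d-dim output per node
```

With the number of latents fixed, both attention score matrices grow linearly in the number of nodes. Smoothing the output queries with the adjacency matrix is one simple way to inject local topology into an otherwise global attention decoder. For the positional encoding and smoothing actually used in GPIO, see the paper.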
