Paper Title
PIE: a Parameter and Inference Efficient Solution for Large Scale Knowledge Graph Embedding Reasoning
Paper Authors
Paper Abstract
Knowledge graph (KG) embedding methods, which map entities and relations in the KG to unique embeddings, have shown promising results on many reasoning tasks. However, using the same embedding dimension for both dense entities and sparse entities causes either over-parameterization (sparse entities) or under-fitting (dense entities). Normally, a large dimension is set to obtain better performance. Meanwhile, the inference time grows log-linearly with the number of entities, since all entities are traversed and compared. Both the parameter count and the inference cost become challenges when working with huge numbers of entities. Thus, we propose PIE, a \textbf{p}arameter and \textbf{i}nference \textbf{e}fficient solution. Inspired by tensor decomposition methods, we find that decomposing the entity embedding matrix into low-rank matrices can reduce more than half of the parameters while maintaining comparable performance. To accelerate model inference, we propose a self-supervised auxiliary task, which can be seen as fine-grained entity typing. By randomly masking and recovering entities' connected relations, the task learns the co-occurrence of entities and relations. Utilizing the fine-grained typing, we can filter out unrelated entities during inference and obtain targets in possibly sub-linear time. Experiments on link prediction benchmarks demonstrate the proposed key capabilities. Moreover, we demonstrate the effectiveness of the proposed solution on the Open Graph Benchmark large-scale challenge dataset WikiKG90Mv2 and achieve state-of-the-art performance.
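The abstract's two efficiency ideas can be illustrated with a minimal sketch. The snippet below is not the authors' implementation; it assumes PyTorch, and all names (LowRankEntityEmbedding, RelationTypingHead, rank, threshold, filter_candidates) are illustrative placeholders for the low-rank factorization of the entity embedding matrix and the relation-co-occurrence filtering described above.

```python
# Minimal sketch of the two ideas in the abstract; not the authors' code.
# Assumes PyTorch; module and argument names are illustrative.
import torch
import torch.nn as nn


class LowRankEntityEmbedding(nn.Module):
    """Factorize the (num_entities x dim) embedding matrix into two
    low-rank factors, as suggested by 'decomposing the entity embedding
    matrix into low-rank matrices'."""

    def __init__(self, num_entities: int, dim: int, rank: int):
        super().__init__()
        # num_entities * rank + rank * dim parameters instead of num_entities * dim.
        self.left = nn.Embedding(num_entities, rank)
        self.right = nn.Parameter(torch.randn(rank, dim) * 0.02)

    def forward(self, entity_ids: torch.Tensor) -> torch.Tensor:
        # Reconstruct full-dimensional embeddings only for the queried entities.
        return self.left(entity_ids) @ self.right


class RelationTypingHead(nn.Module):
    """Self-supervised 'fine-grained entity typing': predict which relations
    co-occur with an entity (a multi-label objective over relations, trained
    by randomly masking some of an entity's connected relations and
    recovering them)."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.proj = nn.Linear(dim, num_relations)

    def forward(self, entity_emb: torch.Tensor) -> torch.Tensor:
        # Logits over relations for each entity embedding.
        return self.proj(entity_emb)


def filter_candidates(typing_logits: torch.Tensor, query_relation: int,
                      threshold: float = 0.5) -> torch.Tensor:
    """Keep only candidate entities whose predicted relation types include
    the query relation, so link-prediction scoring runs over a (possibly
    much) smaller candidate set."""
    probs = torch.sigmoid(typing_logits[:, query_relation])
    return torch.nonzero(probs > threshold, as_tuple=False).squeeze(-1)
```

As a purely illustrative parameter count (not a number from the paper): with 90M entities and dimension 768, a full embedding table holds about 69B parameters, while a rank-200 factorization needs roughly 90M x 200 + 200 x 768, about 18B.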