Paper Title
The Analysis about Building Cross-lingual Sememe Knowledge Base Based on Deep Clustering Network
Authors
Abstract
A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain words annotated with sememes, have been successfully applied to many NLP tasks, and we believe that learning the smallest units of meaning can help computers understand human language more easily. However, existing sememe KBs are built solely by manual annotation. Human annotations carry personal biases in understanding, the meanings of words are constantly updated and change over time, and manual methods are not always practical. To address this issue, we propose an unsupervised method based on a deep clustering network (DCN) for building a sememe KB; the method can be used to build a KB for any language. We first learn distributed representations of multilingual words and use MUSE to align them in a single vector space, then learn the multiple layers of meaning of each word through a self-attention mechanism, and then use the DCN to cluster sememe features. Finally, we perform prediction using only a 10-dimensional sememe space in English and find that this low-dimensional space still retains the main features of the sememes.
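Two steps of the pipeline summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `procrustes_align` and `dcn_style_loss`, the array shapes, and the weight `lam` are hypothetical, and the random vectors stand in for real word embeddings. The first function shows the closed-form orthogonal Procrustes mapping, which is one way to align two embedding spaces when paired words are available (MUSE also offers an unsupervised mode that is not shown here). The second evaluates a DCN-style objective, assuming it takes the form of a reconstruction error plus a k-means penalty in the latent space.

```python
import numpy as np


def procrustes_align(src, tgt):
    """Return the orthogonal W that minimizes ||src @ W - tgt||_F.

    src and tgt hold embeddings of paired words, one pair per row.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def dcn_style_loss(x, x_recon, z, centroids, assign, lam=1.0):
    """Reconstruction error plus a k-means penalty on latent codes.

    x, x_recon : inputs and their autoencoder reconstructions
    z          : latent codes (e.g. 10-dimensional sememe features)
    centroids  : cluster centers in the latent space
    assign     : index of the centroid assigned to each row of z
    lam        : hypothetical weight balancing the two terms
    """
    recon = np.sum((x - x_recon) ** 2)
    cluster = np.sum((z - centroids[assign]) ** 2)
    return recon + 0.5 * lam * cluster


# Toy check of the alignment: recover a known rotation between two spaces.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 20))                  # stand-in source embeddings
Q, _ = np.linalg.qr(rng.normal(size=(20, 20)))   # random orthogonal matrix
tgt = src @ Q                                    # "target language" space
W = procrustes_align(src, tgt)
aligned = src @ W                                # source mapped into target space
```

In a full system the latent codes `z` and the reconstructions would come from a trained autoencoder, and the assignments and centroids would be updated alternately with the network weights; the sketch only evaluates the loss for given values.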