Paper title

Sense disambiguation of compound constituents

Authors

Carlo Schackow, Stefan Conrad, Ingo Plag

Abstract

In distributional semantic accounts of the meaning of noun-noun compounds (e.g. starfish, bank account, houseboat), the important role of constituent polysemy remains largely unaddressed (cf. the meaning of star in starfish vs. star cluster vs. star athlete). Instead of semantic vectors that average over the different meanings of a constituent, disambiguated vectors of the constituents would be needed in order to see what these more specific constituent meanings contribute to the meaning of the compound as a whole. This paper presents a novel approach to this specific problem of word sense disambiguation: set expansion. We build on the approach developed by Mahabal et al. (2018), which was originally designed to solve the analogy problem. We modified their method so that it can address the problem of sense disambiguation of compound constituents. The results of experiments with a data set of almost 9000 compounds (LADEC, Gagné et al. 2019) suggest that this approach is successful, yet the success is sensitive to the frequency with which the compounds are attested.
