Paper Title
Contextualized Word Embeddings Encode Aspects of Human-Like Word Sense Knowledge
Paper Authors
Paper Abstract
Understanding context-dependent variation in word meanings is a key aspect of human language comprehension supported by the lexicon. Lexicographic resources (e.g., WordNet) capture only some of this context-dependent variation; for example, they often do not encode how closely senses, or discretized word meanings, are related to one another. Our work investigates whether recent advances in NLP, specifically contextualized word embeddings, capture human-like distinctions between English word senses, such as polysemy and homonymy. We collect data from a behavioral, web-based experiment, in which participants provide judgments of the relatedness of multiple WordNet senses of a word in a two-dimensional spatial arrangement task. We find that participants' judgments of the relatedness between senses are correlated with distances between senses in the BERT embedding space. Homonymous senses (e.g., bat as mammal vs. bat as sports equipment) are reliably more distant from one another in the embedding space than polysemous ones (e.g., chicken as animal vs. chicken as meat). Our findings point towards the potential utility of continuous-space representations of sense meanings.
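The abstract's central comparison relates how far apart a participant places a word's senses in a two-dimensional arrangement to how far apart those senses are in an embedding space. The sketch below illustrates that comparison under stated assumptions and is not the authors' pipeline. It assumes each sense is represented by the mean of contextual token vectors from example usages of that sense, that model distance is cosine distance, that human distance is Euclidean distance between positions on the arrangement canvas, and that agreement is measured with Spearman rank correlation. The function names, sense labels, and toy three-dimensional vectors are hypothetical stand-ins, not real BERT embeddings or experimental data.

```python
import math
from itertools import combinations

# Sketch only: the pooling, distance measures, and correlation statistic below
# are assumptions, not necessarily the ones used in the paper.


def mean_vector(vectors):
    """Average token embeddings of one sense's usages into a single sense vector (assumed pooling)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sense_distance_correlation(sense_embeddings, arrangement):
    """Correlate model distances with human arrangement distances over all sense pairs.

    sense_embeddings: dict mapping a sense label to its embedding vector.
    arrangement: dict mapping the same labels to (x, y) positions placed by a participant.
    Needs at least three senses so there are enough pairs to rank.
    """
    pairs = list(combinations(sorted(sense_embeddings), 2))
    model_distances = [cosine_distance(sense_embeddings[a], sense_embeddings[b]) for a, b in pairs]
    human_distances = [math.dist(arrangement[a], arrangement[b]) for a, b in pairs]
    return spearman(model_distances, human_distances)


# Hypothetical toy input: 3-dimensional stand-ins for BERT vectors, not real data.
toy_embeddings = {
    "sense_a": mean_vector([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
    "sense_b": [0.9, 0.1, 0.0],
    "sense_c": [0.0, 0.2, 1.0],
}
toy_arrangement = {"sense_a": (0.0, 0.0), "sense_b": (1.0, 0.0), "sense_c": (5.0, 5.0)}
print(sense_distance_correlation(toy_embeddings, toy_arrangement))
```

On this toy input the two distance orderings agree pair for pair, so the printed correlation is very close to 1.0. The same `cosine_distance` helper could also be averaged separately over homonymous and polysemous sense pairs to compare the two groups, in the spirit of the bat vs. chicken contrast above.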