Paper Title
Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
Paper Authors
Abstract
Lexical substitution, i.e. the generation of plausible words that can replace a particular target word in a given context, is an extremely powerful technique that can serve as a backbone for various NLP applications, including word sense induction and disambiguation, lexical relation extraction, data augmentation, etc. In this paper, we present a large-scale comparative study of lexical substitution methods employing both older and the most recent language and masked language models (LMs and MLMs), such as context2vec, ELMo, BERT, RoBERTa, and XLNet. We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly. Several existing and new target word injection methods are compared for each LM/MLM using both intrinsic evaluation on lexical substitution datasets and extrinsic evaluation on word sense induction (WSI) datasets. On two WSI datasets we obtain new SOTA results. In addition, we analyze the types of semantic relations between target words and their substitutes generated by different models or given by annotators.
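The core idea summarized above, biasing an (M)LM's substitute distribution with information about the target word, can be illustrated with a minimal toy sketch. This is not the paper's implementation: the context probabilities and 2-d embeddings below are invented placeholders standing in for a real MLM's mask-filling distribution and its word embeddings, and the additive log-probability-plus-similarity score is just one simple way to inject the target.

```python
import math

# Hypothetical MLM distribution P(word | "The bright ___ rose over the hills")
# (toy numbers, not from a real model)
context_probs = {"sun": 0.40, "moon": 0.35, "idea": 0.15, "bread": 0.10}

# Toy 2-d "embeddings"; a real system would reuse the model's own vectors
embeddings = {
    "sun": (1.0, 0.1), "moon": (0.9, 0.2),
    "idea": (0.1, 1.0), "bread": (0.0, 0.9),
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def substitutes(target, probs, emb, alpha=0.5):
    """Rank candidate substitutes by log P(w | context) + alpha * sim(w, target).

    The first term rewards words that fit the context; the second injects
    the target word, rewarding words semantically close to it.
    """
    target_vec = emb[target]
    scored = {
        w: math.log(p) + alpha * cosine(emb[w], target_vec)
        for w, p in probs.items()
        if w != target  # the target itself is not a substitute
    }
    return sorted(scored, key=scored.get, reverse=True)

print(substitutes("sun", context_probs, embeddings))
# "moon" ranks first: it is both probable in context and similar to "sun"
```

In the actual study, the injection happens inside the model (e.g. via the input representation or attention), not as a post-hoc re-ranking; this sketch only conveys why conditioning on the target changes the ranking relative to plain mask filling.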