Paper Title

IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words and Their Semantic Representations

Authors

Damir Korenčić, Ivan Grubišić

Abstract

What is the relation between a word and its description, or between a word and its embedding? Both descriptions and embeddings are semantic representations of words. But what information from the original word remains in these representations? Or, more importantly, which information about a word do these two representations share? Definition Modeling and Reverse Dictionary are two opposite learning tasks that address these questions. The goal of the Definition Modeling task is to investigate how well the information contained in a word embedding can express the meaning of the word in a human-understandable way, namely as a dictionary definition. Conversely, the Reverse Dictionary task explores the ability to predict a word's embedding directly from its definition. In this paper, by tackling these two tasks, we explore the relationship between words and their semantic representations. We present our findings based on descriptive, exploratory, and predictive data analysis conducted on the CODWOE dataset. We give a detailed overview of the systems that we designed for the Definition Modeling and Reverse Dictionary tasks, which achieved top scores in several subtasks of the SemEval-2022 CODWOE challenge. We hope that our experimental results concerning the predictive models, together with the data analyses we provide, will prove useful in future explorations of word representations and their relationships.
