Paper Title


Multi-Sense Language Modelling

Paper Authors

Andrea Lekkas, Peter Schneider-Kamp, Isabelle Augenstein

Abstract


The effectiveness of a language model is influenced by its token representations, which must encode contextual information and handle the same word form having a plurality of meanings (polysemy). Currently, none of the common language modelling architectures explicitly model polysemy. We propose a language model which not only predicts the next word, but also its sense in context. We argue that this higher prediction granularity may be useful for end tasks such as assistive writing, and allow for a more precise linking of language models with knowledge bases. We find that multi-sense language modelling requires architectures that go beyond standard language models, and here propose a structured prediction framework that decomposes the task into a word prediction followed by a sense prediction task. To aid sense prediction, we utilise a Graph Attention Network, which encodes definitions and example uses of word senses. Overall, we find that multi-sense language modelling is a highly challenging task, and suggest that future work focus on the creation of more annotated training datasets.
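The two-step pipeline the abstract describes could be sketched roughly as follows. This is a minimal, illustrative toy in numpy, not the authors' implementation: the sense inventory, the random "gloss embeddings" standing in for encoded definitions, the single-layer graph attention (with a sense's neighbours taken to be the other senses of the same lemma), and the dot-product scoring heads are all assumptions made for the sake of a runnable example.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical sense inventory: each word maps to one or more sense ids.
senses_of = {"bank": ["bank%river", "bank%finance"], "run": ["run%move"]}
words = list(senses_of)
sense_ids = [s for w in words for s in senses_of[w]]

d = 8
# Node features per sense: stand-ins for embeddings of glosses/example uses.
H = rng.normal(size=(len(sense_ids), d))
W = rng.normal(size=(d, d))        # shared linear transform
a = rng.normal(size=(2 * d,))      # attention vector

def gat_layer(H):
    """One graph-attention layer: each sense node attends over its
    neighbours (here: senses sharing its lemma, self-loop included)."""
    Z = H @ W
    out = np.zeros_like(Z)
    for i, s in enumerate(sense_ids):
        lemma = s.split("%")[0]
        nbrs = [j for j, t in enumerate(sense_ids) if t.split("%")[0] == lemma]
        scores = np.array([np.tanh(a @ np.concatenate([Z[i], Z[j]]))
                           for j in nbrs])
        alpha = softmax(scores)
        out[i] = sum(al * Z[j] for al, j in zip(alpha, nbrs))
    return out

H_sense = gat_layer(H)

# Step 1 (word prediction): score the vocabulary against a context vector.
E_word = rng.normal(size=(len(words), d))
ctx = rng.normal(size=(d,))
p_word = softmax(E_word @ ctx)
w = words[int(p_word.argmax())]

# Step 2 (sense prediction): restrict scoring to the predicted word's senses.
idx = [sense_ids.index(s) for s in senses_of[w]]
p_sense = softmax(H_sense[idx] @ ctx)
print(w, dict(zip(senses_of[w], p_sense.round(3))))
```

The structured-prediction aspect shows up in step 2 conditioning on step 1's output: only the senses of the predicted word compete, rather than the full sense vocabulary.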
