Paper Title

TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models

Authors

Jang, Joel, Ye, Seonghyeon, Lee, Changho, Yang, Sohee, Shin, Joongbo, Han, Janghoon, Kim, Gyeonghun, Seo, Minjoon

Abstract

Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information which was absent or different during training, a phenomenon called temporal misalignment. This is especially a challenging problem because the research community still lacks a coherent dataset for assessing the adaptability of LMs to frequently-updated knowledge corpus such as Wikipedia. To this end, we introduce TemporalWiki, a lifelong benchmark for ever-evolving LMs that utilizes the difference between consecutive snapshots of English Wikipedia and English Wikidata for training and evaluation, respectively. The benchmark hence allows researchers to periodically track an LM's ability to retain previous knowledge and acquire updated/new knowledge at each point in time. We also find that training an LM on the diff data through continual learning methods achieves similar or better perplexity than on the entire snapshot in our benchmark with 12 times less computational cost, which verifies that factual knowledge in LMs can be safely updated with minimal training data via continual learning. The dataset and the code are available at https://github.com/joeljang/temporalwiki.
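The central idea in the abstract is to train and evaluate on the "diff" between consecutive snapshots rather than on an entire snapshot. The sketch below is a rough illustration only, not the authors' released pipeline: it assumes each snapshot is available as a mapping from article titles to plain text, and it merely partitions the newer snapshot into unchanged, updated, and newly added articles. The actual benchmark works at a finer granularity and additionally aligns the changed content with English Wikidata for evaluation.

```python
# Minimal sketch (hypothetical, not the TemporalWiki codebase) of deriving
# "diff" training data from two consecutive Wikipedia snapshots.

def build_diff_corpus(old_snapshot: dict[str, str],
                      new_snapshot: dict[str, str]) -> dict[str, list[str]]:
    """Split the newer snapshot into unchanged, updated, and newly added articles."""
    unchanged, updated, new = [], [], []
    for title, text in new_snapshot.items():
        if title not in old_snapshot:
            new.append(title)          # article did not exist in the earlier snapshot
        elif old_snapshot[title] != text:
            updated.append(title)      # article text changed between snapshots
        else:
            unchanged.append(title)    # identical in both snapshots
    return {"unchanged": unchanged, "updated": updated, "new": new}


if __name__ == "__main__":
    old = {"Python": "Python 3.9 is the latest release.", "Lisp": "Lisp is old."}
    new = {"Python": "Python 3.10 is the latest release.",
           "Lisp": "Lisp is old.",
           "Rust": "Rust 1.60 was released."}
    print(build_diff_corpus(old, new))
    # {'unchanged': ['Lisp'], 'updated': ['Python'], 'new': ['Rust']}
```

Only the "updated" and "new" portions would form the diff data used for continual learning, which is why the reported training cost is far lower than retraining on the full snapshot.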
