Paper Title
Adaptation Approaches for Nearest Neighbor Language Models
Paper Authors
Paper Abstract
Semi-parametric Nearest Neighbor Language Models ($k$NN-LMs) have produced impressive gains over purely parametric LMs, by leveraging large-scale neighborhood retrieval over external memory datastores. However, there has been little investigation into adapting such models to new domains. This work attempts to fill that gap and suggests the following approaches for adapting $k$NN-LMs: 1) adapting the underlying LM (using Adapters), 2) expanding neighborhood retrieval over an additional adaptation datastore, and 3) adapting the weights (scores) of retrieved neighbors using a learned Rescorer module. We study each adaptation strategy separately, as well as the combined performance improvement through ablation experiments and an extensive set of evaluations run over seven adaptation domains. Our combined adaptation approach consistently outperforms purely parametric adaptation and zero-shot ($k$NN-LM) baselines that construct datastores from the adaptation data. On average, we see perplexity improvements of 17.1% and 16% for these respective baselines, across domains.
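To make the mechanism described in the abstract concrete, the sketch below shows a minimal $k$NN-LM-style interpolation that touches all three adaptation levers: retrieval over multiple datastores (base plus adaptation domain), an optional learned rescorer applied to neighbor scores, and interpolation with the parametric LM's distribution. This is an illustrative assumption of how such a pipeline could be wired up, not the paper's implementation; the function name `knn_lm_next_token_probs`, the `rescorer` callable, and the hyperparameters `k`, `lam`, and `temperature` are all hypothetical placeholders.

```python
import numpy as np

def knn_lm_next_token_probs(query, lm_probs, datastores, k=8, lam=0.25,
                            rescorer=None, temperature=1.0):
    """Illustrative kNN-LM interpolation (hypothetical sketch).

    query      : (d,) context representation from the LM's final layer
    lm_probs   : (V,) parametric LM distribution over the vocabulary
    datastores : list of (keys (N, d), values (N,)) pairs, e.g. the base
                 datastore plus an adaptation-domain datastore
    rescorer   : optional callable mapping raw neighbor scores to adapted
                 scores; stands in for a learned Rescorer module
    """
    dists, vals = [], []
    for keys, values in datastores:
        # Squared L2 distance from the query to every stored key.
        d2 = np.sum((keys - query) ** 2, axis=1)
        idx = np.argsort(d2)[:k]          # k nearest neighbors per datastore
        dists.append(d2[idx])
        vals.append(values[idx])
    dists = np.concatenate(dists)
    vals = np.concatenate(vals).astype(int)

    scores = -dists / temperature         # closer neighbors get higher scores
    if rescorer is not None:
        scores = rescorer(scores)         # learned re-weighting of neighbors

    # Softmax over neighbors, then scatter their mass onto the vocabulary.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    knn_probs = np.zeros_like(lm_probs)
    np.add.at(knn_probs, vals, weights)

    # Interpolate the retrieval distribution with the parametric LM.
    return lam * knn_probs + (1.0 - lam) * lm_probs
```

Under this reading, adapting the underlying LM changes `lm_probs` (and the query representations), adding an adaptation datastore extends the `datastores` list, and the Rescorer adjusts `scores` before the neighbor softmax.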