Paper Title

Analyzing Bagging Methods for Language Models

Authors

Pranab Islam, Shaan Khosla, Arthur Lok, Mudit Saxena

Abstract

Modern language models leverage increasingly large numbers of parameters to achieve performance on natural language understanding tasks. Ensembling these models in specific configurations for downstream tasks shows even further performance improvements. In this paper, we perform an analysis of bagging language models and compare single language models to bagged ensembles that are roughly equivalent in terms of final model size. We explore an array of model bagging configurations for natural language understanding tasks, with final ensemble sizes ranging from 300M parameters to 1.5B parameters, and determine that our ensembling methods are at best roughly equivalent to single LM baselines. We also note other positive effects of bagging and pruning in specific scenarios, such as variance reduction and minor performance improvements.
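For readers unfamiliar with bagging, the sketch below illustrates the general technique the abstract refers to: train several models on bootstrap resamples of the training data, then average their predictions at inference time. This is a minimal illustration under stated assumptions, not the paper's implementation; `train_model` and `predict_logits` are hypothetical stand-ins for fine-tuning and running a language model on a classification task.

```python
import numpy as np

def bag_ensemble(train_model, X, y, n_models=5, seed=0):
    """Train n_models, each on a bootstrap resample of (X, y)."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Sample len(X) examples with replacement (a bootstrap resample).
        idx = rng.integers(0, len(X), size=len(X))
        models.append(train_model(X[idx], y[idx]))
    return models

def bagged_predict(models, predict_logits, X):
    """Average per-model logits across the ensemble, then take the argmax class."""
    logits = np.stack([predict_logits(m, X) for m in models])
    return logits.mean(axis=0).argmax(axis=-1)
```

In the setting the abstract describes, each ensemble member would be a smaller language model, sized so that the whole bagged ensemble roughly matches the parameter count of the single-model baseline it is compared against.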
