Paper Title

Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking

Paper Authors

Iñigo Urteaga, Moulay-Zaïdane Draïdia, Tomer Lancewicki, Shahram Khadivi

Paper Abstract

We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based language models (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of TLM pre-training hyperparameters, aimed at optimizing language model performance, in a resource efficient manner. We design a Thompson sampling algorithm, with a surrogate Gaussian process reward model of the Masked Language Model (MLM) pre-training objective, for its sequential minimization. Instead of MLM pre-training with fixed masking probabilities, the proposed Gaussian process-based Thompson sampling (GP-TS) accelerates pre-training by sequentially selecting masking hyperparameters that improve performance. We empirically demonstrate how GP-TS pre-trains language models efficiently, i.e., it achieves lower MLM loss in fewer epochs, across a variety of settings. In addition, GP-TS pre-trained TLMs attain competitive downstream performance, while avoiding expensive hyperparameter grid search. GP-TS provides an interactive framework for efficient and optimized TLM pre-training that, by circumventing costly hyperparameter selection, enables substantial computational savings.

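Below is a minimal sketch of the kind of Gaussian process-based Thompson sampling (GP-TS) loop described in the abstract, written in Python with scikit-learn's GaussianProcessRegressor standing in as the surrogate reward model of the MLM loss. The helper pretrain_one_interval(p_mask), the candidate grid of masking probabilities, the Matern-plus-noise kernel, and the interaction budget are all illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def gp_thompson_sampling(pretrain_one_interval, n_interactions=20, seed=0):
    """Sequentially choose masking probabilities via GP-based Thompson sampling.

    pretrain_one_interval(p_mask) is a hypothetical callback that runs a fixed
    budget of MLM pre-training steps with masking probability p_mask and
    returns the observed MLM loss (lower is better).
    """
    rng = np.random.default_rng(seed)
    # Candidate arms: a grid of masking probabilities (range is an assumption).
    arms = np.linspace(0.05, 0.40, 36).reshape(-1, 1)

    observed_p, observed_loss = [], []
    gp = GaussianProcessRegressor(
        kernel=Matern(nu=2.5) + WhiteKernel(),  # smooth surrogate plus observation noise
        normalize_y=True,
    )

    for _ in range(n_interactions):
        if observed_p:
            # Fit the GP surrogate of the MLM loss to the observations so far.
            gp.fit(np.array(observed_p), np.array(observed_loss))
            # Thompson sampling: draw one posterior sample of the loss surface
            # and act greedily (minimize) with respect to that sample.
            sample = gp.sample_y(arms, n_samples=1,
                                 random_state=int(rng.integers(2**31 - 1))).ravel()
            p_mask = float(arms[np.argmin(sample), 0])
        else:
            # No observations yet: pick an arm uniformly at random.
            p_mask = float(rng.choice(arms.ravel()))

        loss = pretrain_one_interval(p_mask)  # one pre-training interval
        observed_p.append([p_mask])
        observed_loss.append(loss)

    return observed_p, observed_loss

In this sketch, each interaction fits the GP to the (masking probability, MLM loss) pairs observed so far, draws a single posterior sample of the loss surface, and pre-trains with the masking probability that minimizes that sample; acting on a posterior draw rather than the posterior mean is what gives Thompson sampling its balance between exploring uncertain arms and exploiting low-loss ones.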