Paper Title

CoRe: An Efficient Coarse-refined Training Framework for BERT

Paper Authors

Cheng Yang, Shengnan Wang, Yuechuan Li, Chao Yang, Ming Yan, Jingqiao Zhang, Fangquan Lin

Paper Abstract

In recent years, BERT has made significant breakthroughs on many natural language processing tasks and attracted great attention. Despite its accuracy gains, the BERT model generally involves a huge number of parameters and needs to be trained on massive datasets, so training such a model is computationally very challenging and time-consuming. Hence, training efficiency is a critical issue. In this paper, we propose a novel coarse-refined training framework named CoRe to speed up the training of BERT. Specifically, we decompose the training process of BERT into two phases. In the first phase, by introducing a fast attention mechanism and decomposing the large parameter matrices in the feed-forward network sub-layers, we construct a relaxed BERT model that has far fewer parameters and much lower model complexity than the original BERT, so the relaxed model can be trained quickly. In the second phase, we transform the trained relaxed BERT model into the original BERT and further retrain the model. Thanks to the favorable initialization provided by the relaxed model, the retraining phase requires far fewer training steps than training an original BERT model from scratch with random initialization. Experimental results show that the proposed CoRe framework greatly reduces training time without degrading performance.
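
To make the phase-two transformation concrete, here is a minimal sketch of the feed-forward part of the idea, assuming the decomposition is a plain low-rank factorization. The paper's actual fast attention mechanism is not reproduced, and the names (RelaxedFFN, expand_to_full_ffn), the rank r, and the merge rule are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

# Sketch of the coarse-refined idea for the FFN sub-layer (an assumption,
# not the authors' code): phase 1 trains a "relaxed" FFN whose two large
# projection matrices are replaced by low-rank factors; phase 2 multiplies
# each factor pair back into a full-size matrix, which initializes the
# original BERT FFN before retraining.

class RelaxedFFN(nn.Module):
    """Feed-forward sub-layer with factorized (low-rank) projections."""
    def __init__(self, hidden=768, intermediate=3072, r=128):
        super().__init__()
        # each original weight W (out x in) is represented as B @ A with rank r
        self.up = nn.Sequential(nn.Linear(hidden, r, bias=False),
                                nn.Linear(r, intermediate))
        self.act = nn.GELU()
        self.down = nn.Sequential(nn.Linear(intermediate, r, bias=False),
                                  nn.Linear(r, hidden))

    def forward(self, x):
        return self.down(self.act(self.up(x)))

def expand_to_full_ffn(relaxed):
    """Merge each factor pair into one full-size nn.Linear, so the expanded
    FFN computes the same function the relaxed model learned."""
    def merge(pair):
        a, b = pair[0], pair[1]                     # x -> x A^T -> (x A^T) B^T + bias
        full = nn.Linear(a.in_features, b.out_features)
        with torch.no_grad():
            full.weight.copy_(b.weight @ a.weight)  # (out, in) = (out, r) @ (r, in)
            full.bias.copy_(b.bias)
        return full
    return nn.Sequential(merge(relaxed.up), nn.GELU(), merge(relaxed.down))

# Phase 1: train layers built from RelaxedFFN; Phase 2: expand and retrain.
relaxed = RelaxedFFN()
full_ffn = expand_to_full_ffn(relaxed)
x = torch.randn(2, 16, 768)
assert torch.allclose(relaxed(x), full_ffn(x), atol=1e-4)
```

Because the merged matrices reproduce the factorized computation exactly, the expanded model begins retraining from the point the relaxed model reached, which is what lets the second phase use far fewer steps than training from a random initialization.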
