Paper Title

Efficient Fine-Tuning of BERT Models on the Edge

Paper Authors

Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

Paper Abstract

Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as noted with the likes of BERT and other natural language processing models, comes increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And Reconfigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30%, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.
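To make the mechanism the abstract describes concrete, here is a minimal PyTorch / Hugging Face sketch of the general idea of parameter freezing during fine-tuning: gradients are disabled for a chosen subset of DistilBERT's feed-forward parameters, so their updates and the associated backward-pass work are skipped. This is an illustration under assumed settings (model checkpoint, layer indices, learning rate), not the paper's actual FAR scheme, which reconfigures and freezes finer-grained subsets of FFN nodes rather than whole sub-layers.

import torch
from transformers import DistilBertForSequenceClassification

# Load a pre-trained DistilBERT with a classification head (e.g. for a
# CoLA-style single-sentence classification task).
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Freeze the feed-forward sub-layers of the first four transformer blocks so
# that no gradients are computed or stored for them; the remaining blocks,
# the attention sub-layers, and the classifier stay trainable. (Illustrative
# choice only; FAR itself selects subsets of FFN nodes to freeze.)
for i, block in enumerate(model.distilbert.transformer.layer):
    if i < 4:
        for param in block.ffn.parameters():
            param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)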
