Paper Title

CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade

Authors

Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

Abstract

Dynamic early exiting aims to accelerate the inference of pre-trained language models (PLMs) by emitting predictions in internal layers without passing through the entire model. In this paper, we empirically analyze the working mechanism of dynamic early exiting and find that it faces a performance bottleneck under high speed-up ratios. On one hand, the PLMs' representations in shallow layers lack high-level semantic information and thus are not sufficient for accurate predictions. On the other hand, the exiting decisions made by internal classifiers are unreliable, leading to wrongly emitted early predictions. We instead propose a new framework for accelerating the inference of PLMs, CascadeBERT, which dynamically selects proper-sized and complete models in a cascading manner, providing comprehensive representations for predictions. We further devise a difficulty-aware objective, encouraging the model to output the class probability that reflects the real difficulty of each instance for a more reliable cascading mechanism. Experimental results show that CascadeBERT can achieve an overall 15% improvement under 4× speed-up compared with existing dynamic early exiting methods on six classification tasks, yielding more calibrated and accurate predictions.
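
For intuition, below is a minimal sketch of the cascading inference the abstract describes: a small complete model predicts first, and an instance is passed on to a larger complete model only when the predicted class probability falls below a confidence threshold. The function name, argument shapes, and threshold values are illustrative placeholders under assumed interfaces, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def cascade_predict(models, encoded_input, thresholds):
    """Confidence-threshold cascade over complete models of increasing size.

    models:      classifiers ordered from smallest to largest; each is assumed
                 (hypothetically) to map `encoded_input` to logits of shape
                 [1, num_classes]
    thresholds:  one confidence cutoff per model except the last
    """
    for model, tau in zip(models[:-1], thresholds):
        with torch.no_grad():
            probs = F.softmax(model(encoded_input), dim=-1)
        confidence, prediction = probs.max(dim=-1)
        # If the smaller model is confident enough, emit its prediction and
        # skip the larger, more expensive models entirely.
        if confidence.item() >= tau:
            return prediction.item()
    # Otherwise treat the instance as "hard" and defer to the largest model.
    with torch.no_grad():
        probs = F.softmax(models[-1](encoded_input), dim=-1)
    return probs.argmax(dim=-1).item()
```

Per the abstract, the difficulty-aware training objective is what makes these class probabilities reflect the real difficulty of each instance, so the threshold test above serves as a more reliable routing signal than the exiting decisions of internal classifiers.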
