Paper Title


The Right Tool for the Job: Matching Model and Instance Complexities

Paper Authors

Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith

Abstract


As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs. To better respect a given inference budget, we propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances, and late (and accurate) exit for hard instances. To achieve this, we add classifiers to different layers of BERT and use their calibrated confidence scores to make early exit decisions. We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks. Our method presents a favorable speed/accuracy tradeoff in almost all cases, producing models which are up to five times faster than the state of the art, while preserving their accuracy. Our method also requires almost no additional training resources (in either time or parameters) compared to the baseline BERT model. Finally, our method alleviates the need for costly retraining of multiple models at different levels of efficiency; we allow users to control the inference speed/accuracy tradeoff using a single trained model, by setting a single variable at inference time. We publicly release our code.
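The core mechanism the abstract describes, classifiers attached to successive BERT layers whose calibrated confidence triggers an early exit, can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' released code: the function name, the toy per-layer probability lists, and the hard-coded threshold are all assumptions made for the example.

```python
# Minimal sketch of confidence-based early exiting: classifiers at
# successive layers emit class probabilities, and inference stops at the
# first layer whose top-class confidence clears a user-set threshold.
# All names and the toy "layer outputs" below are illustrative only.

def early_exit_predict(layer_probs, threshold):
    """layer_probs: per-layer class-probability lists (earliest layer first).
    Returns (predicted class index, exit layer index). Falls back to the
    final layer's prediction when no earlier classifier is confident enough."""
    for layer_idx, probs in enumerate(layer_probs):
        confidence = max(probs)
        if confidence >= threshold:          # confident enough: exit early
            return probs.index(confidence), layer_idx
    final = layer_probs[-1]                  # no early exit: use last layer
    return final.index(max(final)), len(layer_probs) - 1

# Toy example: an easy instance exits at layer 0, a harder one runs deeper.
easy = [[0.97, 0.03], [0.99, 0.01], [0.99, 0.01]]
hard = [[0.55, 0.45], [0.60, 0.40], [0.85, 0.15]]
print(early_exit_predict(easy, threshold=0.9))  # exits at layer 0
print(early_exit_predict(hard, threshold=0.9))  # runs to the last layer
```

The single `threshold` argument plays the role of the "single variable at inference time" mentioned in the abstract: lowering it makes more instances exit early (faster, possibly less accurate), while raising it pushes more instances through deeper layers. The abstract also notes the confidence scores are calibrated; a calibration step (not shown here) would precede this decision rule.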
