Paper Title
Distilling Neural Networks for Greener and Faster Dependency Parsing
Paper Authors
Paper Abstract
The carbon footprint of natural language processing research has been increasing in recent years due to its reliance on large and inefficient neural network implementations. Distillation is a network compression technique which attempts to impart knowledge from a large model to a smaller one. We use teacher-student distillation to improve the efficiency of the Biaffine dependency parser which obtains state-of-the-art performance with respect to accuracy and parsing speed (Dozat and Manning, 2017). When distilling to 20\% of the original model's trainable parameters, we only observe an average decrease of $\sim$1 point for both UAS and LAS across a number of diverse Universal Dependency treebanks while being 2.30x (1.19x) faster than the baseline model on CPU (GPU) at inference time. We also observe a small increase in performance when compressing to 80\% for some treebanks. Finally, through distillation we attain a parser which is not only faster but also more accurate than the fastest modern parser on the Penn Treebank.
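For readers unfamiliar with teacher-student distillation, the sketch below illustrates the general idea in PyTorch: the student is trained against a blend of the teacher's temperature-softened output distribution and the gold labels. This is a generic illustration of the technique rather than the paper's exact training objective; the function name, the temperature, and the interpolation weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      temperature=2.0, alpha=0.5):
    """Generic teacher-student distillation loss (illustrative, not the
    paper's exact objective): blend a soft-target term that matches the
    teacher's distribution with hard-label cross-entropy on gold labels."""
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)  # rescale gradients (Hinton et al.)

    # Hard targets: standard cross-entropy against the gold annotations.
    hard_loss = F.cross_entropy(student_logits, gold_labels)

    return alpha * soft_loss + (1.0 - alpha) * hard_loss


# Toy usage: logits over 50 candidate heads for a batch of 8 tokens.
student_logits = torch.randn(8, 50, requires_grad=True)  # small student model output
teacher_logits = torch.randn(8, 50)                      # frozen large teacher output
gold_labels = torch.randint(0, 50, (8,))                 # gold head indices
loss = distillation_loss(student_logits, teacher_logits, gold_labels)
loss.backward()
```

In a dependency-parsing setting, the logits would correspond to scores over candidate heads (and arc labels) per token; the mechanism of mixing soft teacher targets with gold supervision is the same.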