Paper Title

Data-Efficiency with a Single GPU: An Exploration of Transfer Methods for Small Language Models

Authors

Alon Albalak, Akshat Shrivastava, Chinnadhurai Sankar, Adithya Sagar, Mike Ross

Abstract

Multi-task learning (MTL), instruction tuning, and prompting have recently been shown to improve the generalizability of large language models to new tasks. However, the benefits of such methods are less well-documented in smaller language models, with some studies finding contradictory results. In this work, we explore and isolate the effects of (i) model size, (ii) general purpose MTL, (iii) in-domain MTL, (iv) instruction tuning, and (v) few-shot fine-tuning for models with fewer than 500 million parameters. Our experiments in the zero-shot setting demonstrate that models gain 31% relative improvement, on average, from general purpose MTL, with an additional 37.6% relative gain from in-domain MTL. Contradictory to prior works on large models, we find that instruction tuning provides a modest 2% performance improvement for small models.
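For reference, below is a minimal sketch of how relative-improvement figures such as the 31% and 37.6% quoted above are conventionally computed. The scores used here are hypothetical placeholders, not numbers from the paper, and the choice of reference point for the second gain (the general-MTL model rather than the base model) is an assumption.

```python
def relative_improvement(baseline: float, treatment: float) -> float:
    """Relative gain of `treatment` over `baseline`, e.g. 0.31 == 31%."""
    return (treatment - baseline) / baseline

# Hypothetical zero-shot scores (illustrative only, not from the paper):
base_model = 0.400      # small model without multi-task pre-finetuning
general_mtl = 0.524     # after general-purpose MTL
in_domain_mtl = 0.721   # after additional in-domain MTL

# Assumes the second figure is measured against the general-MTL model.
print(f"general MTL:   {relative_improvement(base_model, general_mtl):.1%}")    # 31.0%
print(f"in-domain MTL: {relative_improvement(general_mtl, in_domain_mtl):.1%}") # 37.6%
```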
