Title

Exploring and Evaluating Personalized Models for Code Generation

Authors

Andrei Zlotchevski, Dawn Drain, Alexey Svyatkovskiy, Colin Clement, Neel Sundaresan, Michele Tufano

Abstract

Large Transformer models have achieved state-of-the-art performance on Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large unsupervised corpora, learning token representations and transformations relevant to modeling generally available text, and are then fine-tuned on a particular downstream task of interest. While fine-tuning is a tried-and-true method for adapting a model to a new domain -- for example, question-answering on a given topic -- generalization remains an ongoing challenge. In this paper, we explore and evaluate transformer model fine-tuning for personalization. In the context of generating unit tests for Java methods, we evaluate learning to personalize to a specific software project using several personalization techniques. We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters, allowing tuning of the token embeddings and softmax layer only or the final layer alone; (iii) prefix tuning, which keeps model parameters frozen, but optimizes a small project-specific prefix vector. Each of these techniques offers a trade-off in total compute cost and predictive performance, which we evaluate by code and task-specific metrics, training time, and total computational operations. We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
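The three strategies in the abstract differ chiefly in how many parameters are updated, which drives the compute-cost trade-off the paper evaluates. As a minimal sketch, the toy calculation below estimates trainable-parameter budgets for each strategy; the model dimensions and component breakdown are entirely hypothetical (roughly GPT-2-small scale), not the paper's actual configuration.

```python
# Hypothetical decoder-only transformer dimensions (illustrative only).
VOCAB, D_MODEL, N_LAYERS, PREFIX_LEN = 50_000, 768, 12, 20

embeddings = VOCAB * D_MODEL            # token embedding table
per_layer = 12 * D_MODEL * D_MODEL      # approx. attention + MLP weights per layer
body = N_LAYERS * per_layer             # all transformer layers
softmax = VOCAB * D_MODEL               # output projection (assumed untied)
total = embeddings + body + softmax


def trainable(strategy: str) -> int:
    """Parameters updated under each personalization strategy from the abstract."""
    if strategy == "custom":        # (i) custom fine-tuning: everything is tuned
        return total
    if strategy == "lightweight":   # (ii) embeddings + softmax layer only
        return embeddings + softmax
    if strategy == "final_layer":   # (ii) alternative: final layer alone
        return per_layer
    if strategy == "prefix":        # (iii) frozen model; key/value prefix vectors only
        return 2 * N_LAYERS * PREFIX_LEN * D_MODEL
    raise ValueError(f"unknown strategy: {strategy}")


for s in ("custom", "lightweight", "final_layer", "prefix"):
    print(f"{s:12s} {trainable(s):>12,d} params "
          f"({100 * trainable(s) / total:.2f}% of model)")
```

Even with these made-up sizes, the ordering shows why the lighter strategies are attractive for per-project personalization: prefix tuning updates well under 1% of the parameters that full custom fine-tuning does.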
