Title

Exploring and Evaluating Personalized Models for Code Generation

Authors

Andrei Zlotchevski, Dawn Drain, Alexey Svyatkovskiy, Colin Clement, Neel Sundaresan, Michele Tufano

Abstract

Large Transformer models have achieved state-of-the-art performance on Natural Language Understanding tasks and are increasingly becoming the baseline model architecture for modeling source code. Transformers are usually pre-trained on large unsupervised corpora, learning token representations and transformations relevant to modeling generally available text, and are then fine-tuned on a particular downstream task of interest. While fine-tuning is a tried-and-true method for adapting a model to a new domain -- for example, question-answering on a given topic -- generalization remains an ongoing challenge. In this paper, we explore and evaluate transformer model fine-tuning for personalization. In the context of generating unit tests for Java methods, we evaluate learning to personalize to a specific software project using several personalization techniques. We consider three key approaches: (i) custom fine-tuning, which allows all the model parameters to be tuned; (ii) lightweight fine-tuning, which freezes most of the model's parameters, allowing tuning of the token embeddings and softmax layer only or the final layer alone; (iii) prefix tuning, which keeps model parameters frozen, but optimizes a small project-specific prefix vector. Each of these techniques offers a trade-off in total compute cost and predictive performance, which we evaluate by code and task-specific metrics, training time, and total computational operations. We compare these fine-tuning strategies for code generation and discuss the potential generalization and cost benefits of each in various deployment scenarios.
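The three strategies in the abstract differ chiefly in how many parameters are updated, which drives the compute-cost trade-off the paper evaluates. As a minimal sketch, the toy calculation below estimates trainable-parameter budgets for each strategy; the model dimensions and component breakdown are entirely hypothetical (roughly GPT-2-small scale), not the paper's actual configuration.

```python
# Hypothetical decoder-only transformer dimensions (illustrative only).
VOCAB, D_MODEL, N_LAYERS, PREFIX_LEN = 50_000, 768, 12, 20

embeddings = VOCAB * D_MODEL            # token embedding table
per_layer = 12 * D_MODEL * D_MODEL      # approx. attention + MLP weights per layer
body = N_LAYERS * per_layer             # all transformer layers
softmax = VOCAB * D_MODEL               # output projection (assumed untied)
total = embeddings + body + softmax


def trainable(strategy: str) -> int:
    """Parameters updated under each personalization strategy from the abstract."""
    if strategy == "custom":        # (i) custom fine-tuning: everything is tuned
        return total
    if strategy == "lightweight":   # (ii) embeddings + softmax layer only
        return embeddings + softmax
    if strategy == "final_layer":   # (ii) alternative: final layer alone
        return per_layer
    if strategy == "prefix":        # (iii) frozen model; key/value prefix vectors only
        return 2 * N_LAYERS * PREFIX_LEN * D_MODEL
    raise ValueError(f"unknown strategy: {strategy}")


for s in ("custom", "lightweight", "final_layer", "prefix"):
    print(f"{s:12s} {trainable(s):>12,d} params "
          f"({100 * trainable(s) / total:.2f}% of model)")
```

Even with these made-up sizes, the ordering shows why the lighter strategies are attractive for per-project personalization: prefix tuning updates well under 1% of the parameters that full custom fine-tuning does.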
