Paper Title

Automated Feedback Generation for Competition-Level Code

Paper Authors

Jialu Zhang, De Li, John C. Kolesar, Hanyuan Shi, Ruzica Piskac

Paper Abstract

Competitive programming has become a popular way for programmers to test their skills. Large-scale online programming contests attract millions of experienced programmers to compete against each other. Competition-level programming problems are challenging in nature, and participants often fail to solve the problem on their first attempt. Some online platforms for competitive programming allow programmers to practice on competition-level problems as well, and the standard feedback for an incorrect practice submission is the first test case that the submission fails. Often, the failed test case does not provide programmers with enough information to resolve the errors in their code, and they abandon the problem after several more unsuccessful attempts. We present Clef, the first data-driven tool that can generate feedback on competition-level code automatically by repairing programmers' incorrect submissions. The key development is that Clef can learn how to generate repairs for incorrect submissions by examining the repairs that other programmers made to their own submissions over time. Since the differences between an incorrect program and a correct program for the same task may be significant, we introduce a new data structure, merge trees, to capture the changes between submissions. Merge trees are versatile: they can encode both large algorithm-level redesigns and small statement-level alterations. Clef applies the patterns it learns from a database of submissions to generate repairs for new submissions outside the database. We evaluated Clef on six real-world problems from Codeforces, the world's largest platform for competitive programming. Clef achieves 42.1% accuracy in repairing programmers' incorrect submissions. Even when given incorrect submissions from programmers who never found the solution to a problem on their own, Clef repairs the users' programs 34.1% of the time.
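The paper's precise definition of merge trees is not reproduced in this abstract. The sketch below is purely illustrative, under the assumption that a merge tree aligns the ASTs of an incorrect submission and a later corrected one, preserving shared structure and recording an edit node wherever the two versions diverge. All names here (MergeNode, merge, collect_edits) are hypothetical and not taken from the paper.

```python
# Illustrative sketch only (not Clef's implementation): a naive tree alignment
# between an incorrect submission and its corrected version, recording the
# divergent subtrees as edits. Requires Python 3.9+ for ast.unparse.
import ast
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MergeNode:
    kind: str                      # shared AST node type, or "EDIT" at a divergence
    old: Optional[ast.AST] = None  # subtree from the incorrect submission
    new: Optional[ast.AST] = None  # subtree from the corrected submission
    children: List["MergeNode"] = field(default_factory=list)


def local_signature(node: ast.AST):
    # Non-AST attributes of this node only (identifiers, constants, etc.).
    return [(f, v) for f, v in ast.iter_fields(node)
            if not isinstance(v, (ast.AST, list))]


def merge(old: ast.AST, new: ast.AST) -> MergeNode:
    """Align two ASTs node by node; where they diverge, keep both subtrees."""
    if ast.dump(old) == ast.dump(new):
        return MergeNode(type(old).__name__)          # identical subtrees
    old_kids = list(ast.iter_child_nodes(old))
    new_kids = list(ast.iter_child_nodes(new))
    same_shape = (type(old) is type(new)
                  and len(old_kids) == len(new_kids)
                  and local_signature(old) == local_signature(new))
    if not same_shape:
        # Could be a small statement-level change or a large redesign.
        return MergeNode("EDIT", old=old, new=new)
    node = MergeNode(type(old).__name__)
    for o, n in zip(old_kids, new_kids):
        node.children.append(merge(o, n))
    return node


def collect_edits(node: MergeNode) -> List[MergeNode]:
    """Gather every divergence, i.e. the repair this submission pair exhibits."""
    if node.kind == "EDIT":
        return [node]
    return [e for child in node.children for e in collect_edits(child)]


if __name__ == "__main__":
    wrong = ast.parse("print(a + b)")       # toy "incorrect" submission
    fixed = ast.parse("print(a + b + c)")   # toy "corrected" submission
    for e in collect_edits(merge(wrong, fixed)):
        print(ast.unparse(e.old), "->", ast.unparse(e.new))
```

A learning pipeline in this spirit would mine such edit pairs from many submission histories for the same problem and reuse the recurring patterns to propose repairs for new incorrect submissions; the alignment above is deliberately naive and only meant to convey the idea of encoding both small and large changes in one tree.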
