Paper Title


iTiger: An Automatic Issue Title Generation Tool

Authors

Ting Zhang, Ivana Clairine Irsan, Ferdian Thung, DongGyun Han, David Lo, Lingxiao Jiang

Abstract


In both commercial and open-source software, bug reports or issues are used to track bugs or feature requests. However, the quality of issues can vary considerably. Prior research has found that bug reports of good quality tend to gain more attention than those of poor quality. As an essential component of an issue, title quality is an important aspect of issue quality. Moreover, issues are usually presented in a list view, where only the issue title and some metadata are shown. In this case, a concise and accurate title is crucial for readers to grasp the general idea of the issue and to facilitate issue triaging. Previous work formulated issue title generation as a one-sentence summarization task and employed a sequence-to-sequence model to solve it. However, such a model requires a large amount of domain-specific training data to attain good performance in issue title generation. Recently, pre-trained models, which learn knowledge from large-scale general corpora, have shown much success in software engineering tasks. In this work, we make the first attempt to fine-tune BART, which has been pre-trained on English corpora, to generate issue titles. We implemented the fine-tuned BART as a web tool named iTiger, which can suggest an issue title based on the issue description. iTiger is fine-tuned on 267,094 GitHub issues. We compared iTiger with the state-of-the-art method, iTAPE, on 33,438 issues. The automatic evaluation shows that iTiger outperforms iTAPE by 29.7%, 50.8%, and 34.1% in terms of ROUGE-1, ROUGE-2, and ROUGE-L F1-scores, respectively. The manual evaluation also demonstrates that the titles generated by BART are preferred by evaluators over the titles generated by iTAPE in 72.7% of cases. In addition, the evaluators deem our tool useful and easy to use, and they are interested in using it in the future.
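The core idea, fine-tuning a pre-trained BART model to summarize an issue description into a one-sentence title, can be illustrated with a minimal sketch. The snippet below is not the authors' iTiger implementation: it assumes the Hugging Face transformers library, uses the generic facebook/bart-large-cnn summarization checkpoint as a stand-in for iTiger's fine-tuned weights, and introduces a hypothetical suggest_title helper for illustration only.

```python
# Minimal sketch (assumption: Hugging Face transformers installed).
# facebook/bart-large-cnn is a generic summarization checkpoint used as a
# stand-in; iTiger's own fine-tuned BART weights would be substituted here.
from transformers import BartForConditionalGeneration, BartTokenizer

MODEL_NAME = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(MODEL_NAME)
model = BartForConditionalGeneration.from_pretrained(MODEL_NAME)

def suggest_title(issue_body: str, max_title_tokens: int = 30) -> str:
    """Hypothetical helper: generate a short title from an issue description."""
    inputs = tokenizer(
        issue_body,
        max_length=1024,     # truncate long descriptions to BART's input limit
        truncation=True,
        return_tensors="pt",
    )
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,         # beam search for a more fluent, concise title
        max_length=max_title_tokens,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    description = (
        "When I run the app on Android 12, clicking the export button crashes "
        "with a NullPointerException in ExportService. Steps to reproduce: "
        "open a project, tap Export, choose PDF."
    )
    print(suggest_title(description))
```

In iTiger's setting, the same generation step would run on a BART model fine-tuned on the 267,094 GitHub issues mentioned above, and the suggested titles would be compared against developer-written titles using ROUGE-1, ROUGE-2, and ROUGE-L F1-scores.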
