Paper Title
Towards Developing and Analysing Metric-Based Software Defect Severity Prediction Model
Paper Authors
Paper Abstract
In a critical software system, testers must spend an enormous amount of time and effort maintaining the software because defects occur continuously. Among these defects, some severe ones may adversely affect the software. To reduce testers' time and effort, many machine learning models have been proposed in the literature that use documented defect reports to automatically predict the severity of defective software modules. In contrast to these traditional approaches, in this work we propose a metric-based software defect severity prediction (SDSP) model that uses a self-training semi-supervised learning approach to classify the severity of defective software modules. The approach is built on a mixture of labelled and unlabelled defect severity data. Self-training uses a decision tree classifier to assign pseudo-class labels to the unlabelled instances. The predictions are promising, since self-training successfully assigns suitable class labels to the unlabelled instances. Although numerous studies have proposed prediction approaches and examined the methodological aspects of defect severity prediction models, the gap in estimating project attributes from a prediction model remains unresolved. To bridge this gap, we propose five project-specific measures, namely the Risk-Factor (RF), the Percent of Saved Budget (PSB), the Loss in the Saved Budget (LSB), the Remaining Service Time (RST) and the Gratuitous Service Time (GST), to capture project outcomes from the predictions. Like the traditional measures, these measures are calculated from the observed confusion matrix. They are used to analyse the impact that the prediction model has on the software project.
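
The self-training idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the tree configuration, the software metrics, or the confidence rule, so everything concrete here is an assumption. A depth-1 decision tree (a stump) stands in for the decision tree classifier, the two features (lines of code, cyclomatic complexity) and the toy values are hypothetical, and the `threshold` and `max_rounds` parameters are illustrative choices.

```python
from collections import Counter


def fit_stump(X, y):
    """Fit a depth-1 decision tree: choose the (feature, threshold) split
    that misclassifies the fewest training instances."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            errors = sum(len(side) - Counter(side).most_common(1)[0][1]
                         for side in (left, right))
            if best is None or errors < best[0]:
                best = (errors, f, t, Counter(left), Counter(right))
    if best is None:  # every feature is constant: fall back to a single leaf
        counts = Counter(y)
        return (0, float("inf"), counts, counts)
    return best[1:]


def predict_with_confidence(stump, row):
    """Return (majority label of the leaf, fraction of the leaf with that label)."""
    f, t, left, right = stump
    leaf = left if row[f] <= t else right
    label, count = leaf.most_common(1)[0]
    return label, count / sum(leaf.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Self-training loop: fit on labelled data, pseudo-label the confident
    unlabelled instances, move them to the labelled set, and repeat."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        stump = fit_stump(X_lab, y_lab)
        confident, remaining = [], []
        for row in pool:
            label, conf = predict_with_confidence(stump, row)
            if conf >= threshold:
                confident.append((row, label))
            else:
                remaining.append(row)
        if not confident:  # nothing new to learn from
            break
        for row, label in confident:
            X_lab.append(row)
            y_lab.append(label)
        pool = remaining
        if not pool:
            break
    return fit_stump(X_lab, y_lab), X_lab, y_lab


# Hypothetical module metrics: (lines of code, cyclomatic complexity).
X_labelled = [(120, 4), (150, 5), (900, 30), (1100, 35)]
y_labelled = ["low", "low", "high", "high"]
X_unlabelled = [(100, 3), (1000, 28), (200, 6)]

model, X_all, y_all = self_train(X_labelled, y_labelled, X_unlabelled)
print(y_all[len(y_labelled):])  # pseudo-labels assigned to the unlabelled modules
print(predict_with_confidence(model, (950, 31)))
```

On this toy data every leaf is pure, so all three unlabelled modules are pseudo-labelled in the first round. With real data, the confidence threshold controls how many noisy pseudo-labels enter the training set.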