Paper Title

Detecting and Understanding Real-World Differential Performance Bugs in Machine Learning Libraries

Authors

Saeid Tizpaz-Niari, Pavol Černý, Ashutosh Trivedi

Abstract

Programming errors that degrade the performance of systems are widespread, yet there is little tool support for analyzing these bugs. We present a method based on differential performance analysis: we find inputs for which the performance varies widely, despite having the same size. To ensure that the differences in performance are robust (i.e., that they also hold for large inputs), we compare the performance not only of single inputs but of classes of inputs, where each class contains similar inputs parameterized by their size. Thus, each class is represented by a performance function from the input size to performance. Importantly, we also provide an explanation for why the performance differs, in a form that can be readily used to fix a performance bug.

The two main phases in our method are discovery with fuzzing and explanation with decision tree classifiers, each of which is supported by clustering. First, we propose an evolutionary fuzzing algorithm to generate inputs. The unique challenge of this fuzzing task is that we need not only the input class with the worst performance, but a set of classes exhibiting differential performance. We use clustering to merge similar input classes, which significantly improves the efficiency of our fuzzer. Second, we explain the differential performance in terms of program inputs and internals. We adapt discriminant learning approaches with clustering and decision trees to localize suspicious code regions.

We applied our techniques to a set of applications. On a set of micro-benchmarks, we show that our approach outperforms state-of-the-art fuzzers in finding inputs to characterize the differential performance. On a set of case studies, we discover and explain multiple performance bugs in popular machine learning frameworks. Four of these bugs, reported first in this paper, have since been fixed by the developers.
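The explanation phase described in the abstract can be illustrated with a toy example. The sketch below is not the authors' tool. It assumes each input class is summarized by a linear performance function (cost against input size), swaps in a simple one-dimensional gap-based grouping for the clustering step, and uses a depth-1 decision tree (a stump) for the explanation. All names (`fit_slope`, `cluster_by_slope`, `best_stump`, the `solver` and `shuffle` parameters) and all measurements are hypothetical.

```python
from statistics import mean

def fit_slope(samples):
    """Least-squares slope of cost against input size for one input class."""
    sizes = [s for s, _ in samples]
    costs = [c for _, c in samples]
    ms, mc = mean(sizes), mean(costs)
    num = sum((s - ms) * (c - mc) for s, c in zip(sizes, costs))
    den = sum((s - ms) ** 2 for s in sizes)
    return num / den

def cluster_by_slope(slopes, gap):
    """Group classes by sorted slope; a jump of at least `gap` starts a new cluster."""
    order = sorted(slopes, key=slopes.get)
    labels, current = {}, 0
    for prev, name in zip([None] + order, order):
        if prev is not None and slopes[name] - slopes[prev] >= gap:
            current += 1
        labels[name] = current
    return labels

def best_stump(features, labels):
    """Depth-1 decision tree: the `feature == value` test that best separates
    the fastest cluster (label 0) from all slower clusters."""
    best = None
    for f in sorted({k for feats in features.values() for k in feats}):
        for v in sorted({feats[f] for feats in features.values()}, key=repr):
            hits = sum((features[n][f] == v) == (labels[n] != 0) for n in labels)
            if best is None or hits > best[0]:
                best = (hits, f, v)
    hits, f, v = best
    return f, v, hits / len(labels)

# Made-up measurements: (input size, cost) pairs for four hypothetical input classes.
samples = {
    "c1": [(10, 10), (20, 21), (40, 39)],
    "c2": [(10, 11), (20, 19), (40, 41)],
    "c3": [(10, 50), (20, 102), (40, 198)],
    "c4": [(10, 52), (20, 99), (40, 201)],
}
# Hypothetical input parameters that define each class.
features = {
    "c1": {"solver": "a", "shuffle": True},
    "c2": {"solver": "a", "shuffle": False},
    "c3": {"solver": "b", "shuffle": True},
    "c4": {"solver": "b", "shuffle": False},
}

slopes = {name: fit_slope(pts) for name, pts in samples.items()}
labels = cluster_by_slope(slopes, gap=1.0)
explanation = best_stump(features, labels)
print(labels)       # cluster label per input class
print(explanation)  # (feature, value, accuracy) of the best single split
```

On this toy data the two slow classes share `solver == "b"`, so the stump points at that parameter as the suspicious one. A real analysis would replace the made-up measurements with profiled runs and use a full decision tree over both input parameters and program internals.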
