Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules of Nonlinear Decision Trees for Classification Problems

Paper Title

Interpretable Rule Discovery Through Bilevel Optimization of Split-Rules of Nonlinear Decision Trees for Classification Problems

Authors

Yashesh Dhebar, Kalyanmoy Deb

Abstract

For supervised classification problems involving design, control, and other practical purposes, users are interested not only in finding a highly accurate classifier but also in being able to interpret it easily. While the definition of interpretability of a classifier can vary from case to case, here a humanly interpretable classifier is one that can be expressed in simple mathematical terms. As a novel approach, we represent a classifier as an assembly of simple mathematical rules using a non-linear decision tree (NLDT). Each conditional (non-terminal) node of the tree represents a non-linear mathematical rule (split-rule) involving features, which partitions the dataset at that node into two non-overlapping subsets. This partitioning is intended to minimize the impurity of the resulting child nodes. By restricting the structure of the split-rule at each conditional node and the depth of the decision tree, the interpretability of the classifier is assured. The non-linear split-rule at a given conditional node is obtained using an evolutionary bilevel optimization algorithm: the upper level searches for an interpretable structure of the split-rule, while the lower level finds the most appropriate weights (coefficients) of the rule's individual constituents to minimize the net impurity of the two resulting child nodes. The performance of the proposed algorithm is demonstrated on a number of controlled test problems, existing benchmark problems, and industrial problems. Results on problems with two to 500 features are encouraging and open up further scope for applying the proposed approach to more challenging and complex classification tasks.
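
To make the objective in the abstract concrete, the following is a minimal Python sketch of how a candidate split-rule could be scored by the net impurity of its two child nodes. The abstract does not give the rule's functional form or the impurity measure, so several things here are assumptions for illustration only: the rule is taken to be a weighted sum of power-law terms, f(x) = sum_i w_i * prod_j x_j^(b_ij) + bias; impurity is taken to be Gini impurity; and the names `gini`, `split_rule`, and `net_impurity` are hypothetical. In a bilevel setup of the kind described, the upper level would search over the exponent matrix (the rule's structure) and the lower level over the weights and bias, with a score like this as the lower-level objective.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a label array (assumed impurity measure); 0.0 if empty."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / labels.size
    return float(1.0 - np.sum(p ** 2))

def split_rule(X, exponents, weights, bias):
    """Evaluate the assumed rule f(x) = sum_i w_i * prod_j x_j**b_ij + bias.

    X: (n_samples, n_features), exponents: (n_terms, n_features),
    weights: (n_terms,). Features are assumed positive so powers are defined.
    """
    # terms[n, i] = prod_j X[n, j] ** exponents[i, j]
    terms = np.prod(X[:, None, :] ** exponents[None, :, :], axis=2)
    return terms @ weights + bias

def net_impurity(X, y, exponents, weights, bias):
    """Size-weighted Gini impurity of the two children produced by f(x) <= 0."""
    f = split_rule(X, exponents, weights, bias)
    left, right = y[f <= 0], y[f > 0]
    n = y.size
    return (left.size / n) * gini(left) + (right.size / n) * gini(right)

# Toy data: the two classes are separated by the curve x1 * x2 = 1.
X = np.array([[0.5, 0.5], [0.5, 1.0], [2.0, 2.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Structure (exponents) would come from the upper level,
# weights and bias from the lower level.
score = net_impurity(X, y, np.array([[1.0, 1.0]]), np.array([1.0]), -1.0)
print(score)
```

On this toy data the rule x1 * x2 - 1 <= 0 sends each class to its own child, so the score is zero; a rule that mixes classes in a child yields a positive score.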
