Paper Title

Nonparametric Variable Screening with Optimal Decision Stumps

Authors

Jason M. Klusowski, Peter M. Tian

Abstract

Decision trees and their ensembles are endowed with a rich set of diagnostic tools for ranking and screening variables in a predictive model. Despite the widespread use of tree-based variable importance measures, pinning down their theoretical properties has been challenging, and they therefore remain largely unexplored. To address this gap between theory and practice, we derive finite-sample performance guarantees for variable selection in nonparametric models using a single-level CART decision tree (a decision stump). Under standard operating assumptions in the variable screening literature, we find that the marginal signal strength of each variable and the ambient dimensionality can be, respectively, considerably weaker and higher than what state-of-the-art nonparametric variable selection methods allow. Furthermore, unlike previous marginal screening methods that attempt to directly estimate each marginal projection via a truncated basis expansion, the fitted model used here is a simple, parsimonious decision stump, thereby eliminating the need to tune the number of basis terms. Thus, surprisingly, even though decision stumps are highly inaccurate for estimation purposes, they can still be used to perform consistent model selection.
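As a rough illustration of the screening procedure the abstract describes (not the authors' code), the sketch below fits, for each coordinate, the optimal single-split regression stump and scores the variable by the reduction in squared error that the best split achieves; variables are then ranked by this score. The function names and the toy data are illustrative assumptions.

```python
# Hedged sketch of marginal variable screening with optimal decision stumps.
# Assumes continuous features (distinct values), squared-error impurity.
import numpy as np

def stump_score(x, y):
    """Largest squared-error reduction achievable by one split on x."""
    order = np.argsort(x)
    ys = y[order]
    n = len(ys)
    total_sse = np.sum((ys - ys.mean()) ** 2)
    # Prefix sums of y and y^2 let us evaluate every split point in O(1).
    csum = np.cumsum(ys)
    csq = np.cumsum(ys ** 2)
    best = 0.0
    for i in range(1, n):  # split between sorted positions i-1 and i
        left_sse = csq[i - 1] - csum[i - 1] ** 2 / i
        right_sum = csum[-1] - csum[i - 1]
        right_sse = (csq[-1] - csq[i - 1]) - right_sum ** 2 / (n - i)
        best = max(best, total_sse - left_sse - right_sse)
    return best

def screen_variables(X, y, k):
    """Indices of the k variables with the largest stump scores."""
    scores = np.array([stump_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]

# Toy example: y depends only on columns 0 and 2 (one signal is even
# non-monotone, which a stump cannot estimate well but can still detect).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = np.sign(X[:, 0]) + X[:, 2] ** 2 + 0.1 * rng.normal(size=500)
print(sorted(int(j) for j in screen_variables(X, y, 2)))
```

Note that no basis expansion or smoothing parameter appears anywhere: the only choice is how many variables to keep, which matches the abstract's point that the stump is a deliberately crude but tuning-free screening device.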
