Paper title
A general framework for inference on algorithm-agnostic variable importance
Paper authors
Paper abstract
In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
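The definition of importance in the abstract, a contrast between the predictiveness achievable with all features and with all features except those under consideration, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration rather than the procedure proposed in the paper: R-squared stands in for the predictiveness measure, ordinary least squares stands in for a flexible learner, the data are simulated, and the helper names (`fit_ols`, `r_squared`, `importance`) are hypothetical. It computes only a plug-in point estimate on held-out data and does not construct the confidence intervals or the hypothesis test that the abstract describes.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def r_squared(y, pred):
    """Predictiveness measure assumed here: 1 - MSE / Var(y)."""
    return 1.0 - np.mean((y - pred) ** 2) / np.var(y)


def importance(X, y, drop, train_frac=0.5, seed=0):
    """Plug-in estimate: predictiveness of all features minus
    predictiveness of all features except those listed in `drop`."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    keep = [j for j in range(X.shape[1]) if j not in drop]

    # Fit both models on the training half, evaluate on the held-out half.
    full = fit_ols(X[train], y[train])
    reduced = fit_ols(X[train][:, keep], y[train])
    v_full = r_squared(y[test], full(X[test]))
    v_reduced = r_squared(y[test], reduced(X[test][:, keep]))
    return v_full - v_reduced


# Simulated data (an assumption): feature 0 matters most, feature 2 not at all.
rng = np.random.default_rng(1)
X = rng.normal(size=(4000, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=4000)

imp_0 = importance(X, y, drop=[0])  # importance of feature 0
imp_2 = importance(X, y, drop=[2])  # importance of the irrelevant feature
print(f"feature 0: {imp_0:.3f}, feature 2: {imp_2:.3f}")
```

In this simulated setting the population value of the contrast for feature 0 is 4/6 on the R-squared scale, and for feature 2 it is zero, so the two estimates should land near those values. Because the contrast is defined through the best achievable predictiveness rather than through the coefficients or internals of one fitted model, the same function could in principle be called with any other learner substituted for `fit_ols`.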