Paper Title

Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee

Author

Chen, George H.

Abstract

Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
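
To make the first sentence of the abstract concrete, here is a minimal sketch of a kernel-weighted Kaplan-Meier (Beran-style) estimator, the classical way a kernel function is turned into an individual survival curve. It is not the survival kernet method. It assumes a fixed Gaussian kernel on raw features, whereas the paper learns the kernel with a neural network and compresses the training set into clusters. The function names, the `bandwidth` parameter, and the array layout are all hypothetical choices for illustration.

```python
import numpy as np


def gaussian_kernel(x, X, bandwidth=1.0):
    """Similarity between one test point x and every row of X (assumed Gaussian)."""
    sq_dists = np.sum((X - x) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def kernel_survival_curve(x, X_train, times, events, bandwidth=1.0):
    """Kernel-weighted Kaplan-Meier estimate of S(t | x).

    times  : observed time for each training point (event or censoring)
    events : 1 if the event was observed, 0 if the point was censored
    Returns the sorted unique event times and the estimated survival
    probability just after each of them.
    """
    weights = gaussian_kernel(x, X_train, bandwidth)
    event_times = np.unique(times[events == 1])
    surv = np.empty(len(event_times))
    s = 1.0
    for i, t in enumerate(event_times):
        # Total similarity mass still at risk at time t.
        at_risk = weights[times >= t].sum()
        # Similarity mass of points whose event occurs exactly at t.
        died = weights[(times == t) & (events == 1)].sum()
        if at_risk > 0:
            s *= 1.0 - died / at_risk
        surv[i] = s
    return event_times, surv
```

When every training point receives the same weight, this reduces to the ordinary Kaplan-Meier estimator. Because each prediction scans the whole training set, test-time cost grows with the number of training points, which is the scalability problem the abstract says the cluster-based compression addresses.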