Paper Title

Boosted and Differentially Private Ensembles of Decision Trees

Paper Authors

Richard Nock, Wilko Henecka

Paper Abstract

Boosted ensembles of decision tree (DT) classifiers are extremely popular in international competitions, yet to our knowledge nothing is formally known about how to make them \textit{also} differentially private (DP), up to the point that random forests currently reign supreme in the DP setting. Our paper starts with the proof that the privacy vs. boosting picture for DT involves a notable and general technical tradeoff: the sensitivity tends to increase with the boosting rate of the loss, for any proper loss. DT induction algorithms being fundamentally iterative, our finding implies non-trivial choices when selecting or tuning the loss to balance noise against the utility of splitting nodes. To address this, we craft a new parameterized proper loss, called the M$\alpha$-loss, which, as we show, allows one to finely tune the tradeoff across the complete spectrum of sensitivity vs. boosting guarantees. We then introduce \textit{objective calibration} as a method to adaptively tune the tradeoff during DT induction, limiting the privacy budget spent while formally being able to keep boosting-compliant convergence on limited-depth nodes with high probability. Extensive experiments on 19 UCI domains reveal that objective calibration is highly competitive, even in the DP-free setting. Our approach tends to very significantly beat random forests, in particular in high-DP regimes ($\varepsilon \leq 0.1$) and even with boosted ensembles containing ten times fewer trees, which could be crucial to preserving a key feature of DT models under differential privacy: interpretability.
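
For intuition only, here is a minimal generic sketch (not the paper's algorithm, whose M$\alpha$-loss and objective calibration are not reproduced here) of the kind of sensitivity vs. utility tradeoff the abstract describes: when a DT node is split under DP, noise is typically scaled to sensitivity / budget, so a loss with higher sensitivity yields noisier split scores for the same per-node budget. The names (`epsilon_node`, `dp_pick_split`, the Gini-style criterion) are illustrative assumptions.

```python
# Illustrative sketch only: sensitivity-scaled Laplace noise on split scores
# in a DP decision-tree node. Not the paper's method; criterion and names are
# assumptions made for the example.
import numpy as np

rng = np.random.default_rng(0)

def split_utility(y_left, y_right):
    """Negative weighted Gini impurity; any proper-loss split criterion fits here."""
    def gini(y):
        if len(y) == 0:
            return 0.0
        p = float(np.mean(y))
        return 2.0 * p * (1.0 - p)
    n = len(y_left) + len(y_right)
    return -(len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / n

def dp_pick_split(candidates, y, epsilon_node, sensitivity):
    """Pick a split after adding Laplace noise of scale sensitivity / epsilon_node.
    A faster-boosting (higher-sensitivity) loss gets more noise for the same
    budget -- the tradeoff the abstract refers to."""
    noisy_scores = []
    for mask in candidates:  # mask: boolean partition of the node's examples
        u = split_utility(y[mask], y[~mask])
        noisy_scores.append(u + rng.laplace(scale=sensitivity / epsilon_node))
    return int(np.argmax(noisy_scores))

# Tiny usage example with synthetic binary labels and two candidate splits.
y = np.array([0, 0, 1, 1, 1, 0, 1, 0])
candidates = [np.array([1, 1, 1, 1, 0, 0, 0, 0], bool),
              np.array([1, 0, 1, 0, 1, 0, 1, 0], bool)]
print(dp_pick_split(candidates, y, epsilon_node=0.1, sensitivity=0.5))
```

In this toy setting, shrinking `epsilon_node` or raising `sensitivity` makes the chosen split increasingly random, which is why the paper's loss tuning and calibration of the per-node tradeoff matter.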
