Paper Title
Individualized PATE: Differentially Private Machine Learning with Individual Privacy Guarantees
Paper Authors
Paper Abstract
Applying machine learning (ML) to sensitive domains requires privacy protection of the underlying training data through formal privacy frameworks, such as differential privacy (DP). Yet, usually, the privacy of the training data comes at the cost of the resulting ML models' utility. One reason for this is that DP uses a single uniform privacy budget epsilon for all training data points, which has to align with the strictest privacy requirement encountered among all data holders. In practice, different data holders have different privacy requirements, and data points of data holders with lower requirements can contribute more information to the training process of the ML models. To account for this need, we propose two novel methods based on the Private Aggregation of Teacher Ensembles (PATE) framework to support the training of ML models with individualized privacy guarantees. We formally describe the methods, provide a theoretical analysis of their privacy bounds, and experimentally evaluate their effect on the final model's utility using the MNIST, SVHN, and Adult income datasets. Our empirical results show that the individualized privacy methods yield ML models of higher accuracy than the non-individualized baseline. Thereby, we improve the privacy-utility trade-off in scenarios in which different data holders consent to contribute their sensitive data at different individual privacy levels.
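For readers unfamiliar with PATE, the sketch below illustrates the noisy-argmax aggregation step at its core: an ensemble of teacher models votes on the label of a query, Laplace noise is added to the vote histogram, and the noisiest-winning class is released. This is a minimal illustration, not the authors' implementation; in particular, the per-teacher weights parameter is a hypothetical stand-in for how individualized contributions could enter the aggregation, and the function name noisy_weighted_vote is our own.

import numpy as np

def noisy_weighted_vote(teacher_preds, weights, num_classes, noise_scale, rng=None):
    """PATE-style noisy argmax over teacher votes.

    teacher_preds: integer class prediction of each teacher, shape (n_teachers,).
    weights: per-teacher vote weight; all 1.0 recovers standard (uniform) PATE.
    noise_scale: scale of the Laplace noise added to the vote histogram.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Accumulate (possibly weighted) votes per class.
    votes = np.zeros(num_classes)
    for pred, w in zip(teacher_preds, weights):
        votes[pred] += w
    # Perturb the histogram with Laplace noise before releasing the argmax.
    votes += rng.laplace(loc=0.0, scale=noise_scale, size=num_classes)
    return int(np.argmax(votes))

# Example: 10 teachers voting over 3 classes. Hypothetically, teachers trained
# on data from holders with looser privacy requirements could carry larger weight.
preds = np.array([0, 0, 1, 0, 2, 0, 1, 0, 0, 2])
weights = np.array([1.0, 1.0, 1.5, 1.0, 1.0, 1.5, 1.0, 1.0, 1.5, 1.0])
label = noisy_weighted_vote(preds, weights, num_classes=3, noise_scale=1.0)

With all weights set to 1.0 this reduces to standard PATE aggregation; the paper's theoretical analysis concerns the privacy bounds of its two individualized variants, which this sketch does not reproduce.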