Stratified cross-validation for unbiased and privacy-preserving federated learning

Paper title

Stratified cross-validation for unbiased and privacy-preserving federated learning

Authors

Bey, R., Goussault, R., Benchoufi, M., Porcher, R.

Abstract

Large-scale collections of electronic records constitute both an opportunity for the development of more accurate prediction models and a threat to privacy. To limit privacy exposure, new privacy-enhancing techniques are emerging, such as federated learning, which enables large-scale data analysis while avoiding the centralization of records in a single database that would represent a critical point of failure. Although promising regarding privacy protection, federated learning prevents the use of some data-cleaning algorithms, thus inducing new biases. In this work we focus on the recurrent problem of duplicated records that, if not handled properly, may cause over-optimistic estimates of a model's performance. We introduce and discuss stratified cross-validation, a validation methodology that leverages stratification techniques to prevent data leakage in federated learning settings without relying on demanding deduplication algorithms.
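
The abstract does not spell out how strata are built, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that duplicated records agree on a few stratifying fields (the hypothetical `birth_year`, `sex`, and `postcode` below) and that every client applies the same deterministic rule to map those fields to a fold. Under that assumption, copies of a record held by different clients end up in the same fold without any client exchanging records or running a deduplication step. The names `fold_of` and `split_client` are made up for this example.

```python
import hashlib

# Hypothetical stratifying fields; assumed identical across duplicates.
KEYS = ("birth_year", "sex", "postcode")
N_FOLDS = 5


def fold_of(record, n_folds, keys):
    """Map a record to a fold index using only its stratifying fields."""
    payload = "|".join(f"{k}={record[k]}" for k in keys)
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then reduce modulo n_folds.
    return int.from_bytes(digest[:8], "big") % n_folds


def split_client(records, n_folds, keys):
    """Split one client's local records into folds, with no communication."""
    folds = [[] for _ in range(n_folds)]
    for record in records:
        folds[fold_of(record, n_folds, keys)].append(record)
    return folds


# Two clients; the first record of hospital_a is duplicated in hospital_b.
hospital_a = [
    {"birth_year": 1980, "sex": "F", "postcode": "75011", "outcome": 1},
    {"birth_year": 1955, "sex": "M", "postcode": "69003", "outcome": 0},
]
hospital_b = [
    {"birth_year": 1972, "sex": "F", "postcode": "13001", "outcome": 0},
    {"birth_year": 1980, "sex": "F", "postcode": "75011", "outcome": 1},
]

folds_a = split_client(hospital_a, N_FOLDS, KEYS)
folds_b = split_client(hospital_b, N_FOLDS, KEYS)
```

When fold `k` is held out on every client at once, both copies of the duplicated record are in the test set together, so neither can leak into training. The trade-off of this sketch is that distinct individuals who share all stratifying fields are also forced into the same fold, which is harmless for leakage but can make fold sizes uneven.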
