Paper Title
Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings
Paper Authors
Paper Abstract
Privacy-preserving machine learning in data-sharing processes is an ever-critical task that enables collaborative training of Machine Learning (ML) models without the need to share the original data sources. It is especially relevant when an organization must ensure that sensitive data remains private throughout the whole ML pipeline, i.e., in both the training and inference phases. This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data. Organizations can thus share the data representation, rather than the raw data, to increase machine learning models' performance in scenarios where more than one data source is available for a shared predictive downstream task.
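The core idea in the abstract, sharing a learned latent representation instead of raw records, can be illustrated with a minimal sketch. This is not the paper's framework: it assumes a tiny linear autoencoder trained with plain SGD, synthetic data, and hypothetical names (`train_linear_autoencoder`, `private_data`, `shared_embeddings`). A linear encoder on its own offers no formal privacy guarantee; the sketch only shows the data flow.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def train_linear_autoencoder(data, latent_dim, epochs=200, lr=0.05, seed=0):
    """Fit encoder/decoder weights with SGD on squared reconstruction error."""
    rng = random.Random(seed)
    n_features = len(data[0])
    # Encoder maps n_features -> latent_dim; decoder starts as its transpose.
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
           for _ in range(latent_dim)]
    dec = [[enc[j][i] for j in range(latent_dim)] for i in range(n_features)]
    for _ in range(epochs):
        for x in data:
            z = matvec(enc, x)          # latent embedding
            x_hat = matvec(dec, z)      # reconstruction
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            # Backpropagate the error to the latent code before updating dec.
            grad_z = [sum(dec[i][j] * err[i] for i in range(n_features))
                      for j in range(latent_dim)]
            for i in range(n_features):
                for j in range(latent_dim):
                    dec[i][j] -= lr * err[i] * z[j]
            for j in range(latent_dim):
                for m in range(n_features):
                    enc[j][m] -= lr * grad_z[j] * x[m]
    return enc, dec


def reconstruction_error(data, enc, dec):
    """Mean squared reconstruction error per record."""
    total = 0.0
    for x in data:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
    return total / len(data)


# Synthetic stand-in for one organization's private records (assumption):
# three correlated features driven by a single hidden factor t.
rng = random.Random(42)
private_data = [[t, 2 * t, -t] for t in (rng.uniform(-1, 1) for _ in range(50))]

# epochs=0 returns the initial weights, used here as an untrained baseline.
untrained_enc, untrained_dec = train_linear_autoencoder(
    private_data, latent_dim=1, epochs=0)
enc, dec = train_linear_autoencoder(private_data, latent_dim=1)

error_before = reconstruction_error(private_data, untrained_enc, untrained_dec)
error_after = reconstruction_error(private_data, enc, dec)

# Only the embeddings leave the organization; raw records and the decoder
# stay local in this sketch.
shared_embeddings = [matvec(enc, x) for x in private_data]
```

In a multi-organization setting, each party would presumably train its own encoder and contribute only its `shared_embeddings` to the joint downstream model; how the embeddings are combined and how much they leak about the original data are questions the sketch does not address.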