Paper Title
Data Stealing Attack on Medical Images: Is it Safe to Export Networks from Data Lakes?
Paper Authors
Paper Abstract
In privacy-preserving machine learning, it is common that the owner of the learned model does not have any physical access to the data. Instead, the model owner is only granted secured remote access to a data lake, without any ability to retrieve data from it. Yet, the model owner may want to export the trained model periodically from the remote repository, and the question arises whether this poses a risk of data leakage. In this paper, we introduce the concept of a data stealing attack during the export of neural networks. It consists in hiding, in the exported network, information that allows the reconstruction, outside the data lake, of images initially stored in that data lake. More precisely, we show that it is possible to train a network that performs lossy image compression and at the same time solves a utility task such as image segmentation. The attack then proceeds by exporting the compression decoder network together with some image codes, which leads to image reconstruction outside the data lake. We explore the feasibility of such attacks on databases of CT and MR images, showing that it is possible to obtain perceptually meaningful reconstructions of the target dataset, and that the stolen dataset can in turn be used to solve a broad range of tasks. Comprehensive experiments and analyses show that data stealing attacks should be considered a threat to sensitive imaging data sources.
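The attack pipeline described above can be illustrated with a minimal toy sketch. This is not the paper's method: the real attack uses a trained neural compressor whose decoder is hidden in the exported network, whereas here a fixed block-averaging "encoder" and nearest-neighbour "decoder" stand in for the learned pair, and all names (`encode`, `decode`) are hypothetical. The point is only the data flow: codes are produced inside the data lake, then the decoder plus codes are exported and images are reconstructed outside.

```python
# Toy sketch of the export-based data stealing pipeline (hypothetical names).
# A real attack would use a learned autoencoder; here a fixed lossy scheme
# stands in so the example stays self-contained and deterministic.

def encode(image, block=2):
    """Lossy 'compression' inside the data lake: average non-overlapping blocks."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[i + di][j + dj] for di in range(block) for dj in range(block))
            / block ** 2
            for j in range(0, w, block)
        ]
        for i in range(0, h, block)
    ]

def decode(code, block=2):
    """The 'decoder' the attacker exports with the codes: upsample each code value."""
    return [
        [code[i // block][j // block] for j in range(len(code[0]) * block)]
        for i in range(len(code) * block)
    ]

# Inside the data lake: a toy 4x4 "image" is compressed into a 2x2 code.
image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
code = encode(image)    # compact payload smuggled out alongside the model export
recon = decode(code)    # outside the data lake: reconstruction from decoder + codes

print(code)    # [[10.0, 20.0], [30.0, 40.0]]
# For this piecewise-constant image the lossy round trip happens to be exact:
print(recon == [[float(v) for v in row] for row in image])  # True
```

In the paper's setting the encoder/decoder pair is trained jointly with a legitimate utility task (e.g. segmentation), so the exported network looks like an ordinary task model while the decoder half, together with the per-image codes, suffices to rebuild the protected images.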