Paper Title

Unintended memorisation of unique features in neural networks

Authors

John Hartley, Sotirios A. Tsaftaris

Abstract

Neural networks pose a privacy risk due to their propensity to memorise and leak training data. We show that unique features occurring only once in training data are memorised by discriminative multi-layer perceptrons and convolutional neural networks trained on benchmark imaging datasets. We design our method for settings where sensitive training data is not available, for example medical imaging. Our setting knows the unique feature, but not the training data, model weights or the unique feature's label. We develop a score estimating a model's sensitivity to a unique feature by comparing the KL divergences of the model's output distributions given modified out-of-distribution images. We find that typical strategies to prevent overfitting do not prevent unique feature memorisation, and that images containing a unique feature are highly influential, regardless of the influence of the image's other features. We also find a significant variation in memorisation with training seed. These results imply that neural networks pose a privacy risk to rarely occurring private information. This risk is more pronounced in healthcare applications, since sensitive patient information can be memorised when it remains in training data due to an imperfect data sanitisation process.
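
The KL-divergence-based sensitivity score mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration assuming a PyTorch classifier; the names `feature_sensitivity_score`, `insert_feature` and `kl_divergence`, and the mean aggregation over out-of-distribution images, are assumptions for illustration and may differ from the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def kl_divergence(p_logits, q_logits):
    # KL(P || Q) between two categorical output distributions given as logits,
    # computed per sample over the class dimension.
    log_p = F.log_softmax(p_logits, dim=-1)
    log_q = F.log_softmax(q_logits, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)

def feature_sensitivity_score(model, ood_images, insert_feature):
    # Hypothetical sketch of the sensitivity score:
    #   ood_images     - batch of out-of-distribution images (no training data needed)
    #   insert_feature - callable that pastes the candidate unique feature into a batch
    # A feature the model has memorised should shift its output distribution more
    # strongly than an unmemorised one, giving a larger divergence.
    model.eval()
    with torch.no_grad():
        logits_plain = model(ood_images)                  # outputs without the feature
        logits_feat = model(insert_feature(ood_images))   # outputs with the feature inserted
    # Aggregate per-image divergences into a single scalar score (assumed: mean).
    return kl_divergence(logits_feat, logits_plain).mean().item()
```

Note that this sketch only requires black-box access to the model's output distribution and a way to render the candidate feature onto probe images, which matches the setting described above (no training data, weights or labels).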
