Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

Paper Title

Mind Your Weight(s): A Large-scale Study on Insufficient Machine Learning Model Protection in Mobile Apps

Authors

Zhichuang Sun, Ruimin Sun, Long Lu, Alan Mislove

Abstract

On-device machine learning (ML) is quickly gaining popularity among mobile apps. It allows offline model inference while preserving user privacy. However, ML models, considered as core intellectual properties of model owners, are now stored on billions of untrusted devices and subject to potential thefts. Leaked models can cause both severe financial loss and security consequences. This paper presents the first empirical study of ML model protection on mobile devices. Our study aims to answer three open questions with quantitative evidence: How widely is model protection used in apps? How robust are existing model protection techniques? What impacts can (stolen) models incur? To that end, we built a simple app analysis pipeline and analyzed 46,753 popular apps collected from the US and Chinese app markets. We identified 1,468 ML apps spanning all popular app categories. We found that, alarmingly, 41% of ML apps do not protect their models at all, which can be trivially stolen from app packages. Even for those apps that use model protection or encryption, we were able to extract the models from 66% of them via unsophisticated dynamic analysis techniques. The extracted models are mostly commercial products and used for face recognition, liveness detection, ID/bank card recognition, and malware detection. We quantitatively estimated the potential financial and security impact of a leaked model, which can amount to millions of dollars for different stakeholders. Our study reveals that on-device models are currently at high risk of being leaked; attackers are highly motivated to steal such models. Drawn from our large-scale study, we report our insights into this emerging security problem and discuss the technical challenges, hoping to inspire future research on robust and practical model protection for mobile devices.
