Paper title
A Machine Learning-based Approach to Detect Threats in Bio-Cyber DNA Storage Systems
Paper authors
Paper abstract
Data storage is one of the main computing issues of this century. Not only are storage devices approaching strict physical limits, but the amount of data generated by users is also growing at an extraordinary rate. To face these challenges, data centres have grown steadily over the past decades. However, this growth comes at a price, particularly from an environmental point of view. Among the various promising storage media, DNA is one of the most fascinating candidates. In our previous work, we proposed an automated archival architecture that uses bioengineered bacteria to store and retrieve data previously encoded into DNA. This storage technique is one example of how biological media can deliver power-efficient storage solutions. The similarities between these biological media and classical ones can also be a drawback, as malicious parties might use biological instruments and techniques to replicate traditional attacks on the biological archival system. In this paper, we first analyse the main characteristics of our storage system and the different types of attacks that could be executed on it. Then, with the aim of identifying ongoing attacks, we propose and evaluate detection techniques that rely on traditional metrics and machine learning algorithms. We identify and adapt two suitable metrics for this purpose, namely generalized entropy and information distance. Moreover, our trained models achieve an AUROC over 0.99 and an AUPRC over 0.91.
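The abstract names generalized entropy and information distance but does not define them. As a rough illustration only, the sketch below assumes the Rényi formulation that is common in anomaly detection: generalized entropy of order alpha, and an information distance built as the sum of the two directed Rényi divergences. The function names, the smoothing helper, and the example counts are hypothetical and are not taken from the paper, whose adapted metrics may differ.

```python
import math


def to_distribution(counts, smoothing=1.0):
    """Turn raw counts into a strictly positive probability distribution.

    Additive smoothing (an assumption of this sketch) keeps every
    probability above zero so the divergences below stay finite.
    """
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def generalized_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha >= 0), in bits.

    As alpha approaches 1 this reduces to Shannon entropy, which is
    handled as a special case.
    """
    probs = [x for x in p if x > 0]
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log2(x) for x in probs)
    return math.log2(sum(x ** alpha for x in probs)) / (1.0 - alpha)


def renyi_divergence(p, q, alpha):
    """Directed Renyi divergence D_alpha(P || Q), in bits.

    Requires alpha > 0, alpha != 1, and strictly positive p and q of
    equal length.
    """
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log2(s) / (alpha - 1.0)


def information_distance(p, q, alpha):
    """Symmetric distance: D_alpha(P || Q) + D_alpha(Q || P)."""
    return renyi_divergence(p, q, alpha) + renyi_divergence(q, p, alpha)


# Hypothetical event counts: a baseline window and an observed window.
baseline = to_distribution([25, 25, 25, 25])
observed = to_distribution([70, 10, 10, 10])

entropy_baseline = generalized_entropy(baseline, alpha=2)
entropy_observed = generalized_entropy(observed, alpha=2)
distance = information_distance(baseline, observed, alpha=2)
```

Under these assumptions, a detector could flag a window when its entropy drops well below the baseline value or when its distance from the baseline distribution exceeds a threshold. How such thresholds are chosen, and which system features feed the distributions, is left to the paper itself.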