An Empirical Study of IoT Security Aspects at Sentence-Level in Developer Textual Discussions

Paper title

An Empirical Study of IoT Security Aspects at Sentence-Level in Developer Textual Discussions

Authors

Nibir Chandra Mandal, Gias Uddin

Abstract

IoT is a rapidly emerging paradigm that now touches almost every aspect of modern life, so ensuring the security of IoT devices is crucial. IoT devices can differ from traditional computing devices, which can make it challenging to design and implement proper security measures for them. We observe that IoT developers discuss their security-related challenges in developer forums like Stack Overflow (SO). However, we find that IoT security discussions can also be buried inside non-security discussions in SO. In this paper, we aim to understand the challenges IoT developers face while applying security practices and techniques to IoT devices. We have two goals: (1) develop a model that can automatically find security-related IoT discussions in SO, and (2) study the model output to learn about the security-related challenges of IoT developers. First, we download 53K posts from SO that contain discussions about IoT. Second, we manually label 5,919 sentences from the 53K posts as 1 or 0. Third, we use this benchmark to investigate a suite of deep learning transformer models. The best-performing model is called SecBot. Fourth, we apply SecBot to all of the posts and find around 30K security-related sentences. Fifth, we apply topic modeling to the security-related sentences, then label and categorize the topics. Sixth, we analyze the evolution of the topics in SO. We find that (1) SecBot, which is based on retraining the deep learning model RoBERTa, offers the best F1-score of 0.935; (2) there are six error categories in the samples misclassified by SecBot, which was mostly wrong when the keywords or contexts were ambiguous (e.g., a gateway can be a security gateway or a simple gateway); (3) there are 9 security topics grouped into three categories: Software, Hardware, and Network; and (4) the highest number of topics belongs to software security, followed by network security.
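For readers unfamiliar with the F1-score reported above, the following is a minimal sketch of how it can be computed for a binary sentence classifier. It assumes that label 1 marks a security-related sentence and that F1 is taken over that positive class; the abstract does not state how the score was averaged. The function name `f1_score` and the toy labels are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (assumed here: 1 = security-related sentence)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # With no true positives, precision and recall are both zero (or undefined).
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for eight sentences (hypothetical, not the paper's benchmark).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1-score is 0.75.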
