Recognizing and Extracting Cybersecurity-relevant Entities from Text

Paper Title

Recognizing and Extracting Cybersecurity-relevant Entities from Text

Authors

Casey Hanks, Michael Maiden, Priyanka Ranade, Tim Finin, Anupam Joshi

Abstract

Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks, and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources. We are using it to train and test cybersecurity entity models with the spaCy framework, and we are exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods for linking cybersecurity domain entities to existing world knowledge in Wikidata. Our future work will survey and test spaCy NLP tools and create methods for continuously integrating new information extracted from text.

Paper Title