Paper Title
CATBERT: Context-Aware Tiny BERT for Detecting Social Engineering Emails
Paper Authors
Paper Abstract
Targeted phishing emails are on the rise and facilitate the theft of billions of dollars from organizations each year. While malicious signals in attached files or malicious URLs in emails can be detected by conventional malware signatures or machine learning techniques, it is challenging to identify hand-crafted social engineering emails that contain no malicious code and share no word choices with known attacks. To tackle this problem, we fine-tune a pre-trained BERT model by replacing half of its Transformer blocks with simple adapters, efficiently learning sophisticated representations of natural language syntax and semantics. Our context-aware network also learns context representations between an email's content and context features drawn from email headers. Our CatBERT (Context-Aware Tiny BERT) achieves an 87% detection rate at a false positive rate of 1%, compared to DistilBERT, LSTM, and logistic regression baselines, which achieve detection rates of 83%, 79%, and 54%, respectively. Our model is also faster than competing Transformer approaches and is resilient to adversarial attacks that deliberately replace keywords with typos or synonyms.
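The abstract describes two architectural ideas: swapping the upper half of a pre-trained Transformer's blocks for lightweight adapters, and fusing the text representation with context features from email headers. The following is a minimal PyTorch sketch of those two ideas, assuming a Hugging Face DistilBERT backbone; the Adapter design, bottleneck width, number of retained blocks, n_header_features, and the classifier head are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch (not the authors' code) of the abstract's two ideas:
# (1) replace the upper half of a pre-trained Transformer's blocks with
#     lightweight bottleneck adapters, and
# (2) fuse the text representation with context features from email headers.
import torch
import torch.nn as nn
from transformers import DistilBertModel


class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection (assumed design)."""

    def __init__(self, hidden_size: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, x, attn_mask=None, head_mask=None, output_attentions=False):
        # Return a 1-tuple so the surrounding DistilBERT Transformer loop,
        # which reads layer_outputs[-1], treats this like a regular block.
        return (x + self.up(self.act(self.down(x))),)


class TinyBertClassifier(nn.Module):
    """DistilBERT with its upper blocks swapped for adapters, plus a head
    that concatenates the [CLS] embedding with header-derived features."""

    def __init__(self, n_header_features: int = 10, n_keep: int = 3):
        super().__init__()
        self.bert = DistilBertModel.from_pretrained("distilbert-base-uncased")
        dim = self.bert.config.dim  # 768 for distilbert-base
        n_total = len(self.bert.transformer.layer)  # 6 blocks
        # Keep the lower blocks; replace the upper half with cheap adapters.
        kept = list(self.bert.transformer.layer[:n_keep])
        adapters = [Adapter(dim) for _ in range(n_total - n_keep)]
        self.bert.transformer.layer = nn.ModuleList(kept + adapters)
        # Fuse the text embedding with hand-crafted header features.
        self.head = nn.Sequential(
            nn.Linear(dim + n_header_features, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # one logit: phishing vs. benign
        )

    def forward(self, input_ids, attention_mask, header_features):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # embedding at the [CLS] position
        return self.head(torch.cat([cls, header_features], dim=-1)).squeeze(-1)
```

Under this reading, freezing the retained pre-trained blocks and training only the adapters and classifier head would keep fine-tuning cheap, and the shallower effective depth at inference time is consistent with the abstract's claim of being faster than competing Transformer approaches.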