Paper Title
Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network
Authors
Abstract
Malicious software (malware) causes considerable harm to our devices and daily lives, so understanding malware behavior and the threats it poses is essential. Most malware records are variable-length, text-based files with timestamps, such as event logs and dynamic analysis profiles. Using the timestamps, we can sort such data into sequences for subsequent analysis. However, variable-length text-based sequences are difficult to process. In addition, unlike natural language text, most sequential data in information security have specific properties and structures, such as loops, repeated calls, and noise. To analyze API call sequences together with their structure, we represent the sequences as graphs, which allow further investigation of their information and structure, for example through a Markov model. We therefore design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze API call sequences. Through AWGCN, we obtain sequence embeddings that can be used to analyze malware behavior. Moreover, classification experiments show that AWGCN outperforms other classifiers on call-like datasets, and that the embeddings can further improve the performance of classic models.
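To illustrate the idea of representing a call sequence as a graph, the following minimal sketch turns an ordered list of API calls into a directed graph whose edge weights are first-order Markov transition probabilities. It is not the AWGCN model and is not taken from the paper; the function name `sequence_to_markov_graph` and the example call trace are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict


def sequence_to_markov_graph(calls):
    """Build {src: {dst: probability}} from consecutive call pairs.

    Hypothetical helper: nodes are distinct API names, and each edge weight
    is the fraction of transitions out of `src` that go to `dst`.
    """
    counts = defaultdict(Counter)
    # Count every transition between consecutive calls.
    for src, dst in zip(calls, calls[1:]):
        counts[src][dst] += 1

    graph = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        # Row-normalize the counts into transition probabilities.
        graph[src] = {dst: n / total for dst, n in dsts.items()}
    return graph


# Made-up trace, already sorted by timestamp; note the repeated WriteFile call.
calls = ["CreateFileW", "WriteFile", "WriteFile", "WriteFile", "CloseHandle"]
graph = sequence_to_markov_graph(calls)

for src, edges in graph.items():
    print(src, edges)
```

Under this representation, a repeated call or loop collapses into a self-loop or cycle with a higher weight, so sequences of different lengths map to graphs over the same set of API-name nodes.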