Paper Title

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

Paper Authors

Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, Mark McLean

Paper Abstract

Recent works within machine learning have been tackling inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths. In the case of Windows executable malware detection, inputs may exceed $100$ MB, which corresponds to a time series with $T=100,000,000$ steps. To date, the closest approach to handling such a task is MalConv, a convolutional neural network capable of processing up to $T=2,000,000$ steps. The $\mathcal{O}(T)$ memory of CNNs has prevented their further application to malware. In this work, we develop a new approach to temporal max pooling that makes the required memory invariant to the sequence length $T$. This makes MalConv $116\times$ more memory efficient and up to $25.8\times$ faster to train on its original dataset, while removing its input length restrictions. We re-invest these gains into improving the MalConv architecture by developing a new Global Channel Gating design, giving us an attention mechanism capable of learning feature interactions across 100 million time steps in an efficient manner, a capability the original MalConv CNN lacked. Our implementation can be found at https://github.com/NeuromorphicComputationResearchProgram/MalConv2
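To make the length-invariant pooling idea concrete, below is a minimal PyTorch sketch of one way a two-pass "argmax, then recompute" scheme can keep memory independent of $T$: a gradient-free chunked scan locates each channel's winning position, and the convolution is then re-run with autograd enabled only on those winning receptive fields. The function name, `chunk_size`, and the stride-1/no-padding simplifications are illustrative assumptions, not the repository's actual API; see the linked MalConv2 repo for the authors' implementation.

```python
import torch
import torch.nn as nn

def temporal_max_pool_fixed_memory(x, conv, chunk_size=65536):
    """Global temporal max pooling with memory independent of T (sketch).

    x:    (B, C_in, T) input sequence, e.g. embedded bytes
    conv: an nn.Conv1d; stride 1 and no padding assumed for clarity
    """
    B, _, T = x.shape
    k = conv.kernel_size[0]
    assert conv.stride[0] == 1, "sketch assumes stride 1"
    best_val, best_pos = None, None

    # Pass 1: gradient-free argmax scan; only one chunk is live at a time,
    # so peak memory depends on chunk_size rather than T.
    with torch.no_grad():
        for start in range(0, T - k + 1, chunk_size):
            end = min(start + chunk_size + k - 1, T)  # overlap k-1 steps so no window is missed
            v, p = conv(x[:, :, start:end]).max(dim=2)  # (B, C_out) values and positions
            p = p + start                               # map back to global positions
            if best_val is None:
                best_val, best_pos = v, p
            else:
                better = v > best_val
                best_val = torch.where(better, v, best_val)
                best_pos = torch.where(better, p, best_pos)

    # Pass 2: recompute only the winning receptive fields with autograd on,
    # so gradients touch a constant number of positions per channel.
    pooled = []
    for b in range(B):
        feats = []
        for c in range(best_pos.shape[1]):
            s = int(best_pos[b, c])
            feats.append(conv(x[b:b + 1, :, s:s + k])[0, c, 0])
        pooled.append(torch.stack(feats))
    return torch.stack(pooled)  # (B, C_out) pooled feature vector
```

Because max pooling is idempotent, the k-1-step overlap between chunks costs nothing in correctness, and the Python loops in pass 2 touch only one window per channel. In the same spirit, here is a hedged sketch of what a global channel-gating layer could look like: a cheap whole-sequence summary produces per-channel sigmoid gates that modulate local features, letting far-apart regions influence each other without $\mathcal{O}(T^2)$ attention. The layer sizes and exact structure are assumptions for illustration; the paper's Global Channel Gating design may differ.

```python
class GlobalChannelGating(nn.Module):
    """Sketch: gate local feature channels with a global context signal."""

    def __init__(self, channels, hidden=64):
        super().__init__()
        self.summarize = nn.Linear(channels, hidden)  # compress global summary
        self.to_gates = nn.Linear(hidden, channels)   # one gate per channel

    def forward(self, x):               # x: (B, C, L) local feature maps
        ctx = x.mean(dim=2)             # (B, C) global context via mean pooling
        gates = torch.sigmoid(self.to_gates(torch.relu(self.summarize(ctx))))
        return x * gates.unsqueeze(-1)  # gate every channel at every position
```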
