Attack vs Benign Network Intrusion Traffic Classification

Paper Title

Attack vs Benign Network Intrusion Traffic Classification

Authors

Andrecut, M.

Abstract

Intrusion detection systems (IDS) are used to monitor networks or systems for attack activity or policy violations. Such a system should be able to successfully identify anomalous deviations from normal traffic behavior. Here we discuss the machine learning approach to building an anomaly-based IDS using the CSE-CIC-IDS2018 dataset. Since the publication of this dataset, a relatively large number of papers have been published, most of them presenting IDS architectures and results based on complex machine learning methods, such as deep neural networks, gradient boosting classifiers, or hidden Markov models. Here we show that similar results can be obtained using a very simple nearest neighbor classification approach, avoiding the inherent complications of training such complex models. The advantages of the nearest neighbor algorithm are: (1) it is very simple to implement; (2) it is extremely robust; (3) it has no parameters, and therefore it cannot overfit the data. This result also shows that there is currently a trend of developing over-engineered solutions in the machine learning community. Such solutions are based on complex methods, such as deep learning neural networks, without even considering baseline solutions corresponding to simple but efficient methods.
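
To make the nearest neighbor idea concrete, the following is a minimal sketch of a 1-nearest-neighbor classifier in plain Python. It is not the author's implementation: the abstract does not specify the feature preprocessing or the distance measure, so the Euclidean distance, the function name `nearest_neighbor_predict`, and the toy two-feature samples are all assumptions made for illustration. No data from CSE-CIC-IDS2018 is used.

```python
import math


def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training sample closest to x.

    Assumption: Euclidean distance over already-scaled numeric features.
    """
    best_label, best_dist = None, math.inf
    for features, label in zip(train_X, train_y):
        dist = math.dist(features, x)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:  # strict '<' keeps the first sample on ties
            best_label, best_dist = label, dist
    return best_label


# Hypothetical toy samples with two scaled flow features each.
# These values are invented for illustration only.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["benign", "benign", "attack", "attack"]

# Classify two unseen samples by copying the label of the closest
# training sample; there is no training step and nothing to tune.
queries = [(0.15, 0.15), (0.85, 0.85)]
predictions = [nearest_neighbor_predict(train_X, train_y, q) for q in queries]
print(predictions)
```

The sketch scans every training sample for each query, which is the simplest possible form of the method. On a dataset of realistic size, an indexed or vectorized search would likely be needed, but the classification rule itself would stay the same.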
