Paper title
Traffic Generation using Containerization for Machine Learning
Paper authors
Abstract
The design and evaluation of data-driven network intrusion detection methods are currently held back by a lack of adequate data for both benign and attack traffic. Existing datasets are mostly gathered in isolated lab environments containing virtual machines, both to offer more control over the computer interactions and to prevent any malicious code from escaping. This procedure, however, leads to datasets that lack four core properties: heterogeneity, ground truth traffic labels, large data size, and contemporary content. Here, we present a novel data generation framework based on Docker containers that addresses these problems systematically. For this, we arrange suitable containers into relevant traffic communication scenarios and subscenarios, which are subject to appropriate input randomization as well as WAN emulation. By relying on process isolation through containerization, we can match traffic events with individual processes and achieve scalability and modularity of individual traffic scenarios. We perform two experiments to assess the reproducibility and traffic properties of our framework, and demonstrate its usefulness on a traffic classification example.
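The combination of input randomization, WAN emulation, and per-process labelling described in the abstract can be illustrated with a minimal sketch. It is not the framework's actual code: the `WanProfile` class, the value ranges, the scenario and container names, and the choice of Linux `tc netem` as the emulation mechanism are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class WanProfile:
    """Hypothetical WAN emulation settings for one capture run."""
    delay_ms: int
    jitter_ms: int
    loss_pct: float


def sample_wan_profile(rng: random.Random) -> WanProfile:
    """Draw randomized link conditions; the ranges are illustrative assumptions."""
    delay = rng.randint(5, 200)
    jitter = rng.randint(0, delay // 4)
    loss = round(rng.uniform(0.0, 2.0), 2)
    return WanProfile(delay, jitter, loss)


def netem_command(profile: WanProfile, interface: str = "eth0") -> list[str]:
    """Build a `tc ... netem` argument list to be run inside a container's network namespace."""
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{profile.delay_ms}ms", f"{profile.jitter_ms}ms",
        "loss", f"{profile.loss_pct}%",
    ]


def label_for(scenario: str, container: str) -> str:
    """Ground-truth label, assuming one process per container so container identity labels the traffic."""
    return f"{scenario}/{container}"


# Seeded generator so that a capture run can be repeated with the same conditions.
rng = random.Random(42)
profile = sample_wan_profile(rng)
print(" ".join(netem_command(profile)))
print(label_for("http-file-download", "client"))
```

Seeding the random generator is one simple way to make a randomized run repeatable, which is the kind of property the reproducibility experiment mentioned in the abstract would examine.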