Practical Automated Detection of Malicious npm Packages

Paper title

Practical Automated Detection of Malicious npm Packages

Authors

Adriana Sejfia, Max Schäfer

Abstract

The npm registry is one of the pillars of the JavaScript and TypeScript ecosystems, hosting over 1.7 million packages ranging from simple utility libraries to complex frameworks and entire applications. Due to the overwhelming popularity of npm, it has become a prime target for malicious actors, who publish new packages or compromise existing packages to introduce malware that tampers with or exfiltrates sensitive data from users who install either these packages or any package that (transitively) depends on them. Defending against such attacks is essential to maintaining the integrity of the software supply chain, but the sheer volume of package updates makes comprehensive manual review infeasible. We present Amalfi, a machine-learning based approach for automatically detecting potentially malicious packages comprised of three complementary techniques. We start with classifiers trained on known examples of malicious and benign packages. If a package is flagged as malicious by a classifier, we then check whether it includes metadata about its source repository, and if so whether the package can be reproduced from its source code. Packages that are reproducible from source are not usually malicious, so this step allows us to weed out false positives. Finally, we also employ a simple textual clone-detection technique to identify copies of malicious packages that may have been missed by the classifiers, reducing the number of false negatives. Amalfi improves on the state of the art in that it is lightweight, requiring only a few seconds per package to extract features and run the classifiers, and gives good results in practice: running it on 96287 package versions published over the course of one week, we were able to identify 95 previously unknown malware samples, with a manageable number of false positives.
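The three steps described in the abstract (classify, check reproducibility from source, detect clones) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Package` type, the `triage` function, the `classify` and `rebuild_from_source` callbacks, and the result labels are all hypothetical names introduced here, and an exact content hash is used as a simple stand-in for the textual clone-detection technique the abstract mentions.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set

Files = Dict[str, str]  # relative path -> file contents


@dataclass
class Package:
    """Hypothetical, simplified view of one published package version."""
    name: str
    version: str
    files: Files
    repository: Optional[str] = None  # source repository URL from metadata, if any


def content_hash(files: Files) -> str:
    """Hash all file paths and contents in a stable (sorted) order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(files[path].encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


def triage(
    pkg: Package,
    classify: Callable[[Package], bool],
    rebuild_from_source: Callable[[Package], Optional[Files]],
    known_malicious_hashes: Set[str],
) -> str:
    """Return one of 'flagged', 'cleared', 'clone', or 'clean'."""
    digest = content_hash(pkg.files)

    # Step 1: a trained classifier (assumed to exist) flags suspicious packages.
    if classify(pkg):
        # Step 2: if the package names a source repository, try to reproduce
        # it from source; a matching rebuild is treated as a false positive.
        if pkg.repository is not None:
            rebuilt = rebuild_from_source(pkg)
            if rebuilt is not None and content_hash(rebuilt) == digest:
                return "cleared"
        return "flagged"

    # Step 3: catch copies of known malware that the classifier missed.
    if digest in known_malicious_hashes:
        return "clone"
    return "clean"
```

In this sketch the clone check only runs on packages the classifier did not flag, which mirrors the abstract's framing of clone detection as a way to reduce false negatives. A real system would likely normalize contents (for example, ignoring the package name and version) before hashing, which is omitted here.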
