PypiGuard: A novel meta-learning approach for enhanced malicious package detection in PyPI through static-dynamic feature fusion

CCFC

Key Points

  • Combines static and dynamic analysis

Problems

Research questions posed by the authors:

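RQ1: How can static metadata be combined with dynamic API call behavior to improve the accuracy and reliability of malicious package detection in open-source repositories?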

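RQ2: How does a hybrid ensemble meta-learning framework compare with traditional machine learning and deep learning methods for detecting malicious packages?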
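
To make the static-dynamic fusion in RQ1 concrete, here is a minimal sketch that concatenates a metadata vector with API call counts. The metadata fields, the `API_VOCAB` list, and all function names are assumptions made for illustration; the notes do not record which features the paper actually uses.

```python
from collections import Counter

# Hypothetical API vocabulary; a real system would derive it from observed traces.
API_VOCAB = ["open", "connect", "exec", "getenv"]


def static_features(metadata):
    """Turn package metadata into a small numeric vector (assumed fields)."""
    return [
        float(len(metadata.get("description", ""))),   # description length
        float(len(metadata.get("dependencies", []))),  # number of dependencies
        1.0 if metadata.get("homepage") else 0.0,      # homepage present?
    ]


def dynamic_features(api_calls):
    """Count how often each vocabulary API appears in a runtime trace."""
    counts = Counter(api_calls)
    return [float(counts[name]) for name in API_VOCAB]


def fuse(metadata, api_calls):
    """Early fusion: concatenate the static and dynamic vectors."""
    return static_features(metadata) + dynamic_features(api_calls)


example_meta = {"description": "http helper", "dependencies": ["requests"], "homepage": ""}
example_trace = ["getenv", "connect", "connect", "exec"]
fused = fuse(example_meta, example_trace)
```

The fused vector could then be passed to any classifier. Simple concatenation is only one possible fusion strategy and is not necessarily the one the paper adopts.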

Notes

References

Identifies malicious packages by spotting names that closely resemble those of popular packages

Neupane S, Holmes G, Wyss E, Davidson D, De Carli L. Beyond typosquatting: an in-depth look at package confusion. In: Proceedings of the 32nd USENIX conference on security symposium. USA: USENIX Association; 2023.
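
The name-similarity idea can be sketched with the standard library's `difflib`. The `POPULAR` list, the 0.85 threshold, and the function name are assumptions chosen for illustration; the cited work studies a broader set of confusion mechanisms than plain string similarity.

```python
from difflib import SequenceMatcher

# Hypothetical list of popular package names; a real check would use download statistics.
POPULAR = ["requests", "numpy", "pandas", "urllib3"]


def closest_popular(name, threshold=0.85):
    """Return (popular_name, score) if `name` looks like a near-copy of a popular package."""
    candidate = name.lower()
    best_name, best_score = None, 0.0
    for popular in POPULAR:
        score = SequenceMatcher(None, candidate, popular).ratio()
        if score > best_score:
            best_name, best_score = popular, score
    # An exact match is the popular package itself, not a lookalike.
    if best_name is not None and candidate != best_name and best_score >= threshold:
        return best_name, best_score
    return None


suspicious = closest_popular("requestss")  # one extra character
legitimate = closest_popular("requests")   # exact match, so not flagged
```

A similarity threshold alone produces false alarms for legitimately similar names, so in practice it would be one signal among several.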

Killing two birds with one stone (static analysis): examines metadata such as package names, author details, and dependency structure

Zhang J, Huang K, Huang Y, Chen B, Wang R, Wang C, et al. Killing two birds with one stone: Malicious package detection in NPM and PyPI using a single model of malicious behavior sequence. ACM Trans Softw Eng Methodol 2024. http://dx.doi.org/10.1145/3705304.

  • Some studies explore metadata-based detection

Combines machine learning with static metadata

Halder S, Bewong M, Mahboubi A, Jiang Y, Islam MR, Islam MZ, et al. Malicious package detection using metadata information. New York, NY, USA: Association for Computing Machinery; 2024, p. 1779–89. http://dx.doi.org/10.1145/3589334.3645543.

Deep learning for identifying Android malware

Manzil R, Haidros H, Naik S M. DeepMetaDroid: Real-time android malware detection using deep learning and metadata features. Cloud Comput Data Sci 2024;203–25. http://dx.doi.org/10.37256/ccds.5220244503.

Integrating machine learning has further improved static analysis

Charoenwet W, Thongtanunam P, Pham V-T, Treude C. An empirical study of static analysis tools for secure code review. In: Proceedings of the 33rd ACM SIGSOFT international symposium on software testing and analysis. ISSTA 2024, New York, NY, USA: Association for Computing Machinery; 2024, p. 691–703. http://dx.doi.org/10.1145/3650212.3680313.


  • Dynamic analysis

OSCAR

Zheng X, Wei C, Wang S, Zhao Y, Gao P, Zhang Y, et al. Towards robust detection of open source software supply chain poisoning attacks in industry environments. In: Proceedings of the 39th IEEE/ACM international conference on automated software engineering. New York, NY, USA: Association for Computing Machinery; 2024, p. 1990–2001. http://dx.doi.org/10.1145/3691620.3695262.

DONAPI

Huang C, Wang N, Wang Z, Siqi, Li L, Chen J, et al. DONAPI: Malicious NPM packages detector using behavior sequence knowledge mapping. 2024, arXiv:2403.08334.

Advanced preprocessing with TF-IDF and sliding windows has proven effective for preparing API call sequences for deep learning models

Kim M, Kim H. A dynamic analysis data preprocessing technique for malicious code detection with TF-IDF and sliding windows. Electronics 2024;13(5). http://dx.doi.org/10.3390/electronics13050963, [Online]. Available: https://www.mdpi.com/2079-9292/13/5/963.
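
A minimal sketch of the TF-IDF plus sliding-window idea, using only the standard library. The window size, step, example trace, and the unsmoothed IDF formula `log(N / df)` are assumptions; the exact preprocessing in the cited paper is not recorded in these notes.

```python
import math
from collections import Counter


def sliding_windows(calls, size, step):
    """Split one API call sequence into overlapping fixed-size windows.

    With step > 1, trailing calls that do not fill a whole window are dropped.
    """
    if len(calls) <= size:
        return [list(calls)]
    return [calls[i:i + size] for i in range(0, len(calls) - size + 1, step)]


def tf_idf(windows):
    """Weight each API name per window, treating every window as a document."""
    n = len(windows)
    # Document frequency: in how many windows each API name occurs.
    doc_freq = Counter(name for window in windows for name in set(window))
    scored = []
    for window in windows:
        counts = Counter(window)
        scored.append({
            name: (count / len(window)) * math.log(n / doc_freq[name])
            for name, count in counts.items()
        })
    return scored


trace = ["open", "read", "connect", "send", "read", "close"]
windows = sliding_windows(trace, size=3, step=1)
weights = tf_idf(windows)
```

In this trace `read` appears in every window, so its weight is zero, while `open` appears in only one window and is weighted highest there. Downweighting ubiquitous calls in this way is the effect TF-IDF is meant to provide before the windows are passed to a model.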

Extends the feature set beyond API calls

Ilić S, Gnjatović M, Tot I, Jovanović B, Maček N, Gavrilović Božović M. Going beyond API calls in dynamic malware analysis: A novel dataset. Electronics 2024;13(17). http://dx.doi.org/10.3390/electronics13173553, [Online]. Available: https://www.mdpi.com/2079-9292/13/17/3553.

CTIMD: demonstrates the effectiveness of monitoring API call sequences with parameters

Chen T, Zeng H, Lv M, Zhu T. CTIMD: Cyber threat intelligence enhanced malware detection using API call sequences with parameters. Comput Secur 2024;136:103518. http://dx.doi.org/10.1016/j.cose.2023.103518, [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0167404823004285.


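Hybrid meta-learning frameworks have proven effective at detecting malware with limited training data, which makes them well suited to integrating static and dynamic detection techniques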

Tapu SU, Shopnil SAA, Tamanna RB, Dewan MAA, Alam MGR. Malicious data classification in packet data network through hybrid meta deep learning. IEEE Access 2023;11:140609–25. http://dx.doi.org/10.1109/ACCESS.2023.3341911.

  • Cross-language

Cross-language detection for JavaScript and Python

Ladisa P, Ponta SE, Ronzoni N, Martinez M, Barais O. On the feasibility of cross-language detection of malicious packages in npm and PyPI. In: Proceedings of the 39th annual computer security applications conference. New York, NY, USA: Association for Computing Machinery; 2023, p. 71–82. http://dx.doi.org/10.1145/3627106.3627138.

Cross-language detection: the Cerebro model applies a fine-tuned BERT model to a unified behavior sequence to detect malicious packages

Zhang J, Huang K, Huang Y, Chen B, Wang R, Wang C, et al. Killing two birds with one stone: Malicious package detection in NPM and PyPI using a single model of malicious behavior sequence. ACM Trans Softw Eng Methodol 2024. http://dx.doi.org/10.1145/3705304.

Feasibility of cross-language detection

Ohm M, Boes F, Bungartz C, Meier M. On the feasibility of supervised machine learning for the detection of malicious software packages. In: Proceedings of the 17th international conference on availability, reliability and security. New York, NY, USA: Association for Computing Machinery; 2022, http://dx.doi.org/10.1145/3538969.3544415.


  • Hybrid frameworks combining metadata and code analysis

Amalfi: a machine learning approach that combines classifiers with metadata verification

Sejfia A, Schäfer M. Practical automated detection of malicious npm packages. New York, NY, USA: Association for Computing Machinery; 2022, p. 1681–92. http://dx.doi.org/10.1145/3510003.3510104.

Ea4mp (1+1>2): fuses deep code behavior analysis with call graphs

Sun X, Gao X, Cao S, Bo L, Wu X, Huang K. 1+1>2: Integrating deep code behaviors with metadata features for malicious PyPI package detection. In: Proceedings of the 39th IEEE/ACM international conference on automated software engineering. New York, NY, USA: Association for Computing Machinery; 2024, p. 1159–70. http://dx.doi.org/10.1145/3691620.3695493.

MalHyStack: uses a stacked ensemble that combines decision trees with deep learning models

Roy KS, Ahmed T, Udas PB, Karim ME, Majumdar S. MalHyStack: A hybrid stacked ensemble learning framework with feature engineering schemes for obfuscated malware analysis. Intell Syst Appl 2023;20:200283. http://dx.doi.org/10.1016/j.iswa.2023.200283, [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2667305323001084.


  • BERT-based frameworks have also become powerful tools for cross-ecosystem malware detection

High accuracy, but challenged by computational overhead

Ladisa P, Sahin M, Ponta SE, Rosa M, Martinez M, Barais O. The hitchhiker’s guide to malicious third-party dependencies. In: Proceedings of the 2023 workshop on software supply chain offensive research and ecosystem defenses. New York, NY, USA: Association for Computing Machinery; 2023, p. 65–74. http://dx.doi.org/10.1145/3605770.3625212.

LLM-based detection frameworks that use semantic code analysis show high precision in identifying complex attack patterns

Zahan N, Burckhardt P, Lysenko M, Aboukhadijeh F, Williams L. Shifting the lens: Detecting malicious npm packages using large language models. 2024, arXiv:2403.12196.

This is in fact the paper "Leveraging Large Language Models to Detect npm Malicious Packages"

Deep-learning-based cross-platform methods achieve higher accuracy in detecting obfuscated malware

Bhavya RA, Bindhu Shree GV, Chandan Gowda N, Sanjana S, ShwethaShree KV. ML-based cross-platform malware detection. In: 2024 international conference on knowledge engineering and communication systems, vol. 1. 2024, p. 1–6. http://dx.doi.org/10.1109/ICKECS61492.2024.10616557.


  • Datasets

MalwareBench: combines static metadata features with dynamic API behavior

Zahan N, Burckhardt P, Lysenko M, Aboukhadijeh F, Williams L. MalwareBench: Malware samples are not enough. In: Proceedings of the 21st international conference on mining software repositories. New York, NY, USA: Association for Computing Machinery; 2024, p. 728–32. http://dx.doi.org/10.1145/3643991.3644883.

PyRadar introduces a dataset that addresses the widespread problem of inaccurate metadata in PyPI

Gao K, Xu W, Yang W, Zhou M. Pyradar: Towards automatically retrieving and validating source code repository information for PyPI packages. Proc ACM Softw Eng 2024. http://dx.doi.org/10.1145/3660822.

BadSnakes

Vu D-L, Newman Z, Meyers JS. Bad snakes: Understanding and improving python package index malware scanning. In: 2023 IEEE/ACM 45th international conference on software engineering. 2023, p. 499–511. http://dx.doi.org/10.1109/ICSE48619.2023.00052.

OSS malicious package analysis in the wild

Zhou X, Zhang Y, Niu W, Liu J, Wang H, Li Q. OSS malicious package analysis in the wild. 2024, arXiv:2404.04991.

MalPacDetector: An LLM-Based Malicious NPM Package Detector

CCFA

Key Points

  • NPM
  • LLM

Methods

The authors use a large language model to generate features

Questions

Why do the authors use a large language model?

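  • Existing machine-learning detectors rely entirely on features hand-defined by experts, which is time-consuming, subjective, and weak against obfuscation; program-analysis detectors suffer high false-positive rates because their rules are generic, and those rules must be updated by hand. LLMs, by contrast, have strong code-semantics understanding and behavior summarization abilities, so they can identify malicious code snippets and generate features automatically, without expert involvement
  • Malicious behavior keeps evolving (for example, new obfuscation techniques and stealthy backdoors), and traditional hand-defined features cannot adapt in real time. LLMs can keep learning new malicious behaviors through iterative analysis (such as adaptive prompt optimization); when a new behavior's frequency exceeds a threshold, the feature set is updated automatically, which keeps detection current
  • Fills a gap in applying LLMs to code security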
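
The threshold-based feature update mentioned in the second point can be sketched as follows. The class name, the behavior labels, and the threshold value are hypothetical, and the labels are assumed to come from an LLM summarizing each package; no LLM is called here, and this is not the paper's implementation.

```python
from collections import Counter


class FeatureSet:
    """Tracks known malicious behaviors and promotes new ones seen often enough."""

    def __init__(self, known, threshold):
        self.known = set(known)
        self.threshold = threshold  # promote once the count exceeds this value
        self.pending = Counter()    # counts for behaviors not yet in the feature set

    def observe(self, behaviors):
        """Record the behaviors reported for one package; return newly promoted ones."""
        promoted = []
        for behavior in set(behaviors):  # count each behavior once per package
            if behavior in self.known:
                continue
            self.pending[behavior] += 1
            if self.pending[behavior] > self.threshold:
                self.known.add(behavior)
                del self.pending[behavior]
                promoted.append(behavior)
        return promoted


features = FeatureSet(known={"exfiltrate_env"}, threshold=2)
first = features.observe(["dns_tunnel", "exfiltrate_env"])
second = features.observe(["dns_tunnel"])
third = features.observe(["dns_tunnel"])  # third sighting exceeds the threshold
```

Counting per package rather than per occurrence keeps a single noisy package from promoting a behavior on its own.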

MainWork

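  • The authors propose the MalPacDetector detector
  • The authors propose a set of desired characteristics for benchmark datasets used to evaluate malicious NPM package detectors. These characteristics guided the collection of a new benchmark dataset, MalnpmDB.
  • MalnpmDB is used to evaluate the effectiveness of MalPacDetector and of existing publicly available detectors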

Baseline

  • OSS Detect Backdoor
  • MalWuKong
  • Ohm et al.
    • M. Ohm, F. Boes, C. Bungartz, and M. Meier, “On the feasibility of supervised machine learning for the detection of malicious software packages,” in Proc. 17th Int. Conf. Availability, Rel. Secur., Aug. 2022, pp. 1–10.
  • Amalfi

Limitation

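  • For code obfuscation, the abstract syntax tree (AST) copes with basic code minification but is ineffective against complex minification.
  • The approach uses static detection, so its ability to handle dynamic behavior is limited
    • A package may contain several malicious features without actually carrying out malicious behavior; the model's false-positive rate on such packages will be relatively high.
  • When the authors applied the approach to Python packages, the results were worse than for NPM packages; the authors infer that this is because the available malicious Python samples are limited in number and low in diversity
  • Feature generation depends on LLM performance; for novel malicious packages, the LLM may fail to extract the malicious behavior snippets, which invalidates the features.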

Notes

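  • False positives and false negatives: a false positive is a false alarm; a false negative is a miss. By analogy with medicine, a false positive reports a healthy person as sick, and a false negative reports a sick person as healthy.
  • Existing detection of malicious NPM packages takes two approaches: program analysis and machine learning
    • Program analysis mainly uses pattern matching, clone detection, and dynamic analysis
      • Drawback: this approach often relies on rules to detect malicious packages, which causes many false positives because the rules are fairly generic. In addition, some tools are heavyweight or impractical.
    • Machine-learning approaches train detectors on expert-defined features
      • Drawback: high-quality features are usually tedious to define, hard to define (for example, when an attack uses obfuscation), and subjective to define (for example, different experts may define different features). These problems explain why existing detectors are ineffective (for example, producing high false positives and/or high false negatives [8], [12]), hard to use (that is, heavyweight or requiring manual rule updates [7]), and limited in their use cases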
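
As a worked example of the false positive and false negative definitions in the first note, the rates can be computed from labels and predictions. The sample data and function name are made up for illustration; `True` means "malicious".

```python
def detection_rates(labels, predictions):
    """Return false-positive and false-negative rates for a binary detector."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(labels, predictions):
        if predicted and actual:
            tp += 1  # malicious package correctly flagged
        elif predicted and not actual:
            fp += 1  # benign package flagged: false alarm
        elif not predicted and actual:
            fn += 1  # malicious package passed: miss
        else:
            tn += 1  # benign package correctly passed
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


labels = [True, True, False, False, False]       # ground truth
predictions = [True, False, True, False, False]  # detector output
rates = detection_rates(labels, predictions)
```

Here one of three benign packages is flagged (false-positive rate 1/3) and one of two malicious packages is missed (false-negative rate 1/2).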

References

Related work:

  • Malicious NPM package detection
    • Program-analysis-based detectors: D.-L. Vu, F. Massacci, I. Pashchenko, H. Plate, and A. Sabetta, “LastPyMile: Identifying the discrepancy between sources and packages,” in Proc. 29th ACM Joint Meeting Eur. Softw. Eng. Conf. Symp. Found. Softw. Eng., Athens, Greece, Aug. 2021, pp. 780–792.
  • Malicious NPM package datasets
  • Applications of large language models in code security