Papers
arxiv:2404.04991

An Analysis of Malicious Packages in Open-Source Software in the Wild

Published on Apr 7, 2024
Authors:
,
,
,
,
,

Abstract

A large dataset and knowledge graph were developed for analyzing OSS malware, revealing issues such as code reuse, dependency hiding, and the importance of security reports for context.

AI-generated summary

The open-source software (OSS) ecosystem suffers from security threats caused by malware.However, OSS malware research has three limitations: a lack of high-quality datasets, a lack of malware diversity, and a lack of attack campaign contexts. In this paper, we first build the largest dataset of 24,356 malicious packages from online sources, then propose a knowledge graph to represent the OSS malware corpus and conduct malware analysis in the wild.Our main findings include (1) it is essential to collect malicious packages from various online sources because their data overlapping degrees are small;(2) despite the sheer volume of malicious packages, many reuse similar code, leading to a low diversity of malware;(3) only 28 malicious packages were repeatedly hidden via dependency libraries of 1,354 malicious packages, and dependency-hidden malware has a shorter active time;(4) security reports are the only reliable source for disclosing the malware-based context. Index Terms: Malicious Packages, Software Analysis

Community

Sign up or log in to comment

Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2404.04991 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2404.04991 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.