- … `npm` and 35% (1,965) were from PyPI among the malicious packages.
- … exfiltration via Burp Collaborator and pre-install command execution in npm scripts.
- … `beautifulsoup‑numpy`, `djangoo`).

The top 10 YARA rules that matched more than 75% of the malicious packages are shown below. The rules were curated from the YARA Forge project. These matches indicate the most common TTPs used by malicious actors.
The `burp_collab` rule match indicates that attackers are leveraging Burp Collaborator for data exfiltration. In our past research, we identified multiple malicious packages that use Burp Collaborator for this purpose.
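
To give a rough sense of what such a rule looks for, here is a minimal Python sketch that searches source text for Burp Collaborator hostnames. It is not the YARA rule used in this analysis; the function name `find_collaborator_hosts` is hypothetical, and the sketch assumes that the public Collaborator domains (`burpcollaborator.net` and `oastify.com`) are the only indicators of interest.

```python
import re

# Assumption: only the public Burp Collaborator domains are treated as indicators.
# Self-hosted Collaborator servers would not be caught by this pattern.
COLLABORATOR_RE = re.compile(
    r"\b[a-z0-9.-]+\.(?:burpcollaborator\.net|oastify\.com)\b",
    re.IGNORECASE,
)


def find_collaborator_hosts(source: str) -> list[str]:
    """Return the distinct Collaborator-style hostnames found in source text."""
    return sorted({match.group(0).lower() for match in COLLABORATOR_RE.finditer(source)})


if __name__ == "__main__":
    sample = 'fetch("https://abc123.oastify.com/x?d=" + data)'
    print(find_collaborator_hosts(sample))
```

A plain string match like this is easy to evade with encoding or string concatenation, so it is best read as an illustration of the signal rather than a detection method.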
The following table shows the MITRE ATT&CK mapping for the top behaviors.
Behavior | MITRE ATT&CK ID | Tactic |
---|---|---|
Burp Collaborator usage | T1041 – Exfiltration Over C2 Channel | Exfiltration |
npm preinstall arbitrary exec | T1059 – Command and Scripting Interpreter | Execution |
System info harvesting | T1082 – System Information Discovery | Discovery |
External IP discovery (ipify.org) | T1016.001 – Internet Connection Discovery | Discovery |
Runtime phone home / beacon | T1071.001 – Web Protocols (C2) | Command and Control |
Hardcoded host for exfiltration | T1567.002 – Exfiltration over Web Service | Exfiltration |
setuptools custom command exec | T1059.006 – Command and Scripting Interpreter: Python | Execution |
Hard‑coded IP callbacks | T1071.001 – Web Protocols (C2) | Command and Control |
System info + upload | T1567.002 – Exfiltration over Web Service | Exfiltration |
Sensitive file access | T1552.001 – Credentials in Files | Credential Access |
The distribution of file extensions among the files that contributed evidence (signals) for classifying a package as malicious is shown below.
The `JSON` files are flagged because of the extensive use of `npm` install hooks in `package.json` files.
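
As an illustration of why `package.json` files carry so much signal, the sketch below lists the lifecycle scripts that npm runs automatically during installation. This is an assumption-laden example, not the logic of our analysis engine; the function name `find_install_hooks` is hypothetical.

```python
import json

# npm lifecycle scripts that run automatically as part of `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") if isinstance(manifest, dict) else None
    if not isinstance(scripts, dict):
        return {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


if __name__ == "__main__":
    manifest = '{"name": "demo", "scripts": {"preinstall": "node index.js", "test": "jest"}}'
    print(find_install_hooks(manifest))
```

The presence of an install hook is not malicious by itself, since many legitimate packages use one to compile native code. It only marks the command as something worth inspecting.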
Interestingly, the malicious packages are very small, with 90% of them being less than 10KB.
Typosquatting is a technique used by attackers to trick users into installing malicious packages. In this technique, attackers create packages that are similar to popular packages but with a few characters changed. Below are the most common typosquatting attempts on popular libraries:
Typosquat Name | Target Package Name | Registry | Count |
---|---|---|---|
expresss | express | npm | 7 |
reqests | requests | PyPI | 5 |
djangoo | django | PyPI | 4 |
lodashs | lodash | npm | 4 |
reactjs | react | npm | 3 |
beautifulsoup‑numpy | beautifulsoup4 + numpy (combo bait) | PyPI | 3 |
pandas3 | pandas | PyPI | 3 |
flaskk | flask | PyPI | 2 |
asyncioo | asyncio | PyPI | 2 |
webpackjs | webpack | npm | 2 |
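
Names like these sit within a small edit distance of their targets, which suggests a simple check. The following is a minimal sketch, assuming a hand-maintained list of popular package names (`POPULAR` is a hypothetical placeholder) and a Levenshtein distance threshold of 1. It is not how the dataset was labelled.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical list of popular names; a real check would use registry download data.
POPULAR = ["express", "requests", "django", "lodash", "flask"]


def likely_typosquat_targets(name: str, popular=POPULAR, max_distance: int = 1) -> list[str]:
    """Return popular packages whose names are within max_distance edits of name."""
    name = name.lower()
    return [p for p in popular if p != name and edit_distance(name, p) <= max_distance]


if __name__ == "__main__":
    for candidate in ("expresss", "reqests", "djangoo", "express"):
        print(candidate, likely_typosquat_targets(candidate))
```

A threshold of 1 catches single-character slips such as `expresss` but misses suffix-style names such as `reactjs`, which is two edits away from `react`. Raising the threshold trades missed detections for more false positives on short names.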
Common patterns observed:
- Character-level typos (`expresss`, `reqests`, `djangoo`, `flaskk`) remain the easiest way for attackers to catch fat‑finger installs.
- Numeric suffixes (`pandas3`) and combo bait (`beautifulsoup‑numpy`) try to appear “new” or “feature‑rich.”

Dependency confusion attacks were another common technique observed in the dataset. The following are some examples, identified by their unusually high version numbers:
Package Name | Version |
---|---|
32red-admin | 999.9.9 |
32red-analytics | 999.9.9 |
32red-api | 999.9.9 |
32red-api-client | 999.9.9 |
32red-auth | 999.9.9 |
While these few examples are a limited sample, a common observation is that red teamers often use high version numbers in dependency confusion attempts to gain access to the target organization.
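
The high-version pattern is straightforward to screen for. The sketch below flags versions whose major component is unusually large. The threshold of 99 and the function name `has_suspicious_version` are assumptions chosen for illustration, not values taken from our engine.

```python
import re

# Matches the leading MAJOR.MINOR.PATCH part of a version string.
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def has_suspicious_version(version: str, major_threshold: int = 99) -> bool:
    """Return True when the major version is at or above major_threshold.

    The threshold is an illustrative assumption; tune it for the ecosystem.
    """
    match = VERSION_RE.match(version.strip())
    if match is None:
        return False  # Not a MAJOR.MINOR.PATCH version, so no verdict.
    return int(match.group(1)) >= major_threshold


if __name__ == "__main__":
    for version in ("999.9.9", "4.18.2"):
        print(version, has_suspicious_version(version))
```

A high version number is only a hint. It becomes more meaningful when combined with other context, such as a package name that matches an organization's internal naming scheme.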
We at SafeDep build and maintain a code analysis engine optimized for scanning open source packages for malicious code. This engine uses a hybrid approach consisting of:
- … (`network:connect`, `fs:write`, `process:exec`, etc.)

We use this code analysis engine to continuously scan all open source packages published to supported registries such as `npm`, `PyPI`, `RubyGems`, etc. Data from this analysis engine is used by vet, our free and open source supply chain security tool, to protect users against malicious OSS packages in near real time.
Using LLMs in the analysis workflow makes the overall system probabilistic, which is inherent to how LLMs work. To maintain and improve the quality of analysis, we needed an evaluation dataset. DataDog’s Malicious Packages Dataset was the right fit for our needs.
The analysis was performed using the SafeDep Package Analysis engine, with the customizations necessary to analyse malicious packages from `zip` files instead of analysing artifacts directly from package registries. This was required because many of the malicious packages in the dataset have already been removed from the package registries.
The following customizations were made:
- … `infected` password from the input DataDog malicious packages dataset into the local file system

While the scanning engine that we used is not open source at this time, anyone can use it with vet, our free and open source supply chain security tool. Developers can build custom tools using the Package Analysis API.
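
For readers who want to reproduce the setup, the archive extraction step described above could be sketched as follows. This is a minimal example rather than our actual tooling. It assumes the archives use the traditional ZIP encryption that Python's `zipfile` module can decrypt, and the function name `extract_sample` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_sample(archive: Path, dest: Path, password: bytes = b"infected") -> list[str]:
    """Extract a password-protected sample archive into dest and return its member names.

    Assumption: the archive uses traditional ZIP encryption. AES-encrypted
    archives are not supported by the standard library and need other tooling.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest, pwd=password)
        return zf.namelist()
```

Since the extracted files are live malware samples, this should only be run inside an isolated, disposable environment, and nothing that is extracted should be installed or executed.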
While the goal of this analysis was to evaluate and create a benchmark for our code analysis engine, the results are useful for the community to understand the nature of malicious open source packages and how they are distributed in the wild. In fact, this analysis helps us fine tune our analysis engine to be more accurate and reduce false positives. Interested readers can use the data for their own analysis and research.