At SafeDep we build and operate a large scale open source package monitoring and static code analysis infrastructure. The goal of this infrastructure is to continuously analyse open source packages published in package registries such as `npm`, `pypi`, `rubygems`, Go Proxy etc. and find malicious packages. This service in turn powers tools like vet that provide developer and CI/CD guardrails against malicious open source packages. To learn more about this service, refer to the Malicious Package Analysis docs.
The static code analysis workflow currently consists of multiple automated analysis stages, followed by manual analysis where required.
While we are continuously extending this system by improving our code analysis based tools and eliminating known false positives, we believe static analysis will always have false positives and negatives. While the inherent benefit of the static code analysis approach is that it looks at the code itself (the source of truth), it is restricted by the Halting Problem and, in our use case, by Rice's Theorem. For us, the security research & engineering problem is to make the right trade-offs that keep the system effective in real life, with the need for human intervention (manual analysis) reducing over time.
We started exploring the idea of building a complementary system that can verify and correlate static analysis findings. That's where dynamic analysis comes in, i.e. the ability to run an open source package in an observed environment and determine its safety status based on real behavior at runtime. For us, the design goals of this system were:
We ended up building a solution by leveraging OSS tools along with our custom platform tooling that can execute untrusted packages (e.g. from `npm`) in an observed sandbox. As a first step, our goal was to build a system that runs in parallel to our existing static code analysis infrastructure and performs the following:
At this stage, we felt our technical requirements were similar to what is being solved by Google's (or is it OpenSSF's?) Package Analysis project. We took inspiration from the system but ultimately decided to build our own because its monitoring is based on `strace`, while we wanted to hook into the kernel's system call interface, and because we wanted control over handling evasion techniques such as time-delayed payloads (e.g. `sleep` or fake `rdtsc`). We did, however, take inspiration from some of the solutions in this project, especially on running open source packages as implemented in the dynamicanalysis package.
We had to solve the following problems before we could start working on an implementation:
We decided to start with simple installation commands, i.e. execute `npm install ..` or `pip install ..` and observe the install time behavior. However, future work involves attempting to load package files and execute exported functions on a best effort basis. Several challenges exist here:

An example malicious package that cannot be analyzed by simply running `npm install ...` is described at Malicious NPM Package Express Cookie Parser.
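To make this concrete, here is a minimal sketch of how an executor could build the ecosystem specific install command and run it; the ecosystems, package, version and helper names are illustrative, not our production code:

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

// installCommand builds the ecosystem specific install command for a package.
// Only npm and pypi are handled here; other ecosystems would follow the same shape.
func installCommand(ecosystem, name, version string) ([]string, error) {
	switch ecosystem {
	case "npm":
		return []string{"npm", "install", fmt.Sprintf("%s@%s", name, version)}, nil
	case "pypi":
		return []string{"pip", "install", fmt.Sprintf("%s==%s", name, version)}, nil
	default:
		return nil, fmt.Errorf("unsupported ecosystem: %s", ecosystem)
	}
}

func main() {
	args, err := installCommand("npm", "express", "4.18.2")
	if err != nil {
		log.Fatal(err)
	}

	// The command itself is executed inside the sandbox described later in this post.
	out, err := exec.Command(args[0], args[1:]...).CombinedOutput()
	if err != nil {
		log.Printf("install failed: %v", err)
	}
	log.Printf("install output:\n%s", out)
}
```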
One of our design goals is to have the dynamic analysis system deployed initially in parallel to the existing static analysis system. This is to enable us to perform the R&D required to manually observe the runtime behaviour of known malicious packages and establish a baseline for non-malicious `npm`, `python` and other packages at install time.
To ensure simplicity of design and loose coupling, we introduced dedicated components for package execution, runtime monitoring and event handling.
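For example, the executor described later in this post includes a NATS listener that consumes analysis requests. A simplified sketch of such a listener (the subject name and payload schema are assumptions, not our actual schema) could look like this:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
	"os/signal"

	"github.com/nats-io/nats.go"
)

// AnalysisRequest is an illustrative payload; the actual schema differs.
type AnalysisRequest struct {
	Ecosystem string `json:"ecosystem"`
	Name      string `json:"name"`
	Version   string `json:"version"`
}

func main() {
	nc, err := nats.Connect("nats://nats:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Queue subscription so multiple executor replicas can share the work.
	_, err = nc.QueueSubscribe("dynamic-analysis.requests", "executors", func(m *nats.Msg) {
		var req AnalysisRequest
		if err := json.Unmarshal(m.Data, &req); err != nil {
			log.Printf("ignoring malformed request: %v", err)
			return
		}
		log.Printf("analysing %s/%s@%s", req.Ecosystem, req.Name, req.Version)
		// Hand off to the sandboxed install / execution step here.
	})
	if err != nil {
		log.Fatal(err)
	}

	// Block until interrupted so the subscription keeps processing messages.
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
}
```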
The purpose of the sandbox is to limit the blast radius, because we are running untrusted packages that are "expected" to be malicious, with unknown payloads. While only an air gap can provide a true sandbox in an era when VMMs are being exploited, we made practical choices in terms of "reasonable" isolation, avoiding unnecessary complexity while keeping options open for additional layers of protection against sophisticated attacks such as container escapes, kernel exploits etc.
We considered several threats as part of the sandbox design because we are executing untrusted and malicious code in our infrastructure; they are summarised, along with their mitigations, in the table below.
We decided to go for a Docker container based sandbox with DIND (Docker-in-Docker), a choice that primarily stems from engineering simplicity and scalability. While the sandbox implementation choice was driven by simplicity and scalability, we did consider the threats as part of the infrastructure design (a simplified sketch of launching the inner sandbox container follows the table below).
| Threat | Mitigation |
|---|---|
| Execute malicious code to spawn a reverse shell to gain interactive access in the sandbox | No mitigation. We want this to happen. Containers are ephemeral with a hard deadline of execution to avoid long term persistence. A dedicated node pool and planned VPC isolation protect the rest of the infrastructure. |
| Exploit vulnerabilities for privilege escalation and container escape | DIND provides an additional layer of isolation using Linux kernel namespaces. Two container escapes would be required to break out of the sandbox onto the underlying Kubernetes node. We plan to implement gVisor as a future enhancement along with VPC isolation. |
| Exploit vulnerabilities in the operating system kernel | Use a dedicated Kubernetes node pool with taints and tolerations to guarantee that ONLY the executor pods run in the node pool. No secret is mounted into these pods. Leveraging the Kubelet token for lateral movement across the cluster will be an interesting attack to observe. |
| Exploit network services within the locally accessible network | Kubernetes Network Policy to prevent egress to local network CIDRs from the Pod. Google Cloud firewall rules at the VM level allow access only to the Kubernetes API Server and the container registry (read-only) required by the Kubelet running on the node to pull images. |
| Exploit vulnerabilities in virtual machine monitors (VMM) to break out of virtualization | No mitigation. Our entire infra will be pwned by a VMM escape vulnerability. The low level details are available in the paper Large Scale Cluster Management with Borg. |
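Several of the mitigations above (ephemeral containers, hard execution deadlines, resource limits, reduced privileges) are applied at the point where the executor launches the inner DIND container. The sketch below is illustrative only; the flags, limits and image are assumptions rather than our production configuration:

```go
package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

// runInSandbox launches an ephemeral, resource constrained container inside the
// DIND daemon and runs the given command in it. Flags and limits are illustrative.
func runInSandbox(ctx context.Context, image string, command ...string) ([]byte, error) {
	args := []string{
		"run",
		"--rm",                                // ephemeral: remove the container when done
		"--memory", "512m",                    // cap memory usage
		"--cpus", "0.5",                       // cap CPU usage
		"--pids-limit", "256",                 // contain fork bombs
		"--cap-drop", "ALL",                   // drop all Linux capabilities
		"--security-opt", "no-new-privileges", // block privilege escalation via setuid binaries
		image,
	}
	args = append(args, command...)
	return exec.CommandContext(ctx, "docker", args...).CombinedOutput()
}

func main() {
	// Hard execution deadline so a run cannot persist beyond its time budget.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	out, err := runInSandbox(ctx, "node:20-slim", "npm", "install", "left-pad")
	if err != nil {
		log.Printf("sandboxed install failed or timed out: %v", err)
	}
	log.Printf("output:\n%s", out)
}
```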
The purpose of the runtime monitoring solution is to observe package execution at the system call level, generate events using a standard schema, use rules to classify "interesting" events, and store the events in an event log for manual or automated analysis. The runtime monitoring solution is completely decoupled from the rest of the system and has the single responsibility of performing system monitoring. We also wanted to avoid per-process tracing approaches (e.g. `strace`) and their associated overhead.

We decided to go ahead with Falco as our observability technology. The choice was essentially between adopting Falco and building our own eBPF based solution. Going ahead with Falco was pretty much obvious for us because we are rolling out an R&D focussed experimental service. It doesn't make sense to invest in the heavy lifting required for building and productionizing an eBPF based solution till we reach the limitations of the currently available options. In any case, Falco is a CNCF graduated project, which indicates the maturity of the solution, and it meets all our current requirements, especially the ability to write custom rules to match various system call parameters.
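The Event Handler's job is to turn matched Falco rules into entries in the event log. As a simplified illustration, the sketch below classifies Falco alerts read as newline-delimited JSON (in the deployment described next, events arrive over Falco's gRPC socket instead); the rule names and classification logic are assumptions, not our actual rules:

```go
package main

import (
	"bufio"
	"encoding/json"
	"log"
	"os"
)

// FalcoAlert mirrors the core fields of Falco's JSON alert format.
type FalcoAlert struct {
	Rule         string                 `json:"rule"`
	Priority     string                 `json:"priority"`
	Output       string                 `json:"output"`
	Time         string                 `json:"time"`
	OutputFields map[string]interface{} `json:"output_fields"`
}

// interesting decides whether an alert should be stored in the event log.
// The rule names here are illustrative, not our actual rule set.
func interesting(a FalcoAlert) bool {
	switch a.Rule {
	case "Outbound Connection During Install", "Write Below Sensitive Path":
		return true
	}
	return a.Priority == "Critical" || a.Priority == "Warning"
}

func main() {
	// Read newline-delimited JSON alerts (e.g. piped from Falco's json_output).
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		var alert FalcoAlert
		if err := json.Unmarshal(scanner.Bytes(), &alert); err != nil {
			log.Printf("skipping malformed event: %v", err)
			continue
		}
		if interesting(alert) {
			// In the real system this would be appended to the event log store.
			log.Printf("interesting event: rule=%s priority=%s", alert.Rule, alert.Priority)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```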
The Falco and Event Handler containers run as a DaemonSet and share an `emptyDir` volume for the Falco gRPC socket. Let's start with quick stats and challenges:
- `ip != X` based filtering in rules is limited because `X` is resolved at the time of rule parsing
- *n2d-standard-4* is used as the Node type for dynamic analysis, consisting of 4 CPU and 16 GB RAM per node
- `100m` (10% of 1 CPU) CPU and 256MB memory limit each
- `100m` CPU and 256MB memory limit
- `500m` CPU and 512MB memory limit

The executor, consisting of the NATS listener and DIND container, shows low CPU usage, indicating potential over-provisioning of resources.
The Falco and Event Handler pods (DaemonSet), on the other hand, show CPU spikes. This is almost certainly due to the Falco containers, which have a higher CPU load because of rule matching on system calls.
Executor (NATS Listener and DIND) memory usage is pretty stable with occasional spikes. It will be interesting to identify which packages caused these spikes. At this point, we are guessing that they are caused by packages that require platform specific compilation during installation.
The Falco containers, however, show signs of a memory leak. This may be related to a known issue. Our Go based Event Handler may also contribute, due to the high throughput of event processing and the way the Go GC works. Deeper analysis is required to identify the root cause of these memory usage patterns.
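One inexpensive experiment on the Event Handler side, which we have not validated yet, is to bound the Go runtime's heap with a soft memory limit and a tighter GC target and then compare the memory profile:

```go
package main

import (
	"log"
	"runtime/debug"
)

func main() {
	// Soft memory limit (Go 1.19+): the GC works harder as the heap approaches it.
	// 200 MiB is purely illustrative; in practice it should track the container limit.
	debug.SetMemoryLimit(200 << 20)

	// A lower GC target percentage trades CPU for a smaller steady state heap.
	previous := debug.SetGCPercent(50)
	log.Printf("GC percent changed from %d to 50", previous)

	// ... start the event processing loop here ...
}
```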
The technology stack that we built and deployed for runtime analysis is tuned for scale. As per the initial design goal, we have the necessary infrastructure in place to start correlating runtime behavior with our static analysis results. We can now independently perform research on OSS package install time behavior at scale. However, we believe the core value to our users and the larger community lies in automatically classifying this runtime behavior.

There are multiple challenges to solve when it comes to classifying the runtime behavior that we have observed so far, such as legitimate install time activity from `esbuild`, `node-gyp`, `make` and other build tools required to build platform specific binaries for Python, Node or other ecosystem packages.

For us, the next steps are to identify heuristics and create an approach for base-lining ecosystem specific package installation behavior to identify anomalies. Future work also involves going beyond install time analysis, for example by loading package files and executing exported functions as discussed earlier.
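As a deliberately naive illustration of what such base-lining could look like (the behavior categories and baseline entries below are assumptions, not our heuristics), a package's observed install time behavior can be compared against the set commonly seen for its ecosystem:

```go
package main

import "fmt"

// baseline maps an ecosystem to the set of install time behaviors commonly
// observed for benign packages. The categories below are illustrative only.
var baseline = map[string]map[string]bool{
	"npm": {
		"outbound:registry.npmjs.org": true,
		"exec:node":                   true,
		"exec:node-gyp":               true,
		"write:node_modules":          true,
	},
}

// anomalies returns the observed behaviors that fall outside the ecosystem baseline.
func anomalies(ecosystem string, observed []string) []string {
	var out []string
	for _, b := range observed {
		if !baseline[ecosystem][b] {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	observed := []string{
		"exec:node",
		"write:node_modules",
		"outbound:attacker.example.com", // not in the npm baseline
		"read:/etc/passwd",              // not in the npm baseline
	}
	fmt.Println(anomalies("npm", observed))
	// Output: [outbound:attacker.example.com read:/etc/passwd]
}
```

In practice the baseline would be learned from a large corpus of benign package installations per ecosystem rather than hard coded.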