This article was originally written for and published in the February 2023 issue of CrossTalk, The Journal for Defense Software Engineering. Some formatting edits have been added to this post. Used with permission as the original author.
On the morning of September 1st, 2022, in the heart of Moscow, there was a massive traffic jam at an address on Kutuzovsky Prospekt, the 6-lane major avenue running across the city. Central Moscow was brought to a standstill as cab drivers from the largest taxi service, Yandex Taxi, were all sent to the same place at the same time.
The exact location? Hotel Ukraine.
No one knows it yet, but the traffic jam is caused by binary bullets: cyber warfare on an entirely different scale. An attack on the ride dispatching application programming interface (API), led by the hacking group Anonymous and supported by the IT Army of Ukraine, has just demonstrated how to weaponize vulnerabilities in software to cause a physical denial of service.
While some may see this as a prank of the #oprussia campaign, we can learn from this incident by asking ourselves the following questions:
- Could we see future cyber warfare engagements cripple entire towns by causing large traffic jams just before an invasion or counter-offensive?
- Could threat actors use similar attack vectors against software to disrupt and deny access to critical functions and communications at the most inopportune times?
- Exactly where are those attack vectors in the software we build?
This article will explore some of the more interesting attack vectors adversaries can leverage in the real world to exploit the software that drives our digital world. From weaponizing software bill of materials (SBOM) metadata to abusing third-party dependency trust, we will look at areas of the software development lifecycle that can cause fragility and risk to data and infrastructure, especially as APIs are built.
These days, almost everything is driven by software. The digital transformation towards a more heterogeneous information environment (specifically in the volume, velocity, and variability with which information flows between systems) exposes us to far greater risk than ever before.
It is said that APIs now drive over 80% of all traffic traversing the Internet. Those APIs are used by the hardware and software we now rely on for everything from making coffee to launching munitions.
Software is malleable. And so are the processes used to design, build, and deploy it. An engineer from the 309th Software Engineering Group once described software as “distilled human intelligence.” In essence, through experience and learning, our knowledge of building more secure software matures. It comes at a cost; we learn from our mistakes.
And herein lies the dilemma. Application security is still in its infancy, especially when focusing on APIs. There is still a lot to be learned. We rely heavily on software working safely, securely, and steadily. That’s a big gamble when we build software systems that may outlive a person’s time on the project.
The lifespan of a typical weapon system tends to be decades. And it’s not uncommon for older weapon systems to outlive the software products and technologies which operate them. A supply chain cyber munition that an adversary slips into the software of a weapon system could lie dormant for some time, only to be triggered later at the most inopportune of moments.
The Developer Mindset of Blind Trust
When we build software, we tend to trust the code and components (libraries, frameworks, etc.) that we include in our projects. We use these “ingredients” because they make our lives easier. They allow us to not reinvent the wheel each time and help us move toward achieving our goals faster.
The problem is that this mindset of blind trust can expose us to unnecessary risk.
In November 2018, it was discovered that a malicious update to the event-stream library had been pushed. It included a backdoor that would exfiltrate cryptocurrency wallets from applications that used it. This backdoor had been added by a developer who had taken control of the project after its original maintainer had left it inactive for over two years.
This is not an isolated incident. It happens more often than we realize.
This form of dependency trust, especially in open-source software components, offers several attack vectors for malicious code to be injected into the dependency tree of produced APIs.
How Adversaries Inject Malicious Code into Trusted Components
There are two key ways an adversary can inject malicious code into APIs. The first is by injecting new packages into software through confusion or coercion. The second is by infecting existing packages that we have already formed a dependency on.
Method #1: Injecting New Packages
The first way an adversary can get malicious code into APIs is by injecting new packages into a project’s dependencies. This can be done in a few ways:
- By creating a malicious package that imitates a real package but is named slightly differently to trick people into installing it, either by naming convention or typo
- By creating a useful package containing a trojan horse (hidden malicious code that can be executed later)
- By replacing a legitimate package with a malicious one through package management synchronization due to dependency confusion
In all of these cases, the end goal is the same: get the malicious code into projects so that it can be executed when the APIs are used.
Typosquatting
This technique allows the attacker to distribute malware disguised as a legitimate package.
We’ve seen several typosquatting package impersonation attacks over the years, including:
- bzip impersonating bz2file
- crossenv impersonating cross-env
- crypt impersonating crypto
- django-server impersonating django-server-guardian-api
- electorn impersonating electron
- jquerry impersonating jquery
- python3-dateutil impersonating dateutil
- setup-tools impersonating setuptools
- telnet impersonating telnetsrvlib
- urllib and urlib3 impersonating urllib3
These are just a few examples that showcase how adversaries are using this attack vector. Cybercriminals have been using this as a way to distribute crypto-mining malware across popular applications for some time now.
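One hedged way to catch lookalike names before they are installed is to compare every requested package against a list of approved ones. The sketch below uses Python's standard `difflib`. The `APPROVED` set, the `find_lookalikes` helper, and the 0.8 similarity cutoff are assumptions chosen for illustration, not part of any package manager.

```python
import difflib

# Hypothetical allowlist of packages a team has already vetted.
APPROVED = {"cross-env", "electron", "jquery", "setuptools", "urllib3"}

def find_lookalikes(requested, approved=APPROVED, cutoff=0.8):
    """Map each unapproved name to the approved names it closely resembles."""
    suspects = {}
    for name in requested:
        if name in approved:
            continue  # exact match with a vetted package
        matches = difflib.get_close_matches(name, sorted(approved), n=3, cutoff=cutoff)
        if matches:
            suspects[name] = matches
    return suspects

suspects = find_lookalikes(["electorn", "jquerry", "crossenv", "urllib3", "requests"])
for name, matches in suspects.items():
    print(f"{name!r} is not approved but resembles {matches}")
```

A check like this only flags near misses of names you already trust. A package such as `requests`, which resembles nothing on the list, would still need a separate approval step.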
The Trojan Horse
Sometimes, adversaries leverage typosquatting to deliver a slightly modified package that provides real value while also embedding a trojan horse, duping developers into using contaminated open-source libraries. Examples of this include jdb.js and db-json.js, which mimicked legitimate Node.js database libraries.
Unfortunately for the thousands of developers affected, these packages also installed the malicious njRAT (aka Bladabindi) remote access trojan (RAT) onto developer systems, allowing adversaries access to the source code and build systems of the applications being developed.
This is a particularly insidious form of attack as it can be difficult for developers to spot, especially if they’re unfamiliar with security best practices. A trojan horse like this allows adversaries to maintain persistence on developer workstations and exfiltrate intellectual property or project data as it’s being developed.
Another attack vector leveraged through a trojan horse includes abusing global variables through tainted packages. A seemingly benign component can include ways to abuse global variables to manipulate other existing components APIs rely on.
Whatever you call it, if a malicious package has the ability to modify global variables and code execution from within another dependency of an API, then that’s Game Over.
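To make the risk concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `trusted_auth` stands in for a dependency the API relies on, and `tainted_package_import_side_effect` stands in for code that runs when a tainted package is imported. It is built entirely in memory and touches no real library.

```python
import types

# Stand-in for a trusted dependency that the API uses to check tokens.
trusted_auth = types.ModuleType("trusted_auth")

def _verify_signature(token: str) -> bool:
    return token == "signed-by-us"

trusted_auth.verify_signature = _verify_signature

def api_handler(token: str) -> str:
    # The API looks the function up on the shared module object at call time.
    return "granted" if trusted_auth.verify_signature(token) else "denied"

before = api_handler("forged")

def tainted_package_import_side_effect() -> None:
    # A tainted package can rebind the attribute on the shared module,
    # so every later caller silently gets the attacker's version.
    trusted_auth.verify_signature = lambda token: True

tainted_package_import_side_effect()
after = api_handler("forged")

print(before, after)
```

The API code never changes, yet its behavior does. That is why a seemingly unrelated dependency deserves the same scrutiny as the one doing the security-sensitive work.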
Dependency Confusion
Dependency confusion refers to the inability of a development environment to distinguish between a private, internally created package in the software build and a package by the same name available in a public repository. It sometimes goes by the name of package name squatting or namespace shadowing.
In 2021, a security researcher was able to infiltrate the software of over 35 major tech companies including Microsoft, Apple, Shopify, PayPal, and Tesla using dependency confusion. By uploading a malicious package of the same name but with a higher version number, build systems from these companies downloaded the hostile package and ran arbitrary code that collected internal data and then exfiltrated it back to the security researcher’s systems.
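The version-number trick can be sketched with a toy resolver. This is a simplified model under stated assumptions, not how any particular package manager works: the two index dictionaries, the package name `acme-billing`, and both resolver functions are hypothetical.

```python
# Hypothetical package indexes: name -> available versions.
PRIVATE_INDEX = {"acme-billing": ["1.2.0", "1.3.0"]}
PUBLIC_INDEX = {"acme-billing": ["99.0.0"], "requests": ["2.31.0"]}

def parse(version):
    """Turn '1.3.0' into (1, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def naive_resolve(name):
    """Merge every index and take the highest version, wherever it lives."""
    candidates = [(parse(v), "private") for v in PRIVATE_INDEX.get(name, [])]
    candidates += [(parse(v), "public") for v in PUBLIC_INDEX.get(name, [])]
    version, origin = max(candidates)
    return ".".join(map(str, version)), origin

def scoped_resolve(name):
    """Names that exist internally resolve only from the private index."""
    if name in PRIVATE_INDEX:
        return max(PRIVATE_INDEX[name], key=parse), "private"
    return max(PUBLIC_INDEX[name], key=parse), "public"

print(naive_resolve("acme-billing"))
print(scoped_resolve("acme-billing"))
```

The naive resolver picks the attacker's `99.0.0` from the public index, while the scoped one never consults the public index for an internal name. Real package managers offer their own ways to express that scoping, so check the documentation for the one you use.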
A vital aspect of this vulnerability is the ability to match the internal package name with a malicious one of the same name uploaded to the public package repository. Some developers argue that it is difficult for an adversary to determine internal naming without privileged access.
Yet, in 2022, a novel timing attack was discovered against npm’s registry API that can be exploited to potentially disclose private packages used by organizations, putting developers at risk of further supply chain threats.
The point is clear. The benefits of build and package management that developers rely on for dependency control can easily be weaponized by adversaries to gain control of APIs.
Method #2: Infecting Existing Packages
According to the GitHub State of the Octoverse Report, open-source projects have an average of 180 package dependencies. Many of these packages are maintained and contributed to by a small handful of volunteers. Adversaries are noticing this.
“The pervasiveness has attracted the attention of attackers, who in recent years have increasingly turned their focus to the open-source software supply chain.” (Microsoft Digital Defense Report, September 2020)
There are three critical ways in which adversaries can infect existing packages our APIs depend on:
- They can inject malicious code directly into the source code
- They can inject malicious data into the build process
- They can inject malicious artifacts directly into the package repository system
Inject Into Source
There are two ways an adversary can inject malware into the source of packages that APIs depend on.
The first is simply acting as a contributor and submitting pull requests (PRs) for the target component. In a form of “PR sneaking,” an adversary may try to slip in malicious code by encoding it so that it looks harmless, or is invisible, to human reviewers while the compiler still treats it as hostile code.
This is commonly referred to as a trojan source attack.
An example of how this could be exploited comes from homoglyph attacks, where two different characters may look quite similar but represent entirely different things: a “zero” looks like the letter “O,” or a lowercase “L” looks like an uppercase “I.” We see this more amplified when Unicode is thrown into the mix. The Cyrillic letter “Н” (U+041D) and the Latin letter “H” (U+0048) are two entirely different characters. Yet, in source code, chances are the difference would be easy to miss with the naked eye.
You can see how an adversary could quite easily route code execution in a malicious way in the sample below:
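As a hedged illustration in Python, two names that render almost identically can route a call to different code. The handler names are hypothetical, and the Cyrillic letter is written as an escape sequence so the difference stays visible here.

```python
import unicodedata

LATIN_NAME = "check_HMAC"
LOOKALIKE_NAME = "check_\u041dMAC"  # U+041D renders like a Latin "H"

def describe_non_ascii(text):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

# A hypothetical dispatch table: both keys look the same on screen.
handlers = {
    LATIN_NAME: lambda: "verified",
    LOOKALIKE_NAME: lambda: "bypassed",
}

print(LATIN_NAME == LOOKALIKE_NAME)        # the two names are not equal
print(handlers[LOOKALIKE_NAME]())          # the lookalike reaches other code
print(describe_non_ascii(LOOKALIKE_NAME))  # reveals the Cyrillic letter
```

A helper like `describe_non_ascii` is one simple way for a reviewer to surface unexpected characters in identifiers before approving a change.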
Another way to sneak in malicious code changes is by taking advantage of Unicode to place invisible characters in – that is, characters in Unicode that render to the absence of a glyph – such as Zero Width Space (ZWSP) into string literals used in comparisons. Consider this example of Python rendered in a browser on GitHub:
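A minimal sketch of the idea in Python follows. The `access_level` function and the role names are hypothetical. The invisible character is written as an escape sequence so that it stays visible; in a real trojan source change it would sit raw inside the string literal.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def access_level(role):
    # With the raw character embedded in the literal, this line would
    # render on screen as: if role == "admin":
    if role == "admin" + ZWSP:
        return "full"
    return "limited"

print(access_level("admin"))           # a genuine admin falls through
print(access_level("admin" + ZWSP))    # only the attacker's value matches
print(len("admin"), len("admin" + ZWSP))
```

The two strings differ in length by one character even though they render the same.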
Other than a shift in color, can you tell the difference? The use of the Unicode character U+200B changes the code execution path if you are not careful. Now let’s look at that code in a text editor that displays Unicode character codes directly:
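The same effect can be approximated without a special editor. The `reveal` helper below is a hypothetical sketch that replaces anything outside printable ASCII (tabs excepted) with a visible code point marker.

```python
def reveal(line):
    """Replace characters outside printable ASCII with <U+XXXX> markers."""
    out = []
    for ch in line:
        if ch == "\t" or 32 <= ord(ch) <= 126:
            out.append(ch)
        else:
            out.append(f"<U+{ord(ch):04X}>")
    return "".join(out)

suspicious = 'if role == "admin\u200b":'
print(reveal(suspicious))
```

Run over the lines of a pull request diff, a filter like this makes the hidden character impossible to overlook. It will also flag legitimate non-ASCII text, such as comments in other languages, so the output needs human judgment.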
See the difference more clearly now? During a code review of a pull request, package maintainers could easily miss it since many tools used to review the code are not guaranteed to catch this and render out the Unicode characters. So, approving the pull request and committing it to source control could give the adversary precisely what they want.
This brings us to the second way to inject malware into the source: the package maintainer itself. They have the ultimate authority to commit code to the repo, be it through trojan source pull requests or direct code changes they make.
This was exactly how the event-stream incident played out. A bad actor tricked a maintainer into giving them contributor rights to the project and ultimately got rights on npm. After committing clean code to gain trust and more access, they eventually added a malicious dependency they controlled and slipped backdoor code into the repo.
Snyk did a fascinating post-mortem of the incident, reminding us how fragile the open-source model can be if not respected and how easy it can be to inject malicious code into the dependencies we trust for APIs.
Inject Into Build
Sneaking in during the build process is another avenue attackers can use to get their code into production. We saw this during the SolarWinds breach. By taking advantage of a Continuous Integration/Continuous Delivery (CI/CD) pipeline, an attacker can add their malicious code as part of the build process and have it go unnoticed all the way to production.
In the case of SolarWinds, the attackers were able to insert themselves into the CI/CD process and sign their malicious code with a valid certificate that allowed it to bypass security controls. The result was devastating, with numerous high-profile organizations being compromised through this supply-chain attack.
As developers, we need to be aware of these types of attacks and take steps to ensure our build pipelines are secure. What makes this difficult is the fact that attackers can chain attacks together to further penetrate the process, like the dependency confusion attack that can allow an adversary to insert themselves directly onto the build servers.
Good cybersecurity and IT hygiene are paramount; while our APIs may have a relatively porous perimeter, our build systems cannot. Always lock down these resources.
Inject Into Package Repository System
In 2021, a popular package called ua-parser-js was embedded with malicious code intended to harvest user credential information and install a cryptocurrency miner. With over eight million weekly downloads and over 1,200 other packages depending on it, the package used to detect browser engine, OS, CPU, and device type/model from User-Agent data became an attack vector for any code that relied upon it.
How did this happen? The npm account of the developer responsible for the package was hijacked, and a bad actor uploaded the malicious package directly to the npm repository. Big tech companies like Amazon, Microsoft, Google, Facebook, Instagram, Mozilla, Elastic, Intuit, Slack, and Reddit all relied on this package and were exposed.
Sometimes, vulnerabilities in package management repositories themselves become an attack vector. In 2022, we saw CVE-2022-29176 published, in which any RubyGems.org user could, under certain conditions, remove and replace gems (package dependencies) even if that user was not authorized to do so.
This doesn’t just happen in public package repositories. There have been examples of private JFrog Artifactory and Atlassian Bitbucket repos having malicious artifacts uploaded to them that were then embedded into commercial off-the-shelf (COTS) software releases.
What would happen if an adversary could upload a cyber munition as an artifact of a package used in critical weapon systems software that is also massively linked to other weapon systems to both protect and destroy? How could they be affected?
How Adversaries Can Detect Vulnerabilities in Trusted Components
In May of 2021, President Biden issued Executive Order 14028 to improve the nation’s cybersecurity. The order was designed to strengthen US government cybersecurity defenses in the wake of several significant hacks, including the SolarWinds and Microsoft Exchange Server incidents, which impacted numerous federal agencies and private companies. The order’s importance was underscored by the ransomware attack on the Colonial Pipeline just days before it was signed and by the Kaseya attack less than two months later.
A vital component of the order is the requirement for a software bill of materials (SBOM) that software vendors are now required to provide as part of the government’s procurement process. The SBOM is expected to detail the exact software components utilized in a given product, making it much easier and faster for federal agencies to determine whether they are subject to a vulnerability uncovered in one of these components.
SBOM is a critical piece of the software security puzzle. By understanding the dependencies and components used in our software, we can more easily identify potential vulnerabilities and attack vectors.
Tools providing dependency graph analysis can help us visualize these relationships and quickly highlight potential problems. This allows us to patch vulnerabilities before they can be exploited.
By using secure, vetted components and by leveraging SBOM data to understand our dependencies, we can make it much more difficult for adversaries to exploit our software supply chain.
But SBOM data also offers adversaries an opportunity: transitive dependency trust can expose a larger window of vulnerability. Most SBOMs do not include the internal SBOM of third-party components that the vendor themselves rely on. And adversaries can use that to their advantage when wanting to attack our APIs.
How Adversaries Can Use SBOM To Find Vulnerabilities
An SBOM typically holds associated metadata and describes a set of software elements broken down into components, services, and dependencies. SBOM documents, like the Linux Foundation Software Package Data Exchange (SPDX) and OWASP’s CycloneDX SBOM, provide a simple machine-readable text format that can be easily consumed and parsed.
Adversaries can use that to discover known vulnerabilities in components APIs rely on. As an example, they can use the spdx-to-osv tool to produce an Open Source Vulnerability (OSV) JSON file based on information in an SPDX document. They can then query the Open Source Vulnerability database and return an enumeration of vulnerabilities present in the software’s declared components.
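Defenders can run the same enumeration against their own SBOMs. The sketch below is not the spdx-to-osv tool. It is a hand-rolled illustration that assumes a trimmed, hypothetical SPDX JSON document whose packages declare a package URL (purl) as an external reference, and it builds request bodies in the purl form that the public OSV query API accepts. Nothing is sent over the network.

```python
import json

OSV_QUERY_URL = "https://api.osv.dev/v1/query"  # public OSV query endpoint

# A trimmed, hypothetical SPDX JSON document.
spdx_doc = {
    "spdxVersion": "SPDX-2.3",
    "packages": [
        {
            "name": "jinja2",
            "versionInfo": "2.4.1",
            "externalRefs": [
                {
                    "referenceCategory": "PACKAGE-MANAGER",
                    "referenceType": "purl",
                    "referenceLocator": "pkg:pypi/jinja2@2.4.1",
                }
            ],
        },
        # A package with no purl cannot be looked up this way.
        {"name": "internal-lib", "versionInfo": "0.1.0"},
    ],
}

def osv_queries(doc):
    """Build one OSV query body per package that declares a purl."""
    queries = []
    for package in doc.get("packages", []):
        for ref in package.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                queries.append({"package": {"purl": ref["referenceLocator"]}})
    return queries

for body in osv_queries(spdx_doc):
    # Each body would be POSTed to OSV_QUERY_URL; sending is left out here.
    print(json.dumps(body))
```

The point of the sketch is how little work is involved: anyone holding the SBOM can turn it into a list of vulnerability lookups with a few lines of parsing.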
The Open Web Application Security Project (OWASP) has actually started researching the idea of mapping vulnerabilities to the SBOM directly using the Vulnerability Exploitability eXchange (VEX) and embedding that right in the CycloneDX metadata to convey the exploitability of vulnerable components in the context of the software in which they’re used.
So consider this. Imagine your API (called Component A) relies on a third-party dependency (called Component B).
Component A depends on Component B. Component B depends on another library, called Component C. Does Component A (your API) even know when there is an exploitable vulnerability in Component C? We saw this issue a lot during the Log4j debacle because software developers took on dependencies of components and libraries without even considering the impact to their products from this third-level order of trust.
It gets worse. The internal team or external vendor of Component C might have the best developers maintaining their code. But even still, it may take the organization several days to create and test a patch. Once it’s released, the developers of Component B must create and test their own patch to replace the vulnerable component(s), which will take even more time. And it keeps going and going, for however deep the dependency trust goes.
Can you see the cascading effects here? Getting an exploitable vulnerability in a trusted component fixed in code can take days, weeks, or even months.
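The chain of trust can be made explicit with a small graph walk. The sketch below assumes a hypothetical dependency graph mirroring Components A, B, and C, and finds the shortest chain by which a top-level component inherits a vulnerable one.

```python
from collections import deque

# Hypothetical dependency graph: component -> direct dependencies.
DEPENDS_ON = {
    "component-a": ["component-b"],
    "component-b": ["component-c"],
    "component-c": [],
    "component-d": [],
}

def path_to(root, target, graph=DEPENDS_ON):
    """Breadth-first search for the shortest dependency chain from root to target."""
    queue = deque([[root]])
    seen = {root}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for dep in graph.get(path[-1], []):
            if dep not in seen:
                seen.add(dep)
                queue.append(path + [dep])
    return None  # root does not depend on target, directly or transitively

chain = path_to("component-a", "component-c")
print(" -> ".join(chain))
```

Every hop in the returned chain is another team that has to patch, test, and release before the fix reaches your API.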
According to the 2022 Open Source Security and Risk Analysis Report (OSSRA), 85% of the codebases audited had open-source components that were more than four years old, and 88% had components that had no active development in the last two years. That means a majority of organizations are behind on keeping their third-party dependencies up to date.
And herein lies the advantage of SBOM for adversaries. If they track all the component dependencies, they may very well be able to leverage a vulnerability detected in a nested component to give them a much larger Window of Exposure for the target API. It means they can use things like OSV and VEX and spider your dependencies to exploit weaknesses in your APIs before a security patch is even applied. And before you are even aware of the security issue.
How Adversaries Use Reverse Engineering to Find Vulnerabilities
As adversaries inventory component dependencies of APIs with the likes of SBOM, it’s important to realize WHY that is done.
When a vulnerability is found in a third-party dependency and gets fixed, adversaries can reverse engineer the patch to detect changes between versions. This is even easier when the component is simple to decompile or is open-sourced, where they can look at recent commits and merges to see delta diffs.
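For open-source components, seeing the delta takes very little effort. The sketch below uses Python's standard `difflib` on two hypothetical versions of a function. The function, the version labels, and the fix itself are invented for illustration.

```python
import difflib

# Hypothetical source of a dependency function before and after a security fix.
vulnerable = [
    "def load_profile(request):",
    '    path = request.args["file"]',
    "    return open(BASE_DIR + path).read()",
]
patched = [
    "def load_profile(request):",
    '    path = os.path.basename(request.args["file"])',
    "    return open(os.path.join(BASE_DIR, path)).read()",
]

diff = list(difflib.unified_diff(
    vulnerable, patched, fromfile="v1.0.0", tofile="v1.0.1", lineterm=""
))

# Lines removed by the patch point straight at the code that was dangerous.
removed = [
    line[1:].strip()
    for line in diff
    if line.startswith("-") and not line.startswith("---")
]

print("\n".join(diff))
```

The removed lines show both the unsanitized input and the call it reached, which is exactly the information an adversary needs to target installations that have not yet upgraded.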
It allows an attacker to find dangerous code that has been fixed, usually referred to as the sink, from the patch. They can then conduct taint analysis to map that back to sources of untrusted data input that they can control, more commonly called the source.
Mapping these sources and sinks together allows an adversary to trace precisely where to manipulate how the API functions to trigger the vulnerability found in the third-party dependency.
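At its simplest, mapping sources to sinks is a reachability question over a data-flow graph. The sketch below is a toy model under that assumption. The flow edges, node names, and the `tainted_paths` helper are all hypothetical, and real taint analysis also has to account for sanitizers, aliasing, and framework indirection.

```python
# Hypothetical data-flow edges: a value produced by the key flows into each listed node.
FLOWS = {
    "http_param:file": ["parse_request"],
    "parse_request": ["load_profile"],
    "load_profile": ["third_party.open_path"],
    "config:base_dir": ["load_profile_safe"],
}
SOURCES = {"http_param:file"}       # input the attacker controls
SINKS = {"third_party.open_path"}   # dangerous call identified from the patch

def tainted_paths(flows, sources, sinks):
    """Depth-first walk from each source, recording every path that reaches a sink."""
    found = []

    def walk(node, path):
        if node in sinks:
            found.append(path)
            return
        for nxt in flows.get(node, []):
            if nxt not in path:  # do not loop on cyclic flows
                walk(nxt, path + [nxt])

    for source in sorted(sources):
        walk(source, [source])
    return found

for path in tainted_paths(FLOWS, SOURCES, SINKS):
    print(" -> ".join(path))
```

Each path returned is a candidate route for steering attacker-controlled data into the vulnerable call.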
This isn’t easy. It’s a lot of hard work to trace data flows through the right code paths, which gets even more complicated these days as many frameworks abstract a lot of stuff, obfuscating where user-controlled data is traversing and modified.
Adversaries have to be motivated. Or at least, that used to be the case.
Recent research in automated taint analysis has started to expose more streamlined ways to accomplish this. In fact, it is now possible to use static analysis tools like CodeQL to map the sources and sinks of your dependencies through constructed queries. It can conduct a lot of the data flow analysis automatically. This approach can even go so far as to provide taint tracking of source input data as it’s transformed through your API and consumed by dangerous sinks.
It’s a remarkable time for both attackers and defenders. Being able to treat code like data with CodeQL has huge advantages. It enables the fast search for security vulnerabilities and bugs, modeled as queries that can be executed against databases extracted from code. As adversaries get more familiar with your APIs and their dependencies, it’s only a matter of time before they start indexing components and finding reliable vulnerabilities they can exploit before you even know about them.
It’s easy to see that efforts to build secure APIs may be nullified by vulnerabilities that may exist in the trusted third-party dependencies.
But there is good news. Everything covered in this article has defensive countermeasures that can be deployed:
- Follow supply chain risk management (SCRM) practices
- Pin approved dependencies
- Limit where and how third-party components are used within the codebase
- Roll out a central governance body to review, audit, and approve the dependencies APIs rely on
Stay vigilant, and regularly review updates and changes to those dependencies. In fact, when possible, try to isolate yourself from third parties. Become the source of truth for all components included, and require all build systems to pull from internal repositories, never from uncontrolled public sources.
Don’t be consumers of third-party dependencies, but instead, become curators.
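One small piece of curation is refusing any artifact whose contents do not match what was approved. The sketch below assumes a hypothetical allowlist that maps artifact file names to SHA-256 digests. The file name, the byte strings, and the `is_approved` helper are illustrative only.

```python
import hashlib
import hmac

# Stand-in bytes for a release archive that the governance body has vetted.
approved_artifact = b"contents of the vetted release archive"

# Hypothetical curated allowlist: artifact file name -> approved SHA-256 digest.
PINNED = {
    "example-lib-1.4.2.tar.gz": hashlib.sha256(approved_artifact).hexdigest(),
}

def is_approved(filename, data, pinned=PINNED):
    """Accept an artifact only if its digest matches the pinned value."""
    expected = pinned.get(filename)
    if expected is None:
        return False  # unknown artifacts are rejected by default
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected)

print(is_approved("example-lib-1.4.2.tar.gz", approved_artifact))
print(is_approved("example-lib-1.4.2.tar.gz", b"tampered archive"))
print(is_approved("unknown-lib-0.0.1.tar.gz", approved_artifact))
```

Many package managers have a built-in equivalent. With pip, for example, `pip install --require-hashes -r requirements.txt` rejects any download whose hash is not listed in the requirements file.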
Harden and lock down build systems. Enforce strong authentication on critical accounts used for managing the build, CI/CD pipelines, and package management. Assume breach, and design the DevOps processes to be ephemeral, predictable, and trustworthy.
Above all else, follow former President Ronald Reagan’s advice: “Trust but verify.”