November 28, 2023

API Hacking Mindset, API Hacking Techniques

Using Chaos Engineering To Hack An API

Have you ever thought about using chaos engineering to hack an API? I sure have.

While chaos engineering is typically used for testing and improving software system resilience, it can also be harnessed for discovering vulnerabilities in APIs.

In this article, I want to explore how chaos engineering can be applied to hacking APIs. With any luck, you should walk away with new ideas for approaching your security testing differently.

And hey… we can all use a little more chaos in our lives. 🤣

What is Chaos Engineering?

Chaos Engineering involves intentionally injecting failures into a system to test its resilience and identify potential weaknesses. It was popularized by Netflix, who developed a tool called “Chaos Monkey” to randomly terminate instances in their production environment and see how the system would handle it.

The idea behind Chaos Engineering is that failures are inevitable, and it’s better to discover and address them proactively rather than reactively. By introducing controlled chaos, teams can gain insights into how their systems respond under stress and make improvements accordingly.

The reality is that Chaos Engineering is typically seen as operational tests against the infrastructure and architecture of a system. Developers and DevOps regularly focus on what happens when a component fails. But how often is it applied to how the application or API responds during a failure condition?

Let’s talk about that.

How Chaos Engineering Works in API Hacking

When hacking an API, or any software system for that matter, one of the key constructs you should follow is seeing how the system responds when you taint the data or starve it from getting what it expects.

In a previous article I wrote on attacking APIs by tainting data in weird places, I went into depth about tampering with data in everything from headers to request payloads. Chaos Engineering comes into play when you start considering how you go about actually doing that.

Consider this. How does an API you’re testing handle a disconnection for a dependent system, like the core datastore or caching system? When it fails, does it “fail closed”?

In software security, the term “fail closed” means that in the event of a malfunction, the system defaults to a secure or closed state, thus preventing any further actions that could compromise the system’s security.

But does it always work?

I once found an API that fetched its signing key used for authN and authZ from an external secrets management system. As I had a foothold on the internal system of the API server, I was able to poison the local DNS cache to change the IP of the server hosting the secrets to something that didn’t even exist. When the request to get the signing key timed out, it fell back to using a preconfigured signing key, which I knew and could see in the configuration.

The result? I was able to use the well-known static key to forge and manipulate my own tokens, which gave me complete administrative control of the API and all cross-tenant data.

Controlling Chaos

So when hacking an API, think about how you might inject some controlled chaos to get the API to work in ways unexpected. Because it may have negative implications on production systems, this should be done in a way that all stakeholders are aware and ready for the fault injection.

Chaos Engineering defines a process for this.

The process of Chaos Engineering when testing an API

Identify an area of the API you want to test for weakness. In the previous example, it might be the “unavailability of the secrets management server”.
Create a Chaos Experiment. Document it for all stakeholders to read and understand. This is a process that would be run during testing to introduce the previously identified weakness. Our example could be “simulating an unreachable secrets management server.”
Define metrics and triggers for failure conditions. These metrics are measures of system health, such as response time or error rates. The triggers determine when these measurements indicate a failure in the system. In our example, it might be how the API functions when it can’t get appropriate signing keys.
Run the experiment in a controlled environment that replicates production systems. This could involve setting up a test server or using a simulated API client.
Monitor the impact on the API and its components. Watch for any unexpected behavior or failures resulting from the chaos experiment.
Analyze and learn from the results. Assess how well the API and its components handled the chaos and identify any potential vulnerabilities or weak points.
Repeat the process with different experiments and scenarios to continuously improve the API’s resilience against unexpected events.

Using chaos engineering in API hacking allows for a more systematic and deliberate approach to finding vulnerabilities. It also helps practitioners better understand how their APIs behave under stress, giving them valuable insights for improving overall security.

Chaos Engineering isn’t a replacement for pentesting

While chaos engineering can be beneficial for finding vulnerabilities, it should not replace traditional security testing methods. It should be used as a complementary tool in the API security arsenal alongside techniques like fuzzing and traditional penetration testing.

Where to Apply Fault Injection to an API

The previous example I gave covered the idea of applying chaos engineering to external dependencies of the API. But there are other places to apply fault injection.

APIs typically work on data contracts. You know, those object models that allow data to move between endpoints and even external systems. So it’s imperative that you think about that from a fault injection standpoint.

A perfect example might be that of JSON injection. If the API expects a JSON object, what happens when you send it something unexpected like an array or string? Or a model that it isn’t expecting. Does the API handle it gracefully and return an error message, or does it crash and expose sensitive information?

How does the endpoint handle modified partial updates? If you know the object model includes properties that aren’t usually included in the JSON body, what happens when you do include it? This is precisely how Broken Object Property Level Authorization vulnerabilities come about.

What about changing business logic behavior by modifying application state data? If an API relies on client-side state data, can you inject a fault into that to grant you more access to API resources? I once found an API that depended on feature flags to limit access to features in the application, which were being stored in the browser’s local storage cache. While it was signed in a way to prevent tampering, the developers didn’t plan for the feature flag database to be missing. Simply deleting that data from the cache gave me full access to all API endpoints, even though I had no previous rights to it.

Where to learn more about Chaos Engineering

There is a lot on the Internet that you can find about Chaos Engineering. However, little of it is focused on security testing, and attacking APIs. However, here are a few resources I think you will find helpful:

Check out this awesome curated list of Chaos Engineering resources. It includes a ton of articles, books, whitepapers, and videos all about Chaos Engineering.
Here is an interesting article focused on the developer side of Building resilient APIs with chaos engineering. Think about how you can apply that approach from a security testing perspective.
This article on Validating the resilience of your API gateway with Chaos Engineering is an interesting take on the API Gateway side of things.
Check out this Postman template that helps to simulate external API outages, starves CPU resources of an API, and can even kill containers in a Kubernetes cluster.
Nordic APIs wrote an interesting blog post that reviews Gremlin and shows how to apply Chaos Engineering to APIs.

Conclusion

One of the things I love about the idea of introducing Chaos Engineering into API security testing is that it allows us to break things on purpose to see how resilient the code may be. While I’ve only given a few examples here, I hope it’s opened your mind’s eye to some of the possibilities.

It can expose a lot of dark debt that developers aren’t thinking about while minimizing the blast radius of any vulnerabilities you do find. As we are doing this in a controlled manner with all stakeholders aware, it allows us to find things before our adversaries do without critically impacting the consumers of the API.

Remember, chaos engineering is not just about breaking things; it’s about learning and improving. It’s a powerful tool that can help you uncover hidden vulnerabilities in your API and make it more robust and secure.

So, embrace the chaos and start hacking!

Dana Epp

Hey, I’m Dana, aka SilverStr. I build and break software for a living, and am a Microsoft Regional Director and Developer Security MVP. I’ve spent decades as a security architect that focuses on helping secure software, data, and infrastructure on both blue and red teams. As of late, I have been focusing more on my offensive tradecraft to help developers and IT administrators see the impact of exploitation on vulnerabilities in their work. This blog is my chance to give back to the community by sharing my experiences and war wounds from the trenches.