From Tsunami to Twitter: How Rigorous API Testing Can Prevent Critical System Outages During Disasters

If there is anything we can learn from the latest earthquake in Japan, it’s how important communications and alerts are during a disaster. And how critical API testing is in this day and age.

Let me explain.

The magnitude 7.6 earthquake that hit on New Year’s Day triggered huge waves, prompting the Japan Meteorological Agency (JMA) to issue a major tsunami warning. Those along the country’s western coastline were urged to evacuate inland to higher ground. In fact, panicked-sounding Japanese news anchors were shouting at viewers to leave their televisions and start running immediately.

Not good.

Yet despite the urgency of the situation, the popular NERV Disaster Prevention app said it was being prevented from posting emergency alerts and updates to its Twitter accounts. For over three hours, it could not share potentially lifesaving information about the ongoing disaster.

Why?

Rate limiting.

This is what NERV had to say:

Our accounts appear to have been rate-limited due to the frequent posting of information updates regarding the Ishikawa Earthquake and Tsunami.

So what can we learn from all this?

How about this: it’s important to test the upper boundaries of any rate limiting enforced by the third-party APIs you consume.

Let me explain why that’s important… and how that can help your security testing.

Rate limiting as a security control

Rate limiting is a common security control used to protect web apps & APIs from abuse or exploitation. By restricting the number of requests that can be made within a specific period of time, rate limiting prevents malicious actors from launching automated attacks and brute force attempts.

However, it’s important to remember that rate limiting can also adversely affect legitimate users and applications, as seen in the case of NERV’s disaster prevention app. If not adequately tested and configured, rate limiting can result in a disaster during a disaster, allowing malicious actors to slip through and launch denial of service attacks against critical apps and infrastructure at the most inopportune times.

As such, it’s important to test the boundaries of the rate limits on API calls, especially when those APIs come from third parties you depend on.

An offensive security perspective on rate limit testing

When testing the rate limits of an API, start by checking the docs. Twitter has always been clear about its rate limits in its developer docs, so the information was there had NERV looked. A critical app designed for alerting during a disaster needs to understand those limits and verify them beforehand.

How?

There are common headers you can look for to verify the rate limiting. A few of them include:

  • x-rate-limit-limit – Tells you the upper limit/ceiling for the given endpoint.
  • x-rate-limit-remaining – How many requests are left in the “rate limiting window.”
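  • x-rate-limit-reset – When the current rate limiting window resets. Some APIs, including Twitter’s, send this as a UTC epoch timestamp in seconds; others send the number of seconds remaining.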

TIP: Sometimes the header names don’t have a hyphen between the words “rate” and “limit” (e.g., x-ratelimit-limit). Look closely to make sure you are identifying the right ones.
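As a rough illustration, here is a minimal Python sketch that pulls these values out of a response, accepting both the hyphenated and non-hyphenated spellings. It assumes you already have the response headers as a plain dictionary from whatever HTTP client you use; the function name read_rate_limit_headers is hypothetical. The reset value is returned as a raw integer because, depending on the API, it may be an epoch timestamp or a number of seconds.

```python
from typing import Dict, Mapping, Optional

# Both spellings appear in the wild: "x-rate-limit-*" and "x-ratelimit-*".
_PREFIXES = ("x-rate-limit-", "x-ratelimit-")
_FIELDS = ("limit", "remaining", "reset")


def read_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, Optional[int]]:
    """Return limit/remaining/reset as ints (None if a header is missing).

    Header names are matched case-insensitively. "reset" is returned raw:
    some APIs send a UTC epoch timestamp, others a number of seconds.
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    result: Dict[str, Optional[int]] = {}
    for field in _FIELDS:
        result[field] = None
        for prefix in _PREFIXES:
            value = lowered.get(prefix + field)
            if value is not None and value.isdigit():
                result[field] = int(value)
                break  # first spelling found wins
    return result
```

A client could, for example, check "remaining" before each routine post and hold back as it approaches zero, keeping some headroom for urgent messages.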

Once a rate limit is exceeded, the API should start returning HTTP 429 “Too Many Requests” responses. When this occurs, there is a good chance the response also includes a Retry-After header (or an equivalent property in the body if the response is JSON) that tells you how long to wait before the next request. Per the HTTP specification, the Retry-After header holds either a number of seconds or an HTTP date.
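A 429 handler therefore needs to cope with both forms of Retry-After. The sketch below assumes the headers are in a plain dictionary and uses a hypothetical helper name, retry_after_seconds; it converts either form into a number of seconds to wait using only the Python standard library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait according to Retry-After, or None if absent/unparseable.

    Retry-After is either delta-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT").
    """
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:  # treat naive results as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date already in the past means "retry now".
    return max(0.0, (when - now).total_seconds())
```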

Discovering the “Rate Limiting Window”

Hopefully, the API docs will explain precisely what the “rate limiting window” is. It may be per second, per minute, per hour, or per day, or some other fixed period of time or number of requests.

If you are unsure, you can write a script that hammers the endpoint with requests and incrementally backs off until it no longer triggers the rate limiting. Use the largest value you observe in the Retry-After header as an estimate of the “rate limiting window.”
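Here is a minimal Python sketch of that back-off probe. It is deliberately transport-agnostic: send is a hypothetical callable you supply that issues one request and returns (status_code, headers), so you can wrap whichever HTTP client you prefer. The step size, the 60-second fallback wait, and the names probe_window and _retry_after_seconds are assumptions for illustration, and only the numeric form of Retry-After is handled.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# (HTTP status code, headers) for one request; `send` is your own function.
Response = Tuple[int, Mapping[str, str]]


def _retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Numeric Retry-After in seconds; the HTTP-date form is ignored here."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return float(value)
            except ValueError:
                return None
    return None


def probe_window(
    send: Callable[[], Response],
    start_interval: float = 0.0,
    step: float = 0.5,
    max_interval: float = 10.0,
    requests_per_pass: int = 50,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[float], Optional[float]]:
    """Slow down between requests until a full pass completes with no 429.

    Returns (interval between requests that avoided throttling, or None if
    max_interval was exceeded; largest Retry-After seen, or None).
    """
    interval = start_interval
    largest_retry_after: Optional[float] = None
    while interval <= max_interval:
        throttled = False
        for _ in range(requests_per_pass):
            status, headers = send()
            if status == 429:
                throttled = True
                retry_after = _retry_after_seconds(headers)
                if retry_after is not None:
                    largest_retry_after = (retry_after if largest_retry_after is None
                                           else max(largest_retry_after, retry_after))
                # Wait out the server's hint (or a default) before the next pass.
                sleep(retry_after if retry_after is not None else 60.0)
                break
            sleep(interval)
        if not throttled:
            return interval, largest_retry_after
        interval += step  # back off: lower the request rate and try again
    return None, largest_retry_after
```

Injecting sleep keeps the probe testable offline with a fake send, and the largest Retry-After it reports is only an estimate of the window, not a guarantee.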

You have to be careful here though.

If you are too aggressive, you might find yourself blocked. This is why it’s important to consider using ephemeral IPs and possibly different accounts for each run. You might want to read my article on bypassing API rate limiting using IP rotation in Burp Suite for ideas, or just use a VPN service that lets you switch servers regularly so you can cycle through different outbound IPs when you do get blocked.

Once you have determined the “rate limit window,” you can hopefully get the “rate limit ceiling” (the maximum number of requests allowed within a window) from the x-rate-limit-limit header. If that header isn’t available, you can find the ceiling manually by incrementally increasing the number of requests within a “rate limit window” until you trigger a 429 “Too Many Requests” response.
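The manual approach can be sketched the same way. This assumes a simple fixed window that is fresh when the probe starts, and a hypothetical send callable returning (status_code, headers). Each burst is larger than the last, the script waits a full window between bursts so each one starts from a clean slate, and the number of successful requests before the first 429 is taken as the ceiling. The name estimate_ceiling and the default burst sizes are illustrative assumptions.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# (HTTP status code, headers) for one request; `send` is your own function.
Response = Tuple[int, Mapping[str, str]]


def estimate_ceiling(
    send: Callable[[], Response],
    window_seconds: float,
    start: int = 10,
    step: int = 10,
    max_requests: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Send ever-larger bursts, each in a fresh window, until one hits a 429.

    Returns the number of requests that succeeded in that burst (the
    estimated ceiling), or None if no burst up to max_requests was throttled.
    """
    burst = start
    while burst <= max_requests:
        for sent in range(1, burst + 1):
            status, _headers = send()
            if status == 429:
                # The request that tripped the limit is not part of the ceiling.
                return sent - 1
        # No 429 yet: wait out the window so the next, larger burst starts clean.
        sleep(window_seconds)
        burst += step
    return None
```

Sliding windows or token buckets can report a different number with this method, so treat the result as a reading of observed behavior rather than the provider’s configured value.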

Why bother finding the “Rate Limit Window & Ceiling”?

It’s reasonable to ask why I recommend doing all this extra work. It all comes down to knowing the boundaries of the API.

Developers are creatures of habit. Once you understand how they implement rate limiting, you can abuse that in other areas. Like brute forcing or password spraying the login endpoint. Or fuzz testing admin endpoints. Or parameter testing. The list goes on and on.

The point is that knowing the rate limiting window and ceiling serves your methodology. It’s even more critical when you detect third-party APIs being consumed. If you can trigger the app’s outbound calls to that API, you can find the limits that cause disruption and denial of service by abusing the app’s trust in its external dependency.
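To make the external dependency angle concrete, here is a toy Python model. It assumes the consuming app uses a single API credential both for features an attacker can trigger and for its critical alerts, and that the provider enforces a simple fixed window. The ceiling of 50 calls per 900-second window and the names FixedWindowLimiter and alert_gets_through are hypothetical, not any real provider’s values.

```python
class FixedWindowLimiter:
    """Toy provider-side limit: `ceiling` calls per `window` seconds per credential."""

    def __init__(self, ceiling: int, window: float) -> None:
        self.ceiling = ceiling
        self.window = window
        self.current_window = -1  # index of the window currently being counted
        self.used = 0

    def allow(self, now: float) -> bool:
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # A new fixed window has started, so the quota resets.
            self.current_window = window_index
            self.used = 0
        if self.used < self.ceiling:
            self.used += 1
            return True
        return False  # the provider would answer 429 Too Many Requests


def alert_gets_through(attacker_calls: int, ceiling: int = 50, window: float = 900.0) -> bool:
    """Attacker-triggered calls, then one critical alert, on the same credential.

    Assumes attacker_calls < 600 so every call lands in the first window.
    """
    limiter = FixedWindowLimiter(ceiling, window)
    for second in range(attacker_calls):
        # Each attacker action makes the app call the API once (one per second).
        limiter.allow(now=float(second))
    # The critical alert is posted 10 minutes into the same window.
    return limiter.allow(now=600.0)
```

In this model, alert_gets_through(49) is True while alert_gets_through(50) is False: once the attacker has spent the shared quota, the alert is rejected even though the provider itself never suffers a denial of service.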

This can get tricky. In many bug bounty programs and vulnerability disclosure programs, the scope and rules of engagement explicitly call out that no denial of service attacks are allowed. But this is different.

For example, look at Twitter’s bug bounty program (BBP) on HackerOne. It clearly lists “Issues that result in Denial of Service (DoS) to X’s servers at the network or application layer” as Ineligible Issues.

However, when you test the boundaries of rate limiting this way, you are potentially causing DoS to the external app that consumes the Twitter API, NOT to Twitter’s service itself.

It’s fair game.

Conclusion

Whether it’s functional testing or security testing, knowing the limits of an API is vital to understanding its potential resiliency against service disruption. Especially when consuming third-party APIs.

When an entire country relies on apps like NERV for early warnings about disasters, the last thing anyone needs is loss of life because an alert can’t get out due to rate limiting.

It’s essential we stress-test APIs to find these limits.

Knowing how to identify and understand rate limiting can be a valuable tool for developers AND security researchers alike. By learning the “Rate Limit Window & Ceiling,” you can gain insights into API implementations, determine how to abuse those limits for potential vulnerabilities, and ultimately improve your overall methodology for testing APIs.

Remember, don’t just rely on the documentation provided by the API provider; take time to test and validate the rate limit parameters to get a more comprehensive understanding of the API’s actual behavior.

And do it BEFORE a disaster strikes. I’m sure NERV is learning that lesson big time.

One last thing…

API Hacker Inner Circle

Have you joined The API Hacker Inner Circle yet? It’s my FREE weekly newsletter where I share articles like this, along with pro tips, industry insights, and community news that I don’t tend to share publicly. Subscribe at https://apihacker.blog.

Dana Epp
