December 6, 2022

How to extract artifacts from OpenAPI docs to help attack APIs

Don’t you just love OpenAPI docs? They have so much useful information in them. When written well, you can really get a good understanding of how an API endpoint functions.

Heck, you don’t even have to have well-written API docs. You can always generate your own rogue API docs if they don’t exist.

The thing is, those API docs get cumbersome quickly. There is so much information in them; you can wade through them for hours, trying to extract the information you need as you go about your API security testing.

In this article, I’m gonna show you some clever ways to extract the most vital information you need when hacking APIs.

Let’s go!

Your best friend… jq

So these API docs are nothing more than complex JSON-formatted files that follow the OpenAPI Specification (OAS). We can use the popular jq tool to extract the information we want.

jq is a lightweight and flexible command-line JSON processor. It’s like sed for JSON data – you can slice, filter, map, and transform structured data with the same ease that sed, awk, grep, and friends let you play with text.

In our case, we will use it to extract the more important bits we want to use when API hacking.

So you will need to make sure you have installed jq on your system.

Standalone install

jq is written in C and has no runtime dependencies, so it should be possible to build it for nearly any platform. Prebuilt binaries are available for Linux, OS X, and Windows. You can find the download page here.

Linux install

Check your package manager. Something like this might work:

$ sudo apt install jq

macOS install

Use brew:

$ brew install jq

Securing our target OpenAPI document

Now that you have jq installed, you will need an OpenAPI document that we can work with for the article. I am going to use the OpenAPI document from OWASP crAPI. You can download it here.

Feel free to use any Swagger/OpenAPI doc you want. For this article, I am going to be parsing an OpenAPI 3.0 document.

Extracting the API endpoint routes

One of the first things we want to do after securing the API doc is to extract the endpoint routes. I like to always produce a custom wordlist so I can understand the form and function of the API layout. You can learn a lot from the behavior of developers by understanding how they construct the paths to their endpoints.

In fact, at times it’s possible to identify the development framework being used by understanding the paths. However, that is an article for another day.

So, to extract the paths to be used in a wordlist, try something like this:

$ jq -r '.paths | keys []' openapi-spec.json > wordlist.txt

What you will find in the wordlist file is a clean list of the API routes for crAPI:

$ cat wordlist.txt
/community/api/v2/community/posts
/community/api/v2/community/posts/recent
/community/api/v2/community/posts/{postId}
/community/api/v2/community/posts/{postId}/comment
/community/api/v2/coupon/new-coupon
/community/api/v2/coupon/validate-coupon
/identity/api/auth/forget-password
/identity/api/auth/login
/identity/api/auth/signup
/identity/api/auth/v2.7/user/login-with-token
/identity/api/auth/v2/check-otp
/identity/api/auth/v3/check-otp
/identity/api/auth/v4.0/user/login-with-token
/identity/api/v2/admin/videos/{video_id}
/identity/api/v2/user/change-email
/identity/api/v2/user/dashboard
/identity/api/v2/user/pictures
/identity/api/v2/user/reset-password
/identity/api/v2/user/verify-email-token
/identity/api/v2/user/videos
/identity/api/v2/user/videos/convert_video
/identity/api/v2/user/videos/{video_id}
/identity/api/v2/vehicle/add_vehicle
/identity/api/v2/vehicle/resend_email
/identity/api/v2/vehicle/vehicles
/identity/api/v2/vehicle/{vehicleId}/location
/workshop/api/mechanic/
/workshop/api/mechanic/mechanic_report
/workshop/api/mechanic/receive_report
/workshop/api/mechanic/service_requests
/workshop/api/mechanic/signup
/workshop/api/merchant/contact_mechanic
/workshop/api/shop/apply_coupon
/workshop/api/shop/orders
/workshop/api/shop/orders/all
/workshop/api/shop/orders/return_order
/workshop/api/shop/orders/{order_id}
/workshop/api/shop/products
/workshop/api/shop/return_qr_code

You can immediately notice a few things.

The developers for crAPI use a route format of /{MODULE}/api/.
Some modules include versioning, while others do not.
The modules are community, identity, and workshop.

This is all useful. We could use this custom wordlist and jam it into kiterunner to see if there are any other interesting endpoints. But we will cover that sort of thing another day.

For now, you have a clean wordlist representing all the key API routes to get started and an understanding of the different modules of the API.
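The observations about modules and versioning can also be checked with a short script. Here is a minimal Python sketch, assuming the route layout seen above (module name as the first path segment, versions as segments like v2 or v2.7). The function name summarize_routes and the SAMPLE_SPEC dictionary are hypothetical; the sample only reuses a few routes from the wordlist.

```python
import re
from collections import defaultdict

# Matches version-like path segments such as "v2", "v3", or "v2.7".
VERSION_SEGMENT = re.compile(r"^v\d+(\.\d+)*$")


def summarize_routes(spec):
    """Map each top-level module to the sorted version segments seen in its routes."""
    modules = defaultdict(set)
    for route in spec.get("paths", {}):
        if not route.startswith("/"):
            continue  # skip extension keys such as "x-..."
        segments = [s for s in route.split("/") if s]
        if not segments:
            continue
        versions = modules[segments[0]]  # registers the module even with no versions
        versions.update(s for s in segments[1:] if VERSION_SEGMENT.match(s))
    return {name: sorted(found) for name, found in sorted(modules.items())}


# Hypothetical mini-spec built from a handful of the routes listed above.
SAMPLE_SPEC = {
    "paths": {
        "/community/api/v2/community/posts": {},
        "/identity/api/auth/login": {},
        "/identity/api/auth/v2.7/user/login-with-token": {},
        "/identity/api/auth/v3/check-otp": {},
        "/workshop/api/shop/products": {},
    }
}

if __name__ == "__main__":
    for module, versions in summarize_routes(SAMPLE_SPEC).items():
        print(module + "\t" + (", ".join(versions) or "(no version segment)"))
```

To run it against a real document, load the file with json.load and pass the resulting dictionary to summarize_routes instead of SAMPLE_SPEC.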

Side Note: If you haven’t worked with OWASP crAPI before, here’s an interesting tidbit. The web application modules are all written in different programming languages. The community module is written in Go. The identity module is written in Java. And the workshop module is written in Python.

This demonstrates that while the API document presents it as a single homogeneous API, the fact is that the app has several different tech stacks to attack. Being able to identify this during your recon phase will allow you to better craft your payloads by understanding the programming languages being used. If the concept of fingerprinting the target programming language is new to you, I suggest you check out my article on How to Detect the Programming Language of an API.

Extracting HTTP Status Codes Used

One of the more interesting things we can understand when looking through an OpenAPI document is how the developers use HTTP status codes. It’s quite common to see patterns in how they use them to represent authentication and authorization errors, success statuses, and whatnot.

This query is a bit more complex. I will walk you through it.

$ jq -r '[.paths[][].responses? | keys? | .[]] | group_by(.) | map({code:.[0],count:length}) | sort_by(-.count) | map(.code + "\t" + (.count | tostring))[]' openapi-spec.json

So let’s break down this query and discuss the jq filters I used:

[.paths[][].responses? | keys? | .[]] : Builds an array of every HTTP response status code found. The question marks tell jq not to return an error if there is nothing there (i.e., the element is missing).
group_by(.) : Groups the array elements (the status codes) by their value.
map({code:.[0],count:length}) : Turns each group into an object, with the status code in code and the number of occurrences in count.
sort_by(-.count) : Sorts the new mapped array in descending order by count.
map(.code + "\t" + (.count | tostring))[] : Does a final map to build tab-separated, human-readable lines; the trailing [] unpacks the array so that -r prints one raw line per status code.

The end result will look something like this:

So how is this valuable? We get an idea of what response codes are used. Immediately interesting are things like HTTP 500 codes being (mis)used. You can check through the API document and quickly see that the API returns 500 response codes even when it ISN’T an “internal server error”. And you can see that only one place uses 401 error codes, and it’s NOT in the login endpoint. It too returns a 500 code on a failed login. All this helps you understand how mature a developer is in their coding practices and how you should think about how the API will react to you sending over tainted data.
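If you want to go one step further and see which endpoints declare a given code, the same idea can be sketched in Python. This is a rough equivalent under stated assumptions, not the jq query itself: it only looks at the standard HTTP method keys under each path, whereas the jq query inspects every value. The function names and the SAMPLE_SPEC routes and codes are hypothetical and are not crAPI's real data.

```python
from collections import Counter

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def iter_operations(spec):
    """Yield (method, route, operation) for every operation object under paths."""
    for route, item in spec.get("paths", {}).items():
        if not isinstance(item, dict):
            continue
        for method, operation in item.items():
            if method in HTTP_METHODS and isinstance(operation, dict):
                yield method, route, operation


def count_status_codes(spec):
    """Return (code, count) pairs, highest count first, ties ordered by code."""
    counts = Counter()
    for _, _, operation in iter_operations(spec):
        counts.update(operation.get("responses", {}).keys())
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def operations_returning(spec, code):
    """List (method, route) pairs whose responses declare the given status code."""
    return [
        (method, route)
        for method, route, operation in iter_operations(spec)
        if code in operation.get("responses", {})
    ]


# Hypothetical mini-spec; routes and codes are made up for illustration.
SAMPLE_SPEC = {
    "paths": {
        "/login": {"post": {"responses": {"200": {}, "500": {}}}},
        "/orders": {
            "get": {"responses": {"200": {}, "401": {}}},
            "post": {"responses": {"200": {}, "500": {}}},
        },
    }
}

if __name__ == "__main__":
    for code, count in count_status_codes(SAMPLE_SPEC):
        print(code + "\t" + str(count))
    print(operations_returning(SAMPLE_SPEC, "500"))
```

Status codes are compared as strings because JSON object keys are always strings, so pass "500" rather than 500.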

Extracting additional endpoint data to get a better view of the API

OK, let me show you one more creative way to quickly understand the OpenAPI document you are interrogating. While having a wordlist of all endpoints is useful, and understanding what status codes are used is enlightening, getting a quick human-readable summary is even more valuable.

This query is gonna look ugly. But it’s really helpful. Credit to Arnaud Lauret (aka the API handyman) for this one. I learned it from him a few years back.

Brace yourself.

$ jq -r '.paths | to_entries | map(select(.key | test("^x-") | not)) | map ( .key as $path | .value | to_entries | map( select( .key | IN("get", "put", "post", "delete", "options", "head", "patch", "trace")) | { method: .key, path: $path, summary: .value.summary?, deprecated: .value.deprecated? })[] ) | map( .method + "\t" + .path + "\t" + .summary + (if .deprecated then " (deprecated)" else "" end)) []' openapi-spec.json

The end results are really nice though:

It’s at this point that I highly recommend you start using a lesser-known feature of jq. When more complex queries like this start cluttering your command line, it makes more sense to move the query into a file and then use the -f parameter to load it.

You can download your own copy here from my GitHub gist: dump-endpoints.jq.

Here’s how to use it:

$ jq -r -f dump-endpoints.jq openapi-spec.json

I’m not going to break down this query. You can always look at the jq manual if something doesn’t quite make sense. The idea though is it will reach in, parse out the API spec documentation and return the HTTP method, route, and description of every endpoint defined in the API.

It’s through these results we can quickly see a mapping of all the endpoints we might be interested in, allowing us to cross-reference with the API doc later to get even more details as required.
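For readers who would rather script this step, here is a minimal Python sketch of the same summary. It mirrors the filter logic of the jq query (skip x- extension keys, keep only HTTP method keys, append a deprecated marker), but it is an approximation written for this article and not a port of dump-endpoints.jq. The function name dump_endpoints and the SAMPLE_SPEC contents are hypothetical.

```python
HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def dump_endpoints(spec):
    """Return one tab-separated 'method, route, summary' line per operation."""
    lines = []
    for route, item in spec.get("paths", {}).items():
        # Skip specification extensions and anything that is not a path item.
        if route.startswith("x-") or not isinstance(item, dict):
            continue
        for method, operation in item.items():
            if method not in HTTP_METHODS or not isinstance(operation, dict):
                continue  # e.g. path-level "parameters" or "summary"
            summary = operation.get("summary") or ""
            if operation.get("deprecated"):
                summary += " (deprecated)"
            lines.append("\t".join((method, route, summary)))
    return lines


# Hypothetical mini-spec; summaries are made up for illustration.
SAMPLE_SPEC = {
    "paths": {
        "/shop/products": {
            "parameters": [],
            "get": {"summary": "List products"},
            "post": {"summary": "Add a product", "deprecated": True},
        },
        "/shop/orders": {"get": {}},
        "x-internal": {"get": {"summary": "ignored"}},
    }
}

if __name__ == "__main__":
    print("\n".join(dump_endpoints(SAMPLE_SPEC)))
```

An operation with no summary still gets a line, just with an empty third column, which makes undocumented endpoints easy to spot.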

Extract full details of a specific API endpoint

Time for one last query. Once you can quickly dump your endpoint wordlist, determine how HTTP status codes are used, and see how the different endpoints fit together, you will want to extract the full details of a specific endpoint.

You can do this by querying the paths object and extracting individual endpoints by their route and HTTP method.

Try something like this:

$ jq -r '.paths["/workshop/api/shop/products"]["post"]' openapi-spec.json

Notice what I did there? By quickly passing in the endpoint route and the detected HTTP method, you can dump the detailed JSON output for the exact API endpoint you wanna attack and know everything from the schema used in the body to the expected response codes and response structure.

I picked this specific route because it supports more than one HTTP method. Let’s query for the GET and see how it differs:

$ jq -r '.paths["/workshop/api/shop/products"]["get"]' openapi-spec.json

Nothing to it. Easy to move around the API document with a bit of jq query magic.
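The same lookup is just nested dictionary access once the document is loaded, so it is easy to script. Below is a minimal Python sketch, assuming the document has already been parsed into a dictionary. The helper names methods_for and get_operation and the SAMPLE_SPEC contents are hypothetical.

```python
import json

HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def methods_for(spec, route):
    """Return the HTTP methods defined for a route, in document order."""
    item = spec.get("paths", {}).get(route) or {}
    return [key for key in item if key in HTTP_METHODS]


def get_operation(spec, route, method):
    """Return the operation object, or None if the route or method is missing."""
    item = spec.get("paths", {}).get(route) or {}
    return item.get(method.lower())


# Hypothetical mini-spec with one route that supports two methods.
SAMPLE_SPEC = {
    "paths": {
        "/shop/products": {
            "get": {"responses": {"200": {"description": "OK"}}},
            "post": {
                "requestBody": {
                    "content": {"application/json": {"schema": {"type": "object"}}}
                },
                "responses": {"200": {"description": "OK"}},
            },
        }
    }
}

if __name__ == "__main__":
    route = "/shop/products"
    for method in methods_for(SAMPLE_SPEC, route):
        print(method.upper(), route)
        print(json.dumps(get_operation(SAMPLE_SPEC, route, method), indent=2))
```

Returning None for a missing route or method mirrors what jq does in the queries above, where a missing key prints null instead of failing.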

Conclusion

This article has shown you how to use jq, a powerful JSON processing tool, to quickly extract useful information from the OpenAPI documents you may be using while hacking APIs. You can use this information to better understand the API and determine which endpoints may be of interest to you to attack. Additionally, by using jq you can extract full details of a specific endpoint, including the schema and response codes expected, so you don’t have to waste time wading through tons of structured data in an overly complex JSON file.

You can build out more complex jq queries and store them in your own files for easy reuse, just like I did to dump details for all the endpoints using dump-endpoints.jq. In the end, building out this library of jq filters lets you move around your target’s API quickly, home in on what is interesting, and get right to work.

I hope you found this creative approach helpful!

Like what you saw? Then check out my free ebook of The Ultimate Guide to API Hacking Resources where you can find tons more online resources focused on this sort of thing.

Dana Epp

Hey, I’m Dana, aka SilverStr. I build and break software for a living, and am a Microsoft Regional Director and Developer Security MVP. I’ve spent decades as a security architect that focuses on helping secure software, data, and infrastructure on both blue and red teams. As of late, I have been focusing more on my offensive tradecraft to help developers and IT administrators see the impact of exploitation on vulnerabilities in their work. This blog is my chance to give back to the community by sharing my experiences and war wounds from the trenches.