June 20, 2023

A “cewl” way for API discovery

Imagine this. You’ve been working on a target for some time now, investing time in mapping out all the API endpoints. You have test coverage against everything you can see and everything you could document when you produced your own rogue OpenAPI documentation.

But is there more?

Probably.

I’m going to show you a couple of interesting ways to find out.

The unspoken rule about API discovery

So here’s the thing. Read any book or watch any video on hacking web apps and APIs and they will eventually talk about recon. The concept of discovering new API endpoints through both passive and active reconnaissance is an important skill to learn.

Being able to document the attack surface and understand how endpoints are being called is a valuable endeavor when looking at existing API infrastructure. But you aren’t guaranteed that you will see everything. That’s because you may not be in the right privilege context to have access to everything.

There is an unspoken rule about API discovery that you won’t find in any book. It comes from experience and the war wounds of failure. At least for me.

In one sense, it’s pretty clear and logical. If your account doesn’t have enough privileges to access particular endpoints, you may not even know they’re there. A good example might be an admin module that you will never see as a normal user.

But in the API world, it goes deeper than that. More mature APIs will take advantage of scopes and roles, along with feature flags, to limit what someone can see in an app or API. And if you don’t know about these roles, scopes, or flags, you might be missing a huge area of an API.

What are feature flags?

Think of a feature flag as a tool to control access to the software at runtime without needing to deploy new code. Developers like to use feature flags in application programming interfaces because they can run code in parallel in production without affecting all users.

So if a company has a beta version of some new features in an API endpoint, instead of hosting an entirely new endpoint, it can flag off the code so only certain users can access it.

In many cases, this is nothing more than a glorified if / else block of code that runs depending on whether the flag is set.
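To make the "glorified if / else" concrete, here is a minimal sketch of a flag-gated API handler. The flag store, the flag name `beta_reports`, and the tenant IDs are all hypothetical; real systems usually pull flag state from a flag service or configuration at runtime rather than a hard-coded dict.

```python
# Hypothetical in-memory flag store: which tenants see the beta code path.
FEATURE_FLAGS = {
    "beta_reports": {"enabled_for": {"tenant-42"}},
}


def is_enabled(flag: str, tenant_id: str) -> bool:
    """Return True if the named flag is switched on for this tenant."""
    flag_cfg = FEATURE_FLAGS.get(flag)
    return flag_cfg is not None and tenant_id in flag_cfg["enabled_for"]


def get_reports(tenant_id: str) -> dict:
    # Same endpoint, two code paths, chosen at runtime by the flag.
    if is_enabled("beta_reports", tenant_id):
        return {"version": "beta", "fields": ["summary", "forecast"]}
    return {"version": "stable", "fields": ["summary"]}


print(get_reports("tenant-42"))  # beta response
print(get_reports("tenant-7"))   # stable response
```

Note that both code paths ship in the same deployment; only users with the flag enabled ever exercise the beta one, which is exactly why it can stay invisible to your recon.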

The use case for feature flags is pretty clear. They allow for better product testing and for canary launches, which roll out new updates and features to a small set of customers at a time. This is quite useful for web apps and APIs, and it is a software architecture pattern many APIs use.

What are roles and scopes?

Roles and scopes are key components of an access token for APIs. These determine what a user can access and how they can interact with the APIs.

Roles are the groups of permissions associated with a user that indicate what they can access in the API. These can be used to restrict access to certain functions or areas of the API and limit what the user can do.

Scopes are the individual permissions within a role that determine what a user can do with the API. This can be anything from reading data from a certain table to editing records, or from running a certain query to deleting data.

Together, roles and scopes form an important part of an API security model, as they can restrict even those users with an access token to a limited set of resources.
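As a rough illustration of that model, the sketch below maps roles to sets of scopes and allows a call only when one of the caller's roles grants the required scope. The role names and scope strings are hypothetical; every API defines its own.

```python
# Hypothetical role-to-scope mapping; real APIs use their own names.
ROLE_SCOPES = {
    "user": {"read:profile", "update:profile"},
    "admin": {"read:profile", "update:profile", "create:user", "delete:user"},
}


def scopes_for(roles: list) -> set:
    """Union of the scopes granted by every role the caller holds."""
    granted = set()
    for role in roles:
        granted |= ROLE_SCOPES.get(role, set())
    return granted


def authorize(roles: list, required_scope: str) -> bool:
    """Allow the call only if one of the caller's roles grants the scope."""
    return required_scope in scopes_for(roles)


print(authorize(["user"], "create:user"))   # False
print(authorize(["admin"], "create:user"))  # True
```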

How feature flags and scopes expose endpoints

So back to the whole API discovery thing. Now that we know about how feature flags work and how roles and scopes usually work in an API, we can start to glue together a recon methodology for discovering potentially new features.

Break open your access tokens. If possible, use a proxy so you can capture API traffic and collect several different tokens to see how they differ. Try to capture these tokens within different privilege contexts (e.g., standard user vs. admin) so you can see what changes.
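If the tokens are JWTs, a minimal sketch like the one below decodes the payload of each (without verifying the signature, since you only want to read the claims) and shows which claims the higher-privilege token adds or changes. The `_fake_jwt` helper and the claim values are hypothetical, there only so the example runs offline; claim names such as `scp`, `scope`, or `roles` vary between identity providers.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT for inspection."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def diff_claims(low: dict, high: dict) -> dict:
    """Claims that exist only in, or differ in, the higher-privilege token."""
    return {k: high[k] for k in high if low.get(k) != high[k]}


def _fake_jwt(claims: dict) -> str:
    # Demo only: builds an unsigned token so this example runs offline.
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}."


user_tok = _fake_jwt({"sub": "tester", "scp": "read:profile"})
admin_tok = _fake_jwt({"sub": "tester", "scp": "read:profile create:user", "roles": ["admin"]})
print(diff_claims(decode_jwt_payload(user_tok), decode_jwt_payload(admin_tok)))
# {'scp': 'read:profile create:user', 'roles': ['admin']}
```

In practice you would paste in the tokens captured by your proxy instead of the fake ones.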

You will usually find a structure to the permissions of an endpoint. For example, the scopes claim of your access token might contain permissions that look like <action>:<endpoint>, so the ability to create a user might appear as create:user. The scopes could also be delivered as an array, or as single words.

There is no single exact format here. It will differ between APIs, and maybe even between different endpoints in the same API.

But you will usually see patterns. This is helpful, as the roles and scopes themselves might expose undocumented endpoints.
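Here is a minimal sketch of turning those patterns into leads, assuming scopes follow an <action>:<resource> shape and that resources map to paths like `/<resource>`. Both the scope strings and the `/<resource>` path convention are hypothetical; adjust the parsing to whatever pattern your target actually uses.

```python
from collections import defaultdict
from typing import Iterable, Union


def parse_scopes(scope_claim: Union[str, Iterable[str]]) -> dict:
    """Group scopes by resource, assuming an <action>:<resource> naming shape."""
    # Scope claims are often a space-delimited string, sometimes a JSON array.
    scopes = scope_claim.split() if isinstance(scope_claim, str) else list(scope_claim)
    by_resource = defaultdict(set)
    for scope in scopes:
        action, sep, resource = scope.partition(":")
        if sep:
            by_resource[resource].add(action)
        else:
            # Single-word scope: no action prefix, so record the word itself.
            by_resource[scope].add("*")
    return dict(by_resource)


def undocumented_resources(by_resource: dict, documented: set) -> list:
    """Resources named in scopes that don't appear among documented paths."""
    return sorted(r for r in by_resource if f"/{r}" not in documented)


claim = "read:user create:user read:invoice export:auditlog"
scopes = parse_scopes(claim)
print(undocumented_resources(scopes, {"/user", "/invoice"}))  # ['auditlog']
```

The action verbs are worth keeping too: an `export` action you have never seen called hints at an endpoint or method you haven't tested yet.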

Automated API discovery tools don’t normally detect feature flags or map roles and scopes to endpoints. You need to review this manually and try to pinpoint how permissions in access tokens may expose features you don’t normally have access to. Usually, such a feature is only lit up in the frontend web app, but its code has already crept into the API backend.

Finding features in other “cewl” ways

So cracking open access tokens is useful, but I have another innovative way to discover API endpoints.

Innovative. But not original. In fact, you probably know how to do this already but just haven’t put it together yet.

All you need is a custom wordlist generator that can produce an output file of parsed words taken from the release notes, changelog, or product roadmap pages of the target’s website.

A great tool to use for this is something called CeWL.

What is CeWL?

CeWL is pronounced cool. CeWL stands for Custom Word List generator.

It’s a Ruby app that spiders a given URL to a specified depth, optionally following external links to other sites, and returns a list of unique words that can then be used for things like password cracking or brute-force discovery.
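To get a feel for what the word-extraction step produces, here is a rough approximation on a single block of text. This is not CeWL’s implementation: it skips spidering entirely, keeps words in first-seen order, and only keeps alphabetic runs (a simplifying assumption about tokenization).

```python
import re


def unique_words(text: str, min_length: int = 3, lowercase: bool = False) -> list:
    """Return unique alphabetic words of at least min_length, in first-seen order."""
    seen = {}
    for word in re.findall(r"[A-Za-z]+", text):
        if lowercase:
            word = word.lower()
        if len(word) >= min_length:
            seen.setdefault(word, None)  # dict preserves insertion order
    return list(seen)


notes = "Added the roleManagementPolicy resource. Added the new resource."
print(unique_words(notes, min_length=4))  # ['Added', 'roleManagementPolicy', 'resource']
```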

The CeWL project is free open-source software (FOSS) that can be found on GitHub. The command line app comes pre-installed on some systems like Kali Linux and is available in most packaging systems like brew, apt, and rpm.

Using CeWL in a cool way

Because CeWL is a custom word list generator that collects unique words from a URL, it’s perfect for your API discovery process. Point it at release notes, changelogs, or product roadmaps that might naturally expose feature, function, or component names.

Chances are, this extra metadata won’t yet be in published API documents. However, when creating APIs it’s common for developers to use naming conventions that map to what ends up being described in roadmaps and release notes.

Here are a few tricks for passing arguments to cewl to make your generated custom word lists more useful:

Set a minimum word length using the -m argument based on the patterns you see in existing endpoint names. By default, cewl uses a minimum length of 3 characters.
If you find you aren’t collecting enough words consider changing the search depth using the -d argument. The default is to spider down 2 levels. Just remember it takes considerably longer to spider much deeper.
If you know that the API framework in use only uses a lowercase naming convention, use the --lowercase argument.
Use the -H argument to set custom headers or pass in bearer tokens.
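If you script your runs, a small helper that assembles these arguments keeps them consistent between runs. This is a sketch: the target URL, output file name, and header value are hypothetical, and it assumes your CeWL version accepts -H in the name:value form shown in its help text (check `cewl --help` on your install).

```python
import shlex


def build_cewl_cmd(url, outfile, min_len=3, depth=2, lowercase=False, headers=None):
    """Assemble a cewl argument list from the options discussed above."""
    cmd = ["cewl", url, "-d", str(depth), "-m", str(min_len), "-w", outfile]
    if lowercase:
        cmd.append("--lowercase")
    # One -H per header, in name:value form.
    for name, value in (headers or {}).items():
        cmd += ["-H", f"{name}:{value}"]
    return cmd


cmd = build_cewl_cmd(
    "https://example.com/release-notes",  # hypothetical target page
    "words.txt",
    min_len=8,
    depth=1,
    lowercase=True,
    headers={"Authorization": "Bearer <token>"},
)
print(shlex.join(cmd))  # shell-safe form, handy for pasting into a crontab
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`; using a list rather than a single string avoids shell-quoting surprises with tokens and query strings.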

A real-life CeWL usage example

I want to give you an example of how I have used CeWL in the past to get early access to an API endpoint that I generally shouldn’t have known about (yet).

So let me show you how I keep tabs on the Microsoft Graph API.

Now, I’ve talked before about how to attack the Microsoft Graph with Postman. But on several occasions, I have been alerted to new endpoints before Microsoft actually published them in their Postman collection.

Here’s how.

A “cewl” way to monitor the Microsoft Graph

Microsoft keeps a changelog of everything that goes on in the Microsoft Graph. If you aren’t aware, the Microsoft Graph is the gateway to data and intelligence in Microsoft 365. It’s core to pretty much everything in the Microsoft cloud.

You can use its complex filtering to narrow the search down. I like to look for new additions to the beta version of the Graph API.

Even more interesting is that you can access the RSS feed for a filtered changelog at developer.microsoft.com/en-us/graph/changelog/rss/?search=&filterBy=beta,Addition

All we need to do is point cewl at that URL so it can start collecting unique words and other associated metadata.
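If you want to vary the filters without hand-editing the query string, a small sketch like this builds the feed URL. The filter values are the ones used in this post; the helper name is hypothetical. It keeps commas literal and encodes spaces as %20, matching the format of the changelog URLs in this post.

```python
from urllib.parse import quote, urlencode

CHANGELOG_RSS = "https://developer.microsoft.com/en-us/graph/changelog/rss/"


def changelog_feed_url(*filters: str, search: str = "") -> str:
    """Build a filtered Graph changelog RSS URL (filters joined by commas)."""
    query = {"search": search, "filterBy": ",".join(filters)}
    # quote (not quote_plus) gives %20 for spaces; safe="," keeps commas as-is.
    return f"{CHANGELOG_RSS}?{urlencode(query, safe=',', quote_via=quote)}"


print(changelog_feed_url("beta", "Addition"))
print(changelog_feed_url("beta", "Addition", "Identity and access"))
```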

In my case, I have a cron job that runs every week, generates a new wordlist, and then runs a delta diff against the previous week’s list to find new words. I filter out any word with fewer than 12 letters because Microsoft’s naming conventions for endpoints in the Graph are far too descriptive to be just a few letters.
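The delta diff and length filter can be sketched like this. It is not my actual cron script; it assumes cewl was run with -w and without count output, so each file holds one word per line. The file names in the usage note are hypothetical.

```python
from pathlib import Path


def load_wordlist(path: str) -> set:
    """Read a one-word-per-line wordlist into a set (empty if missing)."""
    p = Path(path)
    if not p.exists():
        return set()  # first run: nothing to diff against yet
    return {line.strip() for line in p.read_text().splitlines() if line.strip()}


def new_words(previous: set, current: set, min_length: int = 12) -> list:
    """Words in this week's list that weren't in last week's, long enough to matter."""
    return sorted(w for w in current - previous if len(w) >= min_length)


last_week = {"user", "applicationTemplate"}
this_week = {"user", "applicationTemplate", "roleManagementPolicy", "group"}
print(new_words(last_week, this_week))  # ['roleManagementPolicy']
```

In a weekly job you would call `new_words(load_wordlist("last-week.txt"), load_wordlist("this-week.txt"))` and then rotate the files for the next run.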

A typical cewl command might look something like this:

cewl 'https://developer.microsoft.com/en-us/graph/changelog/rss/?search=&filterBy=beta,Addition' -m 10 -w graph-beta-adds.txt

The result is a curated list of words that might represent endpoint names. As I usually focus specifically on Identity & Access Management in the Microsoft Graph, I’ve had success with this query:

cewl 'https://developer.microsoft.com/en-us/graph/changelog/rss/?search=&filterBy=beta,Addition,Identity%20and%20access' -m 10 -w graph-beta-iam-adds.txt

YMMV, of course. But it’s surprising how easy it is to extract a custom wordlist directly from a changelog feed. Now consider applying the same approach to product roadmaps and release notes.

Can you see the potential here?

At this point, you can feed this wordlist to your favorite API discovery bruteforcer, like Burp Intruder, feroxbuster, ffuf, gobuster, or kiterunner, and let it go to town hunting for API endpoint paths. Just remember to follow the known path naming conventions you’ve seen in everything else you have tested.
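Some tools take raw words and substitute them into a placeholder; others want full paths. For the latter, a sketch like this expands the wordlist into paths that follow an observed convention. The `/beta/` prefix and camelCase rule reflect the Graph-style paths discussed above and are assumptions; swap in whatever conventions your target actually uses.

```python
def expand_to_paths(words, prefixes=("/beta/",), camel_case=True) -> list:
    """Expand a wordlist into URL paths that follow an observed naming convention."""
    paths = set()
    for word in words:
        if not word:
            continue
        # camelCase: lowercase only the first character, keep the rest as-is.
        name = word[0].lower() + word[1:] if camel_case else word.lower()
        for prefix in prefixes:
            paths.add(f"{prefix}{name}")
    return sorted(paths)


print(expand_to_paths(["RoleManagementPolicies", "accessReviews"]))
# ['/beta/accessReviews', '/beta/roleManagementPolicies']
```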

Conclusion

CeWL is a powerful tool that can be used to generate custom wordlists for API discovery. Using CeWL with release notes, changelogs, and product roadmaps gives you the opportunity to discover endpoints before they are publicly documented.

This can help you stay one step ahead of your competitors in the world of API security testing.

Do you know how else you can stay ahead? By subscribing to my personal newsletter, the API Hacker Inner Circle. I’ll see you over there.

Dana Epp

Hey, I’m Dana, aka SilverStr. I build and break software for a living, and am a Microsoft Regional Director and Developer Security MVP. I’ve spent decades as a security architect focused on helping secure software, data, and infrastructure on both blue and red teams. As of late, I have been focusing more on my offensive tradecraft to help developers and IT administrators see the impact of exploitation on vulnerabilities in their work. This blog is my chance to give back to the community by sharing my experiences and war wounds from the trenches.