Last fall it was reported on InfoWorld that some researchers uncovered thousands of active API security tokens embedded in popular open-source software repositories like Docker, NPM, and PyPI.
It’s not uncommon for developers to hardcode credentials in their apps and configs and then forget about them. I talked about that last year, before this research was even published, but in the context of mobile apps that access APIs.
In any case, what WAS surprising was the fact that they found 66.5% of the secrets were hidden in layers within Docker images.
Today, I want to show you how we can build upon that research and find more than just API secrets within the layers of containers you may have under security testing. We can find credentials, database connection strings, and even API artifacts like source code and byte code that may have been deleted before publishing the Docker images.
That’s right. I’m going to show you how to recover sensitive files and configs even if they were deleted which may aid you in your gray box testing.
It all starts by understanding what Docker layers are all about.
What Are Docker Layers and Why You Should Care About Them
When you pull a Docker image from a repository, have you ever noticed that it comes down in “layers”? Each RUN, ADD, or COPY instruction in a Dockerfile creates a new layer when the image is built.
These layers are overlaid on top of one another to construct the final image that you run in Docker. In modern deployments of Docker, this is done through a special file system called OverlayFS.
I’ll get more into that in a moment. But first, here is an image that helps to demonstrate what I am getting at:
Notice at the bottom you have the “Image layer”. As the image is run in a container, it becomes the “Container layer”, and any changes within the container are ultimately merged into the “Container mount”.
The takeaway I want you to understand ultimately is that the “Image layer” contains one or more intermediate layers and is stored in the “LowerDir” of OverlayFS. The currently running container layer is called the “UpperDir”. This will all come together later when we analyze hidden layers and want to access actual files… even if they were deleted or modified by a later Docker command that creates a new layer on top of it.
If you want to get an idea of what these layers look like, you can run this docker command:
docker image history <IMAGE NAME OR ID>
Using OWASP’s Completely Ridiculous API (crAPI) as our target API, the history of its layers might look something like this:
docker image history crapi/crapi-identity
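By default the history output truncates the command that created each layer. Here is a rough sketch of how you might print the full commands and flag the ones that mention secret-like keywords. The --no-trunc and --format flags are standard Docker options, but the helper name flag_suspicious_history and the keyword list are my own assumptions, so tune them for your target.

```shell
#!/bin/sh
# Hypothetical helper: read layer commands on stdin and print only the ones
# that mention secret-like keywords, prefixed with their line number.
# The keyword list is an assumption and will produce false positives.
flag_suspicious_history() {
    grep -inE 'key|secret|token|passw|credential' || true
}

# Example usage (requires Docker and a locally available image):
#   docker image history --no-trunc --format '{{.CreatedBy}}' crapi/crapi-identity \
#       | flag_suspicious_history
```

Keep in mind that the history is listed newest layer first, so line 1 of the output is the last instruction in the Dockerfile.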
While this command has some value, in a moment I will show you a better tool for analyzing layers. But first, let’s inspect our target Docker image in more detail.
Inspecting Docker Images
Docker includes several commands to work with images. If you want to get a list of all images on the current Docker host, you can simply use docker images. If you want to filter for all images in a repository like crapi, you could do something like docker images "crapi/*".
An even more powerful command to inspect images is docker image inspect <IMAGE ID>. It brings back a huge amount of data in a JSON format. Try this command on the identity image for crAPI to see what I mean:
docker image inspect crapi/crapi-identity
A huge amount of information. Let’s parse out the layer data that matters to us.
Parsing Layer Data
Since Docker’s inspect command returns to us well-formed JSON data, we can use the jq tool to parse it out. If jq is new to you, I suggest you check out the primer I gave in a previous article.
In our case, we want to extract the data stored in the GraphDriver node. It might look something like this:
docker image inspect crapi/crapi-identity | jq '.[0].GraphDriver.Data'
Now things are starting to come into focus. By parsing out the layer data, you can start to see the LowerDir and UpperDir properties of the Docker image. In fact, if you look closely at the LowerDir property, you will notice that it is colon-separated, representing several layers, as they have been stacked on top of each other within OverlayFS.
OverlayFS is a file system that provides the ability to “overlay” multiple directories so that when a file is requested from one directory, it will be retrieved from the other directories within the OverlayFS stack if it isn’t found in the first directory. This allows for a Docker image to combine several layers into one complete image.
In addition, because of the way OverlayFS works, it is possible to access the intermediate layers that were used to construct the Docker image (the so-called LowerDir of the stack). This can be invaluable when attempting to discover sensitive files, secrets, and settings that have been hidden in a layer.
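To make the lookup behavior concrete, here is a simplified model of it in shell. This is only a sketch of the idea, not how the kernel implements OverlayFS, and it ignores deleted-file markers entirely. The helper name overlay_lookup is hypothetical.

```shell
#!/bin/sh
# Simplified model of an OverlayFS lookup: walk the directories from the top
# of the stack down and print the first one that contains the requested
# relative path. overlay_lookup is a hypothetical helper name.
overlay_lookup() {
    rel_path=$1
    shift
    for layer_dir in "$@"; do
        if [ -e "$layer_dir/$rel_path" ]; then
            printf '%s\n' "$layer_dir/$rel_path"
            return 0
        fi
    done
    return 1
}

# Example usage, top layer first:
#   overlay_lookup etc/app.conf /path/to/upper /path/to/lower2 /path/to/lower1
```

The useful part for us is the flip side: an older copy of a file still physically exists in a lower directory even when a higher layer shadows it.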
Typically you will find these layers stored in
To see OverlayFS on modern macOS, though, you need to get into the Linux VM where Docker is actually running, which you can do with the following command:
docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh
Now that you know how to parse our layer data and know where OverlayFS is actually storing physical files and directories, let me show you a new tool that can make exploring these layers even easier.
Analyzing Hidden Layers with Dive
There is an open-source tool called Dive that provides an easier and more efficient way to explore the hidden layers of any Docker image. Dive allows developers and security professionals to perform a deep analysis of the layers in their image without having to manually dig through the OverlayFS directories.
It provides a terminal UI for visualizing each layer, including its contents and differences from other layers in the stack. In addition, Dive lets you quickly see details about each layer, such as its size, its digest, and the command that created it.
With this powerful toolset at your disposal, you can quickly identify potential vulnerabilities in Docker images under test or uncover sensitive files and secrets that were intentionally changed or deleted in lower layers.
Once Dive is installed, to run it you just type dive <IMAGE NAME OR ID>. Once run, it will fetch the image (if needed), then analyze and cache it before loading it into its UI. Here is what it looks like when loading the identity image for crAPI:
As you can see on the left pane, it lists out all the layers of the image. You can use the up and down arrow keys to move between layers, which will reflect and update changes in the Layer Details as well as the content on the right pane.
The right pane showcases all the content within that layer. You can move between the left and right panes by using the TAB key.
While in the content pane, you can use several shortcut hotkeys to manipulate what you see:
- CTRL+U will toggle showing “Unmodified files”.
- CTRL+A will toggle “Added files”.
- CTRL+M will toggle “Modified files”.
- CTRL+R will toggle “Removed files”.
By default, all files are shown. So here is a useful tip that can help you zoom into suspicious files you might want to investigate…
Look for Removed files first. This is usually a good indication someone is trying to hide something they don’t want you to see. Then check for any Modified files, especially configuration files where they may be changing settings and keys. Then look for Added files to find the files added to run the software.
So when you first tab over to the contents pane, hit CTRL+U, then CTRL+A, then CTRL+M to hide all but Removed files. Use the down arrow key to cycle through the layers and see which files from earlier layers were deleted.
Then move back to the first layer by hitting the up arrow until you get to the top of the layers, and hit CTRL+M to add back in the Modified files and cycle between layers again, and watch what files are modified at each layer. Then complete this cycle one last time by hitting CTRL+A to show Added files.
You will be surprised how quickly you can determine what files have been added, deleted, or modified to the Docker image using this approach.
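If you already have a shell on the Docker host, removed files can also be spotted directly on disk. OverlayFS records a deletion as a “whiteout”, which is a character device with a 0/0 device number placed in the layer that removed the file. The sketch below simply lists character devices inside a layer directory. I am assuming that any character device found there is a whiteout, which is normally true but not guaranteed, and the helper name list_whiteouts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list character devices under a layer directory.
# In an OverlayFS layer these are normally whiteouts, i.e. markers for files
# that this layer deleted from an earlier layer.
list_whiteouts() {
    find "$1" -type c 2>/dev/null
}

# Example usage (usually needs root on the Docker host):
#   list_whiteouts "<ONE OF THE LAYER DIRECTORIES FROM LowerDir>"
```

The whiteout only tells you the name of what was removed. The file itself still lives in whichever earlier layer added it.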
Now that we know how to look for stuff, let’s talk about the types of things you should be looking for.
API artifacts to look for
Every web application and API will be different. So it’s impossible to give you a single answer as to what to look for. However, there are some things that are good leading indicators. Here are just a few examples:
- Look for environment variable files like
- Look for config files with API Keys, credentials, DB strings, etc. Examples include files like
- Look for any API and web app source code or byte code. Consider using Daniel Miessler’s SecLists web-extensions wordlist to search for specific web-friendly source code files.
- Look at the final layer’s Command (Found in the Layer Details) to determine what is being run and from what directory. That will usually point to a shell script or a command line that runs an interpreter (Bash/Python/Node/Java/DotNet etc.) that runs the main application within the container image.
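As a starting point for the config file hunt, a recursive grep over a layer directory can list files whose contents look like they hold credentials or connection strings. This is a minimal sketch. The helper name scan_for_secrets and the pattern list are my own assumptions, and they will miss plenty and flag false positives.

```shell
#!/bin/sh
# Hypothetical helper: print the names of files under a directory whose
# contents match secret-like patterns.
#   -r recurse, -l print matching file names only,
#   -i case-insensitive, -E extended regular expressions.
# The pattern list is an assumption; extend it for your target.
scan_for_secrets() {
    grep -rliE 'api[_-]?key|secret|passw(or)?d|jdbc:|mongodb://|postgres(ql)?://' "$1" 2>/dev/null || true
}

# Example usage:
#   scan_for_secrets "<ONE OF THE LAYER DIRECTORIES FROM LowerDir>"
```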
Here is a tip on something I typically do. I took Daniel’s wordlist, added several byte code specific extensions (.dll, etc.), and now search across all layers of an image using the find command. It looks something like this:
find . -regex '.*\.\(asp\|aspx\|bat\|c\|cfm\|cgi\|class\|com\|dll\|exe\|hta\|htm\|html\|inc\|jar\|jhtml\|js\|jsa\|jsp\|mdb\|php\|php2\|php3\|php4\|php5\|php6\|php7\|phps\|pht\|phtml\|pl\|reg\|sh\|shtml\|sql\|swf\)$' 2>&1
I find in most of my gray box testing I want to try to identify all the source code and byte code that exists in an image so I can reverse engineer the API as quickly as I can. That command within a shell script can iterate across each layer and search OverlayFS in a matter of seconds.
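A wrapper for that kind of search might look roughly like the sketch below. It takes a colon-separated list of layer directories, the same shape as the LowerDir property, and searches each one in turn. The function name find_code_in_layers is hypothetical, and I have swapped the long regex for a handful of -name tests on a shortened extension list to keep the example portable and readable.

```shell
#!/bin/sh
# Hypothetical helper: search every layer directory in a colon-separated
# list for source code and byte code files.
# The extension list is a shortened subset chosen for illustration.
find_code_in_layers() {
    printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r layer_dir; do
        # Skip entries that do not exist on this host.
        [ -d "$layer_dir" ] || continue
        find "$layer_dir" -type f \( -name '*.jar' -o -name '*.class' \
            -o -name '*.dll' -o -name '*.js' -o -name '*.php' \
            -o -name '*.sh' -o -name '*.sql' \) 2>/dev/null
    done
}

# Example usage:
#   find_code_in_layers "<COLON-SEPARATED LAYER DIRECTORIES>"
```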
Accessing Files in Hidden Layers
Thanks to Dive, it becomes easy to home in on exactly which files we want to take a look at. However, Dive doesn’t allow you to actually look at the contents of files. For that, we want to combine the information we find in Dive with some of the jq goodness we were working with earlier to find exactly where these files are on disk within OverlayFS.
First, I want to show you one of my favorite commands I run that helps with this:
docker image inspect <IMAGE ID OR NAME> | jq '.[0].GraphDriver.Data.UpperDir + ":" + .[0].GraphDriver.Data.LowerDir | split(":") | reverse'
Let me explain what that command does:
- It inspects the target Docker image to get a list of all its layers
- It parses out the UpperDir layer path and joins it to the LowerDir layer paths of the image with a colon.
- It splits the combined string on the colons into an array of layer paths.
- It reverses the array so that the oldest layer is on top, MATCHING what you see in Dive.
So now when you see a suspicious file in Dive, you can simply count which layer it is in and then match it to the output from the above command. You will find the file(s) there.
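If you would rather not count by hand, the same mapping can be sketched in plain shell. This assumes you have already pulled the UpperDir and LowerDir strings out of the inspect output, and that layer 1 is the oldest layer at the top of Dive’s list. The helper name layer_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the on-disk directory for a given layer number.
#   $1: 1-based layer number, counted from the top of Dive's layer list
#   $2: UpperDir path
#   $3: colon-separated LowerDir paths (may be empty for single-layer images)
layer_path() {
    # Join UpperDir and LowerDir (newest first), split on colons,
    # reverse so the oldest layer comes first, then pick the requested line.
    printf '%s\n' "$2${3:+:$3}" | tr ':' '\n' \
        | awk '{ lines[NR] = $0 } END { for (i = NR; i >= 1; i--) print lines[i] }' \
        | sed -n "${1}p"
}

# Example usage:
#   ls -la "$(layer_path 2 "$upper_dir" "$lower_dir")"
```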
It’s no secret that Docker containers have quickly become the go-to choice for many companies when it comes to deploying their applications and APIs. Unfortunately, this ease of use often comes at the cost of security. It’s easy to forget about code or settings tucked away in hidden layers and as a result, leave them exposed.
As an API hacker, it’s important to know where these suspect settings and files can hide and how to find them.
By using the tools and techniques described in this post you can easily track down any sensitive data that may be embedded within Docker container images. With a little bit of creativity, you can even use the same techniques to search for code and settings in other application layers. So the next time you need to find API secrets or maybe even the source within a Docker image, you will know exactly where to look!