April 9, 2024

Breaking APIs with Naughty Strings

We all know poor input validation is a critical attack vector for exploiting software. But did you know that a data set codenamed the Big List of Naughty Strings (BLNS) takes that to an entirely different level?

Yep. There is.

Let me show you how to use these naughty strings to break APIs.

What exactly is BLNS?

So, the Big List of Naughty Strings is an evolving list of strings that are highly likely to cause issues when used as user input data. Max Woolf has maintained it since back in 2015 and now includes over 500 different strings that can potentially abuse inputs.

These strings attempt to test for several conditions, including how systems interpret special characters, emojis, and Unicode. It also attempts several forms of script injection, SQL injection, and even Server code injection to see how the system responds.

It’s typically stored in a .txt file, which by its very nature, may trip up your text viewer when opened. So you should use an editor that visually renders hidden Unicode and special characters.

You can find blns.txt here.

If you open that with your browser, you might notice that GitHub even warns you about hidden Unicode characters.

But here’s the thing… I don’t want you to use this version of the file.

I want you to use the version found in the FUZZING section currently maintained in Daniel Miessler’s SecLists repo called big-list-of-naughty-strings.txt.

And here’s why.

Don’t be evil with naughty strings

Max’s blns.txt file is fine but hasn’t been updated in years. More importantly, it includes destructive strings for things like SQL injection that will actually DROP database tables if the injection is successful.

You don’t want to destroy things when doing input validation testing.

So don’t be evil… use the SecLists version instead.

Postman + BLNS = ❤️

Conducting input validation checks in the context of API security testing is easy… if you use the right tools.

For me, this is the perfect place to use Postman and its Collection Runner, especially when you use a custom collection for your security tests.

If you aren’t doing that yet, I highly recommend you check out my article on The Beginners Guide to Writing API Security Tests in Postman.

When using naughty strings in tools like Postman, they too can bork the app when processing malicious strings. We will want to do a bit of pre-processing so we don’t trip up our testing tools.

Turning BLNS into something Postman can handle

The Postman Collection Runner includes functionality to load/import data for each run. This data can be in CSV or JSON file format. Postman includes some good documentation on working with data files, but it fails to mention a few important things.

Things that will break Postman…

… namely, that special characters, commas, and quotes will cause issues with the data load.

And that makes sense. CSVs rely on commas to separate columns, while JSON files can’t easily handle double quotes in the elements. It’s standard stuff.

That’s OK. We can handle this ourselves with a bit of Python code to process the BLNS file itself, encode the data, and structure it so that Postman can use it.

Introducing “txt_to_postman_b64_json.py”

Max had already included a script to convert the BLNS text file into JSON in his repo. He also included a shell script that could Base64 encode the naughty strings so they wouldn’t trip up other tools.

I decided to merge the two ideas into a single Python script that also formats the JSON output in a format that Postman supports. Notice how I create a new property called “encodedNaughtyString” and place the Base64 encoded string in the value. You will need to know that later when we build a script to handle this data.

The code looks something like this:

#!/usr/bin/env python3

from argparse import ArgumentParser, Namespace
import os
import base64
import json

def main(srcFile: str, dstFile:str) -> None:

        if not os.path.isfile(srcFile):
                print( '-s argument is invalid. Is it a proper BLNS txt file? Aborting!' )
                return

        with open(srcFile, 'r') as f:
                # put all lines in the file into a Python list
                content = f.readlines()

                # above line leaves trailing newline characters; strip them out
                content = [x.strip('\n') for x in content]

                # remove empty-lines and comments
                content = [x for x in content if x and not x.startswith('#')]

                # Base64 encode all content to make Postman Collection Runner parser to not break
                content = [base64.b64encode(x.encode('utf-8')).decode('utf-8') for x in content]

                # insert empty string since all are being removed
                content.insert(0, "")

        encodedContent: list = []
        for c in content:
                encodedContent.append({"encodedNaughtyString": c})

        with open(dstFile, 'w') as f:
                # write JSON to file; note the ensure_ascii parameter
                json.dump(encodedContent, f, indent=2, ensure_ascii=False)


if __name__ == '__main__':
        parser = ArgumentParser()
        parser.add_argument( '-s', '--src', help='The source BLNS txt file to convert', type=str, required=True)
        parser.add_argument( '-d', '--dst', help='The destination filename of the encoded BLNS CSV', type=str, required=True)

        args: Namespace = parser.parse_args()
        main(args.src, args.dst)

You can also directly download the Python code here.

Usage is as simple as:

./txt_to_postman_b64_json.py -s big-list-of-naughty-strings.txt -d blns.json

Now that we have a data file that Postman can use let me demonstrate how you can use it for your own input validation testing.

Test Case: Login form in crAPI

So if you have read enough of my articles, you know I like to demonstrate my API hacking on OWASP’s Completely Ridiculous API (crAPI). It would seem a great place to demo input validation testing with naughty strings would be to attack the login form.

Let’s go do that.

Step 1: Testing for expected behavior in crAPI login

The first step we want to accomplish is to walk through the login process and document how the API functions. For crAPI, the API endpoint we are testing is /identity/api/auth/login.

Look for several behavioral characteristics, including:

What is the response to a successful login?
What is the response if both the email and password are incorrect?
What is the response if the email is correct, but the password is incorrect?
What is the response if the email is empty?
What is the response if the password is empty?
Are there any length or character restrictions on the email?
Are there any length or character restrictions on the password?

We can author our tests to account for and filter out expected behavior by mapping it out. We only want to FAIL a test when a naughty string makes the endpoint respond in ways NOT INTENDED.

What we see with crAPI login

In our case, here is what can be learned by interrogating the login endpoint:

A successful login returns a 200 HTTP response code, along with an access token in the token property of the JSON response.
Failed credentials return a 500 HTTP response code, with a body that says “UserDetailsService returned null, which is an interface contract violation”
A correct email but incorrect password returns a 500 HTTP response code, with the body showing “Bad credentials”.
A blank email returns a 400 HTTP response code, with a JSON response that includes a stack trace in the “details” property that includes the text “[must not be blank]”.
A blank password returns a 400 HTTP response code, with a JSON response that includes a stack trace in the “details” property that includes the text “[must not be blank]”.
When using a really short or long email, it returns a 400 HTTP response code with a JSON response that includes a stack trace in the “details” property that includes the text “[size must be between 3 and 60]”.
When using a really short or long password, it returns a 400 HTTP response code with a JSON response that includes a stack trace in the “details” property that includes the text “[size must be between 4 and 40]”.

That’s helpful. From this information, we can author our tests to account for these conditions and ignore them. Anything else we see will be suspect and should fail the test so we can investigate further.

Step 2: Set up test scenario in our Security Tests collection

As the login form has two separate fields, we should set up individual tests for each one. I recommend you group these in its own folder under your main collection. I called mine “Login Input Validation”.

In that folder, duplicate the Login endpoint from the collection holding the API docs (my docs collection was imported as “OWASP crAPI API”) and place it in your new folder. Rename the new endpoint to “Inject tainted data in email field.” Duplicate it again and rename that to “Inject tainted data in password field.”

When you are done, it should look something like this:

Step 3: Set up the “naughtyString” collection variable

During the test runs with the Collection Runner, we will want to inject the raw naughty strings directly into the payload of each request. To do this, we will use a placeholder to inject them directly into a collection variable called {{naughtyString}}.

Go to the Variables tab on the Collection and add a variable called “naughtyString”.

Step 4: Write a Pre-Request Script to process the encoded naughty strings

If you recall, my Python script encoded the naughty strings into Base64. We will need a way to decode that back into its raw naughty string format. We can do that in a Pre-Request Script.

The code looks like this:

// Need the atob library for Base64 decoding
var b64decode = require('atob');

// Get the base64 encoded naughty string injected by the Collection Runner
var encodedString = pm.variables.get("encodedNaughtyString");

// Set the decoded value as the "naughtyString" variable in the collection
pm.collectionVariables.set("naughtyString", JSON.stringify(b64decode(encodedString)))

A quick explanation of what the code does:

It loads up the atob() function into the sandbox so we can access it for Base64 decoding.
It extracts the current Base64 encoded variable on each iteration and stores it in the local encodedString variable. “encodedNaughtyString” is the property name we wrote out in the Python script.
It Base64 decodes the encodedString variable and then stringifies it into a JSON element. We do this so that special characters like double quotes are properly escaped in the JSON schema structure.
It stores that newly decoded and properly quoted raw naughty string into the collection variable {{naughtyString}} that we set in the previous step.

Step 5: Update the body payload to inject the naughty string

Head to the Body tab of the request. Update the field you want to test by inserting the collection variable {{naughtyString}}.

NOTE: The collection variable being used was already quoted when we did the JSON.stringify(). So do NOT include quotes around the variable as you typically do in the payload itself. If you forget this, tests will fail as the value will not be properly quoted/escaped.

Step 6: Write your tests to detect when naughty strings do bad things

OK, now it’s time to write our tests. We have several conditions to check for. For article brevity, I will just show you the code I added for the test conditions I am aware of for the email field:

pm.test("Test tainted input on email property", function () {
    if( pm.response.code === 500 ) {
        pm.response.to.have.body("UserDetailsService returned null, which is an interface contract violation");
    }
    else if( pm.response.code === 400 ) {
        let jsonData = pm.response.json();
        let expectedResponses = ["[must not be blank]", "[size must be between 3 and 60]"];

        pm.expect(jsonData.details).to.contain.oneOf(expectedResponses);
    }
    else {
        console.warn( "Unexpected response from login process in email field. Encoded injection str: " + 
            pm.variables.get("encodedNaughtyString"));
    }

});

One thing to note is that you don’t see me checking for a successful login. That would be done in the normal test coverage for positive testing. As we are purposely trying to cause unexpected behavior, I am looking for negative conditions outside of the norms we know about.

Step 7: Run the tests through the Collection Runner

Almost there. Time to see what happens.

Right-click on your test scenario folder and click “Run folder”.

For now, uncheck “Inject tainted data in password field.” Later, you can go back to that and update the Pre-Request Script, Body, and Test tabs to match the conditions for testing the password field.

On the right pane, Under Data click the button to Select file. Find the generated JSON file you created with my Python script and select it. Postman will load and parse your encoded naughty strings for you.

To verify that it loaded up correctly, you should see that Postman updated the Iterations field to the number of strings it could load. You can also click the Preview button to see what it parsed.

One last thing. Click the “Persist responses for a session”. This will let you look at the responses of each iteration (especially on failed tests) to debug how the API server responded.

It’s time. Click the orange button that says “Run Security Tests for crAPI”.

Watch what happens.

In our case, we tested over 500 different naughty strings in under 48 seconds against the email field in the login form, and crAPI handled it just fine.

Had it not, the test would have failed, and you could have immediately zoomed into it.

If you want to see what naughty string was sent in a request that caused the failure, you can click on the test and then the Request tab.

Conclusion

As you can see, injecting naughty strings to test input validation is relatively easy with Postman and the BLNS. You could very easily expand this to include your favorite payloads that are not on the naughty list and get a ton more input validation test coverage in no time.

Any time you see a place where you can inject data, you should consider using this approach. Need some more ideas? Maybe read my article on Attacking APIs by tainting data in weird places. This same approach could be used to tamper with headers, abuse query parameters, and taint payload data in any POST or PUT operation.

Have fun with it. Just don’t be evil. 😈

One last thing…

Have you joined The API Hacker Inner Circle yet? It’s my FREE weekly newsletter where I share articles like this, along with pro tips, industry insights, and community news that I don’t tend to share publicly. If you haven’t, subscribe at https://apihacker.blog.

Dana Epp

Hey, I’m Dana, aka SilverStr. I build and break software for a living, and am a Microsoft Regional Director and Developer Security MVP. I’ve spent decades as a security architect that focuses on helping secure software, data, and infrastructure on both blue and red teams. As of late, I have been focusing more on my offensive tradecraft to help developers and IT administrators see the impact of exploitation on vulnerabilities in their work. This blog is my chance to give back to the community by sharing my experiences and war wounds from the trenches.