The Exploit Had a Docstring

Google's GTIG confirmed the first AI-generated zero-day: a 2FA logic flaw found by a criminal using an LLM, caught before mass exploitation. The code tells you what made it.

May 15, 2026 · 8 min read

A docstring. Not a comment. A properly formatted, pedagogically correct docstring at the top of a Python function, explaining its parameters, its preconditions, and its expected return value. Inside an exploit script. Written to bypass two-factor authentication on a popular web-based administration tool.

That detail, from Google's GTIG report published May 11, is the one I keep returning to.

A human threat actor doesn't document their parameters. They wrote the function. They know what it does. The docstring exists because the tool that produced the code was trained to produce docstrings. It doesn't know it's supposed to hide what it is.

What happened

Google's Threat Intelligence Group documented the first known instance of a criminal threat actor using AI to develop a working zero-day exploit. Not a mutation of a known technique. Not script-kiddie automation. A novel vulnerability in a popular open-source web-based admin tool, one that traditional security tooling had never surfaced; the attacker got there first.

The vulnerability class is the critical thing to understand: this was not a buffer overflow, not a SQL injection, not a race condition. It was a semantic logic flaw. A developer had hardcoded an exception somewhere in the application's 2FA enforcement path. That exception created a contradictory trust assumption: valid credentials would unlock a bypass route that 2FA was supposed to block regardless. The code ran without errors. The control flow was technically correct. The exploit path only appears when you hold both the authorization model and the implementation in view simultaneously and ask whether one contradicts the other.

That's exactly what the AI did.

The actor planned a mass exploitation campaign across multiple targets. Google's proactive threat research found the exploit in the wild, worked with the vendor on responsible disclosure, and patched before the campaign deployed. As close calls go, the response was clean.

Why this flaw class doesn't show up in scans

Thirty years of security tooling was built around a specific model of what vulnerabilities look like: they have structural signatures. Memory safety bugs leave crash traces. Input validation failures show up in taint analysis. Cryptographic mistakes match known patterns. Fuzzers find crashes. Static analysis finds sinks. Both tools assume the vulnerability has a detectable shape in the code.

Semantic logic flaws don't follow that model. Consider a simplified version of what went wrong here:

def require_2fa(user, request):
    # Skip 2FA check for pre-authenticated sessions
    if user.session_flags & SESSION_PRE_AUTH:
        return True
    return user.totp_verified
 
def admin_action(user, request, action):
    if not require_2fa(user, request):
        raise PermissionDenied("2FA required")
    # ... proceed

The code reads fine. The function is named correctly. The comment explains the intent. The bug is that SESSION_PRE_AUTH can be set by a different code path that only validates credentials, not the second factor. There is no crash, no dangerous sink, no uninitialized variable. The code does exactly what it says. What it says contradicts the system's security model.
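
Continuing the simplified example, the contradicting path might look something like this. The names are illustrative, not taken from the report or the vendor's code:

def login(username, password):
    # Credentials-only path: checks the password, never the second factor
    user = authenticate(username, password)
    if user is not None:
        # Marks the session "pre-authenticated" for downstream flows.
        # Nothing here verifies TOTP, yet require_2fa() treats this flag
        # as sufficient. That is the contradiction.
        user.session_flags |= SESSION_PRE_AUTH
    return user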

Finding this requires reading both the 2FA enforcement function and every code path that touches session flags, understanding the intended invariant, and noticing the contradiction. In a real codebase that's spread across dozens of files, multiple middleware layers, and several thousand lines of session management code.

GTIG's report is explicit about why LLMs are effective here: frontier models excel at identifying high-level semantic flaws, detecting hardcoded static anomalies, and performing contextual reasoning to correlate conflicting logic. Reading developer intent and surfacing contradictions. The capability is real. Traditional tools have no surface area on this flaw class.

The fingerprint

The code tells you what made it.

GTIG identified several specific artifacts in the exploit script: an abundance of educational docstrings, a hallucinated CVSS score embedded directly in the code as metadata, detailed help menus, a clean ANSI color formatting class with structure typical of LLM training data, standard library patterns in textbook arrangement. Professional, well-formatted Python. The kind of code a developer produces when they're trying to write something legible and maintainable.
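
To make that concrete, here is an invented fragment in the same spirit, not code from the actual script, showing what those artifacts tend to look like:

# Hypothetical illustration; the CVSS line is the kind of confident,
# fabricated metadata GTIG describes.
# CVSS: 9.8 (Critical)

class Colors:
    """ANSI color codes for terminal output."""
    RED = "\033[91m"
    GREEN = "\033[92m"
    RESET = "\033[0m"

def build_session(target_url, username, password):
    """Establish a pre-authenticated session against the target.

    Args:
        target_url: Base URL of the admin interface.
        username: Valid account name for the target.
        password: Password for the account.

    Returns:
        An authenticated HTTP session with the relevant cookies set.
    """
    ...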

None of that is characteristic of criminal exploit tooling. Exploit development tends toward working code, not clean code. You build the thing to function, you give it cryptic variable names if you name them at all, you don't ship a help menu. The LLM produced code that satisfied its internal quality standard. That standard is not the attacker's standard.

The hallucinated CVSS score is the most revealing detail. The model invented a severity rating and embedded it in the script as though it were fact. A human exploit developer either looks up the real CVSS (once a CVE has been assigned) or omits the field. The LLM included a confident, specific, incorrect number. The actor apparently didn't catch the fabrication, or understood enough to know it didn't affect the exploit's function.

Google notes they don't believe Gemini was used. The specific model doesn't matter much for the fingerprint analysis. These artifacts are characteristic of LLM output broadly, not of any particular system.

The implementation gap

The campaign failed, at least partly because of errors in the actor's implementation. GTIG notes the mistakes "likely interfered with successful use."

There's a real and important observation here. The actor used an AI to find and write the exploit but apparently couldn't fully validate or debug the output. They had a working zero-day in conceptual form and couldn't reliably operationalize it. The tool generated something that looked correct, and the operator couldn't tell where it was wrong.

That's a meaningful friction point right now. AI-generated exploit code can be sophisticated enough that the person directing the AI doesn't understand all of it. Debugging requires deeper expertise than generating.

But this is a temporary constraint. The gap between "AI produces the exploit" and "operator can use the exploit reliably" closes in two directions. Attackers get better at directing models, validating output, and iterating on failures. The models get better at producing exploit code that works without requiring expert-level debugging. Both trends are moving in the same direction. The implementation errors that broke this campaign are a data point about today, not a structural barrier.

The fingerprint won't last

The docstrings, the hallucinated CVSS, the textbook formatting: these are real signals today. Security teams who find Python scripts on attacker infrastructure carrying this signature have a new data point worth weighting.

The window is short. Threat actors now know what GTIG documented. The fingerprint is easy to strip. Ask the model to produce minimal working code. Remove comments. Obfuscate variables. Omit metadata fields. Ask for "exploit-style" output instead of "readable" output. The tell is a default behavior artifact, not an irreducible property of AI-generated code.

The specific signatures in the May 11 report will not be the signatures in reports six months from now.

What defenders can use

The toolkit for finding semantic logic flaws is underdeveloped compared to the toolkit for memory safety and input validation. That gap exists because the flaw class was expensive to find and rare enough that human review filled it adequately. Both parts of that calculus are shifting.

Some concrete starting points.

Start with the authentication and authorization layer. Ask an LLM to enumerate the invariants the system is supposed to enforce (every admin action requires verified 2FA, every session token must be bound to a specific IP range, etc.). Then ask it to find code paths that could violate each invariant. This is not a complete audit. It's a directed one that takes a day instead of a week.
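
As a sketch of what that directed pass can look like in practice, here is an illustrative prompt builder. The invariants and wording are examples, not drawn from the report:

# The invariants below are examples; substitute your system's actual rules.
INVARIANTS = [
    "Every admin action requires a session with a verified second factor.",
    "Every session token is bound to a specific IP range.",
]

def build_review_prompt(auth_source: str) -> str:
    invariant_list = "\n".join(f"- {inv}" for inv in INVARIANTS)
    return (
        "These are the security invariants this system must enforce:\n"
        f"{invariant_list}\n\n"
        "Below is the authentication and authorization code. For each invariant, "
        "list every code path that could violate it, quote the relevant lines, "
        "and explain the contradiction.\n\n"
        f"{auth_source}"
    )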

Look for hardcoded exceptions in auth logic. Conditions like if user.is_internal, if request.is_legacy, if debug_mode. Every one is a candidate for a semantic contradiction. Grep for them. Review each against the intended behavior, not the local code context alone.
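
A minimal sketch of that grep pass in Python. The flag names are examples; extend the pattern with your own:

import re
from pathlib import Path

# Conditions that often mark a hardcoded exception in auth logic.
SUSPECT = re.compile(r"\b(is_internal|is_legacy|debug_mode|skip_2fa|pre_auth)\b", re.IGNORECASE)

for path in Path("src").rglob("*.py"):
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith(("if ", "elif ")) and SUSPECT.search(stripped):
            print(f"{path}:{lineno}: {stripped}")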

Write negative tests for your authorization invariants. A test that asserts "a session that has completed credential verification but not 2FA cannot reach any admin endpoint" will catch this class of bug at the source. Most authorization test suites don't have these because they require knowing what "cannot reach" means for every endpoint.
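
A sketch of one such negative test, written with pytest. The endpoint list and the client fixture are hypothetical; the point is that the session has passed credential checks but never verified a second factor:

import pytest

# Illustrative list; in practice, enumerate admin routes from your router or URL registry.
ADMIN_ENDPOINTS = ["/admin/users", "/admin/settings", "/admin/audit-log"]

@pytest.mark.parametrize("endpoint", ADMIN_ENDPOINTS)
def test_admin_rejects_session_without_second_factor(client_without_2fa, endpoint):
    """A session with verified credentials but no 2FA must not reach admin endpoints."""
    # client_without_2fa is a hypothetical fixture: logged in, TOTP never verified.
    response = client_without_2fa.get(endpoint)
    assert response.status_code in (401, 403)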

The same capability that found this flaw is available on the defensive side. GTIG has noted AI-augmented code review is already part of advanced offensive research. The same technique applied to your own codebase is not complicated. Point an LLM at your auth layer, give it the invariants you care about, ask it to find contradictions. Imperfect. Faster than the alternative.

The part that changed

For most of the last decade, "AI writes malware" meant automation of the boring parts: phishing templates, port scanning scripts, mutation of existing payloads. Nothing that required genuine understanding of a target system. Nothing a competent developer couldn't replicate in a day.

What Google documented is a different category. An AI independently reasoned its way to a novel semantic flaw in production software that traditional scanning had missed. It produced a working exploit for that flaw. The discovery, not the scripting, was the hard part. That was the part AI did.

The barrier that kept zero-day research expensive was always the cognitive cost of understanding a large codebase at the level of developer intent. Not what the code does, but what it was written to do, and where it fell short. Holding 100,000 lines of authorization logic in mind simultaneously and noticing the contradiction. That's the task that cost weeks of skilled researcher time. That cost is going down.

One case, one flaw class, implementation errors, no confirmed mass exploitation, a successful defensive response. It's not proof of a permanent shift. The direction is clear.

The exploit with the docstring was clumsy. The next one won't have one.


