GitHub Got Owned by a Semicolon

CVE-2026-3854 let any authenticated user pop GitHub's backend with a single git push. The bug class is older than I am. So why does it keep working?

May 2, 2026 · 8 min read

Last week, Wiz Research showed that any authenticated GitHub user could execute arbitrary code on GitHub.com's backend with one git push. The payload was a semicolon. The blast radius was millions of private repos.

Read that again. Not a CSRF. Not a deserialization gadget. Not even an SSRF. A semicolon, in a place semicolons weren't supposed to be.

CVE-2026-3854 was disclosed on April 28th. By the morning after, every security feed was running the same headline. I want to write about the bug, but more than that I want to write about why this class of bug refuses to die — and why I think we're going to keep paying for it.

Six lines

Here is the entire vulnerability, as best I can reconstruct from the public writeup.

GitHub's edge accepts your git push. Behind it, an internal proxy called babeld shovels the request into the rest of the stack. Push options — those -o key=value flags you almost never use — get serialized into an internal HTTP header called X-Stat. Multiple options are joined with semicolons:

X-Stat: rails_env=production;custom_hooks_dir=/data/hooks;repo_pre_receive_hooks=

babeld did the obvious thing. It took the user-supplied value and concatenated it. It did not escape semicolons.

So you push with -o "rails_env=production;custom_hooks_dir=/tmp;repo_pre_receive_hooks=../../bin/sh", and now the receiving service parses your single push option as three of them. One flips the environment from sandbox to production-direct. One redirects where pre-receive hooks are loaded from. One does a path traversal to a binary the service will helpfully execute as the git user.
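To make the mechanics concrete, here is a minimal Python sketch of a writer and a reader that behave the way the writeup describes. This is not babeld's code, which is closed source. The function names, the "sandbox" default, and the assumption that a repeated key overwrites the earlier one are mine, for illustration only.

```python
# Illustrative only: compose_x_stat and parse_x_stat are hypothetical names,
# not GitHub's implementation.

def compose_x_stat(internal: dict[str, str], push_options: list[str]) -> str:
    """Join trusted internal fields and user push options with ';', unescaped."""
    fields = [f"{key}={value}" for key, value in internal.items()]
    fields.extend(push_options)  # user-controlled strings go in verbatim
    return ";".join(fields)


def parse_x_stat(header: str) -> dict[str, str]:
    """Split on ';', then on the first '='. Assumption: a repeated key overwrites."""
    parsed: dict[str, str] = {}
    for field in header.split(";"):
        key, _, value = field.partition("=")
        parsed[key] = value
    return parsed


# Assumed trusted defaults set by the proxy before user input is appended.
internal = {"rails_env": "sandbox", "custom_hooks_dir": "/data/hooks"}

# ONE push option, exactly as typed after -o, carrying three fields.
option = "rails_env=production;custom_hooks_dir=/tmp;repo_pre_receive_hooks=../../bin/sh"

header = compose_x_stat(internal, [option])
parsed = parse_x_stat(header)

print(header)
print(parsed)
```

The writer thinks it appended one value. The reader sees five fields, and under the overwrite assumption the last three win.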

Six lines of conceptual surface, three orders of impact. CVSS 8.7. About 88% of self-hosted GHES instances were still unpatched at disclosure.

That is the entire story. We've seen it before. We've named it before. It's CRLF injection's older, blue-collar cousin: header injection. The CWE is CWE-77, Improper Neutralization of Special Elements used in a Command ('Command Injection'). It was old when Bush Jr. was president.

So why is it working in 2026?

The naive answer is "because the developer forgot to escape." That's true and useless. The interesting answer is structural.

When I started building things, the threat model for header injection was external: an attacker controls something that flows into an HTTP response or a log line, and they smuggle a CRLF to forge a header. We learned that. Modern frameworks reject \r\n in header values. Reverse proxies normalize. The naive version is mostly handled.

What we did not handle is the explosion of internal protocols. Microservices talk to each other in HTTP-shaped envelopes that nobody calls HTTP. Service meshes inject X-Forwarded-*. RPC layers prepend X-Tenant-Id. Internal proxies invent their own headers — X-Stat, X-Internal-Auth, X-Trace-Span — and those headers get parsed by another service downstream. Each new header is a tiny ad-hoc protocol. Each ad-hoc protocol gets ad-hoc parsing. Ad-hoc parsing skips the escaping nobody remembered to spec.

The framework-level protections that made external header injection rare don't help here, because the framework doesn't know your X-Stat is a delimited list. To the framework it's a string. You are the parser.

GitHub's mistake was not "we forgot to escape." It was "we built a serialization format inside an HTTP header and didn't write it down." If X-Stat had been documented as key=value pairs joined by semicolons with this escaping rule, the next engineer touching babeld would have had a fighting chance. Instead the format lived in two places — the writer and the reader — and neither knew about the other's assumptions.

I have shipped this exact bug at three different jobs.

The internal protocol problem

Here's the pattern, and once you see it you can't unsee it:

Two services need to pass structured data.
They already have an HTTP connection.
Adding a new endpoint is "too heavy."
Someone stuffs a tuple into a header. X-Things: a,b,c.
Six months later someone needs a fourth field. They pick a different separator. X-Things: a,b,c;extra=foo.
Eighteen months later, user input flows into one of those fields.
Twenty-four months later, Wiz finds it.

Every step makes sense in isolation. The aggregate is a bug factory. The fix isn't "don't put data in headers" — sometimes that's the right move. The fix is: if your header carries a structure, treat it like a protocol. Specify it. Test the parser. Fuzz the writer against the reader. Reject malformed values at the boundary, not in the middle.
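"Fuzz the writer against the reader" sounds heavier than it is. Here is a rough sketch using only the standard library: generate random field values that deliberately include the delimiters, write them, read them back, and complain when the round trip changes anything. The naive writer and reader are hypothetical stand-ins for whatever pair you actually have, and find_roundtrip_failure is a name I made up.

```python
import random


# Hypothetical naive pair: join on ';' with no escaping, split on ';' to read.
def naive_write(fields: list[tuple[str, str]]) -> str:
    return ";".join(f"{key}={value}" for key, value in fields)


def naive_read(header: str) -> list[tuple[str, str]]:
    pairs = []
    for field in header.split(";"):
        key, _, value = field.partition("=")
        pairs.append((key, value))
    return pairs


def find_roundtrip_failure(write, read, trials=2000, seed=0):
    """Return the first field list where read(write(x)) != x, or None."""
    rng = random.Random(seed)
    alphabet = "abc/._-;="  # deliberately includes the delimiters
    for _ in range(trials):
        fields = []
        for i in range(rng.randint(1, 3)):
            length = rng.randint(0, 6)
            value = "".join(rng.choice(alphabet) for _ in range(length))
            fields.append((f"k{i}", value))
        if read(write(fields)) != fields:
            return fields
    return None


counterexample = find_roundtrip_failure(naive_write, naive_read)
print("round trip broken by:", counterexample)
```

Swap in your real writer and reader. A correctly escaped pair should make the function return None. A property-based testing library would explore the input space more thoroughly, but even this much catches the semicolon.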

The reason header injection keeps shipping is that we keep building protocols accidentally and treating them like strings.

The patch isn't the interesting part

The patch is short and good. babeld now percent-encodes semicolons and other reserved characters before composing X-Stat. The reader was already tolerant. Total diff: small. Wiz has the details.
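For a feel of what that kind of fix looks like, here is a sketch in Python using urllib.parse. I don't know the exact reserved set GitHub chose or how their reader decodes, so quote(..., safe="") and the decoding reader below are my stand-ins, and push_option_0 is a made-up key.

```python
from urllib.parse import quote, unquote


def compose(fields: list[tuple[str, str]]) -> str:
    """Percent-encode every key and value, then join with ';'."""
    # safe="" makes quote() escape ';', '=', '/', and '%' as well.
    return ";".join(
        f"{quote(key, safe='')}={quote(value, safe='')}" for key, value in fields
    )


def parse(header: str) -> list[tuple[str, str]]:
    """Split on the raw delimiters first, decode afterwards."""
    pairs = []
    for field in header.split(";"):
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"malformed field: {field!r}")
        pairs.append((unquote(key), unquote(value)))
    return pairs


payload = "rails_env=production;custom_hooks_dir=/tmp;repo_pre_receive_hooks=../../bin/sh"
fields = [("rails_env", "sandbox"), ("push_option_0", payload)]

header = compose(fields)
print(header)
print(parse(header) == fields)
```

The order matters: the reader splits on literal semicolons before decoding, so an encoded %3B can never turn back into a delimiter.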

The interesting part is the timeline. GitHub deployed the fix to GitHub.com in under two hours after validation. That's elite incident response; most enterprises measure in days. The shipped product is in good hands.

Then read the next number: 88% of GitHub Enterprise Server instances unpatched at public disclosure.

GHES is the on-premises version that big companies and governments run because they can't put their source on someone else's machine. Every one of those 88% has now had four weeks to patch, and the public exploit chain is documented well enough that anyone with git and a free account can reproduce it. The window between "you should have patched this" and "you absolutely will get owned" is measured in days.

If you run GHES, the answer is 3.19.3 or later. If you don't know whether your team runs GHES, the answer is "find out today."

What defenders should change

I'm wary of writing security checklists. Most of them are tribal knowledge laundered into a numbered list. But three things from this bug are concrete enough to put down:

Treat your internal headers like protocols. If a service emits a header with structure, write the structure down. Pin a parser. Fuzz the parser against the emitter. The cost is one afternoon. The savings, statistically, are a CVE.

Sandbox escapes shouldn't be a config flag. The first link in this chain was rails_env=production flipping the binary out of a sandboxed execution path. If your sandbox is opt-in by config string, your sandbox is not a sandbox. The escape needs to be load-time-only, signed-binary-checked, or just plain absent. Anything that lives in user-influenced state is a trapdoor.
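One way to get there, sketched in Python: keep the security-relevant settings in an immutable object that is built once at process start, and carry user-supplied options in a separate bag that nothing privileged ever reads. I have no idea how GitHub structures this internally; ExecutionPolicy, PushRequest, and build_request are hypothetical names.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ExecutionPolicy:
    """Security-relevant settings. Built from deployment config, never per request."""
    sandboxed: bool
    hooks_dir: str


# Assumed to be set once at load time; frozen, so later assignment raises.
POLICY = ExecutionPolicy(sandboxed=True, hooks_dir="/data/hooks")


@dataclass(frozen=True)
class PushRequest:
    policy: ExecutionPolicy            # trusted, identical for every request
    user_options: Mapping[str, str]    # untrusted, passed along as opaque data


def build_request(raw_options: dict[str, str]) -> PushRequest:
    # User options are never merged into the policy, whatever their keys say.
    return PushRequest(policy=POLICY, user_options=MappingProxyType(dict(raw_options)))


request = build_request({"rails_env": "production", "custom_hooks_dir": "/tmp"})
print(request.policy)
print(dict(request.user_options))
```

The user can still send rails_env=production. It just lands in a mapping that has no path to the policy, so there is nothing for it to flip.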

Audit your "ergonomic" features for trust boundaries. Push options were added because someone wanted CI hooks to receive metadata from the push. Reasonable feature. The trust boundary, though, walks straight from the user's keyboard into an internal proxy that does internal things. Every new ergonomic feature needs an "is this string trusted by anything downstream?" review. Most don't get one because the feature doesn't feel dangerous.

The ergonomic-feature point is the one I keep returning to. Push options sound boring. Logger metadata fields sound boring. Cookie attributes sound boring. They're all the same trapdoor — convenience surfaces that quietly become attack surfaces because the threat model didn't update when the feature shipped.

A footnote on AI-augmented vuln research

The Wiz writeup mentions in passing that they used "AI-augmented reverse engineering" on GitHub's closed-source compiled binaries. I want to flag this because I think it's the actual long-term story.

GitHub's babeld is a Go binary that ships in GHES. It's not on GitHub. You cannot read its source. Until recently, finding bugs in it required either a leak, a debugger session against a paid GHES license, or the patience of a saint with IDA Pro. Wiz pointed an LLM at the disassembly, asked it to summarize control flow, and got somewhere fast.

That's a phase change for offensive research. The set of practically auditable software just expanded by an order of magnitude. Every closed-source binary on a defender's network is now slightly less private. I don't have a tidy take on what to do with that — but if your security model has been "the bug isn't worth finding because the source isn't public," that model is over.

For defenders, the same techniques work. Take your own production binaries. Disassemble them. Ask an LLM where the parsers are. You will find at least one place that doesn't validate input. I'd bet a coffee.

Why this one matters

GitHub is plumbing. Half the supply-chain attacks of the last five years route through it. A working RCE on GitHub.com would have been a generational supply-chain incident — one push and you have read access to private dependencies for any org sharing a storage node. You don't even need to be subtle. You can tar the disk and leave.

It didn't happen. GitHub's response was fast and Wiz's disclosure was responsible. But "didn't happen" is not the same as "couldn't happen." The bug existed for an unknown number of years. The same bug class exists in software you and I write today, in places we haven't named yet.

The lesson I keep coming back to is unglamorous: header injection is not a class of bug we can claim to have killed. We pushed it out of the front door and let it move into the back. As long as we keep stuffing tuples into header strings without writing the protocol down, we are going to keep paying for it.

Patch your GHES. And while you're there, grep your own code for any line that builds an HTTP header by string concatenation. Then write the protocol down.
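If plain grep feels too blunt, here is a small heuristic scanner in Python. It flags lines that mention a quoted X- header name and also contain a string-building construct. The regexes are my own rough guesses, they will produce false positives and miss plenty, and the sample lines are invented. Treat it as a starting point for a manual review, not a detector.

```python
import re
from pathlib import Path

# A quoted custom header name such as "X-Stat" or 'X-Tenant-Id'.
HEADER_NAME = re.compile(r"""["'`]X-[A-Za-z0-9-]+""")
# Common ways of gluing strings together in a handful of languages.
STRING_BUILDING = re.compile(r"""\+|\.format\(|\bf["']|\.join\(|\$\{|Sprintf\(|%s""")


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that build a custom header from parts."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if HEADER_NAME.search(line) and STRING_BUILDING.search(line):
            findings.append((number, line.strip()))
    return findings


def scan_tree(root: Path, suffixes=(".py", ".go", ".rb", ".js", ".ts", ".java")):
    """Walk a source tree and yield (path, line number, line) for each finding."""
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            for number, line in scan_text(path.read_text(errors="ignore")):
                yield path, number, line


sample = '''\
headers["X-Stat"] = ";".join(options)
headers["X-Trace-Span"] = span_id
req.Header.Set("X-Stat", "rails_env=" + env)
log("user " + name)
'''

findings = scan_text(sample)
for number, line in findings:
    print(number, line)
```

Point scan_tree(Path("src")) at a real checkout and read every hit. The question to ask of each one is the same: does anything a user typed end up in that string?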

#security #vulnerability #github #essay