Claude 4.7 and the End of Prompt Engineering as We Knew It
The era of bag-of-tricks prompts is closing. The new bottleneck isn't writing the perfect sentence — it's engineering the right context. Notes from a year of building with agentic models.
There's a moment in every developer's life with a new model where you stop trying to outsmart it.
For me, with Claude 4.7, that moment came in March. I had inherited a 2,300-line legacy invoicing module — the kind of code where the original author's git email no longer resolves. I opened Claude Code, pointed it at the file, and typed exactly what I would have said to a senior engineer joining the team: "This module is breaking on edge cases for multi-currency invoices. Figure out why and fix it."
No "act as a 10x engineer." No "think step by step." No XML scaffolding around the request. No examples of input and output. Just the problem.
It found the bug in 90 seconds. The fix was three lines. The PR description it wrote was better than what I'd have written myself.
I closed my laptop and thought about all the prompt-engineering rituals I'd accumulated since GPT-3 — the personas, the few-shot examples, the magic incantations passed around in Discord screenshots — and realized most of them were now superstition. Not wrong, exactly. Just no longer load-bearing.
This essay is about what's changing, what isn't, and what to do instead.
The era we're leaving
For most of the last four years, prompt engineering looked like rhetoric class. We learned to flatter the model ("you are an expert"), to threaten it ("if you get this wrong, a kitten dies"), to bargain with it ("I'll tip you $200"), and to coax intermediate steps out of it ("let's think step by step"). Whole startups were funded on the premise that prompting was a craft you could productize.
Some of those tricks worked. Most of them worked because the underlying models were unreliable in specific, predictable ways: they'd skip steps, hallucinate confidently, and go off-task if you didn't pin them down. The prompts were workarounds for model weaknesses, not interactions with model strengths.
What happens when the weaknesses go away? You're left holding a toolbox of solutions to problems the model no longer has.
Claude 4.7 doesn't need you to ask it to think before it speaks. It thinks. It doesn't need you to demand JSON; it gives you JSON when JSON makes sense. It doesn't need you to forbid hallucinations; it just notices when it doesn't know and says so. The skills we built around prompt fragility are, increasingly, busy work.
What actually changed
Two shifts, working in tandem:
Models got better at intent inference. Earlier models matched your prompt syntactically. 4.7 matches your purpose. If you ask it to "clean up this function," it doesn't apply a checklist of style rules — it figures out what kind of function it is, what the surrounding code expects, and what "clean" means for this specific case. The output is the output a thoughtful colleague would produce after reading the file for thirty seconds.
Tools became first-class. The model isn't just generating text anymore. It's grepping your codebase, running shell commands, querying your database, calling your APIs. Anthropic's been pushing this aggressively with MCP — Model Context Protocol — which is essentially a standard for "here's how a tool exposes itself to a language model." The result: instead of trying to describe your codebase in a prompt, you give the model grep and let it look.
These two shifts compound. A model that can both infer intent and use tools doesn't need you to over-specify anything. It will go figure things out.
The new bottleneck: context, not prose
If prompts aren't the bottleneck, what is?
The right context being in the window when the model needs it.
This sounds obvious. It's not. Most of the failures I see now aren't prompt failures — they're context failures. The model didn't know about the constraint in `config/billing.yml` because nobody loaded it. It didn't know that `processPayment` was deprecated because the deprecation notice lives in a Slack thread from 2024. It didn't know your team prefers `Result<T, E>` over throwing because that lives in a `CONTRIBUTING.md` it never read.
The skill that matters now isn't writing better sentences. It's curating what the model sees.
Three concrete shifts that come out of this:
1. Stop writing instructions. Start curating files.
Old habit: write a 600-word system prompt explaining how the model should behave.
New habit: drop a `CLAUDE.md` at the root of your repo with the things it actually needs to know — the conventions, the gotchas, the "we tried this and it didn't work" history. Twenty lines beats a thousand-word prompt.
```markdown
# Conventions
- We use `Result<T, E>` for fallible ops, never throw.
- Database queries go through `lib/db/` only — no inline SQL.
- Money values are always stored as integer cents.
  Decimals will silently corrupt across the GraphQL boundary.
- The legacy `services/billing-v1/` is read-only. Don't touch it.
```

The model reads this once and behaves accordingly. No system-prompt acrobatics.
2. Tools beat prompts.
Old habit: paste the relevant file into the prompt.
New habit: give the model a way to fetch the file when it decides it needs to.
This is a meaningful inversion. When you paste a file, you're guessing what's relevant before the model has thought about the problem. When you give it `read_file` and `grep`, you let it look at what it needs after it's reasoned about the task. The token economics are radically better, and the answers are better because the model is reasoning over the actual code rather than your edited summary of it.
The same logic extends out: a database is a tool, not a prompt. Your issue tracker is a tool. Your design docs are a tool. The model with three good tools eats the model with a perfect prompt.
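To make "a tool, not a prompt" concrete, here is a minimal sketch of what exposing two tools might look like. The schema shape follows the Anthropic Messages API tool format; the dispatcher and the specific tool names are illustrative, not any particular SDK's API:

```python
# Sketch: exposing file access as tools instead of pasting files into a prompt.
# Schema shape follows the Anthropic Messages API tool format; the handler
# wiring below is illustrative.
import subprocess
from pathlib import Path

TOOLS = [
    {
        "name": "read_file",
        "description": "Return the contents of a file in the repo.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "grep",
        "description": "Search the repo for a pattern; returns matching lines.",
        "input_schema": {
            "type": "object",
            "properties": {"pattern": {"type": "string"}},
            "required": ["pattern"],
        },
    },
]

def handle_tool_call(name: str, args: dict) -> str:
    """Dispatch a model-issued tool call to a local implementation."""
    if name == "read_file":
        return Path(args["path"]).read_text()
    if name == "grep":
        result = subprocess.run(
            ["grep", "-rn", args["pattern"], "."],
            capture_output=True, text=True,
        )
        return result.stdout
    raise ValueError(f"unknown tool: {name}")
```

The point isn't the plumbing; it's that the model decides when to call these, so you stop guessing what to paste.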
3. Memory persists what conversation cannot.
The third move is recognizing that some context shouldn't be re-derived every conversation. "This user is a senior backend engineer who hates yak-shaving and wants terse answers" is the kind of thing that should outlive a session.
Claude Code's memory system, the long-term memory in the apps, the sub-agent persistence patterns — all variations of the same idea: don't rebuild context that you already built.
If you find yourself re-explaining your stack, your preferences, or your project's history every time you start a chat, that's a memory problem, not a prompt problem.
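As a sketch of how little machinery this takes, here is a toy cross-session memory: a JSON file of facts written when you learn something and rendered into context at session start. The file location and structure are my invention, not Claude Code's actual memory format:

```python
# Sketch: minimal cross-session memory as a JSON file of facts.
# The path and shape are illustrative, not Claude Code's real format.
import json
from pathlib import Path

MEMORY_PATH = Path(".claude/memory.json")  # hypothetical location

def remember(key: str, fact: str) -> None:
    """Persist a fact so the next session doesn't have to re-derive it."""
    memory = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}
    memory[key] = fact
    MEMORY_PATH.parent.mkdir(parents=True, exist_ok=True)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def recall() -> str:
    """Render stored facts as a block to prepend to the session context."""
    if not MEMORY_PATH.exists():
        return ""
    memory = json.loads(MEMORY_PATH.read_text())
    return "\n".join(f"- {k}: {v}" for k, v in memory.items())
```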
A worked example
Let me make this concrete. Last week I shipped a small feature: webhook delivery retries for failed payments. Here's the actual prompt I gave Claude Code:
We're losing payment confirmations when our webhook endpoint times out. Add retries with exponential backoff. Make it production-grade.
That's it. Twenty-four words. No persona, no chain-of-thought instructions, no examples.
What it did:
- Grepped for `webhook` to find the existing delivery code.
- Read the file, noticed we already had a `BackoffPolicy` utility from a different module.
- Read `BackoffPolicy` to understand its interface.
- Asked the question I would have asked: "I see a `BackoffPolicy` in `lib/retry/`. Should I reuse it, or does payment delivery have constraints that need a separate policy (e.g., idempotency keys, max-age cutoffs)?"
I told it to reuse, and that idempotency keys were already handled upstream. It implemented the retry, added a small migration to track delivery state, wrote a test that hit the actual webhook server with a flaky-on-purpose handler, ran the test, and opened a PR.
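For flavor, here is roughly what the retry core amounts to. The `BackoffPolicy` below is my reconstruction for illustration, not the actual `lib/retry/` module from the example:

```python
# Sketch of exponential-backoff retry for webhook delivery.
# This BackoffPolicy is a reconstruction for illustration, not the
# real lib/retry/ module mentioned in the text.
import random
import time

class BackoffPolicy:
    def __init__(self, base: float = 1.0, factor: float = 2.0,
                 max_delay: float = 300.0, max_attempts: int = 6):
        self.base = base
        self.factor = factor
        self.max_delay = max_delay
        self.max_attempts = max_attempts

    def delay(self, attempt: int) -> float:
        """Delay before retrying attempt N (0-indexed), capped, with jitter."""
        raw = min(self.base * (self.factor ** attempt), self.max_delay)
        return raw * random.uniform(0.5, 1.0)  # jitter avoids thundering herds

def deliver_with_retries(send, policy: BackoffPolicy, sleep=None) -> bool:
    """Call `send()` until it returns True or attempts run out."""
    sleep = sleep or time.sleep
    for attempt in range(policy.max_attempts):
        if send():
            return True
        if attempt + 1 < policy.max_attempts:
            sleep(policy.delay(attempt))
    return False
```

The jitter is the part naive implementations skip: without it, every failed delivery retries on the same schedule and hammers the endpoint in waves.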
The thing that made this work wasn't a clever prompt. It was that:
- Our `CLAUDE.md` told it about our test conventions.
- It had a `grep` tool.
- It had a `read_file` tool.
- It had `bash` to run the test.
- It had the previous payment-related PRs in its working memory.
I wrote twenty-four words. Everything else was infrastructure. That's the shift.
What to drop
In rough order of how much it pains me to admit each one:
- "You are an expert in X." Models don't need to be told this anymore. It biases the output toward the kind of thing an "expert" talks like rather than what an expert would do.
- "Let's think step by step." Modern Claude already thinks. Adding this phrase mostly bloats the output.
- Few-shot examples for general tasks. Still useful for genuinely novel formats. Mostly redundant for well-known ones.
- Threats and bribes. Embarrassing in retrospect. The model is not a child and was never motivated by your fictitious tip.
- XML tag scaffolding for simple requests. Still helpful for complex multi-input prompts and for clearly delimiting user data from instructions. Overkill for "fix this bug."
- "Do not hallucinate." It does not know what hallucinating means in the way you mean it. It does, however, know how to ground in tool output.
What to keep
Not everything from the prompting era is dead:
- Specificity about the goal. "Fix the bug" is fine; "make the code better" is not. The model needs to know what done looks like.
- Constraints that aren't visible from the code. Performance budgets, compatibility requirements, deadlines, things the user values that aren't expressible in tests.
- Structured output when you need to parse it programmatically. Schemas and tool-call signatures, yes. Vague "return JSON" pleas, less so.
- Worked examples for genuinely tricky domain DSLs. If you're asking for output in a custom format with seventeen rules, show, don't tell.
- The "ignore my last message" escape hatch. Still works. Still useful.
The pattern: keep the prompting techniques that give the model real information. Drop the ones that just try to coerce its behavior.
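The structured-output point deserves one sketch: constrain with a schema and validate, rather than pleading for JSON. The schema and the tiny validator below are illustrative (a small subset of JSON Schema semantics), not a library recommendation:

```python
# Sketch: validating model output against a declared schema instead of
# hoping a "return JSON" instruction was obeyed. The schema shape follows
# JSON Schema; the validator covers only required keys and primitive types.
import json

INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "amount_cents": {"type": "integer"},
        "currency": {"type": "string"},
    },
    "required": ["invoice_id", "amount_cents", "currency"],
}

def validate(payload: str, schema: dict) -> dict:
    """Parse model output and check required keys and primitive types."""
    data = json.loads(payload)
    type_map = {"string": str, "integer": int, "object": dict}
    for key in schema["required"]:
        if key not in data:
            raise ValueError(f"missing required key: {key}")
        expected = type_map[schema["properties"][key]["type"]]
        if not isinstance(data[key], expected):
            raise ValueError(f"wrong type for {key}")
    return data
```

A schema in the request plus a validator on the response turns "the model usually returns JSON" into a contract you can enforce.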
The skill that replaces it
If prompt engineering is fading, what's the replacement? My working answer: context engineering.
Context engineering is the discipline of making sure the right information is available to the model at the right time, through the right channel — files, tools, memory, retrieval — without overwhelming the window or under-feeding the question.
It's a real engineering discipline, and it's harder than prompt engineering ever was, because the failure modes are more varied. A prompt either works or it doesn't, and you iterate. A context system can fail in ten different ways: stale memory, wrong file fetched, tool returning the wrong shape, retrieval missing the obvious match, system prompt and user request contradicting each other.
The teams that are pulling ahead with these models aren't the ones with the best prompts. They're the ones with:
- Clean, well-curated repo conventions that the model can actually follow.
- A small number of high-quality tools that compose well.
- Memory and retrieval systems that surface the right thing without drowning the window.
- A culture of treating the model like a colleague: tell it what's true, give it what it needs, then trust it to do the work.
That last one is the underrated piece. A lot of people are still stuck in the prompt-engineering mindset of treating the model as something to be tricked. With 4.7-class models, that posture is actively counterproductive. The best results come from a posture that looks more like onboarding a smart new hire than coaxing a difficult tool.
Closing
Prompt engineering isn't dead. It rotated 90 degrees and became something bigger.
The clever-sentence-writing era is winding down because the models stopped needing clever sentences. The systems-design era is starting, because the models started needing — and rewarding — well-designed surroundings: clean context, sharp tools, durable memory, sensible defaults.
If you spent the last few years getting good at writing prompts: most of that taste transfers. You learned to be specific, to think about ambiguity, to anticipate failure. Those are systems-design skills wearing a costume.
Take off the costume. Build the system.