
© 2026 Ayush Sharma. Built with care.


The Claude Code Config That Changed How I Work

Running Claude Code out of the box leaves a lot on the table. Here's the exact setup I use: Opus 4.7, extended thinking, 1M context, and a hook that makes every prompt leaner before it hits the model.

May 11, 2026 · 8 min read

Six months ago I cloned a repo, ran claude, and figured I was using Claude Code. I was, technically. But the gap between the default setup and what I actually run now is the gap between "pretty good" and "I genuinely don't want to go back to the old way."

Here's exactly what changed.

Model: Opus 4.7 is not optional for hard work

The default model is Sonnet. Sonnet is fast, cheap, and for most routine tasks completely fine. I still use it when I'm doing quick lookups or light edits where I know exactly what I want.

But if I'm doing anything where the judgment call matters (security reviews, architectural decisions, debugging gnarly multi-file issues), Sonnet leaves ideas on the table that Opus finds. The difference isn't dramatic on any single interaction. Over a full day of work it compounds into noticeably different outcomes.

Set your default model in ~/.claude/settings.json:

{
  "model": "claude-opus-4-7"
}

You can always override per-session with --model claude-sonnet-4-6 when Sonnet is enough. But if you're defaulting to Sonnet and switching up when things get hard, you're going to forget to switch half the time.

Extended thinking: what "effort high" actually does

Opus 4.7 supports extended thinking, which is exactly what it sounds like. Instead of producing an answer immediately, the model works through the problem in a scratchpad before committing to a response. The output you see looks like it came from a calmer, more thorough version of the same model. That's because it did.

In Claude Code, this maps to the thinking budget: how many tokens the model can burn on internal reasoning before giving you an answer. Setting effort to high lets it use that budget freely on complex tasks.

The tradeoff is tokens and latency. Extended thinking costs more and takes longer. That's a real tradeoff for quick tasks, but for anything genuinely hard, the alternative is getting a faster answer that's wrong and then spending 20 minutes debugging why. Extended thinking wins on total time.

When I'm doing something mechanical, like refactoring a function signature across files, I don't need it. When I'm trying to understand why a race condition appears only under load on a specific message queue pattern, I absolutely do. The model's reasoning trace on hard problems surfaces assumptions I hadn't thought to question.

1M context window: the trap and the opportunity

Opus 4.7's context window is one million tokens. That number sounds like a solution to every "the model doesn't have enough context" problem you've ever had. It's not. It's a solution to the symptom while leaving the disease untreated.

The disease is garbage in. A 1M window doesn't help you if you're stuffing it with noise. It means you can stuff more noise.

What the 1M window actually enables is leaving stuff in context that you'd have had to cut before: long conversation histories, multiple files loaded simultaneously, output from tool calls, git history for a module. You stop having to make triage decisions about what to include. That's genuinely useful.

But the strategy that works: use the window aggressively for relevant content, not exhaustively for all content. Load the files the task actually touches. Keep conversation history for context continuity, not for completeness. Let the model use tools to pull things in on demand rather than pre-loading everything speculatively.

The 1M window and a well-curated CLAUDE.md are the right combination. The file tells the model what's true about your project (conventions, constraints, gotchas) without you re-explaining it every session. The big window means you never have to choose between loading your CLAUDE.md and loading the actual code.

The caveman plugin: token savings from the first word

This one needs some context.

Claude Code supports hooks: shell scripts that fire at specific events in a conversation. One of those events is UserPromptSubmit, which fires before your message reaches the model. You can use it to transform the prompt on the way in.

The caveman plugin is a UserPromptSubmit hook that strips your prompt down to the essentials: no articles (a, an, the), no filler words (just, really, basically, actually), no pleasantries (sure, certainly, of course), no hedging. If you type "Can you please help me figure out why this function is basically just returning null instead of the expected value?", the plugin rewrites that to "Why function return null instead of expected value?" before it hits the model.

That sounds aggressive. It is. But it works for a specific reason: the model doesn't need the pleasantries. "Please" doesn't make it more polite. "Just" doesn't make your request easier to process. These are tokens you're paying for that do zero work.
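To make the idea concrete, here's a minimal sketch of a filler-stripping pass in Python. This is illustrative only: the word list is a small invented subset, and unlike the actual plugin it only drops words rather than rewriting grammar.

```python
import re

# Illustrative subset of filler words -- the real plugin's list and
# rewrite rules are more sophisticated than this.
FILLER = {
    "a", "an", "the", "just", "really", "basically", "actually",
    "please", "sure", "certainly", "of", "course", "can", "you", "could",
}

def caveman(prompt: str) -> str:
    """Drop filler words; keep everything else in its original order."""
    words = re.findall(r"\S+", prompt)
    kept = [w for w in words if w.strip(".,!?").lower() not in FILLER]
    return " ".join(kept)

print(caveman("Can you please help me figure out why this function "
              "is basically just returning null?"))
# → help me figure out why this function is returning null?
```

Even this naive version trims a handful of tokens per prompt without touching anything the model actually needs.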

On any given prompt you might save 5-15 tokens. That sounds irrelevant. Do the math: 80 prompts a day, average 10 tokens saved per prompt, Opus 4.7 pricing, over a month. You're looking at a real number. More importantly, leaner prompts tend to get more focused responses because there's less noise for the model to process alongside the signal.
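A back-of-envelope version of that math, with every input written out as an explicit assumption (the per-token price is a placeholder, not published pricing; the resend multiplier reflects that a prompt stays in context and is re-sent on later turns in the same session):

```python
prompts_per_day = 80          # assumption from the text
tokens_saved_per_prompt = 10  # assumption from the text
working_days = 22             # assumption: a typical work month
price_per_mtok_input = 15.00  # placeholder $/1M input tokens -- check real pricing
avg_resends_per_prompt = 10   # assumption: later turns re-send earlier context

direct = prompts_per_day * tokens_saved_per_prompt * working_days
total = direct * (1 + avg_resends_per_prompt)
print(direct, total, round(total / 1_000_000 * price_per_mtok_input, 2))
# → 17600 193600 2.9
```

The direct savings per month are modest on their own; the compounding comes from leaner prompts living in context across every subsequent turn, and from the tighter responses they tend to produce.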

The plugin also ships a status line badge, so you can see that caveman mode is active, and a session header notification that shows the current intensity level.

To install: the hooks drop into ~/.claude/plugins/ and you wire them up in your settings.json:

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "powershell -ExecutionPolicy Bypass -File \"C:\\Users\\user\\.claude\\plugins\\cache\\caveman\\caveman\\hooks\\caveman-hook.ps1\""
          }
        ]
      }
    ]
  }
}

The hook runs on every prompt, takes under 10ms, and is completely transparent in normal use. You forget it's there except when you notice your token usage is lower than it used to be.

The rest of the production config

A few smaller things that make a real difference:

Auto-approve common operations. By default, Claude Code asks for permission on every file write, every bash command, every tool call. Sensible for exploration. For daily work on a project I know well, it's friction. I set auto-approve for read operations, test runs, and writes to files I'm actively working on. My settings.json permissions block reflects what I actually want to be asked about versus what I'm fine with it just doing.
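The permissions block in settings.json looks roughly like this. Treat the matchers as a sketch: the tool names and pattern syntax here are illustrative, so check the Claude Code permissions docs for the exact rule format before copying it.

```json
{
  "permissions": {
    "allow": [
      "Read",
      "Grep",
      "Bash(npm test:*)",
      "Edit(src/**)"
    ]
  }
}
```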

Memory for persistent preferences. Claude Code's memory system lets you save things that persist across sessions: your preferred output style, how you like error messages framed, what you've already decided not to do on a project. The model reads these at the start of each session. Without memory you're re-establishing context from scratch every time. With it, you pick up roughly where you left off.

MCP servers over copy-paste. If you're not using Model Context Protocol servers, you're doing the manual version of something that can be automated. Instead of pasting file contents or API responses into the chat, you give the model a tool that fetches them. It pulls what it needs when it needs it. Token cost goes down, context relevance goes up. I have MCP servers for my most-used internal APIs and documentation sources.
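Registering a server is a one-time config entry, e.g. in a project-level .mcp.json. A sketch of the shape, with a hypothetical internal docs server (the name, command, and path are placeholders, not a real server):

```json
{
  "mcpServers": {
    "internal-docs": {
      "command": "node",
      "args": ["./mcp/docs-server.js"]
    }
  }
}
```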

CLAUDE.md, not system prompts. A CLAUDE.md file at the root of your repo beats a system prompt for project-specific context. It version-controls with your code. It gets updated when conventions change. The model reads it at the start of every session automatically.

# Project conventions
 
- Money values stored as integer cents. Decimals corrupt across the GraphQL boundary.
- Result<T, E> for fallible ops, never throw.
- Database queries go through lib/db/ only. No inline SQL.
- The legacy services/billing-v1/ is read-only. Don't touch it.

Twenty lines like this beat a 600-word system prompt every time.

What this setup costs

Honest numbers: running Opus 4.7 with extended thinking on hard tasks is not cheap. If you're a solo developer being disciplined about when you use Opus versus Sonnet, you're probably fine. If you're running Opus on everything including "what does this variable name mean," you'll notice the bill.

The caveman plugin helps on the margin. Disciplined model selection helps more. The real cost control is using extended thinking on the problems that benefit from it and not reflexively on everything else.

After six months, my bill is higher than it was with the default config and my output per hour is also substantially higher. The math works out, at least for me.

The actual point

The default Claude Code setup is conservative because it has to serve everyone from first-time users to teams with complex workflows. If you're reading a post about optimizing your Claude Code config, you're probably not a first-time user.

Opus 4.7 with extended thinking, a leaner prompt pipeline from the caveman plugin, and proper context curation through CLAUDE.md and memory is a meaningfully different experience from the out-of-the-box defaults. Not marginally better. Different enough that going back would feel like writing code in Notepad.

Set it up once. Stop thinking about it. Go build something.
