
RAXXO Studios

Posted on • Originally published at raxxo.shop

The 1M Context Window Actually Changes How I Code

  • Loaded my entire 12-repo studio into one session and stopped re-explaining context to the model every hour

  • Ran a seven-hour debug across Shopify, Vercel, and my blog pipeline without resetting the thread once

  • Dropped three full API docs in one prompt and got a direct answer on which auth pattern matched my stack

  • Prompt caching drops the math to roughly 1.20 EUR per loaded session, which is why I now keep threads open for days

Opus 4.7 shipped with a 1 million token context window, and after three weeks of living inside it I can say the part that actually changed is not the spec. It is the habits. I stopped curating context. I stopped starting new threads. I stopped summarizing what the model should already know. The 1M context window is less about "bigger prompts" and more about keeping an entire working memory open while I do real work across a dozen repos.

This is the article I wanted to read before I changed my setup. Four things that actually shifted, plus a few places where the window still lets me down.

Loading a whole monorepo into one session

My studio is 12 active repos. Shopify theme, marketing site, blog syndication engine, four product landing pages, a design system, a shared tokens package, and assorted scripts I refuse to throw away. Before 1M, asking an architecture question meant picking which three files felt most relevant and hoping the model could infer the rest. It usually could not.

Now I paste everything. Literally everything.


# Flatten the studio into one blob, skip node_modules and build output
find ~/CLAUDE/RAXXOSTUDIOS -type f \
  \( -name "*.ts" -o -name "*.tsx" -o -name "*.liquid" \
     -o -name "*.md" -o -name "*.json" -o -name "*.sh" \) \
  -not -path "*/node_modules/*" \
  -not -path "*/.next/*" \
  -not -path "*/dist/*" \
  -print0 \
  | xargs -0 -I {} sh -c 'echo "=== $1 ==="; cat "$1"' sh {} \
  > /tmp/studio.txt

wc -w /tmp/studio.txt
# 612000 words, roughly 820k tokens


820k tokens fits. That is the whole studio as flat text, with file headers so the model knows what came from where. I paste it, I ask "where am I duplicating section schema logic between the Shopify theme and the marketing site", and I get an answer that references files in four different repos with line-level accuracy.

The specific unlock is dependency tracing. If I rename a token in the design system, I can ask which files across every repo still reference the old name. I used to grep, then check usage, then decide. Now the model holds the whole graph and tells me the blast radius before I touch anything.

A few rules that make this work:

  • File headers matter. Plain cat concatenation without === path === markers turns into mush.

  • Put the config and tokens files last. Recency bias is still real past 500k.

  • If I am asking about a specific subsystem, I drop that subsystem's files twice (once in its real position, once at the end as "focus area").

  • Exclude lock files and generated artifacts. A 400k token pnpm-lock.yaml burns budget and teaches the model nothing.
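Those four rules are easy to script. Here is a minimal Python version of my flatten step, with the focus-area duplication built in (the extension and skip lists mirror the shell one-liner above; the helper name is mine):

```python
from pathlib import Path

SKIP_DIRS = {"node_modules", ".next", "dist"}
EXTS = {".ts", ".tsx", ".liquid", ".md", ".json", ".sh"}

def flatten(root: Path, focus: list[Path]) -> str:
    """Concatenate repo files under === path === headers, then repeat
    the focus files at the end so they sit in the recency-weighted
    tail of the prompt."""
    parts = []
    for path in sorted(root.rglob("*")):
        if (path.is_file() and path.suffix in EXTS
                and not SKIP_DIRS & set(path.parts)):
            parts.append(f"=== {path} ===\n{path.read_text(errors='ignore')}")
    for path in focus:
        parts.append(f"=== FOCUS AREA: {path} ===\n{path.read_text(errors='ignore')}")
    return "\n".join(parts)
```

Point it at the studio root, pass the subsystem I am actually asking about as `focus`, and the blob is ready to paste.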

This is the setup I use for refactor planning. The model traces what a change touches, lists the files, groups them by risk, and I decide what to ship. Last week I renamed one CSS variable across the design system. The model gave me a 34-file change list grouped by "safe, automatic rename" and "manual review, this one uses the old name as a string literal for a theme fallback". I ran the safe batch with sed, reviewed the four ambiguous ones by hand, and shipped in an hour. The old flow would have been a day of grep-and-verify.
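The safe-versus-manual split from that rename is something the model does in-context, but a rough offline version of the same classification looks like this (the regex and token name are illustrative; catching every string-literal usage for real would need a parser):

```python
import re

def classify_rename(files: dict[str, str], old: str) -> dict[str, list[str]]:
    """Split files mentioning the old token into a safe batch (plain
    usages, mechanically renameable with sed) and a manual batch
    (the name also appears inside a string literal)."""
    groups = {"safe": [], "manual": []}
    for path, text in files.items():
        if old not in text:
            continue
        # the name inside quotes is likely a fallback value: review by hand
        in_string = re.search(
            rf'["\'][^"\']*{re.escape(old)}[^"\']*["\']', text)
        groups["manual" if in_string else "safe"].append(path)
    return groups
```

The safe list goes straight to a sed batch; the manual list is the short pile I actually read.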

Multi-day debug sessions without restarting the thread

The second shift is stranger, because it is about what I stopped doing rather than a new thing I started.

I used to start fresh threads constantly. A bug in the blog syndication engine was its own thread. A Vercel env pull failure was a different thread. A Shopify theme deploy issue was a third. Each one meant rebuilding context: here is my stack, here is my repo layout, here is the error, here is what I already tried. Twenty minutes of setup before any actual debugging.

Now I keep one thread open for days. The blog syndication thread I opened on a Tuesday morning was still the thread I used Thursday evening when a completely different bug showed up in the same pipeline. The model still had the mental model of the system. It remembered that I use safe-env-pull.sh instead of raw vercel env pull. It remembered the retry logic we already tried. It remembered which errors were red herrings.

What this actually feels like:

  • Hour 1: paste the full syndication repo, describe the symptom, start debugging.

  • Hour 3: fixed the first bug, moved on to a different failure in the same pipeline.

  • Day 2: new error, paste the stack trace, the model recalls the earlier fix and rules it out.

  • Day 3: architecture question about whether to refactor the retry layer, the model answers with reference to specific code it read on day 1.

The cost math that makes this sustainable is prompt caching. The first load of the repo is expensive. Every follow-up reads from cache at a tenth the price.


import anthropic

client = anthropic.Anthropic()

# First call loads the whole studio. ~820k input tokens, ~4.10 EUR uncached
# With caching, every follow-up reads from cache and drops to ~0.41 EUR per turn
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4000,
    system=[
        {
            "type": "text",
            "text": open("/tmp/studio.txt").read(),
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[
        {"role": "user", "content": "Where does the syndication retry logic live, and why is it retrying twice on 429s?"}
    ],
)


After that first expensive load, a fully loaded day-long session lands around 1.20 EUR in my Anthropic bill. That is cheaper than the coffee I drink while the session runs, and it replaces the thirty minutes a day I used to spend re-explaining my stack to a fresh thread.
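For the curious, the shape of that math (the per-million-token prices below are assumed placeholders for illustration, not Anthropic's actual price list; check the current pricing page before trusting any figure):

```python
# Illustrative prices in EUR per 1M input tokens (assumed, not official):
UNCACHED_IN = 5.00   # first load pays full input price
CACHE_READ = 0.50    # cached reads at roughly a tenth of that

def session_cost(context_tokens: int, turns: int) -> float:
    """One full-price load of the context, then every subsequent
    turn re-reads the same context from cache."""
    first = context_tokens / 1e6 * UNCACHED_IN
    rest = (turns - 1) * context_tokens / 1e6 * CACHE_READ
    return first + rest
```

With an 820k-token context, the first turn dominates and each follow-up adds pennies, which is the whole reason keeping a thread open for days is affordable.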

The second-order effect I did not expect: the model gets more opinionated as the thread goes on. On hour one it gives me three options with tradeoffs. By hour five it knows which option I will pick based on patterns in my earlier code, and it just picks. Sometimes that is wrong and I course-correct. More often it saves me the "which of these three" conversation I was going to have anyway.

Comparing long docs side by side

Before 1M, if I wanted to pick between three auth strategies from three different vendors, I either read the docs myself (hours) or fed them in one at a time and asked the model to remember (never worked). Now I paste all three full doc sets in one prompt and ask a direct question.

Last week I was deciding between Shopify's app bridge OAuth, a custom session token flow, and a third-party auth layer. All three vendor docs, dropped in at once, total around 180k tokens. One question: "which of these three fits a single-tenant admin app where I am the only user, and what is the simplest implementation path".

The answer cited specific sections from each doc, ruled out option two on a detail buried in a sidebar, and gave me the actual code shape for option one. That is the kind of answer that normally takes me half a day of tab-juggling.
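The prompt shape is the same header trick as the repo blob, applied per vendor. A sketch (the vendor keys are whatever labels you want the model to cite back at you):

```python
def build_comparison_prompt(docs: dict[str, str], question: str) -> str:
    """Concatenate full doc sets under vendor headers so answers can
    cite sections by source, then close with one direct question."""
    sections = [f"=== DOCS: {vendor} ===\n{text}"
                for vendor, text in docs.items()]
    return "\n\n".join(sections) + f"\n\n=== QUESTION ===\n{question}"
```

One prompt, all three doc sets, one question at the end where recency helps it most.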

The same pattern works for research. I read a 400-page technical book in one shot last month. Dropped the whole PDF (converted to text, 290k tokens) into a thread, then spent the week asking it questions while I worked on a related project. Not "summarize this book". Specific questions tied to specific code I was writing. The model could pull a concept from chapter 14 and tie it to a pattern mentioned in chapter 3 without me having to remember either chapter existed.
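Before pasting something that size I sanity-check the fit, using the same words-to-tokens ratio as the wc -w estimate earlier (a rough heuristic, not a real tokenizer):

```python
TOKENS_PER_WORD = 1.34  # rough ratio for mixed code and prose, assumed

def fits_window(text: str, window: int = 1_000_000,
                reserve: int = 100_000) -> bool:
    """Estimate tokens from word count and leave headroom for the
    conversation itself; a precise check would use the tokenizer."""
    est = int(len(text.split()) * TOKENS_PER_WORD)
    return est <= window - reserve
```

If it does not fit with headroom to spare, that is my cue to trim, split, or fall back to retrieval.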

This is the use case that people underrate. Most of my day is not "generate code from scratch". It is "I have a thing in front of me, I have a reference in my head, help me reconcile them". The 1 million token context window makes the reference actually sit in the model's head instead of mine.

One more pattern I did not expect to use often: contract review. When I onboard a new tool or sign up for a service, I paste the full terms, the pricing page, and the API docs into one thread. Then I ask specific questions about data retention, rate limits, and what happens if I exceed a quota. I get a direct answer with citations from the actual document text, not a generic summary from the model's training data. This has caught two pricing gotchas this year that I would have missed reading the docs linearly.

Where it still breaks, and the habits that keep me honest

The window is not magic. Retrieval quality degrades past roughly 700k tokens. I have watched the model miss things at token 850k that it would have nailed at token 400k. The needle-in-haystack benchmarks look clean, but real code has structure and ambiguity, and past 700k the signal gets noisier.

What I actually do about it:

  • Keep the focus material in the last 100k tokens of the prompt. The model weights recency, so I use that.

  • If I am doing dependency tracing, I put the target files both in their natural position and again at the end with a === FOCUS AREA === header.

  • I still use grep first for simple lookups. "Find every file that imports legacyTokens" is a grep, not a model call.

  • I reset the thread when the topic actually shifts. Same pipeline, same thread. New domain, new thread.

RAG and embeddings are not dead. For genuinely huge corpora (my 173-article blog archive, the remind.me design system docs, anything that grows past a few million tokens) I still want retrieval on top of a vector store. The 1M window wins when I want everything relevant in one pass with no retrieval latency and no chunk boundaries cutting a function in half. It loses when the corpus is genuinely larger than the window.

The mental model I landed on: treat the 1M window as working memory, not storage. I load what I am actively working on. I keep it loaded while the work is active. I flush when the work changes. Storage lives in the filesystem and the vector store, same as before.

Bottom Line

The 1 million token context window changed three things about how I code. I keep threads open for days instead of hours. I load whole repos instead of curated file lists. I compare long documents directly instead of summarizing them first.

The habits I dropped matter more than the ones I picked up. I stopped starting new threads on every topic shift. I stopped manually picking which files to include. I stopped writing "here is the context" paragraphs at the top of every prompt. Those were all workarounds for a constraint that no longer exists in my day-to-day.

The real cost is not the API bill. It is the discipline to actually use the window instead of defaulting to the old habits. Three weeks in, I am still catching myself about to open a fresh thread for a simple question. Then I paste it into the existing thread, the model answers with full context, and I remember why I made the switch in the first place.
