Originally published at thoughts.jock.pl
Sonnet 4.6 dropped with a 1 million token context window, improved coding, same price as 4.5. I ran two experiments the same day.
Experiment 1: Loading 6 Months of My Writing Into It
I took 16 unpublished drafts. About 24,000 words across six months. Everything I'd been thinking about but hadn't published yet. Loaded it all in plus my published post titles.
Then I asked: "What am I really trying to say?"
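For anyone who wants to try the same thing, here is a minimal sketch of how a pile of drafts could be packed into one prompt. The post doesn't say how the loading was actually done, so the function names, the separator format, and the sample data are all assumptions for illustration. It only builds the prompt string; sending it to a model is left to whatever client you use.

```python
def build_corpus_prompt(drafts, published_titles, question):
    """Join drafts and post titles into one prompt that ends with the question.

    drafts: dict mapping a draft name to its text (hypothetical structure).
    published_titles: list of already-published post titles.
    """
    parts = []
    for name, text in drafts.items():
        # A visible separator so each draft stays distinguishable in the prompt.
        parts.append(f"=== DRAFT: {name} ===\n{text.strip()}")
    titles = "\n".join(f"- {title}" for title in published_titles)
    parts.append(f"=== PUBLISHED POST TITLES ===\n{titles}")
    parts.append(question)
    return "\n\n".join(parts)


def word_count(drafts):
    """Rough size check: whitespace-separated words across all drafts."""
    return sum(len(text.split()) for text in drafts.values())


# Illustrative data only, not the real drafts.
sample_drafts = {"draft-01.md": "Notes on agents.", "draft-02.md": "Notes on systems."}
prompt = build_corpus_prompt(
    sample_drafts,
    ["An example published title"],
    "What am I really trying to say?",
)
```

Word count is only a rough proxy for tokens, but it is enough to sanity-check that a corpus of this size sits far below a 1M token window.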
The model came back with something I wasn't expecting. It identified an anxiety running underneath all of it: I had found something that felt empowering, and I wasn't sure whether it was valid.
It picked up on my ADHD. I'd mentioned it exactly twice in passing across the whole corpus. It connected that to patterns in how I write about control, systems, and needing external structure.
It said the agent I'm building might be "a coping mechanism that happens to also be commercially interesting."
That's... accurate. I didn't love hearing it but I couldn't argue with it.
Experiment 2: Blind Coding Evaluation
Set up three models without revealing which was which: Sonnet 4.6, Claude 3 Haiku, GPT-4o mini.
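One way to keep an evaluation like this blind is to shuffle the outputs and hide the model names behind neutral labels until scoring is done. The sketch below assumes that approach. The post doesn't describe the actual mechanism, and the `blind` helper and the sample outputs are hypothetical.

```python
import random


def blind(outputs, seed=None):
    """Shuffle model outputs and relabel them A, B, C, ...

    outputs: dict mapping a model name to its answer text.
    Returns (blinded, key): blinded maps label -> answer, and key maps
    label -> model name. Keep key hidden until scoring is finished.
    """
    rng = random.Random(seed)
    names = list(outputs)
    rng.shuffle(names)
    labels = [chr(ord("A") + i) for i in range(len(names))]
    blinded = {label: outputs[name] for label, name in zip(labels, names)}
    key = dict(zip(labels, names))
    return blinded, key


# Placeholder answers, not real model output.
answers = {
    "Sonnet 4.6": "answer one",
    "Claude 3 Haiku": "answer two",
    "GPT-4o mini": "answer three",
}
blinded, key = blind(answers)
```

Score the entries in `blinded` first, then look at `key` to see which model produced what.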
Ran coding challenges. The most interesting result was in SQL review.
Sonnet 4.6 flagged a column missing from the GROUP BY clause, which would cause the query to fail on PostgreSQL. The other two suggested index optimization and performance improvements. Technically correct, but they missed the bug that would break the query entirely.
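The post doesn't show the reviewed query, so here is a hypothetical one with the same class of bug. The `orders` table and its columns are made up. The sketch runs on SQLite from the Python standard library, which tolerates a selected column that is neither aggregated nor grouped. PostgreSQL rejects that same shape, which is one reason this kind of bug can survive local testing.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EU', 10.0), (1, 'EU', 5.0), (2, 'US', 7.0);
""")

# Buggy shape: region is selected but is neither aggregated nor in GROUP BY.
# PostgreSQL refuses this with an error saying the column must appear in the
# GROUP BY clause or be used in an aggregate function. SQLite runs it anyway.
BUGGY = """
    SELECT customer_id, region, SUM(amount)
    FROM orders
    GROUP BY customer_id
"""

# Fixed shape: every non-aggregated selected column is listed in GROUP BY.
FIXED = """
    SELECT customer_id, region, SUM(amount)
    FROM orders
    GROUP BY customer_id, region
    ORDER BY customer_id
"""


def run(query):
    """Execute a query against the in-memory database and return all rows."""
    return conn.execute(query).fetchall()


buggy_rows = run(BUGGY)  # accepted by SQLite, rejected by PostgreSQL
fixed_rows = run(FIXED)
```

Index suggestions would not help either query: on PostgreSQL the buggy version never gets far enough to use an index.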
That's the difference between a tool that improves code and one that actually reads it.
The Bigger Point
The 1M token context doesn't just mean "bigger." It means you can ask different questions entirely.
Feeding in all your drafts and asking what you actually think. Feeding in a full codebase and asking what the actual problem is. That's a different category of interaction.
Want to run experiments like this with your own agent setup? My Claude Code Workshop covers the practical mechanics.