Anthropic released Claude Opus 4.7 yesterday. The headline feature is hours-long autonomous work without constant course-correction.
I put it to the test on something real. Here's the field report.
The task
A side project of mine had a scheduled-post automation that had been broken for about a week. Posts were going out at the wrong times. The root cause could have been timezone handling, API auth, queue state, or all three. I wasn't sure.
I gave Opus 4.7 this:
Find out why scheduled posts are firing at wrong times. Check the scheduler code, the API responses, the timezone handling in the storage layer, the queue state, and the actual fired-post logs. Fix the underlying issue. Don't just patch the symptom.
Then I went to do other things for about 8 hours.
What actually happened
It didn't fix the bug in 8 hours. But it did something more interesting.
It:
- Reproduced the bug reliably.
- Instrumented 4 different layers of the stack with debug logs.
- Wrote a test harness that fired fake posts through the pipeline with controlled timestamps.
- Identified that the bug was actually two bugs, not one — one in my storage layer (naive local timestamps), one in the scheduler (assumed UTC input).
- Proposed a fix for the storage layer that would require a one-time migration of existing data.
- Stopped there and asked me whether the migration was acceptable.
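To make the two-bug diagnosis concrete, here's a minimal sketch of how those failure modes compound. This is my illustration, not code from the project: the function names, the UTC-8 offset, and the ISO-string storage format are all invented for the example. The shape of the bugs matches the diagnosis above: the storage layer persists naive local timestamps, and the scheduler reads them back as if they were UTC.

```python
from datetime import datetime, timezone, timedelta

# Hypothetical machine-local zone for the example: UTC-8.
LOCAL_TZ = timezone(timedelta(hours=-8))

def store_scheduled_time_buggy(when: datetime) -> str:
    # Bug 1 (storage layer): strips timezone info and persists
    # a naive local timestamp.
    return when.replace(tzinfo=None).isoformat()

def scheduler_fire_time_buggy(stored: str) -> datetime:
    # Bug 2 (scheduler): assumes whatever was stored is UTC.
    return datetime.fromisoformat(stored).replace(tzinfo=timezone.utc)

def store_scheduled_time_fixed(when: datetime) -> str:
    # Fix: normalize to UTC before persisting. Existing rows were
    # written by the buggy path, hence the one-time migration.
    return when.astimezone(timezone.utc).isoformat()

# A post scheduled for 9:00 local (UTC-8) should fire at 17:00 UTC.
intended = datetime(2025, 1, 15, 9, 0, tzinfo=LOCAL_TZ)

buggy = scheduler_fire_time_buggy(store_scheduled_time_buggy(intended))
fixed = datetime.fromisoformat(store_scheduled_time_fixed(intended))

print(buggy.isoformat())   # 2025-01-15T09:00:00+00:00 — eight hours early
print(fixed.isoformat())   # 2025-01-15T17:00:00+00:00
print(fixed - buggy)       # the skew the two bugs introduce together
```

Note that either bug alone produces the same skew; that's why patching only the scheduler (the symptom) wouldn't have fixed posts already stored naively, and why the migration question was the right place to stop.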
That last bullet is the thing. Earlier Claude releases would either:
- Just charge ahead and migrate (scary)
- Stop way earlier and ask what to do (annoying)
This version did a long stretch of investigative work, correctly identified a decision that needed human input, and parked there. That's the skill level I've been waiting for.
Where it drifted
Three places.
1. Scope creep on the test harness.
I asked for a fix. It wrote a fairly elaborate test suite as part of the investigation. Useful, but not what I asked for, and it ate into the context window.
2. It over-explained the storage layer bug.
The explanation was correct but long. Three paragraphs when one sentence would have worked. Earlier releases were more concise.
3. It assumed my time zone from context.
I hadn't told it what TZ I was in. It inferred from some filename references. It inferred correctly, but I'd rather be asked than guessed at. This is a tiny thing.
None of these are blockers. They're polish issues.
What didn't drift
The parts that mattered held up.
- It stayed on task for the full session without me re-anchoring it.
- It didn't hallucinate API shapes — when it wasn't sure, it read actual responses.
- It wrote diffs I could read and approve line by line.
- It caught a genuinely subtle second bug that I had missed in my own debugging earlier.
What this actually changes
For me, the jump from Opus 4.6 to 4.7 is the jump from "pair programming assistant" to "junior engineer with better taste than me on average, but who still needs to check in."
That's a qualitative shift.
Before 4.7, I would ask for scoped single-file tasks and stitch the results together. After 4.7, I can give multi-layer tasks that cross files, modules, and domains, and the output is coherent.
The work I used to do — context stitching, keeping state across sessions, reminding the AI what we were doing — that work got cheaper this week.
What it still can't do
- Decide whether a product is worth building
- Decide what a feature should feel like
- Pick which bug is worth fixing first
- Explain to a user why their workflow is wrong
The interesting part is still mine.
Practical takeaway
If you're already using Claude Code or Claude through an editor:
- Give it bigger tasks than you used to. It can handle them.
- Still ask it to stop before destructive actions. That behavior seems to be the default now, but it's still worth stating explicitly.
- Don't bother with "think step by step" — that's implicit at this level.
- Read the diffs. You'll learn things.
If you weren't using Claude before: this is probably the release worth trying.
Eight hours of autonomous work, two real bugs found, one fix migration decision parked for me. Net positive. Still tired.