Two regressions in six weeks. I'm glad I didn't go all-in on one tool. Here's what I do instead — and what 22 commits in a single session looks like.
I almost missed the shift.
For a while, I was doing what everyone was doing — using free AI for emails, code snippets, the occasional Ghibli image. The whole market ran on free. Then it got saturated. Everyone was generating the same images, writing the same emails, asking the same questions. The tools started feeling identical.
Then I wanted to build something real.
Not fix a bug. Not autocomplete a function. I wanted to take a project — Sanatan Guide, a Flutter app I'm building — from idea to shipped feature. Research, validation, PRD, technical architecture, folder structure, clean code, design system, typography, every screen, every edge case. The whole thing.
No AI I'd used could do that. They'd solve the small thing in front of them and stop.
Claude didn't stop.
That's when I paid ₹2,000 for Claude Pro. First real subscription of my life, outside utility bills. And it was obvious the moment I did it.
What actually happened in that first real session
I'm building Sanatan Guide because a lot of us — even people raised in the tradition — don't know where to start. There's too much text, no clear entry point, no way to actually understand what you're reading. The app is meant to fix that, with AI-integrated explanations and eventually recitations and personal commentaries.
Last week I ran a Claude session to close out three UI complaints that had been open for too long: navigation entry points were broken, the Credits screen wasn't uniform, and the Feedback screen was missing.
One session. Here's what actually happened:
Session result: 22 commits on main, 213 tests green, analyzer clean

- S1 ✅ Navigation keystone: Home/Library top bar plus a 5-item overflow; bottom nav scoped to 3 tabs; /feedback and /chat routes; 4 screens moved out of the nav shell; a latent /library bug fixed along the way
- S2 ✅ Credits redesign: the heritage spec, with the श्रद्धा hero, sūtra-numbered domain sections, a Tools section, the BindingLine + Bṛhadāraṇyaka footer, and the fade-up
- S3 ✅ Feedback screen: a new /feedback flow (pick kind → compose → mailto) that replaced the stub

And Claude's closing line: "All three of your original complaints are closed."
22 commits. 213 tests green. Analyzer clean. Three screens done.
A full feature set like that used to take me 6–8 months to build. Now it's 2–3. The reduction isn't from typing faster. It's from no longer being the only person doing every job.
I'm the PM. Claude is the team.
The regression I almost missed
Here's the thing I want to be honest about: I was also considering Claude Code — the CLI product, the agentic layer on top of the raw API.
And then I read Anthropic's April 23 postmortem.
Three product-layer bugs had been degrading Claude Code quality for six weeks before fixes shipped:
1. Reasoning effort downgrade (March 4 → reverted April 7). Default reasoning switched from high to medium to reduce UI latency. Result: shallower suggestions, more "simplest fix" behavior. If you were using Claude Code for complex Flutter architecture decisions during this window, you were getting medium-quality reasoning and probably didn't know it.
2. Caching bug (March 26 → fixed April 10). Chain-of-thought was being pruned from idle sessions every turn instead of only after idle gaps. Claude appeared to forget its own reasoning mid-conversation — what felt like the model "losing the thread" was actually a caching defect.
3. Verbosity instruction (April 7 → reverted April 20). A "shorter is better" instruction caused a 3% drop in evals. Shorter outputs aren't smarter outputs.
All three were fixed in v2.1.116. The raw API was never affected — only the product layer (the CLI harness, system prompts, caching logic).
Then on May 11, Claude Opus 4.7 shipped. Within days, developers started calling it a regression versus 4.6. CLAUDE.md files tuned for 4.6 started producing verbose, hedged output.
Two regressions inside six weeks. Two different product layers.
This is how hosted LLM products work. Vendor benchmarks lag user-observed quality by weeks. Working developers notice first, quietly absorb the degradation, and eventually someone writes the postmortem.
If your entire daily workflow runs through one product layer, you're betting the next six months on that layer not regressing. That bet just lost twice.
The 4-tool Flutter AI stack I actually use
This isn't a recommendation. It's what I use, with the specific reasoning behind each choice.
Tool 1: Claude (raw — via web or API)
This is my primary. Not Claude Code CLI — raw Claude, either the web UI for interactive sessions or the API when I'm building things that call it programmatically.
The distinction matters: the raw API was untouched by both regressions. When the CLI had the caching bug and the verbosity instruction, claude-opus-4-6 via API was still performing normally.
What I use it for in Flutter:
- Architecture decisions. "Here's my folder structure and state management approach, here's the feature I want to add — what breaks and why?"
- The /advisor command (built into Claude's interface), which automatically orchestrates heavy reasoning with Opus and then hands off execution to Sonnet. Genuinely useful for saving tokens on long sessions. Worth knowing about.
- Full-session project drives like the Sanatan Guide session above, where I load the plan and the specs and let Claude commit and test across multiple files.
Cost: ₹2,000/month (Claude Pro). Worth it because 6–8 month builds now take 2–3.
Tool 2: Cursor (daily driver, IDE)
For daily Dart edits, inline completions, small refactors inside a file — Cursor is faster than context-switching to a chat window.
Composer 2 (shipped March 2026) handles multi-file Flutter widget refactors reasonably well. The key thing I do: maintain a .cursor/rules file that specifies my state management (I'm using a combination of Riverpod and MobX depending on the feature) and my folder conventions. Cursor honors it per-session without you having to re-explain.
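The real files come with Article 2 of this series, but here's a trimmed, illustrative sketch of the shape. The specific rules below are invented for this post, not lifted from my actual file:

```
# .cursor/rules (illustrative sketch; the real file ships with Article 2)

State management:
- Riverpod for app-wide state, MobX for screen-local stores.
- Never mix both patterns inside a single feature.

Folder conventions:
- lib/features/<feature>/{data,domain,presentation}
- Shared widgets live in lib/shared/widgets; nothing feature-specific there.

Dart style:
- Prefer const constructors; no business logic inside build().
- Every new screen lands with a widget test.
```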
What I use it for in Flutter:
- Tab autocomplete while writing Dart — fastest in class
- Inline edits on a widget I'm actively touching
- Quick refactors that don't cross too many files
Cost: covered by my employer for work code. For personal projects like Sanatan Guide, the free tier covers most of what I need.
Tool 3: Minimax
I recently started using Minimax, and it has been a pleasant surprise: the response quality is close to Claude Sonnet. Not Opus, but Sonnet.
My pattern now: make the heavy decision with Opus (architecture, tradeoffs, "should I even do this?"), then give the full plan to either Sonnet or Minimax to operate on. For the execution phase — writing the code, implementing the spec — the quality difference between Sonnet and Minimax is small enough that cost becomes the deciding factor.
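To make the split concrete, here's a minimal Dart sketch of the pattern. It assumes the public Anthropic Messages API shape (POST /v1/messages); the model IDs are placeholders to verify against current docs, and the Minimax leg would be the same HTTP pattern pointed at their endpoint:

```dart
// Minimal sketch of the plan/execute split.
// Assumes the Anthropic Messages API; model IDs are placeholders.
import 'dart:convert';
import 'dart:io';

import 'package:http/http.dart' as http;

Future<String> ask(String model, String prompt) async {
  final res = await http.post(
    Uri.parse('https://api.anthropic.com/v1/messages'),
    headers: {
      'x-api-key': Platform.environment['ANTHROPIC_API_KEY']!,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: jsonEncode({
      'model': model,
      'max_tokens': 4096,
      'messages': [
        {'role': 'user', 'content': prompt},
      ],
    }),
  );
  final body = jsonDecode(res.body) as Map<String, dynamic>;
  // The API returns a list of content blocks; take the first text block.
  return (body['content'] as List).first['text'] as String;
}

Future<void> main() async {
  // Heavy decision: architecture and tradeoffs go to the expensive model.
  final plan = await ask(
    'claude-opus-placeholder', // verify the current model ID
    'Here is my Flutter folder structure and the feature I want to add. '
    'What breaks, and what is the step-by-step refactor plan?',
  );

  // Execution: the finished plan goes to the cheaper model. Swapping this
  // call for a Minimax (or Sonnet) endpoint is the same HTTP pattern.
  final code = await ask(
    'claude-sonnet-placeholder', // or route to Minimax here
    'Implement this plan exactly, file by file:\n$plan',
  );
  print(code);
}
```

The point isn't this exact code; it's that the expensive call happens once, at plan time, and everything downstream runs on whichever cheap executor you trust that week.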
What I use it for in Flutter:
- Running the plan once the architecture is decided
- Parsing large outputs: when flutter build --verbose produces 200 lines and something is broken, Minimax handles the full log cheaply
- Anything where I've already done the thinking and just need the execution
Cost: Free tier or very cheap pay-as-you-go. Significantly cheaper than running everything through Opus.
Tool 4: Gemini (free tier fallback)
The free tier gives you 60 requests/min and 1,000/day (verify current limits before you depend on this). For anything low-stakes where free is fine, Gemini covers it.
The thing Gemini does better than expected: vision. Screenshot a Flutter UI bug — misaligned Padding, wrong SizedBox size, color not matching the design — and ask "why doesn't this look right?" It works. Not perfect, but useful enough that I reach for it when I'm in iteration mode on UI and don't want to burn Opus credits on pixel-pushing.
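I normally do this in the web UI, but the same check works programmatically when you want it in a script. A sketch, assuming the public generateContent REST endpoint; verify the model name and request fields against current docs:

```dart
// Sketch: send a Flutter UI screenshot to Gemini and ask what looks off.
// Assumes the public generateContent REST endpoint; verify the model
// name and field names against current docs before depending on this.
import 'dart:convert';
import 'dart:io';

import 'package:http/http.dart' as http;

Future<String> critiqueScreenshot(String pngPath) async {
  final apiKey = Platform.environment['GEMINI_API_KEY']!;
  final imageB64 = base64Encode(await File(pngPath).readAsBytes());

  final res = await http.post(
    Uri.parse('https://generativelanguage.googleapis.com/v1beta/models/'
        'gemini-1.5-flash:generateContent?key=$apiKey'),
    headers: {'content-type': 'application/json'},
    body: jsonEncode({
      'contents': [
        {
          'parts': [
            {'text': 'This is a Flutter screen. Why does the layout look wrong?'},
            {
              'inline_data': {'mime_type': 'image/png', 'data': imageB64},
            },
          ],
        },
      ],
    }),
  );
  final body = jsonDecode(res.body) as Map<String, dynamic>;
  return body['candidates'][0]['content']['parts'][0]['text'] as String;
}
```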
Cost: ₹0 for my current usage volume.
The two-engine fallback principle
The actual rule I follow:
Never let one AI tool be the only thing that can do your critical work.
For any task category that matters — architecture review, multi-file refactors, debugging complex async flows — I keep two tools I can switch between without losing 4 hours. When one regresses or rate-limits or just gives me a bad week, I switch primary duty to the other.
The cost of running two subscriptions is much smaller than the cost of one bad week when your only tool degrades silently.
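In code, the principle is nothing fancier than a wrapper. A Dart sketch, where the two engines are placeholders for whichever tools you've paired for a given task category:

```dart
// Sketch of the two-engine rule: every critical task category gets a
// primary and a fallback, and a failed or rate-limited call falls
// through instead of blocking the day.
typedef Engine = Future<String> Function(String prompt);

Future<String> withFallback(
  String prompt, {
  required Engine primary,
  required Engine fallback,
}) async {
  try {
    return await primary(prompt);
  } catch (e) {
    // Rate limit, outage, or a silently degraded week you've noticed:
    // switch primary duty instead of losing 4 hours.
    print('Primary engine failed ($e); switching to fallback.');
    return fallback(prompt);
  }
}
```

The decision about where work goes is made once, up front, not re-litigated in the middle of a bad week.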
Where this approach falls apart
Honest section, because every article should have one.
Cognitive overhead is real. Knowing which tool to reach for is a skill that takes time to develop. If you're new to AI-assisted development, going deep on one tool first is the right call. The multi-tool stack is for after you understand what each tool is actually good at.
Context loss between tools. There's no shared memory. When you switch from Claude to Cursor mid-session, you pay a re-prompting tax. Not huge, but real.
₹2,000/month is not nothing. That's a real expense in India. I justified it because what used to take 6–8 months now takes 2–3 — the time saved is worth more than the subscription. But do your own math. The Gemini + Minimax free tiers cover a lot if you're not ready to pay yet.
This is overkill for a greenfield side project. If you're building something at 11 PM just to ship it, pick one tool and go. The multi-tool setup is for working developers managing production codebases where a bad week actually costs something.
TL;DR
- Claude Code had three product-layer regressions March–April 2026 (all fixed in v2.1.116). The raw API was unaffected.
- Opus 4.7 shipped May 11 and developers immediately called it a regression vs 4.6.
- Two regressions in six weeks. Single-tool commitment is the bug.
- My stack: Claude (raw) for architecture + long sessions, Cursor for daily Dart editing, Minimax for plan execution, Gemini as free-tier fallback.
- The /advisor command in Claude automatically routes heavy reasoning to Opus and execution to Sonnet. Saves tokens on long sessions.
- One Claude session on Sanatan Guide: 22 commits, 213 tests green, analyzer clean. That's what "Claude as team, me as PM" looks like in practice.
If you're all-in on a single AI coding tool right now, you're betting the product layer doesn't regress for the next six months. The last six weeks say that bet is bad. What's your fallback when your primary tool ships a degraded week — or do you not have one?
Article 2 in this series: the .cursor/rules and CLAUDE.md files I actually use for Flutter dev — with the real files attached.