kavyarani7

Posted on Jun 16

Picking Models and Tools

#ai #claude #llm #devtools

SectorFlow Engineering Series · Part 3 of 3 · Read Part 1 first: Token Efficiency in Claude Code

The MCPs we tried, refused, and why — and how we drew the Haiku/Sonnet line.

June 2026 · SectorFlow Engineering

In this series: Part 1, Token Efficiency in Claude Code (start here). Part 2, The Skills File Pattern: fixing CLAUDE.md bloat with imports.

Two decisions, same question

Picking a model and picking your tools feel like two different jobs. One is about which Anthropic model you call; the other is about which tools you wire up. Underneath, though, they're the same question: what's the cheapest way to get the quality of output you need?

They even fail the same way. You reach for more capability than the job needs and pay for it with tokens that could have gone to the task. This post covers both, and the reasoning we used on each.

MCPs

MCP servers give Claude Code tools that talk to outside systems. They're useful, but they aren't free. Every call costs tokens: the call itself, the schema around it, and the parsing of whatever comes back. If the MCP wraps an external API, there's a response payload too, and those get big.

We judged every MCP against one question:

Does this give Claude something it genuinely can't do with Bash, Read, or Edit? Or is it just a wrapper around something an engineer could do faster in a terminal?

Most of the ones we dropped failed that test.

Linear — tried it, dropped it for coding sessions

We hooked up the Linear MCP early because it seemed like a no-brainer: Claude reads the ticket, pulls the acceptance criteria, gets to work. The math didn't hold up.

| Operation | Token cost | Human alternative |
|---|---|---|
| `list_issues` | ~3,500 tokens | Engineer opens Linear: 5 seconds |
| `get_issue` (single) | ~1,200 tokens | Engineer copies acceptance criteria: 30 seconds |
| Mark Done + post comment | ~4,300 tokens | Engineer clicks Done: 10 seconds |
| Full ticket workflow via MCP | ~9,000 tokens | Total human time: ~45 seconds |

The engineer spending 45 seconds costs zero tokens. Claude spending 9,000 on the same clicks leaves 9,000 fewer for the implementation. Over 60-plus tickets, that's at least 540,000 tokens you've handed back to code.
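The token totals behind that paragraph, written out as a quick TypeScript check. The numbers are the approximate figures from the table above; the object and function names are ours, purely for illustration.

```typescript
// Approximate per-ticket token costs of the Linear MCP workflow (from the table).
const mcpTokenCost: Record<string, number> = {
  listIssues: 3500,
  getIssue: 1200,
  markDoneAndComment: 4300,
};

// Seconds the same steps take an engineer by hand (also from the table).
const humanSeconds: Record<string, number> = {
  openLinear: 5,
  copyAcceptanceCriteria: 30,
  clickDone: 10,
};

// Sum every value in a cost table.
function total(costs: Record<string, number>): number {
  return Object.values(costs).reduce((sum, n) => sum + n, 0);
}

// Tokens the MCP route would spend across a batch of tickets.
function tokensForTickets(ticketCount: number): number {
  return total(mcpTokenCost) * ticketCount;
}

console.log(total(mcpTokenCost)); // 9000
console.log(total(humanSeconds)); // 45
console.log(tokensForTickets(60)); // 540000
```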

Linear is still installed — we use it for project-level stuff outside coding sessions. But during a session the engineer reads the ticket and pastes the part Claude needs.

The tool that makes the paste painless

"Just paste the acceptance criteria" sounds clean until you watch someone do it. They copy the whole ticket — priority, reporter, four comments and all — because pulling out only the useful parts by hand is fiddly and easy to get wrong. So the discipline slips. The fix wasn't a stricter rule, it was a small tool.

It's one static HTML file, no backend. You copy the Linear ticket as Markdown, paste it in, and it does two things. It throws out the fields Claude never needs — title, priority, type, status, reporter, assignee, dates, comments — and keeps the ones it does: acceptance criteria, files affected, design notes, out of scope. Problem/goal and context are there as toggles too, off by default, for the tickets where they earn their place.
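A minimal TypeScript sketch of the filtering step, assuming the copied ticket Markdown puts each field under its own `## Heading` line. The heading names, `parseSections`, and `filterTicket` are hypothetical; this illustrates the keep-list idea, not the converter's actual source.

```typescript
// Fields kept by default, and fields offered as off-by-default toggles.
// Heading names are assumptions about how the copied ticket is laid out.
const KEEP = ["Acceptance criteria", "Files affected", "Design notes", "Out of scope"];
const OPTIONAL = ["Problem / goal", "Context"];

// Split Markdown into heading -> body. Lines before the first "## " heading
// (title, priority, status and similar metadata) are dropped.
function parseSections(markdown: string): Map<string, string> {
  const sections = new Map<string, string>();
  let heading: string | null = null;
  let body: string[] = [];
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) {
      if (heading !== null) sections.set(heading, body.join("\n").trim());
      heading = line.slice(3).trim();
      body = [];
    } else if (heading !== null) {
      body.push(line);
    }
  }
  if (heading !== null) sections.set(heading, body.join("\n").trim());
  return sections;
}

// Keep the listed sections plus whichever optional ones are toggled on.
// Everything else (comments, reporter, dates, ...) is discarded.
function filterTicket(markdown: string, toggles: string[] = []): Map<string, string> {
  const wanted = new Set([...KEEP, ...OPTIONAL.filter((name) => toggles.includes(name))]);
  const kept = new Map<string, string>();
  parseSections(markdown).forEach((body, heading) => {
    if (wanted.has(heading)) kept.set(heading, body);
  });
  return kept;
}
```

In this sketch, keeping an explicit list rather than stripping known fields means any field Linear adds later stays out of the prompt by default.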

Then it wraps what's left in the task format from workflow.md and adds the session rules at the end. Out comes something like this:

## Task: SF-62 — ETF flow pulse in morning briefing
Branch: kavyarani7/SF-62-etf-flow-pulse (already created — do not run git checkout)

### Files to create or modify
[ files section from the ticket ]

### What to build (acceptance criteria)
[ acceptance criteria ]

### Do NOT build
[ out-of-scope items ]

### Design constraints
[ design notes ]

### Rules for this session
- Read ONLY the files listed above — nothing else
- Do NOT run git commands — I handle all git operations
- Do NOT touch Linear — I mark Done and post comments
- Do NOT start the dev server — I do browser testing
- When done, report in the short DONE format from CLAUDE.md
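The wrapping step might look like this TypeScript sketch, which assembles the sections in the same order as the template above and appends the session rules. `TaskParts` and `buildTaskPrompt` are hypothetical names, and plain hyphens stand in for the template's dashes.

```typescript
// What survives the filtering step (hypothetical shape).
interface TaskParts {
  id: string;
  title: string;
  branch: string;
  files: string;
  acceptanceCriteria: string;
  outOfScope: string;
  designNotes: string;
}

// Session rules appended to every prompt, following the template above.
const SESSION_RULES: readonly string[] = [
  "Read ONLY the files listed above - nothing else",
  "Do NOT run git commands - I handle all git operations",
  "Do NOT touch Linear - I mark Done and post comments",
  "Do NOT start the dev server - I do browser testing",
  "When done, report in the short DONE format from CLAUDE.md",
];

function buildTaskPrompt(t: TaskParts): string {
  return [
    `## Task: ${t.id} - ${t.title}`,
    // The branch is created before the session starts, so Claude never checks out.
    `Branch: ${t.branch} (already created - do not run git checkout)`,
    "",
    "### Files to create or modify",
    t.files,
    "",
    "### What to build (acceptance criteria)",
    t.acceptanceCriteria,
    "",
    "### Do NOT build",
    t.outOfScope,
    "",
    "### Design constraints",
    t.designNotes,
    "",
    "### Rules for this session",
    ...SESSION_RULES.map((rule) => `- ${rule}`),
  ].join("\n");
}
```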

Two assumptions are baked in, and they match how we run every ticket:

The branch already exists. You create it before opening Claude Code, so the prompt just tells Claude the branch is there and not to run git checkout. One ticket, one branch, one fresh session — the work stays isolated.
The prompt carries enough of the ticket that Claude has no reason to go reading files. That's what makes the "read only these files" rule stick: if the context it needs is already in front of it, it won't wander into the codebase hunting for background.

The generated prompt runs a few hundred tokens. Against the ~9,000 the full MCP workflow costs in the table above, that's the difference between spending your context on the ticket and spending it on the actual work.

None of this is clever. It's just that a step people have to do by hand gets skipped on a busy day, and a step a tool does for them doesn't. That's the whole reason it exists.

The whole tool is one self-contained HTML file: open it in a browser and it runs, no install. There's a live version you can use here: Linear → Claude Code converter.

GitHub — not now, maybe later

The GitHub MCP creates PRs, posts comments, and manages labels. We didn't connect it for coding sessions, for the same reason: checkout, add, commit, and push are all fast in a terminal, and the MCP scaffolding doesn't make the code any better.

The difference from Linear is that GitHub is only deferred, not rejected. The day we build an automation pipeline that needs PRs created programmatically at scale, the overhead pays for itself. Creating PRs by hand during ticket work doesn't clear that bar.

Supabase — blocked outright

Supabase is in the plan (ADR-001) but not built yet. The rule is already written for when it lands: Claude never pulls DB schema or checks Supabase state over MCP during a coding session.

This one is about freshness, not cost. The database changes between sessions. If Claude reads the schema at session start, then the engineer alters a table, then the session keeps going — Claude is now working off stale info it thinks is current. Better pattern: the engineer pastes the table definition when a task genuinely needs it.

The ones we kept

| MCP | Why we kept it | What it replaces |
|---|---|---|
| Claude Preview | Renders the real browser output; no text tool does that | Blind trust that the UI works |
| Computer Use | Drives native desktop apps; nothing else can | Manual cross-app workflows |
| Scheduled Tasks | Cron jobs can't be set up with code edits alone | Shell-based scheduling hacks |

The pattern is clear enough in hindsight. Keep the MCPs whose value you can't get any other way. Drop the ones wrapping something an engineer does faster with a terminal or a browser tab.
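The triage question can be written down as a small decision function. This TypeScript sketch encodes only the order of the checks (freshness first, then cost, which is our own choice of ordering); the boolean answers are judgment calls an engineer makes, and `McpCandidate` and `triage` are hypothetical names. GitHub's "deferred" status depends on a pipeline that doesn't exist yet, so it isn't modeled.

```typescript
// One engineer's answers about an MCP server, for coding sessions only.
interface McpCandidate {
  name: string;
  readsLiveState: boolean; // e.g. a DB schema that can change mid-session
  doableWithBuiltins: boolean; // Bash, Read, or Edit would get the same result
  fasterByHand: boolean; // a terminal or browser tab beats the MCP round trip
}

type Verdict = "block" | "drop" | "keep";

function triage(mcp: McpCandidate): Verdict {
  // Freshness: state read once can go stale without anyone noticing.
  if (mcp.readsLiveState) return "block";
  // Cost: don't spend tokens on what built-ins or the engineer already cover.
  if (mcp.doableWithBuiltins || mcp.fasterByHand) return "drop";
  // Only capability you can't get any other way is kept.
  return "keep";
}

const candidates: McpCandidate[] = [
  { name: "Linear", readsLiveState: false, doableWithBuiltins: false, fasterByHand: true },
  { name: "Supabase", readsLiveState: true, doableWithBuiltins: false, fasterByHand: false },
  { name: "Claude Preview", readsLiveState: false, doableWithBuiltins: false, fasterByHand: false },
];

const verdicts = candidates.map((mcp) => `${mcp.name}: ${triage(mcp)}`);
console.log(verdicts.join("\n"));
```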

The Haiku / Sonnet line

SectorFlow runs on two models. Where we split them is written into core.md as a hard constraint — not a default, not a starting point to tune later.

| Constant | Model | Used for | `max_tokens` |
|---|---|---|---|
| `DATA_MODEL` | `claude-haiku-4-5-20251001` | All 11 data endpoints: sectors, subsectors, stocks, financial tabs 1–6 | 600–1,200 |
| `AI_ANALYSIS_MODEL` | `claude-sonnet-4-6` | Stock analysis memo, tab 7 only | 3,000 |
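A sketch of how the split might look in code, assuming a TypeScript backend that builds request bodies in the Anthropic Messages API shape (`model`, `max_tokens`, `messages`). The `modelForTab` and `buildRequest` helpers are hypothetical, only the tab routing is shown (not the sector, subsector, and stock endpoints), and 1,200 is simply the top of the documented 600–1,200 range.

```typescript
const DATA_MODEL = "claude-haiku-4-5-20251001";
const AI_ANALYSIS_MODEL = "claude-sonnet-4-6";

interface MessagesRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user"; content: string }[];
}

// Tabs 1-6 are schema-filling work for Haiku; tab 7 is the Sonnet memo.
function modelForTab(tab: number): { model: string; maxTokens: number } {
  if (tab === 7) return { model: AI_ANALYSIS_MODEL, maxTokens: 3000 };
  if (Number.isInteger(tab) && tab >= 1 && tab <= 6) {
    // Assumption: top of the 600-1,200 range; real endpoints vary.
    return { model: DATA_MODEL, maxTokens: 1200 };
  }
  throw new RangeError(`No model mapping for tab ${tab}`);
}

// Build the request body; sending it is left to the caller.
function buildRequest(tab: number, prompt: string): MessagesRequest {
  const { model, maxTokens } = modelForTab(tab);
  return { model, max_tokens: maxTokens, messages: [{ role: "user", content: prompt }] };
}
```

Keeping the mapping in one function means the split lives in exactly one place, which matches treating it as a hard constraint rather than a per-call choice.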

Why Haiku for the data

Every data endpoint returns JSON against a fixed schema. The hard part isn't reasoning, it's following instructions exactly: these fields, this format, these units. That's it.
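To make "follow instructions exactly" concrete, here is a TypeScript sketch of a strict parser for one data endpoint's reply. The `SectorRow` fields and units are invented for illustration and are not SectorFlow's actual schema.

```typescript
// Hypothetical schema for one data endpoint.
interface SectorRow {
  name: string;
  changePct: number; // percent as a number, e.g. 1.8 means +1.8%
  marketCapUsdBn: number; // billions of USD
}

// Parse the model's JSON reply and reject anything that drifts from the schema.
function parseSectorRows(raw: string): SectorRow[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("expected a JSON array");
  return data.map((item, i) => {
    if (typeof item !== "object" || item === null) throw new Error(`row ${i}: not an object`);
    const { name, changePct, marketCapUsdBn } = item as Record<string, unknown>;
    if (typeof name !== "string") throw new Error(`row ${i}: name must be a string`);
    if (typeof changePct !== "number" || typeof marketCapUsdBn !== "number") {
      throw new Error(`row ${i}: numeric fields must be numbers`);
    }
    return { name, changePct, marketCapUsdBn };
  });
}
```

A reply that says `"1.8%"` where the schema wants the number `1.8` is rejected outright. In the spirit of this section, that failure points at the prompt or the schema, not at the model tier.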

Haiku is fast, cheap, and completely fine at filling a schema with correct values. And fast matters here — a dashboard drill-down chains three calls, sectors → subsectors → stocks. Haiku does all three in about the time Sonnet takes for one.

The 24-hour server cache spreads each Haiku call across everyone who looks at that sector inside the window, and the 24-hour localStorage cache keeps repeat views in the same browser from reaching the server at all. Per-user cost rounds to nothing.
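The sharing effect comes from the server cache: one Haiku call fills an entry, and every request for that key inside the window reads it. A minimal in-memory TypeScript sketch of that idea, with an injectable clock so the window can be simulated; `TtlCache` is a hypothetical name, and the browser-side localStorage layer isn't shown.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// In-memory cache whose entries expire ttlMs after they were stored.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return a fresh cached value, or run `load` (e.g. a Haiku call) and store it.
  getOrLoad(key: string, load: () => V): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Simulate many viewers of one sector and count the model calls.
let clock = 0;
let modelCalls = 0;
const cache = new TtlCache<string>(DAY_MS, () => clock);
const loadSector = (): string => {
  modelCalls += 1;
  return '{"sector":"energy"}';
};

for (let viewer = 0; viewer < 500; viewer++) cache.getOrLoad("sector:energy", loadSector);
console.log(modelCalls); // 1: one call shared by 500 viewers

clock = DAY_MS + 1; // the 24-hour window has passed
cache.getOrLoad("sector:energy", loadSector);
console.log(modelCalls); // 2: the entry expired, so it is fetched again
```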

Why Sonnet for the analysis tab

The analysis memo is a different animal from a data endpoint. It has to:

pull several signals together into a verdict, with a confidence level and a reason behind it;
come up with bull / base / bear cases — target prices, upside, an actual thesis;
catch red flags that would never show up in a structured-data prompt;
and write like research someone would actually read, not a JSON dump with some sentences wrapped around it.

Haiku can fill in a memo-shaped template. Sonnet can write the memo. The gap is visible to the user, and it's the entire reason the analysis tab exists as its own feature. Dropping it to Haiku wouldn't even save much — that tab has a 4-hour cache so it barely gets called. It'd just be worse.

Never downgrade

core.md says it flat out: never sub Haiku in for AI_ANALYSIS_MODEL. Surface the error instead.

This rule is there because the temptation is real, especially mid-debug. A Sonnet call fails, and the quickest thing is to drop in Haiku and see if the request goes through. But then the analysis tab is quietly serving worse output and nobody can tell anything changed. An error you can see forces you to actually fix it.
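In code, the rule amounts to a catch block that rethrows instead of retrying with another model. A TypeScript sketch under stated assumptions: `CallModel` stands in for whatever function actually hits the API (kept synchronous for brevity), and `runAnalysis` is a hypothetical name.

```typescript
const AI_ANALYSIS_MODEL = "claude-sonnet-4-6";

// Stand-in for the real API call; synchronous here only to keep the sketch short.
type CallModel = (model: string, prompt: string) => string;

function runAnalysis(callModel: CallModel, prompt: string): string {
  try {
    return callModel(AI_ANALYSIS_MODEL, prompt);
  } catch (err) {
    // No fallback to the Haiku data model: a visible error gets fixed,
    // a silent downgrade just ships a worse memo.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`Analysis with ${AI_ANALYSIS_MODEL} failed: ${reason}`);
  }
}
```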

Same thing in reverse — we don't bump the Haiku data endpoints up to Sonnet for "better quality." Those endpoints have schemas. Better quality there means fixing the schema or the prompt, not throwing a pricier model at what's basically an instruction-following task.

Model strings are facts, treat them like facts

The model IDs live in core.md in a "confirmed model strings" table, with a note: never make up a model string, and if a call 404s, check the table before anything else.

This stops one specific failure: Claude generating a model ID that looks right but isn't (say, the Haiku ID with a made-up date suffix) just because it seemed plausible in context. Every model string in the code has a matching row in that table, confirmed working. Nothing goes into the code until it's in the table first.
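One way to enforce the table in code as well as by convention: a TypeScript sketch that mirrors the confirmed-strings table as a set and refuses anything not in it before a request goes out. `CONFIRMED_MODELS` and `assertConfirmedModel` are hypothetical names; core.md stays the source of truth.

```typescript
// Mirror of the "confirmed model strings" table in core.md.
const CONFIRMED_MODELS: ReadonlySet<string> = new Set([
  "claude-haiku-4-5-20251001", // DATA_MODEL
  "claude-sonnet-4-6", // AI_ANALYSIS_MODEL
]);

// Fail before the request is sent, with a message that points at the table,
// instead of discovering a mistyped ID through an API 404.
function assertConfirmedModel(model: string): string {
  if (!CONFIRMED_MODELS.has(model)) {
    throw new Error(`Unconfirmed model string "${model}": add it to the confirmed table first`);
  }
  return model;
}
```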

What ties both together

MCP choice and model choice come down to the same rule: don't pay for capability you don't need. Connect a tool only when there's no other way to get what it does. Reach for the bigger model only when the work is real synthesis, not schema-filling. The rest is overhead.

What we're still figuring out

This series is the stuff we've actually settled. There's more we're in the middle of, and we'll write it up once we trust the answers:

prompt structure and where the line really sits between the "data" you hand the model and the "instructions" you give it;
pre-warming the cache on server restart versus leaning on web search for freshness — we think deterministic warming wins, and we built it into the SectorFlow automator;
when Claude Code is the wrong tool and a plain API call with a structured prompt is the right one.

We'll post each one when we've got a conclusion we believe.

DEV Community