DEV Community: kavyarani7

Picking Models and Tools

kavyarani7 — Tue, 16 Jun 2026 04:35:50 +0000

SectorFlow Engineering Series · Part 3 of 3 · Read Part 1 first: Token Efficiency in Claude Code

The MCPs we tried, refused, and why — and how we drew the Haiku/Sonnet line.

June 2026 · SectorFlow Engineering

In this series Part 1: Token Efficiency in Claude Code (start here). Part 2: The Skills File Pattern — fixing CLAUDE.md bloat with imports.

Two decisions, same question

Picking a model and picking your tools feel like two different jobs. One is about which APIs you wire up, the other is about which Anthropic model you call. Underneath, though, they're the same question: what's the cheapest way to get the quality of output you need?

They even fail the same way. You reach for more capability than the job needs and pay for it with tokens that could have gone to the task. This post covers both decisions and the reasoning we used on each.

MCPs

MCP servers give Claude Code tools that talk to outside systems. They're useful and they aren't free. Every call costs tokens — the call itself, the schema around it, the parsing of whatever comes back. If the MCP wraps an external API there's a response payload too, and those get big.

We judged every MCP against one question:

Does this give Claude something it genuinely can't do with Bash, Read, or Edit? Or is it just a wrapper around something an engineer could do faster in a terminal?

Most of the ones we dropped failed that test.

Linear — tried it, dropped it for coding sessions

We hooked up the Linear MCP early because it seemed like a no-brainer: Claude reads the ticket, pulls the acceptance criteria, gets to work. The math didn't hold up.

Operation	Token cost	Human alternative
`list_issues`	~3,500 tokens	Engineer opens Linear: 5 seconds
`get_issue` (single)	~1,200 tokens	Engineer copies acceptance criteria: 30 seconds
Mark Done + post comment	~4,300 tokens	Engineer clicks Done: 10 seconds
Full ticket workflow via MCP	~9,000 tokens	Total human time: ~45 seconds

The engineer spending 45 seconds costs zero tokens. Claude spending 9,000 on the same clicks is 9,000 fewer for the implementation. Over 60-plus tickets that's 7 or 8 full context windows you've handed back to code.
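The arithmetic behind that claim can be sketched as below. The per-operation figures come from the table; the ticket count uses 60 as the floor of "60-plus", and `usableTokensPerSession` is an assumption, because the post doesn't say how many tokens it counts as one context window.

```javascript
// Per-operation costs from the table above (approximate token counts).
const LIST_ISSUES = 3500;
const GET_ISSUE = 1200;
const MARK_DONE_AND_COMMENT = 4300;

const tokensPerTicket = LIST_ISSUES + GET_ISSUE + MARK_DONE_AND_COMMENT; // 9000

// Total tokens spent on ticket handling across a number of tickets.
function totalTicketOverhead(tickets, perTicket = tokensPerTicket) {
  return tickets * perTicket;
}

// How many sessions' worth of context that is, for an assumed usable budget.
// usableTokensPerSession is an assumption; the post does not state it.
function sessionsEquivalent(totalTokens, usableTokensPerSession) {
  return totalTokens / usableTokensPerSession;
}

const total = totalTicketOverhead(60);
console.log(total); // 540000
console.log(sessionsEquivalent(total, 72000)); // 7.5
```

At 60 tickets the overhead is 540,000 tokens, so the "7 or 8" figure corresponds to a usable budget of roughly 67,500 to 77,000 tokens per session.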

Linear is still installed — we use it for project-level stuff outside coding sessions. But during a session the engineer reads the ticket and pastes the part Claude needs.

The tool that makes the paste painless

"Just paste the acceptance criteria" sounds clean until you watch someone do it. They copy the whole ticket — priority, reporter, four comments and all — because pulling out only the useful parts by hand is fiddly and easy to get wrong. So the discipline slips. The fix wasn't a stricter rule, it was a small tool.

It's one static HTML file, no backend. You copy the Linear ticket as Markdown, paste it in, and it does two things. It throws out the fields Claude never needs — title, priority, type, status, reporter, assignee, dates, comments — and keeps the ones it does: acceptance criteria, files affected, design notes, out of scope. Problem/goal and context are there as toggles too, off by default, for the tickets where they earn their place.

Then it wraps what's left in the task format from workflow.md and adds the session rules on the end. Out comes something like this:

## Task: SF-62 — ETF flow pulse in morning briefing
Branch: kavyarani7/SF-62-etf-flow-pulse (already created — do not run git checkout)

### Files to create or modify
[ files section from the ticket ]

### What to build (acceptance criteria)
[ acceptance criteria ]

### Do NOT build
[ out-of-scope items ]

### Design constraints
[ design notes ]

### Rules for this session
- Read ONLY the files listed above — nothing else
- Do NOT run git commands — I handle all git operations
- Do NOT touch Linear — I mark Done and post comments
- Do NOT start the dev server — I do browser testing
- When done, report in the short DONE format from CLAUDE.md

Two assumptions are baked in, and they match how we run every ticket:

- The branch already exists. You create it before opening Claude Code, so the prompt just tells Claude the branch is there and not to run git checkout. One ticket, one branch, one fresh session, so the work stays isolated.
- The prompt carries enough of the ticket that Claude has no reason to go reading files. That's what makes the "read only these files" rule stick: if the context it needs is already in front of it, it won't wander into the codebase hunting for background.

The generated prompt runs a few hundred tokens. Against the ~9,000 the full MCP workflow costs in the table above, that's the difference between spending your context on the ticket and spending it on the actual work.

None of this is clever. It's just that a step people have to do by hand gets skipped on a busy day, and a step a tool does for them doesn't. That's the whole reason it exists.
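The filtering step can be sketched roughly like this. It is not the tool's actual source: it assumes the copied ticket uses Markdown headings for each field, and the section names in `KEEP_BY_DEFAULT` are hypothetical stand-ins for the fields listed above.

```javascript
// Section headings to keep. These names are hypothetical: they mirror the
// fields the post lists, but the real tool's matching rules are not shown.
const KEEP_BY_DEFAULT = [
  'acceptance criteria',
  'files affected',
  'design notes',
  'out of scope',
];

// Split ticket Markdown into a Map of lowercased heading -> body text.
function splitSections(markdown) {
  const sections = new Map();
  let current = null;
  for (const line of markdown.split('\n')) {
    const heading = line.match(/^#{1,6}\s+(.+)$/);
    if (heading) {
      current = heading[1].trim().toLowerCase();
      sections.set(current, []);
    } else if (current !== null) {
      sections.get(current).push(line);
    }
  }
  for (const [name, lines] of sections) {
    sections.set(name, lines.join('\n').trim());
  }
  return sections;
}

// Keep only the wanted sections; priority, reporter, comments and the
// rest are dropped simply by never being copied over.
function filterTicket(markdown, keep = KEEP_BY_DEFAULT) {
  const sections = splitSections(markdown);
  const kept = {};
  for (const name of keep) {
    if (sections.get(name)) kept[name] = sections.get(name);
  }
  return kept;
}
```

Passing a longer `keep` list is one way to model the optional problem/goal and context toggles.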

The whole tool is one self-contained HTML file: open it in a browser and it runs, no install. There's a live version you can use here: Linear → Claude Code converter.

GitHub — not now, maybe later

The GitHub MCP creates PRs, posts comments, manages labels. We didn't connect it for coding sessions, same logic: checkout, add, commit, push are all fast in a terminal, and the MCP scaffolding doesn't make the code any better.

The difference from Linear is that GitHub is only deferred, not rejected. The day we build an automation pipeline that needs PRs created programmatically at scale, the overhead pays for itself. Creating PRs by hand during ticket work doesn't clear that bar.

Supabase — blocked outright

Supabase is in the plan (ADR-001) but not built yet. The rule is already written for when it lands: Claude never pulls DB schema or checks Supabase state over MCP during a coding session.

This one is about freshness, not cost. The database changes between sessions. If Claude reads the schema at session start, then the engineer alters a table, then the session keeps going — Claude is now working off stale info it thinks is current. Better pattern: the engineer pastes the table definition when a task genuinely needs it.

The ones we kept

MCP	Why we kept it	What it replaces
Claude Preview	Renders the real browser output — no text tool does that	Blind trust that the UI works
Computer Use	Drives native desktop apps — nothing else can	Manual cross-app workflows
Scheduled Tasks	Cron jobs can't be set up with code edits alone	Shell-based scheduling hacks

The pattern is clear enough in hindsight. Keep the MCPs whose value you can't get any other way. Drop the ones wrapping something an engineer does faster with a terminal or a browser tab.

The Haiku / Sonnet line

SectorFlow runs on two models. Where we split them is written into core.md as a hard constraint — not a default, not a starting point to tune later.

Constant	Model	Used for	max_tokens
`DATA_MODEL`	`claude-haiku-4-5-20251001`	All 11 data endpoints — sectors, subsectors, stocks, financial tabs 1–6	600–1,200
`AI_ANALYSIS_MODEL`	`claude-sonnet-4-6`	Stock analysis memo — tab 7 only	3,000

Why Haiku for the data

Every data endpoint returns JSON against a fixed schema. The hard part isn't reasoning, it's following instructions exactly: these fields, this format, these units. That's it.

Haiku is fast, cheap, and completely fine at filling a schema with correct values. And fast matters here — a dashboard drill-down chains three calls, sectors → subsectors → stocks. Haiku does all three in about the time Sonnet takes for one.

With the 24-hour server cache and the 24-hour localStorage cache, each Haiku call gets spread across everyone who looks at that sector inside the window. Per-user cost rounds to nothing.
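A minimal sketch of the server-side half of that caching, assuming an in-memory store. `createTtlCache` and `getOrFetch` are hypothetical names, and the post doesn't describe how the real cache is stored or keyed.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// In-memory cache where each entry expires ttlMs after it was stored.
// The clock is injectable so expiry can be exercised without waiting.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    // Return the cached value if it is still fresh; otherwise call the
    // fetcher (for example, the Haiku request) and store its result.
    async getOrFetch(key, fetcher) {
      const hit = entries.get(key);
      if (hit && now() - hit.storedAt < ttlMs) {
        return hit.value;
      }
      const value = await fetcher();
      entries.set(key, { value, storedAt: now() });
      return value;
    },
  };
}
```

Every viewer of the same sector inside the window hits the `Map` instead of the model, which is where the near-zero per-user cost comes from. This sketch does not deduplicate concurrent misses for the same key.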

Why Sonnet for the analysis tab

The analysis memo is a different animal from a data endpoint. It has to:

- pull several signals together into a verdict, with a confidence level and a reason behind it;
- come up with bull / base / bear cases: target prices, upside, an actual thesis;
- catch red flags that would never show up in a structured-data prompt;
- and write like research someone would actually read, not a JSON dump with some sentences wrapped around it.

Haiku can fill in a memo-shaped template. Sonnet can write the memo. The gap is visible to the user, and it's the entire reason the analysis tab exists as its own feature. Dropping it to Haiku wouldn't even save much — that tab has a 4-hour cache so it barely gets called. It'd just be worse.

Never downgrade

core.md says it flat out: never sub Haiku in for AI_ANALYSIS_MODEL. Surface the error instead.

This rule is there because the temptation is real, especially mid-debug. A Sonnet call fails, and the quickest thing is to drop in Haiku and see if the request goes through. But then the analysis tab is quietly serving worse output and nobody can tell anything changed. An error you can see forces you to actually fix it.

Same thing in reverse — we don't bump the Haiku data endpoints up to Sonnet for "better quality." Those endpoints have schemas. Better quality there means fixing the schema or the prompt, not throwing a pricier model at what's basically an instruction-following task.
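In code, the no-downgrade rule amounts to a wrapper that fails loudly instead of retrying on the cheaper model. This is a sketch, not SectorFlow's implementation: `callModel` is a hypothetical injected function standing in for the real API client, and the 3,000 figure is the `max_tokens` from the table.

```javascript
const AI_ANALYSIS_MODEL = 'claude-sonnet-4-6';

// callModel is a hypothetical injected function that performs the API
// request and resolves with the model's text.
async function generateAnalysisMemo(callModel, prompt) {
  try {
    return await callModel({
      model: AI_ANALYSIS_MODEL,
      max_tokens: 3000,
      prompt,
    });
  } catch (err) {
    // Deliberately no retry on the data model: a visible error beats a
    // silent downgrade that nobody notices.
    throw new Error(
      `Analysis memo failed on ${AI_ANALYSIS_MODEL}: ${err.message}`
    );
  }
}
```

The caller can catch that error and show it in the analysis tab, so a failure stays visible until the underlying problem is fixed.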

Model strings are facts, treat them like facts

The model IDs live in core.md in a "confirmed model strings" table, with a note: never make up a model string, and if a call 404s, check the table before anything else.

This stops one specific failure: Claude generating a model ID that looks right but isn't — say claude-haiku-4-5 with the date suffix dropped — just because it seemed plausible in context. Every model string in the code has a matching row in that table, confirmed working. Nothing goes into the code until it's in the table first.
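One way to enforce the table in code is a guard that every model string passes through before a call is made. This is a sketch with a hypothetical function name; the two IDs are the ones listed in the model table earlier in this post.

```javascript
// Mirror of the "confirmed model strings" table in core.md.
const CONFIRMED_MODELS = new Set([
  'claude-haiku-4-5-20251001',
  'claude-sonnet-4-6',
]);

// Return the model string unchanged if it is in the table, else throw.
function assertConfirmedModel(model) {
  if (!CONFIRMED_MODELS.has(model)) {
    throw new Error(
      `Unconfirmed model string: ${model}. Check the table in core.md.`
    );
  }
  return model;
}
```

With this in place, a plausible-looking ID such as `claude-haiku-4-5` fails at the guard with a message pointing at the table, before any request is made.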

What ties both together

MCP choice and model choice come down to the same rule: don't pay for capability you don't need. Connect a tool only when there's no other way to get what it does. Reach for the bigger model only when the work is real synthesis, not schema-filling. The rest is overhead.

What we're still figuring out

This series is the stuff we've actually settled. There's more we're in the middle of, and we'll write it up once we trust the answers:

- prompt structure and where the line really sits between the "data" you hand the model and the "instructions" you give it;
- pre-warming the cache on server restart versus leaning on web search for freshness (we think deterministic warming wins, and we built it into the SectorFlow automator);
- when Claude Code is the wrong tool and a plain API call with a structured prompt is the right one.

We'll post each one when we've got a conclusion we believe.

The Skills File Pattern

kavyarani7 — Tue, 16 Jun 2026 04:33:48 +0000

SectorFlow Engineering Series · Part 2 of 3 · Read Part 1 first: Token Efficiency in Claude Code

How we stopped appending to CLAUDE.md and started importing only what we need.

June 2026 · SectorFlow Engineering

In this series Part 1: Token Efficiency in Claude Code (start here). Part 3: Picking Models and Tools — the MCPs we tried, refused, and why.

The append loop

If you use Claude Code for a while you'll probably land where we did: a CLAUDE.md that got too big and too self-contradictory to trust.

It starts fine. You write down the basics — what the app does, where the server lives, what the API looks like. Then something breaks. Claude picks the wrong model string, or makes a chart without the right color, or ignores a cache rule you mentioned in passing three weeks ago. So you write the rule down and append it. Reasonable.

And it works for a while. Then the file's big, the week-two rules are sitting on top of the week-six rules, and when two of them disagree — they will, because requirements move and old notes go stale — the model either reconciles them into mush or just picks one. You spend an hour debugging output before you realize the rule you thought was active got overridden by something you wrote months ago and forgot about.

The bigger issue: the whole file loads every session, no matter what you're doing. A color fix on the frontend doesn't need the deployment runbook. An API route change doesn't need the chart color rules. Every line that isn't relevant is costing tokens and adding noise.

We thought about how to fix this without just deleting the stuff we'd need again later, and ended up with what we call the skills file pattern. Split the context into separate files by purpose, import only what the task needs.

The structure

CLAUDE.md stops being the rulebook and becomes a table of contents. The actual rules move into separate files under .claude/.

File	What it owns	When it loads
`core.md`	Absolute constraints: model strings, API key rules, cache TTLs, data contracts	Every session — the foundational constraints
`design.md`	Color tokens, chart types, component patterns, loading-state conventions	UI and frontend tasks only
`infrastructure.md`	Hosting config, environment variables, deployment rules, automator wiring	Infrastructure and CI tasks
`workflow.md`	Ticket format, commit rules, token rules, division of labor	Every coding session
`architecture.md`	ADRs, scaling thresholds, rejected technical decisions	Architecture reviews and new integrations

The import syntax in CLAUDE.md is nothing special:

@sector-dashboard/.claude/core.md
@sector-dashboard/.claude/design.md
@sector-dashboard/.claude/workflow.md
(infrastructure.md and architecture.md load only when the task references them)

So a UI task loads core + design + workflow. An infra task loads core + infrastructure + workflow. architecture.md only shows up if someone actually asks about a scaling decision or an ADR. The context stays small and stays on topic.
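As a model of that selection, here is a small sketch. Claude Code resolves the `@` imports itself, so nothing like this runs in practice; the task-type keys are hypothetical labels used only to make the mapping concrete.

```javascript
// Files that load for every coding session, per the table above.
const ALWAYS_LOADED = ['core.md', 'workflow.md'];

// Extra files per task type. The keys are hypothetical labels.
// architecture.md is left out: it loads only when a task references it.
const TASK_SPECIFIC = {
  ui: ['design.md'],
  infra: ['infrastructure.md'],
};

function filesForTask(taskType) {
  const extra = TASK_SPECIFIC[taskType];
  if (!extra) {
    throw new Error(`Unknown task type: ${taskType}`);
  }
  return [...ALWAYS_LOADED, ...extra];
}

console.log(filesForTask('ui').join(', ')); // core.md, workflow.md, design.md
```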

What goes where

core.md — the constitution

core.md is the one file that loads every single session, so everything in it has to be true and currently enforced. We treat it like a constitution. Stable, authoritative, never wishful.

Two kinds of things go in:

- Facts we've verified: model IDs we've confirmed work, endpoint shapes, cache key names, exact token limits per endpoint.
- Hard constraints: rules you don't break without a deliberate, written-down decision (where the API key goes, no swapping models, TTL values).

Don't put aspirations in core.md. "We should try to…" or "ideally the model would…" — that's workflow.md material. core.md is only for things that are true right now. The moment you put a hope in there, reality eventually contradicts it and now you've got a lie sitting in your most-trusted file.

design.md — the visual contract

Every hex value, every chart-type assignment, every loading-state pattern goes in design.md. Three reasons it pays off:

- A new engineer, or a fresh Claude session, doesn't have to reverse-engineer the visual conventions out of the component code.
- When we changed the color logic on the 52-week range bar, we edited one table in design.md. No component files had to change just to record the rule.
- We can reject a PR where Claude hardcoded a hex instead of using the documented token, because the rule is written down.

design.md is also where we list what not to touch. TradingViewChart.jsx is wrapper-only. signals.js has its own color system and must not get the new --sf- variables. Those "don't" rules matter as much as the conventions do.

workflow.md — who does what

This is the file that matters most day to day. It spells out what Claude does and what the engineer does, in enough detail that there's no room to wonder. Part 1 has the full reasoning, but the short version is: if the engineer can do it in under two minutes in a terminal, it shouldn't go to Claude through an MCP.

workflow.md also holds the task format we make engineers use when they hand work to Claude:

Required field	Why it exists
`Branch: {name}`	Stops Claude asking or guessing — one less tool call
`Files to CREATE: [list]`	Scopes creation, stops new files getting added speculatively
`Files to MODIFY: [list + one-liner]`	Keeps Claude from reading unrelated files for context
`What to build: [criteria]`	The single source of truth for when it's done
`Do NOT read: [not listed]`	An explicit boundary — absence isn't permission

If any of those fields is missing, Claude asks before it touches a file. That kills the usual failure mode where it reads three or four files "to understand the codebase" before writing a single line.
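The same check can be run on the engineer's side before the prompt is pasted. This sketch assumes the field labels appear literally in the task text, exactly as written in the table; `missingFields` is a hypothetical helper, not part of workflow.md.

```javascript
// Field labels taken from the table above.
const REQUIRED_FIELDS = [
  'Branch:',
  'Files to CREATE:',
  'Files to MODIFY:',
  'What to build:',
  'Do NOT read:',
];

// Return the labels that do not appear anywhere in the task text.
function missingFields(taskText) {
  return REQUIRED_FIELDS.filter((field) => !taskText.includes(field));
}
```

An empty result means the task is ready to hand over; anything else is the list of questions Claude would otherwise have to ask.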

architecture.md — decisions and why

Architecture decisions go in as ADRs — the decision, what led to it, what it costs us. We record the rejected ones too, and those turn out to be the more useful part.

When some future session asks "why not GraphQL?" or "should we add WebSockets for live data?", the answer is already sitting there: considered, rejected, here's why. Claude doesn't reopen it and neither do we.

The rule that keeps it from rotting

One rule keeps the split files from turning into the same contradictory mess as the old monolith:

One file owns one domain. If a rule could live in two files, it goes in the more constrained one — usually core.md. And if you can't tell which file a rule belongs in, it's probably too vague to be a rule.

In practice: color tokens go in design.md, not core.md. Cache TTLs go in core.md, not architecture.md. Division of labor is workflow.md, not core.md. Once the domains are clear the contradictions jump out — if the same thing shows up in two files, one of them is wrong.
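The post doesn't say how duplicates get spotted. One way to check, assuming each file's topics can be listed as plain strings (from headings, tags, or by hand), is a small overlap finder. `findSharedTopics` and its input shape are hypothetical.

```javascript
// topicsByFile maps a file name to the topics it covers. How topics are
// extracted is left open; that part is an assumption of this sketch.
function findSharedTopics(topicsByFile) {
  const owners = new Map();
  for (const [file, topics] of Object.entries(topicsByFile)) {
    // A Set avoids counting a topic twice within the same file.
    for (const topic of new Set(topics)) {
      if (!owners.has(topic)) owners.set(topic, []);
      owners.get(topic).push(file);
    }
  }
  // Any topic with more than one owner breaks the one-domain rule.
  return [...owners]
    .filter(([, files]) => files.length > 1)
    .map(([topic, files]) => ({ topic, files }));
}
```

Every entry in the result is a topic claimed by more than one file, which by the rule above means one of them is wrong.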

What changed after

- Startup token cost down about 60% on a typical task.
- Contradictions are findable now: the same topic in two files gets flagged.
- New rules have an obvious home, and figuring out the home forces you to be clear about what kind of rule it even is.
- You can review the files on their own: a frontend change goes up against design.md and nothing else.

It's not a complicated setup. Five files and a manifest. The work is in holding the one-domain line: the moment a file starts feeling like it's about two things, split it.

Continue reading

Part 3: Picking Models and Tools — the MCPs we tried, refused, and why.

Token Efficiency in Claude Code

kavyarani7 — Tue, 16 Jun 2026 04:31:45 +0000

SectorFlow Engineering Series · Part 1 of 3 · Parent article

Notes on where our context budget was actually going, and what we did about it.

June 2026 · SectorFlow Engineering

In this series Part 2: The Skills File Pattern — fixing CLAUDE.md bloat with imports. Part 3: Picking Models and Tools — the MCPs we tried, refused, and why.

The problem nobody warns you about

Claude Code can do a lot. The catch is that all of it runs on context, and you don't get much of that.

When we started SectorFlow we did the obvious thing. Kept a CLAUDE.md at the repo root, and every time something went wrong — wrong model string, a cache TTL that didn't match, a chart that came out looking off — we'd write a rule and stick it on the end. The file kept growing. We didn't really clock it as a problem until it was one.

By about week six it was 400 lines. Every session loaded the whole thing. Frontend rules sitting next to deployment runbooks sitting next to database decisions, none of it sorted. And because we'd added the rules one at a time over weeks, some of them flatly disagreed with each other. Claude would follow the new one, or the old one, or try to split the difference. We got something wrong either way.

I want to be clear this isn't a Claude Code problem. It's on us, and it's fixable. But fixing it meant we had to stop treating CLAUDE.md like a junk drawer.

The thing that actually hurts isn't the per-token price. It's that every token spent loading context is a token you don't get back for the work. Burn 30,000 on setup and you've got far less room to write code than if you'd burned 5,000. You hit the ceiling partway through a file and whatever you were in the middle of is just gone.

Context is a budget

Here's the shift, and it's simple once you see it: anything Claude reads at the start of a session is something it can't use later for code. Most projects pile everything into CLAUDE.md on the theory that the model might need it someday. We flipped the question. What does the model need for this task? Load that. Skip the rest.

Two rules came out of it:

- Precision over completeness. A small context that's right does more for you than a big one trying to cover all the bases.
- Load on demand. Structure things so only the relevant part shows up for a given task.

Those turned into three actual practices, and each one gets its own article in this series. This one is just the overview — what we measured and why it matters.

What we actually changed

1. Session startup cost

Every session loads CLAUDE.md plus whatever it imports. Before, that was the one 400-line file, every time, regardless of the task. After we split it into separate skill files, a UI task pulls in core.md (the constraints), design.md (the visual stuff), and workflow.md (the session rules), and nothing else. An infra task gets core.md, infrastructure.md, and workflow.md. Startup cost dropped about 60%.

2. Reading tickets

We had the Linear MCP hooked up so Claude could read tickets itself. Nice in theory. But one list_issues call runs about 3,500 tokens, and the whole read-it / mark-done / comment loop is around 9,000. So now the engineer just pastes the acceptance criteria. That's maybe 400 tokens. The 8,600 difference doesn't sound like much until you multiply it across 60-plus tickets — that's something like 7 or 8 full context windows handed back to the actual work.

3. Reading files it was never asked to read

Left alone, Claude reads files to get its bearings, sometimes three or four of them before it writes a line. So we made a rule: only read files the task names. Need to find a function? grep for it, then view just those lines. Don't open a file to soak up "context." If something's actually missing, ask. Saves 2,000–4,000 tokens on a complex task.

4. Spinning up the dev server to check things

Verifying a change by eye means starting the server, waiting, navigating, screenshotting, evaluating — a whole chain of calls. For anything you can't see in a browser, like server logic or data contracts or route handlers, that chain tells you nothing. So we only do the visual check when the change is something a person could actually see in a browser. For syntax we run node --check. One Bash call.

The numbers

Source of overhead	Before	After	Saving
Session context load	~400 lines, every session	60–120 lines, task-specific	~60%
Ticket ingestion (per ticket)	~9,000 tokens via MCP	~400 tokens via paste	~8,600 tokens
File reads per task	3–5 files speculatively	Named files only	2,000–4,000 tokens
Verification overhead	Dev server + screenshot	`node --check` only	4–6 tool calls

Each of these on its own is fine, nothing dramatic. Put together they change what fits in a session. Stuff that used to take two or three sessions now usually fits in one. That's the whole point.

How the parts fit

The other two articles each take one piece of this:

- Part 2, the skills file pattern, is about the startup cost and the contradicting-rules mess. It's where the import-on-demand structure comes from.
- Part 3, models and tools, covers the ticket overhead and the file-reading habit, plus which MCPs we said no to and where we drew the line between Haiku and Sonnet.

Read this one first for the why. Then either of the others for the how.

One thing to keep in mind

Claude Code does its best work when the context is small, accurate, and honest about what's actually known versus what you're hoping for. Vague in, vague out. And a context file that tries to cover everything ends up covering nothing properly.

Continue reading

Part 2: The Skills File Pattern — fixing CLAUDE.md bloat with imports.
Part 3: Picking Models and Tools — the MCPs we tried, refused, and why.

We Were Paying 3.75x More Than Necessary on Every AI API Call — Here's How We Found It

kavyarani7 — Mon, 08 Jun 2026 20:45:57 +0000

Our Anthropic bill was higher than expected. Nobody on the team knew exactly why. So we built a scanner and ran it on our own codebase. First thing it found:

What We Found

server/services/divergence-detector.js was using claude-sonnet-4-6 with max_tokens=150 to generate 2-sentence explanations. Every night. On every divergence found.

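import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Flagged: Sonnet used for a two-sentence explanation capped at 150 tokens
const response = await client.messages.create({
  model: 'claude-sonnet-4-6',
  max_tokens: 150,
  messages: [{ role: 'user', content: prompt }] // prompt is defined elsewhere in the file
});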

Sonnet costs $15/M output tokens. Haiku costs $4/M. For a 2-sentence output there is zero quality difference. We were paying 3.75x more on every single call and nobody noticed.
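The ratio and the per-call ceiling follow directly from those two prices. A short worked example, using only the output prices quoted above (input-token pricing isn't quoted in the post, so it is left out):

```javascript
// Output prices in USD per million tokens, as quoted in the post.
const SONNET_OUTPUT_PER_M = 15;
const HAIKU_OUTPUT_PER_M = 4;

// Cost in USD of generating a given number of output tokens.
function outputCostUsd(outputTokens, pricePerMillion) {
  return (outputTokens * pricePerMillion) / 1e6;
}

const ratio = SONNET_OUTPUT_PER_M / HAIKU_OUTPUT_PER_M;
console.log(ratio); // 3.75

// Upper bound per call, since max_tokens caps output at 150 tokens.
const sonnetMax = outputCostUsd(150, SONNET_OUTPUT_PER_M);
const haikuMax = outputCostUsd(150, HAIKU_OUTPUT_PER_M);
```

Per call the ceiling is about $0.00225 on Sonnet against $0.0006 on Haiku. That is small enough that nobody notices one call, and it only shows up once the calls are multiplied across every divergence, every night.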

What We Built

A GitHub Action that catches this automatically on every PR — before it merges.

Works with Anthropic, OpenAI, Gemini, Bedrock, and LangChain. JS and TS supported. Zero dependencies.
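To give a feel for the kind of rule involved, here is a hedged sketch of one check. It is not the scanner's actual logic: the function name, the regex matching, and the idea that a cutoff applies to `max_tokens` are all assumptions. The default of 500 borrows the number from the workflow snippet in the next section, though the post doesn't say what `threshold` measures.

```javascript
// Hypothetical rule: flag calls that pair a Sonnet model with a small
// max_tokens value. Only handles `model` appearing before `max_tokens`
// within the same call, and only string-literal model names.
function findExpensiveShortCalls(source, maxTokensThreshold = 500) {
  const findings = [];
  const callPattern =
    /model:\s*['"]([^'"]+)['"][\s\S]{0,200}?max_tokens:\s*(\d+)/g;
  for (const match of source.matchAll(callPattern)) {
    const model = match[1];
    const maxTokens = Number(match[2]);
    if (model.includes('sonnet') && maxTokens < maxTokensThreshold) {
      findings.push({ model, maxTokens, index: match.index });
    }
  }
  return findings;
}
```

Run against the divergence-detector call shown earlier, this returns one finding: `claude-sonnet-4-6` with `max_tokens` of 150. A regex pass like this will miss models held in constants, which a more robust check would resolve by parsing the source.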

Add It in 2 Minutes

- uses: kavyarani7/ai-arch-scanner@v1
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    threshold: '500'

🔗 GitHub Marketplace
🔗 Repo

What's the most expensive AI pattern you've found in your codebase? Drop it in the comments.

Hi, I'm kavyarani7

kavyarani7 — Sat, 08 Apr 2017 10:45:43 +0000

I have been coding for 9 years.

You can find me on GitHub as kavyarani7

I live in Bengaluru.

I work for Ametek.

Nice to meet you.