DEV Community: speedy_devv

Claude for Small Business

speedy_devv — Mon, 18 May 2026 19:27:09 +0000

Anthropic announced Claude for Small Business on May 13, 2026, and the headlines made it sound like a new SKU.

It is not. There is no new pricing tier. There is no separate plan to buy.

It is a feature toggle inside Claude Cowork that turns on 15 prebuilt workflows wired to QuickBooks, PayPal, HubSpot, and a few other tools you probably already pay for. If you have Claude Pro at $20 per month, you already have it. Open the desktop app, find Cowork, flip the switch.

That is the whole product. The rest of this post is what it actually does, who it fits, and where it falls short.

Why the confusion is fair

Anthropic ran a real press cycle. TechCrunch, Axios, 9to5Mac. They kicked off a 10-city Claude SMB Tour in Chicago on May 14. The launch page opens with a stat about small businesses being 44 percent of US GDP. That looks like a SKU launch.

The product is real. The pricing line is just not what people assumed.

If you were searching for "claude for small business pricing" expecting a new plan around $50 or $100 per month, the answer is zero dollars on top of your existing Claude subscription.

What the toggle actually does

You sign in with your existing Claude account, open Cowork, and a Small Business mode appears next to your usual chat. That mode loads a library of prebuilt workflows ready to fire against your connected business tools.

Anthropic calls them agentic workflows because each one is an end-to-end job, not a one-shot prompt. Chasing 12 unpaid invoices is one job. Doing the month-end close is another. The workflow knows the steps, asks for the data it needs, and stops at every action that would touch real money or send a message.

The approval gate matters. Per Anthropic and TechCrunch, you initiate each workflow, approve the plan, and sign off before anything gets sent, posted, or paid. Claude only sees what your connected account already lets it see. No new permissions are added on its behalf.
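A minimal sketch of the approval gate pattern described above. It assumes each workflow step can be tagged as either read-only or as something that sends, posts, or pays. The function and step names are hypothetical; this illustrates the pattern, not Anthropic's implementation.

```python
# Hypothetical sketch of a gated agentic workflow; not Anthropic's code.
from typing import Callable

Step = tuple[str, bool]  # (step name, does it send/post/pay something?)

def run_workflow(steps: list[Step], plan_approved: bool,
                 sign_off: Callable[[str], bool]) -> list[str]:
    """Run steps in order, pausing for human sign-off before any side effect."""
    if not plan_approved:
        return ["plan rejected"]
    log = []
    for name, has_side_effect in steps:
        # Read-only steps run freely; anything that sends, posts,
        # or pays waits for an explicit yes.
        if has_side_effect and not sign_off(name):
            log.append(f"stopped before {name}")
            break
        log.append(f"ran {name}")
    return log

invoice_chase = [
    ("pull unpaid invoices", False),
    ("draft reminder emails", False),
    ("send reminder emails", True),
]
print(run_workflow(invoice_chase, plan_approved=True,
                   sign_off=lambda name: False))
```

With sign-off refused, the two read-only steps run and the job halts before anything is sent.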

The 15 workflows in plain English

Six categories: finance, operations, sales, marketing, HR, customer service. The named examples from the launch page and Axios coverage:

Payroll planning
Monthly close
Cash-flow analysis
Month-end prep
Tax organization
Campaign management
Lead triage
Invoice chasing
Contract review

Underneath sit 15 reusable skills. Skills are smaller pieces a workflow can call. Margin analysis, content strategy, and business pulse reporting are the three Anthropic names explicitly. The rest are not enumerated on the public launch page.

The split is the same mental model as Claude Code. Workflows are jobs. Skills are tools the jobs use. You stay in the driver's seat.
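The workflow/skill split can be pictured as a job composed from smaller callable pieces. This is a toy sketch: the two skill names come from Anthropic's list, but their bodies, the `run_job` helper, and the `monthly_review` job are hypothetical placeholders.

```python
from typing import Callable

Skill = Callable[[dict], dict]

# Skills: small reusable units. These bodies are toy placeholders.
def margin_analysis(data: dict) -> dict:
    revenue, cost = data["revenue"], data["cost"]
    return {**data, "margin": (revenue - cost) / revenue}

def business_pulse(data: dict) -> dict:
    return {**data, "pulse": f"margin {data['margin']:.0%}"}

# Workflow: an end-to-end job that calls skills in order.
def run_job(skills: list[Skill], data: dict) -> dict:
    for skill in skills:
        data = skill(data)
    return data

monthly_review = [margin_analysis, business_pulse]  # hypothetical job
result = run_job(monthly_review, {"revenue": 50_000, "cost": 35_000})
print(result["pulse"])
```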

The integrations

Confirmed by Anthropic's launch page plus TechCrunch, Axios, and Quartz coverage:

QuickBooks for invoices, expenses, P and L, payroll data
PayPal for payments, payouts, transaction lookups
HubSpot for contacts, deals, lead routing, campaigns
Canva for brand assets and marketing templates
DocuSign for contracts and signature requests
Google Workspace for Gmail, Drive, Calendar
Microsoft 365 for Outlook, Excel, OneDrive, Teams

Slack, Square, Stripe, and Webflow show up in supporting coverage. Treat that second set as expected rather than confirmed in the official Anthropic post.

A fair critique making the rounds: the real customer for this launch is QuickBooks. Integration partners get lock-in. Small businesses get features that work as long as they keep paying both Claude and the partner tool. You can decide if that bothers you. It is true either way.

How much it actually costs

Nothing extra. Direct quote from TechCrunch: "no extra charge for Claude for Small Business beyond the cost of Claude licenses and whatever partner tools a business already pays for, such as QuickBooks, PayPal or HubSpot."

You still need the underlying Claude subscription. Today's published prices from claude.com/pricing:

Claude Pro is $20 per month, single seat, the cheapest path to the toggle.

Claude Max is $100 or $200 per month, single seat, with higher usage caps.

Claude Team Standard is $20 per seat per month on annual billing, $25 monthly, five seat minimum, with 1.25x the per-session usage of Pro.

Claude Team Premium is $100 per seat per month annual, $125 monthly, five seat minimum, with 6.25x usage and Claude Code included.

Claude Enterprise is custom pricing with SSO, audit logs, and custom retention.

A 15-person HVAC team on Team Standard pays 15 x $20 x 12, so $3,600 per year. Add QuickBooks Online Plus at roughly $1,080 per year for a multi-user seat. You are at $4,680 in software for a fully connected setup before you book a single job. The Small Business toggle adds nothing on top.
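The HVAC arithmetic as a small function, so you can swap in your own headcount. It assumes Team Standard's annual-billing rate and five-seat minimum from the price list above. The QuickBooks figure is the text's rough estimate, passed in as a parameter.

```python
def annual_software_cost(seats: int, per_seat_month: int = 20,
                         min_seats: int = 5,
                         partner_tools_per_year: int = 0) -> int:
    """Yearly USD cost of a Team plan plus partner tools.

    The Small Business toggle itself adds $0, so it does not appear here.
    """
    billed_seats = max(seats, min_seats)  # Team plans bill at least five seats
    return billed_seats * per_seat_month * 12 + partner_tools_per_year

print(annual_software_cost(15))                                # Claude only
print(annual_software_cost(15, partner_tools_per_year=1_080))  # plus QuickBooks
```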

Pro vs Max vs Team, in one paragraph

The toggle is the same on every tier. You get the same 15 workflows on Pro that you get on Enterprise. What changes is the underlying plan. Pro fits a solo operator running the workflows alone. Max fits a power user with heavy daily volume. Team Standard fits a 5 to 30 person business that needs a shared workspace. Team Premium fits a 5 to 50 person business that also wants Claude Code seats and 6.25x usage. Enterprise fits 50-plus headcount and regulated industries that need SSO and audit logs.

A higher tier does not unlock more workflows. It buys you room to run more of them, more often, with more people, and with stronger data controls.
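One way to read the tier paragraph as a decision rule. The headcount bands come from the text; the order of the checks, and sending 31 to 49 person teams to Team Premium (the only band in the paragraph that covers them), are assumptions. Whatever it returns, the workflows are the same.

```python
def suggest_tier(headcount: int, heavy_daily_use: bool = False,
                 wants_claude_code: bool = False,
                 needs_sso_or_audit: bool = False) -> str:
    """Map a business profile to a Claude plan, per one reading of the text."""
    if needs_sso_or_audit or headcount >= 50:
        return "Enterprise"
    if headcount == 1:
        return "Max" if heavy_daily_use else "Pro"
    if wants_claude_code or headcount > 30:
        return "Team Premium"
    # Teams of 2 to 4 still land here and pay the five-seat minimum.
    return "Team Standard"

print(suggest_tier(15))
print(suggest_tier(15, wants_claude_code=True))
```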

What the launch left out

The Hacker News thread on the launch had over 500 points and 450 comments within hours. Three concerns came up over and over.

Privacy. Anthropic states it does not train on Team and Enterprise data by default. Pro is weaker. HN commenters specifically flagged that Claude Cowork is not under Zero Data Retention. If you are about to connect a QuickBooks account with payroll in it, read your tier's privacy terms first.

Compliance. There is no HIPAA Business Associate Agreement for Cowork at launch, and no documented audit trail of every action a workflow took. For regulated industries, that is a hard stop.

Vibecoding fragility. The agentic workflow pattern works when the connected tool API is stable and the data is clean. When QuickBooks throws an error mid-run or HubSpot rate-limits a sync, the workflow can stall in a way the average non-developer cannot debug.

None of this is a reason to skip the toggle. It is a reason to start with low-stakes workflows and expand once you trust what it does.

How to set it up

You need three things. A Claude Pro, Max, or Team subscription. The Claude desktop app. The partner accounts you want to connect.

Then:

Open the Claude desktop app and click into Cowork.
Find the Small Business toggle in settings and turn it on.
Connect each integration one at a time. Each connection runs an OAuth flow.
Open the workflow library and pick one low-stakes job to test, like business pulse reporting.
Read the plan Claude proposes, approve it, and watch the run.
Repeat with the next workflow once you trust the first.

The 10-minute claim from other coverage holds if your partner accounts are already set up. If you are also wiring QuickBooks for the first time, plan an afternoon.

Should you turn it on

Use it if you run a 10-to-100 person business, already pay for QuickBooks or HubSpot, and want to take repetitive back-office work off your team's plate. It is included with the subscription you already pay for. The downside of trying it is approximately zero.

Skip it if you handle HIPAA-regulated data, need a documented audit trail for every action, or run on vertical software that is not in the integration list. Wait for the next release. It will get there.

Automation is the floor. Owning the product is the ceiling.

Claude for Small Business automates the back office. That is the floor.

A 30-person landscaper using the toggle automates invoice chasing in QuickBooks. A 30-person landscaper that wants its own client portal, with scheduling, invoicing, and route optimization built around how the business actually runs, needs to ship custom software. The automation buys back hours. Owning the product changes the business.

If a freelancer once quoted you $15,000 and three months for a small custom app and you put the idea on a shelf, that is the gap to close next. Same Claude account. Same $20 per month subscription. From idea to live product in a weekend, not a quarter.

The toggle is the floor. You automate the busywork in QuickBooks. You free up the team. You decide what to build with the time.

Full breakdown with the source links and FAQ is on the live post: https://www.buildthisnow.com/blog/guide/mechanics/claude-for-small-business

Claude Opus 4.7 vs 4.6

speedy_devv — Mon, 18 May 2026 08:17:03 +0000

The price page hasn't moved. Five dollars per million input. Twenty-five per million output. Identical to Opus 4.6.

Run the same Claude Code call on both, though, and 4.7 hands you a bigger bill. A representative agent call with 25K of context and 4K of output went from $0.225 on 4.6 to $0.2805 on 4.7. That's a 24.7 percent jump for the exact same job, the exact same prompt, the exact same code shipped.

Nothing in the rate card explains it. The new tokenizer does. And on May 14, Claude Code v2.1.142 made 4.7 the fast-mode default, so your next session is paying the new rate unless you opt back.

The Tokenizer Is the Story

Simon Willison ran one system prompt through Anthropic's own count_tokens API on both models. Opus 4.6 returned 5,039 tokens. Opus 4.7 returned 7,335. That's a 1.46x ratio on plain English text, 0.11 above the top of the 1.0x to 1.35x range Anthropic itself published.

OpenRouter then ran the comparison across one million real production requests. Their finding: 12 to 27 percent more billed tokens on typical 2K to 25K prompts after prompt caching. Without caching, 32 to 45 percent more.

Same English in. More tokens counted. Same per-token rate. Bigger bill out.
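Willison's two counts turn into a ratio and a same-rate bill like this. The token counts and the $5 per million input rate are from the text; everything else is plain arithmetic.

```python
# Willison's count_tokens results for one system prompt.
opus_46_tokens = 5_039
opus_47_tokens = 7_335
published_ceiling = 1.35              # top of Anthropic's stated 1.0x-1.35x range
input_usd_per_token = 5 / 1_000_000   # same input rate on both models

ratio = opus_47_tokens / opus_46_tokens
cost_46 = opus_46_tokens * input_usd_per_token
cost_47 = opus_47_tokens * input_usd_per_token

print(f"ratio {ratio:.2f}x, {ratio - published_ceiling:.2f} over the ceiling")
print(f"input bill ${cost_46:.4f} -> ${cost_47:.4f}")
```

Because the per-token rate is identical, the bill grows by exactly the token ratio.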

What the Worked Math Looks Like

A standard Claude Code build call. The agent reads roughly 25K tokens of context (your repo plus a spec) and writes roughly 4K tokens of code.

On Opus 4.6:

input:  25,000 tokens × $5/1M  = $0.125
output:  4,000 tokens × $25/1M = $0.100
total                          = $0.225

On Opus 4.7, the same English source gets recounted by the new tokenizer. OpenRouter measured native input inflation at 1.34x for 25K to 50K prompts, and 13 to 30 percent longer completions at long context. Apply the input ratio and the low end of the completion range (1.13x):

input:  25,000 × 1.34 = 33,500 tokens × $5/1M  = $0.1675
output:  4,000 × 1.13 =  4,520 tokens × $25/1M = $0.1130
total                                          = $0.2805

A 24.7 percent effective increase. Nothing in the menu warned you. The tokenizer ate your margin.
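The two calculations above, reproduced as a small cost function so you can plug in your own call sizes. The inflation factors are the OpenRouter figures quoted in the text, applied uniformly to every token, and caching is ignored, so treat it as a rough model rather than a billing calculator.

```python
INPUT_PER_M = 5.0    # USD per 1M input tokens, both models
OUTPUT_PER_M = 25.0  # USD per 1M output tokens, both models

def call_cost(input_tokens: int, output_tokens: int,
              input_inflation: float = 1.0,
              output_inflation: float = 1.0) -> float:
    """USD cost of one call after scaling token counts by tokenizer inflation."""
    billed_in = input_tokens * input_inflation
    billed_out = output_tokens * output_inflation
    return (billed_in * INPUT_PER_M + billed_out * OUTPUT_PER_M) / 1_000_000

cost_46 = call_cost(25_000, 4_000)
cost_47 = call_cost(25_000, 4_000, input_inflation=1.34, output_inflation=1.13)
increase = cost_47 / cost_46 - 1
print(f"${cost_46:.4f} -> ${cost_47:.4f} (+{increase:.1%})")
```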

Run that across a working day. A solo founder pushing 200 agent calls daily moves from $45 to $56. About $330 a month in spend you didn't opt into.
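The daily and monthly figures follow from the per-call costs. The "about $330 a month" number corresponds to roughly 30 billed days; the sketch also shows a 22 working-day month, which is an assumption added for comparison.

```python
calls_per_day = 200
cost_46, cost_47 = 0.225, 0.2805  # USD per call, from the worked example

daily_46 = calls_per_day * cost_46
daily_47 = calls_per_day * cost_47
daily_delta = daily_47 - daily_46

print(f"${daily_46:.2f}/day -> ${daily_47:.2f}/day")
# The text's ~$330/month matches 30 billed days; 22 working days is lower.
for days in (30, 22):
    print(f"{days} days: +${daily_delta * days:.0f}/month")
```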

Two cases flip. Long-context calls at 128K and up get prompt caching to absorb roughly 93 percent of the inflation, so the gap shrinks to about 15 percent. Short calls under 2K tokens actually get cheaper by 1.6 percent, because completions on 4.7 are about 62 percent shorter at that size.

The mid range, where most agent calls sit, is where it hurts.

The kicker: prompt caching has to be on for the 12 to 27 percent figure to hold. Without it you're staring at 32 to 45 percent more billed tokens for the same English text. A lot of indie setups don't have caching wired up because the docs treat it as a power-user feature. On 4.7 it's not optional anymore. It's the difference between a quiet price hike and a loud one.

The Default Flip Nobody Warned You About

Claude Code v2.1.142 shipped on 2026-05-14. The changelog buries the change two bullets deep: fast mode now runs on Opus 4.7 by default. Every /fast you typed yesterday on 4.6 fires on 4.7 today.

Fast mode is six times the per-token price of standard mode. Multiply that by the roughly 1.25x effective increase from the worked example and a fast call on 4.7 now bills close to seven and a half times the cost of a standard 4.6 call, depending on prompt size.
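A back-of-envelope check of the fast-mode multiplication, using only figures already in this post. It assumes the 6x fast-mode premium stacks multiplicatively with the tokenizer effect. The second line uses Willison's raw 1.46x ratio, which measured input text only, as a rough upper case.

```python
FAST_MODE_MULTIPLIER = 6.0            # fast vs standard per-token price
effective_increase = 0.2805 / 0.225   # 4.7 vs 4.6 on the 25K-in/4K-out call
raw_tokenizer_ratio = 7_335 / 5_039   # Willison's plain-English count

print(f"mid-size call: {FAST_MODE_MULTIPLIER * effective_increase:.2f}x")
print(f"input-only upper case: {FAST_MODE_MULTIPLIER * raw_tokenizer_ratio:.2f}x")
```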

You can pin it back. Anthropic shipped a new env var alongside the default flip:

# Pin Claude Code fast mode to Opus 4.6 (not the new 4.7 default).
# Active as of Claude Code v2.1.142 (2026-05-14).
export CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1

# Verify:
claude --version   # expect 2.1.142 or later
claude /fast       # confirm fast mode is on
# Fast mode now runs on Opus 4.6 regardless of any other env var.

A few notes on the flag. It takes precedence over CLAUDE_CODE_ENABLE_OPUS_4_7_FAST_MODE if both are set. Drop it in ~/.zshrc for global stickiness, or your project .env for per-repo control. Confirm it still ships in your version with:

claude --help | grep OPUS_4_6

Anthropic typically keeps these flags around for six to twelve months after a default flip.

Where 4.7 Earns Its Keep

The benchmark gains are real, and they cluster.

SWE-bench Pro climbed +10.9 points (53.4 to 64.3). That's the largest jump on the table for hard, multi-file agent coding work. CharXiv (no tools) gained +13.0 points (69.1 to 82.1), so chart and figure parsing got a real upgrade. SWE-bench Verified, Terminal-Bench 2.0, GPQA Diamond, OSWorld-Verified all moved up too.

One regression stands out. BrowseComp, the agentic web research benchmark, dropped from 83.7 to 79.3 percent. That's 4.4 points worse on 4.7 than on 4.6.

Stay on 4.7 when the call profile matches at least one of these:

Hard SWE work where the +10.9 SWE-bench Pro lift compounds across long tasks
High-resolution images, charts, or PDFs where +13.0 CharXiv shows up in your output
Prompts at 128K and up where caching captures most of the tokenizer inflation
Plan-then-execute agents where one stronger plan saves five weaker build cycles

When to Pin Back to 4.6

Three workloads regress or break even on 4.7 once you account for cost. Override fast mode back to 4.6 when:

Prompts sit between 2K and 25K and you don't use prompt caching. This is the worst tokenizer cost zone.
The job involves agentic web research. BrowseComp dropped 4.4 points.
You're on a Claude Max plan and your session caps now burn down 1.3 to 3 times faster than they did last month.

Evaluator and judgment work doesn't need the SWE-bench Pro lift. Linters and doc writers don't gain from CharXiv. Those agents produce comparable output on 4.6 at a real discount.
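The "stay on 4.7" and "pin back to 4.6" lists can be folded into one routing function. This is a sketch under assumptions: the profile fields and the `opus-4.x` labels are hypothetical, not official model IDs. Where the lists conflict (say, a hard SWE call on an uncached 10K prompt), this version favors 4.7, matching the advice to keep builders there.

```python
from dataclasses import dataclass

@dataclass
class CallProfile:
    """Hypothetical description of one agent call."""
    prompt_tokens: int
    cached: bool = False
    hard_swe: bool = False
    visual_inputs: bool = False   # images, charts, PDFs
    plan_then_execute: bool = False
    web_research: bool = False

def pick_model(p: CallProfile) -> str:
    # BrowseComp regressed on 4.7, so agentic web research stays on 4.6.
    if p.web_research:
        return "opus-4.6"
    # Any single "stay on 4.7" condition is enough.
    long_and_cached = p.prompt_tokens >= 128_000 and p.cached
    if p.hard_swe or p.visual_inputs or p.plan_then_execute or long_and_cached:
        return "opus-4.7"
    # Everything else, including the uncached 2K-25K worst-cost zone, goes to 4.6.
    return "opus-4.6"

print(pick_model(CallProfile(prompt_tokens=12_000)))
print(pick_model(CallProfile(prompt_tokens=12_000, hard_swe=True)))
```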

Per-Agent Model Selection Is the Real Move

Most teams pick one model per project. The 4.7 cost shift makes per-agent selection the better default.

A planner reads a spec, decides the file layout, picks the build order. SWE-bench Pro +10.9 lands here. Keep planners on 4.7.

A backend agent or designer writes code. Same SWE-bench Pro lift. Keep them on 4.7.

An evaluator reads code and finds gaps. Judgment work. Pin to 4.6 with the override flag and pocket the difference.

A linter, a formatter, a doc writer. Mechanical work. Pin to 4.6.

A test agent runs the test plan and reports back. Mechanical too. Pin to 4.6.

Five agents pinned to 4.6. Three on 4.7. The build still gets the planner and builder lifts where they matter. The cheap loops stay cheap.
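The per-agent split as a plain mapping, with a count to confirm the five-and-three tally. Role names and model labels are placeholders. How you attach a model to each agent depends on your tooling (Claude Code subagent definitions accept a `model` field in their frontmatter; check which values your version supports). As described earlier, the fast-mode override is a single environment variable, so on its own it appears to pin every fast-mode call rather than individual roles.

```python
from collections import Counter

# Role-to-model assignments from the per-agent breakdown.
# Model labels are placeholders, not official model IDs.
ROLE_MODEL = {
    "planner": "opus-4.7",
    "backend": "opus-4.7",
    "designer": "opus-4.7",
    "evaluator": "opus-4.6",
    "linter": "opus-4.6",
    "formatter": "opus-4.6",
    "doc-writer": "opus-4.6",
    "test-agent": "opus-4.6",
}

counts = Counter(ROLE_MODEL.values())
print(counts)
```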

The Verdict

Same rate card. Different bill.

Opus 4.7 prints better numbers on the benchmarks that matter for hard agent coding. It costs more for the same English prompt almost everywhere except the very short tail. And the default fast-mode flip means the new bill is already running on your laptop right now if you haven't set the override.

Pin 4.6 for evaluators, linters, and doc writers. Keep 4.7 for planners and SWE-heavy builders. Set the override flag once, forget about it, and let each agent role buy what it actually needs.

Full benchmark table, sources, and the override config: https://www.buildthisnow.com/blog/models/claude-opus-4-7-vs-4-6

How Garry Tan (YC CEO) Uses Claude Code: Inside the 23-Tool gstack Setup

speedy_devv — Sun, 17 May 2026 17:00:29 +0000

"I just open-sourced my entire Claude Code setup I used to average 10K LOC and 100 PRs per week in the last 50 days."

That was Garry Tan, President and CEO of Y Combinator, on March 12, 2026. He dropped a repo called gstack. As of mid-May it has 97k stars, 14.5k forks, and is still pulling 915 stars in the last 24 hours.

Half of X is calling it god mode. The other half is calling it a folder of prompts in a text file. Both sides are partially right. Neither side is showing you the actual table of what is in there.

This post does.

What gstack Actually Is

gstack is an open-source pack of opinionated Claude Code skills. MIT license. One install line. Each slash command embodies a specialist persona with its own priorities, constraints, and outputs. The work moves through a fixed loop: Think, Plan, Build, Review, Test, Ship, Reflect.

As of 2026-05-15, the README ships 23 core skills. The count grew from 6 at launch, to 13 by the TechCrunch coverage on March 17, to 23 today.

That progression matters. The repo is moving fast. Pin the commit you cloned if you want a stable reference.

Why The Source Matters

YC has funded 4,000+ companies. The CEO running that engine is publishing the exact skill setup he uses to ship code. That is the rarest possible signal for a Claude Code workflow.

You can argue with the design. You cannot argue with who shipped it.

The 10K LOC and 100 PRs per week number is self-reported. Frame it that way. The shape of the workflow is the part you can copy.

The Role-Based Architecture

gstack groups every skill under a job title. A role owns priorities, constraints, and outputs. Free-form prompting asks Claude to wear many hats inside one message. Roles split the hats and pass work between them.

Six core roles plus a Chief Security Officer:

CEO / Founder: Scope, product framing, what to build
Designer: Visual system, mockups, AI slop detection
Eng Manager: Architecture, data flow, edge cases
Release Manager: Sync, test, push, PR, deploy
Doc Engineer: Docs in sync with shipped code
QA Lead: Real browser testing, bug fixes
Chief Security Officer: OWASP Top 10 plus STRIDE threat modeling

The loop reads as one sentence: Think (Office Hours plus CEO review), Plan (Eng plus Design plus DevEx review), Build, Review, Test (QA plus CSO), Ship, Reflect (retro).

The 23 Tools

Sourced verbatim from the gstack README and docs/skills.md, accessed 2026-05-15.

Think and Plan

/office-hours (YC Office Hours): Six forcing questions that reframe the product before any code
/plan-ceo-review (CEO / Founder): Scope review across four modes: expansion, selective, hold, reduction
/plan-eng-review (Eng Manager): Architecture, data flow, diagrams, edge cases, test matrix
/plan-design-review (Senior Designer): 0 to 10 ratings per design dimension, flags AI slop
/plan-devex-review (DX Lead): Interactive developer-experience audit, three modes

Design

/design-consultation (Design Partner): End-to-end design system, research plus mockups
/design-shotgun (Design Explorer): 4 to 6 AI mockup variants with taste-memory learning
/design-html (Design Engineer): Mockup converted to production HTML, ~30KB, zero deps

Review and Investigate

/review (Staff Engineer): Production bug detection with auto-fixes and coverage audit
/investigate (Debugger): Systematic root-cause analysis, traces data flow, three-fix limit
/design-review (Designer Who Codes): Post-ship design audit and atomic-commit auto-fixes
/devex-review (DX Tester): Live onboarding audit with timing and error screenshots

Test

/qa (QA Lead): Browser testing, fixes bugs, generates regression tests
/qa-only (QA Reporter): Bug reports only, no code edits
/cso (Chief Security Officer): OWASP Top 10 plus STRIDE, 17 false-positive exclusions

Ship

/ship (Release Engineer): Sync, test, audit, push, open PR, bootstraps frameworks
/land-and-deploy (Release Engineer): Merge, run CI, deploy, verify production health
/canary (SRE): Post-deploy monitoring, console errors, perf, failures
/benchmark (Performance Engineer): Core Web Vitals, resource sizes, before-and-after diffs

Document and Reflect

/document-release (Technical Writer): Auto-updates docs, Diataxis coverage map
/document-generate (Doc Author): Generates missing reference, how-to, and tutorial docs
/retro (Eng Manager): Weekly retro, per-person breakdowns, streak tracking
/browse (QA Engineer): Real Chromium browser, ~100ms per command

That is the 23-tool count locked to today. The README is moving fast.

A Real Workflow End To End

A clean session moves through six commands in this order: think, plan, build, review, test, ship.

/office-hours runs first. Six forcing questions reframe the feature. You answer in plain English. Output is a sharper scope.

/plan-ceo-review checks scope across four modes: expand, selective, hold, reduce. Often kills work before it starts.

/plan-eng-review drafts architecture, data flow, edge cases, and a test matrix.

You implement against the plan as Claude normally would.

/review scans for production bugs and applies atomic auto-fixes. Coverage audit runs in the same pass.

/qa opens a real Chromium browser, runs the flow, fixes bugs it finds, and writes regression tests.

/ship syncs, tests, audits, pushes, and opens a PR in one chain.

Each command keeps Claude in one role for the duration of that step. That is the design choice the rest of the system is built on.
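The session above reduces to an ordered list of stages. Flattening it shows why it counts as six commands even though one stage (build) has none. A sketch only; the data layout is illustrative and is not how gstack stores anything.

```python
# The end-to-end session as ordered data. "build" has no slash
# command: you implement against the plan as usual.
SESSION = [
    ("think", ["/office-hours"]),
    ("plan", ["/plan-ceo-review", "/plan-eng-review"]),
    ("build", []),
    ("review", ["/review"]),
    ("test", ["/qa"]),
    ("ship", ["/ship"]),
]

commands = [cmd for _, cmds in SESSION for cmd in cmds]
print(len(commands), " -> ".join(commands))
```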

The Power Tools Most Posts Skip

Beyond the 23 there is a second tier of utility commands: /codex, /careful, /freeze, /guard, /unfreeze, /open-gstack-browser, /setup-deploy, /gstack-upgrade, /setup-browser-cookies, /setup-gbrain, /sync-gbrain, /autoplan, /pair-agent, /context-restore, /learn. Two CLI binaries also ship with the repo: gstack-model-benchmark and gstack-taste-update.

The big one for teams is /pair-agent. It coordinates Claude Code, Codex, and Hermes against the same task. Garry sits on top of this layer when he runs many sessions at once.

His own framing: "gstack is powerful with one sprint. It is transformative with ten running at once."

What Garry Says, In His Own Words

Two more quotes worth keeping near the workflow.

On the philosophy: "A single builder with the right tooling can move faster than a traditional team."

On the cost: "I sleep, like, four hours a night right now. I have cyber psychosis." (SXSW with Bill Gurley, March 2026)

That last quote matters. Throughput at this rate is not the average user's normal day. Treat the headline numbers as a ceiling, not a floor.

The Reception Was Split

The launch trended on Product Hunt and pulled 33k stars in week one. Now 97k. Garry's tweet hit 849k views. A CTO friend called it god mode.

The pushback was loud too. Mo Bitar shipped a critique calling gstack "a bunch of prompts in a text file." Sherveen Mashayekhi argued the visibility came from Tan's YC role, not from the artifact's merit.

The steelman of the critique is fair. gstack does not run novel infrastructure. Each skill is markdown plus a system prompt. You could write the same files yourself in a weekend.

The counter is also fair. Most people will not write the same files in a weekend. They will write three, ship none, and revert to free-form prompting. A finished, opinionated set, vetted by someone shipping 10K lines a week, is a different artifact than your own half-written prompts.

Both readings are correct. Pick the one your team needs to hear.

Five Lessons For Founders

Roles beat prompts. Defining a persona with priorities and constraints produces more consistent output than asking one Claude to wear seven hats.

Forcing functions before code. /office-hours exists because reframing the product is cheaper than rewriting it.

AI slop is a real failure mode. A dedicated Senior Designer skill exists specifically to catch it.

Ship is a workflow, not a button. /ship and /land-and-deploy chain sync, test, audit, push, deploy, verify into one command.

Parallelism multiplies leverage. Garry runs 10 to 15 sprints at once. Solo builders get a structured team. Team leads get a fleet.

Clone gstack In 5 Minutes

You need Claude Code, Git, Bun v1.0+, and Node.js if you are on Windows. Claude Code itself needs a Claude subscription (Pro at $20/mo is the entry tier) or Anthropic API access.

git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git \
  ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup

Then follow the loop. Start with /office-hours to reframe the product. Move to /plan-ceo-review and /plan-eng-review. Build. Run /review. Run /qa. Ship with /ship. That is the loop Garry is using to move at his stated rate.

The Honest Bottom Line

The CEO of YC published his exact Claude Code stack. The 23 commands are listed. The criticism is real. Pick the parts that fit your team and leave the rest.

Full breakdown with the complete tool table, comparison to default Claude Code, and where gstack stops: https://www.buildthisnow.com/blog/guide/agents/garry-tan-gstack-claude-code

Tags: #ClaudeCode #YCombinator #AI #DeveloperTools #BuildInPublic

Claude for Small Business Owners: Build Your Own Software in a Weekend

speedy_devv — Sun, 17 May 2026 16:00:47 +0000

You run a real business. You also pay for six SaaS tools that almost fit your workflow. QuickBooks for the books. HubSpot for the pipeline that nobody on the team actually opens. Calendly for bookings. Canva for assets. A CRM you keep meaning to clean up. A client portal you never built, so you settled for emailing PDFs.

Every quote you have ever asked for from a freelancer or agency starts at twenty thousand dollars. The 2026 cost guides put a small business custom tool at $30,000 to $100,000 all in, with a 3 to 6 month timeline.

There is a different number worth holding next to that one. $79 to $197 one-time, plus $20 a month for a Claude Pro subscription. Same finished software. Live by Sunday night. Code in your own GitHub.

That gap is the whole post.

What Anthropic just shipped

On May 13, Anthropic launched Claude for Small Business. It comes with 15 ready-to-run agentic workflows and 15 repeatable task skills, plugged into QuickBooks, PayPal, HubSpot, Canva, Docusign, Google Workspace, Microsoft 365, and Slack.

Daniela Amodei framed the launch around a stat worth keeping. SMBs make up 44 percent of US GDP and almost half the private workforce, and have lagged in AI adoption.

So Claude for Small Business automates the tools you already pay for. Lead triage, month-end reconciliation, payroll forecasting, invoice chasing.

It does not build you a new app. That is the part nobody else has written about yet.

Build This Now is the other half of the picture. Same Claude underneath. Different job. You describe a piece of software you wish you had. 32 specialist AI agents plan it, build it, test it, and ship it as real production code in your own GitHub. By Sunday night, the app is live, accepting payments, and sending email.

The four paths an SMB owner can pick from in 2026

Pick honestly. Each one solves a different version of your problem.

Path one: DIY no-code. Bubble, Glide, Softr. Costs zero to two hundred dollars to set up, then $49 to $399 a month forever. Time to a live app: two to six weeks. You do not own the code, and you are locked to the platform. This works for a directory or a static intake form. It cracks the moment you need a workflow that does not exist as a drag-and-drop block. Bubble's pricing now spikes through usage-based Workload Units, and exceeding record limits stops new data entry cold. Glide tops out at 25K rows and bills per seat. Softr caps complex SaaS logic.

Path two: hire a freelancer or agency. Twenty thousand to a hundred thousand dollars up front. Three to six months to a live app. Five hundred to five thousand a month in maintenance. You usually own the code, if you specified that in the contract. This solves the "almost what I need" problem. It also takes a quarter of a year and the cost of a used car.

Path three: Claude for Small Business. No extra cost over your Claude license. Same-day setup for the prebuilt workflows. You do not own new code, because there is no new code. Claude works inside the tools you already pay for. User approval is required before Claude sends, posts, or pays, and existing tool permissions are preserved. This is the right answer for "automate the work I already do inside QuickBooks, HubSpot, and PayPal."

Path four: Build This Now. Seventy-nine to one hundred ninety-seven dollars one-time. Twenty dollars a month for Claude Pro. Forty-eight hours from idea to a focused MVP. You own the full source code in your GitHub. This is the right answer for "build the thing nobody sells me yet."

Claude for Small Business and Build This Now are not competing. They cover different gaps. Both run on Claude.

Five pieces of software a small business can build this weekend

Each one names a real SaaS tool it replaces and a real price. Match these to your business and pick one.

1. Custom client portal with invoicing and document signing. Replaces SuiteDash at $19+ per user per month and Dubsado at $40+ per month. Service businesses, consultants, agencies. The no-code wall hits the moment you want a workflow that is not a checklist. With Build This Now, the portal lives in your GitHub, signs with your domain, and bills through your Stripe account.

2. Niche scheduling app with Stripe deposits. Replaces a Calendly Pro plus Acuity stack. Salons, clinics, tutors, coaches, mobile service providers. A custom build lets you take a deposit at booking, charge a no-show fee on policy, and send the reminder text in your own voice.

3. Internal CRM tailored to your sales process. Replaces a watered-down HubSpot or a Trello plus Google Sheets duct-tape rig. Real estate teams, B2B sales shops. The shape of your pipeline is the asset. A custom CRM lets you build it, not retrofit it.

4. Customer-facing dashboard for your service. Replaces "weekly PDF report we email." Marketing agencies, IT MSPs, accounting firms. Clients log in. They see results live. Renewals stop being a fight.

5. Inventory or job tracker with photo uploads and team logins. Replaces spreadsheets plus a WhatsApp group. Trades, contractors, e-commerce ops. Field staff post photos, the back office sees status, and nothing falls through the cracks.

Each one of these would cost $20K to $75K from a freelancer.

Friday to Sunday, hour by hour

Friday, 7 PM. Run npx buildthisnow init. Link Supabase, Stripe, Resend.

Friday, 8 PM. Run /discover. The agent team turns a plain-English description of your business into a product spec.

Friday, 10 PM. Read the seven generated docs. Sleep on it.

Saturday, 9 AM. Run /mvp-spec. Discovery becomes a feature-by-feature plan.

Saturday, 10 AM to 6 PM. Run /ship {feature} six to eight times, one per core feature. Each one runs through planning, build, evaluation, and testing automatically.

Saturday, 7 PM. Run /security and /audit. The build gets locked down.

Sunday, 10 AM. Run /landing-design and /seo. The marketing surface gets dressed.

Sunday, 2 PM. Run /deploy. The app goes to Vercel. Point your custom domain.

Sunday, 4 PM. The app is live, accepting payments, sending emails, and signing up users.

This pipeline is real, not a marketing diagram. Indie hackers are already shipping MVPs in two days using Claude with no coding background.

The honest math

Build This Now is $79 one-time for CodeKit. $197 one-time for Speedy Swarm, the macOS dashboard for running multiple builds side by side. Both prices are one payment. No per-seat. No subscription on the framework itself.

You also need a Claude Pro subscription at $20 per month to run the agents. That requirement is real and not optional. Add free-tier Supabase, Stripe, Resend, and Vercel for most early-stage workloads, and your total monthly burn for a live app is $20.

Compare that to the $30,000 to $100,000 freelancer quote. Compare it to a no-code stack at $49 to $399 per month that you never own.
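The four-path comparison, projected over a first year. This is a rough sketch using the text's ranges. It assumes free-tier Supabase, Stripe, Resend, and Vercel for Build This Now. It also charges a full 12 months of freelancer maintenance even though that app may take 3 to 6 months to arrive, so the freelancer upper bound is generous. Claude for Small Business is left out because it builds no new app.

```python
# (one-time low, one-time high, monthly low, monthly high) in USD,
# from the four-paths comparison.
PATHS = {
    "no-code": (0, 200, 49, 399),
    "freelancer": (20_000, 100_000, 500, 5_000),
    "build-this-now": (79, 197, 20, 20),  # 79 CodeKit, 197 Speedy Swarm; $20 = Claude Pro
}

def first_year(costs: tuple[int, int, int, int]) -> tuple[int, int]:
    """Low and high first-year totals: one-time cost plus 12 months."""
    one_lo, one_hi, mo_lo, mo_hi = costs
    return one_lo + 12 * mo_lo, one_hi + 12 * mo_hi

for name, costs in PATHS.items():
    lo, hi = first_year(costs)
    print(f"{name}: ${lo:,} to ${hi:,}")
```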

A few honest disclaimers. Build This Now is no-code at the prompt level, not at the operating system level. You will run an npx command in a terminal and edit a .env file. You will not write a line of TypeScript unless you choose to. The agents handle the code. You describe the feature. The production skeleton ships with auth, payments, email, storage, and row-level security on every database table already wired up.

Use Claude for Small Business for the work you already do inside QuickBooks, HubSpot, and PayPal. Use Build This Now for the custom software you wish existed. Neither of those decisions has to wait until next quarter.

What to do this week

You have a real workflow nobody else has. Stop renting six tools that almost fit. Spend one weekend building the one app that does.

Read the full post with the cost tables and source links here: Claude for Small Business Owners: Build Your Own Software in a Weekend. If you want to start, the framework lives at buildthisnow.com.

Tags: #SmallBusiness #SaaS #NoCode #ClaudeAI #BuildInPublic

What Are Claude Skills

speedy_devv — Sun, 17 May 2026 08:00:53 +0000

You keep seeing "Claude Skills" in repos, on X, in YouTube thumbnails. You still are not sure what the thing is. Anthropic's docs read like a spec. Most blog posts bury the answer under five paragraphs of intro.

Here it is in 30 seconds. A Claude Skill is a small folder with one file inside it called SKILL.md. That file tells Claude how to do one specific thing. Build the folder once. Claude uses it forever.

That sentence is the whole concept. The rest of this post is the proof, the parts list, the 12 real ones people are running today, and the 5-way matrix that tells you when to reach for a skill instead of an MCP server, a subagent, a plugin, or a project.

The One-Paragraph Definition

A Claude Skill is a folder of instructions Claude reads when it decides the topic fits your request. The folder lives at .claude/skills/your-skill-name/ and contains a single required file, SKILL.md. Claude scans the descriptions of every skill you have installed. When one matches what you asked for, it loads the body and follows it. You can also call a skill by hand with /skill-name.

Anthropic's own line: "Skills are folders containing instructions, scripts, and resources that Claude discovers and loads dynamically when relevant to a task." (Source: claude.com/blog/skills-explained.)

Three things follow from that. Skills sit on disk. Skills are discovered by description-matching, not by file extension or folder location. Skills are reusable across every session in the project they live in.

A Real SKILL.md File, Opened Up

Here is a working skill in ten lines, taken from Anthropic's own docs at code.claude.com/docs/en/skills:

---
description: "Summarizes uncommitted changes and flags anything risky. Use when the user asks what changed, wants a commit message, or asks to review their diff."
---

## Current changes

!`git diff HEAD`

## Instructions

Summarize the changes above in two or three bullet points, then list any risks you notice.

Two parts. The block at the top fenced by --- lines is the frontmatter. It is a tiny block of settings written in YAML. The only required line is description, which tells Claude what the skill does and when to load it.

Below the frontmatter is plain Markdown. That is the body Claude reads when the skill fires. The line !`git diff HEAD` is a shell command that runs first, with its output dropped into the message Claude sees. Then Claude follows the instructions under "Instructions". That is it. No build step. No registration. Save the file and Claude can use it on the next prompt.
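Splitting the frontmatter from the body shows why the description carries so much weight. The description is the one short string that can be read without touching the rest of the file. Below is a minimal sketch of that split.

It is a hand-rolled illustration for simple key: value lines. It is not Claude Code's actual loader and not a full YAML parser, and parseSkill is a hypothetical name.

```typescript
// Illustration only: split a SKILL.md into frontmatter and body.
// Handles single-line `key: value` pairs; real YAML supports much more.
interface ParsedSkill {
  frontmatter: Record<string, string>;
  body: string;
}

function parseSkill(source: string): ParsedSkill {
  const lines = source.split(/\r?\n/);
  if (lines[0].trim() !== "---") {
    // No frontmatter block: treat the whole file as body.
    return { frontmatter: {}, body: source };
  }
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    throw new Error("Unterminated frontmatter block");
  }
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":"); // Split on the first colon only.
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    let value = line.slice(colon + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    frontmatter[key] = value;
  }
  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}

const example = [
  "---",
  'description: "Summarizes uncommitted changes and flags anything risky."',
  "---",
  "",
  "## Instructions",
  "",
  "Summarize the changes in two or three bullet points.",
].join("\n");

const parsed = parseSkill(example);
console.log(parsed.frontmatter.description);
```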

How Claude Decides to Run a Skill

Two paths. You can call it by name, like /code-review, and the autocomplete list shows you every installed skill. Or you can let Claude pick. In a normal session, Claude only sees the descriptions of every installed skill, not the bodies. It reads the descriptions, decides which ones fit your prompt, and pulls in the full body of any that do. The body stays in context for the rest of the session.

This is why the description matters more than anything else in the file. Bad description, the skill never fires. Good description, Claude reaches for it the moment the topic lands.

Two caps you should know about. The combined description plus when_to_use text gets truncated at 1,536 characters in the listing Claude sees. So write the description to fit. The default budget for all skill listings together is 1% of the model's context window, configurable via a setting called skillListingBudgetFraction. Past that budget, lower-priority skills get cut. (Source: code.claude.com/docs/en/skills.)
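The two caps can be sketched as a listing step. First, truncate each skill's description plus when_to_use text to 1,536 characters. Then keep skills in priority order until a budget of 1% of the context window runs out.

This illustrates the idea under stated assumptions and is not Claude Code's implementation:
- The four-characters-per-token estimate is a placeholder.
- The priority field is hypothetical.
- The budgetFraction parameter mirrors skillListingBudgetFraction in name only.

```typescript
interface SkillEntry {
  name: string;
  description: string;
  whenToUse?: string;
  priority: number; // Hypothetical: higher numbers are listed first.
}

const MAX_LISTING_CHARS = 1536;

// Assumption: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function buildListing(
  skills: SkillEntry[],
  contextWindowTokens: number,
  budgetFraction = 0.01,
): string[] {
  const budget = contextWindowTokens * budgetFraction;
  const ordered = [...skills].sort((a, b) => b.priority - a.priority);
  const listing: string[] = [];
  let used = 0;
  for (const skill of ordered) {
    // Description plus when_to_use, truncated to the per-skill cap.
    const text = [skill.description, skill.whenToUse ?? ""]
      .join(" ")
      .trim()
      .slice(0, MAX_LISTING_CHARS);
    const line = `${skill.name}: ${text}`;
    const cost = estimateTokens(line);
    if (used + cost > budget) break; // Budget spent: remaining lower-priority skills are cut.
    used += cost;
    listing.push(line);
  }
  return listing;
}

const skills: SkillEntry[] = [
  { name: "big", description: "x".repeat(3000), priority: 2 },
  { name: "small", description: "Summarize diffs", priority: 1 },
];
console.log(buildListing(skills, 200_000).map((line) => line.length));
```

With a 200,000-token window the budget is 2,000 estimated tokens, so both skills fit, and the oversized description is cut to 1,536 characters. Shrink the window and the lower-priority skill disappears first.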

Skills hot-reload. Edit the file mid-session and the change is live on the next turn.

12 Real Claude Skills People Run Today

Pulled from Anthropic's official set and the three highest-starred community repos on GitHub.

pdf from Anthropic: generate and process PDF documents. "Make a one-page PDF report from this CSV."
xlsx from Anthropic: create and edit Excel spreadsheets. "Turn this list into an Excel file with totals."
pptx from Anthropic: build PowerPoint decks. "Generate a 10-slide deck from this brief."
mcp-builder from Anthropic: scaffold a new MCP server. "Build an MCP server that talks to my internal API."
skill-creator from Anthropic: help Claude write new skills. "I keep doing X by hand. Make a skill out of it."
webapp-testing from Anthropic: run automated browser tests. "Click through the signup flow and tell me what breaks."
tdd from mattpocock: red-green-refactor TDD loop. "Add this feature using test-driven development."
caveman from mattpocock: compress prose to use roughly 75% fewer tokens. "Rewrite this in caveman so it costs less to send."
diagnose from mattpocock: structured debugging methodology. "This bug is weird. Walk through it methodically."
code-review-and-quality from addyosmani: five-axis code review with severity labels. "Review this PR like a senior engineer would."
security-and-hardening from addyosmani: OWASP Top 10 prevention checks. "Audit my app for the top web security risks."
systematic-debugging from obra: four-phase root-cause analysis. "Don't patch the symptom. Find the actual root cause."

The three community repos behind the last six entries carry well over 80,000 stars each on GitHub, with obra/superpowers the most-starred of the bunch.

Skill vs Subagent vs MCP vs Plugin vs Project

Five things people confuse with each other. Here is the plain-English version.

Use a Skill when you want to teach Claude how to do something the same way every time. Static instructions Claude reads when the description matches. Cheap. Lives on disk.

Use an MCP server when you want to connect Claude to a live system like Stripe, Postgres, or Slack. MCP is the connection layer. It pulls live data and runs actions.

Use a subagent when you want to spawn an isolated worker with its own context window and tools. Subagents have their own context, system prompt, and tool list.

Use a plugin when you want to bundle skills, MCP servers, commands, and hooks together to share. A plugin is the distributable container around the other primitives.

Use a project when you want to provide background reference for one workspace. Projects hold knowledge files scoped to one workspace.

Anthropic's own framing: "If you've got more skills than MCP servers, you're probably doing it right." (Source: claude.com/blog/skills-explained.)

Cleanest way to remember it. MCP brings live data in. Subagents fork off a worker. Plugins package the whole bundle for distribution. Projects hold reference material. Skills teach procedure.

What Else Goes Inside a Skill Folder

SKILL.md is the only required file. Most production skills also carry one or more of these:

scripts/ for helper scripts the skill calls
references/ for documentation Claude can read on demand
examples/ for sample inputs and outputs
templates/ for files the skill copies and fills in

None of those are loaded into context by default. Claude opens them only when the body of SKILL.md tells it to. That is how a skill can carry hundreds of pages of reference without burning tokens until it needs them.
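The layout is just directories and one Markdown file, so it can be scaffolded with a few lines of Node. Below is a minimal sketch, assuming Node.js with TypeScript.

scaffoldSkill is a hypothetical helper, not part of Claude Code. The description is a placeholder you should rewrite to match how users actually phrase the request. The demo writes into a temporary directory so your real project is untouched.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Optional subfolders described above; only SKILL.md is required.
const OPTIONAL_DIRS = ["scripts", "references", "examples", "templates"];

function scaffoldSkill(projectRoot: string, name: string, description: string): string {
  const skillDir = path.join(projectRoot, ".claude", "skills", name);
  const skillFile = path.join(skillDir, "SKILL.md");
  if (fs.existsSync(skillFile)) {
    throw new Error(`${skillFile} already exists; refusing to overwrite`);
  }
  for (const dir of OPTIONAL_DIRS) {
    // recursive: true also creates .claude/skills/<name> on the way down.
    fs.mkdirSync(path.join(skillDir, dir), { recursive: true });
  }
  // JSON string quoting doubles as a YAML double-quoted scalar for plain text.
  const skillMd = [
    "---",
    `description: ${JSON.stringify(description)}`,
    "---",
    "",
    "## Instructions",
    "",
    "Describe the procedure here. Point to files in references/ only when needed.",
    "",
  ].join("\n");
  fs.writeFileSync(skillFile, skillMd, "utf8");
  return skillFile;
}

// Demo in a throwaway directory.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), "skill-demo-"));
const created = scaffoldSkill(
  demoRoot,
  "diff-summary",
  "Summarizes uncommitted changes. Use when the user asks what changed.",
);
console.log(created);
```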

What Changed in 2026

The skills format moved fast through the spring. Three things worth knowing as of May 2026.

Root SKILL.md auto-surface. Around Claude Code v2.1.69 in late April 2026, plugins with a root-level SKILL.md and no skills/ subdirectory now show up as a single skill. Single-file plugin distribution is no longer awkward.

Custom commands merged into skills. A file at .claude/commands/deploy.md and a skill at .claude/skills/deploy/SKILL.md both create /deploy and behave the same way. Old .claude/commands/ files keep working. The distinction has dissolved.

An open standard. The format is published at agentskills.io, so the same SKILL.md file works across more than just Claude. Community marketplaces collectively list more than 4,000 skills as of May 2026. The official source remains github.com/anthropics/skills.

Bundled skills now ship with Claude Code itself: /simplify, /batch, /debug, /loop, /claude-api, /init, /review, /security-review. You already have those even if you have never installed a thing.

When a Skill Refuses to Fire

Most common reason: Claude never picked your skill, because the description did not match the prompt the user wrote. Three fixes.

Rewrite the description to lead with what the skill does, then add explicit example phrases the user might say. The 1,536-character cap is generous, so use it.

Run /doctor to confirm Claude is seeing the skill at all. The output lists every skill loaded in the current session, with their descriptions.

If you have hundreds of skills installed and most are getting cut from the listing, raise skillListingBudgetFraction past the default 1%.

Triggering is probabilistic, not deterministic. Claude usually picks the right skill. Sometimes it does not. The description is the lever.

Closing

A skill is a folder. A folder is a file. A file is a description plus a recipe. Build the folder once. Claude uses it forever.

Full version with the live decision matrix and frontmatter field reference: https://www.buildthisnow.com/blog/guide/mechanics/what-are-claude-skills

Cut Claude Code Token Costs

speedy_devv — Sat, 16 May 2026 16:45:07 +0000

Your Claude Code bill went up twice in 30 days. You felt it on the invoice before you understood why.

On June 15, 2026, Anthropic moves Agent SDK, claude -p, and Claude Code GitHub Actions onto a separate metered credit pool that does not roll over. Once the pool drains, you pay full API rates.

At the same time, the new Opus 4.7 tokenizer reports about 1.46x more text tokens than 4.6 at the same per-token price. Simon Willison flagged it as "actually a pretty big price bump." Image content can hit 3.01x. Same prompt, same dollars per token, more tokens per request.

Five open-source repos fight back. I ranked them by max stated savings, with install commands and where each percentage actually comes from. Every number in this post is vendor-stated. Real savings shift with codebase size, MCP server count, and how often your sessions repeat work.

Why your bill is climbing right now

Three things are stacking on top of each other.

First, the June 15 split. Programmatic Claude Code usage gets its own dedicated budget instead of sharing the chat pool. The Pro plan ships $20 of Agent SDK credit, Max 5x ships $100, Max 20x ships $200. None of it rolls over. Interactive Claude Code in your terminal is unaffected.

Second, the Opus 4.7 tokenizer. Same price per token, more tokens per request. Willison's testing measured 1.46x for plain text and up to 3.01x for images. A 15MB PDF only inflated 1.08x, so the impact varies with content type.
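The tokenizer change is easiest to see with arithmetic. Below is a minimal sketch with three inputs:
- A request that measured 100,000 text tokens under 4.6.
- A placeholder price of $5 per million input tokens. This is not a quoted rate, so substitute the current one from Anthropic's pricing page.
- Willison's measured multipliers and the Pro plan's $20 Agent SDK credit from above.

```typescript
const PRICE_PER_MILLION_INPUT = 5; // Placeholder USD rate; check the pricing page.

// Multipliers from Willison's testing, quoted above.
const TOKENIZER_MULTIPLIER = { text: 1.46, image: 3.01, pdf: 1.08 } as const;

function requestCost(tokensUnder46: number, multiplier: number): number {
  const tokensUnder47 = tokensUnder46 * multiplier;
  return (tokensUnder47 / 1_000_000) * PRICE_PER_MILLION_INPUT;
}

function requestsPerPool(poolUsd: number, costPerRequest: number): number {
  return Math.floor(poolUsd / costPerRequest);
}

const before = requestCost(100_000, 1);
const after = requestCost(100_000, TOKENIZER_MULTIPLIER.text);
console.log(`per request: $${before.toFixed(2)} -> $${after.toFixed(2)}`);
console.log(`requests per $20 pool: ${requestsPerPool(20, before)} -> ${requestsPerPool(20, after)}`);
```

Same prompt and same per-token price, yet the $20 pool covers 27 such requests instead of 40.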

Third, Fast Mode now defaults to Opus 4.7 in recent Claude Code releases. Faster, smarter, and quietly more expensive per request than the 4.6 baseline you had a month ago.

The fix is not "use Sonnet for everything." The fix is fewer tokens hitting the wire on every call you do make.

The five tools, ranked by max stated savings

Verify each one in your own workflow. Percentages come straight from each repo's README or the third-party listing noted.

lean-ctx: 60% to 95% reduction across reads, up to 99% on cached reads (vendor README).
airis-mcp-gateway: up to 97% context token reduction. The 97% figure comes from the VoltAgent awesome-claude-code-subagents listing*, not the repo itself. The repo's own README says only "Token Efficiency: Measurable reduction in initial context overhead" with no number.
agentmemory: 92% fewer tokens than pasting full context across sessions. Badge sits at the top of the README.
9router: 20% to 40% per request via RTK token compression on tool output. Worked example in the README: 47K tokens shrunk to 28K.
cc-ledger: 0% direct savings. This is the meter. You need it on before you can prove the other four did anything.

(*third-party listing, not vendor)

lean-ctx: compress the inputs

lean-ctx is a single Rust binary that sits between Claude Code and your filesystem. It hooks every file read, every grep, every shell command. Output gets compressed before it reaches the model.

Headline claim is 60% to 95% reduction, with up to 99% on cached reads. The 99% figure assumes cache hits. Cold runs see less.

Three commands stand it up:

curl -fsSL https://leanctx.com/install.sh | sh
lean-ctx setup
lean-ctx init --agent claude-code

Best fit: workflows where you re-read the same files often. Worst fit: thin sessions with mostly short prompts. The compression overhead pays off only when there is something to compress.
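Cached reads compress so much harder than cold ones because of a common technique: remember what the model has already seen. When an unchanged file is read again, send a short marker instead of its full contents.

The sketch below illustrates only that general idea. It is not lean-ctx's code, and the ReadCache class and marker format are hypothetical.

```typescript
import { createHash } from "node:crypto";

class ReadCache {
  private seen = new Map<string, string>(); // file path -> sha256 of last sent content

  // Returns what would be sent to the model for this read.
  render(filePath: string, content: string): string {
    const hash = createHash("sha256").update(content).digest("hex");
    if (this.seen.get(filePath) === hash) {
      // Unchanged since the last read: send a short reference instead of the file.
      return `[unchanged: ${filePath} sha256:${hash.slice(0, 12)}]`;
    }
    this.seen.set(filePath, hash);
    return content; // Cold or changed read: full content goes through.
  }
}

const cache = new ReadCache();
const file = "export const x = 1;\n".repeat(500);
const first = cache.render("src/x.ts", file);
const second = cache.render("src/x.ts", file);
const saved = 1 - second.length / first.length;
console.log(`repeat read is ${(saved * 100).toFixed(1)}% smaller`);
```

The first read pays full price, which is why cold runs see less. Only repeats of unchanged files approach the headline number.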

airis-mcp-gateway: compress the tool listings

If your Claude Code talks to Sentry, GitHub, Linear, Postgres, and a couple of other MCP servers, the system prompt pays a tax for every tool listing on every turn. airis-mcp-gateway aggregates many MCP servers behind a single SSE endpoint with intelligent routing and on-demand lifecycle management.

The 97% figure that travels with this repo comes from the VoltAgent awesome-claude-code-subagents listing. The repo itself is more conservative. Read both before you quote a number to your team.

Production install:

curl -fsSL https://raw.githubusercontent.com/agiletec-inc/airis-mcp-gateway/main/install.sh | bash

Skip this one if your .mcp.json lists fewer than five servers. The savings come from collapsing tool listings, and a slim setup has nothing to collapse.
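The tax multiplies, because every server's tool definitions ride along on every turn. Below is a back-of-the-envelope sketch. The tools-per-server count, tokens per definition, gateway listing size, and turn count are all hypothetical placeholders, not measurements of airis-mcp-gateway. It also ignores any prompt caching discount on repeated listings.

```typescript
interface McpSetup {
  servers: number;
  toolsPerServer: number;
  tokensPerToolDef: number;
}

function listingTokensPerTurn(s: McpSetup): number {
  return s.servers * s.toolsPerServer * s.tokensPerToolDef;
}

function sessionOverhead(tokensPerTurn: number, turns: number): number {
  return tokensPerTurn * turns;
}

// Hypothetical numbers for illustration only.
const direct = listingTokensPerTurn({ servers: 6, toolsPerServer: 15, tokensPerToolDef: 200 });
const gatewayListing = 1_500; // A few gateway meta-tools in front of every server.
const turns = 40;
const reduction = 1 - gatewayListing / direct;

console.log(`per turn: ${direct} vs ${gatewayListing} tokens`);
console.log(`per session: ${sessionOverhead(direct, turns)} vs ${sessionOverhead(gatewayListing, turns)}`);
console.log(`listing reduction: ${(reduction * 100).toFixed(0)}%`);
```

Run the same numbers with two servers and the direct listing is a third of the size, so there is far less to collapse.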

agentmemory: stop paying for re-explaining your project

Every new Claude Code session starts fresh. You re-paste the stack notes, the rules, the patterns. agentmemory kills that cost. It captures what the agent does via hooks, compresses into searchable observations, and injects relevant prior context into future sessions.

The README claims 92% fewer tokens against the worst-case "paste full context every session" baseline. Worked comparison in the repo: about 170K tokens per year (around $10) with agentmemory versus 19.5M+ tokens pasting full context manually.

Install:

npx @agentmemory/agentmemory

If you already use Claude Code's /resume and a tight CLAUDE.md, the marginal savings are smaller than the badge implies. Still useful. Just not 92% useful.
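The badge and the worked example are worth checking against each other. Taken at face value, 170K versus 19.5M tokens is roughly 99% fewer tokens, not 92%. That suggests the two figures rest on different baselines or assumptions. A quick check:

```typescript
function percentReduction(withTool: number, baseline: number): number {
  return (1 - withTool / baseline) * 100;
}

const yearlyWithMemory = 170_000;
const yearlyPastingFullContext = 19_500_000; // README says "19.5M+", so a lower bound.
console.log(percentReduction(yearlyWithMemory, yearlyPastingFullContext).toFixed(1));
```

The same function works on your own before-and-after numbers once a ledger is running.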

9router: compress the outputs and arbitrage providers

9router is a multi-provider router with two tricks. The first is RTK Token Saver, which auto-compresses tool_result content like git diff, grep, find, ls, tree, and log output. The README quotes 20% to 40% per request and shows a worked example: 47K tokens without RTK, 28K tokens with it, a 40% cut on that one call.

The second trick is provider routing. 9router fronts 40+ providers including Kiro AI (Claude 4.5), OpenCode Free, Vertex AI's $300 credit pool, GLM at $0.6/1M, MiniMax at $0.2/1M, and Kimi at $9/month.

Install:

npm install -g 9router
9router

Caveat worth saying out loud. Routing to non-Anthropic providers changes the trust profile, the latency, the model quality, and the data handling. Read the provider's terms before you push real client code through it.
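The README's worked example checks out: 1 minus 28K/47K is about 40%. Output like git diff or grep can shrink in several ways. One simple, lossy technique is collapsing runs of repeated lines and capping overlong output.

The sketch shows that technique only. It is not RTK's algorithm, and compressToolOutput and its limits are hypothetical.

```typescript
function percentSaved(before: number, after: number): number {
  return (1 - after / before) * 100;
}

// Collapse consecutive duplicate lines, then cap how many lines are kept.
function compressToolOutput(output: string, maxLines = 200): string {
  const lines = output.split("\n");
  const kept: string[] = [];
  let i = 0;
  while (i < lines.length) {
    let run = 1;
    while (i + run < lines.length && lines[i + run] === lines[i]) run++;
    kept.push(run > 1 ? `${lines[i]}  [x${run}]` : lines[i]);
    i += run;
  }
  if (kept.length > maxLines) {
    const omitted = kept.length - maxLines;
    return [...kept.slice(0, maxLines), `[${omitted} more lines omitted]`].join("\n");
  }
  return kept.join("\n");
}

console.log(percentSaved(47_000, 28_000).toFixed(1)); // README's worked example
const sampleLog = [
  "WARN retrying connection",
  "WARN retrying connection",
  "WARN retrying connection",
  "OK connected",
].join("\n");
console.log(compressToolOutput(sampleLog));
```

Lossy compression trades detail for tokens. Anything dropped from a tool result is something the model can no longer reason about.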

cc-ledger: see what you spent

You cannot manage what you cannot see. cc-ledger captures every Claude Code edit, prompt, and per-turn token cost via Claude Code hooks. It writes to ~/.cc-ledger/ledger.db and tracks five token classes per turn: input, output, cache_read, cache_write_5m, and cache_write_1h.

Those classes match Anthropic's own pricing model. Cache reads bill at 0.10x input rates. 5-minute cache writes bill at 1.25x. 1-hour cache writes bill at 2x. Without a ledger, prompt caching feels invisible. With one, you see the line move.
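With those multipliers, per-turn cost becomes a weighted sum over the five token classes. Below is a minimal sketch. The base rates of $5 per million input tokens and $25 per million output tokens are placeholders, so look up the real rates for your model. The cache multipliers are the ones quoted above. TurnUsage mirrors cc-ledger's five classes in name only and is not its schema.

```typescript
interface TurnUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite5m: number;
  cacheWrite1h: number;
}

const INPUT_PER_MILLION = 5; // Placeholder USD rate.
const OUTPUT_PER_MILLION = 25; // Placeholder USD rate.

// Multipliers applied to the input rate, as quoted above.
const MULTIPLIER = { cacheRead: 0.1, cacheWrite5m: 1.25, cacheWrite1h: 2 } as const;

function turnCost(u: TurnUsage): number {
  const inputEquivalent =
    u.input +
    u.cacheRead * MULTIPLIER.cacheRead +
    u.cacheWrite5m * MULTIPLIER.cacheWrite5m +
    u.cacheWrite1h * MULTIPLIER.cacheWrite1h;
  return (inputEquivalent * INPUT_PER_MILLION + u.output * OUTPUT_PER_MILLION) / 1_000_000;
}

// A 50K-token prefix written once to the 5-minute cache, then read back.
const firstTurn = turnCost({ input: 2_000, output: 1_000, cacheRead: 0, cacheWrite5m: 50_000, cacheWrite1h: 0 });
const laterTurn = turnCost({ input: 2_000, output: 1_000, cacheRead: 50_000, cacheWrite5m: 0, cacheWrite1h: 0 });
const uncached = turnCost({ input: 52_000, output: 1_000, cacheRead: 0, cacheWrite5m: 0, cacheWrite1h: 0 });
console.log(firstTurn.toFixed(4), laterTurn.toFixed(4), uncached.toFixed(4));
```

The write turn costs more than an uncached turn. It pays back as soon as the prefix is read again before it expires, and that line movement is what the ledger makes visible.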

Install:

curl -fsSL https://ccledger.dev/install | bash

cc-ledger also computes "shadow billing", which estimates what your subscription usage would have cost on the API. The repo had six stars on May 15, 2026. It is early-stage. Use it for visibility, not as a billing system of record.

The do-this-in-order recipe

Stack the savings in this order. Each step compounds on the last, and the ledger lets you see what each step actually saved.

Install cc-ledger first. You need a baseline. Run it, then work normally for one day. Note the daily spend.
Install agentmemory. This kills the cost of re-explaining your project on every new session.
Install lean-ctx. This compresses every file read and shell command before it hits the model.
Add airis-mcp-gateway only if you have five or more MCP servers configured. Otherwise skip it.
Add 9router only if you are willing to route to non-Anthropic providers. Highest impact for Pro users on tight budgets, also the most disruptive change to your workflow.
Re-check cc-ledger after one week. Compare against your baseline.

One install at a time, with the ledger between each step. That's how you tell which tool moved the line.
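"Compounds" here means the reductions multiply rather than add, because each tool only sees the tokens the previous one left behind. Below is a rough sketch.

It assumes each tool's reduction applies uniformly to the whole remaining stream. In practice each tool targets a different slice (file reads, tool listings, session context), so real totals will be lower. The percentages and baseline are hypothetical inputs, not the vendors' headline numbers.

```typescript
// Fraction of the original tokens left after applying each reduction in turn.
function remainingFraction(reductions: number[]): number {
  return reductions.reduce((left, r) => left * (1 - r), 1);
}

const baselineDailyTokens = 2_000_000; // Hypothetical baseline from the ledger.
const steps = [0.3, 0.25, 0.1]; // Hypothetical measured reductions per tool.
const left = remainingFraction(steps);
console.log(`combined reduction: ${((1 - left) * 100).toFixed(1)}%`);
console.log(`tokens per day after: ${Math.round(baselineDailyTokens * left)}`);
```

Adding the three would suggest 65%. Multiplying gives roughly 53%, which is why each step has to be re-measured rather than summed.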

What Anthropic itself recommends (free)

Before any third-party tool, the free moves from Anthropic's own cost guide:

Run /clear between unrelated tasks so context does not balloon.
Run /compact with custom instructions to keep only what the next step needs.
Set MAX_THINKING_TOKENS=8000 so extended thinking has a ceiling.
Prefer CLI tools over MCP servers when the CLI is installed already.
Move CLAUDE.md detail into skills, so the bytes load only when needed.

Anthropic also publishes a "$13 per developer per active day" enterprise benchmark, and notes that agent teams use about 7x more tokens than standard sessions. Worth knowing before you launch a fleet of subagents on a tight budget.

Risks and caveats

A few honest gotchas, in order of how often they bite people.

Every percentage above is vendor-stated. The 97% airis figure comes from a third-party listing, not the repo itself. The 99% lean-ctx figure assumes cache hits. The 92% agentmemory figure compares against the worst-case baseline. The 40% 9router figure is one worked example, not a benchmark.

9router routes traffic to non-Anthropic providers. That changes trust, latency, quality, and data handling.

cc-ledger is early-stage. Use it for visibility, not as a billing system of record.

The June 15 split is recent. Anthropic has changed billing twice in two months. Check the official pricing page before making subscription decisions based on this post.

Stacking five tools adds operational surface area. Bash install, Docker, npm global, npx daemon, hook scripts. Install one at a time and re-measure with cc-ledger between each.

None of these tools is endorsed by Anthropic. They are community projects.

The bill went up twice in 30 days. The fix is not one tool. The fix is a stack with a meter on top. Install the ledger first, then layer the rest.

Full breakdown with every source link: https://www.buildthisnow.com/blog/guide/mechanics/cut-claude-code-token-costs

I Shipped a SaaS MVP With Three Emails. Then I Watched It Die.

speedy_devv — Sun, 03 May 2026 15:16:06 +0000

I just shipped a SaaS MVP last month. Auth worked. Payments worked. The product did the thing it was supposed to do. I opened the dashboard on a Monday morning, coffee in hand, ready to watch the trials convert.

Three weeks later my trial-to-paid was sitting at 8%. Users were signing up, clicking around for a day, and vanishing. Nobody was churning loudly. They were just leaving. Quietly. Like the app was a restaurant they walked into, looked at the menu, and walked out of without ordering.

I looked at my email setup. A welcome email, a password reset, a payment receipt. That was it. Three emails covering maybe 10% of the lifecycle. The other 90% was silence, and that silence was costing me every single trial that did not convert on its own.

The Number That Made Me Stop Coding and Start Reading

I went down a research rabbit hole for two days. I read every email marketing study I could find. One number kept showing up, and it kept me up at night.

Automated email flows make up about 2% of total email volume but drive 37% of email revenue. Two percent. Thirty-seven percent. That ratio is insane. The emails most founders skip are the ones generating most of the money.

Two percent of your email volume drives 37% of your email revenue. That is the gap where silent SaaS products go to die.

I had shipped the product. I had not shipped the lifecycle. And the lifecycle was the product, because nobody reaches the product if nobody leads them to it.

Why I Stopped Trusting My Email Tool

The standard advice is: open your email platform, pick a template, write the copy, set a 3-day delay, repeat. I tried that. I hated it. Here is what I kept running into.

Timers Ignore What Users Actually Do

Most tools schedule emails on fixed delays. Send email 2 three days after email 1. That logic treats every user the same. A user who activated on day 1 gets the same nudge as someone who never logged in. It is spam disguised as personalization.

The research is blunt about this. Triggered emails, meaning emails sent based on what a user does, get 76% higher open rates and 152% higher click-through rates than timed sends. If I wanted the 37% revenue, I had to build triggered, not timed.

HTML Templates Look Professional and Underperform

I had been building rich HTML emails with my logo, images, layout grids, the whole thing. They looked great in preview. They also tanked in practice.

Plain text emails get 42% more clicks than HTML. Gmail and Apple Mail render plain text well. Heavy HTML triggers spam filters. Half the devices your users carry render your beautiful layout like a ransom note. I had been optimizing for what looked good in my design tool, not for what clicked in real inboxes.

External Tools Have No Idea What Your App Is Doing

This is the one that killed me. My email platform could not check if a user had already activated before sending the activation nudge. It could not skip the upgrade email for someone who upgraded yesterday. The email system and the product lived in separate worlds. I was sending people upgrade prompts after they had upgraded.

What I Actually Wanted to Build

I wrote down what a correct email system would do. It would:

Read my product, my user journeys, my brand voice, and design sequences that sounded like my product
Cover all six major lifecycle stages, not just welcome and billing
React to what users do, not when they signed up
Check the database between every email so I never sent something stupid
Render both HTML and plain text so every client got the version it preferred
Wire triggers to events my app already fires (signup, limit hit, cancellation)

Then I looked at what that would take to build by hand. Seventeen templates. Six background job functions. Trigger wiring across three codebases. Type checking. Tests. Maybe a week of focused work if nothing went sideways.

I did not have a week. I had a conversion problem right now.

So I built it differently.

One Command, Six Agents, Seventeen Emails

I built a slash command called /emails that runs a pipeline of Claude Code agents. One command reads my product, designs every sequence, waits for approval, then builds in parallel.

The command runs in two phases.

Phase 1 is design. One agent reads my product overview, user journeys, brand guidelines, feature map, auth flow, billing setup, and brand colors. From that it builds an Email Brief: the product name, the "aha moment" (the specific action where the product clicks), pricing model, top features to educate about, and upgrade triggers. Then it designs all 17 emails. Subject lines, timing, goals, copy angles.

Nothing gets built yet. The plan goes to me first. I can cut sequences, rewrite subject lines, change timing, or kill emails I do not want. The build phase uses my approved plan as its spec.

Phase 2 is build. Six agents work in parallel. One base agent goes first and creates the shared layout and sending utility. Then five agents build their sequences at the same time. Templates are React components. Background jobs use Inngest (a job engine that handles scheduling, retries, and event-driven workflows).

Twenty minutes of wall-clock time. Six parallel agents. Seventeen typed templates. Six functions. One type check at the end.

The Six Sequences Every SaaS Is Missing

Here is the lifecycle the system builds. I did not invent this. I read about 40 email marketing studies and pulled the pattern that worked.

1. Welcome (4 emails, days 0 to 7)

This is the highest-value sequence you will ever send. First welcome emails average 55 to 70% open rates. Top-performing SaaS companies hit 75% plus. That is the best attention you will ever get from a user. Four emails over the first seven days, each one guiding them closer to the moment your product clicks for them.

2. Activation Nudge (+48h if no key action)

Someone finishes onboarding, sees the dashboard, and leaves. This is the single biggest drop-off point in most SaaS products. One email 48 hours later, focused on one specific action, pulls a chunk of them back.

3. Feature Education (days 14 to 28)

The user is active but only uses one feature. Three emails, each highlighting something they have not touched yet. Segmented campaigns like this drive up to 760% more revenue than broadcast blasts.

4. Upgrade Prompt (on limit hit)

This one fires when the user hits a real limit, not on a timer. They tried to do something the free plan does not allow. That is the moment they actually understand the value. Three emails over five days: what happened, what the paid plan includes, and an offer.

5. Churn Prevention (7 to 30 days inactive)

Runs on a daily cron job. Every morning at 9 AM, the system queries the database for users inactive for 7 or more days. Three emails go out across the next 30 days, each using a different angle: here is what you are missing, your data is still here, tell us what went wrong.

6. Win-Back (0 to 30 days after cancellation)

Starts when Stripe sends a cancellation webhook. Three emails over 30 days. The first acknowledges the cancellation. The second highlights what improved. The third makes an offer.
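The churn-prevention check (stage 5) comes down to a date comparison plus a mapping from days inactive to the next email. Below is a minimal sketch over in-memory data. It rests on a few assumptions:
- The User shape and its lastActiveAt and churnEmailsSent fields are hypothetical.
- The day 7/14/30 schedule is also an assumption. The post only says three emails across that window.
- In production the filter would be a database query run by the 9 AM cron.

```typescript
interface User {
  id: string;
  lastActiveAt: Date;
  churnEmailsSent: number; // How many churn-prevention emails already went out.
}

const DAY_MS = 24 * 60 * 60 * 1000;
// Hypothetical schedule: email N is due once the user has been inactive this many days.
const CHURN_SCHEDULE_DAYS = [7, 14, 30];

function daysInactive(user: User, now: Date): number {
  return Math.floor((now.getTime() - user.lastActiveAt.getTime()) / DAY_MS);
}

// Index of the churn email due today, or null if none is due.
function dueChurnEmail(user: User, now: Date): number | null {
  const next = user.churnEmailsSent;
  if (next >= CHURN_SCHEDULE_DAYS.length) return null; // Sequence already finished.
  return daysInactive(user, now) >= CHURN_SCHEDULE_DAYS[next] ? next : null;
}

function dailyChurnRun(users: User[], now: Date): Array<{ userId: string; email: number }> {
  const due: Array<{ userId: string; email: number }> = [];
  for (const user of users) {
    const email = dueChurnEmail(user, now);
    if (email !== null) due.push({ userId: user.id, email });
  }
  return due;
}

const today = new Date("2026-05-01T09:00:00Z");
const sampleUsers: User[] = [
  { id: "a", lastActiveAt: new Date("2026-04-28T09:00:00Z"), churnEmailsSent: 0 }, // 3 days
  { id: "b", lastActiveAt: new Date("2026-04-24T09:00:00Z"), churnEmailsSent: 0 }, // 7 days
  { id: "c", lastActiveAt: new Date("2026-04-16T09:00:00Z"), churnEmailsSent: 1 }, // 15 days
  { id: "d", lastActiveAt: new Date("2026-03-01T09:00:00Z"), churnEmailsSent: 3 }, // done
];
console.log(dailyChurnRun(sampleUsers, today));
```

A user who logs back in moves lastActiveAt forward, so nothing is due for them on the next run.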

Here is the part that floored me. A study of 38 SaaS companies found that only 2 covered all of these stages. Two out of thirty-eight. Most stop at welcome and billing. The other 36 are leaving 37% of their email revenue on the table.

Only 2 of 38 SaaS companies cover every lifecycle stage. Everyone else is bleeding revenue politely.

The Part That Actually Matters: Behavioral Branching

The thing that separates a real lifecycle system from a timer-based drip is branching. Reacting to what users do, not when they signed up.

Inngest has a feature called step.waitForEvent. In plain terms, it pauses a sequence and waits for a specific thing to happen before continuing. If that thing does not happen in a time limit, the sequence takes a different path.

Here is how that plays out for a new user.

They sign up. The welcome sequence starts. Email 1 sends immediately. Then the function calls step.waitForEvent and waits up to 48 hours for a user.activated event, meaning the user completed the product's core action.

If the event arrives, the sequence skips the activation nudge and jumps straight to feature education. The user already found the value. Nudging them would be noise.

If the event does not arrive, the activation nudge fires. One email focused on the specific action they have not taken.

That same branching happens inside every sequence. The upgrade prompt checks if the user upgraded between emails. Churn prevention checks if they logged back in. Win-back checks if they re-subscribed. Between every step.sleep() call, the function opens a fresh database connection and checks the user's current state. If the user already converted, the sequence stops.

No one gets an upgrade email the day after they upgraded. No one gets a churn email the morning they came back. The email system knows what the product knows, because it is built inside the same app.
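Below is the welcome-sequence branch written as plain TypeScript. To keep it runnable without an Inngest account, the Step interface is a hypothetical stand-in. It only mimics the shape of Inngest's step.run and step.waitForEvent, where waitForEvent resolves to the event, or null on timeout. sendEmail, isActivated, and the event and template names are illustrative. They are not Inngest APIs or the /emails command's generated code.

```typescript
// Hypothetical stand-in for the parts of Inngest's step object used here.
interface Step {
  run<T>(id: string, fn: () => Promise<T>): Promise<T>;
  // Resolves to the event, or null if the timeout passes first.
  waitForEvent(id: string, opts: { event: string; timeout: string }): Promise<{ name: string } | null>;
}

interface Deps {
  sendEmail(userId: string, template: string): Promise<void>;
  isActivated(userId: string): Promise<boolean>; // Fresh database read.
}

async function welcomeSequence(userId: string, step: Step, deps: Deps): Promise<string[]> {
  const sent: string[] = [];
  const send = async (template: string) => {
    await step.run(`send-${template}`, () => deps.sendEmail(userId, template));
    sent.push(template);
  };

  await send("welcome-1");

  // Pause for up to 48 hours waiting for the core action.
  const activated = await step.waitForEvent("wait-for-activation", {
    event: "user.activated",
    timeout: "48h",
  });

  // Even on timeout, re-check the database before deciding what to send.
  const isActive =
    activated !== null || (await step.run("check-activated", () => deps.isActivated(userId)));

  if (isActive) {
    await send("feature-education-1"); // Value found: skip the nudge.
  } else {
    await send("activation-nudge"); // One email about the one missing action.
  }
  return sent; // Later steps of the sequence are omitted in this sketch.
}

// Fake step for local runs: run() executes immediately, waitForEvent returns a canned result.
function fakeStep(activationEvent: { name: string } | null): Step {
  return {
    async run<T>(_id: string, fn: () => Promise<T>): Promise<T> {
      return fn();
    },
    async waitForEvent() {
      return activationEvent;
    },
  };
}

const logDeps: Deps = {
  async sendEmail(userId, template) {
    console.log(`send ${template} to ${userId}`);
  },
  async isActivated() {
    return false;
  },
};

welcomeSequence("user_123", fakeStep(null), logDeps).then((sent) => console.log(sent));
```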

The Copy Angle Nobody Teaches You: Loss Framing

I want to single out one specific thing the research changed about my copy. The win-back sequence.

Most founders write cancellation follow-ups that are sunny and optimistic. "Come back, we miss you, look at all the great new things we built." Positive framing. It feels right.

It underperforms. By a lot.

Loss-framed messaging, meaning what you are giving up, converts 21 to 32% better than positive framing. My win-back emails now lead with what the user loses by staying away. The data they cannot access. The automations still running in the background waiting for them. The integration they set up that is about to go stale.

It feels counterintuitive. Loss framing sounds mean. But humans are wired to avoid loss more than to seek gain, and email copy is the place where that wiring shows up most clearly.

What Happened When I Actually Used This

I rolled out the full seven-email onboarding arc on my MVP.

Trial-to-paid went from 12% to 22% over six weeks. Monthly churn dropped from 8% to 4.8%. Those are the two numbers that compound. A 10-point conversion lift plus a 3-point churn reduction changes the math on every marketing dollar I spend. Every channel I was running suddenly paid back more, because the funnel behind it stopped leaking at the bottom.

The other thing I did not expect: my support volume went down. People stopped asking me basic feature questions. The education sequence was answering them before they hit the contact form.

Trial-to-paid went from 12% to 22%. Churn went from 8% to 4.8%. Support volume dropped. The emails did the work my onboarding was supposed to do.

Six Rules I Would Keep Even If I Rebuilt This

The pattern works with any email provider, any job engine, any framework. Here is what I would carry forward.

Read the product before designing emails

The most common mistake is writing emails that sound generic because the person writing them did not understand the product deeply enough. Feed the system your product docs, user journeys, and brand voice before it touches a template.

Design before you build

Present the full plan. Get approval. Then build. Changing a subject line in a plan is free. Changing it in a built template means re-rendering, re-testing, re-deploying.

Check user state between every email

Never assume the user is still in the state they were in when the sequence started. A step.sleep("3d") means three days of potential changes. Query the database before sending.

One action per email

Not two buttons. Not three links. One clear next step. Emails with a single call to action get higher click-through rates because there is no decision to make.

Build plain text alongside HTML

Do not treat plain text as an afterthought. Render both versions from the same template. Plain text emails get 42% more clicks in head-to-head tests, and you get it for free if you design for both.
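Rendering both versions from one source can be as simple as keeping the email as structured data and writing two small renderers. Below is a minimal sketch with hypothetical types.

The post's actual templates are React components, and some email libraries can produce the plain-text version for you, so treat this as the idea rather than the implementation. Giving each email exactly one cta field also enforces the one-action rule above.

```typescript
interface EmailContent {
  subject: string;
  paragraphs: string[];
  cta: { label: string; url: string }; // Exactly one call to action.
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderHtml(email: EmailContent): string {
  const body = email.paragraphs.map((p) => `<p>${escapeHtml(p)}</p>`).join("\n");
  const button = `<p><a href="${escapeHtml(email.cta.url)}">${escapeHtml(email.cta.label)}</a></p>`;
  return `<html><body>\n${body}\n${button}\n</body></html>`;
}

function renderText(email: EmailContent): string {
  // Same paragraphs, same single action, no markup.
  return [...email.paragraphs, `${email.cta.label}: ${email.cta.url}`].join("\n\n");
}

const nudge: EmailContent = {
  subject: "One step left",
  paragraphs: ["You set up your workspace.", "Connect your first data source & it starts syncing."],
  cta: { label: "Connect a source", url: "https://example.com/connect?step=1&src=email" },
};
console.log(renderText(nudge));
console.log(renderHtml(nudge));
```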

Wire triggers to events your app already fires

Signup already happens. Cancellation already happens. Limit hits already happen. The email system hooks into events that already exist. That is zero new infrastructure.

The Pattern Works Beyond Email

The parallel-agents pattern (one base agent builds shared infrastructure, then specialized agents build on top of it at the same time) is not specific to email.

I have since used the same shape for notification systems (push, in-app, SMS), onboarding flows (modal, tour, checklist, empty states), and documentation (structure and shared components, then sections written in parallel). Same coordination. Same speed win.
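Here is the coordination shape reduced to a sketch with `asyncio`. The agents are stand-in coroutines, not real model calls, and `base_agent`, `specialist`, and `build` are hypothetical names. The base step runs once and alone. The specialists then fan out concurrently against its output.

```python
import asyncio


async def base_agent() -> dict:
    """Builds shared infrastructure once (stand-in for layouts, tokens, a send helper)."""
    await asyncio.sleep(0)  # placeholder for real work
    return {"layout": "base-layout", "send": "send_helper"}


async def specialist(name: str, shared: dict) -> str:
    """Builds one slice on top of the shared pieces; runs concurrently with its peers."""
    await asyncio.sleep(0)  # placeholder for real work
    return f"{name} built on {shared['layout']}"


async def build(specialists: list[str]) -> list[str]:
    shared = await base_agent()  # step 1: sequential, everyone depends on it
    # step 2: fan out; gather returns results in the same order as its inputs
    return list(await asyncio.gather(*(specialist(n, shared) for n in specialists)))


results = asyncio.run(build(["onboarding", "billing", "retention"]))
```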

Read the product context first. Design before building. Build in parallel with shared infrastructure. Check state before acting. Type-check everything at the end. That is the entire recipe.

If You Want the Whole Thing

The /emails command is part of Build This Now. It is an AI-powered build system for shipping production SaaS in 48 hours. Eighteen specialist agents, 55+ skills, a full production codebase with auth, payments, storage, analytics, and security built in. One-time payment. No subscriptions.

If you want to stop shipping products that die quietly three weeks after launch, start here:

Full system: buildthisnow.com
Read the full technical breakdown on the blog: buildthisnow.com/blog/ai-email-sequences

The MVP is the product. The lifecycle is the business. Build both.

MCP Tool Hooks in Claude Code

speedy_devv — Fri, 24 Apr 2026 13:22:05 +0000

Your hooks run shell scripts. Every time a hook needs to call an MCP server, it spawns a subprocess, wires up transport, handles auth, parses the response, and formats JSON back to stdout. For a security check that fires on every file write, that overhead adds up fast.

As of Claude Code v2.1.118, there is a cleaner path. Hooks have a new type that calls MCP tools directly. No subprocess. The MCP server is already running. The hook calls straight into its RPC connection.

Add this to .claude/settings.json to run a security scan after every file write:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "mcp_tool",
            "server": "semgrep",
            "tool": "scan_file",
            "input": { "path": "${tool_input.file_path}" }
          }
        ]
      }
    ]
  }
}

That is the whole thing. The tool's text output goes through the same JSON decision parser as any command hook. No shell, no PATH problems, no jq dependency.

What type: "mcp_tool" actually is

Before v2.1.118, hooks had four handler types: command, http, prompt, and agent. Now there are five:

command — shell subprocess (stdin/stdout)
http — POST to a URL endpoint
mcp_tool — direct RPC call to a connected MCP server
prompt — single-turn LLM evaluation (Haiku default)
agent — multi-turn subagent with Read/Grep/Glob access

The mcp_tool type works on every hook event, same as command and http. One practical caveat: SessionStart and Setup fire while servers are still connecting. Those hooks may get a "server not connected" error on first run. Subsequent runs are fine.

The full schema

{
  "type": "mcp_tool",
  "server": "my-mcp-server",
  "tool": "tool_name",
  "input": {
    "arg1": "${tool_input.file_path}",
    "arg2": "${session_id}"
  },
  "timeout": 30,
  "statusMessage": "Checking...",
  "if": "Edit(*.ts|*.tsx)"
}

Three fields are specific to mcp_tool hooks: server, tool, and input.

server must exactly match the server name in your MCP configuration. A one-character mismatch makes the hook fail with a non-blocking error, which is easy to miss.

input values support ${field.path} dot-notation into the hook's full event JSON. For a PostToolUse hook on a Write call, the event includes tool_input.file_path, tool_input.content, session_id, cwd, duration_ms, and more. Any field is reachable.
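As a mental model for how `${...}` placeholders resolve, here is a small sketch. It is not Claude Code's implementation. Two behaviors are assumptions: a missing field becomes an empty string, and non-string values are rendered as JSON, so a boolean comes out as `"true"`.

```python
import json
import re
from typing import Any

PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def lookup(event: dict, path: str) -> Any:
    """Walk a dot path such as 'tool_input.file_path' through nested dicts."""
    value: Any = event
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return ""  # assumption: missing fields become empty strings
        value = value[key]
    return value


def as_text(value: Any) -> str:
    """Strings pass through; everything else is rendered as JSON (True -> 'true')."""
    return value if isinstance(value, str) else json.dumps(value)


def substitute(template: Any, event: dict) -> Any:
    """Recursively fill ${...} placeholders inside a hook's input object."""
    if isinstance(template, str):
        return PLACEHOLDER.sub(lambda m: as_text(lookup(event, m.group(1))), template)
    if isinstance(template, dict):
        return {k: substitute(v, event) for k, v in template.items()}
    if isinstance(template, list):
        return [substitute(v, event) for v in template]
    return template
```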

if uses permission-rule syntax. "Edit(*.py|*.ts|*.js)" means the hook only fires when the edited file matches one of those extensions. On a docs-heavy project with constant markdown edits, this is a real performance difference.
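The filtering idea can be approximated in a few lines. This sketch assumes the globs are matched against the file's basename with `fnmatch`. Claude Code's real permission-rule matcher may handle paths differently, so treat it as an illustration of which calls get filtered out, not as the exact rule engine.

```python
import fnmatch
import os
import re

RULE = re.compile(r"^(\w+)\((.*)\)$")  # "Tool(glob|glob|...)"


def rule_matches(rule: str, tool_name: str, file_path: str) -> bool:
    """Approximate 'Tool(glob|glob)' matching against a single tool call."""
    m = RULE.match(rule)
    if not m:
        return False
    tool, globs = m.group(1), m.group(2).split("|")
    if tool != tool_name:
        return False  # the rule names a different tool
    name = os.path.basename(file_path)  # assumption: globs apply to the file name
    return any(fnmatch.fnmatch(name, g) for g in globs)
```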

Why this matters vs. command hooks

Two concrete differences, not just speed.

Stateful servers. A shell subprocess starts fresh every time. An MCP server is a live process with its own state: loaded configs, open connections, caches, accumulated session context. A linting MCP that pre-parsed your tsconfig.json on startup does not re-parse it on every file write. A command hook does.

No shell environment dependency. Command hooks fail silently when PATH is wrong, when jq is not installed, when ~/.zshrc prints something to stdout on non-interactive shells. MCP tool hooks bypass all of that. The call goes from Claude Code to the server over the existing RPC connection.

How the output is processed

The MCP tool's text content is treated exactly like a command hook's stdout. If it parses as valid JSON, Claude Code acts on the decision fields. If not, the text becomes context for Claude.

{
  "decision": "block",
  "reason": "Security issue found in src/api.ts: SQL injection risk on line 42."
}

Return this from a PostToolUse MCP tool hook and Claude gets the message and fixes the file. For blocking before a tool runs, use PreToolUse and return permissionDecision: "deny".

One field is exclusive to mcp_tool hooks on PostToolUse: updatedMCPToolOutput. It replaces what Claude sees as the tool's output before it enters the conversation. A running MCP server can post-process another tool's result before Claude reads it.
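Here is a simplified sketch of the decision flow described above, from the consumer side. Plain text becomes context. JSON with `"decision": "block"` blocks with its reason. Any other JSON lets the flow continue. The `interpret` function and its three labels are hypothetical, and real hook output supports more fields than this handles.

```python
import json
from typing import Optional


def interpret(tool_text: str) -> tuple[str, Optional[str]]:
    """Classify an MCP tool's text output the way the post describes command-hook stdout.

    Returns ("block", reason), ("allow", None) for JSON without a block decision,
    or ("context", text) when the output is not JSON at all.
    """
    try:
        payload = json.loads(tool_text)
    except json.JSONDecodeError:
        return "context", tool_text  # plain text is handed to Claude as context
    if isinstance(payload, dict) and payload.get("decision") == "block":
        return "block", payload.get("reason", "")
    return "allow", None
```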

Pattern 1: Security scanning on every write

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "mcp_tool",
            "server": "semgrep",
            "tool": "scan_file",
            "if": "Write(*.ts|*.py|*.js|*.go)",
            "input": { "path": "${tool_input.file_path}" },
            "statusMessage": "Scanning..."
          }
        ]
      }
    ]
  }
}

The scan runs against the server's cached ruleset, not a fresh subprocess parse on every write. If the tool finds something, return decision: "block" with the finding. Claude reworks the file before continuing.

Pattern 2: Stop hook with external verification

A Stop hook that calls a Linear MCP to check whether the related ticket is actually closed before Claude declares done:

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "mcp_tool",
            "server": "linear",
            "tool": "get_issue_status",
            "input": { "issue_id": "${tool_input.issue_id}" }
          }
        ]
      }
    ]
  }
}

Always check stop_hook_active in your Stop hook logic. The event JSON includes this field as "true" when Claude is already continuing from a previous Stop hook firing. A server that ignores this creates an infinite loop. Build the guard into the MCP tool: if stop_hook_active is "true" in the input, return empty output and exit cleanly.
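Here is a sketch of that guard inside a hypothetical `get_issue_status` tool body. The Pattern 2 config does not pass the flag, so this assumes you add `"stop_hook_active": "${stop_hook_active}"` to its input, as Pattern 3 does with `skip_if_active`. `lookup_status` stands in for a real Linear API call, and `"Done"` is an assumed status name.

```python
import json
from typing import Callable, Optional


def get_issue_status(issue_id: str, stop_hook_active: str,
                     lookup_status: Callable[[str], Optional[str]]) -> str:
    """Hypothetical MCP tool body for the Linear Stop hook."""
    if stop_hook_active == "true":
        return ""  # already continuing from an earlier Stop hook: never block twice
    status = lookup_status(issue_id)  # stand-in for the Linear API call
    if status != "Done":
        return json.dumps({
            "decision": "block",
            "reason": f"Issue {issue_id} is still '{status}'. Finish it before stopping.",
        })
    return ""  # ticket closed: empty output lets Claude stop
```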

Pattern 3: Production error check before stopping

After Claude finishes a feature, check whether anything new broke in staging before marking the session complete:

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "mcp_tool",
            "server": "sentry",
            "tool": "get_new_errors_since",
            "input": { "minutes": "5", "skip_if_active": "${stop_hook_active}" }
          }
        ]
      }
    ]
  }
}

If new errors appeared in the last five minutes, the MCP tool returns them with decision: "block". Claude reads the error details and fixes the regression before stopping.

Pattern 4: Auto-inject docs before every prompt

A UserPromptSubmit hook with a Context7 MCP fetches live documentation for any library mentioned in the prompt, before Claude processes it:

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "mcp_tool",
            "server": "context7",
            "tool": "get_library_docs",
            "input": { "prompt": "${prompt}" },
            "timeout": 15
          }
        ]
      }
    ]
  }
}

Previously this required Claude to explicitly call the MCP tool. Now it happens on every prompt automatically. Claude starts with current docs instead of training data.

Pattern 5: Policy enforcement for agent teams

When running multi-agent workflows, a shared policy MCP server can enforce which agent writes to which directories:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [
          {
            "type": "mcp_tool",
            "server": "policy-server",
            "tool": "check_write_permission",
            "input": {
              "agent": "${agent_name}",
              "path": "${tool_input.file_path}"
            }
          }
        ]
      }
    ]
  }
}

Update the server once and every agent in every project inherits the new rules. No touching individual settings.json files.
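Here is what a `check_write_permission` tool on that server might compute, sketched without any MCP plumbing. The `POLICY` table, agent names, and `project_root` parameter are hypothetical. The `hookSpecificOutput` wrapper follows the shape Claude Code documents for PreToolUse decisions, so verify it against your version. Returning an empty string on allow leaves the normal permission flow in charge instead of auto-approving.

```python
import json
import posixpath
from pathlib import PurePosixPath

# Hypothetical policy table: which directories each agent may write to.
POLICY = {
    "backend-developer": ["src/api", "src/db"],
    "frontend-developer": ["src/components", "src/pages"],
}


def check_write_permission(agent: str, path: str, project_root: str = "") -> str:
    """Return "" to allow, or a PreToolUse-style deny decision as JSON."""
    # Claude Code file paths are typically absolute; make them project-relative first.
    if project_root and posixpath.isabs(path):
        path = posixpath.relpath(path, project_root)
    target = PurePosixPath(posixpath.normpath(path))  # collapses ".." tricks
    # Part-wise comparison: "src/apiary" is NOT inside "src/api".
    if any(target.is_relative_to(d) for d in POLICY.get(agent, [])):
        return ""
    return json.dumps({
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": f"{agent} may not write to {path}",
        }
    })
```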

Pattern 6: MCP tool hooks in agent frontmatter

Hooks can live in an agent's YAML frontmatter, scoped to that agent's lifecycle:

---
name: backend-developer
description: "Builds API endpoints and database logic"
hooks:
  PostToolUse:
    - matcher: "Write"
      hooks:
        - type: mcp_tool
          server: semgrep
          tool: scan_file
          input: { "path": "${tool_input.file_path}" }
  Stop:
    - hooks:
        - type: agent
          prompt: "Verify all API endpoints have corresponding tests. Block if any are missing."
---

Each specialist agent in an orchestrated team carries its own validation logic. The backend agent scans for security issues. The frontend agent checks accessibility. Neither needs a global hook that applies to everyone.

MCP servers worth pairing with hooks

Server	Event	What it does
Semgrep	PostToolUse: Write	Security scan on every write
Sentry	Stop	Check for new staging errors before completing
Linear / Jira	Stop	Verify ticket status, update on completion
Context7	UserPromptSubmit	Auto-fetch live docs for mentioned libraries
ElevenLabs	Stop	TTS audio on task completion
Slack	Notification, Stop	Team alerts without curl boilerplate
E2B	Stop	Run generated scripts in a sandbox before marking done
claude-mem	PostCompact, SessionStart	Restore session context after compaction
n8n	TaskCompleted	Trigger an external workflow on completion

Known issue: PostToolUse + MCP events + additionalContext

There is an open bug (GitHub issue #24788) where additionalContext from hooks gets silently dropped when the triggering event was an MCP tool call. This affects type: "command" hooks that respond to MCP tool events, not mcp_tool hooks themselves.

The distinction matters: hooks that are MCP invocations work fine. Hooks that respond to MCP tool calls and return additionalContext do not. For critical messages, the workaround is to exit with code 2 and write the message to stderr. The blocking pattern works; advisory injection does not.
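Here is a sketch of that workaround as a `type: "command"` hook script. The `"ERROR"` substring check is a hypothetical stand-in for whatever condition you actually care about. The script assumes the event arrives on stdin with `tool_name` and `tool_response` fields, and that MCP tools are named `mcp__<server>__<tool>`. Call `main()` from the script's entry point. It is left uncalled here so the logic can be tested on its own.

```python
import json
import sys


def handle(event: dict) -> tuple[int, str]:
    """Decide exit code and stderr message for a PostToolUse command hook."""
    tool = event.get("tool_name", "")
    output = str(event.get("tool_response", ""))
    if tool.startswith("mcp__") and "ERROR" in output:  # hypothetical condition
        # Exit code 2 + stderr is the blocking path; additionalContext would be dropped here.
        return 2, f"{tool} reported an error; review it before continuing."
    return 0, ""


def main() -> None:
    """Read the hook event from stdin, report on stderr, and exit with the chosen code."""
    code, message = handle(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```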

The hook system's last missing piece

Before this, hooks were a safety net. Shell commands that could block dangerous things or run formatters. Stateless, process-local, disconnected from everything your MCP servers already know.

After: hooks are a deterministic orchestration layer. Any event, any MCP tool, full decision control, with state that persists across calls and no subprocess overhead.

PreToolUse validates. PostToolUse formats and scans. PostToolBatch runs tests. Stop verifies with real external data. Every step can be an MCP tool invocation. None of them require a shell script.

Full reference with schema, substitution syntax, and all event types: https://buildthisnow.com/blog/tools/hooks/mcp-tool-hooks

5 Frontier Models Compared

speedy_devv — Thu, 23 Apr 2026 19:43:03 +0000

I kept picking the wrong model.

Not because I didn't know the benchmarks. Because the benchmarks don't tell you what a model actually costs when you're running it daily, or whether it holds up across a 3-hour agent session, or whether it can fit your whole codebase without truncating half of it.

Five frontier models shipped in early 2026. All of them are good. None of them is good at everything. DeepSeek V3.2 costs $1 per million input tokens. GPT-5.4 costs $2.50 for the same volume. That is a 2.5x spread on input price alone, and the cheaper model is not always worse for the job at hand.

Here is what I learned after running them all.

The price gap nobody talks about

The headline numbers first, because they set the frame.

Model	Input / Output (per 1M tokens)	Context Window
Claude Opus 4.7	$5 / $25	1M tokens
GPT-5.4	$2.50 / $15	256K tokens
Kimi K2.6	$3 / $15	512K tokens
Gemini 3.1 Pro	$2 / $12	2M tokens
DeepSeek V3.2	$1 / $4	128K tokens

The price gap is real. DeepSeek V3.2 costs a fifth of what Opus 4.7 costs per input token. Context windows vary by 16x from smallest to largest. DeepSeek's 128K window handles a medium codebase. Gemini's 2M window fits an entire monorepo.

These gaps are not footnotes. For the right workloads, they are the whole decision.

Coding: where the separation actually shows up

The standard benchmark is SWE-Bench, real GitHub issues where the model writes a fix that passes the test suite. Good benchmark. It skews toward clean, well-specified problems.

CursorBench runs a different evaluation. Real prompts from Cursor users. Messy, underspecified, half-broken codebases. The kind of problems actual developers bring to an AI every day.

Opus 4.7 leads CursorBench at 70%. GPT-5.4 posts 68% on SWE-Bench, a different benchmark, so those two numbers are not a head-to-head comparison. On clean, well-defined problems the two models are nearly even. On messy problems, the gap widens.

What makes Opus 4.7 different on hard coding tasks is self-correction. Most models generate code, declare it done, and move on. Opus 4.7 reviews what it just wrote, spots the type error or logic gap, and fixes it in the same pass. One fewer debugging loop per session adds up across a week of engineering work. I noticed it first on a nasty legacy codebase with no tests and inconsistent patterns: Opus 4.7 held the thread across multiple refactoring steps where others started drifting.

Gemini 3.1 Pro scores 63% on SWE-Bench and is a solid coding model when the task requires pulling context from a large codebase. The 2M window means it can read the whole thing. It falls behind on complex reasoning, where the model has to hold a long chain of logic without losing the thread.

DeepSeek V3.2 at 52% is surprisingly capable on standard implementation tasks for its price. Clear prompt, unambiguous problem, it delivers. It does not belong on hard, ambiguous work, and it mostly knows that.

Long documents: two different dimensions

Context window size and document reasoning quality are separate things. A huge window is useless if the model loses the plot. Strong reasoning is limited if the document doesn't fit.

Gemini 3.1 Pro's 2M context is genuinely useful for real workloads: a large monorepo, a full set of legal contracts, a year of financial filings. Nothing gets truncated. If the task is "read everything and extract what matters," Gemini is the right tool.

Opus 4.7's edge is accuracy over what it reads. On dense source material, it produces 21% fewer errors than its predecessor. That gap shows up most clearly in legal and financial work where a wrong clause or misread number has consequences. You can fit more raw text into Gemini, but Opus 4.7 does more with the text it reads.

A practical combination for large, high-stakes documents: Gemini 3.1 Pro for the initial pass across the full document, Opus 4.7 for the sections that require careful reasoning. Full picture from Gemini, accuracy from Opus on the parts that matter.

Multi-step agents: where the real separation is

Agent tasks are where the gap between models becomes undeniable. A model that is great at one-shot prompts can fall apart when it has to run for 20 steps, use tools, and keep track of what it already did.

The failure mode looks the same across models: the agent starts losing coherence around step 10 to 15. It forgets what it already checked. It tries an approach it already tried. It produces a "done" message when the task is half-finished.

Opus 4.7 stays coherent across hours of work. It has the lowest tool error rate of the group. When a tool call returns an unexpected result, it adjusts rather than proceeding on a false assumption. The practical payoff: you can set Opus 4.7 on a multi-hour task, walk away, and come back to actual results.

GPT-5.4 is strong on short, well-defined chains of 3 to 5 steps. It is the fastest model in this group, which matters for interactive workflows where you are watching and course-correcting in real time. At the long end, its reliability drops compared to Opus 4.7.

DeepSeek V3.2 is the right call for lightweight agent work at volume. Bulk tagging, classification pipelines, structured extraction from well-formatted documents. Running 10M tokens through DeepSeek instead of Opus saves about $61 per batch.

What it actually costs per real workload

Headline prices only tell half the story. The actual cost depends on what you are running.

Daily coding sessions (roughly 200K tokens each):

Model	Cost per Session
DeepSeek V3.2	$0.26
Gemini 3.1 Pro	$0.75
Kimi K2.6	$0.90
GPT-5.4	$1.60
Opus 4.7	$1.75

For coding sessions, DeepSeek is nearly 7x cheaper than Opus 4.7. GPT-5.4 and Opus are close in per-session cost: GPT-5.4 wins on speed, Opus wins on hard problems.

High-volume automation (10M tokens per month):

Model	Monthly Cost
DeepSeek V3.2	$14
Gemini 3.1 Pro	$35
Kimi K2.6	$39
Opus 4.7	$75
GPT-5.4	$78

At bulk volumes, DeepSeek is in a different price category. $14 versus $78 for the same token volume is a fundamentally different operating cost. Gemini 3.1 Pro at $35/month is the surprise here: 2M context at less than half the price of Opus.
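The per-workload figures depend on how tokens split between input and output. The post does not state that split, and caching or batch discounts shift it further. Your own numbers will vary. Here is a list-price calculator sketch using the per-1M prices from the first table. The 80/20 split below is an assumption, so replace it with your measured counts.

```python
# Per-1M-token list prices from the comparison table: (input, output) in USD.
PRICES = {
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.4": (2.50, 15.00),
    "Kimi K2.6": (3.00, 15.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
    "DeepSeek V3.2": (1.00, 4.00),
}


def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD; ignores prompt caching, batch discounts, and tiered pricing."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Assumed split: a 200K-token session, 160K input / 40K output.
for name in PRICES:
    print(f"{name:16} ${cost(name, 160_000, 40_000):.2f}")
```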

The default pair for most builders

Opus 4.7 handles the tasks where quality decides the outcome: hard coding, debugging legacy code, long agent runs, precise document analysis. DeepSeek V3.2 handles the tasks where volume and cost decide the outcome: bulk automation, classification, templated generation, anything with a clear spec.

Those two together cover 90% of what most builders actually need.

The other three have specific edges worth knowing. Gemini 3.1 Pro for any workload that needs a 2M context window at a competitive price. GPT-5.4 for fast interactive work on clean codebases. Kimi K2.6 for Chinese-language documents.

The question is never "which model is best." It is "which model is right for this task." Get that right and you spend less, finish faster, and fix fewer mistakes on the other side.

Full breakdown with all the benchmark tables and cost scenarios is here: buildthisnow.com/blog/models/2026-04-21-opus47-vs-frontier

Claude Opus 4.7: What Actually Changed for Agentic Coding

speedy_devv — Thu, 16 Apr 2026 17:46:11 +0000

Anthropic shipped Opus 4.7 on April 16, 2026. Same $5/$25 pricing as 4.6. Same 1M context. Same 128K output ceiling. Different model entirely when you put it in an agentic coding loop.

If you run Claude Code or build agent pipelines on the Claude API, this post is the migration brief I wish someone had handed me on release day. Benchmarks, the five behavioral shifts that actually matter, the new API features, and the breaking changes you will hit if you just flip the model ID.

The Benchmark Table First

| Benchmark              | Opus 4.6 | Opus 4.7              |
|------------------------|----------|-----------------------|
| CursorBench            | 58%      | 70%                   |
| Rakuten-SWE-Bench      | Baseline | 3x resolution         |
| XBOW visual-acuity     | 54.5%    | 98.5%                 |
| OfficeQA Pro errors    | Baseline | 21% fewer             |
| BigLaw Bench (Harvey)  | Lower    | 90.9% at high effort  |
| Notion Agent errors    | Baseline | 1/3 the errors        |
| Factory Droids         | Baseline | +10-15% task success  |
| Bolt long-running apps | Baseline | +10% best case        |
| CodeRabbit recall      | Baseline | +10%                  |

Rakuten's 3x is the one I keep coming back to. That is real production SWE tasks, not a synthetic eval. CursorBench moving 58 to 70 is what you actually feel in a Claude Code session: fewer rounds per feature.

Five Behavioral Shifts

1. Self-Verification Is New

Prior Claude models did not verify their own assumptions before acting unless you prompted them to. 4.7 does.

Vercel reports the model runs proofs on systems code before starting work. Hex reports it flags missing data instead of inventing plausible-but-wrong fallbacks, and resists dissonant-data traps that 4.6 falls for.

In a Claude Code session this looks like:

Before I write this migration, let me verify the actual shape
of the response object, because my assumption here might be wrong.

[reads file]
[runs grep]

Confirmed. Writing migration now.

You did not ask for that step. The model added it.

2. Long Runs Stay Coherent

Devin reports 4.7 works coherently for hours on hard problems instead of giving up. Genspark measured loop rates: prior models looped indefinitely on roughly 1 in 18 queries, while 4.7 posts the highest quality-per-tool-call ratio they have ever measured.

Notion saw tool errors cut to a third of 4.6's rate on multi-step workflows. That is not a small optimization. That is a different failure mode.

3. Fewer Tool Calls, More Thinking

Default behavior shifted. 4.7 thinks more and acts less. Ramp reports less need for step-by-step guidance on cross-tool, cross-codebase debugging.

If you want the old tool-heavy behavior, raise effort:

claude --model claude-opus-4-7 --effort high
# or
claude --model claude-opus-4-7 --effort xhigh

4. Instructions Get Read Literally

Prompts that relied on 4.6 quietly filling in the gaps will hit unexpected behavior on 4.7. The fix is usually shorter and more explicit, not longer.

5. Fewer Subagents by Default

If your orchestrator relied on 4.6 fanning out aggressively to specialists, 4.7 will pull that back. You can still request parallel subagent work explicitly.

The `xhigh` Effort Tier

Adaptive thinking gained a fifth setting. Ordering is now:

low < medium < high < xhigh < max

xhigh sits between high and max. Claude Code raised the default to xhigh on all plans.

For API work, Anthropic recommends starting at xhigh and dialing down:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    thinking={"type": "adaptive", "effort": "xhigh"},
    messages=[{"role": "user", "content": "Your agentic task here"}]
)

Hex's quote is the cost-math line worth memorizing: "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." Same quality, lower tier, fewer tokens.

Task Budgets (Public Beta)

Task budgets are the feature I keep recommending first. An advisory cap on a full agentic loop: thinking, tool calls, tool results, final output. The model sees a running countdown and self-paces against it.

Minimum value is 20k tokens. Distinct from max_tokens, which is a hard ceiling the model never sees.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=8192,
    extra_headers={
        "anthropic-beta": "task-budgets-2026-03-13"
    },
    # SDK versions without a typed task_budget parameter reject unknown keyword
    # arguments with a TypeError, so send the beta field through extra_body
    extra_body={"task_budget": 200000},
    thinking={"type": "adaptive", "effort": "xhigh"},
    messages=[{"role": "user", "content": "Build this feature end to end"}]
)

The model can now plan: "I have 200k tokens to spend. I will use 30k on exploration, 90k on writing, 40k on review, and leave a buffer." A hard max_tokens just cuts off mid-thought.

Vision: 3x Resolution

First Claude model with high-resolution image support. Max image resolution went from 1568px / 1.15MP on prior models to 2576px / 3.75MP on 4.7. Roughly 3x the pixel budget.

What changes in practice:

Dense screenshots where small text has to be readable
Complex diagrams with nested labels
High-fidelity design mockups
Pointing, measuring, counting, bounding-box localization
Coordinates returned are 1-to-1 with real image pixels (no scale conversion)

Cost note: a full-resolution image can consume up to roughly 4,784 tokens, versus roughly 1,600 on prior models. Downsample when you do not need the fidelity.
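Here is a downsampling helper sketch. It computes the largest same-aspect size that fits the 2576px long edge and ~3.75MP limits quoted above, without upscaling. The token estimate uses the width × height / 750 rule of thumb Anthropic documents for earlier models. Whether that rule still holds on 4.7 is an assumption: it puts a full 3.75MP image near 5,000 tokens, a bit above the ~4,784 quoted here, so treat it as a rough upper estimate.

```python
import math

MAX_EDGE = 2576          # long-edge limit quoted for Opus 4.7
MAX_PIXELS = 3_750_000   # ~3.75MP limit quoted for Opus 4.7


def fit(width: int, height: int,
        max_edge: int = MAX_EDGE, max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Largest size with the same aspect ratio that fits both limits (never upscales)."""
    scale = min(1.0,
                max_edge / max(width, height),
                math.sqrt(max_pixels / (width * height)))
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


def estimate_tokens(width: int, height: int) -> int:
    """Rough token estimate via width*height/750 (documented for earlier models)."""
    return math.ceil(width * height / 750)


# Usage: pick a smaller budget when you do not need full fidelity.
w, h = fit(4000, 3000)                      # fits the 4.7 limits
w_small, h_small = fit(4000, 3000, 1568, 1_150_000)  # prior-model limits, fewer tokens
```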

Breaking Changes in the API

This is the section you have to read before flipping the model ID.

Extended Thinking Is Removed

# 4.6 and earlier (still valid on 4.6)
thinking={"type": "enabled", "budget_tokens": 8192}

# 4.7 (old shape returns 400)
thinking={"type": "adaptive", "effort": "xhigh"}

Adaptive thinking is now off by default on 4.7. A request with no thinking field runs with no thinking at all.

Sampling Parameters Are Removed

Setting temperature, top_p, or top_k to non-default values returns a 400:

# Returns 400 on 4.7
response = client.messages.create(
    model="claude-opus-4-7",
    temperature=0.7,  # <-- remove this
    top_p=0.9,        # <-- remove this
    messages=[...]
)

Drop them. Use prompting to shape behavior.
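The thinking and sampling changes above can be folded into one pre-flight step. Here is a sketch, assuming your code builds `messages.create` keyword arguments as a dict before sending. `migrate_request` is a hypothetical helper. Mapping every old `budget_tokens` value to a single effort level is a judgment call, not a documented equivalence.

```python
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(kwargs: dict, effort: str = "xhigh") -> dict:
    """Rewrite a 4.6-era messages.create kwargs dict into the 4.7 shape described above."""
    # Drop sampling parameters; non-default values return a 400 on 4.7.
    new = {k: v for k, v in kwargs.items() if k not in SAMPLING_PARAMS}
    new["model"] = "claude-opus-4-7"
    thinking = new.get("thinking")
    if thinking and thinking.get("type") == "enabled":
        # Old extended-thinking shape returns a 400; swap in adaptive thinking.
        new["thinking"] = {"type": "adaptive", "effort": effort}
    # Requests without a thinking field stay that way (no thinking on 4.7).
    return new
```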

Thinking Content Is Omitted by Default

Thinking blocks still appear in the response stream but their thinking field is empty unless you opt in:

thinking={
    "type": "adaptive",
    "effort": "xhigh",
    "display": "summarized"
}

If your UI streamed thinking to users on 4.6, the new default reads as a long pause before output begins. Add display: "summarized" back.

Prefill Is Blocked

Prefilling assistant messages returns a 400. Use structured outputs or output_config.format instead. This one carries over from 4.6.

New Tokenizer

The same input can now map to 1.0x to 1.35x the prior token count, content-dependent. Re-run /v1/messages/count_tokens against your representative payloads and re-baseline:

max_tokens ceilings
Compaction triggers
Client-side token estimates
Cost projections

Automated Migration

Inside Claude Code, most of the above is handled for you. On the API side:

claude /claude-api migrate

Applies the changes across the codebase.

New Product Features

/ultrareview runs a review session reading through changes and flagging bugs a careful human reviewer would catch. Pro and Max users get three free ultrareviews. I have been using it as a pre-commit gate.

Auto mode for Max. Previously gated to Team and Enterprise. Now extends to Max subscribers. Longer runs with fewer interruptions, and with less risk than fully skipping permissions.

File-system memory improvements. Agents keeping a scratchpad, notes file, or structured memory store across turns make cleaner notes and actually leverage them on later tasks.

Pricing Did Not Move

Input:        $5.00 / 1M tokens
Output:      $25.00 / 1M tokens
Prompt cache: up to 90% savings
Batch:        50% savings

Two things will nudge your bill up:

New tokenizer (up to 1.35x input token count on some content)
High-resolution images (up to ~4,784 tokens each)

Task budgets and the effort parameter are how you trade back.

On GitHub Copilot, 4.7 went GA the same day on Pro+, Business, and Enterprise, carrying a 7.5x premium request multiplier through April 30 as promotional pricing.

Switching Claude Code to 4.7

# Set as default
claude config set model claude-opus-4-7

# Or override per session
claude --model claude-opus-4-7

The opus alias in Claude Code points at it on current releases. Available on claude.ai, Claude Platform API, AWS Bedrock (research preview), Google Cloud Vertex AI, Microsoft Foundry, and GitHub Copilot.

When to Reach for It

4.7 when reasoning depth, long agent runs, or high-resolution screenshots are what the job needs. Sonnet 4.6 is still the right call for smaller, faster work where speed and cost decide the tradeoff.

Full breakdown on the webapp: buildthisnow.com/blog/models/claude-opus-4-7

Build your own agentic SaaS with Claude Code in 48 hours: buildthisnow.com