Most of what's written about AI-assisted development cost is theoretical. "Could save 50%". "Up to 10x faster". "Game-changing efficiency". Useful for slide decks, useless for budgeting.
Here are three projects from the last few months with the actual numbers. Hours billed, tokens consumed, what AI agents did, what they did not do, and what the final invoice looked like. Names are abstracted (NDA-protected) but the numbers are real.
Project A: SaaS dashboard MVP
The ask: a B2B SaaS dashboard for a small ops team. Backend in Go, frontend in Next.js, Postgres for data, Stripe for billing. Standard CRUD plus 8 reports plus a roles-and-permissions layer.
What was quoted
Quoted at 4 weeks. The discovery call surfaced 22 distinct features. I priced it as a fixed-rate engagement based on previous similar projects, with a stated ±20% accuracy band.
- Quoted price: $9,800
- Quoted timeline: 28 days
- Quoted hours (internal estimate): 78 hours
What was actually delivered
- Delivered: 23 days (5 days under quote)
- Actual hours billed: 71 hours
- Claude Sonnet token cost (development only, not runtime): $141
- Lines of application code shipped: ~14,200
- Lines of test code shipped: ~4,400 (statement coverage: 91%)
- Bugs found in client UAT: 3 (1 critical, 2 cosmetic)
- Critical bug fix time: 4 hours including regression test
Where the time actually went
Of the 71 hours billed, the breakdown was roughly:
- Senior architecture decisions and spec refinement: ~9 hours
- Agent-supervised feature build (CRUD, reports, billing, auth): ~38 hours
- Direct senior code on the harder bits (roles-and-permissions logic, Stripe webhook retry, multi-tenant data isolation): ~14 hours
- Test writing and review: ~6 hours
- Deployment, monitoring setup, runbook: ~4 hours
The agents handled the boring 70% well. The senior owned the 30% that required real judgment.
What this teaches about pricing
A traditional freelance team at $75-100/hr would have charged ~$10,000-15,000 for this scope and delivered in 5-7 weeks. The agent-assisted version landed at $9,800, in 23 days, with 91% test coverage. The savings ratio is real but the marginal token cost ($141) is so small it does not even show up on the invoice as a line item. Clients care about the total number and the delivery date, not the token accounting.
Project B: Bug fix sprint on a Go service
The ask: an existing production Go service was crashing under specific concurrent load. The team had been hunting the bug for two weeks without success. The need was triage, fix, regression test, and a runbook so it would not happen again.
What was quoted
- Quoted price: $1,200 fixed
- Quoted timeline: 2 days
- Quoted hours (internal estimate): 14 hours
This was priced as a sprint because the bug was contained and the team had already done the partial reproduction work.
What was actually delivered
- Delivered: 6 hours total elapsed
- Actual hours billed: 6 hours
- Claude Sonnet token cost: $8.30
- Outcome: race condition in a worker-pool goroutine fixed; regression test added; root cause documented in the team runbook
Where the time actually went
- Reading the existing code and the team's investigation notes: ~1.5 hours (human only; agents are bad at "read this 8000-line repo and tell me what's wrong")
- Running the race detector on suspicious paths (
go test -race -count=10 ./pool/...): ~0.5 hours - Reproducing the bug locally with a stress harness: ~1 hour
- Writing the fix (mutex around the shared resource that was being mutated outside the worker's own slot): ~0.5 hours
- Writing a deterministic regression test for the race: ~1 hour
- Writing the runbook entry: ~1.5 hours
What this teaches about pricing
Bug fix sprints are the wrong place to apply heavy AI assistance. They are 80% reading and 20% writing. The reading is human work; agents cannot reliably hold 8,000 lines of context and reason about the system's invariants. The writing portion (the fix itself, the regression test, the runbook) is where AI helps, but it is the smaller half.
Pricing reality: I bill bug fix sprints close to a senior hourly rate even though the elapsed hours are short. The work that makes the fix possible is the years of experience that lets a senior glance at a stack trace and know to look at the worker pool. Token cost? $8. The studio's overhead for that hour of focused work? Multiples of that.
Project C: LLM-powered support agent for an e-commerce site
The ask: an e-commerce business wanted an AI support agent that answered customer questions from their product catalog plus a small knowledge base. Integration with their existing helpdesk for fallback handoff to humans.
What was quoted
- Quoted price: $5,400 for phase 1 (the bot + the integration)
- Phase 2 (analytics dashboard, multi-language support, retainer) deferred until phase 1 shipped
- Quoted timeline: 3 weeks
- Quoted hours (internal estimate): 44 hours
What was actually delivered
- Delivered phase 1: 18 days (3 days under quote)
- Actual hours billed: 41 hours
- Claude Sonnet token cost (development only): $96
- Phase 2 ongoing as a $1,800/month retainer
Where the time actually went
The work split was different from the SaaS dashboard because the actual product was an LLM agent. Hours roughly:
- Knowledge base preprocessing and chunking strategy: ~6 hours (human only; this is judgment work — which fields to include, how to handle product variants)
- Retrieval-augmented generation pipeline: ~10 hours (agents wrote most of it, senior reviewed embeddings model choice and reranking strategy)
- Helpdesk integration (webhook in, conversation handoff API out): ~8 hours
- Frontend chat widget: ~6 hours (agents handled almost all of this)
- Eval harness (sample 50 real customer questions, measure helpfulness, accuracy, escalation rate): ~9 hours (mostly human work — eval design is hard to delegate)
- Deployment and monitoring: ~2 hours
What this teaches about pricing
LLM projects have a specific cost shape. The retrieval pipeline and integration compress heavily with agent help (60% efficiency gain). The eval harness, the knowledge base preprocessing, and the prompt engineering decisions stay heavily human (negligible efficiency gain from AI tools, because the work is judgment, not boilerplate).
The blended rate ends up being similar to the SaaS dashboard rate, even though the project felt more "AI-native". This surprised me. The first three LLM projects I built I priced too low because I assumed AI tools would compress the LLM-engineering work the most. They actually compressed the surrounding infrastructure work the most. The LLM-engineering itself stays expensive because it is judgment-heavy.
What three projects show in aggregate
| Project | Quoted | Delivered hours | Token cost | Days early/late | Margin vs traditional |
|---|---|---|---|---|---|
| SaaS dashboard | $9,800 | 71h | $141 | 5 days early | ~35% reduction |
| Bug fix sprint | $1,200 | 6h | $8 | Same week | Same price; faster |
| LLM support agent | $5,400 | 41h | $96 | 3 days early | ~25% reduction |
Three patterns:
Boilerplate-heavy projects compress the most. The SaaS dashboard had ~35% time savings vs traditional because most of the build was standard CRUD plus reports. Agents handle that well.
Reading-heavy projects compress the least. The bug fix sprint compressed almost zero on the thinking time. The fix took 30 minutes; the understanding took 5 hours. Agents do not help with understanding existing systems much yet.
Judgment-heavy projects compress unevenly. The LLM agent project compressed the integration work but not the eval design. Token costs were tiny in all three cases ($8 to $141), making them invisible on invoices.
The token-cost question, answered with real numbers
Three projects, total token spend: $245.30.
Three projects, total client invoices: $16,400.
Token cost as a percentage of revenue: 1.5%.
This is the part of AI-assisted development that hype articles get most wrong. Tokens are not the meaningful expense. The meaningful expense is the senior judgment that directs the agents and reviews their output. If anyone tries to sell you AI development priced primarily on "low token cost", the math is upside down. Real AI-assisted projects bill the senior judgment, not the tokens.
What does NOT compress with AI agents
Independent of project type, these consistently took the same amount of time as a fully-human project would have:
- Discovery calls and spec refinement. Agents do not run client calls.
- Stakeholder communication during the build. Async status emails, slack threads, the project-status documents.
- Debugging that requires reproducing client-specific environment issues.
- Compliance reviews (HIPAA, GDPR, PCI). The standards have not changed because AI helps you write code.
- Code review of changes that touch the riskiest 5% of the codebase (auth, billing, data deletion paths).
A studio that quotes 50% off because "AI" without specifying which parts of the work compressed is either overestimating the savings or underestimating the work.
The take
AI-assisted development is real. The compression is real. The numbers above are not marketing copy; they came off three actual invoices and three actual token spends in 2026. Token cost is real but small. Calendar time savings are real and meaningful.
The savings are not uniform across project types. Boilerplate-heavy work compresses dramatically. Reading-heavy work barely compresses. Judgment-heavy work compresses unevenly.
The right way to price an AI-assisted project is the same as the right way to price any project: count the hours by category (boilerplate vs reading vs judgment), apply the right multiplier per category, add a token line item for transparency, and quote a fixed price with a ±20% accuracy band.
The wrong way is to quote 50% off and hope you can deliver on it.
If you want the full reasoning behind the cost variables and the worked example for a similar dashboard project, the deeper budgeting framework is at the canonical URL in the post header.
Top comments (0)