BaoDev Studio

Posted on May 18 • Originally published at baodev.studio

AI-assisted development cost breakdown — real numbers from 3 projects

#ai #programming #freelancing #productivity

Most of what's written about AI-assisted development cost is theoretical. "Could save 50%". "Up to 10x faster". "Game-changing efficiency". Useful for slide decks, useless for budgeting.

Here are three projects from the last few months with the actual numbers. Hours billed, tokens consumed, what AI agents did, what they did not do, and what the final invoice looked like. Names are abstracted (NDA-protected) but the numbers are real.

Project A: SaaS dashboard MVP

The ask: a B2B SaaS dashboard for a small ops team. Backend in Go, frontend in Next.js, Postgres for data, Stripe for billing. Standard CRUD plus 8 reports plus a roles-and-permissions layer.

What was quoted

Quoted at 4 weeks. The discovery call surfaced 22 distinct features. I priced it as a fixed-rate engagement based on previous similar projects, with a stated ±20% accuracy band.

Quoted price: $9,800
Quoted timeline: 28 days
Quoted hours (internal estimate): 78 hours

What was actually delivered

Delivered: 23 days (5 days under quote)
Actual hours billed: 71 hours
Claude Sonnet token cost (development only, not runtime): $141
Lines of application code shipped: ~14,200
Lines of test code shipped: ~4,400 (statement coverage: 91%)
Bugs found in client UAT: 3 (1 critical, 2 cosmetic)
Critical bug fix time: 4 hours including regression test

Where the time actually went

Of the 71 hours billed, the breakdown was roughly:

Senior architecture decisions and spec refinement: ~9 hours
Agent-supervised feature build (CRUD, reports, billing, auth): ~38 hours
Direct senior code on the harder bits (roles-and-permissions logic, Stripe webhook retry, multi-tenant data isolation): ~14 hours
Test writing and review: ~6 hours
Deployment, monitoring setup, runbook: ~4 hours

The agents handled the boring 70% well. The senior owned the 30% that required real judgment.

What this teaches about pricing

A traditional freelance team at $75-100/hr would have charged ~$10,000-15,000 for this scope and delivered in 5-7 weeks. The agent-assisted version landed at $9,800, in 23 days, with 91% test coverage. The savings ratio is real but the marginal token cost ($141) is so small it does not even show up on the invoice as a line item. Clients care about the total number and the delivery date, not the token accounting.

Project B: Bug fix sprint on a Go service

The ask: an existing production Go service was crashing under specific concurrent load. The team had been hunting the bug for two weeks without success. The need was triage, fix, regression test, and a runbook so it would not happen again.

What was quoted

Quoted price: $1,200 fixed
Quoted timeline: 2 days
Quoted hours (internal estimate): 14 hours

This was priced as a sprint because the bug was contained and the team had already done the partial reproduction work.

What was actually delivered

Delivered: 6 hours total elapsed
Actual hours billed: 6 hours
Claude Sonnet token cost: $8.30
Outcome: race condition in a worker-pool goroutine fixed; regression test added; root cause documented in the team runbook

Where the time actually went

Reading the existing code and the team's investigation notes: ~1.5 hours (human only; agents are bad at "read this 8000-line repo and tell me what's wrong")
Running the race detector on suspicious paths (go test -race -count=10 ./pool/...): ~0.5 hours
Reproducing the bug locally with a stress harness: ~1 hour
Writing the fix (mutex around the shared resource that was being mutated outside the worker's own slot): ~0.5 hours
Writing a deterministic regression test for the race: ~1 hour
Writing the runbook entry: ~1.5 hours

What this teaches about pricing

Bug fix sprints are the wrong place to apply heavy AI assistance. They are 80% reading and 20% writing. The reading is human work; agents cannot reliably hold 8,000 lines of context and reason about the system's invariants. The writing portion (the fix itself, the regression test, the runbook) is where AI helps, but it is the smaller half.

Pricing reality: I bill bug fix sprints close to a senior hourly rate even though the elapsed hours are short. The work that makes the fix possible is the years of experience that lets a senior glance at a stack trace and know to look at the worker pool. Token cost? $8. The studio's overhead for that hour of focused work? Multiples of that.

Project C: LLM-powered support agent for an e-commerce site

The ask: an e-commerce business wanted an AI support agent that answered customer questions from their product catalog plus a small knowledge base. Integration with their existing helpdesk for fallback handoff to humans.

What was quoted

Quoted price: $5,400 for phase 1 (the bot + the integration)
Phase 2 (analytics dashboard, multi-language support, retainer) deferred until phase 1 shipped
Quoted timeline: 3 weeks
Quoted hours (internal estimate): 44 hours

What was actually delivered

Delivered phase 1: 18 days (3 days under quote)
Actual hours billed: 41 hours
Claude Sonnet token cost (development only): $96
Phase 2 ongoing as a $1,800/month retainer

Where the time actually went

The work split was different from the SaaS dashboard because the actual product was an LLM agent. Hours roughly:

Knowledge base preprocessing and chunking strategy: ~6 hours (human only; this is judgment work — which fields to include, how to handle product variants)
Retrieval-augmented generation pipeline: ~10 hours (agents wrote most of it, senior reviewed embeddings model choice and reranking strategy)
Helpdesk integration (webhook in, conversation handoff API out): ~8 hours
Frontend chat widget: ~6 hours (agents handled almost all of this)
Eval harness (sample 50 real customer questions, measure helpfulness, accuracy, escalation rate): ~9 hours (mostly human work — eval design is hard to delegate)
Deployment and monitoring: ~2 hours

What this teaches about pricing

LLM projects have a specific cost shape. The retrieval pipeline and integration compress heavily with agent help (60% efficiency gain). The eval harness, the knowledge base preprocessing, and the prompt engineering decisions stay heavily human (negligible efficiency gain from AI tools, because the work is judgment, not boilerplate).

The blended rate ends up being similar to the SaaS dashboard rate, even though the project felt more "AI-native". This surprised me. The first three LLM projects I built I priced too low because I assumed AI tools would compress the LLM-engineering work the most. They actually compressed the surrounding infrastructure work the most. The LLM-engineering itself stays expensive because it is judgment-heavy.

What three projects show in aggregate

Project	Quoted	Delivered hours	Token cost	Days early/late	Margin vs traditional
SaaS dashboard	$9,800	71h	$141	5 days early	~35% reduction
Bug fix sprint	$1,200	6h	$8	Same week	Same price; faster
LLM support agent	$5,400	41h	$96	3 days early	~25% reduction

Three patterns:

Boilerplate-heavy projects compress the most. The SaaS dashboard had ~35% time savings vs traditional because most of the build was standard CRUD plus reports. Agents handle that well.
Reading-heavy projects compress the least. The bug fix sprint compressed almost zero on the thinking time. The fix took 30 minutes; the understanding took 5 hours. Agents do not help with understanding existing systems much yet.
Judgment-heavy projects compress unevenly. The LLM agent project compressed the integration work but not the eval design. Token costs were tiny in all three cases ($8 to $141), making them invisible on invoices.

The token-cost question, answered with real numbers

Three projects, total token spend: $245.30.

Three projects, total client invoices: $16,400.

Token cost as a percentage of revenue: 1.5%.

This is the part of AI-assisted development that hype articles get most wrong. Tokens are not the meaningful expense. The meaningful expense is the senior judgment that directs the agents and reviews their output. If anyone tries to sell you AI development priced primarily on "low token cost", the math is upside down. Real AI-assisted projects bill the senior judgment, not the tokens.

What does NOT compress with AI agents

Independent of project type, these consistently took the same amount of time as a fully-human project would have:

Discovery calls and spec refinement. Agents do not run client calls.
Stakeholder communication during the build. Async status emails, slack threads, the project-status documents.
Debugging that requires reproducing client-specific environment issues.
Compliance reviews (HIPAA, GDPR, PCI). The standards have not changed because AI helps you write code.
Code review of changes that touch the riskiest 5% of the codebase (auth, billing, data deletion paths).

A studio that quotes 50% off because "AI" without specifying which parts of the work compressed is either overestimating the savings or underestimating the work.

The take

AI-assisted development is real. The compression is real. The numbers above are not marketing copy; they came off three actual invoices and three actual token spends in 2026. Token cost is real but small. Calendar time savings are real and meaningful.

The savings are not uniform across project types. Boilerplate-heavy work compresses dramatically. Reading-heavy work barely compresses. Judgment-heavy work compresses unevenly.

The right way to price an AI-assisted project is the same as the right way to price any project: count the hours by category (boilerplate vs reading vs judgment), apply the right multiplier per category, add a token line item for transparency, and quote a fixed price with a ±20% accuracy band.

The wrong way is to quote 50% off and hope you can deliver on it.

If you want the full reasoning behind the cost variables and the worked example for a similar dashboard project, the deeper budgeting framework is at the canonical URL in the post header.

DEV Community

AI-assisted development cost breakdown — real numbers from 3 projects

Project A: SaaS dashboard MVP

What was quoted

What was actually delivered

Where the time actually went

What this teaches about pricing

Project B: Bug fix sprint on a Go service

What was quoted

What was actually delivered

Where the time actually went

What this teaches about pricing

Project C: LLM-powered support agent for an e-commerce site

What was quoted

What was actually delivered

Where the time actually went

What this teaches about pricing

What three projects show in aggregate

The token-cost question, answered with real numbers

What does NOT compress with AI agents

The take

Top comments (0)