TL;DR: Want the complete playbook? This article covers the concepts. The full guide includes production-ready frameworks, real examples, and actionable checklists.
Get the guide: 12€, instant PDF · 30-day refund
You land a client. They want "an AI assistant that handles customer support." You quote $8,000. Three months later you're still adding features, the budget is gone, and you've built four completely different versions of a system that still doesn't match what they actually wanted.
This happens constantly in AI consulting. Not because the devs are bad, but because AI projects have a unique failure mode that standard freelance scoping doesn't account for. Here's the framework I use now.
Why AI Projects Eat Margins Faster Than Normal Builds
Traditional software has predictable failure modes. Scope creep is bad, but at least you know when a button exists or doesn't. AI projects fail differently:
- Output quality is subjective. "Good enough" is never defined upfront. The client sees the first prototype and says "it's close, but can it sound a bit more human?" That sentence is worth 40 hours.
- Evals don't exist yet. You can't demo a passing test suite. You demo responses, and every demo triggers new requirements.
- Infra costs are invisible until they're not. Token costs, vector DB hosting, fine-tuning compute: clients have no mental model for this. They assume it's like a SaaS tool.
- LLM behavior changes. A model update can break carefully tuned prompts. You didn't cause it, but you own it in the client's mind.
The fix isn't better project management. It's a different contract structure.
The Three-Zone Scoping Method
Before writing a proposal, split the project into three zones:
Zone 1: Fixed Core
The deterministic, verifiable piece. An ingestion pipeline that processes PDFs, a RAG retriever that returns top-5 chunks, an API endpoint that accepts a query. This is buildable at a fixed price because you can write acceptance criteria. Price this as a normal dev project.
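To make "verifiable" concrete, here is a minimal sketch of what Zone 1 acceptance criteria can look like as tests. The module and function names (`my_rag_pipeline`, `ingest_pdf`, `retrieve_top_chunks`) are hypothetical placeholders for whatever the deliverable actually exposes:

```python
# Hypothetical acceptance tests for a Zone 1 deliverable (pytest style).
# my_rag_pipeline, ingest_pdf, and retrieve_top_chunks are placeholder
# names standing in for the real pipeline you deliver.
from my_rag_pipeline import ingest_pdf, retrieve_top_chunks


def test_pdf_ingestion_produces_chunks():
    # Deterministic: a known fixture file must yield at least one chunk.
    chunks = ingest_pdf("fixtures/sample_contract.pdf")
    assert len(chunks) > 0


def test_retriever_returns_exactly_top_5():
    # Deterministic: the retriever contract says top-5 chunks, always.
    results = retrieve_top_chunks("What is the termination clause?", k=5)
    assert len(results) == 5
```

Either these pass or they don't. That's what makes Zone 1 safe to price fixed.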
Zone 2: Prompt Engineering & Tuning
Everything involving model behavior, output quality, and evaluation. This is never fixed-price. Price it hourly with a cap, or as a separate retainer. Spell this out explicitly: "Prompt iteration and quality tuning is billed at $X/hour, capped at Y hours per month."
Zone 3: Operational Unknowns
Token costs, third-party API changes, rate limits, fine-tuning runs. These are client expenses, not yours. Put a line in every proposal that says infrastructure and third-party API costs are billed at cost plus a 15% handling fee, invoiced monthly.
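As a sketch of how that pass-through line can be computed on the invoice, assuming illustrative per-token prices (check your provider's current rates; these are not real figures) and the 15% handling fee from the clause above:

```python
# Rough pass-through invoice line for model usage. Prices are
# illustrative placeholders, not current provider rates.
PRICE_PER_1K_INPUT = 0.01   # USD per 1K input tokens (assumed example rate)
PRICE_PER_1K_OUTPUT = 0.03  # USD per 1K output tokens (assumed example rate)
HANDLING_FEE = 0.15         # the 15% handling fee from the proposal clause


def monthly_token_invoice(input_tokens: int, output_tokens: int) -> float:
    raw_cost = (
        (input_tokens / 1000) * PRICE_PER_1K_INPUT
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    )
    return round(raw_cost * (1 + HANDLING_FEE), 2)


# e.g. 5M input tokens and 1M output tokens in a month:
print(monthly_token_invoice(5_000_000, 1_000_000))  # 92.0
```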
Most burned consultants priced Zone 2 and Zone 3 as if they were Zone 1.
The Acceptance Criteria Problem (and How to Solve It)
The most dangerous sentence in any AI project brief: "it should respond appropriately."
Appropriate according to whom? In what context? Measured how?
Your job before you write a proposal is to force quantified acceptance criteria. Here's how I do it in discovery:
- Ask for 10 example inputs and ideal outputs. If they can't produce them, they're not ready to build. Charge a discovery sprint first.
- Define a pass rate, not perfection. "The model should produce an acceptable response on at least 85% of test cases from our agreed evaluation set." This gives you a finish line, and it's easy to check mechanically (see the sketch below).
- Lock the evaluation set. 20-50 examples, agreed upon before work starts, written into the contract. New examples = change order.
This single habit eliminates the "it's not quite right" treadmill that kills AI project margins.
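Here is a minimal sketch of the harness behind that finish line, assuming a locked `eval_set.json` and hypothetical `generate_response` / `is_acceptable` functions standing in for your system and the agreed acceptability check (human review, string matching, or an LLM grader, whatever the engagement specifies):

```python
import json

# Minimal pass-rate harness over a locked evaluation set.
# eval_set.json holds the 20-50 examples written into the contract;
# generate_response and is_acceptable are hypothetical stand-ins.
from my_assistant import generate_response, is_acceptable

PASS_THRESHOLD = 0.85  # the contractually agreed pass rate

with open("eval_set.json") as f:
    eval_set = json.load(f)  # list of {"input": ..., "ideal_output": ...}

passed = 0
for case in eval_set:
    response = generate_response(case["input"])
    if is_acceptable(response, case["ideal_output"]):
        passed += 1

pass_rate = passed / len(eval_set)
print(f"{passed}/{len(eval_set)} passed ({pass_rate:.0%})")
print("MILESTONE MET" if pass_rate >= PASS_THRESHOLD else "ITERATE")
```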
Pricing Models That Actually Work
Three structures I've used, with honest trade-offs:
Phased Fixed + Hourly Tail
Phase 1 (infrastructure + integration) at a fixed price. Phase 2 (tuning, evaluation, iteration) at hourly with a cap. Works well when the client has budget certainty needs and you have a clear technical scope for Phase 1.
Milestone-Based with Quality Gates
Each milestone has a defined deliverable AND acceptance criteria. Client pays on milestone completion. If criteria aren't met, you iterate, but you log hours during iteration, and if you exceed a threshold, a change order kicks in. Good for longer engagements.
Productized AI Package
You define exactly what you build: one RAG pipeline, one fine-tuned classifier, one integration endpoint. Tested against a fixed evaluation set you provide. Nothing more. Clients self-select; it's easier to sell and scope. The downside is you leave money on the table for clients who genuinely need more.
For most $5K–$25K AI projects, the phased approach is the safest margin-protector.
Protecting Yourself When the Model Changes
Put this clause in every AI project contract, word for word or close to it:
"AI model provider updates (including changes to model behavior, output format, or availability) occurring after project delivery are not covered under this agreement. Support for model-related regressions post-delivery is available under a separate maintenance retainer."
OpenAI and Anthropic push model updates constantly. GPT-4 Turbo behaved differently than GPT-4. Claude 3 Opus prompts don't always transfer cleanly to Claude 3.5 Sonnet. This isn't negligence on your part; it's the nature of the dependency. Protect yourself contractually before it comes up.
Also: pin model versions in your code. Use `gpt-4-0125-preview`, not `gpt-4`. This gives you a stable target and makes any "it stopped working" conversation factual rather than defensive.
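A minimal sketch with the OpenAI Python SDK (the same idea applies to any provider); the prompt content here is just illustrative:

```python
from openai import OpenAI

# Pin the exact model snapshot in one place, and record it in the
# project docs and the contract. A bare "gpt-4" alias silently floats
# to whatever the provider currently points it at; the dated snapshot
# does not.
PINNED_MODEL = "gpt-4-0125-preview"

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[{"role": "user", "content": "Summarize the support ticket below..."}],
)
print(response.choices[0].message.content)
```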
The Pre-Build Checklist That Saves Every Project
Before writing a single line of code, I verify these six things:
- [ ] Evaluation set agreed and signed off (minimum 20 examples)
- [ ] Pass rate threshold defined (e.g., 80% acceptable on eval set)
- [ ] Infrastructure costs separated from dev costs in the proposal
- [ ] Prompt iteration explicitly in Zone 2 (hourly or retainer)
- [ ] Model version pinned and documented
- [ ] Post-delivery maintenance terms in the contract
If any of these are missing at kickoff, I don't start. I schedule a 30-minute alignment call instead. That call is free. The scope creep it prevents is not.
The biggest shift in my consulting revenue came when I stopped treating AI projects as "software with a fancy API call" and started scoping them as a separate category with their own risk profile. Token costs are operational expenses. Prompt quality is a service, not a deliverable. Model updates are a maintenance surface, not a bug.
Get the structure right before the first commit and you protect both your margin and the client relationship.
I compiled everything into a practical guide: AI Project Pricing & Scoping Playbook