TL;DR: Want the complete playbook? This article covers the concepts. The full guide includes production-ready frameworks, real examples, and actionable checklists.
Get the guide: 12€, instant PDF · 30-day refund
You land a client. They want "an AI assistant that handles customer support." You quote $8,000. Three months later you're still adding features, the budget is gone, and you've built four completely different versions of a system that still doesn't match what they actually wanted.
This happens constantly in AI consulting. Not because the devs are bad, but because AI projects have a unique failure mode that standard freelance scoping doesn't account for. Here's the framework I use now.
Why AI Projects Eat Margins Faster Than Normal Builds
Traditional software has predictable failure modes. Scope creep is bad, but at least you know when a button exists or doesn't. AI projects fail differently:
- Output quality is subjective. "Good enough" is never defined upfront. The client sees the first prototype and says "it's close, but can it sound a bit more human?" That sentence is worth 40 hours.
- Evals don't exist yet. You can't demo a passing test suite. You demo responses, and every demo triggers new requirements.
- Infra costs are invisible until they're not. Token costs, vector DB hosting, fine-tuning compute: clients have no mental model for this. They assume it's like a SaaS tool.
- LLM behavior changes. A model update can break carefully tuned prompts. You didn't cause it, but you own it in the client's mind.
The fix isn't better project management. It's a different contract structure.
The Three-Zone Scoping Method
Before writing a proposal, split the project into three zones:
Zone 1: Fixed Core
The deterministic, verifiable piece. An ingestion pipeline that processes PDFs, a RAG retriever that returns top-5 chunks, an API endpoint that accepts a query. This is buildable at a fixed price because you can write acceptance criteria. Price this as a normal dev project.
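To make "verifiable" concrete, here is a minimal sketch of what Zone 1 acceptance criteria can look like as tests. The module and function names (`my_rag_pipeline`, `ingest_pdf`, `retrieve_top_chunks`) are hypothetical placeholders for whatever the deliverable actually exposes:

```python
# Hypothetical acceptance tests for a Zone 1 deliverable (pytest style).
# my_rag_pipeline, ingest_pdf, and retrieve_top_chunks are placeholder
# names standing in for the real pipeline you deliver.
from my_rag_pipeline import ingest_pdf, retrieve_top_chunks


def test_pdf_ingestion_produces_chunks():
    # Deterministic: a known fixture file must yield at least one chunk.
    chunks = ingest_pdf("fixtures/sample_contract.pdf")
    assert len(chunks) > 0


def test_retriever_returns_exactly_top_5():
    # Deterministic: the retriever contract says top-5 chunks, always.
    results = retrieve_top_chunks("What is the termination clause?", k=5)
    assert len(results) == 5
```

Either these pass or they don't. That's what makes Zone 1 safe to price fixed.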
Zone 2: Prompt Engineering & Tuning
Everything involving model behavior, output quality, and evaluation. This is never fixed-price. Price it hourly with a cap, or as a separate retainer. Spell this out explicitly: "Prompt iteration and quality tuning is billed at $X/hour, capped at Y hours per month."
Zone 3: Operational Unknowns
Token costs, third-party API changes, rate limits, fine-tuning runs. These are client expenses, not yours. Put a line in every proposal that says infrastructure and third-party API costs are billed at cost plus a 15% handling fee, invoiced monthly.
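As a sketch of how that pass-through line can be computed on the invoice, assuming illustrative per-token prices (check your provider's current rates; these are not real figures) and the 15% handling fee from the clause above:

```python
# Rough pass-through invoice line for model usage. Prices are
# illustrative placeholders, not current provider rates.
PRICE_PER_1K_INPUT = 0.01   # USD per 1K input tokens (assumed example rate)
PRICE_PER_1K_OUTPUT = 0.03  # USD per 1K output tokens (assumed example rate)
HANDLING_FEE = 0.15         # the 15% handling fee from the proposal clause


def monthly_token_invoice(input_tokens: int, output_tokens: int) -> float:
    raw_cost = (
        (input_tokens / 1000) * PRICE_PER_1K_INPUT
        + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    )
    return round(raw_cost * (1 + HANDLING_FEE), 2)


# e.g. 5M input tokens and 1M output tokens in a month:
print(monthly_token_invoice(5_000_000, 1_000_000))  # 92.0
```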
Most burned consultants priced Zone 2 and Zone 3 as if they were Zone 1.
The Acceptance Criteria Problem (and How to Solve It)
The most dangerous sentence in any AI project brief: "it should respond appropriately."
Appropriate according to whom? In what context? Measured how?
Your job before you write a proposal is to force quantified acceptance criteria. Here's how I do it in discovery:
- Ask for 10 example inputs and ideal outputs. If they can't produce them, they're not ready to build. Charge a discovery sprint first.
- Define a pass rate, not perfection. "The model should produce an acceptable response on at least 85% of test cases from our agreed evaluation set." This gives you a finish line, and it's easy to check mechanically (see the sketch below).
- Lock the evaluation set. 20-50 examples, agreed upon before work starts, written into the contract. New examples = change order.
This single habit eliminates the "it's not quite right" treadmill that kills AI project margins.
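Here is a minimal sketch of the harness behind that finish line, assuming a locked `eval_set.json` and hypothetical `generate_response` / `is_acceptable` functions standing in for your system and the agreed acceptability check (human review, string matching, or an LLM grader, whatever the engagement specifies):

```python
import json

# Minimal pass-rate harness over a locked evaluation set.
# eval_set.json holds the 20-50 examples written into the contract;
# generate_response and is_acceptable are hypothetical stand-ins.
from my_assistant import generate_response, is_acceptable

PASS_THRESHOLD = 0.85  # the contractually agreed pass rate

with open("eval_set.json") as f:
    eval_set = json.load(f)  # list of {"input": ..., "ideal_output": ...}

passed = 0
for case in eval_set:
    response = generate_response(case["input"])
    if is_acceptable(response, case["ideal_output"]):
        passed += 1

pass_rate = passed / len(eval_set)
print(f"{passed}/{len(eval_set)} passed ({pass_rate:.0%})")
print("MILESTONE MET" if pass_rate >= PASS_THRESHOLD else "ITERATE")
```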
Pricing Models That Actually Work
Three structures I've used, with honest trade-offs:
Phased Fixed + Hourly Tail
Phase 1 (infrastructure + integration) at a fixed price. Phase 2 (tuning, evaluation, iteration) at hourly with a cap. Works well when the client has budget certainty needs and you have a clear technical scope for Phase 1.
Milestone-Based with Quality Gates
Each milestone has a defined deliverable AND acceptance criteria. Client pays on milestone completion. If criteria aren't met, you iterate, but you log hours during iteration, and if you exceed a threshold, a change order kicks in. Good for longer engagements.
Productized AI Package
You define exactly what you build: one RAG pipeline, one fine-tuned classifier, one integration endpoint. Tested against a fixed evaluation set you provide. Nothing more. Clients self-select; it's easier to sell and scope. The downside is you leave money on the table for clients who genuinely need more.
For most $5K–$25K AI projects, the phased approach is the safest margin-protector.
Protecting Yourself When the Model Changes
Put this clause in every AI project contract, word for word or close to it:
"AI model provider updates (including changes to model behavior, output format, or availability) occurring after project delivery are not covered under this agreement. Support for model-related regressions post-delivery is available under a separate maintenance retainer."
OpenAI and Anthropic push model updates constantly. GPT-4 Turbo behaved differently than GPT-4. Claude 3 Opus prompts don't always transfer cleanly to Claude 3.5 Sonnet. This isn't negligence on your part; it's the nature of the dependency. Protect yourself contractually before it comes up.
Also: pin model versions in your code. Use `gpt-4-0125-preview`, not `gpt-4`. This gives you a stable target and makes any "it stopped working" conversation factual rather than defensive.
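A minimal sketch with the OpenAI Python SDK (the same idea applies to any provider); the prompt content here is just illustrative:

```python
from openai import OpenAI

# Pin the exact model snapshot in one place, and record it in the
# project docs and the contract. A bare "gpt-4" alias silently floats
# to whatever the provider currently points it at; the dated snapshot
# does not.
PINNED_MODEL = "gpt-4-0125-preview"

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model=PINNED_MODEL,
    messages=[{"role": "user", "content": "Summarize the support ticket below..."}],
)
print(response.choices[0].message.content)
```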
The Pre-Build Checklist That Saves Every Project
Before writing a single line of code, I verify these six things:
- [ ] Evaluation set agreed and signed off (minimum 20 examples)
- [ ] Pass rate threshold defined (e.g., 80% acceptable on eval set)
- [ ] Infrastructure costs separated from dev costs in the proposal
- [ ] Prompt iteration explicitly in Zone 2 (hourly or retainer)
- [ ] Model version pinned and documented
- [ ] Post-delivery maintenance terms in the contract
If any of these are missing at kickoff, I don't start. I schedule a 30-minute alignment call instead. That call is free. The scope creep it prevents is not.
The biggest shift in my consulting revenue came when I stopped treating AI projects as "software with a fancy API call" and started scoping them as a separate category with their own risk profile. Token costs are operational expenses. Prompt quality is a service, not a deliverable. Model updates are a maintenance surface, not a bug.
Get the structure right before the first commit and you protect both your margin and the client relationship.
I compiled everything into a practical guide: AI Project Pricing & Scoping Playbook