The AI Scaffolding Tax 💰: The Hidden 70% Nobody Warns You About When Building with LLMs
"The model is 30% of the work. The other 70% is everything around it. And nobody warned me."
You've seen the demos. A few lines of Python, an API call to GPT/Claude/Gemini, and boom: magic. Your app can summarize documents, write code, answer questions.
Then you try to ship it to production. And reality hits like a freight train. 🚂
The Demo-to-Production Gap Is a Chasm
Let me paint you a picture:
That clean openai.chat.completions.create() call? It's the tip of the iceberg. Beneath the surface lies what I call the Scaffolding Tax: the massive layer of infrastructure code that exists solely because you chose to use an LLM.
Here's what your "simple AI feature" actually requires in production:
🧱 The Scaffolding Stack

```
┌─────────────────────────────────────┐
│ Your "Simple" Feature               │ ← What stakeholders see
├─────────────────────────────────────┤
│ Prompt Engineering & Versioning     │
│ Context Window Management           │
│ Token Counting & Budget Control     │
│ Multi-Provider Abstraction          │
│ Response Parsing & Validation       │
│ Retry Logic & Fallback Chains       │
│ Rate Limiting & Queueing            │
│ Logging & Observability             │
│ Guardrails & Content Filtering      │
│ Caching & Cost Optimization         │
│ A/B Testing Framework               │
│ Prompt Injection Defense            │
└─────────────────────────────────────┘
  ↑ This is the Scaffolding Tax
```
Every single one of these is mandatory for production-grade AI apps. None of them have anything to do with your actual product.
Let's Talk Real Numbers 📊
I tracked the engineering hours across three AI-powered features we shipped last quarter:
| Component | Hours | % of Total |
|---|---|---|
| Core LLM logic (the "actual feature") | 38h | 28% |
| Context management & chunking | 22h | 16% |
| Error handling & retries | 18h | 13% |
| Guardrails & safety filters | 15h | 11% |
| Logging & observability | 12h | 9% |
| Token budgeting & cost controls | 10h | 7% |
| Multi-provider abstraction | 8h | 6% |
| Prompt versioning & testing | 7h | 5% |
| Caching layer | 5h | 4% |
| Total scaffolding | 97h | 72% |
Seventy-two percent. Almost three-quarters of our engineering effort went to infrastructure that exists only because we used an LLM. If we'd used a traditional algorithm for the same feature, those 97 hours would have been roughly 10.
That's the Scaffolding Tax. And it's due on every. single. feature.
The 12 Taxes You're Paying (Whether You Know It or Not)
1. 🔢 The Token Counting Tax
You can't just send text to an LLM. You need to count tokens before sending, because:
- Context windows have limits
- Costs are per-token
- Chunking strategies depend on token counts
So now you need a tokenizer library, a chunking algorithm, and budget enforcement – for every provider you support.
# "Simple" code that took 3 days to get right
def chunk_text(text, max_tokens=8000, overlap=200):
tokens = tokenizer.encode(text)
chunks = []
i = 0
while i < len(tokens):
chunk_tokens = tokens[i:i + max_tokens]
chunks.append(tokenizer.decode(chunk_tokens))
i += max_tokens - overlap
return chunks
This looks simple. The edge cases are not. What about multi-byte characters? What about code blocks that shouldn't be split? What about markdown headers that need context? 🤯
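One mitigation that helped us: split on structural boundaries first, and only fall back to raw token slicing for oversized paragraphs. A minimal sketch reusing the `tokenizer` and `chunk_text` from above; it still won't keep code fences intact, so treat it as a starting point, not a solution:

```python
def chunk_by_paragraph(text: str, max_tokens: int = 8000) -> list[str]:
    """Pack whole paragraphs into chunks; hard-split only paragraphs over the limit."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(tokenizer.encode(candidate)) <= max_tokens:
            current = candidate  # the paragraph fits; keep packing
            continue
        if current:
            chunks.append(current)
        if len(tokenizer.encode(para)) <= max_tokens:
            current = para
        else:
            # Oversized paragraph: fall back to the raw token splitter above
            chunks.extend(chunk_text(para, max_tokens))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```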
2. 🧠 The Context Management Tax
Your LLM has no memory. Every call is stateless. So you need to:
- Maintain conversation history
- Decide what to include (summarize? truncate? sliding window?)
- Handle the "context window is full" event gracefully
- Manage token budgets across multiple context sources
```typescript
// The conversation manager nobody tells you about
interface Message { role: "system" | "user" | "assistant"; content: string }
interface Context { messages: Message[] }

class ConversationManager {
  private history: Message[] = [];
  private maxContextTokens: number;
  private summaryCache: Map<string, string> = new Map();

  constructor(maxContextTokens: number) {
    this.maxContextTokens = maxContextTokens;
  }

  async addMessage(msg: Message): Promise<Context> {
    this.history.push(msg);
    const totalTokens = this.countTokens(this.history);
    // Compress *before* the window overflows (80% high-water mark)
    if (totalTokens > this.maxContextTokens * 0.8) {
      await this.compressHistory(); // ← This is a whole project
    }
    return this.buildContext();
  }

  // This method alone is 200+ lines in production
  private async compressHistory() { /* ... */ }
  private countTokens(msgs: Message[]): number { /* real tokenizer call here */ return 0; }
  private buildContext(): Context { return { messages: this.history }; }
}
```
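To make the shape of that project concrete, here's a minimal Python sketch of one common compression strategy: keep recent turns verbatim and collapse everything older into a single summary message. The two helpers are deliberately naive stand-ins:

```python
def count_tokens(msg: dict) -> int:
    # Crude heuristic (~4 chars per token); swap in a real tokenizer in production
    return max(1, len(msg["content"]) // 4)

def summarize_with_llm(msgs: list[dict]) -> str:
    # Placeholder: in production this is another LLM call (and another cost)
    return " / ".join(m["content"][:40] for m in msgs)

def compress_history(history: list[dict], budget: int) -> list[dict]:
    """Keep recent turns verbatim; collapse older ones into one summary message."""
    recent: list[dict] = []
    older = list(history)
    used = 0
    # Walk backwards from the newest message, keeping what fits in half the budget
    while older and used + count_tokens(older[-1]) <= budget // 2:
        msg = older.pop()
        used += count_tokens(msg)
        recent.insert(0, msg)
    if not older:
        return recent
    summary = summarize_with_llm(older)
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}, *recent]
```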
3. 🔀 The Multi-Provider Tax
"What if OpenAI goes down?" your boss asks on day two.
So now you're abstracting across OpenAI, Anthropic, Google, and maybe a local model. Each has:
- Different API formats
- Different token limits
- Different rate limits
- Different error codes
- Different streaming behaviors
You're basically building a mini-cloud abstraction layer. For text completion. 😫
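Here's the shape of that layer, sketched in Python under the simplest possible assumptions (a shared `complete()` interface; real adapters must also normalize errors, rate limits, and streaming):

```python
from typing import Protocol

class LLMProvider(Protocol):
    def complete(self, messages: list[dict], max_tokens: int) -> str: ...

class OpenAIAdapter:
    def complete(self, messages: list[dict], max_tokens: int) -> str:
        # Translate to OpenAI's request shape; map its error codes and limits
        raise NotImplementedError

class AnthropicAdapter:
    def complete(self, messages: list[dict], max_tokens: int) -> str:
        # Anthropic expects the system prompt as a separate field, not a message
        raise NotImplementedError

def route(providers: list[LLMProvider], messages: list[dict]) -> str:
    """Try providers in order; surface the last failure if all of them are down."""
    last_error: Exception | None = None
    for provider in providers:
        try:
            return provider.complete(messages, max_tokens=1024)
        except Exception as exc:  # each SDK raises its own exception types
            last_error = exc
    raise RuntimeError("all providers failed") from last_error
```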
4. 🛡️ The Guardrails Tax
Users will try to break your AI. They will:
- Ask it to reveal system prompts
- Try to make it say offensive things
- Attempt prompt injection attacks
- Feed it adversarial inputs
So you need:
- Input sanitization
- Output filtering
- Topic restrictions
- PII detection
- Prompt injection detection
Each of these is a mini-project with its own edge cases.
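For a taste of how shallow the first line of defense starts out, here's a naive pattern screen in Python. The pattern list is a hypothetical starter set; real systems layer classifiers and canary tokens on top, because regexes only catch the lazy attacks:

```python
import re

# Hypothetical starter list; attackers iterate faster than this file does
INJECTION_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (
        r"ignore (all |any )?(previous|prior) instructions",
        r"reveal (your )?(system|hidden) prompt",
        r"you are now (in )?developer mode",
    )
]

def looks_like_injection(user_input: str) -> bool:
    """First-pass screen only; route hits to a stricter check, don't just block."""
    return any(p.search(user_input) for p in INJECTION_PATTERNS)
```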
5. 💸 The Cost Control Tax
"Can you add AI to this feature?" = "Can you add unpredictable variable costs to this feature?"
You need:
- Per-user token budgets
- Per-feature cost tracking
- Alert thresholds
- Graceful degradation when budget is hit
- Cost attribution (which feature is burning money?)
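A minimal sketch of the per-user budget piece (in-memory for illustration; production needs persistence, time windows, and per-feature attribution):

```python
from collections import defaultdict

class TokenBudget:
    """Track per-user token spend; signal when to degrade instead of calling the LLM."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.used: dict[str, int] = defaultdict(int)

    def try_spend(self, user_id: str, tokens: int) -> bool:
        if self.used[user_id] + tokens > self.daily_limit:
            return False  # caller degrades: cached answer, smaller model, or honest error
        self.used[user_id] += tokens
        return True
```

The call site is one line (`if not budget.try_spend(user_id, estimate): ...`), but deciding what "degrade" means for each feature is where the real hours go.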
6. 🔍 The Observability Tax
When your AI gives a bad answer, how do you debug it?
You need:
- Full request/response logging (with PII redaction)
- Prompt version tracking
- Token usage metrics
- Latency percentiles
- Error rate monitoring
- Quality score tracking
Traditional software has logs. AI software needs a forensic lab.
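Here's a sketch of the minimum viable log record. The email regex is a stand-in for real PII redaction, which is its own mini-project:

```python
import json
import re
import time
import uuid

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def log_llm_call(prompt_version: str, prompt: str, response: str,
                 tokens_in: int, tokens_out: int, latency_ms: float) -> None:
    record = {
        "id": str(uuid.uuid4()),
        "ts": time.time(),
        "prompt_version": prompt_version,           # which prompt produced this answer?
        "prompt": EMAIL.sub("[REDACTED]", prompt),   # redact before anything is persisted
        "response": EMAIL.sub("[REDACTED]", response),
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "latency_ms": latency_ms,
    }
    print(json.dumps(record))  # stand-in for your real log pipeline
```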
7. 🔁 The Retry & Fallback Tax
LLM calls are nondeterministic and fail in ways traditional APIs don't. They:
- Time out
- Return malformed JSON
- Refuse valid requests
- Rate-limit you unexpectedly
- Occasionally hallucinate wildly
You need retry logic with exponential backoff, circuit breakers, fallback chains (try GPT-4 → try Claude → try Gemini → cache → degrade gracefully), and response validation.
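Here's the skeleton of that chain in Python: exponential backoff with jitter per provider, then fall through to the next one. Circuit breakers and response validation bolt on around this core:

```python
import random
import time

def call_with_retries(fn, attempts: int = 4, base_delay: float = 0.5):
    """Retry with exponential backoff plus jitter; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.random() * 0.1)

def complete_with_fallback(messages: list[dict], providers: list) -> str | None:
    """Walk the fallback chain; each provider gets its own retry budget."""
    for provider in providers:
        try:
            return call_with_retries(lambda: provider.complete(messages, max_tokens=1024))
        except Exception:
            continue  # next provider in the chain
    return None  # final degrade: serve from cache or fail honestly
```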
8. ⏱️ The Latency Tax
LLM calls are slow. 1–30 seconds slow. So your entire UX architecture changes:
- Streaming responses become mandatory (see the sketch below)
- Loading states need to be thoughtful
- Optimistic UI patterns are essential
- Background processing becomes the norm
- Users need progress indicators, not spinners
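Streaming is the piece that reshapes the backend first. A minimal generator using the OpenAI Python SDK (v1-style client; other providers differ in the details):

```python
from openai import OpenAI

client = OpenAI()

def stream_answer(prompt: str):
    """Yield text deltas as they arrive so the UI can render progressively."""
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```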
9. 🧪 The Testing Tax
How do you write unit tests for nondeterministic output?
You don't. Not really. You build:
- Semantic similarity tests
- Golden dataset evaluations
- Human eval pipelines
- Regression test suites for prompts
- A/B testing infrastructure
Testing AI is fundamentally different from testing traditional software. And it's harder.
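What a semantic-similarity regression test looks like, sketched with a plain cosine function. `embed()`, `run_summarizer()`, and the golden dataset names are hypothetical stand-ins for your embedding model and eval data:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def test_summary_stays_on_topic():
    # embed(), run_summarizer(), golden_input/golden_answer: hypothetical helpers
    expected = embed(golden_answer)
    actual = embed(run_summarizer(golden_input))
    # Threshold tuned on historical runs; exact-match assertions are useless here
    assert cosine(expected, actual) > 0.85
```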
10. 📦 The Prompt Versioning Tax
Prompts are code. They need:
- Version control
- A/B testing
- Rollback capability
- Environment-specific variants (dev/staging/prod)
- Performance tracking per version
Most teams store prompts in strings. In code. Mixed with business logic. It works until it doesn't. 🙃
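A first step out of the strings-in-code trap is an explicit, immutable registry. A minimal sketch (file- or DB-backed in practice, with per-version metrics attached):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str

REGISTRY: dict[tuple[str, str], PromptVersion] = {
    ("summarize-document", "v3"): PromptVersion(
        name="summarize-document",
        version="v3",
        template="Summarize the following document in {max_words} words:\n\n{document}",
    ),
}

def get_prompt(name: str, version: str) -> PromptVersion:
    # Versions are immutable: a bad rollout is fixed by pointing back to v2, not editing v3
    return REGISTRY[(name, version)]
```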
11. 🔒 The Security Tax
Your AI processes user input. That input might contain:
- Prompt injection attempts
- Data exfiltration payloads
- Adversarial examples
- PII that needs protection
You need input validation, output sanitization, access controls, and audit logging – all specific to the LLM context.
12. 🧩 The Integration Tax
Your AI doesn't live in isolation. It needs to:
- Call functions / use tools
- Access your database
- Respect user permissions
- Integrate with existing workflows
- Handle authentication
Each integration point multiplies the scaffolding complexity.
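The permission piece is the one teams forget: the model proposes a tool call, but your code decides whether this user may make it. A hedged sketch with hypothetical `TOOL_REGISTRY` and `permissions_for()` helpers:

```python
from typing import Callable

TOOL_REGISTRY: dict[str, Callable] = {}  # hypothetical: vetted tools only

def permissions_for(user_id: str) -> set[str]:
    return set()  # hypothetical: LLM tool calls inherit the user's ACLs

def dispatch_tool_call(user_id: str, name: str, args: dict):
    """Never execute a model-proposed call without validating tool, user, and args."""
    tool = TOOL_REGISTRY.get(name)
    if tool is None:
        raise ValueError(f"model requested unknown tool: {name!r}")
    if name not in permissions_for(user_id):
        raise PermissionError(f"user {user_id} may not call {name!r}")
    return tool(**args)  # in production, validate args against the tool's schema first
```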
The Real Architecture of an AI Feature
Here's what the architecture actually looks like for a "simple" AI-powered search feature:
```
User Query
     ↓
┌───────────────────────────────────────────────┐
│ Input Validation                              │ ← Prompt injection defense
│ PII Detection                                 │ ← Privacy compliance
└───────────────────────┬───────────────────────┘
                        ↓
┌───────────────────────┴───────────────────────┐
│            Context Assembly Engine            │
│  ┌─────────────┐  ┌──────────┐  ┌───────────┐ │
│  │ User Prefs  │  │ History  │  │ Knowledge │ │
│  │ (filtered)  │  │ (summed) │  │ (RAG'd)   │ │
│  └─────────────┘  └──────────┘  └───────────┘ │
│ Token Budget Manager                          │ ← Counts, truncates, prioritizes
└───────────────────────┬───────────────────────┘
                        ↓
┌───────────────────────┴───────────────────────┐
│                Provider Router                │
│    ┌─────────┐   ┌─────────┐   ┌─────────┐    │
│    │ OpenAI  │   │ Claude  │   │ Gemini  │    │
│    │ (fast)  │   │ (smart) │   │ (cheap) │    │
│    └─────────┘   └─────────┘   └─────────┘    │
│ Rate Limiter · Circuit Breaker                │
└───────────────────────┬───────────────────────┘
                        ↓
┌───────────────────────┴───────────────────────┐
│               Response Pipeline               │
│ Parse → Validate → Filter → Transform         │
│ Log → Track → Cache → Attribute Cost          │
└───────────────────────┬───────────────────────┘
                        ↓
┌───────────────────────┴───────────────────────┐
│                Output Delivery                │
│ Stream to client → Update state → Notify      │
└───────────────────────────────────────────────┘
```
This is for one feature. One query box. One AI call.
Now multiply this by every AI-powered feature in your product. 😅
How to Survive the Scaffolding Tax
Okay, doom and gloom over. Here's how smart teams are managing this:
✅ 1. Build a Scaffolding Layer First, Features Second
Don't build scaffolding per-feature. Build a shared AI platform layer:
```typescript
// Instead of this per-feature:
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
});

// Build this once:
const result = await aiPlatform.complete({
  task: "summarize-document",
  input: document,
  userId: user.id,
  budget: { maxTokens: 4000, maxCost: 0.05 },
  fallback: "cache-or-degrade",
});
```
The platform handles routing, budgeting, logging, retries, and guardrails. Features just declare intent.
✅ 2. Use AI Middleware Libraries
The ecosystem is catching up. Tools that handle the scaffolding:
- LangChain / LlamaIndex – context management, chains, agents
- Guardrails AI – output validation and structured extraction
- LiteLLM – multi-provider abstraction (100+ providers, one API)
- LangSmith / Helicone – observability and cost tracking
- PromptLayer / Promptfoo – prompt versioning and testing
Don't build what you can buy or borrow. The scaffolding tax is real, but you don't have to pay it from scratch.
✅ 3. Make the Tax Visible
Track scaffolding hours separately in your project management:
```
Feature: AI-powered document summary
├── Core logic: 8h (actual feature)
├── Scaffolding: 34h (infrastructure)
│   ├── Context mgmt: 10h
│   ├── Error handling: 8h
│   ├── Guardrails: 6h
│   ├── Logging: 5h
│   └── Cost controls: 5h
└── Tax ratio: 81%
```
When leadership sees the real numbers, they make better decisions about which features deserve the AI treatment.
✅ 4. Start With the Hardest Scaffolding First
Most teams build the feature first and bolt on scaffolding later. Flip it:
- Week 1: Provider abstraction, logging, cost controls
- Week 2: Guardrails, retry logic, testing framework
- Week 3: Now build the actual feature
The feature will ship faster because the infrastructure is ready. And you won't be retrofitting security at 2 AM before launch.
✅ 5. Know When NOT to Use an LLM
Not every problem needs an AI. Seriously.
| Problem | AI? | Why? |
|---|---|---|
| Sentiment analysis | ✅ Yes | NLP is hard; LLMs are great at it |
| Email validation | ❌ No | Regex exists. Use it. |
| Code generation | ✅ Yes | Complex output; LLMs excel |
| Null check | ❌ No | Please don't. |
| Document summarization | ✅ Yes | Core LLM strength |
| Sorting a list | ❌ No | `.sort()` doesn't hallucinate |
The Scaffolding Tax is $0 for features that don't use LLMs. Choose wisely.
The Uncomfortable Truth
Here it is, the thing nobody wants to say out loud:
The AI Scaffolding Tax means your team is building a platform company whether you want to or not.
Every AI feature you ship adds to your internal platform. You're not just building a product anymore; you're building the infrastructure to build the product. That's a fundamentally different kind of engineering effort, and it needs to be resourced accordingly.
Companies that treat AI features like regular features will drown in scaffolding debt. Companies that acknowledge the tax and invest in the platform will move 10x faster.
The scaffolding isn't waste. It's the real product. The LLM is just the engine; the scaffolding is the car. 🚗
TL;DR 📝
- 70% of AI engineering effort goes to infrastructure, not features
- The "Scaffolding Tax" includes: token counting, context management, guardrails, logging, cost controls, multi-provider support, testing, and more
- Build a platform layer first, features second
- Use existing tools (LangChain, LiteLLM, Guardrails AI) instead of building from scratch
- Make the tax visible โ track scaffolding hours separately
- Know when NOT to use an LLM โ not every problem needs AI
- Companies that invest in the scaffolding platform will win
What's Your Scaffolding Horror Story? 😬
I want to hear from you. What's the most ridiculous piece of infrastructure you've had to build just to make an LLM work in production? How much of your engineering time goes to scaffolding vs. actual features?
Drop a comment below. Let's commiserate. 🍻
If this post saved you from a scaffolding surprise, give it a reaction ❤️ and follow for more honest takes on building with AI. No hype, just engineering.
Cover image: The AI Iceberg – the model is the tip; the scaffolding is everything beneath the surface.