Mamoor Ahmad

The AI Scaffolding Tax 💰: The Hidden 70% Nobody Warns You About When Building with LLMs

"The model is 30% of the work. The other 70% is everything around it. And nobody warned me."

You've seen the demos. A few lines of Python, an API call to GPT/Claude/Gemini, and boom: magic. Your app can summarize documents, write code, answer questions.

Then you try to ship it to production. And reality hits like a freight train. 🚂


The Demo-to-Production Gap Is a Chasm

Let me paint you a picture:


That clean openai.chat.completions.create() call? It's the tip of the iceberg. Beneath the surface lies what I call the Scaffolding Tax: the massive layer of infrastructure code that exists solely because you chose to use an LLM.

Here's what your "simple AI feature" actually requires in production:

🧱 The Scaffolding Stack

┌─────────────────────────────────────┐
│        Your "Simple" Feature        │  ← What stakeholders see
├─────────────────────────────────────┤
│     Prompt Engineering & Versioning │
│     Context Window Management       │
│     Token Counting & Budget Control │
│     Multi-Provider Abstraction      │
│     Response Parsing & Validation   │
│     Retry Logic & Fallback Chains   │
│     Rate Limiting & Queueing        │
│     Logging & Observability         │
│     Guardrails & Content Filtering  │
│     Caching & Cost Optimization     │
│     A/B Testing Framework           │
│     Prompt Injection Defense        │
└─────────────────────────────────────┘
        ↑ This is the Scaffolding Tax

Every single one of these is mandatory for production-grade AI apps. None of them have anything to do with your actual product.


Let's Talk Real Numbers 📊

I tracked the engineering hours across three AI-powered features we shipped last quarter:

| Component | Hours | % of Total |
| --- | --- | --- |
| Core LLM logic (the "actual feature") | 38h | 28% |
| Context management & chunking | 22h | 16% |
| Error handling & retries | 18h | 13% |
| Guardrails & safety filters | 15h | 11% |
| Logging & observability | 12h | 9% |
| Token budgeting & cost controls | 10h | 7% |
| Multi-provider abstraction | 8h | 6% |
| Prompt versioning & testing | 7h | 5% |
| Caching layer | 5h | 4% |
| Total scaffolding | 97h | 72% |

Seventy-two percent. Almost three-quarters of our engineering effort went to infrastructure that exists only because we used an LLM. If we'd used a traditional algorithm for the same feature, those 97 hours would have been roughly 10.


That's the Scaffolding Tax. And it's due on every. single. feature.


The 12 Taxes You're Paying (Whether You Know It or Not)

1. 🔤 The Token Counting Tax

You can't just send text to an LLM. You need to count tokens before sending, because:

  • Context windows have limits
  • Costs are per-token
  • Chunking strategies depend on token counts

So now you need a tokenizer library, a chunking algorithm, and budget enforcement for every provider you support.

# "Simple" code that took 3 days to get right
def chunk_text(text, max_tokens=8000, overlap=200):
    tokens = tokenizer.encode(text)
    chunks = []
    i = 0
    while i < len(tokens):
        chunk_tokens = tokens[i:i + max_tokens]
        chunks.append(tokenizer.decode(chunk_tokens))
        i += max_tokens - overlap
    return chunks

This looks simple. The edge cases are not. What about multi-byte characters? What about code blocks that shouldn't be split? What about markdown headers that need context? 🤯

2. 🧠 The Context Management Tax

Your LLM has no memory. Every call is stateless. So you need to:

  • Maintain conversation history
  • Decide what to include (summarize? truncate? sliding window?)
  • Handle the "context window is full" event gracefully
  • Manage token budgets across multiple context sources
// The conversation manager nobody tells you about
class ConversationManager {
  private history: Message[] = [];
  private maxContextTokens: number;
  private summaryCache: Map<string, string>;

  async addMessage(msg: Message): Promise<Context> {
    this.history.push(msg);
    const totalTokens = this.countTokens(this.history);

    if (totalTokens > this.maxContextTokens * 0.8) {
      await this.compressHistory();  // ← This is a whole project
    }

    return this.buildContext();
  }

  // This method alone is 200+ lines in production
  private async compressHistory() { /* ... */ }
}

3. 🔄 The Multi-Provider Tax

"What if OpenAI goes down?" Your boss asks on day two.

So now you're abstracting across OpenAI, Anthropic, Google, and maybe a local model. Each has:

  • Different API formats
  • Different token limits
  • Different rate limits
  • Different error codes
  • Different streaming behaviors

You're basically building a mini-cloud abstraction layer. For text completion. 🫠
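
A rough sketch of where that layer starts, in Python. The provider classes and error type here are placeholders, not any real SDK's API; the point is that every provider's quirks get normalized behind one interface:

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Completion:
    text: str
    input_tokens: int
    output_tokens: int
    provider: str

class Provider(Protocol):
    name: str
    max_context_tokens: int

    def complete(self, prompt: str, max_tokens: int) -> Completion: ...

class ProviderError(Exception):
    """One normalized error type; every real SDK raises something different."""

def complete_with_any(providers, prompt, max_tokens=512):
    """Try providers in order, translating each one's failures into ProviderError."""
    last_error = None
    for provider in providers:
        try:
            return provider.complete(prompt, max_tokens)
        except Exception as exc:  # each SDK has its own exception hierarchy
            last_error = ProviderError(f"{provider.name} failed: {exc}")
    raise last_error or ProviderError("no providers configured")

Now add streaming, tool calls, and per-provider token limits to that interface and you can see how it grows.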

4. 🛡️ The Guardrails Tax

Users will try to break your AI. They will:

  • Ask it to reveal system prompts
  • Try to make it say offensive things
  • Attempt prompt injection attacks
  • Feed it adversarial inputs

So you need:

  • Input sanitization
  • Output filtering
  • Topic restrictions
  • PII detection
  • Prompt injection detection

Each of these is a mini-project with its own edge cases.
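
To give a flavor of where this starts, here's a deliberately naive first pass at input screening in Python. It's a sketch, not a real defense; pattern lists like these are trivially bypassed, which is exactly why each bullet above becomes its own project:

import re

# Naive first-pass screening. Real guardrails need far more than keyword lists.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (your )?system prompt",
    r"you are now (in )?developer mode",
]

PII_PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+\.[\w.]+",
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
}

def screen_input(text):
    """Flag likely prompt injection and PII before the text ever reaches the model."""
    return {
        "possible_injection": any(
            re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS
        ),
        "pii_found": [name for name, p in PII_PATTERNS.items() if re.search(p, text)],
    }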

5. 💸 The Cost Control Tax

"Can you add AI to this feature?" = "Can you add unpredictable variable costs to this feature?"

You need:

  • Per-user token budgets
  • Per-feature cost tracking
  • Alert thresholds
  • Graceful degradation when budget is hit
  • Cost attribution (which feature is burning money?)
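
A minimal sketch of the per-user budget piece, assuming illustrative prices rather than anyone's current rates:

from collections import defaultdict

# Per-user budget gate. Prices are illustrative placeholders; in production they
# live in config and get updated whenever providers change pricing.
PRICE_PER_1K_TOKENS = {"input": 0.01, "output": 0.03}

class BudgetTracker:
    def __init__(self, daily_limit_usd=1.00):
        self.daily_limit_usd = daily_limit_usd
        self.spend = defaultdict(float)  # user_id -> dollars spent today

    def allow(self, user_id):
        """Check before the call so you can degrade gracefully instead of overspending."""
        return self.spend[user_id] < self.daily_limit_usd

    def record(self, user_id, input_tokens, output_tokens):
        cost = (input_tokens / 1000) * PRICE_PER_1K_TOKENS["input"] + \
               (output_tokens / 1000) * PRICE_PER_1K_TOKENS["output"]
        self.spend[user_id] += cost

Alerts, attribution dashboards, and monthly rollups are the next layer on top of this.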


6. 📊 The Observability Tax

When your AI gives a bad answer, how do you debug it?

You need:

  • Full request/response logging (with PII redaction)
  • Prompt version tracking
  • Token usage metrics
  • Latency percentiles
  • Error rate monitoring
  • Quality score tracking

Traditional software has logs. AI software needs a forensic lab.
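
The per-call log record is the foundation. Field names below are illustrative, but the shape is fairly universal: prompt version, token counts, latency, and a way to find the exact prompt later without storing raw user content:

import hashlib
import json
import time

def log_llm_call(logger, *, prompt, prompt_version, provider,
                 input_tokens, output_tokens, started_at, error=None):
    """Emit one structured record per LLM call."""
    logger.info(json.dumps({
        "event": "llm_call",
        "prompt_version": prompt_version,
        "provider": provider,
        # a hash lets you correlate calls without logging the (possibly PII-laden) prompt
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "latency_ms": round((time.time() - started_at) * 1000),
        "error": error,
    }))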

7. 🔁 The Retry & Fallback Tax

LLMs are nondeterministic, and the APIs in front of them are unreliable. They:

  • Time out
  • Return malformed JSON
  • Refuse valid requests
  • Rate-limit you unexpectedly
  • Occasionally hallucinate wildly

You need retry logic with exponential backoff, circuit breakers, fallback chains (try GPT-4 → try Claude → try Gemini → cache → degrade gracefully), and response validation.
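
Here's a minimal sketch of that loop in Python. `call_model` is a stand-in for whichever provider call you're wrapping, not a real SDK function:

import json
import random
import time

# Minimal retry-with-validation loop; malformed JSON counts as a failure too.
def complete_json(call_model, prompt, max_attempts=4):
    last_error = None
    for attempt in range(max_attempts):
        try:
            raw = call_model(prompt)
            return json.loads(raw)  # validate the response, don't just trust it
        except (json.JSONDecodeError, TimeoutError, ConnectionError) as exc:
            last_error = exc
            # exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep((2 ** attempt) + random.random())
    raise RuntimeError(f"LLM call failed after {max_attempts} attempts") from last_error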

8. ⏱️ The Latency Tax

LLM calls are slow. 1-30 seconds slow. So your entire UX architecture changes:

  • Streaming responses become mandatory
  • Loading states need to be thoughtful
  • Optimistic UI patterns are essential
  • Background processing becomes the norm
  • Users need progress indicators, not spinners
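
Streaming is usually the first change. With the OpenAI Python SDK (v1.x) it looks roughly like this; other providers expose an equivalent option:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stream tokens to the user as they arrive instead of blocking on the full completion.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document for me."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # in a real app: push over SSE or a WebSocket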

9. 🧪 The Testing Tax

How do you write unit tests for nondeterministic output?

You don't. Not really. You build:

  • Semantic similarity tests
  • Golden dataset evaluations
  • Human eval pipelines
  • Regression test suites for prompts
  • A/B testing infrastructure

Testing AI is fundamentally different from testing traditional software. And it's harder.
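
A golden-dataset check, for example, looks roughly like this. `summarize` and `embed` are stand-ins for your feature and whatever embedding model you use, and the 0.85 threshold is something you tune, not a magic number:

import math

GOLDEN_CASES = [
    {"input": "Refund policy document text goes here.", "expected": "Refunds are available within 30 days."},
    # ...dozens more, curated and reviewed by humans
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def check_against_golden(summarize, embed, threshold=0.85):
    """Fail the eval if any summary drifts too far from its curated golden answer."""
    failures = []
    for case in GOLDEN_CASES:
        actual = summarize(case["input"])
        if cosine(embed(actual), embed(case["expected"])) < threshold:
            failures.append((case["input"], actual))
    return failures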

10. 📦 The Prompt Versioning Tax

Prompts are code. They need:

  • Version control
  • A/B testing
  • Rollback capability
  • Environment-specific variants (dev/staging/prod)
  • Performance tracking per version

Most teams store prompts in strings. In code. Mixed with business logic. It works until it doesn't. 💀
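
Even a minimal step up helps: prompts live in versioned files, and code asks for a name plus a version that comes from config. The paths and names below are illustrative:

from pathlib import Path

PROMPT_DIR = Path("prompts")  # e.g. prompts/summarize_v3.txt, checked into git

def load_prompt(name, version):
    """Fetch an exact prompt version; rollback becomes a config change, not a deploy."""
    return (PROMPT_DIR / f"{name}_v{version}.txt").read_text()

def render(template, **variables):
    return template.format(**variables)

# Usage (the active version comes from config, tracked per environment):
# prompt = render(load_prompt("summarize", version=3), document=doc_text)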

11. 🔐 The Security Tax

Your AI processes user input. That input might contain:

  • Prompt injection attempts
  • Data exfiltration payloads
  • Adversarial examples
  • PII that needs protection

You need input validation, output sanitization, access controls, and audit logging, all specific to the LLM context.

12. 🧩 The Integration Tax

Your AI doesn't live in isolation. It needs to:

  • Call functions / use tools
  • Access your database
  • Respect user permissions
  • Integrate with existing workflows
  • Handle authentication

Each integration point multiplies the scaffolding complexity.
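
Tool calling is where this bites hardest: the model can request an action, but the user's permissions still have to win. A rough sketch (tool names and permission strings are hypothetical):

# The model never gets more access than the user who triggered the request.
TOOLS = {
    # tool name -> (required permission, implementation)
    "search_orders": ("orders:read", lambda user_id, query: f"orders for {user_id} matching {query!r}"),
    "issue_refund": ("orders:refund", lambda order_id: f"refund issued for {order_id}"),
}

def run_tool(name, user_permissions, **kwargs):
    """Execute a model-requested tool only if the current user is allowed to."""
    permission, fn = TOOLS[name]
    if permission not in user_permissions:
        raise PermissionError(f"user lacks '{permission}' for tool '{name}'")
    return fn(**kwargs)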


The Real Architecture of an AI Feature

Here's what the architecture actually looks like for a "simple" AI-powered search feature:

User Query
                     ↓
┌──────────────────────────────────────────────┐
│              Input Validation                │ ← Prompt injection defense
│              PII Detection                   │ ← Privacy compliance
└────────────────────┬─────────────────────────┘
                     ↓
┌────────────────────┴─────────────────────────┐
│           Context Assembly Engine            │
│  ┌─────────────┐ ┌──────────┐ ┌───────────┐  │
│  │ User Prefs  │ │ History  │ │ Knowledge │  │
│  │ (filtered)  │ │ (summed) │ │  (RAG'd)  │  │
│  └─────────────┘ └──────────┘ └───────────┘  │
│              Token Budget Manager            │ ← Counts, truncates, prioritizes
└────────────────────┬─────────────────────────┘
                     ↓
┌────────────────────┴─────────────────────────┐
│              Provider Router                 │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐         │
│  │ OpenAI  │ │ Claude  │ │ Gemini  │         │
│  │ (fast)  │ │ (smart) │ │ (cheap) │         │
│  └─────────┘ └─────────┘ └─────────┘         │
│     Rate Limiter │ Circuit Breaker           │
└────────────────────┬─────────────────────────┘
                     ↓
┌────────────────────┴─────────────────────────┐
│              Response Pipeline               │
│  Parse → Validate → Filter → Transform       │
│  Log → Track → Cache → Attribute Cost        │
└────────────────────┬─────────────────────────┘
                     ↓
┌────────────────────┴─────────────────────────┐
│              Output Delivery                 │
│  Stream to client │ Update state │ Notify    │
└──────────────────────────────────────────────┘

This is for one feature. One query box. One AI call.

Now multiply this by every AI-powered feature in your product. 😅


How to Survive the Scaffolding Tax

Okay, doom and gloom over. Here's how smart teams are managing this:

✅ 1. Build a Scaffolding Layer First, Features Second

Don't build scaffolding per-feature. Build a shared AI platform layer:

// Instead of this per-feature:
const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [...],
});

// Build this once:
const result = await aiPlatform.complete({
  task: "summarize-document",
  input: document,
  userId: user.id,
  budget: { maxTokens: 4000, maxCost: 0.05 },
  fallback: "cache-or-degrade",
});

The platform handles routing, budgeting, logging, retries, and guardrails. Features just declare intent.

✅ 2. Use AI Middleware Libraries

The ecosystem is catching up. Tools that handle the scaffolding:

  • LangChain / LlamaIndex: context management, chains, agents
  • Guardrails AI: output validation and structured extraction
  • LiteLLM: multi-provider abstraction (100+ providers, one API)
  • LangSmith / Helicone: observability and cost tracking
  • PromptLayer / Promptfoo: prompt versioning and testing

Don't build what you can buy or borrow. The scaffolding tax is real, but you don't have to pay it from scratch.

✅ 3. Make the Tax Visible

Track scaffolding hours separately in your project management:

Feature: AI-powered document summary
├── Core logic:           8h  (actual feature)
├── Scaffolding:         34h  (infrastructure)
│   ├── Context mgmt:    10h
│   ├── Error handling:   8h
│   ├── Guardrails:       6h
│   ├── Logging:          5h
│   └── Cost controls:    5h
└── Tax ratio:           81%

When leadership sees the real numbers, they make better decisions about which features deserve the AI treatment.

✅ 4. Start With the Hardest Scaffolding First

Most teams build the feature first and bolt on scaffolding later. Flip it:

  1. Week 1: Provider abstraction, logging, cost controls
  2. Week 2: Guardrails, retry logic, testing framework
  3. Week 3: Now build the actual feature

The feature will ship faster because the infrastructure is ready. And you won't be retrofitting security at 2 AM before launch.

✅ 5. Know When NOT to Use an LLM

Not every problem needs an AI. Seriously.

| Problem | AI? | Why? |
| --- | --- | --- |
| Sentiment analysis | ✅ Yes | NLP is hard; LLMs are great at it |
| Email validation | ❌ No | Regex exists. Use it. |
| Code generation | ✅ Yes | Complex output; LLMs excel |
| Null check | ❌ No | Please don't. |
| Document summarization | ✅ Yes | Core LLM strength |
| Sorting a list | ❌ No | .sort() doesn't hallucinate |

The Scaffolding Tax is $0 for features that don't use LLMs. Choose wisely.


The Uncomfortable Truth

Here it is, the thing nobody wants to say out loud:

The AI Scaffolding Tax means your team is building a platform company whether you want to or not.

Every AI feature you ship adds to your internal platform. You're not just building a product anymore; you're building the infrastructure to build the product. That's a fundamentally different kind of engineering effort, and it needs to be resourced accordingly.

Companies that treat AI features like regular features will drown in scaffolding debt. Companies that acknowledge the tax and invest in the platform will move 10x faster.

The scaffolding isn't waste. It's the real product. The LLM is just the engine; the scaffolding is the rest of the car. 🚗


TL;DR 📝

  • 70% of AI engineering effort goes to infrastructure, not features
  • The "Scaffolding Tax" includes: token counting, context management, guardrails, logging, cost controls, multi-provider support, testing, and more
  • Build a platform layer first, features second
  • Use existing tools (LangChain, LiteLLM, Guardrails AI) instead of building from scratch
  • Make the tax visible: track scaffolding hours separately
  • Know when NOT to use an LLM: not every problem needs AI
  • Companies that invest in the scaffolding platform will win

What's Your Scaffolding Horror Story? 💬

I want to hear from you. What's the most ridiculous piece of infrastructure you've had to build just to make an LLM work in production? How much of your engineering time goes to scaffolding vs. actual features?

Drop a comment below. Let's commiserate. 🍻


If this post saved you from a scaffolding surprise, give it a reaction 👍 and follow for more honest takes on building with AI. No hype, just engineering.

Cover image: the AI iceberg. The model is the tip; the scaffolding is everything beneath the surface.
