DEV Community: Jinav Shah

AI is moving into the real world. Here is what we need to build alongside it.

Jinav Shah — Sun, 28 Jun 2026 17:16:08 +0000

By Jinav Shah. Views are personal.

We are at an inflection point.

AI systems are moving from answering questions on a screen to taking actions in the world. Booking appointments. Approving transactions. Navigating physical environments. Writing and executing code.

With that shift comes a natural question. Not whether AI is capable enough, but whether we understand it well enough to deploy it confidently in high stakes situations.

This is not a question about AI being good or bad. It is a question about how well we understand what is happening inside these systems when they get something wrong.

The honest answer is that we are still early. Not because researchers are not trying. But because the nature of how AI stores and processes information is fundamentally different from anything we have debugged before.

Every major technology shift built infrastructure alongside capability

Cars brought crash testing. Electricity brought circuit breakers. Medicine brought clinical trials.

Nobody called these anti-progress movements. They were the responsible infrastructure that allowed those technologies to scale into everyday life.

AI needs the same. Not resistance. Infrastructure.

The specific infrastructure AI needs is different from anything we have built before. Because the failure modes of AI are different from anything we have debugged before.

In software, a bug has an address

A line number. A variable. A function that received the wrong input. You set a breakpoint, step through execution, and find exactly where reality diverged from expectation.

The source code is human written, which means human readable. The logic is explicit. The intent is traceable.

When an AI model gets something wrong, none of this exists. There are billions of floating point numbers whose collective meaning emerged from training on human text. No line numbers. No variables. No explicit logic.

You can only observe what goes in and what comes out.

So what do we do when AI gets something wrong

We change the model. Upgrade versions. Tweak the prompt. Add a guardrail.

All reasonable. None of them root cause.

It is like treating recurring headaches with stronger painkillers each visit without ever investigating why they started. The reason we rely on external interventions is not laziness. It is that root cause investigation requires understanding what is happening inside, and that is one of the hardest unsolved problems in the field.

The reason goes deeper than tooling

Think of a hex color code. Three primary colors. Six digits. Yet 16 million possible combinations, most indistinguishable to the naked eye.

A transformer represents each word as 256 numbers. Not 256 separate meanings in 256 separate boxes. 256 numbers that combine, overlap, and interact to represent potentially thousands of concepts simultaneously.

This is called superposition. And it is not a design flaw. It is an inevitable consequence of matrix multiplication, the core operation of every transformer. Mixing happens by default.

The word bank does not have a dedicated slot for financial institution and another for riverbank. Both meanings, and dozens of others, are folded into the same 256 numbers, in overlapping directions, sharing the same space.

When bank is interpreted wrongly, there is no single number to point to. The error is distributed across all 256 numbers, in combinations we have not named, emerging from interactions we cannot enumerate.

You cannot fix what you cannot locate.

And that is just one token, in one layer

Large models have 96 layers. Each with its own Q, K and V transformations mixing information differently. Then non-linear FFN layers introducing combinations no linear operation can express. Then MoE, Mixture of Experts, where the same input takes different computational paths depending on context.

There is no single moment where the wrong answer was decided. It emerged gradually, collectively, across the entire network.

That is what you are trying to debug.

Now we are giving these systems tools to act

In software, a function call is deterministic. Same input, same output, traceable every time.

In AI with tool calling, the model decides whether to call a tool, which one to call, what parameters to pass, how to interpret the result, and whether to call another tool based on that result. Each decision is a probabilistic inference emerging from the same unlocatable internal state.

Tool outputs feed back into context. Errors compound across steps. Actions happen in the real world.

In software, a multi-step process has a call stack. You can inspect every frame, every variable, every state transition.

In an AI agent, there is no call stack. There is a sequence of probabilistic decisions producing actions that can be irreversible.

We have taken a system we cannot fully debug and given it the ability to act. The debugging problem did not just get harder. It got consequential.

This is solvable. But it requires the right framing.

Chris Olah at Anthropic has spent years trying to reverse engineer what concepts a model has learned internally, a field called mechanistic interpretability. His team has found recognisable features, grammar rules, factual associations, sentiment.

But the honest truth is that we find features by looking for things we already suspect exist. Nobody has a complete list. An accuracy failure caused by a feature you have not named is one you cannot systematically prevent.

The hex code has 16 million combinations and we can see them all.

AI has infinite combinations and we have named a fraction.

The path forward is not slowing AI down. It is building the interpretability tools, accountability frameworks, and testing infrastructure that let us deploy it confidently as the stakes rise.

Cars did not wait for perfect safety before scaling. They built safety infrastructure alongside scale.

That is exactly where we are with AI. And the work, by researchers, engineers, and product teams, is already underway.

The better news is that we now at least know what we are up against.

Connect on LinkedIn: https://www.linkedin.com/in/jinav-shah-27b3a255/

You're probably using AI wrong. And it's costing you more than you think.

Jinav Shah — Tue, 16 Jun 2026 04:58:54 +0000

Most companies today have one AI setup: send everything to the most powerful model available. Pay the bill. Repeat.

It works. But it's expensive, slower than it needs to be, and honestly — a bit like hiring a surgeon to change a lightbulb.

The problem nobody talks about

Imagine a hospital where every patient — whether they need open-heart surgery or a bandage on a paper cut — is seen by the senior consultant first.

The consultant is brilliant. But the waiting room is chaos. The costs are sky-high. And half his time is spent on things a nurse could have handled in two minutes.

That's what most AI pipelines look like today.

When your team sends something to an AI model, it might be a Python file, a customer complaint in Hindi, a SQL query, or a casual Hinglish support ticket. These are completely different problems requiring different expertise, different depth, different cost.

Yet most systems send them all to the same model, at the same price, with the same wait time.

The smarter approach: right model for the right job

Some inputs have hard, deterministic boundaries. A .py file contains Python. A .sql file contains SQL. You don't need the most powerful AI in the world to figure that out — you need a rule.

Here's what a smarter pipeline looks like:

Input arrives
      ↓
Orchestrator SLM — a small, fast model that reads
the input and decides: what is this, who handles it?
      ↓
├── Python file   → Python specialist model
├── SQL query     → SQL specialist model  
├── Hindi doc     → Hindi specialist model
└── Ambiguous     → Frontier model directly
      ↓
Specialist outputs + original input
→ Frontier model → Final answer

There is no separate routing system to build. The orchestrator is itself a small AI model — trained to classify inputs and direct traffic. It costs almost nothing to run.

The powerful frontier model — your Claude, your GPT-4 — stays in the loop for the final answer. It just isn't doing the sorting anymore.

One insight most teams miss

When specialist models pass findings to the frontier model, the instinct is to format outputs for human readability. Paragraphs. Explanations. Full sentences.

Wrong target.

The downstream consumer is another model — not a human. Specialist models should produce machine-readable structured output. Dense. Precise. No explanation.

{
  "language": "python",
  "issues_detected": ["unbounded loop at line 47"],
  "confidence": 0.94
}

This isn't for a person to read. It's for a model to consume efficiently. The constraint is baked into training — not imposed by external truncation at runtime. Prevention over correction.

Think of it like a doctor handing a consultant a structured chart instead of a five-page narrative. Same information. Faster to read. More room to think.

What happens when context gets too large?

The frontier model has finite working memory. Multiple specialists contributing outputs fills it fast. Here's the fallback stack, in order:

First — specialists send only essential structured signal. No reasoning traces. This is the default.

If needed — summarise the original input first. Compress before routing.

If still needed — feed specialist outputs one at a time. The frontier model builds context incrementally. Slower, but accurate.

Last resort — skip specialists entirely. Raw input directly to the frontier model. Full cost, guaranteed quality.

The pipeline always has a path to the right answer. You're just choosing how much it costs.

Does this actually save money?

Honest answer: only at scale.

Specialist models are open-source — free to use, but you pay for compute. A reasonable GPU setup costs $1,000–1,100 per month. The savings come from routing a large share of queries away from expensive frontier API calls.

Monthly AI API spend	Does this make sense?
Below $2,000	Probably not — keep it simple
$3,000–$5,000	Worth evaluating
Above $5,000	Very likely yes

One important caveat. If your team currently uses Claude.ai, Claude Code, or any managed AI interface — this architecture means moving away from that. You'd be calling APIs directly from your own system, which means building and owning the interaction layer your employees use.

How you use AI today	What this means
Managed interface (Claude.ai, etc.)	Build a custom interface first — factor in engineering cost
Already using APIs with custom tooling	Plugs in naturally

You've already seen this architecture

If you've used agent mode in Cursor — the AI coding tool — you've experienced this exact pattern without realising it.

Cursor doesn't send your entire codebase to one model and hope for the best. A lightweight orchestrator reads your request, decides what to do — read a file, search the codebase, run a terminal command — routes to the right tool, then a frontier model synthesises the final response.

Enterprise tools like Atlassian's Rovo are moving in the same direction for workplace workflows.

The companies that built these tools figured out that one model doing everything is wasteful. The question is whether the AI pipelines inside your organisation are designed with the same intelligence — or still sending every query to the most expensive model available.

The real lesson

Most AI cost and speed problems aren't model problems. They're routing problems.

The best AI pipelines look less like "one genius doing everything" and more like a well-run team: a smart receptionist, skilled specialists, and senior judgment applied only where it genuinely matters.

The question isn't which model is best.

It's: are you using the right model for the right job?

What routing decisions is your organisation making — or avoiding? Would love to hear in the comments.

Views expressed are my own and do not represent my employer.