
Tom Tokita

Someone Called My AI System a Tool. Then They Showed Me Theirs.

Someone at a conference asked me what I'd been building. I described a system I use daily. Over 200 sessions of accumulated learnings. 45 mechanical hooks that fire before and after every action. Anti-fabrication gates that block the AI from stating anything it hasn't verified. Memory that survives context compression. Deploy protections that physically prevent wrong-target pushes. A behavioral identity that gets re-injected every message so the system doesn't drift into generic assistant mode.
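
If the hooks and gates sound abstract, here is roughly what "fires before and after every action" can look like. This is a minimal sketch in Python for illustration, not the actual implementation; the Action class and the hook names are mine.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    kind: str                                   # e.g. "deploy", "write_file", "state_fact"
    payload: dict = field(default_factory=dict)

class GateBlocked(Exception):
    """A pre-action hook refused to let the action run."""

def require_explicit_target(action: Action) -> None:
    # Example gate: anything that touches an environment must name it explicitly.
    if action.kind == "deploy" and "target" not in action.payload:
        raise GateBlocked("deploy without an explicit target")

def log_outcome(action: Action, result: object) -> None:
    # Example post-hook: every completed action leaves a trace for later review.
    print(f"[post-hook] {action.kind} -> {result}")

PRE_HOOKS: list[Callable[[Action], None]] = [require_explicit_target]
POST_HOOKS: list[Callable[[Action, object], None]] = [log_outcome]

def run(action: Action, do: Callable[[Action], object]) -> object:
    for hook in PRE_HOOKS:
        hook(action)              # any pre-hook can hard-stop the action
    result = do(action)           # the actual work happens only if every gate passes
    for hook in POST_HOOKS:
        hook(action, result)
    return result
```

The point is not the code. The point is that the block happens mechanically, before execution, every time, instead of relying on the model to police itself.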

He nodded and said, "Oh, so you built a tool."

Then he described his. "I built something similar," he said. An agent framework. A React dashboard. A task board. Some cron jobs. A dozen agents with names. A job worker that shells out to the agent CLI and captures stdout. He showed me the architecture diagram. Three boxes connected by arrows.

I asked about guardrails. "What do you mean?" he said. I asked what happens when an agent hallucinates a data point and the next agent downstream treats it as fact. He said that hasn't happened yet. I asked about credential scoping. Every agent had the same API keys with the same permissions. I asked what happens when context compresses mid-task. He didn't know what context compression was.

We were not building the same thing.

The Assembly Pattern

This pattern is everywhere right now. Pull an open-source agent framework. Fork a React cockpit from GitHub. Wire them together with a thin HTTP layer. Add some agent definitions with fun names. Ship a demo. Call it "AI infrastructure."

It works in the demo. It works for the screenshot. It even works the first five times you run it.

It stops working when an agent fabricates a statistic and your client reads it. When a retry loop burns $400 in API calls overnight because nothing capped the spend. When an agent with write access to your production database decides to "clean up" records it hallucinated as duplicates.
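
The overnight retry burn in particular is cheap to prevent once you decide to. A rough sketch of a spend-capped retry loop, assuming you can estimate per-call cost; the dollar figures and the call_model stand-in are illustrative, not numbers from any real incident.

```python
import time

MAX_ATTEMPTS = 50
MAX_SPEND_USD = 2.00            # hard ceiling for this task, illustrative number
EST_COST_PER_CALL = 0.20        # rough per-call estimate; real code would meter tokens

def call_model(prompt: str) -> str:
    raise TimeoutError("pretend the provider is flaky tonight")   # stand-in for a real API call

def capped_retry(prompt: str) -> str:
    spent = 0.0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if spent + EST_COST_PER_CALL > MAX_SPEND_USD:
            raise RuntimeError(f"spend ceiling hit after ${spent:.2f}; stopping instead of burning money")
        spent += EST_COST_PER_CALL
        try:
            return call_model(prompt)
        except TimeoutError:
            time.sleep(min(2 ** attempt, 30))   # back off, but never loop forever
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts (${spent:.2f} spent)")
```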

The assembly is the easy part. The demo is the easy part. What comes after the demo is where the actual engineering lives.

What's Missing From Every Patchwork Build I've Reviewed

I've audited three of these setups in the past year. Internal team builds, partner builds, open-source-assembled stacks. The gaps are identical every time.

| What Production Requires | What the Patchwork Has |
| --- | --- |
| Pre-action gates (mechanical blocks before execution) | Nothing. Agent output accepted as the final answer |
| Anti-fabrication (every claim must trace to a source) | Nothing. Whatever the LLM says is treated as fact |
| Anti-drift detection (behavioral correction over long sessions) | Nothing. Agents drift silently |
| Persistent memory with session recovery | Stateless. Fresh context every run |
| Captured learnings (compound knowledge over time) | Nothing. Same mistakes repeat indefinitely |
| Credential scoping per agent | Shared keys, full permissions, no boundaries |
| Human checkpoints on multi-step tasks | Fully autonomous, no review loop |
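
Most rows in that table are a few dozen lines of code once you commit to them. Credential scoping, for example, is mostly configuration discipline. A minimal sketch, assuming each agent gets its own key and an explicit permission list; the agent names and permissions here are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentCredentials:
    api_key_env: str          # each agent reads its own key, never a shared one
    permissions: frozenset    # the only operations this agent may perform

AGENT_SCOPES = {
    "researcher": AgentCredentials("RESEARCHER_API_KEY", frozenset({"read:web", "read:docs"})),
    "writer":     AgentCredentials("WRITER_API_KEY",     frozenset({"read:docs", "write:drafts"})),
    "deployer":   AgentCredentials("DEPLOYER_API_KEY",   frozenset({"deploy:staging"})),
}

def authorize(agent: str, operation: str) -> AgentCredentials:
    creds = AGENT_SCOPES[agent]
    if operation not in creds.permissions:
        raise PermissionError(f"{agent} is not allowed to {operation}")
    return creds
```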

The common response: "We'll add that later." In my experience, later means after the first production incident. And the first production incident in an unharnessed AI system is rarely small.

Assembly Is Not Engineering

I want to be clear. I'm not against open source. I use open-source tools constantly. MIT-licensed projects power parts of my own stack. Pulling from the community is smart and efficient.

But there's a gap between assembling components and engineering a system. Assembly is connecting boxes. Engineering is understanding what happens at every connection point when things go wrong. What happens when the model hallucinates at step 3 of a 7-step pipeline? What happens when context compresses and the agent forgets the rules you set 40 messages ago? What happens when an agent gets a poisoned input from an unaudited MCP server?
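
One mechanical answer to the step-3 question: refuse to hand any step's output downstream unless every claim in it carries a source. A sketch of that idea; the Claim and StepOutput names are hypothetical, not from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source: str | None = None          # URL, file path, query result id, etc.

@dataclass
class StepOutput:
    step: str
    claims: list[Claim]

def verify_traceability(output: StepOutput) -> StepOutput:
    # The connection point between steps is where fabrication gets caught,
    # not after the client has already read the report.
    unsourced = [c.text for c in output.claims if not c.source]
    if unsourced:
        raise ValueError(f"step '{output.step}' produced unsourced claims: {unsourced}")
    return output

def run_pipeline(steps, initial):
    data = initial
    for step in steps:
        data = verify_traceability(step(data))   # downstream steps only ever see verified output
    return data
```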

If you can't answer those questions, you haven't built infrastructure. You've built a demo with a longer runtime.

"I'll Just Have My AI Build It"

This is the part that genuinely worries me.

The assembly pattern is accelerating because people are using AI to do the assembling. "I'll just have Claude/GPT scaffold my agent system." The AI reads some docs, maybe runs a web search, ingests a few blog posts about agent frameworks, and produces something that looks like architecture. Clean folder structure. Reasonable-sounding agent definitions. Maybe even a README with a diagram.

But it's architecture by hallucination. The AI doesn't know what breaks in production because it's never been in production. It doesn't know that context compression silently erases behavioral rules at message 180. It doesn't know that an unscoped MCP server will happily route your client data through an endpoint you never audited. It doesn't know that "just add a retry" turns a $0.20 task into a $40 task when the retry loop has no ceiling.
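
Rule erasure at compression time is also something you can defend against rather than hope against. A rough sketch: keep the behavioral rules outside the conversation and re-inject them on every turn, so whatever compression drops gets restored on the next request. The file name and message shape are illustrative.

```python
from pathlib import Path

RULES_FILE = Path("behavioral_rules.md")       # lives on disk, so compression can't erase it

def build_messages(history: list[dict], user_msg: str) -> list[dict]:
    rules = RULES_FILE.read_text()
    return [
        # Re-injected every single turn: even if the provider compresses or
        # truncates history, the next request still carries the full rule set.
        {"role": "system", "content": rules},
        *history[-20:],                          # keep only recent turns; the rules never depend on them
        {"role": "user", "content": user_msg},
    ]
```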

What you get is a system that looks engineered but isn't. It passes the screenshot test. It passes the "show the team" test. It fails the Tuesday afternoon test, when something unexpected happens and there's no gate to catch it, no captured learning to reference, no incident history to draw from.

AI is intelligent. It can write code, generate configurations, and produce plausible architectures. What it cannot do is architect from pain it hasn't experienced. Every rule in a real harness exists because something specific went wrong. The AI building your system hasn't had things go wrong yet. It's working from blog posts and documentation, not from the 11 PM deploy that almost went to the wrong org.

The irony is thick. An unharnessed AI building the infrastructure that's supposed to harness AI. The output will be confident, well-structured, and missing every lesson that only production teaches.

What "Infrastructure" Actually Means

The system I described at that conference didn't start as infrastructure. It started as a mess. A rules file that grew from 5 entries to 27 because the AI kept finding new ways to surprise me. A hook I wrote at 11 PM because the system nearly pushed metadata to the wrong environment. A memory protocol I built because the AI forgot everything after context compression and started making the same mistakes I'd fixed three hours earlier.
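
For flavor, a hook like that can be very small. A sketch of the shape, assuming the project pins its allowed deploy target in a local config file; the file name and CLI wiring are my own stand-ins, not the actual hook.

```python
import json
import sys
from pathlib import Path

def check_deploy_target(requested_target: str) -> None:
    # The project itself declares where it is allowed to deploy.
    expected = json.loads(Path(".project.json").read_text())["deploy_target"]
    if requested_target != expected:
        sys.exit(f"BLOCKED: this project deploys to '{expected}', not '{requested_target}'")

if __name__ == "__main__":
    check_deploy_target(sys.argv[1])   # wired in as a pre-deploy hook, runs before every push
```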

Every rule in the harness traces to a specific failure. That's not architecture by design. It's architecture by incident. But it compounds. 200+ sessions of captured learnings means the system knows things a fresh agent never will. Platform quirks, client-specific constraints, failure patterns that repeat across projects. None of that lives in an agent framework you pulled from GitHub last Tuesday.
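
Captured learnings don't need anything elaborate to start compounding, either. A minimal sketch, assuming a single append-only file that gets loaded into the prompt at the start of every session; the file format and function names are illustrative.

```python
import datetime
import json
from pathlib import Path

LEARNINGS = Path("learnings.jsonl")            # append-only; every session adds to it

def capture(lesson: str, incident: str) -> None:
    entry = {"when": datetime.date.today().isoformat(), "lesson": lesson, "incident": incident}
    with LEARNINGS.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def load_for_session() -> str:
    # Injected into the system prompt at session start, so a fresh agent
    # begins with everything the previous sessions already paid for.
    if not LEARNINGS.exists():
        return ""
    lines = LEARNINGS.read_text().splitlines()
    return "\n".join(f"- {json.loads(line)['lesson']}" for line in lines)
```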

I wrote about this convergence pattern recently. Multiple teams, from OpenAI to Martin Fowler's group to a solo practitioner in Manila, arrived at the same conclusion independently: the harness is the product, not the model. A disciplined harness on a weaker model beats an unconstrained stronger model every time.

The Uncomfortable Question

Next time someone shows you their "AI infrastructure," ask them three questions:

  1. What happens when an agent fabricates a data point? Is there a mechanical gate, or do you just hope it doesn't?
  2. What happens after context compression? Does the system recover its behavioral rules, or does it revert to a generic assistant?
  3. Can you trace every rule in your system to a specific incident that forced you to add it?

If the answers are "hasn't happened yet," "what's context compression," and a blank stare, you're looking at a patchwork. Not infrastructure.

And that's fine. Everyone starts with a patchwork. I did. The question is whether you know the difference.

If you want to start building the real thing, I wrote a hands-on tutorial with three production-tested gates and starter code. The gates are also packaged as a ready-to-clone repo on GitHub. Zero dependencies, works with any LLM provider.


I'm Tom Tokita. I run Aether Global Technology out of Manila. I've been building and operating a production AI system daily for over 200 sessions. I write about what works, what breaks, and the gap between demos and production. More on tokita.online.
