DEV Community: Harikrishnan P H

In multi-agent systems, the skill is the abstraction level — not the agent count

Harikrishnan P H — Tue, 07 Jul 2026 08:57:57 +0000

Everyone building multi-agent systems knows the starting pattern by now: a router reads the request, works out what's actually being asked, and dispatches to specialized agents — one to set up the problem, one to execute, one to explain, one to narrow scope. Orchestration graph, specialized nodes. Fine. Everyone builds this. It's table stakes.

The part nobody teaches — the part that actually decides whether your system survives a year — is quieter:

Choosing the right level of abstraction.

Not the number of agents. Not the cleverness of the orchestration. The level. And there are two symmetric ways to get it wrong.

Trap 1: too coarse — the graph inside the graph

An agent grows up. It now has five branches, three tools, a loop, sub-decisions. The obvious move — the one every framework nudges you toward — is to give it its own internal graph. Then, later, a graph inside that.

It feels natural. It's also how a multi-agent system quietly becomes a monolith. Everything shares the parent's state. A change deep in a sub-graph has a blast radius you can't see. You can't test the inner piece in isolation, because it only exists inside its parent. You can't scale it, hand it off, or reason about it separately, because there's no seam. It becomes one tightly coupled organism that only its author understands, and only for a few more weeks. (I've watched this exact shape before, in embedded systems, when "just one more special case" ran for two years.)

Trap 2: too fine — a hundred agents in a trench coat

The opposite mistake is louder right now, because AI made spinning up an agent nearly free. So people spawn dozens — hundreds — of tiny agents and wire them together, mistaking more agents for more intelligence.

It isn't. It's more ways to disagree. More hand-offs to misfire. More latency stacked end to end. More "which of these forty agents was wrong, on which step?" You've traded a monolith for a distributed system you didn't design and can't observe. The cost of an agent was never the code to create it — it's the coordination it drags in behind it. When agents are cheap to make, that cost is exactly the thing you stop noticing.

The actual skill: draw the boundary where it earns its keep

Both traps are the same failure — dodging the judgment call. The skill is sitting in the middle and deciding, deliberately, where a boundary belongs.

My rule: keep it a node while it's simple; promote it to a standalone agent behind a protocol (A2A) the day you catch yourself giving it its own graph — and not a day before. That moment isn't a coding decision; it's the design telling you the piece has outgrown living inside its parent.

When you promote it to a protocol-served agent, everything the nested version fought you on gets easy: it has a contract, you can test it in isolation, you can scale and version it independently, someone else can own it, and — because it speaks a protocol — anyone can call it, including the customer's own agents. A nested sub-graph can only ever be talked to by its parent.
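
As a rough illustration of what "it has a contract" buys you, here is a minimal Python sketch. Every name in it (`ScopeRequest`, `ScopeResult`, `ScopeAgent`, `InProcessScopeAgent`, `router_step`) is hypothetical, and it does not use any real A2A SDK: the agent is an in-process stand-in, where a promoted agent would sit behind a network protocol. The point is only that the caller depends on the request and result types, not on the agent's internals.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical contract: the only thing the parent graph knows about the agent.
@dataclass(frozen=True)
class ScopeRequest:
    question: str
    max_items: int


@dataclass(frozen=True)
class ScopeResult:
    items: list[str]
    truncated: bool


class ScopeAgent(Protocol):
    """Anything that can answer a ScopeRequest, local or remote."""

    def handle(self, request: ScopeRequest) -> ScopeResult: ...


class InProcessScopeAgent:
    """Stand-in implementation; a promoted agent would sit behind a protocol."""

    def handle(self, request: ScopeRequest) -> ScopeResult:
        words = request.question.split()
        return ScopeResult(
            items=words[: request.max_items],
            truncated=len(words) > request.max_items,
        )


def router_step(agent: ScopeAgent, question: str) -> ScopeResult:
    # The parent sends a request and reads a result; no shared state crosses over.
    return agent.handle(ScopeRequest(question=question, max_items=3))


result = router_step(InProcessScopeAgent(), "narrow this very broad question")
```

Because `router_step` only sees the `ScopeAgent` protocol, the same call works against a test double, and the agent can be exercised on its own without building the parent graph.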

The restraint clause (this is the whole skill)

Don't read this as "decompose everything into agents." That's just the hundred-agents mistake wearing a different costume: premature boundaries you have to maintain before they earn their keep.

The judgment is in the timing. Keep it a simple node while it's simple. The signal to promote is specific: the day you catch yourself giving a node its own graph. Extract it then — not before, not five branches too late. The right level of abstraction means as few boundaries as possible, and no fewer.

And none of it tells you when it'll break

One caveat before anyone ships this: the right abstraction makes a system maintainable, not correct. A beautifully-factored multi-agent system can still be confidently wrong. The only thing that tells you when it will fail is deliberate planning and testing at the seams — not vibes. That's a whole discipline of its own; a post is coming. For now: if you're not testing the boundaries, you're guessing.

The takeaway

The multi-agent conversation is obsessed with orchestration and agent count. The engineers whose systems last are obsessed with something less glamorous: the level of abstraction. Too coarse and you get a monolith. Too fine and you get a swarm. The whole job is knowing where the boundary belongs — and having the restraint to draw no more than that.

The agents are the easy part. The abstraction level is the engineering.

If you build multi-agent systems: what's your signal for "make it its own agent" vs. "keep it a node"? I want to hear how others draw the line.

Originally published on my blog.

From Kernel Hardening to AI Agents: What the Metal Taught Me About Building AI That Survives Production

Harikrishnan P H — Mon, 29 Jun 2026 11:10:14 +0000

Most people writing about AI agents today started about six months ago.

I started in firmware.

My first real project out of college was a small streaming dongle — the kind of device that has to run Netflix, Prime, and YouTube on roughly 500MB of memory and 800 MFLOPS of compute. No thermal headroom. No margin for error. I owned most of the stack: RDK-V, a custom embedded browser, Bluetooth, WiFi, driver configuration, kernel hardening. And because shipping it meant five different companies' code had to behave as one product, I spent as much time on the seams between systems as on any one of them.

For years after that I lived in that world — set-top boxes, C++, an open-source WebKit browser, network failover routing that had to switch between Ethernet and WiFi without the user ever noticing. It is about as far from "prompt an LLM" as engineering gets.

And yet, now that I build production multi-agent AI for a living, I've realized something I didn't expect: embedded didn't make me a worse AI engineer. It made me a better one. Here's why.

What the metal actually drills into you

Embedded systems teach you a worldview before they teach you any particular skill. Three things stuck:

What can fail, will fail. When you've hardened a kernel, you stop trusting the happy path. You assume every input is hostile, every dependency will time out, every "this can't happen" eventually happens at 2am in someone's living room. Debugging under those conditions rewires you permanently. (More on the bug that taught me this in a moment; it's my favorite.)

Resources are not free. When your entire budget is 500MB and 800 MFLOPS, you develop a physical sense for cost and latency. You feel a wasted allocation. You don't get to wave it away and add a bigger box.

The best code is the boring code. My second engagement, on a WebKit-based browser, is where I learned that writing code that works is the easy part — writing code that's readable, maintainable, and documented is the real craft. A clever solution that no one can maintain is a liability, not an achievement.

The bug that rewired me

The story I still tell to explain how I think about systems is about a clock.

Screens on the dongle would freeze. Randomly. No pattern we could pin down and — worse — nothing we could reproduce on demand. The one thing we noticed was that it seemed to happen more the longer a device had been running. So we did what everyone does: we blamed heat. Thermal throttling, long sessions, a hardware stress problem. It's the kind of explanation that feels right and quietly eats a week of your life.

We tore through the entire stack chasing it. The application. The media player. Down into the firmware. Then the hardware itself. Nothing. Every layer looked innocent in isolation.

The real cause was nowhere near where the symptom showed up. When the WiFi connection dropped — even for a second — the system clock reset to epoch: January 1, 1970. And the media player, suddenly handed a timestamp that had jumped backward by decades, simply froze. A network event, three layers away, surfacing as a frozen screen the user blamed on the app.
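
For readers who haven't hit this class of bug, here is a small Python sketch of one common defense. It is not what the dongle's media player did, and the names (`PlaybackTimer`, `is_plausible_wall_clock`, the cutoff constant) are hypothetical. It assumes two things: elapsed time is measured with a monotonic clock, and wall-clock readings are sanity-checked before anything trusts them.

```python
import time

# Hypothetical floor: any wall-clock reading before this means "clock not set".
EARLIEST_PLAUSIBLE_EPOCH_SECONDS = 1_600_000_000  # September 2020


def is_plausible_wall_clock(epoch_seconds: float) -> bool:
    """Reject readings such as 0 (January 1, 1970) that signal a reset clock."""
    return epoch_seconds >= EARLIEST_PLAUSIBLE_EPOCH_SECONDS


class PlaybackTimer:
    """Tracks elapsed playback time with the monotonic clock, not the wall clock."""

    def __init__(self) -> None:
        self._started_at = time.monotonic()

    def elapsed_seconds(self) -> float:
        # time.monotonic() cannot go backward, even if the system clock is reset.
        return time.monotonic() - self._started_at


timer = PlaybackTimer()
clock_after_reset = 0.0  # simulated wall-clock reading after the WiFi drop
```

With this shape, a reset to epoch shows up as an implausible reading that can be logged and ignored, while elapsed-time arithmetic keeps moving forward.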

Nobody would have guessed that from the symptom. You only find it by refusing to trust the obvious story and tracing the whole system, end to end, until the pieces line up.

That bug rewired me permanently. It's why, to this day, the symptom is never where I start — and why "it's probably the heat" (or today, "it's probably the model") is the explanation I distrust most.

Then I made the jump that scared me: new language, new domain, new stack, into AI and data. Almost nothing transferred at the surface level. The only thing I could reuse was that worldview. It turned out to be enough.

The bridge: treating agents like production systems, not magic

Here's the first place the metal shows up. When something feels off in one of our agents, my reflex isn't to re-prompt and hope. It's to debug — exhaust every mechanism I have, from logs to search to AI assistance — and if nothing works, go looking for a different approach entirely. And I judge that approach the way I learned to judge any dependency: is it maintained, is it documented, does it actually solve the problem? Not "is it new," not "is it clever." Maintainability is a feature.

That sounds obvious. It isn't, in this field. A lot of AI engineering right now is built on the assumption that the model is magic and the surrounding system is an afterthought. The embedded instinct is the exact opposite: the model is just one more component that will fail in ways you didn't predict, so you design for the failure path first. "Wrong call costs real money" isn't a slogan when your agent is moving real inventory for a real retailer — it's the reason you treat it like a production system instead of a feature.
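
A minimal Python sketch of "design for the failure path first" might look like the following. The names (`Decision`, `decide_with_fallback`, `flaky_model`, the action strings) are all hypothetical and the model call is a plain function argument, so nothing here reflects a real LLM client or our production code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Decision:
    action: str
    source: str  # "model" or "fallback", so callers and logs can tell them apart


def decide_with_fallback(
    ask_model: Callable[[str], str],
    allowed_actions: set[str],
    request: str,
    safe_default: str = "escalate_to_human",
) -> Decision:
    """Treat the model as a component that can fail: catch errors, validate output."""
    try:
        proposed = ask_model(request).strip().lower()
    except Exception:
        # Timeouts, transport errors, malformed responses: all take the failure path.
        return Decision(action=safe_default, source="fallback")
    if proposed not in allowed_actions:
        # A confident but out-of-contract answer is also a failure.
        return Decision(action=safe_default, source="fallback")
    return Decision(action=proposed, source="model")


def flaky_model(request: str) -> str:
    raise TimeoutError("model did not answer in time")


ALLOWED = {"hold", "transfer", "escalate_to_human"}
ok = decide_with_fallback(lambda r: " Transfer ", ALLOWED, "store 12 is low on stock")
failed = decide_with_fallback(flaky_model, ALLOWED, "store 12 is low on stock")
invalid = decide_with_fallback(lambda r: "delete everything", ALLOWED, "store 12")
```

The failure branches are written before the happy path on purpose: the function has a defined answer for every way the model can misbehave, and the `source` field keeps fallbacks visible instead of silent.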

Where it actually pays off, part 1: the data fights back

If you ask me the single hardest thing about shipping retail agents, it isn't the coordination between agents. Coordination you can engineer your way out of with clean contracts. The data is the part that fights back.

A retailer's reality lives across a handful of legacy vendor systems, each of which defines a "SKU" — and the truth — a little differently. The actual work is pulling that out, reconciling it, and shaping it into something an agent can reason over — and doing all of it fast enough that the person on the other end never feels the lag. Correctness and latency pulling in opposite directions is the real design problem, and it's the same tension I first met on a 500MB device. The metal prepared me for this exactly.
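
To make the reconciliation problem concrete, here is a deliberately tiny Python sketch. The normalization rule, the field names, and the "later feed wins" conflict policy are all assumptions for illustration; real vendor data needs far more than this.

```python
def canonical_sku(raw: str) -> str:
    """Hypothetical rule: uppercase, drop separators, strip leading zeros."""
    cleaned = "".join(ch for ch in raw.upper() if ch.isalnum())
    return cleaned.lstrip("0") or "0"


def reconcile(*vendor_feeds: list[dict]) -> dict[tuple[str, str], int]:
    """Merge feeds keyed on (canonical SKU, store); later feeds win on conflict."""
    merged: dict[tuple[str, str], int] = {}
    for feed in vendor_feeds:
        for row in feed:
            key = (canonical_sku(row["sku"]), row["store"])
            merged[key] = int(row["units"])
    return merged


# Two hypothetical systems that spell the same SKU differently.
erp_feed = [{"sku": "00ab-123", "store": "S1", "units": "4"}]
pos_feed = [
    {"sku": "AB123", "store": "S1", "units": 3},
    {"sku": "ab_999", "store": "S2", "units": 7},
]
stock = reconcile(erp_feed, pos_feed)
```

Even in this toy, the interesting decisions are policy, not code: which system is the source of truth when the counts disagree, and how much normalization is safe before two different products collapse into one key.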

Where it pays off, part 2: the war story

The best illustration I have is the part of our product that decides how to move inventory — allocation and rebalancing.

It started almost trivially: get inventory from the warehouse to the stores. So I wrote what you'd expect. A greedy algorithm. Move the stock to where it's needed most, simplest rule first.
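
For reference, a greedy allocator of the kind described can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code I shipped; `greedy_allocate` and the store names are hypothetical.

```python
def greedy_allocate(warehouse_units: int, need: dict[str, int]) -> dict[str, int]:
    """Fill stores in order of descending need until the warehouse runs out."""
    allocation = {store: 0 for store in need}
    remaining = warehouse_units
    for store in sorted(need, key=lambda s: need[s], reverse=True):
        give = min(need[store], remaining)
        allocation[store] = give
        remaining -= give
    return allocation


plenty = greedy_allocate(100, {"A": 30, "B": 20, "C": 10})
scarce = greedy_allocate(35, {"A": 30, "B": 20, "C": 10})
```

With enough stock every store gets what it needs. With only 35 units, store A is filled completely, B gets the 5 left over, and C gets nothing, which is where the simple rule starts to look wrong.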

Then reality showed up — one conversation with the customer at a time.

What if inventory is starved? Then you can't just feed the hungriest store; you have to decide between giving it all to the best store or spreading it fairly. What about size coverage? A store should carry at least some minimum percentage of sizes or the assortment is useless. And then another rule. And another.

Greedy stopped working, so I moved to custom scoring — rank every option, weigh the trade-offs. That held for a while. Then it didn't. Every new constraint meant re-tuning the scoring, and the weights started fighting each other.
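
A toy Python version of the scoring stage shows how the weights start to fight. The options, sub-scores, and weights below are invented for illustration and are not from the real system.

```python
# Hypothetical options and sub-scores in [0, 1]; the numbers are made up.
OPTIONS = {
    "all_to_best_store": {"need": 1.0, "fairness": 0.2, "size_coverage": 0.5},
    "even_split": {"need": 0.6, "fairness": 1.0, "size_coverage": 0.4},
}


def score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of sub-scores: the 'custom scoring' stage."""
    return sum(weights[name] * sub_scores[name] for name in weights)


def best_option(weights: dict[str, float]) -> str:
    """Pick the option with the highest weighted score."""
    return max(OPTIONS, key=lambda name: score(OPTIONS[name], weights))


before = best_option({"need": 0.6, "fairness": 0.2, "size_coverage": 0.2})
after = best_option({"need": 0.3, "fairness": 0.5, "size_coverage": 0.2})
```

Raising the fairness weight to satisfy one rule flips the winner from `all_to_best_store` to `even_split`. Nothing in the weights says which rules are hard requirements, so every re-tune risks trading one violated rule for another.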

The moment I remember clearly is this: I'd fix one constraint, and another would fall straight out the window. Patch the size-coverage rule, break the fairness rule. Fix fairness, break something else. I was adding special case on top of special case, and the algorithm was becoming genuinely unmanageable.

That feeling — fix one thing, break another, complexity you can no longer hold in your head — is exactly the feeling embedded taught me to distrust. It's the signal that you're hand-rolling a worse version of a tool that already exists. I wasn't writing an allocation algorithm anymore. I was writing a bad, undocumented constraint solver.

So I stopped patching and switched approaches entirely — away from hand-rolled heuristics, toward a declarative model where you express the rules rather than code the logic that enforces them.

The difference is the whole point. With heuristics, every rule is imperative — you write the logic, and the rules tangle and fight each other. With a declarative approach, you state the rules — minimum size coverage, fairness under scarcity, warehouse limits — and let the system find an assignment that satisfies all of them at once. A new merchant requirement stops being an algorithm rewrite and becomes just one more rule.

# Illustrative — not our actual code

# Before: imperative heuristics that fight each other
score = w1*need + w2*fairness + w3*size_coverage - ...
# every new rule means re-tuning the weights and hoping

# After: declarative rules the system reconciles
require  allocation[store] <= demand[store]
require  size_coverage[store] >= MIN_COVERAGE
require  fair_split  when inventory is scarce
# add a rule -> add a line, not a rewrite
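
For readers who want something they can run, here is a small Python rendering of the same idea. It is a sketch under loud assumptions: the data is a toy, the fairness rule is applied unconditionally for brevity, and the "solver" is a brute-force search that I am swapping in so the example needs no dependencies. A real system would hand the rules to a proper constraint or optimization solver.

```python
from itertools import product
from typing import Callable, Optional

Allocation = dict[str, int]
Rule = Callable[[Allocation], bool]

# Hypothetical toy data: units requested per store, and units in the warehouse.
DEMAND = {"A": 3, "B": 2, "C": 2}
WAREHOUSE_UNITS = 4
MIN_UNITS_PER_STORE = 1  # crude stand-in for a minimum size-coverage rule

# Each rule is a statement about a finished allocation, not a step in an algorithm.
RULES: list[Rule] = [
    lambda a: all(a[s] <= DEMAND[s] for s in a),                  # never exceed demand
    lambda a: sum(a.values()) <= WAREHOUSE_UNITS,                 # warehouse limit
    lambda a: all(v >= MIN_UNITS_PER_STORE for v in a.values()),  # coverage floor
    lambda a: max(a.values()) - min(a.values()) <= 1,             # fair split
]


def solve(stores: list[str], rules: list[Rule], max_units: int) -> Optional[Allocation]:
    """Brute-force search: return a feasible allocation that ships the most units."""
    best: Optional[Allocation] = None
    for units in product(range(max_units + 1), repeat=len(stores)):
        candidate = dict(zip(stores, units))
        if all(rule(candidate) for rule in rules):
            if best is None or sum(candidate.values()) > sum(best.values()):
                best = candidate
    return best


plan = solve(list(DEMAND), RULES, WAREHOUSE_UNITS)

# A new merchant requirement is one more rule, not a rewrite of solve().
plan_with_new_rule = solve(
    list(DEMAND), RULES + [lambda a: a["A"] >= 2], WAREHOUSE_UNITS
)
```

The search itself never changes. Adding the "store A gets at least 2" requirement only appends a line to the rule list, and if the rules ever contradict each other `solve` returns `None` instead of quietly breaking one of them.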

The lesson isn't about any one technique. It's the embedded lesson, again: know when to stop being clever and reach for the right tool. The same instinct that makes you choose a boring, maintainable library over a clever hack is the one that tells you a pile of special cases has outgrown its paradigm.

The throughline

Embedded didn't teach me machine learning. It taught me something that turned out to matter more: assume failure, respect constraints, prize maintainability, and recognize when the thing you're building has outgrown the approach you're using.

The internet is overflowing with AI demos. What it's short on is people who've watched a system fall over in production and learned to design for that moment first. If you came to AI from systems, or telecom, or logistics, or anywhere with real stakes and real failure — you have an advantage you might be underrating. The model is the easy part. Everything around it is the engineering, and that part you already know.

I'm writing more about building AI that survives production — multi-agent systems, integrating with legacy enterprise data, and earning trust from expert users. If that's your world too, let's compare scars.

This post was originally published on my blog.