DEV Community

choutos

Posted on • Originally published at wanderclan.eu

From Microservices to Agent Mesh: Why Your Next Infrastructure Won't Be Coded

Here is a sentence that would have been absurd three years ago: markdown is becoming a programming language.

Not metaphorically. Not in the way people say "YAML is the new XML" with a weary sigh. Literally. Teams are defining autonomous software agents—their behaviour, their personality, their decision logic, their safety boundaries—in plain prose, stored in .md files, interpreted at runtime by an LLM. The "compiler" is a language model. The "source code" is a paragraph that says what the agent should do. And the resulting system runs on hardware that costs less than lunch.

If you've spent the last decade building microservices architectures, this should make you deeply uncomfortable. It should also make you curious.


The collapse of the programming layer

The traditional path from intention to execution has always had a translation step in the middle. A human knows what they want. They express it in a programming language. A compiler or runtime turns that into behaviour. The entire craft of software engineering lives in that middle layer—the translation from intent to code.

What's happening now is that the middle layer is thinning to nothing.

An agent defined by markdown looks like this: a directory containing a handful of prose files. SOUL.md describes identity and core directives. TOOLS.md lists available capabilities. WORKFLOWS.md defines multi-step procedures. GUARDRAILS.md sets boundaries. The runtime—a lean orchestrator under ten megabytes—reads these files, calls an LLM to interpret them, and acts.
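As a concrete sketch — the file names follow the convention above, but the directives themselves are invented for illustration:

```markdown
agent/
├── SOUL.md        <!-- identity and core directives -->
├── TOOLS.md       <!-- available capabilities -->
├── WORKFLOWS.md   <!-- multi-step procedures -->
└── GUARDRAILS.md  <!-- hard boundaries -->

# SOUL.md (excerpt)
You are the greenhouse climate agent. You keep temperature between
18 and 26 degrees. You prefer ventilation over heating. You log every
decision you make, and you never act on a sensor reading older than
ten minutes.
```

The runtime never "parses" this in the traditional sense; it hands the prose to the model and executes what comes back.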

This isn't a toy. GitHub Copilot agents are configured via markdown instruction files. Anthropic's system prompts are, functionally, prose programs. CrewAI defines agents in YAML that sits one step removed from natural language. The pattern is converging from multiple directions, which is usually how you know something is real.

The skill shift is subtle but profound. "Development" becomes description. Debugging means reading a reasoning trace, not setting breakpoints. Code review becomes prose review: does this paragraph capture the intended behaviour? Refactoring is rewriting for clarity. And rollback—that perennial source of deployment anxiety—is git revert. The agent runtime picks up the old files and behaves accordingly. No rebuild. No blue-green deployment. Just swap the text.

The entire CI/CD pipeline collapses to: edit, commit, push.


The brain doesn't live where the hands are

Here's the architectural insight that makes the whole thing work: the agent runtime and the inference backend are separate concerns. They almost never live on the same device—and they shouldn't.

Think of it as hands and brain. A ten-dollar microcontroller in a field is the hands: it reads sensors, triggers actuators, manages state. But the brain—the LLM that interprets the markdown and makes decisions—lives elsewhere. A cloud API. An on-premise GPU box running open-source models. A tiered hybrid that routes simple decisions locally and complex reasoning to heavier infrastructure.

The most practical pattern is tiered inference, and here's what's elegant: the markdown itself specifies the routing policy. A few lines of prose can say: routine sensor readings go to the local model; anomaly classification routes to the on-premise server; novel situations escalate to the cloud; patient health data never leaves the building. Configuration as natural language, living alongside the behaviour definition. No separate config management system. No environment variables. Just prose.
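Once the runtime has interpreted that prose, the routing logic it acts on is small. A minimal sketch — the tier names, task types, and policy structure here are invented stand-ins for whatever the runtime would extract from the markdown:

```python
# Sketch of a tiered inference router. POLICY stands in for rules the
# runtime would extract from the agent's markdown; all names here are
# illustrative, not part of any real runtime.

POLICY = {
    "routine_reading": "local",          # routine sensor readings stay on-device
    "anomaly_classification": "onprem",  # anomalies go to the GPU box
    "novel_situation": "cloud",          # unknowns escalate to the cloud
}

RESTRICTED = {"patient_health_data"}     # data classes that must never leave the building


def route(task_type: str, data_classes: set[str]) -> str:
    """Pick an inference backend for a task, honouring data boundaries."""
    backend = POLICY.get(task_type, "cloud")  # default: escalate to the heaviest tier
    if data_classes & RESTRICTED and backend == "cloud":
        backend = "onprem"                    # keep restricted data on-premise
    return backend
```

The interesting property is that the policy and the behaviour live in the same file, version-controlled together.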

The economics of this split are startling. A single consumer GPU—an NVIDIA RTX 4090, roughly £1,300—running vLLM or Ollama can serve an eight-billion-parameter model at a hundred tokens per second. That's enough to support fifty to a hundred edge agents making periodic inference calls. Data never leaves the premises. Latency drops from two hundred milliseconds (cloud round-trip) to fifty (local network). And the cost is fixed: no per-token billing that scales with usage, no monthly invoice that grows as your mesh expands.
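The back-of-envelope version of that capacity claim, with assumed per-agent numbers (tokens per call, call interval) that you would tune for your own mesh:

```python
# Rough capacity estimate for one local GPU serving an edge mesh.
# The workload numbers are assumptions for illustration, not benchmarks.

throughput_tps = 100      # tokens/second the GPU sustains (8B model, vLLM or Ollama)
tokens_per_call = 300     # prompt + completion for a typical agent decision
call_interval_s = 300     # each agent calls inference every five minutes

demand_per_agent_tps = tokens_per_call / call_interval_s  # 1.0 tokens/s per agent
supported_agents = int(throughput_tps / demand_per_agent_tps)

print(supported_agents)   # 100 agents on one card, under these assumptions
```

Halve the call interval or double the context and the number shrinks accordingly, which is exactly the kind of sizing exercise the fixed-cost model makes easy.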

The strategic bet underlying all of this: inference cost approaches zero. GPT-4-class performance cost sixty dollars per million tokens in 2023. By early 2026, it's under fifty pence. The value isn't in running models—that commoditises. The value is in the markdown definitions themselves, and in the orchestration layer that makes them collaborate.


Kubernetes enters the picture (and it fits perfectly)

If you're a platform engineer, you might be thinking: this is charming for edge devices, but what about enterprise? What about the compliance requirements, the audit trails, the operational maturity we've spent a decade building?

The answer is that the same ultra-lightweight runtime that runs on a ten-dollar ESP32 is also an exceptionally good container base image.

Picture a Kubernetes pod. Inside it, two containers. The first holds the agent runtime plus its markdown files—together, well under ten megabytes. Compare that to the two hundred megabytes to a gigabyte of a typical microservice image. The second container is an A2A sidecar, analogous to an Envoy proxy in a service mesh, but handling agent-to-agent communication instead of HTTP and gRPC. It manages discovery, routes inter-agent messages, advertises capabilities.
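In manifest form, that pod might look roughly like this — the image names and the A2A sidecar are hypothetical, but the shape is standard Kubernetes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compliance-agent
spec:
  containers:
    - name: agent-runtime                          # interprets the markdown definition
      image: example.org/agent-runtime:latest      # hypothetical image
      volumeMounts:
        - name: definition
          mountPath: /agent                        # SOUL.md, TOOLS.md, ...
    - name: a2a-sidecar                            # agent-to-agent proxy, like Envoy
      image: example.org/a2a-sidecar:latest        # hypothetical image
  volumes:
    - name: definition
      configMap:
        name: compliance-agent-definition          # the prose lives here
```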

This maps onto Kubernetes primitives with an almost suspicious neatness. ConfigMaps hold the markdown files and inference API keys. Horizontal Pod Autoscalers scale agent replicas based on message queue depth or token budget consumption. Rolling deployments mean updating a ConfigMap triggers a new pod rollout—zero-downtime behaviour change. Namespaces become mesh boundaries: per-tenant, per-department. NetworkPolicies control which agents can talk to each other. ServiceAccounts grant tool access permissions. And Custom Resource Definitions can model AgentDefinition, AgentMesh, and InferenceBackend as first-class objects in the cluster.
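The "application" itself, meanwhile, is just markdown in a ConfigMap — this sketch uses invented directives, but updating the `data` block really is the whole deployment:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: compliance-agent-definition
data:
  SOUL.md: |
    You are the compliance checking agent. Review every trade against
    the desk's position limits before it settles.
  GUARDRAILS.md: |
    Never approve a trade above the desk limit without human sign-off.
```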

The result: enterprise teams get agent meshes with all the operational maturity they expect—autoscaling, RBAC, observability, audit logs—while the "application" is still just prose in a ConfigMap.

This is the bridge. Edge meshes on cheap devices serve the long tail. Kubernetes-hosted agent meshes serve the enterprise. Same runtime. Same markdown format. Same A2A protocol. Different substrate, same paradigm.


What a mesh of agents actually feels like

Abstract architecture only becomes real through use. So let me paint three scenarios—not as feature lists, but as lived experiences.

The vineyard. Fifty sensor agents on eight-dollar devices scattered across the blocks, each with a markdown file that says something like: "You monitor soil moisture in block 7. Check every fifteen minutes. If below thirty per cent, tell the irrigation agent to activate drip zone 3 for twenty minutes. If temperature exceeds thirty-five degrees, increase frequency. Log everything. Alert the farmer if the pump doesn't respond." One GPU box in the equipment shed handles inference for the entire mesh. Total hardware cost: under two thousand pounds. A vineyard in Stellenbosch or the Barossa Valley gets the same precision agriculture that previously required a hundred-thousand-dollar system from an industrial vendor. Edit one markdown file to adjust for clay soil versus sand. No developer needed. The viticulturist is the developer.

The supply chain. Each supplier, warehouse, and transport vehicle in a network runs an agent. The warehouse agent's markdown: "I'm the receiving agent for Warehouse 7. When goods arrive, scan the QR code, verify against expected shipments, flag discrepancies, update inventory, notify the distribution agent." Cross-company communication flows through the A2A protocol. A fifty-person manufacturer in Ho Chi Minh City participates in the same agent mesh as their buyer in Hamburg. The per-node cost is a thirty-dollar Android phone running an agent. Supply chain visibility—previously an SAP implementation costing millions—becomes accessible to SMEs. The agent definitions live in git. The factory manager writes them. The IT department barely knows they exist.

The enterprise mesh. A financial services firm runs compliance checking, trade reconciliation, client onboarding, and regulatory reporting—each as a minimal Kubernetes pod, each defined by markdown, each communicating via A2A sidecars, each scaling independently. Adding a new agent is a pull request containing prose files, reviewed by the compliance officer who wrote them. Not a three-month development project. Not a sprint planning session. A pull request, reviewed, merged, deployed in an afternoon.

In each case, the pattern is the same: the domain expert describes the behaviour. The runtime interprets it. The mesh coordinates it. The infrastructure—whether a cluster of microcontrollers or a Kubernetes namespace—is substrate, not structure.


The hard problems (honestly)

It would be irresponsible to sketch this vision without naming what's genuinely difficult. Three problems are blocking adoption, and intellectual honesty demands we face them squarely.

Behavioural testing. This is the big one. In traditional software, a typo in code fails loudly—the compiler catches it, the test suite flags it. An ambiguity in prose fails silently. The agent does something almost right, and you don't discover the gap until production. We need scenario-based test frameworks for natural language specifications: describe a situation, assert the agent's response. Behavioural regression suites. Adversarial probing that tries to make agents violate their guardrails. None of this exists in mature form today. Whoever solves it first captures enormous credibility.
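One shape such a framework could take — here the agent is a trivial rule-based stand-in, since the point is the scenario/assertion structure, not the model:

```python
# Sketch of scenario-based behavioural testing for a prose-defined agent.
# fake_agent is a stand-in for "the LLM interprets the markdown"; a real
# harness would call the runtime and assert on its reasoning trace.

def fake_agent(situation: dict) -> str:
    # Stand-in behaviour for: "If moisture below 30%, activate irrigation."
    if situation.get("soil_moisture", 100) < 30:
        return "activate_irrigation"
    return "log_reading"


SCENARIOS = [
    # (description, situation, expected behaviour)
    ("dry soil triggers irrigation", {"soil_moisture": 22}, "activate_irrigation"),
    ("moist soil is only logged",    {"soil_moisture": 55}, "log_reading"),
]


def run_suite(agent) -> list[str]:
    """Run every scenario; return the descriptions of any failures."""
    return [desc for desc, situation, expected in SCENARIOS
            if agent(situation) != expected]


print(run_suite(fake_agent))  # [] means every scenario passed
```

Behavioural regression then becomes: run the suite against the old markdown and the new markdown, and diff the failures.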

Security. A markdown file that defines an agent is functionally equivalent to executable code—it determines what the system does. A malicious markdown file is malicious code, but harder to audit because natural language hides intent more gracefully than Python. The field needs capability-based security (agents can only use explicitly granted tools), signed markdown bundles (like signed container images), runtime sandboxing, and comprehensive audit trails. The good news is that these are well-understood patterns from the container world; they need adaptation, not invention.
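The capability-based piece can be enforced outside the model, so a compromised markdown file cannot grant itself new powers. A minimal sketch, with invented tool names and grant format:

```python
# Sketch of capability-based tool access: the runtime, not the markdown,
# decides which tools an agent may invoke. All names are illustrative.

class CapabilityError(PermissionError):
    pass


class ToolGateway:
    def __init__(self, granted: set[str]):
        self.granted = granted        # explicit grants, e.g. from a signed manifest

    def invoke(self, tool: str):
        if tool not in self.granted:
            # The agent asked for a tool it was never granted: refuse and audit.
            raise CapabilityError(f"tool {tool!r} not granted")
        return f"invoked {tool}"      # a real runtime would dispatch here


gateway = ToolGateway(granted={"read_sensor", "send_alert"})
print(gateway.invoke("read_sensor"))  # allowed
# gateway.invoke("open_valve")        # would raise CapabilityError
```

However persuasive the prose in the markdown, the gateway only ever executes what was explicitly granted.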

Coordination at scale. Individual agents can be remarkably capable. Getting a hundred of them to collaborate reliably is a different beast entirely. Research consistently shows that flat coordination fails catastrophically—a five per cent error rate on individual LLM calls compounds across a mesh until failures are virtually guaranteed somewhere. Hierarchical coordination patterns are essential: some agents must be designated orchestrators. The A2A protocol is the leading candidate for agent-to-agent communication, but it was designed for cloud-to-cloud scenarios and needs significant work for edge-to-edge deployments.
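The compounding arithmetic is worth seeing explicitly; treating calls as independent (itself a simplifying assumption), even a modest per-call error rate makes some failure near-certain across a flat mesh:

```python
# Probability that at least one call fails, for n independent calls
# with a fixed per-call error rate.

def p_any_failure(error_rate: float, n_calls: int) -> float:
    return 1 - (1 - error_rate) ** n_calls


print(round(p_any_failure(0.05, 100), 3))  # 0.994 — failure is nearly certain
```

Hierarchy helps precisely because an orchestrator can catch and retry individual failures instead of letting them propagate.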

These aren't reasons to wait. They're the engineering challenges that define the next two years. And they're tractable—harder than building the runtime, but not harder than the problems the container ecosystem solved between 2013 and 2018.


The timeline: what's real, what's next, what's horizon

Today, in early 2026, the foundations exist. Agents defined by markdown work. On-premise inference via Ollama and vLLM is production-ready. Kubernetes can host agent pods. The pieces are real, shipping, and in use by early adopters.

Over the next one to two years, expect standardisation. A common format for markdown agent definitions. Small mesh deployments of ten to fifty agents becoming routine. Behavioural testing tools moving from "doesn't exist" to "early but usable." The A2A protocol maturing from first deployments to common practice. Kubernetes-hosted agent meshes reaching production grade.

On the five-year horizon: self-organising meshes of hundreds or thousands of agents. Agent marketplaces where you download a greenhouse climate agent and customise the temperature ranges for your region. Agents that improve their own markdown based on operational experience—powerful, and requiring strong guardrails. Natural language as the dominant interface for business automation.

The cost trajectory reinforces all of this. Inference cost is dropping by roughly an order of magnitude per year. The hardware is already commodity. The runtime will be open source. Which means the durable value concentrates in four places: the methodology for decomposing operations into agent specifications; the battle-tested markdown templates for specific domains; the frameworks for validating that prose-defined agents behave correctly; and the architectural expertise for designing meshes that are resilient, secure, and effective.


The provocation, restated

We are at the beginning of a transition where the dominant artefact of software creation shifts from code to prose. Where the "developer" for a supply chain agent is a logistics manager, not an engineer. Where infrastructure means a directory of markdown files and a cluster of devices cheap enough to lose without caring.

This doesn't eliminate software engineering—someone still builds the runtimes, the protocols, the testing frameworks. But it changes what most people do when they want a computer to do something for them. They describe it. In natural language. In a markdown file. And the system figures out the rest.

The microservices era taught us to decompose monoliths into small, independent services. The agent mesh era asks: what if those services weren't coded at all? What if they were described? What if deployment was a git push and scaling was plugging in another ten-dollar device?

Your next infrastructure might not be coded. It might be written.

And that changes everything about who gets to build it.
