DEV Community: Anzal Ansari

Determinism has left the chat

Anzal Ansari — Wed, 08 Jul 2026 09:15:16 +0000

Six ways agents rewrote the contract between you and your software

Agents have changed software as we know it. Here is something that was never true before: two users can send the identical message to the same product and get different work done, at different cost, over different durations, with different results. This is not a bug. It is the new contract between the user and the product.

I build with agents every day, and increasingly through them. What I keep noticing is that this isn't one change, it's six separate ones, and most teams still treat them as one thing ("we added AI"). Taken apart:

1. The request–response cycle has entirely changed

A request used to be a straight line. The user clicks, the server computes, the response renders. Every step was designed ahead of time by an engineer who knew exactly what would happen.

A request now involves a chain of decisions made at runtime: which model to use, what context to include, what to pull from cache, which tools to call, and when to stop and ask. Each decision trades output quality against cost and latency, and it is made per request by software, not by an engineer at design time.

The engineering implication is that you're no longer building endpoints; you're building a decision policy. The product implication is the one at the top of this essay: software no longer promises to be deterministic.
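To make "decision policy" concrete, here is a minimal sketch. The request shape, the two model tiers, and the vagueness threshold are all hypothetical, chosen only to show that the routing happens per request at runtime.

```typescript
// Hypothetical request and plan shapes; none of these names come from a real SDK.
interface AgentRequest {
  text: string;
  needsReasoning: boolean; // assumed to be set by an upstream classifier
  cachedPrefixTokens: number; // tokens of context already sitting in a prompt cache
}

interface Plan {
  model: "small" | "large";
  useCache: boolean;
  askUserFirst: boolean;
}

// One function standing in for the chain of runtime decisions:
// which model, whether to use the cache, and whether to stop and ask.
function decide(req: AgentRequest): Plan {
  const tooVague = req.text.trim().split(/\s+/).length < 3; // arbitrary threshold
  return {
    model: req.needsReasoning ? "large" : "small",
    useCache: req.cachedPrefixTokens > 0,
    askUserFirst: tooVague,
  };
}

const summaryPlan = decide({
  text: "Summarise this changelog for the release notes",
  needsReasoning: false,
  cachedPrefixTokens: 1200,
});
const vaguePlan = decide({ text: "fix it", needsReasoning: true, cachedPrefixTokens: 0 });
console.log(summaryPlan, vaguePlan);
```

A real policy would weigh many more signals, but the shape is the same: the plan is data computed for each request, not a code path fixed at design time.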

2. Performance metrics have changed

Performance work used to mean shaving milliseconds off time-to-interactive. An agent task takes seconds, sometimes minutes, and users are fine with it — on one condition: they have to be able to see that real progress is happening.

This is the reason we have quick, eager acknowledgements ("On it, reading your files now"). It's the reason answers stream token by token instead of arriving finished. And it's why the modern equivalent of the loading animation narrates what it's doing, sometimes inventing new words while it works. The narration is what buys the patience.

So the metric has moved: from time-to-interactive to time-to-first-token, and from raw latency to something softer: whether the user trusts that the work is underway. A ninety-second task that narrates itself feels faster than a nine-second task behind a frozen screen.
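If you log a few timestamps per task, the new metrics fall out of simple subtraction. The sketch below assumes a hypothetical event log with four event kinds, sorted by time; the numbers are invented for illustration.

```typescript
// Hypothetical event log for one agent task; timestamps are in milliseconds
// and the array is assumed to be sorted by time.
interface TaskEvent {
  kind: "request" | "ack" | "token" | "done";
  atMs: number;
}

// Time from the request to the first event of the given kind, or null if it never happened.
function timeTo(events: TaskEvent[], kind: TaskEvent["kind"]): number | null {
  const start = events.filter((e) => e.kind === "request")[0];
  const first = events.filter((e) => e.kind === kind)[0];
  if (!start || !first) return null;
  return first.atMs - start.atMs;
}

const events: TaskEvent[] = [
  { kind: "request", atMs: 0 },
  { kind: "ack", atMs: 150 },
  { kind: "token", atMs: 900 },
  { kind: "token", atMs: 950 },
  { kind: "done", atMs: 90000 },
];

const timeToAck = timeTo(events, "ack"); // 150
const timeToFirstToken = timeTo(events, "token"); // 900
const totalLatency = timeTo(events, "done"); // 90000
console.log({ timeToAck, timeToFirstToken, totalLatency });
```

The point of tracking all three is that the first two are what the user feels, while the third is what the old dashboards measured.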

3. Cost has become a design decision

Cost is immensely important now, and it's in the product's hands. Depending on how you handle the input context, what you cache, and more importantly, whether you choose the right model for each task, the cost incurred in serving one request can vary by 3 to 10x, or even more.

That variance used to be an infrastructure concern — engineers optimised for latency and let finance worry about servers. Now every feature has a per-request cost line, and margin is something you design. Route the summary to a small model and the reasoning to a large one, cache the system prompt, trim the context, and you've changed the unit economics of the product without touching a pixel.

If you're building on top of models, you should be able to say what one request costs you. If you can't, the pricing strategy is a guess.
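A per-request cost line can be as simple as the arithmetic below. The prices are made-up placeholders, not any provider's rates, and the token counts are illustrative; swap in your own numbers.

```typescript
// Hypothetical prices in USD per million tokens. Placeholder numbers only.
interface ModelPrice {
  inputPerM: number;
  cachedInputPerM: number;
  outputPerM: number;
}

const PRICES: { small: ModelPrice; large: ModelPrice } = {
  small: { inputPerM: 0.5, cachedInputPerM: 0.05, outputPerM: 2 },
  large: { inputPerM: 5, cachedInputPerM: 0.5, outputPerM: 20 },
};

interface Usage {
  inputTokens: number; // uncached input
  cachedTokens: number; // input served from a prompt cache
  outputTokens: number;
}

function requestCostUsd(price: ModelPrice, u: Usage): number {
  const total =
    u.inputTokens * price.inputPerM +
    u.cachedTokens * price.cachedInputPerM +
    u.outputTokens * price.outputPerM;
  return total / 1000000;
}

// Same task, two designs: everything on the large model with no caching,
// versus the small model with the system prompt cached and the context trimmed.
const naive = requestCostUsd(PRICES.large, { inputTokens: 20000, cachedTokens: 0, outputTokens: 1000 });
const tuned = requestCostUsd(PRICES.small, { inputTokens: 4000, cachedTokens: 6000, outputTokens: 1000 });
console.log(naive.toFixed(4), tuned.toFixed(4));
```

The gap between the two results depends entirely on the placeholder prices, so treat the ratio as an illustration of the mechanism rather than a benchmark.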

4. The UI surface has changed

Increasingly, the standard is that most of the work happens through a chat UI. And the chat UI has grown a few more elements: a way to add additional context, and a canvas in which the output is displayed. Together these give the user one unified surface: the same place where you ask for the work, scope it, watch it happen, and receive the result.

This changes what interface design even is. The interesting design work has moved from "where do we put the button" to "what does the agent show, ask, and decide."

5. Mastery has become transferable

Traditional products had a different type of mastery curve. The UI/UX changed from product to product, and knowing Photoshop didn't mean knowing 3ds Max. That curve was also the moat — years of accumulated fluency kept professionals loyal.

With the chat UI, the learning transfers. If you know how to use Claude Code well, you likely know how to use Codex well, or you learn it very quickly — because the skill was never the tool. The skill is delegation: specifying what you want, scoping it, checking the result, and knowing when to step in.

That's great for users and uncomfortable for incumbents. UI lock-in is dissolving, and the moat is moving from "they'd have to relearn everything" to "we remember everything about them" — that is, from interface to memory and context.

6. The security model has changed

Traditional software could only do what its endpoints allowed. An agent with tools can do anything its tools allow — read files, call APIs, spend money. So the old questions ("is this endpoint safe?") have been joined by a new one: what is this agent permitted to do on my behalf? Scoping the tools is the new least-privilege.

The inputs have become adversarial too. With prompt injection, the attack doesn't have to come through your network edge: a webpage, an email, a README that the agent reads can carry instructions aimed at the agent itself. The trust boundary has moved from the network edge to the context window, and that is a genuinely new place for it to be.

And the user-facing side of security has changed with it. The confirmation dialog has grown into permission modes, plan modes, and "ask before running": how much trust to extend is now a setting the user chooses. Budget caps are the new rate limits, because an agent spends money, not just compute. The audit log has become a product surface: users read the agent's action history the way admins used to read server logs.
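Scoped tools, "ask before running", a budget cap, and an audit log can all live in one small gate in front of every tool call. This is a sketch under assumptions: the permission shape, the tool names, and the cost estimates are hypothetical.

```typescript
// Hypothetical permission record for one agent session.
interface AgentPermissions {
  allowedTools: string[];
  askBefore: string[]; // tools that need explicit user confirmation
  budgetUsd: number;
}

interface ToolCall {
  tool: string;
  estimatedCostUsd: number;
}

type Verdict = "allow" | "ask" | "deny";

interface AuditEntry {
  tool: string;
  verdict: Verdict;
  reason: string;
}

function makeGuard(perms: AgentPermissions) {
  let spentUsd = 0;
  const auditLog: AuditEntry[] = [];

  function check(call: ToolCall): Verdict {
    let verdict: Verdict = "allow";
    let reason = "in scope";
    if (perms.allowedTools.indexOf(call.tool) === -1) {
      verdict = "deny";
      reason = "tool not in scope";
    } else if (spentUsd + call.estimatedCostUsd > perms.budgetUsd) {
      verdict = "deny";
      reason = "budget cap reached";
    } else if (perms.askBefore.indexOf(call.tool) !== -1) {
      verdict = "ask";
      reason = "needs confirmation";
    }
    if (verdict === "allow") spentUsd += call.estimatedCostUsd; // only allowed calls spend
    auditLog.push({ tool: call.tool, verdict, reason }); // every decision is recorded
    return verdict;
  }

  return { check, auditLog };
}

const guard = makeGuard({
  allowedTools: ["read_file", "send_email"],
  askBefore: ["send_email"],
  budgetUsd: 1,
});
const verdicts = [
  guard.check({ tool: "read_file", estimatedCostUsd: 0.4 }),
  guard.check({ tool: "send_email", estimatedCostUsd: 0.1 }),
  guard.check({ tool: "delete_repo", estimatedCostUsd: 0 }),
  guard.check({ tool: "read_file", estimatedCostUsd: 0.7 }),
];
console.log(verdicts, guard.auditLog);
```

Note that this gate is application-level code; it complements OS-level isolation rather than replacing it.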

What to do about it

If you're building product in this world:

Design the wait. Acknowledge the request instantly, stream the output, and narrate progress while the work runs.
Measure new things. Time-to-first-token, cost per task, and intervention rate, meaning how often the user has to step in.
Treat cost as a feature. Model routing and caching are product decisions. Put them on the roadmap.
Assume your users arrive pre-trained. They learned delegation somewhere else. Don't make them learn your dialect of it.
Scope the tools and cap the spend. Least-privilege applies to agents more than it ever applied to endpoints, and the budget is part of the permission.

The losing move, visible across the industry right now, is bolting a chat box onto an existing product and calling it agentic. That is what happens when these six changes are treated as one.

Each of these is worth talking about more — every section here could be its own piece. But that's for later.

Thank you for reading!

Zero to Agent Swarm, Part 2: A Team of Agents

Anzal Ansari — Sun, 22 Mar 2026 10:21:56 +0000

This is the second part of the Zero-to-Agent-Swarm tutorial.
In the first part, we went from zero to a working AI agent. If you want to check that out, it’s here:

← Part 1: Birth and Upgrades — building a single agent from scratch.

This part is about going from one agent to an agent swarm — making agents work together for us.

We’ll need a new mental model. In Part 1, the model was about what an agent is: Triggers + loop(LLM + Tools + Memory). Now we need to think about what agents do when there are many of them. We need to consider them (I hate to say this) more like humans. Not because AI is sentient; it is not, by a wide margin. But because we’ve built computers to resemble human functions, and we have centuries of experience making human systems productive. We can steal a lot of that for agent systems.

Here’s the progression:

Specialization — same code, different agents
Delegation — agents calling agents
Workspace — shared state for coordination
Project execution — structured plans with parallel execution

1. Specialization — Same code, different agent

Adding to the model: the Genome that defines each agent.

One agent is useful. But real work often needs specialists — a researcher, a coder, a reviewer — each with their own tools, memory, and responsibilities. A single agent can context-switch between roles, but it loses focus. Dedicated agents stay sharp — and they can work in parallel.

What makes one agent different from another?

Its Thinking (which model, what system prompt)
Its Memory (what it knows)
Its Tools (what it can do)
Its Triggers (what wakes it up)
Its Container (what it can see)

Package these together into a config — the agent’s genome — and from one codebase you can spin up as many specialized agents as you need.
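As a rough illustration, a genome can be a plain typed config with one field per trait listed above. The field names, model names, and paths below are hypothetical and are not taken from the tutorial's actual code.

```typescript
// A hypothetical genome shape: one field per trait (Thinking, Memory, Tools, Triggers, Container).
interface Genome {
  name: string;
  thinking: { model: string; systemPrompt: string };
  memoryFiles: string[];
  tools: string[];
  triggers: string[];
  container: { workdir: string; readOnly: boolean };
}

interface Agent {
  genome: Genome;
  canUse(tool: string): boolean;
}

// One factory, many agents: the code is shared, only the genome differs.
function spawn(genome: Genome): Agent {
  return {
    genome,
    canUse: (tool) => genome.tools.indexOf(tool) !== -1,
  };
}

const researcher = spawn({
  name: "researcher",
  thinking: { model: "large-model", systemPrompt: "You research and report findings." },
  memoryFiles: ["identity.md", "notes.md"],
  tools: ["web_search", "reply"],
  triggers: ["message"],
  container: { workdir: "/workspace/research", readOnly: true },
});

const coder = spawn({
  name: "coder",
  thinking: { model: "small-model", systemPrompt: "You write and run code." },
  memoryFiles: ["identity.md", "notes.md"],
  tools: ["bash", "list_files", "reply"],
  triggers: ["message", "file_change"],
  container: { workdir: "/workspace/code", readOnly: false },
});

console.log(researcher.canUse("bash"), coder.canUse("bash"));
```

Keeping the genome as data means adding a new specialist is a config change, not a code change.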

Explanation · Code · Skill

But spinning up specialists isn’t enough — right now they’re isolated. Each one works alone. How do we get them to collaborate?

2. Delegation — Agents calling agents

Adding to the model: ask_agent, the simplest possible multi-agent pattern.

The first step towards a team is asking for help when needed. If the coder needs documentation, why write it itself when there’s a writer agent that can do it better and cheaper?

The simplest way: let an agent call another agent as a tool.

User → Researcher: "Get the weather in Toronto and have someone write a summary"
         ├── weather: checks Toronto weather
         ├── ask_agent("writer", "summarize the weather in Toronto")
         │     └── Writer runs → returns summary
         └── delivers: "Toronto weather summary"

The implementation is straightforward: we add a new tool ask_agent. One agent’s loop runs inside another agent’s loop — the same pattern from Part 1, just nested.
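The nesting is easier to see with the LLM taken out. In this sketch each agent is a plain function standing in for its loop, the weather tool returns canned text, and the registry and tool names are assumptions made for illustration.

```typescript
// Stub agents: each "loop" is a plain function so the nesting is visible.
// A real agent would call an LLM here; these handlers are stand-ins.
interface Toolbox {
  weather(city: string): string;
  askAgent(name: string, task: string): string;
}

type AgentFn = (task: string, tools: Toolbox) => string;

const registry: { [name: string]: AgentFn } = {
  writer: (task) => "Summary: " + task.replace("summarize: ", ""),
  researcher: (task, tools) => {
    const raw = tools.weather("Toronto");
    // Delegation is just another tool call from inside the researcher's loop.
    return tools.askAgent("writer", "summarize: " + raw);
  },
};

const callLog: string[] = [];

const toolbox: Toolbox = {
  weather: (city) => {
    callLog.push("weather");
    return city + " is 4C and cloudy"; // canned data, not a real lookup
  },
  askAgent: (name, task) => {
    callLog.push("ask_agent:" + name);
    return runAgent(name, task); // the callee's loop runs inside the caller's
  },
};

function runAgent(name: string, task: string): string {
  const agent = registry[name];
  if (!agent) return "error: unknown agent " + name;
  return agent(task, toolbox);
}

const answer = runAgent("researcher", "Get the weather in Toronto and have someone write a summary");
console.log(answer, callLog);
```

Because an unknown agent name comes back as an error string rather than an exception, the calling agent can observe the failure and adapt, the same way it would for any other failed tool call.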

Explanation · Code · Skill

This works, but it has a bottleneck: all information flows through the delegator. The researcher becomes a middleman, passing data between agents it doesn’t need to understand. What if agents could share data directly?

3. Workspace — Shared state for coordination

Adding to the model: a Global Workspace where agents coordinate through shared tasks and artifacts.

A global workspace solves the middleman problem. It’s a shared directory on disk with two coordination primitives: Tasks — a shared to-do list where the manager posts work and specialists claim it, and Artifacts — a key-value store where agents drop research findings, drafts, or anything another agent might need. We also appoint a manager agent — the bridge between the user and the specialists.
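The two primitives can be sketched in a few lines. This version keeps everything in memory so it runs anywhere; a version backed by a shared directory would persist the same data as files. Every name here is illustrative.

```typescript
// In-memory stand-in for the shared directory. A disk-backed version would
// store tasks and artifacts as files instead of in these two variables.
interface Task {
  id: number;
  description: string;
  role: string; // which kind of specialist should pick it up
  status: "open" | "claimed" | "done";
  claimedBy?: string;
}

function createWorkspace() {
  const tasks: Task[] = [];
  const artifacts: { [key: string]: string } = {};

  return {
    postTask(description: string, role: string): Task {
      const task: Task = { id: tasks.length + 1, description, role, status: "open" };
      tasks.push(task);
      return task;
    },
    // A specialist claims the first open task for its role, or gets null.
    claimTask(agent: string, role: string): Task | null {
      for (const t of tasks) {
        if (t.status === "open" && t.role === role) {
          t.status = "claimed";
          t.claimedBy = agent;
          return t;
        }
      }
      return null;
    },
    completeTask(id: number): void {
      for (const t of tasks) if (t.id === id) t.status = "done";
    },
    putArtifact(key: string, value: string): void {
      artifacts[key] = value;
    },
    getArtifact(key: string): string | undefined {
      return artifacts[key];
    },
    allDone(): boolean {
      return tasks.every((t) => t.status === "done");
    },
  };
}

const ws = createWorkspace();
ws.postTask("Research Toronto weather", "researcher"); // the manager posts work
ws.postTask("Write a summary from the research", "writer");

const research = ws.claimTask("researcher-1", "researcher");
if (research) {
  ws.putArtifact("toronto-weather", "placeholder findings");
  ws.completeTask(research.id);
}
const draft = ws.claimTask("writer-1", "writer");
const sharedContext = ws.getArtifact("toronto-weather"); // read directly, no middleman
console.log(draft, sharedContext, ws.allDone());
```

This sketch runs in a single process, so claiming is trivially safe. With several agent processes sharing a directory, the claim step would need to be atomic, which is not handled here.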

The manager loop

A manager agent drives the whole thing. Its identity is simple: break the goal into tasks, delegate each to a specialist, check progress, repeat until done. The manager never does the work itself — it orchestrates.

Without the workspace, the manager has to micromanage — relaying data between agents like a middleman. With the workspace, agents self-serve: the manager says "check the workspace for open tasks" and each specialist claims work, reads artifacts for context, does the job, and marks it done. The manager doesn't relay data — it just points agents at the workspace and checks progress.

This is the difference between a manager who dictates every detail and one who says "the work's on the board — go."

Explanation · Code · Skill

Checkpoint: We now have a manager agent that breaks work into tasks, delegates to specialists who coordinate through a shared workspace, and loops until everything is done. That's a working swarm. But everything still runs one task at a time — even when tasks are independent.

3.5 Intermission — A web UI

Before we tackle that, let's make the swarm easier to watch. Reading JSON files and terminal output gets old. A web dashboard gives you a live window into everything at once — tasks on a kanban board, agents chatting, artifacts appearing, log events streaming in.

Explanation · Code · Skill

4. Project execution — From flat task lists to DAGs

Adding to the model: a Task Tree that captures dependencies, enabling parallel + serial execution.

The workspace gives us coordination, but the execution is flat. The manager posts tasks one-by-one, checks progress in a loop, and everything runs serially — even when tasks have nothing to do with each other. Real projects have structure: some things must happen in order, others can happen at the same time.

A DAG (Directed Acyclic Graph) captures what actually matters: which tasks depend on which. Everything else can run simultaneously. The manager thinks in task trees — nested groups marked as sequential or parallel — and the runtime flattens them into a dependency graph, executing waves of unblocked tasks with Promise.all. Dependent tasks automatically inherit the results of their prerequisites, so agents never duplicate work.

The difference: a flat plan says "check Toronto, then London, then compare" (serial — slow). A DAG says "check both cities at the same time, then compare" (parallel where possible — fast). The DAG finds the fastest path through the work.
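Here is a minimal sketch of the flattening step, using a hypothetical task-tree shape. It stops at computing the waves; in the runtime described above, each wave would be handed to Promise.all.

```typescript
// Hypothetical task-tree shape: leaves are task ids, groups are sequential or parallel.
type TaskTree = string | { mode: "sequential" | "parallel"; children: TaskTree[] };

interface Deps {
  [task: string]: string[];
}

// Flatten a tree into a dependency map. Returns the "exit" tasks of the subtree,
// meaning the tasks that anything scheduled after it must wait for.
function flatten(tree: TaskTree, waitFor: string[], deps: Deps): string[] {
  if (typeof tree === "string") {
    deps[tree] = waitFor;
    return [tree];
  }
  if (tree.mode === "sequential") {
    let prev = waitFor;
    for (const child of tree.children) prev = flatten(child, prev, deps);
    return prev;
  }
  let exits: string[] = [];
  for (const child of tree.children) exits = exits.concat(flatten(child, waitFor, deps));
  return exits.length > 0 ? exits : waitFor;
}

// Group tasks into waves: each wave holds every task whose prerequisites are done.
function toWaves(deps: Deps): string[][] {
  const done: { [task: string]: boolean } = {};
  let remaining = Object.keys(deps);
  const waves: string[][] = [];
  while (remaining.length > 0) {
    const ready = remaining.filter((t) => deps[t].every((d) => done[d] === true));
    if (ready.length === 0) throw new Error("cycle or missing dependency");
    for (const t of ready) done[t] = true;
    remaining = remaining.filter((t) => !done[t]);
    waves.push(ready);
  }
  return waves;
}

const plan: TaskTree = {
  mode: "sequential",
  children: [{ mode: "parallel", children: ["check-toronto", "check-london"] }, "compare"],
};
const deps: Deps = {};
flatten(plan, [], deps);
const waves = toWaves(deps);
console.log(waves); // [["check-toronto", "check-london"], ["compare"]]
```

The tree is what the manager reasons about; the dependency map is what the scheduler needs. Passing prerequisite results into dependent tasks is left out of this sketch.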

Explanation · Code · Skill

Hurray! We now have a manager agent that decomposes goals into structured task trees, executes them as DAGs with maximum parallelism, passes context between dependent tasks, and visualizes the whole thing in a live web dashboard.

There are many more concepts to explore — reliability, error recovery, human-in-the-loop, cost control, evaluation — and I'm planning to publish them as smaller, standalone pieces alongside this series.

Thanks for reading! Follow me for more first-principles breakdowns of modern AI systems.

Zero to Agent Swarm: A hands-on guide to building AI agents from scratch

Anzal Ansari — Wed, 11 Mar 2026 13:13:34 +0000

Is this for you?

This tutorial is for engineers who already know how to build software but want to understand the agent ecosystem. The code is in TypeScript / Node.js — familiarity helps but isn't required. We'll use Docker in Phase 2, and you'll need an LLM API key (the default is Gemini, but any provider works — just swap the LLM call).

The goal isn't to walk you through a codebase. It's to give you the thinking tools to design agent architectures, features, and systems — so that by the end, you understand how a single agent works and how multiple agents coordinate.

The mental model

Here is the model we'll build toward, one piece at a time:

Agent = Triggers → Loop(Thinking + Tools + Memory), inside a Container

Triggers are what start the loop. A user message is the obvious one, but triggers can also be a schedule, a webhook, a file appearing in a directory, or another agent handing off a task.

Thinking is where the LLM lives. On each iteration, the agent looks at its goal, what it knows so far, and what just happened — then decides what to do next. This includes perceiving the input (understanding what arrived before reasoning about it) and assembling context — the system prompt, identity, memory, conversation history, and latest observation that get sent to the LLM.

Tools are how the agent acts on the world. In this model, the agent has no default output channel — it can't print, respond, or signal completion without a tool. Text generation is just internal reasoning until a tool carries it somewhere. That's a deliberate design choice: it forces you to think explicitly about every action the agent can take, because nothing happens implicitly. The set of tools you give an agent is its interface with the world.

Memory is what persists across iterations. Without it, each thinking step starts from scratch. With it, the agent can accumulate facts, track progress, and avoid repeating itself. Memory can live in the context window, in a file, or in a database — depending on how long it needs to last.

The container is the environment the agent runs in. It's easy to overlook, but it matters: it defines what tools are available, what the agent can access, and — critically — what it can't damage. A well-designed container is what makes autonomy safe enough to actually grant.

A note on loops: The formula shows one loop — but real applications are rarely that flat. In practice, loops nest. A single agent might run an inner loop to complete a subtask, while sitting inside a larger loop that coordinates multiple agents, manages retries, or waits for external triggers. An agent swarm is really just loops containing loops, with handoffs between them.

The roadmap

We build in three phases:

Phase	Goal	What you'll have
1. Birth	Build a single agent from scratch	A local assistant that can explore your filesystem
2. Upgrades	Make it powerful and safe	Memory, a Docker container, bash, autonomy
3. Swarm (coming soon)	Run multiple agents together	Specialized agents coordinating on tasks

Let's build one.

Phase 1: Birth of an Agent

We start even simpler than an LLM call — a plain input/output loop with no intelligence at all — and build up step by step until we have something that genuinely qualifies as an agent.

1. Make it talk. A Channel = 1 Trigger + 1 Tool

Adding to the model: the first Trigger, the first Tool, and therefore the first Channel.

A message arriving is a Trigger. A reply going out is a Tool — not in the API "tool use" sense, but in the first-principles sense: it's a capability the agent uses to act on the world. Together, a Trigger and a Tool form a Channel: something that listens and something that speaks back.

We start with the simplest possible version: a REPL. You type something, it prints it back. No LLM, no logic. Just the Channel: input in, output out.

This is the scaffold everything else will hang on.
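In code, the Channel is little more than two functions. The sketch below uses an in-memory driver so it runs without a terminal; that driver is an assumption of this example, and a REPL would wire the same two functions to stdin and stdout.

```typescript
// A Channel pairs one Trigger (something arrives) with one Tool (something goes out).
type Handler = (message: string, reply: (text: string) => void) => void;

interface Channel {
  receive(message: string): void; // Trigger: called when input arrives
  sent: string[]; // everything the reply Tool has emitted so far
}

function createChannel(handler: Handler): Channel {
  const sent: string[] = [];
  const reply = (text: string): void => {
    sent.push(text);
  };
  return {
    sent,
    receive: (message) => handler(message, reply),
  };
}

// Step 1 has no LLM and no logic: the handler just echoes the input back.
const echo = createChannel((message, reply) => reply(message));
echo.receive("hello");
echo.receive("anyone there?");
console.log(echo.sent);
```

Every later step replaces only the handler. The Trigger and the reply Tool stay where they are.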

Explanation · Code · Skill

2. Make it think.

Adding to the model: Thinking.

Now we wire in the LLM — the Thinking layer. The input still comes in through the same Channel, but instead of echoing it back, we send it to the model and return what comes out.

Think of it like the association cortex — it takes input and transforms it. Tokens in, tokens out. The conversation history acts as working memory: the agent remembers what was said in this session, but nothing beyond it.

At this stage we have a Channel + Thinking — a traditional chatbot. It can reason and respond, but it has no persistent Memory and no Tools beyond replying.

Explanation · Code · Skill

3. Give it a choice. A second tool.

Adding to the model: more Tools.

Replying to the user is already a Tool — the first one. Now we add a second. This is what gives Thinking a choice: based on the input, the LLM decides whether to reply directly or invoke the other Tool. If it doesn't invoke anything, the default action fires — reply to the user.

For this step we'll use a list_files Tool — it lists the contents of a directory. It's a good first Tool because it's read-only and relatively safe. The agent can look around but can't break anything.

Explanation · Code · Skill

4. Give it a decision loop

Adding to the model: the Loop that binds Thinking, Tools, and Memory.

Right now the agent thinks once and acts once. Without a loop, it shoots in the dark — it uses a Tool and stops. It doesn't check whether the action worked. It doesn't know if the task is done. It doesn't report back. It just stops.

The Loop is the engine at the center of the model. A Trigger fires, and the Loop takes over: think, act, observe the result, think again. Tools act on the world from inside the loop — every iteration can produce side effects. The loop exits when Thinking decides the task is done and replies to the user. If a tool call fails, the agent sees the error and adapts — retry, try something else, or give up and explain why.

This is what separates a chatbot from an agent. The Loop turns Thinking + Tools + Memory from a one-shot into a sustained process. Memory is what gives the loop continuity — without it, each iteration would be blind to what the agent just tried.

Stopping conditions. The loop needs two exit mechanisms: the LLM decides the task is done (produces a final response instead of a tool call), and a hard cap on iterations (MAX_STEPS) so a confused agent can't loop forever.

Trigger
   │
   ▼
Think
   │
   ▼
Choose Tool
   │
   ▼
Execute Tool
   │
   ▼
Observe Result
   │
   ▼
Done? (or max steps?)
 ├─ yes → respond
 └─ no  → Think again
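The same loop as a sketch in code, with a scripted function standing in for the LLM so it runs without an API key. The Decision shape, the tool table, and the canned directory listing are illustrative assumptions.

```typescript
// What Thinking can return on each iteration: call a tool, or finish with a reply.
type Decision =
  | { type: "tool"; name: string; arg: string }
  | { type: "final"; text: string };

type Think = (history: string[]) => Decision;

const toolTable: { [name: string]: (arg: string) => string } = {
  // Canned listing so the example needs no filesystem access.
  list_files: (dir) => (dir === "." ? "README.md, src" : "error: no such directory"),
};

const MAX_STEPS = 5;

function runLoop(trigger: string, think: Think): string {
  const history: string[] = ["user: " + trigger]; // working memory
  for (let step = 0; step < MAX_STEPS; step++) {
    const decision = think(history);
    if (decision.type === "final") return decision.text; // exit 1: the model says it is done
    const tool = toolTable[decision.name];
    const result = tool ? tool(decision.arg) : "error: unknown tool " + decision.name;
    history.push("tool " + decision.name + "(" + decision.arg + "): " + result); // observe
  }
  return "Stopped after " + MAX_STEPS + " steps without finishing."; // exit 2: hard cap
}

// Scripted model: tries a bad path, sees the error, retries, then answers.
const scripted: Think = (history) => {
  const last = history[history.length - 1];
  if (history.length === 1) return { type: "tool", name: "list_files", arg: "/nope" };
  if (last.indexOf("error") !== -1) return { type: "tool", name: "list_files", arg: "." };
  return { type: "final", text: "Found: " + last.split(": ")[1] };
};

// A confused model that never finishes, to show the cap doing its job.
const stuck: Think = () => ({ type: "tool", name: "list_files", arg: "/nope" });

console.log(runLoop("what files are here?", scripted)); // Found: README.md, src
console.log(runLoop("what files are here?", stuck)); // Stopped after 5 steps without finishing.
```

Tool errors are pushed into the history as ordinary observations, which is what lets the next Thinking step react to them.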

Explanation · Code · Skill

Checkpoint: We now have Triggers → Loop(Thinking + Tools + working memory). That's a working agent.

Phase 2: Upgrades

Now that we have a basic agent, we'll fill in the rest of the model: upgrade Memory from ephemeral to persistent, build the Container, give the Loop more powerful Tools, then add more Triggers so it can act on its own.

1. Better memory

Upgrading the model: persistent Memory.

The Loop already has working memory — the conversation history that accumulates as the agent thinks and acts. But it's ephemeral. Once the session ends, it's gone. Apart from the model's built-in knowledge and whatever is in the system prompt, the agent has nothing to draw on next time it wakes up.

We upgrade Memory with persistence in two ways:

Always loaded — files that get injected into every session automatically:

identity.md — who the agent is, how it behaves. Human-curated. Stable.
notes.md — what the agent has learned across past sessions. Agent-curated. Grows over time.

Retrieval-based — when memory grows too large to load in full, the agent queries it instead. Embeddings and vector search let it pull only what's relevant to the current task. Out of scope for this tutorial, but the mechanism is straightforward: more Tools that query an embedding store.
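The always-loaded half can be sketched as a pure function that assembles the context for a session. File contents are passed in as strings so the example needs no file I/O; the section labels and the note format are assumptions.

```typescript
// Contents of the two always-loaded files, already read by the caller.
interface MemoryFiles {
  identity: string; // identity.md: human-curated, stable
  notes: string; // notes.md: agent-curated, grows over time
}

// Always-loaded memory is placed ahead of the conversation in every session.
function buildContext(memory: MemoryFiles, history: string[]): string {
  return [
    "# Identity",
    memory.identity,
    "# Notes from past sessions",
    memory.notes,
    "# Conversation",
    history.join("\n"),
  ].join("\n");
}

// The agent curates its own notes; here that is modelled as appending one bullet.
function appendNote(memory: MemoryFiles, note: string): MemoryFiles {
  const notes = memory.notes === "" ? "- " + note : memory.notes + "\n- " + note;
  return { identity: memory.identity, notes };
}

let agentMemory: MemoryFiles = { identity: "You are a careful filesystem assistant.", notes: "" };
agentMemory = appendNote(agentMemory, "User keeps projects under ~/code");
const sessionContext = buildContext(agentMemory, ["user: where are my projects?"]);
console.log(sessionContext);
```

Because everything in notes.md is loaded on every session, it has to stay small. That size limit is the pressure that eventually pushes older material into retrieval.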

Explanation · Code · Skill

2. Stronger containment

Adding to the model: the Container.

With more Tools comes more risk. As we give the agent more capabilities and more autonomy, mistakes get expensive. At the application level, we can restrict what the agent is allowed to do — but these controls are code, and code has bugs.

The safer approach is an OS-level Container — in our case, a Docker container. Here we define exactly what the world looks like for the agent: what filesystem it sees, what it can touch, what it can't. If the agent goes wrong and tries to delete everything it knows, your actual data stays safe.

We set up the Container now, before giving the agent more power. Safety first.

Explanation · Code · Skill

3. Bash access — real power, safely contained

Expanding the model: powerful Tools, safely contained.

Now that the Container is in place, we can safely give the agent real power.

Let's give it bash — the most versatile Tool there is. The agent can now run the code it writes, do git operations, install packages, and — if we allow it — modify its own codebase.

Explanation · Code · Skill

4. File watcher + clock — the agent wakes itself

Expanding the model: more Triggers for autonomy.

Currently, the agent wakes up when you message it, does its work, saves to Memory, and goes back to sleep. Your message is the only Trigger.

What if you want more? We add two new Triggers: a file watcher that fires when something changes in the workspace, and a clock that fires on a schedule. Both feed into the same Loop — the agent doesn't care which Trigger woke it up.
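One way to keep the Loop indifferent to its wake-up source is to normalise every Trigger into a first observation. The trigger shapes and field names below are hypothetical, and the sketch records the observation instead of starting a real loop.

```typescript
// Three hypothetical trigger shapes feeding one entry point.
type Trigger =
  | { source: "message"; text: string }
  | { source: "file_change"; path: string }
  | { source: "clock"; schedule: string };

// Each Trigger becomes the first observation of a loop run.
// After this point the loop does not care where the wake-up came from.
function toObservation(t: Trigger): string {
  switch (t.source) {
    case "message":
      return "User said: " + t.text;
    case "file_change":
      return "File changed: " + t.path;
    case "clock":
      return "Scheduled wake-up: " + t.schedule;
  }
}

const runs: string[] = [];

function wake(t: Trigger): void {
  runs.push(toObservation(t)); // stand-in for starting the agent loop
}

wake({ source: "message", text: "tidy the workspace" });
wake({ source: "file_change", path: "workspace/inbox/report.csv" });
wake({ source: "clock", schedule: "every day at 09:00" });
console.log(runs);
```

Adding a new Trigger later, such as a webhook, means adding one variant and one case; the Loop itself does not change.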

Explanation · Code · Skill

Checkpoint: Our agent now has all the pieces — Triggers → Loop(Thinking + Tools + Memory), inside a Container. Time to multiply it.

Voilà, we now have a functional agent. The core building blocks are in place. From here you can extend it in many directions: add new channels like WhatsApp or Slack, give it more tools, introduce new triggers, or spin up additional agents that coordinate with it.

Phase 3: A Party of Agents (coming soon)

One agent is useful. But real work often needs specialists — a researcher, a coder, a reviewer — each with their own Tools, Memory, and responsibilities. A single agent can context-switch between roles, but it loses focus. Dedicated agents stay sharp — and they can work in parallel.

What makes one agent different from another? Its Thinking (which model, what system prompt), its Memory (what it knows), its Tools (what it can do), its Triggers (what wakes it up), and its Container (what it can see). Package these together into a config — the agent's genome — and from one codebase you can spin up as many specialized agents as you need.

Phase 3 will cover:

Agent genome — Package the model into a config. Same code, different capabilities.
State Management - Shared memory, context pruning, and thread persistence.
Agent teams — Coordination patterns: serial handoffs, parallel work, shared context.
Routing and orchestration — An outer agent that reads a task, picks the right specialist, and manages the workflow.

Thanks for reading! Follow me for the next part and more first-principles breakdowns of modern AI systems.