최해일

Posted on Jun 29 • Originally published at github.com

The Loadout Pattern: Handing the Wheel to an Autonomous LLM

#agents #ai #architecture #llm

The Loadout Pattern: A Structure I Found While Building With an AI

This isn't a post teaching you the One True Way.
Over a few days of building cron-driven autonomous routines with an AI, my code kept
converging on the same shape. It had no name, so I had to re-explain it every time — and
eventually we settled on calling it the Loadout Pattern.
This post writes that discovery down: naming it, defining it, and showing how to build it.

Honest disclaimer: I don't actually know whether this already exists somewhere under another name. The adjacent pieces — cron waking an agent, a model actively choosing its own tools, separating mission from mechanics — are scattered all over the place. What I couldn't find while building was one post that ties those three into a single named thing. So I named the structure I arrived at. If a prior name exists, treat this as the same mountain climbed by a different trail.

Audience: engineers building agentic/automation systems. There's code, and there's a bit of philosophy — because the philosophy is what makes the code shaped the way it is.

Two words, kept distinct (the whole post hinges on this):
A toolbox (or catalog) is every tool you own: the whole armory.
A loadout is the curated subset a routine equips for one mission — what it actually suits up with. The entire MCP server is a toolbox; a loadout is the handful of tools one routine is handed at wake.

The moment it clicked

It started small. The AI and I wrote one routine that wakes on a cron, digests overnight news, and posts a briefing. Then a queue-watcher. Then a ledger-reconciler. Somewhere in there I noticed I was re-typing the same curl, the same DB query, the same hardcoded IDs into every routine's prompt. Change one notification URL and I'd edit five places.

So I lifted the mechanics out of the prompts — and the same shape was left behind in every routine: a mission (the judgment) and a list of tool names the mission was allowed to use. That's it. At wake, the model received that list, equipped itself, and decided what to do. It was exactly like picking your weapons before a mission in a game. I started calling that equipped subset a loadout, and the whole structure built around it the Loadout Pattern.

The core idea

Conventional automation executes a procedure — code runs a fixed sequence of steps and decides nothing; same input, same path, every time. The loadout pattern keeps the steps but moves the deciding to the model. At each step the brain — an autonomous LLM — judges: what matters, which tool to reach for, whether to act at all. It's handed a purpose and the latitude to pursue it, and it drives — choosing its own tools as it goes. Code executes; the brain decides. Those tools come as a loadout — a curated, self-describing set drawn from a shared toolbox — and the brain is observed at the interface it calls, not by the side effects it leaves behind. The model is the driver; your system is the suit it wears. Everything below is how to build that.

Most LLM integrations bolt a model into your code. This is about the opposite: letting the model drive your system, equipping itself at wake with a loadout (the curated, self-describing set of tools declared for the mission) and choosing among those tools on its own initiative. The system stops being the program that calls an LLM and becomes the suit the LLM wears.

How this relates to prior work

This didn't come from a vacuum. A few neighboring trails I ran into while building, named honestly:

Routing cron through the agent (feeding scheduled triggers onto the same message bus the agent already consumes) is a known structure. The Loadout Pattern isn't about that "waking" half — it's about what the woken brain wears and sees.
There's research on letting a model actively request tools on demand rather than being pre-injected with all of them. Loadout solves that same "don't drown the model in tools" problem differently: with per-mission pre-curation — it suits up with the right subset at wake instead of discovering at runtime.

In short, I didn't invent a new part. I named one consistent way of assembling the parts.

The usual way, and the constraint hiding in it

In a typical LLM integration the model lives inside your process. Your code calls it:

answer = agent.invoke({"input": "What changed in the market overnight?"})

This is great for human-triggered work: a person asks, the system fetches and answers. The human is the caller; nothing happens until they show up. The LLM is a component — a function your program calls and pays per token to use.

This post is about the other mode: the LLM doing the work on its own initiative. A routine wakes on a schedule and gets on with it — digesting overnight news every hour, posting a morning briefing, watching a queue, reconciling a ledger. No one asked; the routine is its own caller. Wake the model on a cron — say, a headless Claude Code session every hour — and it is no longer a component inside your program. It's outside, periodically taking the wheel and deciding what to do.
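A minimal sketch of that wake-up, assuming a hypothetical `wake.sh` wrapper and a model CLI that accepts its prompt on stdin. `BRAIN_CMD`, `wake_routine`, and the paths are illustrative names, not part of the original setup, and the `claude -p` value in the crontab comment is an assumption about how a headless session would be launched. The default brain here is `cat`, so the wrapper can be tried without a model.

```shell
#!/usr/bin/env bash
# wake.sh (hypothetical): deliver a skill file to the model and step aside.
# Example crontab entry, hourly at minute 0 (paths are placeholders):
#   0 * * * * BRAIN_CMD='claude -p' /opt/routines/wake.sh /opt/routines/skills/news-digest.md
set -eu

# Placeholder brain: `cat` only echoes the prompt, so this runs offline.
BRAIN_CMD="${BRAIN_CMD:-cat}"

wake_routine() {
  skill_file="$1"
  [ -r "$skill_file" ] || { echo "wake: cannot read skill '$skill_file'" >&2; return 1; }
  # The wrapper decides nothing: it only hands the mission text to the model.
  # $BRAIN_CMD is unquoted on purpose so a value like 'claude -p' splits into command + flag.
  $BRAIN_CMD < "$skill_file"
}

# Demo with a throwaway skill file.
demo_skill="$(mktemp)"
printf '## Mission\nSummarize overnight headlines.\n' > "$demo_skill"
wake_output="$(wake_routine "$demo_skill")"
echo "$wake_output"
rm -f "$demo_skill"
```

In a real `wake.sh` the demo lines would be replaced by `wake_routine "$1"`. Everything after that point is the model's call, not the script's.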

The line that matters isn't human-vs-cron — and it isn't even steps-or-no-steps. It's executing versus deciding: a script runs its steps and decides nothing, while the brain — even when it follows steps — decides at each one, choosing its own tools toward the goal.

That inversion changes what your system should be.

The metaphor: you, the brain, and the suit

Three layers, and it matters which is which:

You are Tony Stark — the owner. You delegate intent and occasionally override.
The LLM is JARVIS — the brain. An AI that operates the suit autonomously on your behalf: it judges, acts, and reports, and you don't micromanage it. (Throughout this post, "the brain" means exactly this — the LLM that drives.)
Your system is the suit — sensors, memory, tools, power.

Here's the leverage that falls out of this: you don't hand-author JARVIS's intelligence. It comes from the model — and it improves when you swap in a better model, not when you write more code. What you build is the suit — what the brain can sense, remember, and do. So the central question of the whole system becomes: how do we equip the brain well — give it the right loadout — and let it reach for the right tool at the right moment?

The problem: skills that tangle mission with mechanics

When you first wire a cron-woken routine, you write a prompt ("skill") that mixes two very different things: the mission (what to judge, the actual work) and the mechanics (raw curl, database queries, hardcoded IDs). A real before-state:

# news-digest skill (before)
1. Query Mongo for new headlines since the watermark:
   docker exec db mongosh app --eval 'db.news.find({publishedAt:{$gt: ...}})...'
2. Decide which are new stories vs updates vs noise.  <- the actual mission
3. Post the briefing:
   curl -X POST http://localhost:9000/notify -d '{"type":"SIGNAL", ...}'
   Then create a Notion page: data_source_id "<your-notion-data-source>", icon, ...

Two problems compound. First, the mission (step 2 — judgment) is drowned in plumbing. Second, every other routine that needs to "post a notification" re-describes that same curl in its own prompt. Change the notification URL and you edit five skills. The mechanics are copy-pasted prose.

The pattern: a per-mission loadout + mission-only skills

Split the system along the seam between interface and implementation.

1. Tools are named capabilities: a stable name over a swappable implementation. Most are small, dumb, independent scripts, but the name is the only thing the brain depends on; what sits behind it is free to vary. Usually a tool wraps mechanics (a curl, a DB query, a stubbed no-op, a different backend tomorrow), but it can just as well hand off to another agent (a sub-brain with its own loadout) or trigger the next task in a pipeline. To the brain it's all the same: a name it can reach for. notify sends a notification; read_news reads news. They don't know about each other. Together, all of them are your toolbox (the catalog).

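#!/usr/bin/env bash
# notify.sh - send a notification (hides the URL/payload mechanics)
set -euo pipefail
# Self-description, one line: name|kind|description
[ "${1:-}" = "--describe" ] && { echo "notify|action|send a notification"; exit 0; }
[ "$#" -eq 3 ] || { echo "usage: notify.sh <type> <title> <message>" >&2; exit 2; }
TYPE="$1"; TITLE="$2"; MSG="$3"
# Build the JSON with jq so quotes and newlines in the arguments are escaped safely
payload="$(jq -n --arg t "$TYPE" --arg ti "$TITLE" --arg m "$MSG" '{type:$t,title:$ti,message:$m}')"
# -f makes curl exit non-zero on HTTP 4xx/5xx instead of reporting success
curl -fsS -X POST "${NOTIFY_URL:-http://localhost:9000/notify}" \
     -H 'Content-Type: application/json' -d "$payload"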

2. Tools describe themselves. One line, --describe, is the single source of truth for what the tool is. Not the skill, not a wiki — the tool.
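Because everything downstream trusts that one line, it may be worth checking its shape mechanically. The sketch below assumes the `name|kind|description` format that `notify.sh` prints, and a hypothetical `check_describe` helper. The rule that the name must match the file name is an added assumption, and the `read_news` stub with its `query` kind is invented only to show a failing case.

```shell
#!/usr/bin/env bash
# check_describe <tools-dir> (hypothetical): verify each tool's --describe line.
set -eu

check_describe() {
  dir="$1"; bad=0
  for tool in "$dir"/*.sh; do
    [ -f "$tool" ] || continue
    base="$(basename "$tool" .sh)"
    case "$base" in _*) continue ;; esac   # skip sourced helpers such as _log.sh
    line="$(bash "$tool" --describe 2>/dev/null || true)"
    # Split on '|' into three fields: name, kind, description.
    name="${line%%|*}"; rest="${line#*|}"
    kind="${rest%%|*}"; desc="${rest#*|}"
    if [ "$name" != "$base" ] || [ -z "$kind" ] || [ -z "$desc" ] \
       || [ "$rest" = "$line" ] || [ "$desc" = "$rest" ]; then
      echo "BAD  $base: '$line'"; bad=1
    else
      echo "OK   $base ($kind)"
    fi
  done
  return "$bad"
}

# Demo toolbox: one well-formed tool, one that forgot its description.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/notify.sh" <<'EOF'
[ "${1:-}" = "--describe" ] && { echo "notify|action|send a notification"; exit 0; }
EOF
cat > "$demo_dir/read_news.sh" <<'EOF'
[ "${1:-}" = "--describe" ] && { echo "read_news|query"; exit 0; }
EOF
lint_status=0
lint_report="$(check_describe "$demo_dir")" || lint_status=$?
echo "$lint_report"
rm -rf "$demo_dir"
```

The demo reports `notify` as OK and flags `read_news`, whose line has only two fields.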

3. A loadout assembler hands the brain its kit. Given a list of tool names (the loadout), it prints their self-descriptions. This is what the routine runs at wake — it downloads its loadout from the toolbox.

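#!/usr/bin/env bash
# loadout.sh <tool> [tool...] - print the self-descriptions of the named tools (the loadout)
set -euo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"   # tools live next to this script
echo "loadout for this mission"
for t in "$@"; do
  # Fail loudly on a misspelled name instead of printing an empty entry
  [ -f "$DIR/$t.sh" ] || { echo "loadout: unknown tool '$t'" >&2; exit 1; }
  # Each tool answers --describe with one line: name|kind|description
  IFS='|' read -r name kind desc <<<"$(bash "$DIR/$t.sh" --describe)"
  echo "  - $name ($kind): $desc"
done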

4. The skill becomes mission only. It states what to do and names its loadout. The descriptions arrive with the loadout, not re-written in the skill:

# news-digest skill (after)
## Loadout - download at start
bash tools/loadout.sh read_news write_story notify publish_notion

## Mission
Turn new headlines into a running ledger of stories: skip repeats, extend ongoing
stories, open new ones, ignore noise. Each morning, post a briefing from the ledger.

5. The brain thinks for itself — tools don't auto-chain. Keep notify and publish_notion separate; do not make "writing to Notion" secretly also send a notification. The moment you fuse two tools in the plumbing you've frozen a policy — you can no longer publish quietly, or notify without publishing. Leave the tools independent and let the brain reason about whether to call one, the other, or both. The thinking is the brain's job; the wiring must not pre-decide it.
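To make the difference concrete, here is a sketch with stand-in functions instead of real tools. The function bodies are placeholders that only print what they would do, and `publish_notion_fused` is a hypothetical name for the anti-pattern. The independent versions leave all three combinations open; the fused one removes a choice.

```shell
#!/usr/bin/env bash
# Stand-ins for two tools; each only prints what it would do.
set -eu

notify()         { echo "notify: $1"; }
publish_notion() { echo "publish: $1"; }

# Anti-pattern: publishing secretly notifies, so "publish quietly" no longer exists.
publish_notion_fused() { publish_notion "$1"; notify "$1"; }

# With independent tools, the caller (the brain) picks the combination per situation.
quiet="$(publish_notion 'Draft ledger')"                                  # publish only
alert="$(notify 'Queue is stuck')"                                        # notify only
both="$(publish_notion 'Morning briefing'; notify 'Morning briefing')"    # both, by choice
fused="$(publish_notion_fused 'Draft ledger')"                            # cannot stay quiet

printf 'quiet: %s\n' "$quiet"
printf 'fused: %s\n' "$fused"
```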

From the model's point of view, this is the whole win. When the routine wakes, it receives two cleanly separated things: a mission — what to accomplish and how to judge it — and a loadout — the named capabilities it is allowed to use. It never has to excavate the how (a URL, a query, an ID) out of the what; the mechanics are simply not in its field of view, leaving only the decision and the set of moves available to make it. The skill carries judgment (which changes often); the toolbox carries capability (stable, shared); a loadout is just the names a routine picks from it. A new routine lists tool names and gets their descriptions for free — change a URL and you edit one tool, not five prompts.

Observability: log the interface, not just the result

Side effects are not proof. A notification arriving does not establish that the model invoked the tool, and a tool whose implementation is a no-op stub produces no side effect at all even when the model used it correctly. Verifying behavior therefore means observing the interface — the moment a tool is called — separately from what its implementation did.

Each tool logs at that boundary:

# _log.sh (sourced by every tool)
tlog() {  # tlog <event> [detail]   event: INVOKED | OK | DRY | ERR
  printf '%s | %-12s | %-7s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(basename "$0" .sh)" "$1" "${*:2}" \
    >> "$LOG_FILE"
}

The log separates two questions that side effects conflate:

... | notify | INVOKED | SIGNAL | Morning briefing   # the interface was called
... | notify | OK      | SIGNAL HTTP 200             # the implementation sent it
... | notify | DRY     | SIGNAL                      # called, but did not send (DRY_RUN)

INVOKED records that the model used the tool, independent of any outcome; OK/DRY/ERR records what the implementation did. Because the model depends on the interface rather than the implementation, the same routine can run in a shadow mode — where notify only logs and never sends — with no change in the model's behavior. The boundary log is also the reliable way to audit a past run: it records what executed, not merely what the skill instructed.
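A sketch of how a side-effecting tool might honor that contract, under a few assumptions. It inlines a `tlog` like the one above so the block runs on its own, and passes the tool name explicitly instead of deriving it from `$0`. `send_notification` is a hypothetical stand-in for the real curl call, and `DRY_RUN=1` as the shadow-mode switch is an assumed spelling and value. The last step shows one way to audit the log by counting INVOKED lines per tool.

```shell
#!/usr/bin/env bash
set -eu

LOG_FILE="${LOG_FILE:-$(mktemp)}"   # demo default; a real setup would use a persistent file

# tlog <tool> <event> [detail]   event: INVOKED | OK | DRY | ERR
tlog() {
  tool="$1"; event="$2"; shift 2
  printf '%s | %-12s | %-7s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$tool" "$event" "$*" >> "$LOG_FILE"
}

# Stand-in for the real delivery (the curl call in notify.sh).
send_notification() { return 0; }

notify() {
  tlog notify INVOKED "$1 | $2"     # the interface was called, whatever happens next
  if [ "${DRY_RUN:-0}" = "1" ]; then
    tlog notify DRY "$1"            # shadow mode: record the call, skip the side effect
  elif send_notification "$@"; then
    tlog notify OK "$1"
  else
    tlog notify ERR "$1"; return 1
  fi
}

DRY_RUN=1; notify SIGNAL "Morning briefing"   # shadow run
DRY_RUN=0; notify SIGNAL "Queue drained"      # live run

# Audit: how many times was each tool invoked, regardless of outcome?
invoked_count="$(awk -F' *[|] *' '$3 == "INVOKED" { n[$2]++ } END { for (t in n) print t, n[t] }' "$LOG_FILE")"
echo "$invoked_count"
```

Both runs produce an INVOKED line, so the audit counts two calls even though only one of them sent anything.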

A pattern, and its limit

This is a pattern, not a framework — and that's its honest limit: nothing enforces it at runtime. There's no base class, no inversion of control, nothing that prevents a routine from ignoring its loadout or inlining raw mechanics again. Adherence is a matter of discipline — or, if you want a guardrail, a CI check that every skill carries a mission and a declared loadout. What you get in exchange is lightness: nothing to install, and incremental adoption — one tool, one routine at a time, in any stack that runs a script and a prompt.
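One possible shape for that guardrail, assuming skills are Markdown files laid out like the news-digest example (a `## Mission` heading plus a `bash tools/loadout.sh ...` line) and tools live as `<name>.sh` in one directory. `check_skill` and the directory layout are illustrative, not something the pattern prescribes.

```shell
#!/usr/bin/env bash
# check_skill <skill-file> <tools-dir> (hypothetical): fail if the skill lacks a
# mission, lacks a loadout line, or names a tool that is not in the toolbox.
set -eu

check_skill() {
  skill="$1"; tools_dir="$2"; rc=0
  grep -q '^## Mission' "$skill" || { echo "$skill: no '## Mission' section"; rc=1; }
  loadout_line="$(grep 'tools/loadout\.sh' "$skill" | head -n 1)"
  if [ -z "$loadout_line" ]; then
    echo "$skill: no loadout declared"; return 1
  fi
  # Everything after "loadout.sh" on that line is the list of tool names.
  for t in ${loadout_line#*loadout.sh}; do
    [ -f "$tools_dir/$t.sh" ] || { echo "$skill: unknown tool '$t'"; rc=1; }
  done
  return "$rc"
}

# Demo: one conforming skill, one that drops the mission and names a missing tool.
demo="$(mktemp -d)"; mkdir "$demo/tools"
touch "$demo/tools/read_news.sh" "$demo/tools/notify.sh"
printf '## Loadout\nbash tools/loadout.sh read_news notify\n\n## Mission\nPost a briefing.\n' > "$demo/good.md"
printf '## Loadout\nbash tools/loadout.sh read_news publish_notion\n' > "$demo/bad.md"

good_status=0; good_report="$(check_skill "$demo/good.md" "$demo/tools")" || good_status=$?
bad_status=0;  bad_report="$(check_skill "$demo/bad.md" "$demo/tools")"  || bad_status=$?
printf '%s\n' "$bad_report"
rm -rf "$demo"
```

A CI job could loop this over every skill file and fail the build on the first non-zero status. It checks declarations only; it cannot tell whether a mission quietly inlines raw mechanics.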

And one more honest note: this is a pattern I believe I found, not a validated standard. If you know a better name — or earlier prior art — I'd genuinely like to hear it.

Why it matters

Go back to the suit. You upgrade the brain by adopting a better model — that's not code you write, it's a model you swap in. Your day-to-day engineering goes into the equipment: what the brain can discover, reach for, and be observed using. And because the brain depends on interfaces — a loadout of named tools — the suit is model-agnostic: change the model and the same loadout still fits. A self-describing, observable loadout is precisely how the brain takes the wheel: it wakes, downloads the tools it's allowed, sees what it can do, and acts — and you can watch it do so at the interface, not by guessing from side effects. The system stops being a program that occasionally calls a model, and becomes a suit a capable model wears.

Recipe (TL;DR)

Tools = small scripts, one capability each, mechanics hidden — together, your toolbox. Add a --describe line and a boundary log (INVOKED + OK/DRY/ERR). Side-effecting tools support a DRY_RUN.
Loadout = an assembler that prints a routine's named tools' self-descriptions. The routine runs it at wake to download its loadout.
Skills = mission (judgment) + a loadout list. No mechanics. Let the brain compose tools; never auto-chain them in the wiring.

Runnable examples are in examples/.
