DEV Community: Dhrupo Nil

I gave 8 AI agents an island and watched a society emerge — wars, gossip, grudges, and peace

Dhrupo Nil — Sun, 14 Jun 2026 07:30:50 +0000

Tiny Civilization: what happens when AI agents have to live together

I grew up on Age of Empires, Sid Meier's Civilization, and Rise of Nations. The thing that hooked me was never the graphics — it was the systems. You set a few rules in motion and a whole world spills out of them: economies, rivalries, alliances, betrayals.

Years later I watched OpenAI's hide-and-seek multi-agent video (writeup), where agents that were only rewarded for hiding and seeking invented tools and counter-strategies nobody coded — ramps, box-surfing, fort-building. Emergent behavior from simple pressure. That broke something open for me.

So I asked a smaller question: forget winning a game — what if AI agents just had to live in a society together? Would they behave like us? Hold grudges? Gossip? Make peace because they're tired of fighting?

That became Tiny Civilization — a browser sim where 2–8 agents with distinct personalities live on a small island, gathering, building, trading, stealing, gossiping, holding grudges, making peace, and remembering it all across lives.

👉 Live demo — runs keyless in "instinct mode," or plug in a key for LLM minds.

The whole thing — every line — was built with Claude Code, using the Fable model, right before Fable retired. It felt fitting to send a storytelling model off by having it build a world full of little stories.

The problem: pure-LLM agents will bankrupt you and pure-utility agents are boring

The first design decision was the hardest. Two obvious options, both bad:

Call the LLM every tick. Every agent, every day, makes an API call. Beautiful, expressive — and it costs a fortune and crawls.
Pure utility AI (the classic RTS approach). Fast and free, but agents can't scheme, can't talk, can't surprise you. It's just min-maxing.

So I split the brain in two:

| Layer | Decides | Cadence | Cost |
| --- | --- | --- | --- |
| LLM mind | Strategy (`gather`/`build`/`trade`/`befriend`/`aggress`/`reconcile`/`defend`), per-neighbor stances, an inner thought, and all dialogue | ~every 15 sim-days | ~150 calls / 1,000 days |
| Utility engine | Each day's concrete action: eat, sleep, gather, steal, attack, gift, trade, make peace | every tick | free, local |

The LLM declares intent ("aggress against Kai, he raided my base"), and that intent biases the utility scores until the next LLM call, roughly 15 sim-days later. The body runs on instinct (hunger, energy, storms); the mind sets direction. This is the trick that makes it both affordable and alive.
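The split can be sketched as a utility scorer whose per-action scores are multiplied by a bias taken from the LLM's last declared strategy. This is a minimal illustration, not the project's code: the strategy and action names come from the table, while `AgentState`, `baseUtility`, `STRATEGY_BIAS`, and every number are assumptions.

```typescript
// Hypothetical types and numbers; only the strategy/action names come from the post.
type Strategy = "gather" | "build" | "trade" | "befriend" | "aggress" | "reconcile" | "defend";
type Action = "eat" | "sleep" | "gather" | "steal" | "attack" | "gift" | "trade" | "make_peace";

interface AgentState {
  hunger: number; // 0 (full) .. 1 (starving)
  energy: number; // 0 (exhausted) .. 1 (rested)
  strategy: Strategy; // last intent declared by the LLM mind
}

const ACTIONS: Action[] = ["eat", "sleep", "gather", "steal", "attack", "gift", "trade", "make_peace"];

// The "body": instinct scores computed locally on every tick, no API call.
function baseUtility(state: AgentState, action: Action): number {
  switch (action) {
    case "eat":
      return state.hunger;
    case "sleep":
      return 1 - state.energy;
    case "gather":
      return 0.4;
    default:
      return 0.2;
  }
}

// The "mind": each strategy multiplies the scores of the actions that serve it.
const STRATEGY_BIAS: Record<Strategy, Partial<Record<Action, number>>> = {
  gather: { gather: 2 },
  build: { gather: 1.5 },
  trade: { trade: 3 },
  befriend: { gift: 3 },
  aggress: { attack: 3, steal: 2 },
  reconcile: { make_peace: 3 },
  defend: { sleep: 1.2 },
};

function chooseAction(state: AgentState): Action {
  let best: Action = ACTIONS[0];
  let bestScore = -Infinity;
  for (const action of ACTIONS) {
    const bias = STRATEGY_BIAS[state.strategy][action] ?? 1; // unbiased actions keep their raw score
    const score = baseUtility(state, action) * bias;
    if (score > bestScore) {
      best = action;
      bestScore = score;
    }
  }
  return best;
}
```

With these made-up weights, a well-fed agent on the `aggress` strategy attacks, while a starving one still eats first: the strategy tilts the scores but does not override instinct.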

Memory across lives — where it got strange

When a run ends, each agent's life is distilled into memory lines:

"you won with score 200"
"Maya destroyed your home"
"you and Kai made peace after a feud"
"this life hardened you — you trust less now"

These lines are stored in localStorage, keyed by agent name, and injected into the next run's prompts. Agents start referencing past lives in dialogue, pre-emptively paying reparations to remembered enemies, trusting remembered allies, sometimes to their own ruin.
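A minimal sketch of that persistence step, assuming memories are kept as a JSON array of strings per agent. The key prefix, the line cap, and all function names here are hypothetical; the post only says the lines live in localStorage keyed by agent name.

```typescript
// Minimal subset of the Web Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical key prefix; the post only says memories are keyed by agent name.
const MEMORY_PREFIX = "tinyciv:memory:";

function loadMemories(store: KeyValueStore, agentName: string): string[] {
  const raw = store.getItem(MEMORY_PREFIX + agentName);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? parsed.filter((x): x is string => typeof x === "string") : [];
  } catch {
    return []; // corrupted entry: start this life with a clean slate
  }
}

// Append the lines distilled from the run that just ended, keeping only the newest maxLines.
function appendMemories(store: KeyValueStore, agentName: string, lines: string[], maxLines = 20): void {
  const merged = loadMemories(store, agentName).concat(lines).slice(-maxLines);
  store.setItem(MEMORY_PREFIX + agentName, JSON.stringify(merged));
}

// Text block to splice into the agent's prompt on the next run.
function memoryPromptSection(store: KeyValueStore, agentName: string): string {
  const lines = loadMemories(store, agentName);
  if (lines.length === 0) return "You have no memories of past lives.";
  return "Memories from past lives:\n" + lines.map((line) => "- " + line).join("\n");
}

// In-memory stand-in for window.localStorage, useful in tests and batch runs.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = {};
  return {
    getItem: (key) => (Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null),
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}
```

In a browser, `window.localStorage` can be passed wherever a `KeyValueStore` is expected, since it provides both methods.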

How I actually built and balanced it

This is the part I'm proudest of, and it's pure childhood-strategy-game energy: you can't balance a society by vibes. So the workflow was:

A pure, deterministic simulation core — zero DOM, zero AI. The same runTick powers the browser, the tests, and a batch runner.
A seeded experiment runner. Running `npm run experiment -- --runs 30 --days 1000 --seed 1` plays out 30 reproducible lifetimes and prints a win-rate/score table. Every balance change landed with a before/after table. (Example: a Hermit rebalance moved one agent from 0/30 wins to 9–11/30 without breaking the other archetypes.)
A 16-gate regression suite. The justification gate (no grievance → no violence), war burnout, reconciliation pricing, positive-sum trade, granary protection, homelessness-death, trait drift — each one locked behind a headless test so balance changes can't silently regress behavior.

Change a dial in constants.ts → run the experiment → read the table. That was the entire loop.
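The loop above only works if a run is a pure function of its seed. Here is a minimal sketch of that property, assuming all randomness is drawn from an injected generator. The `runTick` below is a toy stand-in for the real one, and `createRng`, `runLifetime`, and `runExperiment` are hypothetical names.

```typescript
// Park-Miller "minimal standard" generator: the same seed always yields the same sequence.
function createRng(seed: number): () => number {
  let state = (Math.abs(Math.floor(seed)) % 2147483646) + 1; // keep state in 1..2147483646
  return () => {
    state = (state * 48271) % 2147483647;
    return (state - 1) / 2147483646; // value in [0, 1)
  };
}

interface World {
  day: number;
  scores: number[];
}

// Toy stand-in for the real runTick: no DOM, no AI, and every random draw comes from rng.
function runTick(world: World, rng: () => number): World {
  return { day: world.day + 1, scores: world.scores.map((s) => s + Math.floor(rng() * 3)) };
}

// Plays one lifetime and returns the index of the winning agent.
function runLifetime(seed: number, days: number, agents: number): number {
  const rng = createRng(seed);
  const scores: number[] = [];
  for (let a = 0; a < agents; a++) scores.push(0);
  let world: World = { day: 0, scores };
  for (let d = 0; d < days; d++) world = runTick(world, rng);
  let winner = 0;
  for (let a = 1; a < agents; a++) {
    if (world.scores[a] > world.scores[winner]) winner = a;
  }
  return winner;
}

// Runs `runs` lifetimes with seeds seed, seed + 1, ... and tallies wins per agent.
function runExperiment(runs: number, days: number, seed: number, agents: number): number[] {
  const wins: number[] = [];
  for (let a = 0; a < agents; a++) wins.push(0);
  for (let r = 0; r < runs; r++) wins[runLifetime(seed + r, days, agents)] += 1;
  return wins;
}
```

Because nothing reads the clock or `Math.random`, two calls with the same arguments produce the same win table, which is what makes a before/after comparison meaningful.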

What emerged (none of this is scripted)

Running the same island over and over, with memory on, produced a coherent arc:

Massacres. Early on, the warrior just killed everyone. No deterrence existed.
Forever wars. I added a justification gate (violence needs a real grievance — theft, attack, trespass). That fixed unprovoked killing… but now wars never ended: 495 fruitless attacks across 1,500 days.
Diplomacy. Reconciliation + escalating reparations + war-weariness made endings inevitable. Attacks per 2,000-day run collapsed: 594 → 14 → 0.
The kleptocracy. With war capped, theft became the unpunished crime — 340 thefts/run. I fixed it the human way: granaries. Fortification, not punishment.
The golden age. A clean-slate run, no memories: zero attacks in 1,000 days, and the Warrior won by out-trading everyone (118 trades, 1 attack).
The fall. The very next run — now remembering that golden age — collapsed. Remembered trust lowered everyone's guard, which raised the payoff of betrayal. Scores dropped ~15%; every relationship ended negative. Peace between strangers turned out to be easier than peace between old friends with open tabs.
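Two of the rules in that arc, the justification gate and war-weariness, can be sketched as a filter on the attack score. This is an illustration under assumptions, not the simulation's code: the grievance kinds come from the post, but the ledger shape, `WEARINESS_PER_ATTACK`, and the linear decay are made up.

```typescript
type GrievanceKind = "theft" | "attack" | "trespass";

interface Grievance {
  victim: string;
  offender: string;
  kind: GrievanceKind;
}

// True when `attacker` has been wronged by `target` at least once.
function hasGrievance(ledger: Grievance[], attacker: string, target: string): boolean {
  return ledger.some((g) => g.victim === attacker && g.offender === target);
}

// Hypothetical dial: how much each past attack dampens the appetite for the next one.
const WEARINESS_PER_ATTACK = 0.1;

function attackScore(
  rawScore: number,
  ledger: Grievance[],
  attacker: string,
  target: string,
  attacksSoFar: number
): number {
  // Justification gate: no grievance, no violence.
  if (!hasGrievance(ledger, attacker, target)) return 0;
  // War-weariness: the score shrinks as the feud drags on and eventually hits zero.
  const weariness = Math.min(1, attacksSoFar * WEARINESS_PER_ATTACK);
  return rawScore * (1 - weariness);
}
```

The gate alone stops unprovoked killing. The weariness term is what lets a justified war end.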

The recurring lesson: every time I patched one form of conflict, the agents found the next-cheapest one. Massacres → wars → theft → litigation. Exactly like us.

Stack

TypeScript, React, Zustand, Vite, Recharts. Default mind is z.ai GLM, but any OpenAI-compatible provider works per-agent — so you can literally pit Claude vs GLM vs Gemini in the same village and watch model-vs-model diplomacy. Keys never touch the browser (server-side proxy), and an adaptive-pacing controller learns each key's real rate ceiling.

Try it: https://multiagentciv.netlify.app/
Code: https://github.com/dhrupo/multi-agent-civilization

If you played the same strategy games I did, I think you'll feel right at home watching this thing run.

I translated my 7,000-line AI-coding handbook with 12 parallel AI agents

Dhrupo Nil — Sun, 07 Jun 2026 09:32:01 +0000

Last week I shipped a complete AI-coding handbook in Bangla — 79 terms decoded with everyday analogies, hands-on CLI guides, a 6-chapter story, and a 30-day practice path. The premise was the vocabulary wall: words like "token", "context window", and "harness" sound like spells until someone explains them plainly.

Turns out the vocabulary wall isn't a Bangla problem. It's a beginner problem.

So today the whole book exists in English:

🔗 https://github.com/dhrupo/dictionary-of-ai-coding-english

What's inside (the 60-second version)

📖 Part 1 — The words. 62 terms across 7 sections, each with a daily-life analogy instead of a textbook definition. A model is a calculator that never presses its own buttons. And Goldi 🐠 — a goldfish with a 3-second memory — keeps reminding you that models are stateless too.

🛠️ Part 2 — The tools. 7 hands-on guides: real CLI commands (Claude Code + Codex), AI-friendly folder structure (AGENTS.md, .claude/), bad-prompt → good-prompt rewrites, popular community skills (brainstorming, grill-me, TDD, handoff…), 17 extra terms (RAG, embeddings, temperature, hooks…), token economics, and safety — prompt injection explained with a postman analogy.

📜 Part 3 — The story. Nothing to install: Rafi, a 9th grader, builds his first portfolio site with an AI agent, makes every classic mistake (vague prompts, a hallucinated library, dragging a session deep into the dumb zone), and recovers using the tools from Part 2. You read over his shoulder.

🗓️ The 30-day path. One 5–15 minute mission a day. The first two weeks need nothing but a free AI chat. Plus a myth-busting FAQ, a 79-term index, and copy-paste templates/.

The fun part: the book translated itself the way it teaches you to work

Here's the meta-story. The handbook spends three chapters teaching subagents, handoff artifacts, and verification-before-completion. So when it came time to translate ~7,000 lines of Bangla markdown across 36 files… I used exactly that.

12 parallel AI agents, one session. Each agent got the same conventions block — character names (Goldi, Rafi), recurring section labels, file renames — plus its own slice of the book: three dictionary chapters here, the whole 6-chapter story to a single agent there (one translator = one consistent voice).

Three things made it work:

1. Deterministic rules beat agent judgment. The Bangla book's term headings were ### বাংলা-শব্দ (English Term). The convention: the English heading becomes exactly the parenthetical term. That one rule made every cross-file anchor predictable — agents working on different files in parallel could link into each other's not-yet-written chapters and have the slugs match.

2. A machine-verifiable gate, not vibes. The repo ships a tiny CI link checker (scripts/check-links.py) that validates every markdown link and anchor, GitHub slug quirks included. First run after the agents finished: 44 broken links. Leftover Bangla anchors, one agent guessing #esc-stop where another wrote ### ESC (stopping), a third doubling slugs like #-grill-me-grill-me. Twenty minutes of scripted fixes later: 36 files, every link alive. That checker now runs on every PR.

3. Verify, don't assume. The index agent didn't compute its 79 anchors — it read the actual translated files and extracted the real headings. Zero misses. Meanwhile an asset agent scanned every terminal-GIF source script for Bangla characters and found… none. The GIFs had been authored in English all along. The "re-render 26 GIFs" task evaporated because an agent checked instead of assuming.
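The first two rules can be sketched in a few lines. The real checker is the Python script `scripts/check-links.py`; the TypeScript below is a separate, simplified illustration with hypothetical function names. Its `slugify` is an ASCII-only approximation of GitHub's heading slugs and ignores duplicate-heading suffixes and fenced code.

```typescript
// Rule 1: "### <Bangla term> (English Term)" becomes "### English Term".
function englishHeading(banglaHeading: string): string | null {
  const match = /^(#+)\s+.*\(([^()]+)\)\s*$/.exec(banglaHeading);
  return match ? match[1] + " " + match[2].trim() : null;
}

// ASCII-only approximation of GitHub's slug rule: lowercase, drop everything except
// letters, digits, spaces, underscores and hyphens, then turn each space into a hyphen.
function slugify(headingText: string): string {
  return headingText
    .toLowerCase()
    .replace(/[^a-z0-9 _-]/g, "")
    .replace(/ /g, "-");
}

// Rule 2: collect the anchors a file actually defines, then report in-page links that miss.
function brokenAnchors(markdown: string): string[] {
  const defined: Record<string, boolean> = {};
  for (const line of markdown.split("\n")) {
    const heading = /^#+\s+(.*)$/.exec(line);
    if (heading) defined[slugify(heading[1].trim())] = true;
  }
  const broken: string[] = [];
  const linkPattern = /\]\(#([^)]+)\)/g;
  let link: RegExpExecArray | null;
  while ((link = linkPattern.exec(markdown)) !== null) {
    if (!Object.prototype.hasOwnProperty.call(defined, link[1])) broken.push(link[1]);
  }
  return broken;
}
```

Under this approximation, `### ESC (stopping)` yields the anchor `esc-stopping`, so a guessed `#esc-stop` link is flagged, and a heading that starts with an emoji yields a slug with a leading hyphen.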

If that workflow sounds useful, it's literally what Part 2 and Part 3 of the book teach — subagents, handoffs, verification — just pointed at itself.

It's a community thing

Everything is CC BY 4.0, in both languages, and the two editions cross-link. Issue forms for "I found a mistake" and "suggest a new word" are ready in both repos.

💙 Built on the shoulders of Matt Pocock's Dictionary of AI Coding — Parts 2 and 3 grew far beyond it, but the soul is his.

And the offer from the Bangla launch stands: if you speak another language, steal this structure. The deterministic-heading trick + a link checker means the next translation is mostly an afternoon of agent-wrangling. I'd genuinely love to see a Spanish, Hindi, or Vietnamese edition.

🔗 https://github.com/dhrupo/dictionary-of-ai-coding-english
🇧🇩 Prefer Bangla? https://github.com/dhrupo/dictionary-of-ai-coding-bangla

I built a complete AI-coding handbook in Bangla - words, tools, a story, and a 30-day path

Dhrupo Nil — Fri, 05 Jun 2026 17:14:15 +0000

A year of AI-coding discourse taught me one thing: the hardest part isn't the tools — it's the vocabulary wall. If English isn't your first language, "token", "context window", "hallucination", "harness" don't sound like concepts. They sound like spells. 🪄

Matt Pocock's brilliant Dictionary of AI Coding showed how teachable these words really are. So I adapted it into Bangla — and then kept going, way past where the original stops. Today it's a complete, free, open-source learning path:

🔗 https://github.com/dhrupo/dictionary-of-ai-coding-bangla

The four-stage arc: শব্দ → হাতিয়ার → গল্প → অভ্যাস

(words → tools → story → habit)

📖 Part 1 — The words. 62 terms across 7 sections, each with a daily-life analogy instead of a textbook definition. A model is a calculator that never presses its own buttons. Parameters are mixing-board knobs. And there's গোল্ডি (Goldie) — a goldfish with a 3-second memory who keeps reminding you that models are stateless too. 🐠

🛠️ Part 2 — The tools. This is where it goes beyond the original: 7 hands-on guides covering real CLI commands (Claude Code + Codex, mapped back to the concepts — /compact is just the Compaction button), AI-friendly folder structure (AGENTS.md, .claude/, who reads what and when), bad-prompt → good-prompt rewrites, the popular community skills (brainstorming, grill-me, TDD, handoff…), 17 extra terms (RAG, embeddings, temperature, hooks…), token economics, and safety (including prompt injection, explained with a postman analogy).

📜 Part 3 — The story. Instead of a tutorial you have to install things for, it's a 6-chapter read-along: রাফি, a 9th grader, builds his first portfolio site with an AI agent. He makes every classic mistake — vague prompts, trusting a hallucinated library, dragging a session deep into the dumb zone — and recovers using the tools from Part 2. Margin notes call back every concept at the exact moment it bites. The reader installs nothing.

🗓️ The 30-day path. Reading isn't owning. So the book ends with 30 daily missions (5–15 min each). The first two weeks need zero setup — just any free AI chat and open eyes ("ask the same question twice — did the answers match?"). Weeks 3–4 go hands-on, but every mission has a read-only alternative.

Plus: a myth-busting FAQ ("AI will eat my job", "you need a CS degree"), a 79-term alphabetical index, copy-paste templates/ (a starter AGENTS.md, a working /handoff command, a SKILL.md example), and 26 terminal GIFs — all reproducible from committed VHS scripts.

Things I learned building it

Analogies are load-bearing, not decoration. "Prefix cache" never landed until it became "the waiter remembers your usual order."
A story teaches sequencing in a way lists can't. You can document /compact vs /clear vs handoff perfectly and people still won't know when. Watching রাফি lose his color-scheme decision to a long session — that sticks.
Bengali text breaks your tooling in fun ways. grep couldn't match half my content (decomposed Unicode), GitHub slugs for emoji-prefixed headings start with a stray hyphen, and nested code fences need 4-backtick outers. The repo's link checker is a tiny Python script for a reason.
Beginners deserve the real thing. Simplified ≠ dumbed down. The dictionary keeps the exact English terms visible everywhere, so readers can walk into any English doc afterward and feel at home.

It's a community thing

Everything is CC BY 4.0. There are Bangla issue forms for "I want a new word" and "I found a mistake" — contributions of any size welcome, from a typo to a better analogy to a whole new section.

And if you speak another language: steal this structure. The vocabulary wall exists in every non-English-speaking dev community, and the fix is surprisingly fun to build. 💙

🔗 https://github.com/dhrupo/dictionary-of-ai-coding-bangla

Your AI Coding Agent Wastes 80% of Its Context. Fixed That with Graph Theory.

Dhrupo Nil — Mon, 25 May 2026 13:45:00 +0000

The problem nobody admits

When you give Claude Code, Cursor, or Codex a task like "fix the login validation bug", here's what they usually do:

Run grep -rl login src/ → 17 files
Read all 17 files top-to-bottom (because context is "free")
Spend 80% of the model's context window on irrelevant imports, type aliases, and helper functions the bug doesn't touch
Generate a fix using whatever 20% of attention is left

This works. Sort of. But it's wasteful — and on big codebases, it's wrong: the agent runs out of context before it sees the actual buggy function.

The instinct is to throw a bigger model at it. Bigger context window, fancier RAG, vector embeddings. All of which trade real cost for diminishing returns.

There's a better answer that's been sitting in classical CS the whole time: treat the repo as a graph.

The idea, in one paragraph

Your codebase already is a graph. Functions call functions. Modules import modules. Classes extend classes. Pick a node (the symbol your task is about), and the structurally-closest neighborhood is almost certainly what an agent needs to see.

So I built mincut-context — an npm package that:

Parses your repo into a symbol graph (tree-sitter, supports TS/JS/Vue/Python/PHP)
Derives seed nodes from your task description (keyword IDF on symbol names + file paths)
Runs personalized PageRank with the seeds as the restart vector
Picks the minimum-cut subgraph that fits a token budget you choose

The output: a list of files + line ranges that an agent should look at. Nothing more, nothing less.
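The ranking step in that pipeline, personalized PageRank, can be sketched with plain power iteration. This is a textbook version under assumptions (damping 0.85, 50 iterations, dangling mass returned to the seeds), not the package's implementation, and the `Graph` shape is hypothetical.

```typescript
// symbol -> symbols it depends on (imports, calls, references)
type Graph = Record<string, string[]>;

function personalizedPageRank(
  graph: Graph,
  seeds: string[],
  damping = 0.85,
  iterations = 50
): Record<string, number> {
  // Collect every node that appears as a source, a target, or a seed.
  // Object.create(null) avoids collisions with symbol names like "constructor".
  const nodes: string[] = [];
  const seen: Record<string, boolean> = Object.create(null);
  const addNode = (id: string): void => {
    if (!seen[id]) {
      seen[id] = true;
      nodes.push(id);
    }
  };
  for (const source of Object.keys(graph)) {
    addNode(source);
    for (const target of graph[source]) addNode(target);
  }
  for (const seed of seeds) addNode(seed);

  // Restart vector: all probability mass sits on the seeds.
  const restart: Record<string, number> = Object.create(null);
  for (const id of nodes) restart[id] = 0;
  for (const seed of seeds) restart[seed] += 1 / seeds.length;

  let rank: Record<string, number> = Object.create(null);
  for (const id of nodes) rank[id] = restart[id];

  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = Object.create(null);
    for (const id of nodes) next[id] = (1 - damping) * restart[id];
    let dangling = 0;
    for (const id of nodes) {
      const out = graph[id] || [];
      if (out.length === 0) {
        dangling += rank[id]; // a node with no outgoing edges sends its mass back to the seeds
        continue;
      }
      for (const target of out) next[target] += (damping * rank[id]) / out.length;
    }
    for (const id of nodes) next[id] += damping * dangling * restart[id];
    rank = next;
  }
  return rank;
}
```

Symbols that cannot be reached from a seed end up with a score of zero, which is the property that lets the selection step ignore unrelated parts of the repo.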

Show me the numbers

I built an evaluation suite into the repo itself. 28 hand-labeled tasks across 3 real codebases at a 4,000-token budget:

| strategy | precision | recall | F1 | token-efficiency |
| --- | --- | --- | --- | --- |
| mincut | 0.27 | 0.83 | 0.39 | 0.270 |
| mincut + `--embed` (semantic) | 0.27 | 0.83 | 0.39 | 0.270 |
| grep keyword baseline | 0.11 | 0.42 | 0.16 | 0.105 |
| random selection (control) | 0.01 | 0.04 | 0.01 | 0.009 |

Per-repo breakdown:

| repo | tasks | mincut recall | grep recall | mincut F1 | grep F1 |
| --- | --- | --- | --- | --- | --- |
| mincut-context (self) | 12 | 0.97 | 0.56 | 0.44 | 0.30 |
| FluentForm (PHP+Vue+JS) | 8 | 0.88 | 0.13 | 0.43 | 0.04 |
| Fluent Player (TS/JSX) | 8 | 0.63 | 0.56 | 0.31 | 0.13 |

mincut catches ~2× more of the correct files than grep, at ~2.5× better token efficiency. Reproducible with npm run eval. Add your own labeled tasks under eval/fixtures/ to score against your own codebase.
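For readers adding their own fixtures, here is a minimal sketch of how file-level precision, recall, and F1 are conventionally computed for one task and then averaged. The function names are hypothetical and the eval suite's own scoring may differ. If the suite averages F1 per task (an assumption), that would also explain why the table's 0.39 is slightly below the 0.41 you get by plugging the averaged precision and recall into the F1 formula.

```typescript
interface TaskScore {
  precision: number;
  recall: number;
  f1: number;
}

// Scores one task: `selected` is what the tool returned, `relevant` is the hand label.
// Both lists are assumed to contain no duplicates.
function scoreSelection(selected: string[], relevant: string[]): TaskScore {
  const hits = selected.filter((file) => relevant.indexOf(file) !== -1).length;
  const precision = selected.length === 0 ? 0 : hits / selected.length;
  const recall = relevant.length === 0 ? 0 : hits / relevant.length;
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Macro average: every task counts equally, regardless of how many files it touches.
function macroAverage(scores: TaskScore[]): TaskScore {
  if (scores.length === 0) return { precision: 0, recall: 0, f1: 0 };
  let precision = 0;
  let recall = 0;
  let f1 = 0;
  for (const s of scores) {
    precision += s.precision;
    recall += s.recall;
    f1 += s.f1;
  }
  const n = scores.length;
  return { precision: precision / n, recall: recall / n, f1: f1 / n };
}
```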

The math, briefly

Given a symbol graph $G = (V, E, w)$ where:

$V$ are code units (functions, classes, methods)
$E$ are dependency edges (imports, calls, references)
$w(v)$ is the token cost of including symbol $v$
$B$ is your token budget
$S \subseteq V$ are seed nodes derived from the task

Find $T \supseteq S$ with $\sum_{v \in T} w(v) \le B$ minimizing the boundary cut cost:

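$$\text{cut}(T, V \setminus T) = \sum_{\substack{e = (u, v) \in E \\ |\{u, v\} \cap T| = 1}} w(e)$$

Here $w(e)$ is the weight of edge $e$, as opposed to the per-symbol token cost $w(v)$; the sum runs over edges with exactly one endpoint in $T$.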

In plain English: pick a connected, low-token region that has few "loose ends" pointing outside it. The inside of the cut is what the agent needs; the outside is safely ignorable.

The objective is submodular, so a greedy algorithm gives a $(1 - 1/e) \approx 0.63$ approximation guarantee. The full pseudocode is in the README; the implementation is ~200 lines in src/core/select.ts.
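A minimal sketch of a budgeted greedy in that spirit, assuming an undirected view of the graph and a gain of "edges into the current selection per token". That scoring rule and all names are assumptions for illustration; the actual objective lives in src/core/select.ts and may differ.

```typescript
interface SymbolNode {
  id: string;
  tokens: number; // token cost w(v) of including this symbol
}

// Assumes every edge endpoint appears in `nodes` and the seeds alone fit the budget.
function greedySelect(
  nodes: SymbolNode[],
  edges: [string, string][],
  seeds: string[],
  budget: number
): string[] {
  const neighbors: Record<string, string[]> = Object.create(null);
  for (const n of nodes) neighbors[n.id] = [];
  for (const [a, b] of edges) {
    neighbors[a].push(b);
    neighbors[b].push(a);
  }

  const selected: string[] = [];
  const inT: Record<string, boolean> = Object.create(null);
  let used = 0;
  for (const n of nodes) {
    if (seeds.indexOf(n.id) !== -1) {
      selected.push(n.id);
      inT[n.id] = true;
      used += n.tokens;
    }
  }

  while (true) {
    let best: SymbolNode | null = null;
    let bestRatio = 0;
    for (const n of nodes) {
      if (inT[n.id] || used + n.tokens > budget) continue;
      // attach(v, T): number of edges between the candidate and the current selection.
      const attach = neighbors[n.id].filter((m) => inT[m]).length;
      const ratio = attach / Math.max(n.tokens, 1);
      // A ratio of 0 means no edge into T, so isolated candidates are never accepted.
      if (ratio > bestRatio) {
        best = n;
        bestRatio = ratio;
      }
    }
    if (best === null) break; // nothing eligible fits the remaining budget
    selected.push(best.id);
    inT[best.id] = true;
    used += best.tokens;
  }
  return selected;
}
```

Each round picks the attached candidate with the best edges-per-token ratio, so the selection grows outward from the seeds and stops when the budget is exhausted or nothing adjacent fits.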

Three ways to use it

1. As an MCP server — recommended for agents

Drop this block into your Claude Code / Codex / Cursor settings:

```json
{
  "mcpServers": {
    "mincut-context": {
      "command": "npx",
      "args": ["-y", "mincut-context", "mcp"]
    }
  }
}
```

Your agent now has six new tools: pack_context, expand_node, find_callers, find_callees, search_symbols, explain_selection. They operate on the cached graph from the most recent pack_context call — effectively free traversal after the first pack.

2. As a CLI

```shell
npm install -g mincut-context

mcx pack "fix the login validation bug" --budget 4000             # plain output
mcx pack "..." --format tree                                       # directory-grouped
mcx pack "..." --format json | jq                                  # pipe to anything
mcx pack "..." --interactive                                       # Ink TUI: vim keys + preview
mcx pack "..." --embed                                             # semantic seeding
mcx pack "..." --cache                                             # 5× warm-run speedup
mcx watch "..." --debounce 300                                     # re-pack on file change
mcx doctor                                                         # environment self-check
```

mcx doctor is my favorite: it tells you in 6 lines what's installed and what isn't.

3. As a library

```ts
import { pack } from 'mincut-context';

const result = await pack({
  task: 'fix the login validation bug',
  repo: process.cwd(),
  budget: 4000,
  cache: true,
  parallel: 4,
  chunk: { enabled: true, maxTokens: 400 },
});

for (const f of result.files) {
  console.log(f.path, f.score.toFixed(3), f.tokens, '·', f.reasons[0]);
}
// → src/auth/login.ts        0.541  612 · seed — matched directly by task
// → src/auth/session.ts      0.408  483 · attached (60%)
```

What I learned by building this

1. Embeddings are oversold for this problem

Adding semantic embeddings (--embed flag, via @xenova/transformers running locally) did not improve recall on any of my three eval task sets. Why? Because the labels were named honestly. When you label "stripe payment processor" → StripeProcessor.php, the keyword match catches it without help. Embeddings only earn their keep when your task vocabulary diverges from the code's — "centrality and ranking" → PageRank, that kind of gap.

I left --embed in because it doesn't hurt, and there are real users whose mental model doesn't match the code. But the marketing-friendly "AI-powered" framing for this stuff is mostly noise.

2. Greedy beats CELF for this objective

I implemented CELF (Cost-Effective Lazy Forward, Leskovec et al., 2007) hoping for a free speedup over the naive greedy. Its results diverged from plain greedy's: it was not just slower (8× slower on FluentForm) but wrong, producing smaller, structurally weaker selections.

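Why: our "no isolated nodes" acceptance rule (a candidate must have at least one edge into the current selection) breaks the submodular-monotone assumption CELF relies on. CELF treats a candidate's previously computed gain as an upper bound on its future gain, but here a candidate's eligibility flips discontinuously when a node with an edge to it joins T, so its gain can jump from zero to positive. The lazy cache becomes unreliable.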
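A three-node chain is enough to see the violation. The sketch below uses a hypothetical `gatedGain` (edges from a candidate into the selection, zero meaning ineligible) rather than the package's real gain function. Submodularity requires a candidate's gain to never increase as the selection grows, and here it does.

```typescript
// Marginal gain under a "no isolated nodes" rule: zero unless v touches the selection.
function gatedGain(v: string, selection: string[], edges: [string, string][]): number {
  let attach = 0;
  for (const [a, b] of edges) {
    const touchesSelection =
      (a === v && selection.indexOf(b) !== -1) || (b === v && selection.indexOf(a) !== -1);
    if (touchesSelection) attach++;
  }
  return attach; // 0 means "ineligible right now"
}

// seed - mid - leaf
const chain: [string, string][] = [
  ["seed", "mid"],
  ["mid", "leaf"],
];

const gainEarly = gatedGain("leaf", ["seed"], chain); // 0: leaf has no edge into {seed}
const gainLater = gatedGain("leaf", ["seed", "mid"], chain); // 1: mid joined, leaf is now attached

// Diminishing returns would require gainLater <= gainEarly; here the gain went up.
const violatesSubmodularity = gainLater > gainEarly;
```

A lazy evaluator that cached `gainEarly` as an upper bound would keep `leaf` at the bottom of its queue and never re-check it, which matches the smaller selections described above.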

I wrote the dead end up in eval/ALGORITHM-RESEARCH.md so nobody re-treads it. Honest negative results are worth shipping.

3. Sub-symbol chunking matters more than I expected

Big legacy codebases have huge functions. A 500-line function is one symbol in the graph, and if it gets selected, the whole thing eats your budget. So --chunk splits big functions at statement boundaries — each chunk becomes its own sub-symbol, individually selectable.

On FluentForm: indexing without chunking → 4,333 symbols. With --chunk → 4,878 symbols (+545 chunks). Same budget, much finer-grained selection. The greedy can pick just the relevant if/for/try block instead of all-or-nothing.
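The splitting step can be sketched as greedy packing of consecutive statements into chunks capped at a token limit. This assumes statement boundaries and token counts are already available from the parser. The `Statement` and `Chunk` shapes and `chunkStatements` are hypothetical names, not the package's API.

```typescript
interface Statement {
  startLine: number;
  endLine: number;
  tokens: number;
}

interface Chunk {
  startLine: number;
  endLine: number;
  tokens: number;
}

// Packs consecutive statements into chunks of at most maxTokens.
// A single statement larger than maxTokens becomes its own oversized chunk.
function chunkStatements(statements: Statement[], maxTokens: number): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk | null = null;
  for (const s of statements) {
    if (current !== null && current.tokens + s.tokens <= maxTokens) {
      // Still room: extend the open chunk to cover this statement.
      current.endLine = s.endLine;
      current.tokens += s.tokens;
    } else {
      // Close the open chunk (if any) and start a new one at this statement.
      if (current !== null) chunks.push(current);
      current = { startLine: s.startLine, endLine: s.endLine, tokens: s.tokens };
    }
  }
  if (current !== null) chunks.push(current);
  return chunks;
}
```

Because cuts only happen between statements, each chunk stays syntactically whole and can be added to the graph as its own selectable sub-symbol.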

4. Test coverage of 88% isn't the whole story

The CI gates on 85% statements / 80% branches / 90% functions / 85% lines. But the genuinely-untestable files — worker scripts, lazy-loaded LSP clients — are excluded from the calc. Honest reporting means saying what is tested, not just the headline number.

The honest tradeoffs

| Honest tradeoff | What we do |
| --- | --- |
| True optimal min-cut is NP-hard | Greedy submodular — `(1−1/e)` bound |
| Tree-sitter symbols are syntactic, not type-aware | `--lsp` refines TS/JS via typescript-language-server |
| Embedding model adds ~22 MB on first run | Opt-in behind `--embed` flag |
| LSP startup is slow (~1–5s) | Opt-in; cached after init |
| Cold start parses whole repo | `--cache` (5× speedup) + `--parallel n` (2.7× speedup) |

What I'd build next if you asked

The roadmap that's not checked off yet:

Pyright / Intelephense LSP adapters — type-aware calls for Python and PHP (~1–2 days each on the existing LSP infrastructure)
Svelte / Rust / Go parsers — one file each on the parser template
Incremental neighborhood caching in the greedy — keep attach(v, T) cached and update only when a node with an edge to v is added. Expected 3–5× speedup on graphs with bounded degree.

Each is bounded effort and additive. The core is done.

Stop building, start using

The hardest lesson: a tool's value comes from someone actually using it on real work, not from feature count. mincut-context is at v1.7.0 — 261 tests, 88.6% coverage, CI green on Ubuntu + macOS × Node 18/20/22. There's no honest "but it's not ready" excuse left.

If you've watched an AI agent burn 80% of its 200k-token context on imports it doesn't care about, install it now and tell me what breaks:

```shell
npm install -g mincut-context
```

🔗 GitHub: github.com/dhrupo/mincut-context
📦 npm: npmjs.com/package/mincut-context
📊 Reproducible benchmarks: eval/CROSS-REPO-RESULTS.md

I'd love feedback — especially "your numbers don't replicate on my codebase" feedback. That's literally what the eval suite is for.

If you got value from this, ⭐ the repo or drop a comment about a tooling problem you're solving. mincut-context is open-source MIT; the eval suite welcomes new fixtures.