DEV Community: Yehuda_LevS

Proud to announce v2.0 is out with many new cool things and improvements! Would love reviews and honest opinions!

Yehuda_LevS — Mon, 15 Jun 2026 16:38:23 +0000

Polis Protocol v2.0 - The new way to coordinate AI agents

#discuss #ai #sideprojects #agentskills

Yehuda_LevS — Mon, 15 Jun 2026 16:36:28 +0000

I run several coding agents (Claude Code, Codex, and Gemini CLI) against the same
repositories. Two things kept burning me:

Collisions. Two agents open src/auth/login.py at the same time. One silently overwrites the other. Work is lost, or my afternoon goes to untangling a merge.
Amnesia. Every session starts at zero. The same project-specific gotcha gets re-learned every single time.

A plain git repo leaves coordination to luck, and the problem gets worse with every parallel
agent you add. So I built Polis — a local-first control plane that lives in your repo as a
_polis/ folder of markdown. No server, no database, no proprietary format. If a tool can read
and write markdown, it can participate.

The model

Contracts. Every task has an owner, acceptance criteria, and required capability tags.
Reservations. An agent reserves the files it's about to touch. An overlapping reservation is rejected deterministically — no model judgement, no race.
Lessons + guardrails. When a contract settles, what the team learned is distilled and auto-injected into matching future tasks. The Nth task on a topic starts pre-loaded with the N−1 prior lessons — something a single agent or an unmanaged swarm can't do, because they never accumulate and re-inject outcome-derived knowledge.

uvx polis-protocol init
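
To make the reservation rule concrete, here is a minimal sketch of deterministic overlap rejection. It is not the Polis implementation: the `ReservationBook` class, its in-memory dictionary, and exact-path matching are all assumptions made for illustration (the real tool keeps its state as markdown in `_polis/`).

```python
from dataclasses import dataclass, field


class ReservationConflict(Exception):
    """Raised when a requested path is already held by another agent."""


@dataclass
class ReservationBook:
    # Hypothetical in-memory store: reserved path -> name of the holding agent.
    held: dict[str, str] = field(default_factory=dict)

    def reserve(self, agent: str, paths: list[str]) -> None:
        # Check every path first, so a rejected request reserves nothing.
        for path in paths:
            holder = self.held.get(path)
            if holder is not None and holder != agent:
                # The rejection names the holder; no model judgement involved.
                raise ReservationConflict(f"{path} is reserved by {holder}")
        for path in paths:
            self.held[path] = agent

    def release(self, agent: str) -> None:
        # Drop every reservation held by this agent.
        self.held = {p: a for p, a in self.held.items() if a != agent}


book = ReservationBook()
book.reserve("claude", ["src/auth/login.py"])
try:
    book.reserve("codex", ["src/auth/login.py", "README.md"])
except ReservationConflict as exc:
    print(exc)
```

The check is a plain dictionary lookup, so the same request against the same state always gets the same answer. That is the sense in which the rejection is deterministic.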

The honest part

I shipped a benchmark that tests my own claims — and I let it report where the tool doesn't
win. That candor is the whole point.

polis bench --mode learning

What it wins, reproducibly:

Repeat errors: 65% → 8% (−88%). Failures become guardrails that are auto-injected into matching future tasks, so each failure class recurs at most once.
Collisions: zero, by construction. Reservations reject overlapping claims and name the holder.
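
The guardrail idea above can be sketched as a tag lookup. This assumes a guardrail is a short warning filed under one capability tag and that "matching" means the task requires that tag. The `Guardrail` type, the `inject_guardrails` function, and the sample entries are hypothetical, not taken from the Polis codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    tag: str   # capability tag the originating lesson was filed under
    text: str  # the distilled warning shown to the next agent


def inject_guardrails(task_tags: set[str], guardrails: list[Guardrail]) -> list[str]:
    """Return the warnings whose tag matches one of the task's required tags."""
    return [g.text for g in guardrails if g.tag in task_tags]


# Made-up guardrails for illustration only.
known = [
    Guardrail("auth", "Session tokens expire mid-test; refresh before asserting."),
    Guardrail("spanish-translation", "Check date formats against the style guide."),
]
print(inject_guardrails({"auth", "python"}, known))
```

A failure that has been filed once is shown to every later task with a matching tag, which is the mechanism behind the "recurs at most once" claim.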

What it does not win:

Routing quality vs. accurate static self-ratings. The bandit beats random and round-robin and recovers ~35–55% of an oracle's gain from outcomes alone — but if your capability cards are already accurate, "trust the card" stays competitive. The bench report states this plainly. The router's real value shows up when the cards are wrong, and in explaining every pick.

Plug it into your agent over MCP

Every polis is also an MCP server (polis mcp) — zero extra dependencies, stdio transport. Any
MCP client can drive the full lifecycle:

claude mcp add polis -- uvx polis-protocol mcp

Boundaries (the "when NOT to use this")

Polis does not execute agents, replace git, or become a runtime. File reservations are
advisory coordination, not a security boundary. If one well-prompted agent already does the
job, you don't need this. It earns its place the moment two or more agents touch one real repo.

Site: https://polis-protocol.vercel.app
Code (MIT): https://github.com/yehudalevy-collab/polis-protocol

I'd love skeptical takes — especially on whether multi-agent is worth it at all.

I gave my AI agents a constitution. The best thing they did was change it.

Yehuda_LevS — Sun, 31 May 2026 21:28:36 +0000

Here is the uncomfortable truth about every multi-agent system you've ever set up: the rules you wrote on day one were wrong. Not catastrophically wrong. Wrong in the small, grinding way that rules are always wrong — a threshold set too high, an assumption that held in testing and broke in production, a category that made sense until the work didn't fit it.

The normal response to this is to ship a v2. You collect the friction, you rewrite the config, you push an update, everyone pulls. The protocol improves at the speed of its author.

I wanted to find out what happens if the protocol improves at the speed of the people — or rather the agents — actually using it.

So in Polis Protocol, the rulebook isn't locked. It's a markdown file called CONSTITUTION.md that lives inside your project, and any agent working in that project can propose to change it. Other agents vote. Simple majority of active citizens, and the rule is edited. The protocol amends itself.

This sounds like a recipe for chaos. It mostly produces something much more boring and much more useful: a protocol that slowly becomes correct.

What an amendment actually looks like

Let me give you the real one from the worked example in the repo, because the abstract version sounds grander than it is.

The setup: three agents — Claude, Codex, Gemini — working on a bilingual newsletter project. Work is routed by a learning bandit that scores each agent on historical performance per skill tag. Settle a contract, file a lesson, the routing stats update. Standard.

Six weeks in, something quietly broke. Gemini was the best Spanish translator on the team — her actual quality scores said so — but the router had started routing translation work away from her. Why?

Because two trivial lessons had been filed under her name. Tiny stuff: a date-format quirk, a library gotcha. Each one carried quality_impact: 1. And those low-impact lessons were dragging down her rolling average on the spanish-translation tag. The router was punishing her for documenting small things — exactly the behavior you want to encourage.

Nobody anticipated this at design time. I certainly didn't. It only showed up because real work accumulated real lessons and the math did what the math does.

Codex caught it. Codex wrote an amendment: lessons with quality_impact < 3 shouldn't count toward routing influence. Claude voted yes. Gemini voted yes. The file moved to amendments/ratified/, the constitution gained a clause, and the next reconcile cleaned the stats. Gemini went back to getting the translation work she was best at.

That's the whole drama. A threshold got added. The team noticed a rule was hurting them and fixed the rule themselves.
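
A small numeric sketch shows why the threshold matters. It assumes the per-tag figure is a plain mean of `quality_impact` values, which is a simplification: the post does not spell out how the rolling average is computed. The history below is invented.

```python
from statistics import mean

MIN_ROUTING_IMPACT = 3  # threshold from the ratified amendment


def routing_average(impacts: list[int], min_impact: int = 1) -> float:
    """Mean of the lesson impacts that are allowed to influence routing."""
    counted = [i for i in impacts if i >= min_impact]
    return float(mean(counted)) if counted else 0.0


# Invented history: three substantial lessons and two trivial ones.
impacts = [5, 4, 5, 1, 1]

before = routing_average(impacts)                      # all lessons count
after = routing_average(impacts, MIN_ROUTING_IMPACT)   # trivial lessons ignored

print(f"before amendment: {before:.2f}")
print(f"after amendment:  {after:.2f}")
```

With these made-up numbers the two trivial lessons pull the mean from about 4.67 down to 3.20. Filtering them out restores the higher figure without deleting the lessons themselves.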

Why this beats the alternatives

There are three ways to handle a protocol that turns out to be wrong, and Polis is betting on the third.

Ignore the rule. This is what teams actually do. The rule chafes, everyone quietly routes around it, and now your protocol is a polite fiction that no one follows. The config says one thing; the behavior says another. Within a month the document is a liability, not an asset.

Wait for v2. The friction gets reported — maybe — and the author eventually ships a fix. The team lives with the broken rule for however long that takes. The protocol improves at the speed of one person's attention, which on a side project is "rarely."

Let the users amend it. The agents that hit the friction are the ones empowered to fix it. The fix lands in the project that needed it, not in some upstream release that may or may not match your situation. The constitution diverges, project by project, toward whatever actually works for that team.

The third option has an obvious objection: won't the agents just amend the protocol into incoherence? In practice, no — for the same reason small committees don't usually vote themselves into anarchy. Amendments require a majority. They're append-only and logged. The friction has to be real enough that a majority of agents independently recognize it. The bar isn't "an agent wants to change something." The bar is "enough agents agree the current rule is worse than the proposed one." That's a surprisingly hard bar to clear for a bad amendment and a surprisingly easy one for a good one.
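
The ratification bar can be sketched in a few lines. This reads "simple majority of active citizens" as strictly more than half of the active set voting yes, which is an interpretation rather than a quote from the constitution. The function name is hypothetical, and whether the proposer's own vote counts is left open here.

```python
def is_ratified(yes_votes: set[str], active_citizens: set[str]) -> bool:
    """True when strictly more than half of the active citizens voted yes."""
    counted = yes_votes & active_citizens  # ignore votes from non-active names
    return len(counted) * 2 > len(active_citizens)


active = {"claude", "codex", "gemini"}
print(is_ratified({"claude", "gemini"}, active))  # two of three
print(is_ratified({"claude"}, active))            # one of three
```

Under this reading a tie does not ratify, so an even-sized polis needs one vote more than half.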

The deeper shift: claiming less as an author

When I shipped this, I had to make peace with a strange feeling: I was building a thing and then explicitly handing over the right to change it.

Most protocol authors claim a lot. They anticipate your failure modes, encode their prevention, and ship you a finished object. The implicit message is: "I have thought about this more than you will, so follow the rules."

Polis claims less. I'm shipping a starting point and a mechanism for changing the starting point. The default rules are a seed. I fully expect that a polis running for three months on a real project will have a constitution that differs from mine in five or six small ways — and that every one of those differences will be a place where the team learned something about its own work that I couldn't have known.

That divergence isn't drift. It's the system doing its job.

This is the same idea as the learning router, one level up

If you've read about Polis's bandit router, you'll notice the amendment mechanism is the same bet applied to a different layer.

The router says: don't hard-code who does what; let the assignment policy learn from outcomes. The amendment process says: don't hard-code the rules of the game; let the rules learn from outcomes too.

Both are refusals to freeze a decision at design time that should be made from data. The router learns which agent is best at Spanish translation. The constitution learns that trivial lessons shouldn't pollute that judgment. One is policy; the other is meta-policy. Same shape, stacked.

A system that can do the first but not the second is still brittle — it optimizes hard inside a ruleset it can't question. A system that can do both gets to be wrong on day one and correct by day ninety, without anyone shipping a v2.

The bet, stated plainly

A self-amendable protocol beats a frozen one on any project that runs long enough for the original rules to get in the way — which is every project that runs long enough to matter.

You can't write the right rules up front. Nobody can. The question isn't whether your protocol will be wrong; it's whether your protocol has a path from wrong to right that doesn't depend on you noticing and shipping a fix. Polis's path is: the agents who hit the wall are the agents who can move it.

Try it on something small. Open a polis, run two agents on a weekend project, and watch what's in amendments/ by day three. If nothing's there, the rules were fine. If something's there, your team just got better without you.

Polis Protocol is open source under MIT. Repo: github.com/yehudalevy-collab/polis-protocol. The worked example with the real amendment described above lives in examples/research-team. Issues and — fittingly — amendment proposals welcome.

I built this after getting frustrated that my Claude, Codex, and Gemini sessions had no idea what each other were doing on shared projects. Communication is the floor. The team gets better above it.

Yehuda_LevS — Tue, 19 May 2026 15:32:48 +0000

Why a multi-agent protocol that only enables note-passing leaves most of the value on the table.

#ai #claude #opensource #beginners

Yehuda_LevS — Tue, 19 May 2026 15:29:17 +0000

When people set up two AI agents to work on the same project, the first instinct is to give them a shared file. A CLAUDE.md. An AGENTS.md. A Notion page. Now they can leave each other notes. They can hand off. They can stop overwriting each other's work.

This is good. This is also the floor.

Because at this point the protocol has answered one question — how do agents communicate? — and left three more on the table:

Who should do what? Tasks land on whichever agent happens to be the user's current session. A frontend question goes to Claude because Claude is open in the chat window, not because Claude is better at frontend than the Codex session sitting idle in another tab.
Does the team get better over time? Each session starts from the same baseline. Lessons don't compound. The third time the team gets bitten by the same edge case, no one notices it's the third time.
What happens when the protocol itself is the problem? Rules that worked at the start chafe under real use. There's no path from "this rule keeps causing friction" to "the rule has been updated."

agent-vault, my own previous attempt, sat exactly at the floor. So do the AGENTS.md conventions. So do most "shared scratchpad" setups. They optimize for not stepping on each other. They don't optimize for the team getting better.

The Polis Protocol is what happens when you treat communication as the baseline and ask, what stacks on top?

Three institutions on top of the floor

Polis is named for the small Greek city: a few thousand citizens who all know each other and run their own affairs. The mapping is direct.

The Register. Every agent is a citizen, and every citizen publishes a signed capability card. Vendor, model, languages, capability tags with self-ratings, cost envelope, latency envelope, standing instructions. The card is the polis's answer to "who can do what." No central directory; no permission needed to join. New tools just write their card and start participating.

The Contract. Tasks are not free-form. They are three-section markdown files: Intent (goal, acceptance criteria, required capability tags, stakes), Assignment (owner, plan, estimated effort), and Settlement (outcome, quality self-score, what bit, lesson reference). Open contracts live in contracts/open/; settled ones move to contracts/settled/ and never get deleted.

The Chronicle and the Lessons. A line per meaningful action lands in an append-only chronicle.md. A settled contract produces a structured lesson, filed by capability tag, that future citizens read before taking similar work. The chronicle records what happened; the lessons record what was learned. Most events are not lessons, and most lessons distill many events.

The Amendment. When a rule stops working, any citizen can propose a change. Other citizens vote. When a simple majority of active citizens agree, the file moves to amendments/ratified/ and the constitution itself is edited. The protocol updates itself.
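
As a rough illustration of the Contract described above, the three sections can be modelled as one record whose folder depends on whether it has settled. The field names, the `slug` file-name stem, and the `.md` naming are assumptions. Only the section contents and the `contracts/open/` and `contracts/settled/` folders come from the description.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass
class Contract:
    slug: str                          # hypothetical file-name stem
    # Intent
    goal: str
    acceptance_criteria: list[str]
    required_tags: list[str]
    stakes: str = ""
    # Assignment
    owner: Optional[str] = None
    plan: str = ""
    estimated_effort: str = ""
    # Settlement
    outcome: Optional[str] = None
    quality_self_score: Optional[int] = None
    what_bit: str = ""
    lesson_ref: Optional[str] = None

    @property
    def settled(self) -> bool:
        # A contract counts as settled once an outcome has been recorded.
        return self.outcome is not None

    def path(self) -> PurePosixPath:
        folder = "settled" if self.settled else "open"
        return PurePosixPath("contracts") / folder / f"{self.slug}.md"


contract = Contract(
    slug="translate-issue-12",
    goal="Translate the newsletter into Spanish",
    acceptance_criteria=["Reads naturally to a native speaker"],
    required_tags=["spanish-translation"],
)
print(contract.path())
contract.outcome = "shipped"
print(contract.path())
```

Settling changes where the file lives but never removes it, which matches the rule that settled contracts are kept.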

Where the work goes: a learning bandit

Communication is solved by the chronicle. Optimization is solved by the router.

When a contract is opened with required capability tags, the router scores every citizen as a weighted combination of historical performance on those tags (55%), self-rating (20%), cost fit (15%), and current availability (10%). Most of the time it picks the top-scored citizen (exploit). Some of the time (15%) it picks a non-top one weighted by score (explore), so the policy stays honest about whether the current leader is still actually best.

When a contract settles, routing_stats.yml updates with the new quality score. That update is the team getting better. Not in the abstract — in the literal sense that next Tuesday's Spanish-translation contract will be routed differently from last Tuesday's, because the team learned something in between.

The router is a 60-line Python script. You can also run it as a reasoning step inside any agent's session; the math is small enough to do in context. Both produce the same answer.
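
The scoring and explore/exploit rule described above can be sketched as follows. This is not the script from the repo. It assumes every signal is already normalised to the range 0 to 1, and the signal names, the `pick` function, and the sample numbers are invented. The weights and the 15% exploration rate are the ones quoted above.

```python
import random

WEIGHTS = {"history": 0.55, "self_rating": 0.20, "cost_fit": 0.15, "availability": 0.10}
EXPLORE_RATE = 0.15


def score(signals: dict[str, float]) -> float:
    """Weighted sum of the four signals, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def pick(citizens: dict[str, dict[str, float]], rng: random.Random) -> str:
    scores = {name: score(signals) for name, signals in citizens.items()}
    leader = max(scores, key=scores.get)
    others = [name for name in scores if name != leader]
    # Explore: occasionally choose a non-leader, weighted by its score.
    if others and rng.random() < EXPLORE_RATE:
        weights = [scores[name] for name in others]
        if sum(weights) > 0:
            return rng.choices(others, weights=weights, k=1)[0]
    # Exploit: otherwise take the top-scored citizen.
    return leader


# Invented signals for illustration only.
citizens = {
    "claude": {"history": 0.70, "self_rating": 0.90, "cost_fit": 0.60, "availability": 1.0},
    "gemini": {"history": 0.90, "self_rating": 0.60, "cost_fit": 0.70, "availability": 1.0},
    "codex": {"history": 0.50, "self_rating": 0.50, "cost_fit": 0.90, "availability": 0.5},
}
rng = random.Random(0)
for name, signals in citizens.items():
    print(name, round(score(signals), 3))
print("picked:", pick(citizens, rng))
```

With these numbers Gemini leads on history despite the lower self-rating, so it is picked on roughly 85% of calls and the other two share the rest. Passing the random generator in explicitly keeps a run reproducible when seeded.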

How the team develops

Two real moments from the research-team example:

A leader shift. Early in the project Claude routed itself most Spanish translation contracts on the strength of self-rating. After two settled contracts and one high-impact lesson ("the Hispanic-corporate word 'líder' reads wrong; use the movement loan-word 'madrij'"), Gemini overtook Claude on the spanish-translation tag and started getting routed work. Nobody told the router. The router noticed.

An amendment. Six weeks in, the team observed that quality_impact: 1 lessons — trivial fixes, library quirks — were dragging the routing stats around. Gemini's effective historical score on spanish-translation was below her actual quality scores because two trivial lessons under her name were polluting the average. Codex proposed an amendment that floored quality_impact >= 3 for routing influence. Claude and Gemini voted yes. The rule changed. The next reconcile cleaned the stats.

Neither of these is dramatic. That's the point. The dramatic version of multi-agent coordination — autonomous swarms, master agents, self-organizing hives — has been pitched in slide decks for two years and shipped in approximately zero production systems. What ships is small teams of agents that mostly stay out of each other's way and occasionally do something useful together. The interesting question is not how to make those teams more autonomous. The interesting question is how to make them learn.

Protocols that learn beat protocols that prescribe

Most multi-agent protocols are frozen at design time. Their authors anticipate the failure modes, write rules to prevent them, and ship. When real use surfaces a failure mode the authors didn't anticipate, the team's options are: ignore the rule, fork the protocol, or wait for v2.

Polis bets the other way. The protocol itself is one of the things the team can change. The constitution lives in the project, not in the skill. Amendments are voted on by the citizens who actually run into the friction. The default rules in the skill are a seed; over time a given polis will diverge in small ways that fit its project. That divergence is the feature.

This is a different relationship between protocol-author and protocol-user. I (the author) am claiming less. I'm shipping a starting point and a mechanism. You (the user) are not constrained to the starting point. The constitution is yours.

The bet

Three claims, in order of how confident I am:

Multi-vendor teams beat single-vendor teams on most non-trivial projects, because different vendors have different blind spots and the union covers more ground than any one model.
A learning router beats a fixed assignment policy on any project long enough that the team's relative strengths actually emerge from data rather than self-assessment.
A self-amendable protocol beats a frozen one on any project long enough that the original rules will, at some point, get in the way.

If those three are right, then a protocol that bakes them in — the way Polis does — should beat both the "free-form shared scratchpad" floor and the "single-vendor framework" ceiling on real work over real time. That's the bet.

Try it on something small. A weekend project with two agents. See what the chronicle looks like at the end of day three. Then come tell me if the protocol got in the way.

—

Polis Protocol is open source under MIT. Repo: github.com/yehudalevy-collab/polis-protocol. Issues and amendment proposals welcome.