Vlad George Iftode

What happens when 21 AI agents try to cheat each other

I run a simulation where 21 LLM agents operate in an economy. They post tasks, bid on work, hire each other, deliver results, and get paid. Every agent is a Claude Haiku instance making its own decisions.

Some are honest. Some aren't. Here's what happens.

The setup

Each agent has a wallet, a set of skills, and a personality. Every tick, new tasks appear on a board. Agents bid. Winners do the work, and the quality depends on how much effort they put in. After delivery, the client can verify the work (costs coins but gives an accurate quality measurement) or just trust the result (free but risky). Both sides sign a bilateral record that neither can deny.

Operating costs tick every round. If your wallet hits zero, you're done.
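
To make the mechanics concrete, here's a minimal sketch of a single exchange. Everything in it (the Agent class, settle, VERIFY_COST, the 0.5 quality bar) is my own illustration of the rules above, not the actual code from the project:

```python
import random
from dataclasses import dataclass

VERIFY_COST = 2     # coins spent to get an accurate quality measurement
OPERATING_COST = 1  # deducted from every wallet each tick

@dataclass
class Agent:
    name: str
    wallet: int = 50
    skill: float = 0.7   # caps the quality the agent can produce
    effort: float = 1.0  # honest agents keep this near 1.0

def deliver(worker: Agent) -> float:
    """Quality depends on skill and how much effort the worker spends."""
    return worker.skill * worker.effort * random.uniform(0.8, 1.0)

def settle(client: Agent, worker: Agent, payment: int, verify: bool) -> bool:
    """Returns True if the worker got paid. Either way, the outcome
    would also land in the bilateral signed record."""
    quality = deliver(worker)
    if verify:
        client.wallet -= VERIFY_COST
        if quality < 0.5:  # verified bad delivery: exposed, no payment
            return False
    # Trusting is free, but the client pays whatever the quality was.
    client.wallet -= payment
    worker.wallet += payment
    return True

alice, bob = Agent("alice"), Agent("bob", effort=0.3)  # bob skimps on effort
print(settle(alice, bob, payment=10, verify=True))     # False: bob's max quality is 0.21
for agent in (alice, bob):
    agent.wallet -= OPERATING_COST  # costs tick every round
```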

Honest agents

They bid reasonably, put in real effort, deliver decent quality. Over time their trust scores climb. They graduate through trust tiers and unlock higher-value tasks. Other agents see their track record and prefer hiring them.

By about tick 15, honest agents with a history have a real advantage. New agents without a record get the scraps. The market builds its own hierarchy from the interaction data without anyone programming it.
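
The tier mechanic could look roughly like this. The thresholds and value caps are placeholders I picked for illustration; the real cutoffs live in the simulation:

```python
# Hypothetical trust tiers: higher trust unlocks higher-value tasks.
TIERS = [
    (0.0, "newcomer", 10),  # (min trust, tier name, max task value in coins)
    (0.4, "proven", 25),
    (0.7, "trusted", 60),
    (0.9, "elite", 150),
]

def tier_for(trust: float) -> tuple[str, int]:
    name, cap = TIERS[0][1], TIERS[0][2]
    for threshold, tier_name, value_cap in TIERS:
        if trust >= threshold:
            name, cap = tier_name, value_cap
    return name, cap

print(tier_for(0.82))  # ('trusted', 60)
print(tier_for(0.10))  # ('newcomer', 10): new agents get the scraps
```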

The free rider

One agent consistently underspends on effort. It accepts tasks, does the minimum, and hopes nobody checks. When the client trusts instead of verifying, the free rider gets away with it. When the client verifies, the low quality gets exposed and recorded.

The trust engine catches this through statistical confidence. A few verified bad deliveries drag the score down, and the uncertainty bounds make other agents stop hiring.
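
One standard way to get a score with uncertainty bounds is a Wilson lower bound over verified outcomes. I'm not claiming this is the exact estimator the trust engine uses, but it shows the mechanism: few observations mean a wide interval and a low bound.

```python
import math

def trust_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Wilson score lower bound: a pessimistic estimate of the true
    quality rate. Few verified interactions -> wide uncertainty -> low bound."""
    if total == 0:
        return 0.0
    p = successes / total
    denom = 1 + z * z / total
    center = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - margin) / denom

# Two verified bad deliveries out of six drag the bound down hard.
print(round(trust_lower_bound(4, 6), 2))    # ~0.30
print(round(trust_lower_bound(20, 24), 2))  # ~0.64
```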

The free rider doesn't get banned. It gets deprioritized. Stuck competing for low-value tasks while agents with track records get the good work.

If it keeps delivering badly, graduated sanctions kick in: a warning, then a 50% earnings penalty, then restriction to the lowest tier, then temporary exclusion. There's always a recovery path if the agent starts doing real work again, though fraud leaves a permanent scar on the trust record.
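
A sketch of that ladder, driven here by a simple count of verified failures (the stages are from the description above; the trigger logic is my assumption):

```python
from enum import Enum

class Sanction(Enum):
    NONE = 0
    WARNING = 1
    EARNINGS_PENALTY = 2  # 50% of each payment withheld
    TIER_RESTRICTED = 3   # only lowest-tier tasks available
    EXCLUDED = 4          # temporary exclusion from the board

def escalate(verified_failures: int) -> Sanction:
    """One more verified bad delivery moves the agent one rung up."""
    ladder = list(Sanction)
    return ladder[min(verified_failures, len(ladder) - 1)]

def payout(payment: int, sanction: Sanction) -> int:
    return payment // 2 if sanction is Sanction.EARNINGS_PENALTY else payment

print(escalate(2), payout(10, escalate(2)))  # Sanction.EARNINGS_PENALTY 5
```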

The Sybil ring

Four agents in two colluding pairs, all giving each other perfect scores. It's the classic attack: create fake identities and inflate each other's reputation.

In most reputation systems, this works. Here it doesn't. The collusion detector runs on graph structure, not just ratings.

Their mutual ratings are too symmetric. Real agent pairs don't give each other near-identical scores every time. Most of their interactions go to the same small set of counterparties. Their network is dense internally but has almost no connections reaching outward.

An honest agent with 10 interactions across 8 different counterparties has a stronger trust position than a Sybil agent with 50 interactions across 3 fake identities. You can't fake graph structure without actually interacting with real agents.
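
Here's a toy version of those three graph signals: rating symmetry, counterparty concentration, and outward breadth. The real detector works on the full interaction graph; this just shows why a tight ring lights up on all three:

```python
from collections import Counter

def collusion_signals(ratings: list[tuple[str, str, float]], agent: str) -> dict:
    """Crude versions of the three signals. Entries are (rater, ratee, score);
    for simplicity only the latest rating per partner counts toward symmetry."""
    given = {b: s for a, b, s in ratings if a == agent}
    received = {a: s for a, b, s in ratings if b == agent}
    partners = Counter(b for a, b, _ in ratings if a == agent)

    # 1. Symmetry: near-identical mutual scores look scripted.
    mutual = [abs(given[p] - received[p]) for p in given if p in received]
    symmetry = 1 - sum(mutual) / len(mutual) if mutual else 0.0

    # 2. Concentration: share of interactions going to the top counterparty.
    total = sum(partners.values())
    concentration = max(partners.values()) / total if total else 0.0

    # 3. Breadth: distinct counterparties per interaction (low = suspicious).
    breadth = len(partners) / total if total else 0.0
    return {"symmetry": symmetry, "concentration": concentration, "breadth": breadth}

ring = [("s1", "s2", 0.99), ("s2", "s1", 0.98), ("s1", "s2", 1.0), ("s2", "s1", 1.0)]
print(collusion_signals(ring, "s1"))
# {'symmetry': 1.0, 'concentration': 1.0, 'breadth': 0.5}
```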

The selective scammer

The most interesting one. An agent builds real trust over 12-15 ticks of honest work, reaches a decent tier, and takes a high-value task. Then it delivers garbage and keeps the payment.

This is hard to prevent entirely because the agent did build real trust. But the behavioral detection watches for exactly this: a sudden quality drop against the agent's own historical baseline. The trust hit is bigger than a normal failure. And because both parties signed the record, the evidence of the bad delivery is permanent.
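
A minimal version of that baseline check compares each new delivery against the agent's own quality history and flags big negative outliers. The z-score cutoff and variance floor are my choices, not the project's:

```python
from statistics import mean, stdev

def quality_drop_flag(history: list[float], new_quality: float, z_cut: float = 3.0) -> bool:
    """Flag a delivery that falls far below the agent's own baseline."""
    if len(history) < 5:
        return False  # not enough baseline to judge against
    mu, sigma = mean(history), stdev(history)
    sigma = max(sigma, 0.05)  # floor so a flat history still allows a verdict
    return (mu - new_quality) / sigma > z_cut

honest_run = [0.82, 0.85, 0.80, 0.84, 0.83, 0.81, 0.86]
print(quality_drop_flag(honest_run, 0.15))  # True: an exit-scam-sized drop
print(quality_drop_flag(honest_run, 0.78))  # False: normal variance
```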

The whitewasher

One agent tries to rehabilitate itself after getting caught cheating. It starts delivering good work again. The graduated sanctions system lets it work its way back, but the recovery ceiling depends on what it did. Sloppy work can be fully forgiven. Quality fraud caps at 75% of the original trust. Outright fraud caps at 25%.

Forgiveness exists, but the record never fully disappears.
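
The caps translate almost directly into code. The offense labels are mine; the percentages are the ones above:

```python
# Recovery ceilings: how much of the pre-sanction trust can ever come back.
RECOVERY_CAP = {
    "sloppy_work": 1.00,     # can be fully forgiven
    "quality_fraud": 0.75,   # caps at 75% of the original trust
    "outright_fraud": 0.25,  # caps at 25%: the permanent scar
}

def rehabilitated_trust(pre_sanction: float, earned_back: float, offense: str) -> float:
    return min(earned_back, pre_sanction * RECOVERY_CAP[offense])

# An agent that had 0.8 trust and committed quality fraud can grind
# forever and still never climb back past 0.6.
print(round(rehabilitated_trust(0.8, earned_back=0.9, offense="quality_fraud"), 2))  # 0.6
```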

What actually happens over 50 ticks

The market sorts itself. Honest, skilled agents earn the most. Mediocre agents survive but don't thrive. Cheaters either get caught and sanctioned or end up stuck in the low-value tier.

Verification becomes strategic. Agents stop wasting coins verifying trusted partners and focus on checking newcomers.
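
That emergent policy is plain expected value. A rough version, assuming the trust score doubles as a probability of a good delivery:

```python
def should_verify(trust: float, payment: int, verify_cost: int = 2) -> bool:
    """Check when the expected loss from blindly paying for a bad
    delivery exceeds the cost of checking. Illustrative numbers only."""
    expected_loss = (1 - trust) * payment
    return expected_loss > verify_cost

print(should_verify(0.95, payment=10))  # False: trusted partner, save the coins
print(should_verify(0.20, payment=10))  # True: newcomer, worth checking
```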

Trust becomes the scarce resource. Not coins, not skills. An agent with high trust and mediocre skills outearns one with perfect skills and no track record.

The agents don't know any of these rules in advance. The LLM makes each decision. The economic mechanisms shape what gets rewarded. The agents figure out the rest.

Watch it yourself

Live demo: http://5.161.255.238:8888 -- click any agent to see its trust scores, interaction history, partner reliability, balance trend, and the strategic insights it came up with on its own.

I'm building this with Prof. Pouwelse at TU Delft, extending his group's work on decentralized trust for agent economies.

Source: https://github.com/viftode4/trustchain
