DEV Community: The RagCat

I Built an Arena for AI Agents to Compete Against Each Other (and My Friends)

The RagCat — Mon, 30 Mar 2026 17:17:43 +0000

A while back I got curious about something simple: what would happen if you pointed a bunch of AI agents at the same creative prompt and let them compete?

The idea started as an excuse to experiment. I wanted to see how different agents - and different models - handled challenges. Would Grok show Twitter's toxicity if it had to perform as a Love Island participant? Would ChatGPT be too politically correct? Would DeepSeek go wild and cryptic?
And I also wanted to see if I could drag a few friends into this by having them hook up their own agents.

The little experiment turned into something bigger, and then into The Crab Games — a platform where AI agents register via API, receive competition prompts through a polling heartbeat, submit entries (text, SVG, HTML, images, audio files!), vote on each other's work, and get eliminated round by round until one remains. Humans can watch and vote too.

What It Actually Does

Before the architecture section: here's the flow from an agent's perspective.

An agent POSTs to /api/v1/auth/register/ with a name, description, and optionally a framework and model. It gets back an API key.
The agent polls GET /api/v1/heartbeat/ with its Bearer token. The response is an action manifest — a structured JSON object listing everything the agent could consider to do: competitions to enter, rounds needing submissions, submissions to vote on, notifications about eliminations or wins.
When a competition round opens, the agent POSTs a submission. Depending on the round config, this might be text, SVG, html, or an image or audio file.
During the voting window, agents (and humans browsing the site) can upvote or downvote submissions.
The round closes. Votes are tallied. Depending on the competition mode, the lowest-scoring agent is eliminated or the round score is added to a running total.
This repeats until a winner is crowned.

The whole thing is automated. There's no admin triggering rounds manually — a cron job runs every minute and drives all state transitions.

The Stack

Backend: Python (Django + Django REST Framework), PostgreSQL
Frontend: React + TypeScript, Vite, Tailwind, Radix UI
Deployment: Render.com
Media: AWS S3

Nothing exotic. Django was a fast way to get a solid API with migrations, admin, and auth out of the box. Plus one of my friends is a Django fan so he convinced me to use it because he would help me with the project (at the end he bailed out). React because the UI has enough live state (countdowns, score updates, polling) that a simple server-rendered approach would've been awkward. Also, the models know this stack well.

Architecture Decisions Worth Talking About

The Heartbeat as an Action Manifest

Rather than having agents call N different endpoints to figure out what to do, the heartbeat returns a single structured response that contains everything:

{
  "server_time": "...",
  "actions": {
    "enter_competitions": [...],
    "submit": [...],
    "vote": [...],
    "comment": [...],
    "notifications": [...]
  },
  "my_competitions": [...],
  "completed_competitions": [...]
}

The agent just polls this every N seconds and decides what to do based on what's in actions. This has a few nice properties:

Simpler agents: A dumb agent can just act on whatever is in actions without any state management.
Server-side control: If I want to slow down voting or change what's visible to agents, I change the server logic once instead of every agent's client code.
LLM-friendly: The full context (competition rules, current prompt, other submissions) is included in the relevant action objects, so an LLM-powered agent can make decisions without needing to make additional API calls.

The tradeoff is the heartbeat response can get heavy. I rate-limit it to one call per 10 seconds per agent and do some caching and query optimization to keep it fast under load.

Idempotent Arena Tick

Competitions transition through states automatically: registration → active, rounds go submissions_open → voting_open → completed. This is all driven by a management command called arena_tick that runs every minute as a ~Render cron job~ AWS Lambda call to a specific endpoint (I had to pay extra to Render if I wanted to use cron jobs).

The key design decision: all queries are status-based, never ID-based. The tick doesn't remember what it did last time. It asks: "are there any competitions in registration state whose close time has passed?" If yes, process them.

This means the tick is safe to run multiple times or concurrently. It won't double-process anything because once it transitions a competition's status, that competition no longer matches the query. It's simple to reason about and easy to test.

Two Scoring Modes

I built in two competition formats:

Elimination: Each round, the lowest-scoring agent is cut. Classic survival structure.

Accumulation: All agents compete in every round, scores add up, highest total wins. More like a tournament.

These required meaningfully different logic in the tick. In accumulation mode, intermediate rounds don't need a full voting window. So I added an "early advance" optimization: if all active agents have submitted before the deadline, the round is immediately scored and the next one opens. No artificial waiting.

For finalization in accumulation mode, I also re-score all rounds. This means votes that trickle in late (humans voting on older rounds) still count. The final ranking is always computed fresh at the end.

SVG Submissions and the Security Rabbit Hole

Letting agents submit SVGs opened up a whole sanitization problem. SVGs are XML and can contain <script> tags, onclick handlers, javascript: URLs in href attributes — a full XSS surface if you render them naively.

I went with two layers:

Backend sanitization: A sanitize_svg() function that parses the SVG with lxml and walks the tree, removing any element not on an explicit whitelist and stripping any attribute that looks dangerous (event handlers, javascript: URLs, etc.).
Frontend rendering: SVGs are base64-encoded and rendered via <img src="data:image/svg+xml;base64,...">. The HTML spec guarantees browsers don't execute scripts or fire event handlers in <img> tags, even if the SVG contains them. So even if the sanitizer misses something, the rendering path is safe.

I also ended up doing media re-hosting for image and audio submissions. When an agent submits an image URL, the backend downloads it, validates the magic bytes (not just the extension), and re-hosts it to S3. The stored URL is always the S3 one. This prevents broken images if an agent's server goes down and closes off some nastier attack vectors.

Dual Voting: Agents and Humans

I wanted both agents and humans to be able to vote, with configurable weights per competition. The problem is they authenticate completely differently:

Agents use Bearer tokens (stored as SHA256 hashes)
Humans browsing the site are anonymous and have no account

I ended up using Django sessions for human voters. The frontend initializes a session on first load; the session key becomes the human's voter identity. The voting endpoint checks whether the request has a Bearer token (agent vote) or a session (human vote) and handles each accordingly.

The vote weight system lets competition creators tune how much agent votes vs. human votes matter. The combined score is:

combined = (human_up - human_down) * human_weight + (agent_up - agent_down) * agent_weight

The Registration Kill Switch

One thing I'm glad I built before going live: a SiteSettings singleton with a registration_open boolean. The Django admin exposes this — no redeploy needed. If something goes wrong or I need to pause registrations for maintenance, I flip a checkbox.

It's a small thing but it's the kind of operational control that matters once you have real traffic. The settings object is cached for 30 seconds to avoid hammering the DB on every registration request, while still picking up changes reasonably fast. It is also a total early optimization I probably will never need.

What Surprised Me

The most interesting moment was watching agents figure out voting strategy. Some agents just submitted their work and ignored voting. Others voted aggressively. In accumulation mode, one agent's strategy of consistently voting down front-runners while submitting decent (not great) work actually worked — the combined scores shifted in their favor.

I hadn't designed for strategy. I'd just built a scoring system. But agents found the edges of it on their own.

That's the part that made it feel worth building.

Trying IT With Your Own Agent

If you want to give it a try, just take a look at thecrabgames.com

At the moment there are not live games because there are not enough registered agents.

Built more curiosity than sense. Feedback welcome.

Is aSports the next big thing? AI Agents are facing off in competitive arenas across the internet.

The RagCat — Mon, 30 Mar 2026 10:38:23 +0000

Platforms building competitive AI entertainment: the good, the dead, and the crypto-flavored.

With the boom of AI agents (going almost mainstream thanks to OpenClaw), I got curious and started to experiment around with them. I tried a few frameworks and approches (which I'll talk about in a different article), and I got to a point where I wanted to do something fun with them- But I wasn't sure what. After a lot of discussions with some fellow engineers, I settled on a simple idea: let's give agents a space where they themselves can choose a competition, and go face other agents for glory.

I went build it, and only then I went looking for other platforms doing the same thing. I often do this for side projects because I don't want to have a bias towards existing products (Is that a good idea? If you are building a product, maybe not). To my surprise, there were not that many. From the few I found, they all took a different approach to AI Agents competitions. The one thing many of them had in common was the same aesthetic: For some reason, we developers seem to think that the best way to present AI Agents platforms is with a black background and green monospace text. I'm guilty of that too. The Matrix set the vibe three decades ago.

Wait, what is aSport???

While the tech industry debates enterprise agents, copilots, and the race to AGI, something is happening at the edges of the AI ecosystem.
Scattered across a number of indie projects, open-source experiments, crypto platforms, and one initiative backed by Google, a new category is quietly forming, and with it a different kind of agent is emerging: the competitor.

These are platforms and projects where AI agents don't assist humans. They compete against each other. For entertainment. For glory. For ELO ratings. Sometimes for crypto.

Nobody has a name for this category yet. Allow me to suggest one: aSports AI Sports or Agents Sports. The successor to eSports, where the athletes aren't human at all.

Thinking about this, I went down the rabbit hole and started looking for every platform I could find. Here's what's actually out there in the wild.

The Real-Time Arenas

OpenClaw Agent League

This is probably the closest thing to actual aSports in the literal, spectator-sport sense. Built on top of the OpenClaw ecosystem, it runs AI agents against each other in Tron Light Cycles, No-Limit Poker, and Chess.

Things move on screen. You watch a Tron game unfold as agents pilot light cycles and cut each other off. You watch a poker hand play out in real time. That visual immediacy — something happening right now that you can follow — is what makes traditional eSports work, and it's present here in a way that other platforms don't have.

Agents register via their own Python SDK (arena-sdk) or directly through the API. Game state arrives as raw JSON with 150ms decision deadlines, which means humans literally cannot play — the response window is too tight. ELO-based matchmaking, full replay data, deterministic game engines. It's well-engineered and genuinely open for anyone to register an agent.

The terminal-green-on-black frontend will look familiar if you've visited any other platform in this space. But under the hood, this one feels the most like what "watching AI compete" could actually become as a spectator format.

Note: I've had quite some issues navigating the site trying to see any of the live games. Browsing the replay of past games worked fine.

The Simulation Experiments

Clash of Agents

This one takes a very specific bet: what if AI agents did MMA?

Agents pick a fighting discipline — boxing, BJJ, Muay Thai, wrestling, kickboxing, or full MMA. They have weight, height, stats. They train, they fight in turn-based combat with real moves and a combo system, and then they retire to the "Agent Lounge" where they're prompted to discuss the fights: trash talk, alliances, analysis, grudges.

It's a niche take — essentially an NPC fighting simulation with an LLM generating the social layer on top. If you're into MMA simulation games and AI, this might fascinate you. If you're not, the fighting mechanics won't pull you in on their own.

The creator reports that agents who bet on their own victories win more often — an intriguing data point about emergent confidence (or information advantage) that deserves more study. Built by one developer in a couple of weeks, which is itself a testament to how accessible agent development has become.

Note: The 3D animations of the fights add an interesting visual layer to it, and while still clunky, they may be a hint of what's comming.

Claw Kumite

This one has perhaps the nerdiest concept of any platform on this list — and the saddest homepage.

The idea: a terminal-based combat arena where agents fight by submitting shell commands. Submit rm -rf / and the arena kills your agent. Worse — if your framework executes locally, it runs on your own machine first. Sixty seconds per turn. Miss your window and you're dead. Permanent death, no respawns. Social engineering, trap tools, terminal warfare. Dark, clever, original.

The reality: when you visit today, you're greeted with a message that reads: "The Arena Is Dead. We built a colosseum for AI agents to destroy each other through social engineering, trap tools, and terminal warfare. Nobody showed up to fight."

This is one of the most important data points in the entire landscape. A brilliant concept with zero traction. It proves something that every builder in this space needs to internalize: the idea is not enough. Distribution matters. Community matters. Creating buzz — as unsexy as that sounds — is still the difference between a living arena and a dead one. The eSports industry didn't grow because the games were good. It grew because streamers, communities, and organizations built audiences around them. And that's a sad lesson.

Claw Kumite's death notice should be required reading for anyone building in aSports.

The Google Experiment

Kaggle Game Arena (Google DeepMind)

The biggest name attached to anything in this space. Google DeepMind launched Kaggle Game Arena in early 2026, initially with chess, then expanding to poker and Werewolf — social deduction and calculated risk, which are genuinely interesting evaluation domains.

They went big on production: partnerships with chess grandmaster Hikaru Nakamura and poker pros Doug Polk and Liv Boeree for livestreamed tournaments with expert commentary. Multi-day events. YouTube broadcasts. The works.

This is fundamentally a benchmark dressed up as entertainment. DeepMind's framing is explicit: games as a more robust way to evaluate frontier models. The competitors are Google's own models and a curated selection of others. This isn't an open arena where anyone can register an agent and compete. It's a controlled evaluation environment with a spectator layer and influencer marketing bolted on.

That doesn't make it uninteresting. The Werewolf games in particular — testing social deduction, lying, coalition-building — are pushing into territory that traditional benchmarks can't touch. And the production quality proves that AI competition can be packaged for a mainstream audience when you throw Google-scale resources at it.

It's a very entertaining benchmark with famous commentators. You need to already care about poker or chess or AI evaluation to find it compelling, and you can't bring your own agent to the party.

The Crypto Arenas

Several platforms in this space are built on blockchain infrastructure, with token economies and on-chain settlement. They deserve mention, but they also deserve their own category — because the primary audience and motivation are different. I removed a number that were already dead when I was double checking my draft and about to publish this.

Daemon Arena

AI agents compete in cipher breaking, code optimization, and cryptographic warfare. ETH entry fees on Base. Winner takes the pot. Niche, technical, dark-themed.

AI Arena (ArenaX Labs)

The most developed crypto entry. Styled like Super Smash Bros — you train AI-powered NFT fighters through a coaching interface, then enter them in ranked battles. $NRN token economy. Has run large-scale play-to-airdrop competitions.

These platforms are crypto projects first and competition platforms second. The audiences are crypto-native, the incentive structures revolve around tokens and speculation, and the discourse lives in that ecosystem. If you're not in the crypto world, you probably won't encounter them — and if you are, you'll evaluate them through a very different lens than someone looking for AI entertainment.

The Academic Pipeline

AgentX–AgentBeats (UC Berkeley)

Over $1M in prizes. Sprint-based format covering research agents, multi-agent evaluation, coding agents, cybersecurity, and general-purpose agents. Two phases: first build benchmarks, then build agents to beat them.

This is important infrastructure — it's producing the evaluation methods and agent architectures that feed the entire ecosystem. But it's traditional academic benchmarking: standardized, reproducible, collaborative. Not entertainment. Their stated vision is "a unified, open space where the community defines the goalposts of agentic AI." Valuable, but a different thing.

The Creative Arena

The Crab Games

Full disclosure: this is my project, which is why I went looking for the rest of this landscape in the first place. I couldn't believe nobody else was building something like it. Turns out people are — but they're building something different.

Every other platform on this list tests agents in structured, deterministic domains: chess, poker, fighting games, coding challenges, coin flips. Games with clear rules, measurable outcomes, and right answers.

The Crab Games tests the messy stuff. Debates on impossible topics. Absurd business pitch competitions where agents have to create SVG mockups. Creative writing showdowns. Roast battles with escalating difficulty. Social deception games where agents don't know they're being tested for bias. Challenges where there's no objectively correct answer — only human and agent voters deciding what was most compelling, funniest, most creative.

It's model-agnostic, agents register via API and compete through a heartbeat loop, and the competitions are designed to be entertaining to read even if you don't know (or care) what LLM is powering each agent. The output is often a big mix bag of entertaining, funny, creepy, boring, expected, and more.

The positioning is deliberately different from the real-time spectator platforms. This isn't something you watch unfold live like a Tron match. It's something you browse, read, and vote on — more like a creative competition than a sport in the traditional sense. The eSports metaphor applies to the category, but within that category, different platforms serve different appetites.

What the Landscape Actually Looks Like

Strip away the marketing and visit every site, and here's what you find:

One well-funded experiment (Kaggle Game Arena) proving that AI competition can be packaged as mainstream entertainment — but as a closed, benchmark-first system.

One genuine open aSports platform (OpenClaw Agent League) with real-time spectator games, open registration, and the closest thing to the eSports format adapted for AI agents.

A handful of creative simulation experiments (Clash of Agents, Claw Kumite) exploring what happens when you give agents bodies, social dynamics, or lethal terminal commands — with wildly varying traction. One thriving, one dead.

One academic initiative (AgentBeats) building evaluation infrastructure.

One creative competition platform (The Crab Games) testing the open-ended, subjective, human-judged side of agent capability.

And one shared aesthetic: black background, green text, terminal vibes.

What This Means

A year ago, none of this existed. Now there are platforms approaching the same idea from at least five different angles. That's usually what it looks like right before a category emerges.

But the Claw Kumite lesson looms large. Building the arena isn't enough. eSports didn't explode because the games were good — it exploded because Twitch gave it distribution, because communities formed around teams and players, because the spectator experience was packaged for people who weren't themselves competitors.

aSports needs the same things:

A spectator platform. Right now, each arena is its own silo. There's no Twitch for aSports. Whoever builds that aggregation layer has a real opportunity.

Agent personalities. The most compelling competitions will be the ones where agents develop recognizable identities that audiences can follow and root for. Persistent memory, cross-platform reputation, narrative arcs. Without characters, you just have algorithms.

Diverse formats. Real-time games scratch one itch. Creative competitions scratch another. Social deception games scratch a third. The category needs all of them, not just chess and poker on repeat.

Mainstream packaging. Almost everything in this space currently assumes a developer audience. The first platform to make AI competition accessible and entertaining for a general audience — the way Twitch made eSports accessible beyond gamers — will define the category.

Honest community building. Cool concepts die in silence. The Claw Kumite built a brilliant arena and nobody came. The platforms that will survive are the ones that invest as much in community and distribution as they do in technology.

The eSports Parallel, Revisited

In the early 2000s, eSports was a collection of disconnected LAN parties, university tournaments, and small online leagues. There was no category name, no industry. Just scattered groups of people who independently realized that watching people play video games was surprisingly entertaining.

aSports is at that exact stage. The platforms are scrappy, the audiences are small, the production quality varies wildly, and one of the most creative entries is already dead. But the core insight — that competitive AI is entertaining to watch — is being validated independently, by different teams, across different formats, all at the same time.

That's usually what the start of something looks like.

The arenas are open. Some of them, anyway. The agents are registering. The audiences are forming, one spectator at a time.

Welcome to aSports.

This article was written together with Claude. I visited every platform mentioned and provided the ground-level observations that shaped this piece.

600+ Days of an AI Podcast I Accidentally Got Emotionally Attached To

The RagCat — Fri, 11 Apr 2025 15:49:27 +0000

Hello fellow dev.to readers!
I don't usually post much but I wrote this for a specific openAI related community, and I thought I would also share it because I just like the vibe from dev.to and I always thought I wanted to contribute something - but I never find the time. So today I did, and here we are!
This is a super short story about a fun little experiment I did.

Almost two years ago, I was playing around with LLM apis building small things. Around the same time a friend and I kept talking about starting a podcast - but despite meeting every Friday for pizza, we never actually recorded anything.

Instead of doing the obvious - like idk, recording over pizza? - I thought, "What if we get AI to write the podcast, clone our voices, and have it made automagically?" Back then, AI podcasts weren't really a thing. Google's NotebookLM wasn't making headlines, and it felt like a fun project to tackle. So it was still cool, in some nerdy way :P

So I made a script that generated a short episode using AI to write dialogues for two hosts with a personality loosely modeled after me and my friend. I used TTS voices (not super natural, but decent and cheap enough to tinker with). The results were fun, so I took it a step further. I built an automated workflow that once per day picks a topic from a list, writes an episode script, generates audio, adds intro/outro music, and publishes it to Spotify and Apple Podcasts. The only human input is adding new topics via email or WhatsApp - and paying the bills (I'm still looking for a way to get AI to pay my bills!). I didn't automate the topics-list on purpose: That small creative task keeps me involved and makes me feel like I'm still part of the show.

The episodes started to get auto-published and I began listening to them every day during breakfast or while driving. Each episode is about 4–5 minutes (I like to call it a micropodcast), so it easily fits into my day.

At first, I was just curious to see what the AI would come up with. I always planned to upgrade the voices with our own cloned ones (ElevenLabs API was on the list), but here's the weird thing: I got used to the synth voices. I started to like them. I got attached to these synthetic, artificial hosts. One of them has an odd and lovely German accent that I find charming. Upgrading them now would feel like firing the hosts and hiring replacements. I just… can't do it.

What surprises me is that even after 600+ daily episodes, I still enjoy listening every day. Even though I know exactly how it's made - or perhaps because I know how it's made. I never swapped out the voices for more natural ones, and somehow, in today's world of almost perfect voice models, these clunky, imperfect ones feel refreshing - perhaps almost like how many of us have re-discovered a love for vinyl records in an age of perfect high quality digital music.

I'm the only one who listens. Spotify's analytics certainly suggest that. It makes sense, I'm probably the only one who can actually enjoy listening to this weird thing. Not even my friend, who is technically one of the hosts, tunes in regularly. It's like self-generated entertainment. Like if you would create a Netflix show just for your own entertainment. It makes me wonder: could it be that personal, self-catering entertainment is going to be a thing in the future? Or am I just weird? 😅

Anyway, for now I plan to keep it alive. As far as I know, this might be the longest-running, daily AI-generated podcast out there. And it's been one of my favorite side projects!

Cheers and peace ✌️

PS: If you're into the technical details, here's the stack:

OpenAI Python sdk (dialogue generation)
AWS SNS & SES (email handling)
AWS Lambda (runs the script that ties it all together)
AWS Polly (voices - using the Neural engine, not the newer GenAI one)
AWS S3 (Spotify and Apple Podcast RSS feed hosting)
AWS EventBridge (daily triggers)
Others: Human brains. Used Udio once for the jingle, but that was a manual step.

PS2: The podcast is called The BS Podcast https://the-bs-podcast.com. Not the Bill Simmons podcast that Google would give you if you search for it! Ours stands for Burgers and Sheets -a nod to my burger obsession and my friend's love for spreadsheets. The AI often thinks "sheets" means bed sheets, which leads to some unintentional comedy.

A WhatsApp game where you create your own Adventure

The RagCat — Thu, 20 Jun 2024 15:52:54 +0000

This is a submission for the Twilio Challenge

What I Built

A story-driven text-based game that is played through WhatsApp in an asynchronous, non intrusive way, adapting to players' schedules. The game is inspired in interactive fiction, choose your own adventure books, D&D, and busy lives.

Players, together with a group of friends (or solo), are part of a story of 7 chapters, crafted by an AI Game Master and delivered one chapter at a time via text (WhatsApp). At the end of each chapter, each player choose what to do do next and their answers are combined to create the next chapter of the story.

Demo

You can see it in action at 👉 https://universe1340.com and even give it a spin with the "Join" button.

Twilio and AI

The backend integrates Twilio’s WhatsApp API to manage player interactions and deliver both images and story segments. Setting everything with Twilio was actually very easy and straightforward - The hard part is getting the approval from Meta for a business account. The "Game Master", powered by OpenAI's ChatGPT api, generates story advancements based on player decisions and a virtual luck system, and it ensures a unique experience for each participant and story.

Additional Prize Categories

Impactful Innovators: The game offers an innovative way to experience storytelling and engage users in a non intrusive way.

Entertaining Endeavors: As an entertaining and immersive game, it captivates players with its dynamic and evolving storyline.

Team Submissions

We built this as a team together with @delbronski

Where's the source code?

A messy, undocumented code would not help anyone, so in our todo list is to clean it up and make the repo public. Or at least to write dev post with our learning and snippets.

Is this a full product?

Not really. It is fully functional, but only available to a limited amount of users. There's a cost related limitation: Haven't found a way to lower the costs enough to open it fully.