What Six Arguing AI Agents Taught Me About Building One That Actually Works

#alibabachallenge #ai #linux #hackathon

I broke my own project on purpose, twice, before it worked. Here's the
story.

Round one: the debate club

My first idea for this hackathon sounded great in my head. Six AI agents,
each with a "role" — security, architecture, performance, whatever — and
they'd debate each other across multiple rounds before agreeing on a final
answer. Like a mini panel of experts arguing it out.

I built it. I ran it against some vulnerable test code. It came back with
127 findings.

I got excited for about four minutes. Then I actually read them.

Maybe three were real. The other 124 were the agents politely agreeing with
each other about problems that didn't exist, or restating the same bug five
different ways because five different agents happened to notice it.
Precision was somewhere around 2%. Worse than a single model working alone.

That stung a little, not going to lie. I'd spent days on the debate logic.

Round two: quieter, and better

So I ripped it apart. No more debate rounds. No more six agents shouting
over each other. I went down to four, gave each one exactly one job, and —
this is the part that actually fixed things — made them depend on each
other in order instead of all firing at once.

One agent maps out the code first. Two others use that map to look at
security and quality separately. A last one compares what they found,
throws out duplicates, and — importantly — actually checks the line numbers
against the real file instead of trusting the AI's word for it.

Same test file. This time: real vulnerabilities, correctly flagged, nothing
made up. Point it at clean code afterward and it correctly said nothing was
wrong, which honestly felt like a bigger win than finding the bugs did.

The annoying lesson

I wanted this project to feel impressive. More agents, more debate, more
"look how sophisticated this is." What actually worked was the boring
answer: fewer agents, clear roles, one checking the other's work instead of
everyone talking at once.

I named the final version Synod, after the idea of a council that actually
deliberates and reaches a verdict, instead of a crowd that just makes noise.

The version that's on GitHub today is the second architecture, not the
first. It's running on Alibaba Cloud, powered by Qwen, and includes a CLI so
I can review code, chat about a project, or scan an entire repo right from
the terminal. If you want to poke at it or roast my code, it's open source:
github.com/02NIN20/Synod

Built for the Global AI Hackathon Series with Qwen Cloud — Track 3: Agent
Society.

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.