Ramagiri Tharun

Posted on May 28

I Built 4 AI Agents That Debate Startup Ideas — No Scripts, No Safety Rails, Just Disagreement

#ai #machinelearning #opensource #startup

The Problem with AI Demos

Every "multi-agent" demo I have seen online follows the same pattern: two chatbots exchanging perfectly crafted responses. The conversation flows smoothly because someone wrote it beforehand.

That is not multi-agent AI. That is a screenplay with extra steps.

I wanted to build something different. A system where multiple AI agents with distinct personalities debate real technical decisions — and genuinely disagree with each other.

Architecture

The system is called TechieMates. Four agents, each with a distinct role:

Agent	Role	Personality
Tarun	CEO	Speed-to-market, aggressive growth
Vibha	Operations	Process-oriented, risk-aware
Bunny	Research	Data-driven, thorough analysis
Chota	Coder	Technical feasibility, implementation detail

The pipeline works like this:

A topic is selected (startup tech stack, go-to-market strategy, pricing model, etc.)
Each agent receives the topic and the previous agents' responses
Agents can agree, disagree, build on, or completely reject previous points
No agent sees a "correct answer" — they form their own positions

Infrastructure

Model: qwen2.5:3b via Ollama (CPU-only, no GPU)
Bridge: Python HTTP server routing requests to the inference engine
Viewer: Web UI + REST API on port 7800
Storage: conversations.json — persistent, last 100 kept
Cost: Runs on a single VPS. Under $5/month.

What Actually Happens

After 160 conversations, here are the patterns that emerged without being programmed:

1. Coalition Formation

The researcher and coder frequently team up against the CEO when technical risk is high. The ops lead acts as the swing vote on cost-sensitive decisions.

Nobody programmed coalition behavior. It emerged from giving each agent distinct priorities and letting them argue.

2. Mind-Changing

Agents occasionally change their position mid-conversation. The CEO might push for a fast rollout, then soften after the coder breaks down the actual implementation timeline.

This is not role-playing. The model genuinely updates its position based on new information from other agents.

3. Consensus is Rare

Maybe 20% of conversations end with full agreement. The rest produce minority reports — dissenting opinions that surface risks the majority missed.

This is the feature, not a bug. Most AI systems optimize for consensus. This one surfaces conflict.

Performance

Metric	Value
Agents per conversation	4-5
Time per agent	~25 seconds
Total conversation time	~124 seconds
Model size	3B parameters
GPU required	No
Conversations generated	160
Cost per conversation	~$0.003

The Key Insight

The value of multi-agent AI is not in agreement. It is in disagreement.

When a single AI gives you an answer, you get one perspective. When multiple AI agents debate, you get the full landscape: the optimistic take, the pessimistic take, the technical reality check, and the operational constraints.

Most teams build consensus engines. I built a system that surfaces conflict.

Try It Yourself

The system runs on open-source tools:

Ollama for local inference
Python for the bridge and viewer
A $5/month VPS

No API keys needed. No cloud dependency. No rate limits.

What is Next

I am working on:

Adding a "devil's advocate" agent that intentionally takes contrarian positions
Recording agent confidence scores to track how certainty changes through debate
Letting external users submit topics via the web UI

Built by Ramagiri Tharun. Part of the tarunai project.

Top comments (1)

Harjot Singh • May 31

No-scripts-no-safety-rails-just-disagreement is a fun setup, and the disagreement part is the genuinely useful signal: a single agent evaluating a startup idea will rationalize toward a confident verdict, but four agents forced to actually disagree surface the objections a solo pass papers over. Structured conflict is a feature, it's how you get the bear case that a sycophantic single model won't volunteer. The honest caveat though, and the reason this is more demo than decision-tool: LLM debate produces the most articulate argument, not the most correct one, so without grounding the agents will confidently disagree about market sizes and competitors they're hallucinating. Debate sharpens reasoning; it doesn't add facts. The version that would actually change a founder's mind grounds each agent in real signal (search, market data, the actual landscape) so they're arguing over evidence, not vibes. Disagreement-as-a-feature plus grounding-so-it's-not-just-eloquence is the combo. That use-conflict-to-surface-objections-but-ground-the-claims instinct is how I think about multi-agent in Moonshift. Did you ground the agents in any real data, or is the debate purely from their priors right now?