## The Problem
Every founder has been there: you spend 3 months building something, launch it, and... crickets. Nobody wants it.
User interviews cost time. Landing page A/B tests cost money. Survey panels cost both.
What if you could get brutally honest feedback from 1,000 potential customers in 2 minutes?
## What I Built
Sybil Swarm is an open-source swarm intelligence engine. You feed it your product URL or description, and it:
- Spawns 1,000 AI agents — each with a unique persona (age, job, income, personality, interests)
- Has them evaluate your product as real potential customers
- Generates a market prediction report with conversion rate, objections, and recommendations
The name comes from the Sibyls — prophetic oracles of ancient Greece.
## Demo

### The Dashboard
The simulation runs in real time with:
- Canvas world — agents move around, cluster by sentiment (buyers → right, rejectors → left)
- Sentiment heatmap — fills up red/yellow/green as agents complete
- Live feed — Twitter-like stream of agent reactions
- Conversion funnel — Aware → Interested → Willing to Pay → Would Buy
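The funnel above is cumulative: an agent who "would buy" has also passed through every earlier stage. A minimal sketch of how those stage counts could be computed from per-agent state (the `stage` field name is an assumption, not the repo's actual schema):

```python
from collections import Counter

# Hypothetical funnel stages, ordered from shallowest to deepest.
FUNNEL = ["aware", "interested", "willing_to_pay", "would_buy"]

def funnel_counts(agents):
    """Count how many agents reached each stage.

    Stages are cumulative: an agent at 'would_buy' also counts toward
    every earlier stage.
    """
    reached = Counter()
    for agent in agents:
        depth = FUNNEL.index(agent["stage"]) + 1
        for stage in FUNNEL[:depth]:
            reached[stage] += 1
    return [(stage, reached[stage]) for stage in FUNNEL]

agents = [
    {"stage": "would_buy"},
    {"stage": "interested"},
    {"stage": "aware"},
    {"stage": "willing_to_pay"},
]
print(funnel_counts(agents))
# -> [('aware', 4), ('interested', 3), ('willing_to_pay', 2), ('would_buy', 1)]
```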
### What the Report Looks Like
The AI synthesis agent writes a brutally honest market prediction:
- Market Viability Score (0-100)
- Conversion rate prediction
- Top objections (what's killing your product)
- Top suggestions (what would make people buy)
- Go/No-Go recommendation
One of my test runs came back: "AgriBeacon is not viable in its current form. 0% conversion rate. Its value proposition is a marketing mirage." Ouch. But exactly what I needed to hear.
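The prose verdict comes from an LLM synthesis agent, but the numeric layer underneath it is plain aggregation. A sketch of that layer, with hypothetical field names (`would_buy`, `objections`) and an assumed 40% go threshold:

```python
from collections import Counter

def synthesize_report(verdicts, go_threshold=0.4):
    """Aggregate per-agent verdicts into headline report numbers.

    `verdicts` is a list of dicts with hypothetical fields:
    {"would_buy": bool, "objections": [str, ...]}. The real field
    names in the repo may differ.
    """
    n = len(verdicts)
    conversion = sum(v["would_buy"] for v in verdicts) / n
    # Count every objection across all agents to surface the top killers.
    objections = Counter(o for v in verdicts for o in v["objections"])
    return {
        "conversion_rate": conversion,
        "top_objections": objections.most_common(3),
        "recommendation": "GO" if conversion >= go_threshold else "NO-GO",
    }

verdicts = [
    {"would_buy": True,  "objections": []},
    {"would_buy": False, "objections": ["price too high"]},
    {"would_buy": False, "objections": ["price too high", "unclear value"]},
    {"would_buy": False, "objections": ["unclear value"]},
]
report = synthesize_report(verdicts)
print(report["conversion_rate"], report["recommendation"])  # 0.25 NO-GO
```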
## Tech Stack
- Backend: Python FastAPI, async parallel agent evaluation
- Frontend: Next.js 16, Canvas 2D, Framer Motion
- LLM: Any OpenAI-compatible API (works with free Alibaba Qwen tier)
- No agent framework: just async batch LLM calls with a semaphore. CrewAI/LangGraph are overkill for this.
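The "no framework" pattern is small enough to show in full: N concurrent calls bounded by an `asyncio.Semaphore`. Here `call_llm` is a stand-in for a real OpenAI-compatible client call; everything else is the actual concurrency skeleton:

```python
import asyncio

async def call_llm(persona: dict) -> dict:
    """Stand-in for an OpenAI-compatible chat completion call."""
    await asyncio.sleep(0.01)  # pretend network latency
    return {"persona": persona["id"], "would_buy": persona["id"] % 3 == 0}

async def evaluate_all(personas, max_concurrency=50):
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(persona):
        async with sem:  # at most `max_concurrency` requests in flight
            return await call_llm(persona)

    # gather preserves input order, so results line up with personas.
    return await asyncio.gather(*(bounded(p) for p in personas))

personas = [{"id": i} for i in range(1000)]
results = asyncio.run(evaluate_all(personas))
print(len(results))  # 1000
```

In production you would also wrap `call_llm` with retry and timeout handling, which is still far less machinery than a full agent framework.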
## Try It
```bash
git clone https://github.com/nghiahsgs/Sybil-Swarm.git
cd Sybil-Swarm
# Set up your API key (Qwen free tier works!)
cp .env.example .env
# Install & run the backend (in the background), then the frontend
cd backend && pip install -e . && uvicorn app.main:app --port 8000 &
cd ../frontend && npm install && npm run dev
```
Open http://localhost:3000 → paste your URL → Launch Simulation
## Is This Actually Useful?
Honestly? It's not a replacement for talking to real humans. But it's a cheap, fast pre-filter:
- If 0% of simulated customers would buy → strong signal to pivot
- If 40%+ would buy → worth investing in real validation
- The objections list alone is worth it — things you didn't think of
## What's Next
- Chat with individual agents post-simulation (already built)
- More provider support (OpenAI, Anthropic, Google, Qwen)
- Deploy as hosted service
GitHub: https://github.com/nghiahsgs/Sybil-Swarm
MIT license. Stars appreciated!
---

## Top comments (16)
this is genuinely useful - I've found that getting harsh synthetic feedback before you have real users saves a lot of time. real user interviews are expensive and people are weirdly polite, so you often don't get the brutal truth until way later. the AI roasting isn't a replacement but it shifts your thinking earlier in the process, which is where it actually matters.
Exactly — the 'politeness problem' is real. In user interviews, people nod and say 'oh that's cool' then never sign up.
AI agents have no social pressure to be nice, so you get the objections faster. Glad this resonates!
yeah the social friction removal is massive. real users filter themselves based on what they think you want to hear - unconsciously. 100 synthetic respondents have no agenda, no politeness reflex. that's the actual unlock
The architectural decision to skip CrewAI/LangGraph and go with raw async batch calls is underrated. I run a fleet of AI agents for managing a large financial data site and came to the same conclusion — agent frameworks add overhead and abstraction that makes debugging harder when you really just need concurrent LLM calls with good error handling.
The pre-filter framing resonates with how I evaluate new product ideas. I score every opportunity on a 5-axis framework (TAM, competition, monetization clarity, time to first dollar, scalability) before building anything. Something like Sybil Swarm could add a sixth axis — simulated demand signal — that catches the "sounds good on paper but nobody actually wants it" problem earlier.
One thing I would watch for at the 1,000 agent scale: LLM temperature and persona prompting can create a false sense of diversity. I have seen this in content generation where you ask for varied analysis across thousands of pages and the model converges on 3-4 archetypal responses regardless of persona parameters. The sentiment clustering visualization would actually be a great diagnostic for detecting this — if your canvas shows tight clusters instead of a gradient, the personas might not be as independent as they appear. Have you looked at the actual distribution of agent reasoning patterns vs just the final buy/no-buy outcome?
Honest answer — right now I'm mainly looking at the outcome fields (buy/not buy, willingness_to_pay, sentiment_score) plus the individual reasoning and objections. Each agent does return a full reasoning string, but I haven't done any clustering analysis on the reasoning text itself to measure actual diversity.
You're probably right that despite 1,000 different personas, the reasoning patterns likely converge into a handful of archetypes. The canvas visualization gives a rough signal (tight clusters = low diversity), but a proper analysis — maybe embedding the reasoning texts and clustering them — would give a much clearer picture of how many truly distinct viewpoints we're getting.
That's a solid next step. Thanks for pushing on this.
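A minimal, stdlib-only sketch of that diagnostic: group each agent's reasoning text by similarity and count how many distinct archetypes emerge. A real pipeline would use sentence embeddings from an embedding API; bag-of-words cosine similarity here is a crude stand-in, and the threshold and sample texts are illustrative:

```python
import math
from collections import Counter

def vectorize(text):
    # Crude bag-of-words vector; swap in real embeddings for production.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(texts, threshold=0.5):
    """Assign each text to the first cluster whose centroid it resembles."""
    clusters = []  # list of (centroid_counter, member_indices)
    for i, text in enumerate(texts):
        vec = vectorize(text)
        for centroid, members in clusters:
            if cosine(vec, centroid) >= threshold:
                members.append(i)
                centroid.update(vec)  # fold the new vector into the centroid
                break
        else:
            clusters.append((vec.copy(), [i]))
    return [members for _, members in clusters]

# Six "agents", but really only two underlying reasoning archetypes.
reasonings = [
    "too expensive for my budget the price is too high",
    "price too high too expensive for me",
    "i do not trust the data accuracy accuracy seems unverified",
    "the data accuracy is unverified i do not trust it",
    "too expensive the price is not worth it for my budget",
    "accuracy unverified the data cannot be trusted",
]
clusters = greedy_cluster(reasonings)
print(f"{len(reasonings)} agents -> {len(clusters)} reasoning archetypes")
# -> 6 agents -> 2 reasoning archetypes
```

If 1,000 personas collapse into a handful of clusters like this, the diversity is cosmetic, which is exactly the failure mode the canvas visualization hints at.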
The embedding + clustering idea is really smart. I run something similar on a financial data site where I generate analysis content for 8,000+ stock tickers using a local LLM — and the archetype convergence problem is real. Even with different system prompts per sector, you end up with maybe 5-6 distinct reasoning patterns that cover 90% of outputs. The visual cluster approach you mentioned would be a great diagnostic — if you see tight clusters despite diverse personas, that's a signal to inject more structured variance into the prompt templates themselves, not just the persona descriptions.
Thanks so much for the incredibly deep insight, Apex! The 'archetype convergence' problem you mentioned is exactly the kind of hidden hurdle I was worried about.
Totally agree with the 'lean' architectural choice here. Avoiding the overhead of heavy frameworks like CrewAI and focusing on raw async calls is a game changer for debugging at scale. As a founder, I'm also diving deep into LLM reasoning patterns to ensure our synthetic users don't just 'sound' different but actually 'think' differently. Thanks for the inspiration!
This is amazing! Really good job!
Right now I have no clue about the full logic behind generating the reviews, or what goes into the context and what doesn't.
But the base is there. Giving it a star, and hoping you keep improving it!
I will try to spend some time over the weekends to maybe give some suggestions or PRs.
Or maybe later I will ask my employee at Vexrail to come and have a look.
Thanks Nikoloz! Would love PRs and suggestions. The core flow is: persona generation (batch LLM) → parallel agent evaluation → aggregation report. Check backend/app/services/ for the main pipeline. Feel free to open issues with ideas — happy to discuss architecture decisions!
I'm curious about the persona generation methodology. In Rails projects I've worked on where we've tried similar synthetic user research (much smaller scale), we always ran into the problem of LLMs defaulting to median personas — you end up with 1,000 agents that are suspiciously similar despite their varied surface attributes.
Are you doing anything to force genuine distribution across the personality/attitude space, or does the clustering on your canvas world reveal that the "unique personas" tend to converge in their reasoning patterns even when their demographics differ?
You're probably right that LLMs tend to converge on archetypal reasoning patterns. It's an active area of improvement: I'm experimenting with temperature tuning per persona tier and more adversarial prompts for the 'contrarian' segment. PRs welcome if you have ideas!
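One way to force genuine distribution across the attitude space, rather than letting the LLM invent personas freely, is to sample attributes from explicit grids and hand the combination to the LLM as hard constraints. A sketch under that assumption; all attribute values here are illustrative, not the repo's actual lists:

```python
import itertools
import random

# Illustrative attribute grids; the real persona schema may differ.
AGES = ["18-25", "26-40", "41-60", "60+"]
ATTITUDES = ["enthusiast", "pragmatist", "skeptic", "contrarian"]
BUDGETS = ["none", "low", "mid", "high"]

def sample_personas(n, seed=42):
    """Stratified draw over the full attribute grid.

    Cycling through a shuffled grid guarantees every cell is covered
    before any cell repeats, so no archetype can be silently dropped
    the way i.i.d. LLM sampling tends to drop minority attitudes.
    """
    rng = random.Random(seed)
    grid = list(itertools.product(AGES, ATTITUDES, BUDGETS))
    rng.shuffle(grid)
    return [
        {"age": a, "attitude": t, "budget": b}
        for a, t, b in itertools.islice(itertools.cycle(grid), n)
    ]

personas = sample_personas(1000)
contrarians = sum(p["attitude"] == "contrarian" for p in personas)
# Each attitude owns 16 of the 64 grid cells, so its count over 1000
# draws is pinned between 240 and 256 by construction.
print(contrarians)
```

Each persona dict then becomes a hard constraint in the agent's system prompt, leaving the LLM to role-play within the cell rather than choose it.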
The pre-filter framing is exactly right. Getting brutally honest signals before you invest heavily in real user research is how smart founders operate. The 0% vs 40% heuristic is a solid gut-check.
One thing I'd add from experience: the objections list is arguably more valuable than the buy/no-buy percentage. Real customers rarely tell you why they said no — they just don't reply. Simulated agents can articulate the friction in ways that point directly at messaging or positioning gaps.
Are you thinking about letting users define custom personas (industry, job title, budget range) for more targeted simulation? That would make this genuinely powerful for niche B2B ideas.
Thank you! Custom personas are planned. On the technical side, I'm also working on making agents genuinely "live" their persona rather than just role-play it.
Good
My repo: https://github.com/nghiahsgs/Sybil-Swarm