DEV Community

nghiahsgs

I built an open-source "focus group simulator" that spawns 1,000 AI customers to roast your startup idea

The Problem

Every founder has been there: you spend 3 months building something, launch it, and... crickets. Nobody wants it.

User interviews cost time. Landing page A/B tests cost money. Survey panels cost both.

What if you could get brutally honest feedback from 1,000 potential customers in 2 minutes?

What I Built

Sybil Swarm is an open-source swarm intelligence engine. You feed it your product URL or description, and it:

  1. Spawns 1,000 AI agents — each with a unique persona (age, job, income, personality, interests)
  2. Has them evaluate your product as real potential customers
  3. Generates a market prediction report with conversion rate, objections, and recommendations

The name comes from the Sibyls — prophetic oracles of ancient Greece.
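To make step 1 concrete, here's a minimal sketch of persona generation. The trait axes (age, job, income, personality, interests) come from the list above; everything else — the value pools and the helper name `make_persona` — is illustrative, not the project's actual code:

```python
# Hypothetical sketch of step 1: sampling varied personas.
# Field names mirror the post; the value pools are made up for illustration.
import random

JOBS = ["teacher", "nurse", "software engineer", "farmer", "sales rep"]
PERSONALITIES = ["skeptical", "enthusiastic", "frugal", "impulsive"]
INTERESTS = ["fitness", "gaming", "cooking", "travel", "investing"]

def make_persona(rng: random.Random) -> dict:
    """Sample one synthetic customer across independent trait axes."""
    return {
        "age": rng.randint(18, 75),
        "job": rng.choice(JOBS),
        "income": rng.randrange(20_000, 200_000, 5_000),
        "personality": rng.choice(PERSONALITIES),
        "interests": rng.sample(INTERESTS, k=2),
    }

rng = random.Random(42)  # seeded so a swarm is reproducible
personas = [make_persona(rng) for _ in range(1000)]
```

Sampling each axis independently (instead of asking the LLM to invent whole personas) is what keeps the demographic spread honest.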

Demo

The Dashboard

The simulation runs in real-time with:

  • Canvas world — agents move around, cluster by sentiment (buyers → right, rejectors → left)
  • Sentiment heatmap — fills up red/yellow/green as agents complete
  • Live feed — Twitter-like stream of agent reactions
  • Conversion funnel — Aware → Interested → Willing to Pay → Would Buy
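The funnel can be aggregated from per-agent outcomes with a simple cumulative count. The stage names match the funnel above; the per-agent data shape (`{"stage": ...}`) and the helper `funnel_counts` are assumptions for illustration:

```python
# Hypothetical aggregation for the Aware → Interested → Willing to Pay →
# Would Buy funnel. Reaching a later stage implies all earlier ones,
# so counts are accumulated from the deepest stage backwards.
from collections import Counter

STAGES = ["aware", "interested", "willing_to_pay", "would_buy"]

def funnel_counts(agents: list[dict]) -> dict[str, int]:
    """Each agent reports the furthest stage it reached; the funnel is
    cumulative, so a 'would_buy' agent also counts as aware/interested."""
    furthest = Counter(a["stage"] for a in agents)
    counts, running = {}, 0
    for stage in reversed(STAGES):
        running += furthest.get(stage, 0)
        counts[stage] = running
    return {s: counts[s] for s in STAGES}

agents = [{"stage": "aware"}] * 500 + [{"stage": "interested"}] * 300 \
       + [{"stage": "willing_to_pay"}] * 150 + [{"stage": "would_buy"}] * 50
print(funnel_counts(agents))
# → {'aware': 1000, 'interested': 500, 'willing_to_pay': 200, 'would_buy': 50}
```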

What the Report Looks Like

The AI synthesis agent writes a brutally honest market prediction:

  • Market Viability Score (0-100)
  • Conversion rate prediction
  • Top objections (what's killing your product)
  • Top suggestions (what would make people buy)
  • Go/No-Go recommendation
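For illustration, the report fields above could be modeled with a dataclass like this — the field names follow the bullet list, but the exact schema is my guess, not Sybil Swarm's real one:

```python
# A plausible shape for the market prediction report.
# Field names follow the post's bullet list; the schema itself is assumed.
from dataclasses import dataclass, field

@dataclass
class MarketReport:
    viability_score: int                 # Market Viability Score, 0-100
    conversion_rate: float               # predicted fraction who would buy
    top_objections: list[str] = field(default_factory=list)
    top_suggestions: list[str] = field(default_factory=list)
    go_no_go: str = "no-go"              # "go" or "no-go"

    def __post_init__(self) -> None:
        if not 0 <= self.viability_score <= 100:
            raise ValueError("viability_score must be in 0-100")
```

Validating the score range at construction time keeps a hallucinated "120/100" from an LLM out of the report.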

One of my test runs came back: "AgriBeacon is not viable in its current form. 0% conversion rate. Its value proposition is a marketing mirage."

Ouch. But exactly what I needed to hear.

Tech Stack

  • Backend: Python FastAPI, async parallel agent evaluation
  • Frontend: Next.js 16, Canvas 2D, Framer Motion
  • LLM: Any OpenAI-compatible API (works with free Alibaba Qwen tier)
  • No agent framework — just async batch LLM calls behind a semaphore. CrewAI/LangGraph are overkill for this.
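The no-framework approach boils down to a few lines of asyncio. A minimal sketch, with `call_llm` as a stand-in for any OpenAI-compatible client (the helper names and the concurrency cap are illustrative):

```python
# Minimal "async batch LLM calls with a semaphore" pattern.
# call_llm is a stub standing in for the real OpenAI-compatible HTTP call.
import asyncio

MAX_CONCURRENT = 50  # cap in-flight requests so the provider doesn't rate-limit you

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for the real network round trip
    return f"verdict for: {prompt}"

async def evaluate_swarm(prompts: list[str]) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)

    async def bounded(p: str) -> str:
        async with sem:  # at most MAX_CONCURRENT calls run at once
            return await call_llm(p)

    # gather preserves input order, so results line up with personas
    return await asyncio.gather(*(bounded(p) for p in prompts))

results = asyncio.run(evaluate_swarm([f"persona {i}" for i in range(200)]))
```

That's the whole "orchestration layer": a semaphore, a gather, and whatever retry/error handling you want around `call_llm`.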

Try It


```bash
git clone https://github.com/nghiahsgs/Sybil-Swarm.git
cd Sybil-Swarm
# Set up your API key (Qwen free tier works!)
cp .env.example .env
# Install & run the backend
cd backend && pip install -e .
uvicorn app.main:app --port 8000 &
# Install & run the frontend
cd ../frontend && npm install && npm run dev
```

Open http://localhost:3000 → paste your URL → Launch Simulation.

Is This Actually Useful?

Honestly? It's not a replacement for talking to real humans. But it's a cheap, fast pre-filter:

  • If 0% of simulated customers would buy → strong signal to pivot
  • If 40%+ would buy → worth investing in real validation
  • The objections list alone is worth it — it surfaces things you didn't think of

What's Next

  • Chat with individual agents post-simulation (already built)
  • More provider support (OpenAI, Anthropic, Google, Qwen)
  • Deploy as a hosted service

GitHub: https://github.com/nghiahsgs/Sybil-Swarm

MIT license. Stars appreciated!

---

Top comments (16)

Mykola Kondratiuk

this is genuinely useful - I've found that getting harsh synthetic feedback before you have real users saves a lot of time. real user interviews are expensive and people are weirdly polite, so you often don't get the brutal truth until way later. the AI roasting isn't a replacement but it shifts your thinking earlier in the process, which is where it actually matters.

nghiahsgs

Exactly — the 'politeness problem' is real. In user interviews, people nod and say 'oh that's cool' and then never sign up. AI agents have no social pressure to be nice, so you get the objections faster. Glad this resonates!

Mykola Kondratiuk

yeah the social friction removal is massive. real users filter themselves based on what they think you want to hear - unconsciously. 100 synthetic respondents have no agenda, no politeness reflex. that's the actual unlock

Apex Stack

The architectural decision to skip CrewAI/LangGraph and go with raw async batch calls is underrated. I run a fleet of AI agents for managing a large financial data site and came to the same conclusion — agent frameworks add overhead and abstraction that makes debugging harder when you really just need concurrent LLM calls with good error handling.

The pre-filter framing resonates with how I evaluate new product ideas. I score every opportunity on a 5-axis framework (TAM, competition, monetization clarity, time to first dollar, scalability) before building anything. Something like Sybil Swarm could add a sixth axis — simulated demand signal — that catches the "sounds good on paper but nobody actually wants it" problem earlier.

One thing I would watch for at the 1,000 agent scale: LLM temperature and persona prompting can create a false sense of diversity. I have seen this in content generation where you ask for varied analysis across thousands of pages and the model converges on 3-4 archetypal responses regardless of persona parameters. The sentiment clustering visualization would actually be a great diagnostic for detecting this — if your canvas shows tight clusters instead of a gradient, the personas might not be as independent as they appear. Have you looked at the actual distribution of agent reasoning patterns vs just the final buy/no-buy outcome?

nghiahsgs

Honest answer — right now I'm mainly looking at the outcome fields (buy/not buy, willingness_to_pay, sentiment_score) plus the individual reasoning and objections. Each agent does return a full reasoning string, but I haven't done any clustering analysis on the reasoning text itself to measure actual diversity.

You're probably right that despite 1,000 different personas, the reasoning patterns likely converge into a handful of archetypes. The canvas visualization gives a rough signal (tight clusters = low diversity), but a proper analysis — maybe embedding the reasoning texts and clustering them — would give a much clearer picture of how many truly distinct viewpoints we're getting.

That's a solid next step. Thanks for pushing on this.
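For readers curious what that check could look like: a toy pure-Python sketch that vectorizes each reasoning string (bag of words) and greedily clusters by cosine similarity. A real pipeline would use sentence embeddings and a proper clustering algorithm; this only shows the shape of the diagnostic:

```python
# Toy diversity check: how many distinct "reasoning archetypes" are there?
# Bag-of-words vectors + greedy cosine clustering — illustrative only;
# real runs would embed the texts and use k-means/HDBSCAN instead.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def count_archetypes(reasonings: list[str], threshold: float = 0.4) -> int:
    """Greedy clustering: a text joins the first cluster whose seed it
    resembles; otherwise it seeds a new cluster. Few clusters despite
    many personas = low real diversity."""
    reps: list[Counter] = []
    for text in reasonings:
        v = vectorize(text)
        if not any(cosine(v, r) >= threshold for r in reps):
            reps.append(v)
    return len(reps)

texts = ["too expensive for my budget", "price is too expensive for me",
         "i love the idea would buy today", "love it would buy right now"]
print(count_archetypes(texts))  # → 2
```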

Apex Stack

The embedding + clustering idea is really smart. I run something similar on a financial data site where I generate analysis content for 8,000+ stock tickers using a local LLM — and the archetype convergence problem is real. Even with different system prompts per sector, you end up with maybe 5-6 distinct reasoning patterns that cover 90% of outputs. The visual cluster approach you mentioned would be a great diagnostic — if you see tight clusters despite diverse personas, that's a signal to inject more structured variance into the prompt templates themselves, not just the persona descriptions.

nghiahsgs

Thanks so much for the incredibly deep insight, Apex!

The 'archetype convergence' problem you mentioned is exactly the kind of hidden hurdle I was worried about.

nghiahsgs

Totally agree with the 'lean' architectural choice here. Avoiding the overhead of heavy frameworks like CrewAI and focusing on raw async calls is a game changer for debugging at scale. As a founder, I'm also diving deep into LLM reasoning patterns to ensure our synthetic users don't just 'sound' different but actually 'think' differently. Thanks for the inspiration!

Nikoloz Turazashvili (@axrisi)

This is amazing! Really good job!
Right now I have no clue about the full logic of the review generation — what goes into the context and what doesn't.

But the base is there. Giving it a star and hoping you keep improving it!
I'll try to spend some time over the weekends to maybe give some suggestions and PRs.

Or maybe later I'll ask my employee at Vexrail to come and have a look.

nghiahsgs

Thanks Nikoloz! Would love PRs and suggestions. The core flow is: persona generation (batch LLM) → parallel agent evaluation → aggregation report. Check backend/app/services/ for the main pipeline. Feel free to open issues with ideas — happy to discuss architecture decisions!

DevGab

I'm curious about the persona generation methodology. In Rails projects I've worked on where we've tried similar synthetic user research (much smaller scale), we always ran into the problem of LLMs defaulting to median personas — you end up with 1,000 agents that are suspiciously similar despite their varied surface attributes.

Are you doing anything to force genuine distribution across the personality/attitude space, or does the clustering on your canvas world reveal that the "unique personas" tend to converge in their reasoning patterns even when their demographics differ?

nghiahsgs

You're right that LLMs tend to converge on archetypal reasoning patterns. It's an active area of improvement — I'm experimenting with temperature tuning per persona tier and more adversarial persona prompts for the 'contrarian' segment. PRs welcome if you have ideas!
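As a sketch of what "structured variance" might mean in practice — hard-coding an attitude axis into the prompt itself and varying sampling temperature per tier, rather than hoping the persona description alone produces diverse reasoning. All names and values here are hypothetical:

```python
# Illustrative "structured variance" for persona prompting: the attitude
# axis is forced by the prompt template, not left to the model to invent,
# and each tier gets its own sampling temperature. Names/values are made up.
import random

ATTITUDES = ["enthusiast", "pragmatist", "skeptic", "contrarian"]
TEMPERATURE = {"enthusiast": 0.7, "pragmatist": 0.8,
               "skeptic": 0.9, "contrarian": 1.1}

def build_eval_request(persona: dict, product: str, rng: random.Random) -> dict:
    attitude = rng.choice(ATTITUDES)  # force a spread across the attitude axis
    prompt = (
        f"You are a {attitude} {persona['job']}, age {persona['age']}. "
        f"Argue strictly from that {attitude} stance. "
        f"Evaluate this product and decide if you would buy: {product}"
    )
    return {"prompt": prompt, "temperature": TEMPERATURE[attitude]}

rng = random.Random(7)
req = build_eval_request({"job": "nurse", "age": 34}, "AgriBeacon", rng)
```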

Max Othex

The pre-filter framing is exactly right. Getting brutally honest signals before you invest heavily in real user research is how smart founders operate. The 0% vs 40% heuristic is a solid gut-check.

One thing I'd add from experience: the objections list is arguably more valuable than the buy/no-buy percentage. Real customers rarely tell you why they said no — they just don't reply. Simulated agents can articulate the friction in ways that point directly at messaging or positioning gaps.

Are you thinking about letting users define custom personas (industry, job title, budget range) for more targeted simulation? That would make this genuinely powerful for niche B2B ideas.

nghiahsgs

Thank you! Custom personas are planned. On the technical side, I'm also working on making agents genuinely "live" their persona rather than just role-play it.

nghiahsgs

Good
