I tested mcp-doctor pricing with 12 LLM-simulated personas. 4 said they would pay.

#ai #opensource #security #mcp

Earlier today I shipped @weiseer/mcp-doctor — an open-source supply-chain trust scanner for MCP (Model Context Protocol) servers. CLI + GitHub Action + Trust Badge + free public API at https://api.weiseer.com. Pro tier is $19/mo on Gumroad.

The honest question every solo founder skips: would anyone actually pay $19/mo for this?

I have a separate tool for exactly this: personalab, an open-source persona-driven product evaluation harness. 12 LLM-simulated personas read the product, decide each day what they'd do, and tell you who would pay and who would walk away. I've used it before on PostHog, on Cal.com, and on personalab itself.
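For readers who haven't seen the harness, the day-by-day loop looks roughly like the sketch below. This is not personalab's actual code or API. The `Persona` class, every action name except `UNSUBSCRIBE_OR_UNINSTALL`, and the scripted `decide` function (which stands in for the LLM call) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical action labels. Only UNSUBSCRIBE_OR_UNINSTALL appears in the post.
SUBSCRIBE = "SUBSCRIBE_PRO"
USE_FREE = "USE_FREE_TIER"
ABANDON = "UNSUBSCRIBE_OR_UNINSTALL"


@dataclass
class Persona:
    pid: str
    role: str
    history: List[str] = field(default_factory=list)


def run_agentic(personas: List[Persona],
                decide: Callable[[Persona, int], str],
                days: int = 5) -> Tuple[Counter, int]:
    """Ask every persona for one action per day, then bucket the outcome."""
    outcomes: Counter = Counter()
    persona_days = 0
    for p in personas:
        for day in range(1, days + 1):
            p.history.append(decide(p, day))  # in a real harness: an LLM call
            persona_days += 1
        if p.history[-1] == ABANDON:
            outcomes["abandoned"] += 1
        elif SUBSCRIBE in p.history:
            outcomes["would_pay"] += 1
        else:
            outcomes["engaged_free"] += 1
    return outcomes, persona_days


# Toy script standing in for the model: one payer, one churner, one free user.
SCRIPT = {
    "p1": [USE_FREE, USE_FREE, USE_FREE, SUBSCRIBE, USE_FREE],
    "p2": [USE_FREE, USE_FREE, USE_FREE, USE_FREE, ABANDON],
    "p3": [USE_FREE] * 5,
}


def scripted_decide(p: Persona, day: int) -> str:
    return SCRIPT[p.pid][day - 1]


if __name__ == "__main__":
    crowd = [Persona(pid, role="demo") for pid in SCRIPT]
    print(run_agentic(crowd, scripted_decide))
```

With 12 personas and `days=5`, the same loop yields the 60 persona-days analysed further down.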

Tonight I ran it on mcp-doctor as case study #4. Code + raw data + full report are all in github.com/weiseer/mcp-doctor/blob/main/case_study/.

Headline result

4 of 12 personas would pay (33%). 2 abandoned. 6 stayed engaged on the free tier.

Case study	Would-pay rate (under same persona harness)
mcp-doctor (today)	4/12 = 33%
personalab self-test	0/8
PostHog (5-day agentic)	0/12 sustained (6/12 day-1 yes)
Cal.com	8/12

This puts mcp-doctor between PostHog and Cal.com under the same methodology. Better than personalab itself, better than what PostHog showed under a 5-day sustained simulation, worse than Cal.com (which converged on a single clean friction lever — the famous "Powered by Cal.com" branding).
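The comparison is simple arithmetic on the table above. A few lines of Python make the ranking explicit and show how coarse a 12-persona sample is. The numbers are copied from the table, using the sustained figure rather than the day-1 figure for PostHog.

```python
from typing import Dict, List, Tuple

# Would-pay counts from the table above (PostHog: sustained 5-day figure).
RESULTS: Dict[str, Tuple[int, int]] = {
    "personalab self-test": (0, 8),
    "PostHog (5-day agentic)": (0, 12),
    "mcp-doctor": (4, 12),
    "Cal.com": (8, 12),
}


def would_pay_rate(paid: int, total: int) -> float:
    return paid / total


def ranked(results: Dict[str, Tuple[int, int]]) -> List[str]:
    # sorted() is stable, so ties keep their table order.
    return sorted(results, key=lambda name: would_pay_rate(*results[name]))


if __name__ == "__main__":
    for name in ranked(RESULTS):
        paid, total = RESULTS[name]
        step = 1 / total  # how much one persona changing its mind moves the rate
        print(f"{name:<24} {paid:>2}/{total:<2} = {would_pay_rate(paid, total):4.0%}"
              f"  (one persona = {step:.0%})")
```

At n = 12, a single persona flipping moves the rate by about 8 percentage points, which is another reason to read 33% versus 67% as directional.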

Not making PMF claims. Treat it as one signal among several.

Who paid

07 OSS maintainer — strongest engagement signal. Opened a GitHub issue on day 2, shared with team on day 3, subscribed to Pro on day 4. Quote synthesized from the 5-day transcript:

"Supply-chain audits are part of my actual job. A rubric I can fork and argue with is worth more than another vendor's black box. $19/mo is below my coffee budget."

06 Research consultant — buys tools on behalf of clients. Subscribed on day 5. The "buying for someone else" pattern showed up clearly — they care about whether the trust signal is defensible to a third party.

Who walked

02 Growth PM: final action UNSUBSCRIBE_OR_UNINSTALL. Their words, translated from Chinese:

"mcp-doctor solves a supply-chain trust problem, which is completely orthogonal to my OKR (Free→Paid conversion 3.2%→4.5%). It's been 5 days and it has given me zero help speeding up my A/B iteration. Time cost > $19 value."

The persona is right: their OKR is conversion, my tool is supply chain. Audience mismatch.

11 Data team lead: abandoned over rubric calibration, specifically how aggressively A1_unpinned_deps fires. This is real feedback the actual product needs to address (PRs welcome on rubric.yaml).
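To make the disagreement concrete, here is a hypothetical sketch of what an "unpinned dependency" check could look like for an npm package.json, with a strict and a lenient mode. The function name, the strict/lenient split, and the regex are my assumptions, not mcp-doctor's actual A1_unpinned_deps logic in rubric.yaml.

```python
import json
import re
from typing import List

# Exact semver pin such as "1.4.2" or "1.4.2-beta.1" (no range operators).
EXACT_PIN = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?$")
WIDE_OPEN = {"", "*", "x", "latest"}


def unpinned_deps(package_json: str, strict: bool = True) -> List[str]:
    """Hypothetical unpinned-dependency check over package.json "dependencies".

    strict=True flags every spec that is not an exact version pin, so "^1.2.3"
    counts as unpinned. strict=False only flags wide-open specs ("*", "latest",
    "x", empty) and any spec starting with ">" (treated as open-ended here).
    """
    deps = json.loads(package_json).get("dependencies", {})
    flagged = []
    for name, spec in deps.items():
        spec = spec.strip()
        if strict:
            unpinned = EXACT_PIN.match(spec) is None
        else:
            unpinned = spec in WIDE_OPEN or spec.startswith(">")
        if unpinned:
            flagged.append(name)
    return flagged


manifest = json.dumps({"dependencies": {
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "3.23.8",
    "left-pad": "*",
}})
print(unpinned_deps(manifest, strict=True))   # ['@modelcontextprotocol/sdk', 'left-pad']
print(unpinned_deps(manifest, strict=False))  # ['left-pad']
```

The gap between the two outputs is the kind of calibration question persona 11 raised: whether a caret range should count against a server's trust score at all.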

Who stayed engaged but didn't pay

6 of 12 personas used the free tier daily, found genuine value, but did not subscribe. These are the free-tier loyalists, exactly what the funnel is designed to produce. They give us:

Word-of-mouth (some opened GitHub issues, shared with team)
Trust badge usage on their READMEs (free)
The actual marketing engine

If we tried to push these personas to Pro, we'd lose the funnel. Free tier should stay generous.

Patterns across the 60 persona-days (12 × 5)

The personalab agentic mode runs each persona day by day, so 12 personas over 5 days give 60 data points. Friction clusters extracted:

Cluster	# mentions across persona-days
Rubric calibration / false positive concerns	14
Pro tier value vs Free tier sufficiency	11
MCP-specific audience (do I even use MCP?)	9
Trust building (new brand)	8
vs npm audit / Snyk / Bumblebee	7
Self-serve / docs gap	4
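The counts sum to 53 mentions across the 60 persona-days. Turning them into shares shows that rubric calibration alone is about a quarter of all friction, and the top two clusters together are just under half. The numbers below are copied from the table; `cluster_shares` is just an illustrative helper.

```python
from typing import Dict, List, Tuple

# Mention counts copied from the friction-cluster table above.
CLUSTERS: Dict[str, int] = {
    "Rubric calibration / false positive concerns": 14,
    "Pro tier value vs Free tier sufficiency": 11,
    "MCP-specific audience (do I even use MCP?)": 9,
    "Trust building (new brand)": 8,
    "vs npm audit / Snyk / Bumblebee": 7,
    "Self-serve / docs gap": 4,
}


def cluster_shares(counts: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Return (cluster, mentions, share of all mentions), largest first."""
    total = sum(counts.values())
    rows = [(name, n, n / total) for name, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    print("total mentions:", sum(CLUSTERS.values()))
    for name, n, share in cluster_shares(CLUSTERS):
        print(f"{n:>3}  {share:6.1%}  {name}")
```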

The top cluster — rubric calibration — is the right one to prioritize. v0.2 of the scanner should add an LLM-judge mode for ambiguous signals (the same fix planned for @weiseer/prompt-redteam's detection).

Earlier personalab work suggested a pattern in the number of clusters: pre-PMF products surface 4-5 diffuse complaints, while late-funnel products converge on 1-2 clean levers. mcp-doctor surfaced 6 clusters on launch day. That fits: pre-PMF, the complaints are diffuse.
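The cluster-count heuristic is easy to state as code. The sketch below is my own codification, not a validated classifier: treating 6 clusters the same as 4-5, and labelling 3 as "in between", are assumptions, since the earlier personalab observation only covered 4-5 and 1-2.

```python
def friction_shape(n_clusters: int) -> str:
    """Map a friction-cluster count to the pattern described above.

    Thresholds are an assumption: 1-2 clusters read as clean levers,
    4 or more as diffuse pre-PMF complaints, and 3 as in between.
    """
    if n_clusters < 1:
        raise ValueError("expected at least one friction cluster")
    if n_clusters <= 2:
        return "clean levers (late funnel)"
    if n_clusters >= 4:
        return "diffuse (pre-PMF)"
    return "in between"


print(friction_shape(6))  # prints "diffuse (pre-PMF)" for mcp-doctor's 6 clusters
print(friction_shape(1))  # prints "clean levers (late funnel)", like Cal.com's single branding lever
```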

What I'm doing about it

Not changing pricing: 33% would-pay (4/12), concentrated in the right persona slice, is enough signal at $19/mo. Cal.com hit 67% with a more general audience; we accept a narrower fit at this stage.
Sharpening audience: Twitter / Reddit posts should drop the "general developer" framing and double down on "MCP server users" specifically. The personas who pay are the ones who already do this work.
Rubric calibration: the top friction cluster is real. v0.2 will add LLM-judge classification of ambiguous signals plus explicit per-signal severity thresholds.
Not naming packages: the case study itself is the marketing. No "X is the worst, buy mcp-doctor."
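The v0.2 calibration plan (per-signal severity thresholds plus an LLM judge for ambiguous cases) could be wired roughly as below. Everything here is hypothetical: the 0-1 score scale, the `Threshold` fields, the placeholder values, and the judge interface are my assumptions, and the judge is a stub rather than a model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical design sketch for v0.2; none of this is shipped mcp-doctor code.


@dataclass(frozen=True)
class Threshold:
    fail_at: float     # score >= fail_at: confident finding, no judge needed
    pass_below: float  # score < pass_below: confidently clean, no judge needed


# Per-signal thresholds (values are placeholders, not calibrated).
THRESHOLDS: Dict[str, Threshold] = {
    "A1_unpinned_deps": Threshold(fail_at=0.8, pass_below=0.3),
}

Judge = Callable[[str, str], bool]  # (signal, evidence) -> is it a real finding?


def classify(signal: str, score: float, evidence: str, judge: Judge) -> str:
    t = THRESHOLDS[signal]
    if score >= t.fail_at:
        return "fail"
    if score < t.pass_below:
        return "pass"
    # Ambiguous band: only these cases pay for an LLM-judge call.
    return "fail" if judge(signal, evidence) else "pass"


if __name__ == "__main__":
    calls = []

    def stub_judge(signal: str, evidence: str) -> bool:
        calls.append(signal)
        return False  # stand-in for a real model call

    print(classify("A1_unpinned_deps", 0.9, "", stub_judge))
    print(classify("A1_unpinned_deps", 0.5, "caret range on a helper", stub_judge))
    print("judge calls:", len(calls))
```

The design choice is that the judge only runs inside the ambiguous band, so confident signals stay cheap and deterministic, and only the borderline cases (the ones personas argued about) get a second opinion.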

Honest disclosure

This is simulated user behavior via Claude Haiku 4.5, not real customer interviews. Treat as one signal, not as PMF validation.
The same persona library was previously calibrated on three other products; cross-product comparability is plausible but not proven.
The product context was shown once; real buyers would see Twitter, GitHub stars, friends' opinions, etc.
Some persona quotes may reflect personalab's own design biases (acknowledged in personalab's own meta case study).
Both products here (mcp-doctor and personalab) are mine, and I ran the test myself: bias risk acknowledged.

Reproducibility

```bash
# Clone personalab
git clone https://github.com/g16253470-beep/personalab
cd personalab

# Copy the mcp-doctor case-study runner into this directory, then adapt it to your product:
# https://github.com/weiseer/mcp-doctor/blob/main/case_study/run_personalab.py

# Run on your own product brief
ANTHROPIC_API_KEY=... python run_personalab.py
```

The raw JSON output is at mcp-doctor/case_study/personalab_raw_report.json. Argue with the persona definitions. Fork the case-study runner. If you do this on your own product, the failure modes you find will tell you more than any survey.
