How We Generate 300+ AI Business Ideas a Month With GPT-5 (and Filter the Junk Out)

#ai #startup #indiehackers #productivity

Six months ago I shipped an AI quiz that matches aspiring founders to a business idea they can actually build. The matcher only works if the underlying idea library is large, fresh, and not full of slop. So I had to build the pipeline that fills it.

This post walks through the real architecture: prompt design, the validation gate, the stretch when it silently produced zero ideas for 48 hours, and what we'd cut if we started over. If you're building anything that uses LLMs to generate structured content at scale, you'll probably hit the same walls.

The product this powers is AI Student Factory — but the pipeline is generic.

The constraint that shaped everything

The matcher quiz returns a single idea per user. If that idea is bad, the entire product is bad. So the bar wasn't "generate lots of ideas"; it was "every idea in the library has to be a reasonable way for someone to spend their next 6 months."

That meant each row needed:

A real keyword with real search volume (not made up)
A difficulty score we trust
A 6-step build plan a non-engineer can follow
A summary long enough to be useful (we settled on ≥ 200 characters)
Honest tagging — niche, required skills, monetization model
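
One way to make that checklist executable is a small row type plus a function that lists every rule a candidate breaks. This is a minimal sketch, not our production code: the field names, the 0-100 difficulty scale, and the `rejection_reasons` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical thresholds mirroring the checklist above.
MIN_SUMMARY_CHARS = 200
EXPECTED_BUILD_STEPS = 6


@dataclass
class IdeaRow:
    """One candidate idea. Field names are illustrative, not a real schema."""
    keyword: str
    search_volume: Optional[int]  # must come from keyword data, never from the model
    difficulty: Optional[float]   # assumed 0-100 scale
    summary: str
    build_steps: list[str]
    niche: str
    required_skills: list[str] = field(default_factory=list)
    monetization: str = ""


def rejection_reasons(idea: IdeaRow) -> list[str]:
    """Return every checklist rule the idea breaks; an empty list means it passes."""
    reasons: list[str] = []
    if not idea.search_volume:  # None and 0 both count as "no real volume"
        reasons.append("no verified search volume")
    if idea.difficulty is None:
        reasons.append("no difficulty score")
    if len(idea.build_steps) != EXPECTED_BUILD_STEPS:
        reasons.append(f"expected {EXPECTED_BUILD_STEPS} build steps, got {len(idea.build_steps)}")
    if len(idea.summary.strip()) < MIN_SUMMARY_CHARS:
        reasons.append(f"summary under {MIN_SUMMARY_CHARS} characters")
    if not (idea.niche and idea.required_skills and idea.monetization):
        reasons.append("missing tags")
    return reasons
```

Returning every reason instead of failing on the first one makes rejection logs far more useful when you're trying to work out why acceptance dropped.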

LLMs are bad at all of these by default. They will gleefully invent a keyword volume of "8,400 searches/month" for a phrase no human has ever Googled.
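
The safest fix for invented volumes is to never keep the model's number at all. A minimal sketch, assuming you have keyword volumes from a source you trust; the `volumes` mapping and the `attach_real_volume` helper are hypothetical stand-ins:

```python
from collections.abc import Mapping


def attach_real_volume(ideas: list[dict], volumes: Mapping[str, int]) -> list[dict]:
    """Swap model-claimed search volumes for looked-up ones; drop what can't be verified.

    `volumes` is a hypothetical stand-in for a keyword-data export you trust,
    keyed by lowercased keyword.
    """
    verified = []
    for idea in ideas:
        # Throw away whatever number the model wrote; it is not evidence.
        cleaned = {k: v for k, v in idea.items() if k != "search_volume"}
        volume = volumes.get(cleaned["keyword"].strip().lower())
        if volume:  # unknown or zero-volume keywords are dropped entirely
            cleaned["search_volume"] = volume
            verified.append(cleaned)
    return verified
```

Dropping unverifiable keywords outright, rather than keeping them with a null volume, trades some recall for a library where every number means something.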

Pipeline overview

That's it. Everything else gets accepted and surfaced; the matcher quiz handles ranking at retrieval time, not at insertion time. Loose at ingest, strict at query — same pattern as a search engine.
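
To make "strict at query" concrete, here is a hypothetical retrieval-time ranker. The scoring (skill overlap, a niche bonus, a small difficulty penalty) and the `published` filter are illustrative assumptions, not the matcher's real formula; the point is only that ranking happens per user, after ingest.

```python
def rank_for_user(
    ideas: list[dict],
    user_skills: set[str],
    preferred_niche: str,
    limit: int = 1,
) -> list[dict]:
    """Rank published ideas for one quiz result at query time (hypothetical scoring)."""

    def score(idea: dict) -> float:
        overlap = len(set(idea["required_skills"]) & user_skills)
        niche_bonus = 2 if idea["niche"] == preferred_niche else 0
        difficulty_penalty = idea["difficulty"] / 100  # assumes a 0-100 scale
        return overlap + niche_bonus - difficulty_penalty

    # Ingest accepted everything; unpublished rows are soft-gated out here.
    candidates = [idea for idea in ideas if idea["published"]]
    return sorted(candidates, key=score, reverse=True)[:limit]
```

Because the scoring lives at query time, changing the weights never requires regenerating or re-filtering the library.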

Stage 4: Storage

A single ideas table in Postgres, with:

structured columns for the things we filter on (volume, difficulty, niche)
a jsonb column for the build steps and tags
a published boolean for soft-gating
a summary column with a length check enforced at the DB level (not in app code — DB is the last honest layer)
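
Here is that table sketched with Python's built-in `sqlite3` so it runs anywhere. It is a stand-in, not our Postgres DDL: in Postgres the JSON columns would be `jsonb`, `published` would be a real `boolean`, and the length rule would read `CHECK (char_length(summary) >= 200)`. The column names and the extra `search_volume > 0` check are assumptions.

```python
import json
import sqlite3

# SQLite stand-in for the Postgres ideas table (types differ, constraints carry over).
SCHEMA = """
CREATE TABLE ideas (
    id            INTEGER PRIMARY KEY,
    keyword       TEXT    NOT NULL,
    search_volume INTEGER NOT NULL CHECK (search_volume > 0),
    difficulty    REAL    NOT NULL,
    niche         TEXT    NOT NULL,
    summary       TEXT    NOT NULL CHECK (length(summary) >= 200),
    build_steps   TEXT    NOT NULL,            -- JSON array of any length
    tags          TEXT    NOT NULL,            -- JSON object: skills, monetization
    published     INTEGER NOT NULL DEFAULT 0   -- soft gate; 0/1 in SQLite
)
"""


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_idea(conn: sqlite3.Connection, idea: dict) -> None:
    """Insert one idea; the DB, not this function, enforces the summary rule."""
    conn.execute(
        "INSERT INTO ideas (keyword, search_volume, difficulty, niche, summary, build_steps, tags)"
        " VALUES (?, ?, ?, ?, ?, ?, ?)",
        (
            idea["keyword"],
            idea["search_volume"],
            idea["difficulty"],
            idea["niche"],
            idea["summary"],
            json.dumps(idea["build_steps"]),
            json.dumps(idea["tags"]),
        ),
    )
```

Inserting an idea whose summary is under 200 characters raises `sqlite3.IntegrityError` no matter which code path wrote it, which is the whole argument for putting the rule in the schema. Storing `build_steps` as a JSON array also means a 4-step idea needs no migration.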

Don't put the build plan in 6 separate columns. You will regret it the moment you want to support a 4-step idea.

What I'd cut if starting over

Cut: the "trend signal" inputs. I scraped Reddit/HN/PH for trending topics and fed them in as inspiration. The output got worse, not better — the model latched onto whatever was in the seed and ignored its training. Now I just give it a niche category and let it cook.
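
The niche-only approach needs nothing more than a template. A hypothetical sketch of such a prompt builder; the wording is illustrative, not our actual prompt:

```python
def build_generation_prompt(niche: str, count: int = 10) -> str:
    """Seed the model with a niche category only; no scraped trend text."""
    return (
        f"Propose {count} business ideas in the '{niche}' niche that one person "
        "could realistically build over the next six months, with a build plan "
        "a non-engineer can follow. For each idea, give the target keyword, a "
        "summary of at least 200 characters, the build steps, required skills, "
        "and a monetization model. Do not estimate search volume."
    )
```

Keeping the only variable input this small also makes prompt versions easy to diff.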

Cut: model ensembling. I tried running each idea through two models and merging. It was 2× the cost for a noise-level quality improvement. Pick one good model and trust it.

Keep: per-row provenance. Every idea stores the prompt version, model, and timestamp that generated it. When quality drifts, I can diff prompt versions against acceptance rate. This caught a regression three weeks ago that would otherwise have been invisible.
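
A minimal sketch of that provenance loop, assuming rejected candidates are logged with the same fields as accepted ones (an acceptance rate can't be computed from survivors alone). The `stamp` and `acceptance_rates` helpers and the `accepted` flag are hypothetical names:

```python
from collections import defaultdict
from datetime import datetime, timezone


def stamp(candidate: dict, prompt_version: str, model: str) -> dict:
    """Attach provenance to a candidate at generation time (before filtering)."""
    return {
        **candidate,
        "prompt_version": prompt_version,
        "model": model,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }


def acceptance_rates(candidates: list[dict]) -> dict[tuple[str, str], float]:
    """Share of candidates that passed the gate, per (prompt_version, model)."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    passed: dict[tuple[str, str], int] = defaultdict(int)
    for c in candidates:
        key = (c["prompt_version"], c["model"])
        totals[key] += 1
        passed[key] += bool(c["accepted"])
    return {key: passed[key] / totals[key] for key in totals}
```

Comparing those rates after each prompt change is enough to spot a version that quietly lowered acceptance.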

The boring lessons

Most of what made this work was operational, not clever:

Log everything. The 48-hour silent failure happened because I had no alerting on a metric I assumed was self-evident.
Validate at the DB. App code lies, schemas don't.
Tool calling > JSON mode > "please respond in JSON". Always.
Decouple generation from filtering. They have different latency and cost profiles.
Run the pipeline against yesterday's data before you trust today's.
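
The first lesson is the one that would have caught the 48-hour outage: alert on output, not on process health. A minimal sketch, assuming you can pull timezone-aware UTC timestamps for recently accepted ideas; the `output_alerts` helper, the 24-hour window, and the threshold of one idea are illustrative, so tune them to your own volume:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def output_alerts(
    accepted_at: list[datetime],
    now: Optional[datetime] = None,
    window: timedelta = timedelta(hours=24),
    min_expected: int = 1,
) -> list[str]:
    """Return alert messages when too few ideas were accepted recently.

    Watches pipeline output, not process health: a run that "succeeds" while
    producing nothing still trips this check. Timestamps must be tz-aware (UTC).
    """
    now = now or datetime.now(timezone.utc)
    recent = [t for t in accepted_at if now - t <= window]
    if len(recent) < min_expected:
        return [f"only {len(recent)} ideas accepted in the last {window}"]
    return []
```

Pick a window longer than your longest normal gap between generation runs. At roughly 300 ideas a month (about 10 a day), even a 24-hour window would have paged halfway through a 48-hour silent failure.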

The full product — the quiz that uses this library to match people to ideas — is at aistudentfactory.com. Happy to dig into any specific piece in the comments.

If you're building something similar and want to compare notes, my email is in the footer of the site. I read everything.
