Building a 52,000-Investor Database From Scratch With No Funding

Most people assume you need money to find money. That assumption cost me six months I will never get back.

How It Started

I launched MentionFox without a war chest, without a warm intro to a partner at a16z, and without a BD person whose job was to "build relationships in the VC ecosystem." What I had was a product that scraped, parsed, and organized public signals across the web, and a stubborn belief that the same infrastructure I was building for our customers could solve my own problem first.

The problem was simple to describe and brutal to live with. I needed investors. Not the idea of investors. Not a list of 200 firms copy-pasted from Crunchbase. I needed the right people, at the right firms, writing checks at the right stage, who had actually touched a company like mine in the last 18 months. That specificity is where most founder databases fall apart. They give you a phone book. They do not give you a lead.

So I decided to build my own. And because I had no budget, I had to be disciplined in a way that funded teams never are. Every data point had to earn its place.

What I Actually Built and How

The first version was embarrassing. A Google Sheet with 300 rows, sourced from LinkedIn searches, AngelList profiles, and a few Substack newsletters I had bookmarked. I thought I was being clever. I was being slow. The sheet was already stale by the time I finished populating it.

The second version was where things changed. I started treating investor discovery the same way I treated media monitoring. Instead of waiting for investors to announce themselves, I tracked the signals they were already leaving in public. Podcast appearances. Conference speaker slots. Twitter threads about specific verticals. Blog posts on firm websites. Portfolio company press releases that revealed a thesis. Quotes in TechCrunch deal announcements. Each of these signals told me something a database field never could: what this person actually cared about right now, not what their firm bio said they cared about two years ago.
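To make the idea of a "signal" concrete, here is a minimal sketch of how one might be recorded and rolled up per investor. It is an illustration under assumptions, not the schema MentionFox uses: the `Signal` fields, the `recent_interests` helper, and the sample names and URLs are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signal:
    """One public trace an investor left behind (hypothetical schema)."""
    investor: str
    kind: str       # e.g. "podcast", "conference_talk", "blog_post", "press_quote"
    topic: str      # the vertical or theme the signal was about
    seen_on: date
    url: str


def recent_interests(signals, since):
    """Map each investor to a Counter of topics seen on or after `since`."""
    interests = defaultdict(Counter)
    for s in signals:
        if s.seen_on >= since:
            interests[s.investor][s.topic] += 1
    return dict(interests)


# Made-up sample data for illustration only.
signals = [
    Signal("Ana Ruiz", "podcast", "b2b saas", date(2024, 5, 2), "https://example.com/ep1"),
    Signal("Ana Ruiz", "blog_post", "b2b saas", date(2024, 4, 18), "https://example.com/post"),
    Signal("Ana Ruiz", "conference_talk", "fintech", date(2021, 3, 1), "https://example.com/talk"),
]
print(recent_interests(signals, since=date(2024, 1, 1)))
```

The point of keeping the date on every signal is that the roll-up can answer "what does this person care about right now" rather than "what have they ever mentioned".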

I built structured scraping workflows that pulled this into a central store. Then I layered in deduplication, firm-level context, and a simple scoring model. Investors who had recently posted about B2B SaaS tooling, who had led at least one Seed or Series A in the last 12 months, and who had a check size that made sense for where I was got a higher score. Investors who had not touched the category in three years, or whose last deal was a Series D, got deprioritized regardless of how famous their name was. This sounds obvious. Almost no one does it.
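A scoring model along these lines can be very small. The sketch below is one possible shape, not the production model: the `Investor` fields, the point weights, and the six-month window for "recently posted" are assumptions chosen for illustration, and the deduplication key (normalized name plus firm) is the simplest thing that could work.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Investor:
    name: str
    firm: str
    last_category_post: Optional[date]  # latest public post about the target category
    last_led_round: Optional[str]       # e.g. "Seed", "Series A", "Series D"
    last_led_date: Optional[date]
    check_min: int                      # typical check range, in USD
    check_max: int


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def dedupe(investors):
    """Keep the first record seen for each normalized (name, firm) pair."""
    seen = {}
    for inv in investors:
        key = (inv.name.strip().lower(), inv.firm.strip().lower())
        seen.setdefault(key, inv)
    return list(seen.values())


def score(inv: Investor, target_check: int, today: date) -> int:
    """Higher is better. Weights are placeholders, not tuned values."""
    points = 0
    post_age = (months_between(inv.last_category_post, today)
                if inv.last_category_post else None)
    # Recently posted about the category (6-month window is an assumption).
    if post_age is not None and post_age <= 6:
        points += 3
    # Led at least one Seed or Series A in the last 12 months.
    if (inv.last_led_round in {"Seed", "Series A"} and inv.last_led_date
            and months_between(inv.last_led_date, today) <= 12):
        points += 4
    # Check size makes sense for the round being raised.
    if inv.check_min <= target_check <= inv.check_max:
        points += 2
    # Deprioritize: no category activity in three years, or last deal a Series D.
    if post_age is None or post_age >= 36:
        points -= 3
    if inv.last_led_round == "Series D":
        points -= 3
    return points


# Made-up records for illustration only.
today = date(2024, 6, 1)
active = Investor("Ana Ruiz", "Example Capital", date(2024, 4, 10),
                  "Seed", date(2024, 1, 15), 250_000, 1_000_000)
famous = Investor("Big Name", "Famous Fund", date(2020, 5, 1),
                  "Series D", date(2023, 11, 1), 5_000_000, 20_000_000)
ranked = sorted(dedupe([active, famous]),
                key=lambda i: score(i, 500_000, today), reverse=True)
print([i.name for i in ranked])
```

Note that nothing in the score rewards fund size or name recognition, which is the whole design choice: only recent, relevant behavior moves a record up.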

By month four, the database had crossed 10,000 records. By month eight, 30,000. The jump to 52,000 came from expanding the signal sources: international LP databases, government-backed fund disclosures in the UK and EU, family office filings, and corporate venture arms that most founders completely ignore because they do not fit the classic VC mold. Corporate VCs wrote a lot of checks last year. They are faster to close than you think. They were dramatically underrepresented in every investor list I had ever seen.

The quality gate mattered as much as the volume. I ran periodic audits where I would pull 100 random records and manually verify them. Firm still active? Check. Partner still at the firm? Check. Fund still open and deploying? Check. At month twelve, my accuracy rate on active, reachable investors was above 85 percent. That number matters because outreach to a stale record is not just wasted effort. If you are emailing, bounces from dead addresses also damage your sender reputation and, with it, your deliverability.
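The audit itself is simple to mechanize, even though the verification stays manual. The sketch below assumes each sampled record has already been checked by a person and carries three hypothetical boolean flags (`firm_active`, `partner_at_firm`, `fund_open`); the function names are illustrative, not taken from any real tool.

```python
import random
from typing import Callable, Optional, Sequence


def passes_manual_checks(record: dict) -> bool:
    """True only if a reviewer confirmed all three checks for this record."""
    return bool(record["firm_active"]
                and record["partner_at_firm"]
                and record["fund_open"])


def audit_accuracy(records: Sequence[dict],
                   is_valid: Callable[[dict], bool] = passes_manual_checks,
                   sample_size: int = 100,
                   seed: Optional[int] = None) -> float:
    """Share of a uniform random sample (without replacement) that passes."""
    if not records:
        raise ValueError("no records to audit")
    rng = random.Random(seed)
    k = min(sample_size, len(records))
    sample = rng.sample(list(records), k)
    return sum(1 for r in sample if is_valid(r)) / k
```

With a sample of 100, the result is only an estimate of the accuracy of the full database, so it is worth repeating the audit on fresh samples rather than trusting a single pass.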

What the Data Taught Me About Investor Outreach

Three things surprised me when I started actually using the database.

First, recency of activity is a better filter than AUM or prestige. A managing partner at a $500M fund who has not led a deal in nine months is a worse lead than an emerging manager who closed two deals last quarter. The signal that someone is actively deploying capital is worth more than any credential on their bio page.

Second, investors who engage with content in your category before you reach out convert at roughly four times the rate of cold outreach to people who only match on paper. When I could see that someone had recently shared an article about AI-native B2B tools, my reply rate on a first email jumped from around 4 percent to somewhere between 15 and 20 percent. That is not a small difference. That is the difference between a process that feels like shouting into a void and one that feels like a conversation.
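The arithmetic behind "roughly four times" is easy to check from the reply rates quoted above. The helper names here are hypothetical, and the batch of 100 emails is only an example size.

```python
def lift(baseline_pct: float, observed_pct: float) -> float:
    """How many times higher the observed reply rate is than the baseline."""
    if baseline_pct <= 0:
        raise ValueError("baseline must be positive")
    return observed_pct / baseline_pct


def expected_replies(emails_sent: int, reply_rate_pct: float) -> float:
    """Expected number of replies for a batch at a given reply rate."""
    return emails_sent * reply_rate_pct / 100


print(lift(4, 15), lift(4, 20))                              # 3.75 5.0
print(expected_replies(100, 4), expected_replies(100, 15))   # 4.0 15.0
```

So the 15 to 20 percent range corresponds to a lift of 3.75x to 5x over the 4 percent baseline, or about 15 to 20 replies per 100 first emails instead of 4.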

Third, the database itself is a leverage point in calls. When you can walk into a conversation and say "I noticed your last three investments all had this characteristic in common, and here is how we fit that pattern," you stop sounding like a founder pitching and start sounding like someone who has done the work. Investors notice. It shifts the dynamic in a way that no pitch deck polish ever will.

The Practical Version for Founders Who Cannot Build This Themselves

I am not going to pretend the path I took is repeatable for most founders. Building a scraping and enrichment infrastructure from scratch while also running a company is not a good use of your time unless that infrastructure is itself your product. It was for me. For most people, it should not be.

What I would tell you to do instead is think about your investor search the way you would think about a sales pipeline. Define your ideal investor profile with the same rigor you would use for an ICP. Stage, check size, vertical focus, geographic preference, recent activity, fund age. Then find a tool or a workflow that lets you filter on those dimensions in real time rather than working from a static list someone exported six months ago.
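If it helps to see the ideal investor profile written down as something you can filter on, here is a minimal sketch. Every name in it (`InvestorProfile`, `InvestorRecord`, `matches`, and the field names) is hypothetical, and treating all six dimensions as hard requirements is an assumption; you might prefer to make some of them soft preferences.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InvestorProfile:
    """What you are looking for (the investor-side equivalent of an ICP)."""
    stages: frozenset
    check_size: int                  # the check you want from one investor, in USD
    verticals: frozenset
    geographies: frozenset
    max_months_since_last_deal: int
    max_fund_age_years: int


@dataclass(frozen=True)
class InvestorRecord:
    """What you know about one investor."""
    name: str
    stages: frozenset
    check_min: int
    check_max: int
    verticals: frozenset
    geographies: frozenset
    last_deal: date
    fund_vintage_year: int


def matches(profile: InvestorProfile, rec: InvestorRecord, today: date) -> bool:
    """True if the record satisfies every dimension of the profile."""
    months_idle = ((today.year - rec.last_deal.year) * 12
                   + (today.month - rec.last_deal.month))
    return (
        bool(profile.stages & rec.stages)
        and rec.check_min <= profile.check_size <= rec.check_max
        and bool(profile.verticals & rec.verticals)
        and bool(profile.geographies & rec.geographies)
        and months_idle <= profile.max_months_since_last_deal
        and today.year - rec.fund_vintage_year <= profile.max_fund_age_years
    )


# Made-up example values.
profile = InvestorProfile(frozenset({"Seed"}), 500_000, frozenset({"b2b saas"}),
                          frozenset({"US", "UK"}), 12, 4)
candidate = InvestorRecord("Ana Ruiz", frozenset({"Seed", "Series A"}), 250_000,
                           1_000_000, frozenset({"b2b saas", "devtools"}),
                           frozenset({"UK"}), date(2024, 2, 1), 2022)
print(matches(profile, candidate, today=date(2024, 6, 1)))
```

Writing the profile down this explicitly is most of the value; the filter is trivial once you have been forced to pick a number for each dimension.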

The feature we eventually built into MentionFox for this is what I still use every week. You can find investors using live signals rather than static database entries, which means the results you see reflect what someone is actually doing right now, not what their firm website says they do. That distinction, between declared thesis and revealed behavior, is the whole game. If you want to understand what it costs to access that and the rest of the platform, the pricing page lays it out without requiring a sales call.

The 52,000 records I built are now embedded in the product. I did not build them to flex about scale. I built them because I was desperate and had no other option, and somewhere in that desperation I found a method that actually worked.

If you found this useful, I write about solo-founder distribution, B2B SaaS, and what's actually working in the AI-search era over on my Substack (one post per week, no spam).

I'm building MentionFox - a B2B intelligence suite that combines brand mention tracking with AI-visibility (GEO) measurement, investor research, and outreach automation. There's a free tier and a 5-day trial of Pro at mentionfox.com/pricing.

DEV Community
