NexGenData

Posted on • Originally published at thenextgennexus.com

How VC Associates Source Pre-Seed Deals Before Crunchbase Hears About Them

The hardest part of a pre-seed associate's job isn't evaluating deals. It's finding them.

By the time a deal shows up in Crunchbase, it's usually already done. Crunchbase's median lag between a Form D filing and the corresponding listing on its public site is roughly 11 days, and that lag stretches to 30+ days for international or non-press-covered rounds. By then, half a dozen Tier 1 firms have already met with the founder, three have made offers, and the round is either oversubscribed or closed.

Top-quartile sourcing teams — the ones consistently winning competitive deals — aren't using Crunchbase to find companies. They're using Crunchbase to track companies they already met. The actual sourcing happens 4-8 weeks earlier, in public datasets that Crunchbase doesn't even ingest.

This post is the playbook I've watched several emerging-manager funds use to consistently surface pre-seed deals before the rest of the market. None of it requires paid data; all of it can be automated. (The NexGenData Startup Funding Tracker and YC Companies Directory actors wrap most of these sources if you'd rather not build it yourself.)

The Sourcing Funnel: From Signal to Term Sheet

A typical pre-seed sourcing funnel looks like this:

  1. Signal source — public dataset that emits a "company exists" event
  2. Enrichment — pull domain, founders, team size, location, sector
  3. Filtering — apply your fund's thesis filters (B2B SaaS, vertical-AI, US-only, etc.)
  4. Outreach trigger — first contact with founder

The Tier 1 firms shorten the gap between step 1 and step 4 to under 10 days. The median fund takes 6-12 weeks to get from "saw company on Crunchbase" to "first call." That gap is where deals are won.

The following signal sources compress step 1 from weeks to hours.

Signal 1: Delaware Division of Corporations

The Delaware Division of Corporations updates its public search index daily. Every Delaware C-corp formation appears within 24 hours of filing. Filter for:

  • Stripe Atlas, Carta, Clerky, or Mercury as registered agent: these formation services skew heavily toward venture-backed tech startups
  • Brand-able company names (3-12 characters, .com domain available) — strong signal of marketing intent vs holding-co/spinout
  • Formation date within last 14 days — fresh enough that founders haven't started fundraising yet, but late enough that the company is real

A typical week produces 200-400 Stripe-Atlas-formed Delaware C-corps. Maybe 80-100 of them clear the brand-ability + thesis filter. Of those, ~30 are worth a 5-minute LinkedIn check on the founders. Of those, ~5 are worth an outbound email.

That's a sourcing volume of ~5 founder cold outbounds per week from a single signal source — and you're talking to founders 4-8 weeks before they raise.
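The formation filters in this section can be sketched in a few lines of Python. This is a minimal illustration, not a working Delaware client: the `Formation` record shape, its field names, and the suffix list are assumptions of mine, and the .com-availability check is left out because it needs a separate lookup.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record shape; real Delaware search output may use other field names.
@dataclass
class Formation:
    name: str
    registered_agent: str
    formed_on: date

# Agent names taken from the filter list above, lowercased for comparison.
TARGET_AGENTS = {"stripe atlas", "carta", "clerky", "mercury"}
# Assumed set of legal suffixes to ignore when measuring name length.
LEGAL_SUFFIXES = (", inc.", " inc.", " inc", " corp.", " corp", " corporation")

def brand_name(legal_name: str) -> str:
    """Strip a trailing legal suffix so 'Acme, Inc.' is measured as 'Acme'."""
    stripped = legal_name.strip()
    lowered = stripped.lower()
    for suffix in LEGAL_SUFFIXES:
        if lowered.endswith(suffix):
            return stripped[: len(lowered) - len(suffix)].strip(" ,")
    return stripped

def passes_filter(f: Formation, today: date, max_age_days: int = 14) -> bool:
    """Apply the agent, name-length (3-12 chars), and freshness filters."""
    agent_ok = f.registered_agent.strip().lower() in TARGET_AGENTS
    name_ok = 3 <= len(brand_name(f.name)) <= 12
    age_days = (today - f.formed_on).days
    fresh = 0 <= age_days <= max_age_days
    return agent_ok and name_ok and fresh

today = date(2026, 1, 15)
hit = Formation("Acme, Inc.", "Stripe Atlas", date(2026, 1, 10))
print(passes_filter(hit, today))
```

Everything that survives this pass still needs the manual thesis check and the 5-minute LinkedIn look described above.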

Signal 2: Y Combinator Pre-Demo-Day

Y Combinator publishes its full alumni roster, including current-batch companies, well before demo day. The data is queryable through YC's Algolia search index with no authentication. For each current-batch company you can pull: founders, team size, brief description, sector, location, and (often) website.

The window between "YC accepts the company" and "demo day" is the cleanest sourcing window in the market. Founders are at YC, focused on building, NOT yet meeting investors at scale. A thoughtful intro email at week 4 of the batch — referencing something specific about the founder's background — has a 30-40% reply rate. By demo day, that same email has a 5-10% reply rate, because the founder is now buried in 200 inbound investor pings.

The catch: YC publishes batch metadata gradually. New W26 entries appear on the YC alumni page in waves throughout the batch. Polling daily catches each wave within 24 hours.
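Catching each wave comes down to diffing today's snapshot of the batch against yesterday's. Here is a rough sketch of that step only. It assumes you have already fetched both snapshots as lists of dicts, and that each company carries a stable identifier; the `slug` key is a hypothetical name, not a confirmed field of YC's index.

```python
def new_entries(previous: list[dict], current: list[dict], key: str = "slug") -> list[dict]:
    """Return companies present in the current snapshot but not the previous one."""
    seen = {company[key] for company in previous}
    return [company for company in current if company[key] not in seen]

# Illustrative data only; these are not real YC companies.
yesterday = [{"slug": "alpha", "name": "Alpha"}]
today = [{"slug": "alpha", "name": "Alpha"}, {"slug": "beta", "name": "Beta"}]

for company in new_entries(yesterday, today):
    print(company["name"])
```

Storing each day's snapshot to disk and running the diff once a morning is enough to keep the detection lag under 24 hours.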

Signal 3: SEC Form D — The Confirmation Source

Form D isn't a leading indicator — it's the legal record that money has already moved. But it's the most precise source for "this company just raised X dollars." For mid-stage tracking (Series A through Series C), Form D is the most reliable single source available.

For pre-seed sourcing specifically, Form D plays a different role: it's the confirmation that a deal you've been tracking actually closed. If you've been talking to a founder for 4 months and they finally file a Form D for $500K with 5 investors, you know the round closed without you in it. That tells you something about your conversion funnel.

Form D filings also leak more round detail than any other public source. The "totalAmountSold" field is the actual amount raised; "totalNumberAlreadyInvested" is the count of investors. From those two numbers plus the issuer's age, you can infer round name with reasonable accuracy:

  • ~$100K-$500K, 1-3 investors, company <12 months old → pre-seed
  • ~$500K-$2M, 3-10 investors, company <18 months old → seed
  • ~$2M-$8M, 5-15 investors, company 12-30 months old → seed extension or Series A
  • ~$8M-$30M, 8-25 investors, company 18-48 months old → Series A or B
  • $30M+ → Series B or later
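The bands above translate directly into a small classifier. Treat this as a sketch under a few assumptions of mine: dollar bands are half-open (a $500K raise falls into the seed band, not pre-seed), the first matching band wins, and anything that fits no band is returned as "unclassified" rather than guessed.

```python
def infer_round(amount_sold: float, investors: int, age_months: float) -> str:
    """Guess a round name from Form D amount, investor count, and company age."""
    if amount_sold >= 30_000_000:
        return "Series B or later"
    if 100_000 <= amount_sold < 500_000 and 1 <= investors <= 3 and age_months < 12:
        return "pre-seed"
    if 500_000 <= amount_sold < 2_000_000 and 3 <= investors <= 10 and age_months < 18:
        return "seed"
    if 2_000_000 <= amount_sold < 8_000_000 and 5 <= investors <= 15 and 12 <= age_months <= 30:
        return "seed extension or Series A"
    if 8_000_000 <= amount_sold < 30_000_000 and 8 <= investors <= 25 and 18 <= age_months <= 48:
        return "Series A or B"
    # The bands leave gaps (e.g. a small raise with many investors); don't guess.
    return "unclassified"

print(infer_round(300_000, 2, 6))
print(infer_round(500_000, 5, 10))
```

In practice the "unclassified" bucket is worth a manual look, since odd combinations often mean a bridge, an SPV, or a non-venture issuer.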

Signal 4: GitHub Founder Activity

A surprisingly underused signal: GitHub's public API exposes user activity, repo creation, and stars-given patterns. A senior engineer who suddenly creates a new GitHub organization, transfers their personal projects to it, and starts pushing commits at 2am on weekends is, statistically, working on a startup.

Cross-reference the GitHub username against LinkedIn (often the same handle). If LinkedIn shows the person left a senior role at a brand-name tech company in the last 6 months, the GitHub-org-creation signal is high-precision: they're stealth-building.

This is the only signal in this post that requires real engineering investment. The GitHub Events API rate-limits hard at 5K requests/hour authenticated, and you're tracking thousands of senior engineers. Most funds either skip this signal entirely or build a focused tracker for the top 500 engineers in their target verticals.
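To see why a focused list of 500 engineers is the practical ceiling for most funds, it helps to work out the request budget. The sketch below is plain arithmetic against the 5K requests/hour limit; `requests_per_user` is an assumption, since the real cost per engineer depends on how many endpoints and pages you fetch.

```python
def min_poll_interval_minutes(num_users: int, requests_per_user: int, hourly_limit: int = 5000) -> float:
    """Shortest interval at which every tracked user can be re-polled within the hourly limit."""
    requests_per_sweep = num_users * requests_per_user
    return 60 * requests_per_sweep / hourly_limit

# 500 engineers at one request each: a full sweep fits in 6 minutes of budget.
print(min_poll_interval_minutes(500, 1))
# 10,000 engineers at three requests each: one sweep consumes 6 hours of budget.
print(min_poll_interval_minutes(10_000, 3))
```

A focused list leaves plenty of headroom for retries and follow-up lookups; a broad list turns into a queueing problem before it turns into a sourcing signal.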

Signal 5: Conference Speaker Lists

Industry conference speaker lists are one of the highest-precision pre-seed sourcing signals. A founder who speaks at SaaStr, INBOUND, or RSA Conference is, by definition, a domain expert with public speaking comfort and a network — three of the four traits VCs care about (the fourth is "actually building something useful").

Speaker lists publish 2-3 months before conferences. A first-time speaker who just left a senior IC or director role at a brand-name company is, statistically, a high-quality founder candidate. Rate-limit yourself to one cold email per conference per founder; the conversion math holds even with low volume.

Putting It Together: A Concrete Workflow

A pre-seed associate I work with runs this sourcing stack every morning in about 45 minutes:

  1. 6:30am — pull last 24h of Delaware DOC formations matching registered_agent IN (Stripe Atlas, Carta, Clerky, Mercury) AND name LIKE '<3-12 chars>'. ~30 hits.
  2. 6:35am — for each hit, search GitHub for the company name and the formation-date founders. Cross-reference against LinkedIn.
  3. 6:50am — pull last 24h of YC alumni updates. ~5 new entries (some days zero).
  4. 7:00am — pull last 24h of Form D filings, filtered to totalOfferingAmount BETWEEN 200000 AND 5000000. ~40 hits, mostly seed-stage.
  5. 7:10am — categorize hits into 4 buckets: pre-seed-fresh (Delaware-formed last 30d, no Form D yet), seed-just-closed (Form D last 7d, no TechCrunch coverage), seed-press-covered (Form D + TechCrunch this week), and YC-current-batch (current YC batch, not yet demo day).
  6. 7:15am — write 5-10 personalized cold outbound emails based on the highest-priority bucket.
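The bucketing in step 5 is mechanical enough to script. A minimal sketch follows; the `Hit` fields are hypothetical names for whatever your merged records look like, "this week" is read as the last 7 days, and the precedence order (YC first, then Form D buckets, then fresh formations) is my assumption rather than part of the workflow.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical merged record; field names are illustrative only.
@dataclass
class Hit:
    delaware_formed_on: Optional[date] = None
    form_d_filed_on: Optional[date] = None
    techcrunch_covered_on: Optional[date] = None
    in_current_yc_batch: bool = False
    demo_day_passed: bool = False

def within_days(d: Optional[date], today: date, days: int) -> bool:
    """True if d is set and falls within the last `days` days."""
    return d is not None and 0 <= (today - d).days <= days

def bucket(hit: Hit, today: date) -> Optional[str]:
    """Assign one of the four buckets, or None if the hit fits none."""
    if hit.in_current_yc_batch and not hit.demo_day_passed:
        return "YC-current-batch"
    if hit.form_d_filed_on is not None:
        if within_days(hit.techcrunch_covered_on, today, 7):
            return "seed-press-covered"
        if within_days(hit.form_d_filed_on, today, 7) and hit.techcrunch_covered_on is None:
            return "seed-just-closed"
        return None
    if within_days(hit.delaware_formed_on, today, 30):
        return "pre-seed-fresh"
    return None

today = date(2026, 1, 15)
print(bucket(Hit(delaware_formed_on=date(2026, 1, 1)), today))
```

Hits that return None are stale or incomplete and can be dropped from the morning's outreach list.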

At the top of that range, that's roughly 50 cold outbound emails per week. A 25-30% reply rate generates 12-15 first calls per week, of which 2-3 convert to deeper diligence; 3-5 of those per year convert to a deal. That's a pre-seed deployment cadence of about one deal per quarter from a single associate spending 45 minutes each morning.
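The weekly funnel numbers are easy to sanity-check. The snippet below only reproduces the arithmetic from the figures quoted in this post; the function names are mine, and the call-to-diligence range is implied by the quoted counts rather than measured separately.

```python
def first_calls_per_week(emails_per_week: int, reply_rate: float) -> float:
    """Expected first calls if every reply turns into a call."""
    return emails_per_week * reply_rate

def conversion_range(out_low: float, out_high: float, in_low: float, in_high: float) -> tuple[float, float]:
    """Widest conversion-rate range consistent with the given input and output ranges."""
    return (out_low / in_high, out_high / in_low)

low = first_calls_per_week(50, 0.25)
high = first_calls_per_week(50, 0.30)
print(f"first calls per week: {low:.1f} to {high:.1f}")

# 2-3 diligence processes out of 12-15 first calls per week.
diligence_low, diligence_high = conversion_range(2, 3, 12, 15)
print(f"call-to-diligence rate: {diligence_low:.0%} to {diligence_high:.0%}")
```

Swapping in your own reply rate is the quickest way to see how much email volume a given deal cadence would require.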

The actors that wrap most of these signals: NexGenData Startup Funding Tracker for Form D + TechCrunch, YC Companies Directory for YC alumni, Delaware Corporations Search for DE C-corp formations.

The Crunchbase Trap

The reason Crunchbase doesn't help pre-seed sourcing is that Crunchbase's revenue model selects against speed. Crunchbase makes money from selling lists to BD/sales teams, not from being first. The platform optimizes for completeness and editorial verification — both of which add latency. By the time a round is "verified" in Crunchbase, the deal is months old.

The funds that consistently win competitive deals figured out long ago that Crunchbase is a tracking tool, not a sourcing tool. The sourcing happens upstream — in formations, in conference speaker lists, in GitHub activity, in YC's pre-demo-day batches. By the time the round hits Crunchbase, you should already know about it because you talked to the founder 8 weeks ago.


NexGenData publishes 195+ buyer-intent actors covering early-stage signal sources: Delaware DOC, SEC EDGAR, YC alumni, Show HN, Product Hunt, Hacker News engagement, and more. All actors are pay-per-result.
