How to Build a Startup Lead List from YC Company Data

#marketing #api #webscraping #opensource

If you sell to startups, recruit engineers, or build a book of business off freshly funded teams, the Y Combinator company directory is one of the highest-signal lead sources on the public internet. Every company in YC has been vetted, funded, and pushed through a structured program — they have budget, urgency, and a habit of buying tools quickly. The catch is that the directory was built for browsing, not outbound. Pulling a clean, segmented lead list out of it is where most GTM teams stall. This post is a tactical playbook for turning YC into a working pipeline asset.

The Problem: YC's Directory Is Built for Browsing, Not Pipeline

Anyone who has tried to source startups manually from ycombinator.com/companies knows the pain. The UI lets you filter by batch, industry, region, and a handful of tags, but you cannot export. You cannot save a segment. You cannot stack a "founded in 2024, hiring engineers, based in SF, NOT in stealth" filter and walk away with a CSV. Copy-paste works for ten rows. By row fifty, you have lost an afternoon and your data is already stale because S25 just dropped.

The other failure mode is static lists. People download a one-off CSV, load it into Apollo or HubSpot, and run sequences against it for six months. The YC directory is a living dataset: companies change status (Active, Acquired, Public, Inactive), team sizes move, and new batches launch several times a year. A list that is clean in March is full of zombies by September, and your reply rates quietly tank.

The third gap is segmentation depth. The native filters are coarse. You cannot easily ask "show me every Series A-stage YC company in fintech, US-based, 11-50 employees, with an open engineering role, excluding stealth." That is a textbook ICP query for an SDR team, and you cannot run it in the UI.

Why YC Company Data Matters for Outbound

YC companies are unusually attractive buyers for B2B sellers. They are well-capitalized: a fresh-batch company typically receives YC's standard $500K investment, and many raise a seed round on SAFEs shortly after Demo Day. They move fast on tooling because the founders are usually the buyers, with no procurement gauntlet. And they have a multi-year window where the stack is still being chosen. If you land a YC company in months 0-12, you are usually in before they pick an incumbent, which means you ride the expansion curve from 5 to 50 to 500 employees.

For recruiters, the math is even better. YC companies hire aggressively in years 1-3 and pay above market for senior engineering, product, and GTM talent. A founder posting their first three engineering roles is a warm intro waiting to happen. For VCs and corp dev, the directory is a real-time map of who is shipping in any vertical. For agencies, journalists, and BD professionals, it is the cleanest public list of which startups are alive, who runs them, and what they do. The value of a YC lead decays over time — the earlier you reach a company, the higher the conversion.

What the YC Companies Directory Scraper Extracts

The YC Companies Directory Scraper on Apify pulls a structured row for every company in the directory, from batch S05 forward through the latest cohort. For each company you get roughly thirty fields:

Identity: company name, slug, YC profile URL, primary website, logo URL
Batch metadata: batch code (W25, S25, W24, etc.), batch year, top-company badge
Positioning: one-line tagline, long description, primary industry, sub-industries, regions, locations
Operational signals: current status (Active, Acquired, Public, Inactive), team size, hiring flag, careers/jobs URL
People: founder names and titles where listed on the YC profile
Geo: primary city, country, remote flag

That is enough surface area to build a tight ICP filter. Stack conditions like "batch in (W25, S25) AND status = Active AND team_size 5-50 AND hiring = true AND region = US" and you end up with a few hundred companies that are exactly your target — not a few thousand junk rows to triage.
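A stacked filter like this is just a chain of AND conditions over each row. Here is a minimal Python sketch, assuming each scraped row is a dict with hypothetical keys `batch`, `status`, `team_size`, `is_hiring`, and `country`; rename them to match the columns in your actual export.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ICP:
    """Stacked ICP conditions; a row qualifies only if every condition holds."""
    # Field names and values are hypothetical; match them to your export.
    batches: frozenset = frozenset({"W25", "S25"})
    statuses: frozenset = frozenset({"Active"})
    min_team: int = 5
    max_team: int = 50
    require_hiring: bool = True
    countries: frozenset = frozenset({"US"})

    def matches(self, row: dict) -> bool:
        team = row.get("team_size")
        if team is None:
            return False  # an unknown team size cannot satisfy a size band
        return (
            row.get("batch") in self.batches
            and row.get("status") in self.statuses
            and self.min_team <= team <= self.max_team
            and (bool(row.get("is_hiring")) or not self.require_hiring)
            and row.get("country") in self.countries
        )


rows = [
    {"name": "Acme", "batch": "W25", "status": "Active", "team_size": 12, "is_hiring": True, "country": "US"},
    {"name": "Beta", "batch": "S25", "status": "Acquired", "team_size": 30, "is_hiring": True, "country": "US"},
    {"name": "Gamma", "batch": "S25", "status": "Active", "team_size": 3, "is_hiring": True, "country": "US"},
]
icp = ICP()
qualified = [r["name"] for r in rows if icp.matches(r)]  # only Acme passes every condition
```

Keeping the ICP as a frozen object makes it easy to version one definition per segment (say, a fintech ICP and a dev-tools ICP) and reuse it on every refresh.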

Example Workflow: From Directory Dump to Live Sequence

Here is the concrete five-step playbook GTM teams run on this data. End-to-end it takes about an hour the first time, roughly ten minutes per refresh after that.

Step 1 — Pull the batch slice. Run the YC scraper with the batches you care about (typically the two most recent, plus the prior year for longer cycles). Export to CSV or push to Google Sheets. For most SDR teams the right slice is the last 18 months, which gives you 600-1,200 companies.
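If you want the slice to follow the calendar instead of hard-coding batch codes, you can derive it from the codes themselves. This sketch assumes codes are a season letter plus a two-digit year, with W, X, S, and F mapped here to Winter, Spring, Summer, and Fall; the start months are rough placeholders used only to order batches, not official dates.

```python
import re
from datetime import date

# Approximate start months, used only to put batches in chronological order.
# The letter-to-season mapping is an assumption; adjust it to your export.
SEASON_MONTH = {"W": 1, "X": 4, "S": 7, "F": 10}
BATCH_RE = re.compile(r"^([WXSF])(\d{2})$")


def batch_start(code: str) -> date:
    """Convert a batch code like 'W25' into an approximate start date."""
    m = BATCH_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized batch code: {code!r}")
    season, yy = m.groups()
    return date(2000 + int(yy), SEASON_MONTH[season], 1)


def recent_batches(codes, today: date, months: int = 18) -> list[str]:
    """Return batch codes that started within `months` of `today`, newest first."""
    # Step back `months` calendar months from today to find the cutoff date.
    total = today.year * 12 + (today.month - 1) - months
    cutoff = date(total // 12, total % 12 + 1, 1)
    keep = {c.strip().upper() for c in codes if cutoff <= batch_start(c) <= today}
    return sorted(keep, key=batch_start, reverse=True)


print(recent_batches(["W24", "S24", "F24", "W25", "X25", "S25"], today=date(2025, 9, 15)))
# ['S25', 'X25', 'W25', 'F24', 'S24']
```

Feed the resulting list into whatever batch input the actor exposes, so each scheduled run slides the window forward on its own.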

Step 2 — Apply your ICP filter. Filter by status (drop Inactive and Acquired unless those fit your persona), team size, industry, and hiring flag. A typical SDR ICP cuts the raw list by 60-80%. You are now sitting on the qualified subset.
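CSV exports flatten everything to strings, so booleans and team sizes need parsing before you filter. A minimal sketch with Python's standard `csv` module, assuming hypothetical column names `status`, `team_size`, `industry`, and `is_hiring`; it writes the qualified subset and returns the share of rows cut, so you can sanity-check the result against that 60-80% range.

```python
import csv
import io

# Statuses to drop; keep Acquired if post-acquisition teams fit your persona.
DROP_STATUSES = {"Inactive", "Acquired"}


def parse_bool(value: str) -> bool:
    """Interpret common CSV spellings of a true flag."""
    return value.strip().lower() in {"true", "1", "yes"}


def filter_export(src, dst, industries, min_team=5, max_team=50) -> float:
    """Copy qualified rows from the src CSV to the dst CSV and return the fraction cut."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
    writer.writeheader()
    total = kept = 0
    for row in reader:
        total += 1
        team = (row.get("team_size") or "").strip()
        if (
            row.get("status") not in DROP_STATUSES
            and team.isdigit()  # skips blank or non-numeric sizes
            and min_team <= int(team) <= max_team
            and row.get("industry") in industries
            and parse_bool(row.get("is_hiring") or "")
        ):
            writer.writerow(row)
            kept += 1
    return 1 - kept / total if total else 0.0


raw = io.StringIO(
    "name,status,team_size,industry,is_hiring\n"
    "Acme,Active,12,Fintech,true\n"
    "Beta,Acquired,20,Fintech,true\n"
    "Gamma,Active,200,Fintech,true\n"
    "Delta,Active,8,B2B,false\n"
    "Epsilon,Active,7,Fintech,TRUE\n"
)
out = io.StringIO()
cut = filter_export(raw, out, industries={"Fintech"})
print(f"cut {cut:.0%} of rows")  # cut 60% of rows
```

For real files, open both with `newline=""` as the `csv` module recommends and pass the file objects in place of the `StringIO` buffers.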

Step 3 — Enrich for contacts. The YC profile gives you founder names but not direct emails. Pipe the qualified list through contact-info-scraper against each company website to surface published emails, phones, and socials. For deeper coverage on founder and exec emails, run lead-list-enricher to append titles, LinkedIn URLs, and verified work emails.
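Before handing the list to a website-level enrichment actor, normalize and deduplicate the domains so you do not pay to crawl the same site twice. This sketch assumes a hypothetical `website` column and emits `{"url": ...}` objects, the shape many Apify actors accept under `startUrls`; check the input schema of the actor you actually run.

```python
from urllib.parse import urlparse


def build_start_urls(rows, website_key="website"):
    """Turn qualified rows into a deduplicated list of {"url": ...} start URLs."""
    seen = set()
    start_urls = []
    for row in rows:
        raw = (row.get(website_key) or "").strip()
        if not raw:
            continue  # no website on the profile, nothing to crawl
        if "://" not in raw:
            raw = "https://" + raw  # bare domains such as "acme.com"
        parsed = urlparse(raw)
        host = parsed.netloc.lower().removeprefix("www.")
        if not host or host in seen:
            continue  # skip unparseable entries and duplicate domains
        seen.add(host)
        start_urls.append({"url": f"{parsed.scheme}://{host}"})
    return start_urls


start_urls = build_start_urls([
    {"website": "https://www.acme.com/"},
    {"website": "acme.com"},
    {"website": ""},
    {"website": "http://Beta.io/about"},
])
print(start_urls)  # [{'url': 'https://acme.com'}, {'url': 'http://beta.io'}]
```

Deduplicating on the bare host (rather than the full URL) is what catches the common case where the same company appears with and without `www.` or a trailing path.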

Step 4 — Personalize on the YC tagline. The one-liner and long description fields are gold for first-touch personalization. Use them as the merge variable: "Saw [Company] is building [one-liner]. We help YC-stage teams in [industry] solve [pain]." This is the single highest-leverage step for reply rate. The tagline already tells you what the founder cares about this quarter.
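The merge itself is simple templating; the part worth automating is refusing to send when a field is missing. A minimal sketch using Python's `string.Template`, assuming hypothetical `name`, `one_liner`, and `industry` keys; the pain point stays a manual input per segment.

```python
import string
from typing import Optional

OPENER = string.Template(
    "Saw $company is building $one_liner. "
    "We help YC-stage teams in $industry solve $pain."
)


def personalize(row: dict, pain: str) -> Optional[str]:
    """Fill the opener from a scraped row, or return None if a merge field is missing."""
    company = (row.get("name") or "").strip()
    tagline = (row.get("one_liner") or "").strip().rstrip(".")
    industry = (row.get("industry") or "").strip()
    if not (company and tagline and industry):
        return None  # route to manual review instead of sending a broken merge
    # Lower-case a leading capital ("The API for ...") so it reads mid-sentence,
    # but leave acronyms such as "AI" untouched.
    if len(tagline) > 1 and tagline[0].isupper() and tagline[1].islower():
        tagline = tagline[0].lower() + tagline[1:]
    return OPENER.substitute(company=company, one_liner=tagline, industry=industry, pain=pain)


line = personalize(
    {"name": "Acme", "one_liner": "The API for payroll compliance.", "industry": "Fintech"},
    pain="multi-state payroll filings",
)
print(line)
# Saw Acme is building the API for payroll compliance. We help YC-stage teams in Fintech solve multi-state payroll filings.
```

The capitalization rule is a heuristic, so spot-check a sample of rendered lines before a sequence goes live.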

Step 5 — Push to outbound and set cadence. Upload the enriched list to Apollo, Outreach, Salesloft, or HubSpot. Build a YC-specific sequence with a three-touch cadence over ten days (email, LinkedIn connect, email). Tag by batch so you can measure reply rate by cohort — fresh batches almost always outperform older ones. Re-run the scraper monthly to catch new batches and status changes.
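Measuring reply rate by cohort only needs the batch tag and a replied flag per contact. A sketch, assuming you can export `(batch, replied)` pairs from your sequencer; that export format is hypothetical and varies by tool.

```python
from collections import defaultdict


def reply_rate_by_batch(contacts):
    """Return {batch: reply rate} from (batch, replied) pairs, best cohort first."""
    sent = defaultdict(int)
    replies = defaultdict(int)
    for batch, replied in contacts:
        sent[batch] += 1
        replies[batch] += int(bool(replied))
    rates = {batch: replies[batch] / sent[batch] for batch in sent}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


contacts = [("W24", False), ("S25", True), ("W24", True), ("S25", False),
            ("W24", False), ("S25", True), ("W24", False), ("S25", False)]
print(reply_rate_by_batch(contacts))  # {'S25': 0.5, 'W24': 0.25}
```

With small cohorts the rates are noisy, so compare batches only once each has a meaningful number of sends.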

Use Cases Across GTM, Recruiting, and Capital

SDR / BDR outbound: batch-specific sequences for the freshest cohort each quarter, with founder-direct email as the primary channel. Highest reply rates land in months 2-6 post-batch.
Account executive territory planning: assign new YC accounts by industry or region so AEs own a clean named-account book with intent signals built in.
RevOps list hygiene: use the refreshed scrape as source of truth to auto-deactivate stale CRM records. Inactive and Acquired companies should not be in active sequences.
Recruiter pipeline sourcing: filter for hiring = true and cross-reference founder LinkedIn profiles for warm intros. YC companies tend to add engineering headcount faster than the broader market.
VC competitive intel: map every new YC company in your thesis areas the day each batch is announced. Decide who to chase before the herd notices.
Agency lead generation: design, dev, and growth agencies use YC as a primary ICP. Filter by batch and team size to find the sweet spot where companies have budget but no in-house function yet.
Founder peer outreach: founders selling to founders use the directory to find peers for partnership, beta testing, and design-partner conversations.
Journalist and analyst sources: reporters use the scrape to surface trends and find founders to interview before they are saturated with press requests.
Corp dev and BD targeting: enterprises identify M&A and partnership targets filtered by stage and team size.
Investor relations: VCs map portfolio peers, comps, and follow-on opportunities in the YC ecosystem.
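For the RevOps hygiene case, the refreshed scrape can drive a simple triage. This sketch assumes CRM records store the YC slug in a hypothetical `yc_slug` field and that scrape rows carry `slug` and `status`. Companies that vanish from the scrape go to a review queue rather than straight to deactivation, since a missing row can also mean a failed or partial run.

```python
def triage_crm(crm_records, scrape_rows, dead_statuses=frozenset({"Inactive", "Acquired"})):
    """Split CRM record IDs into (deactivate, review) lists using the latest scrape."""
    status_by_slug = {row["slug"]: row.get("status") for row in scrape_rows}
    deactivate, review = [], []
    for record in crm_records:
        slug = record.get("yc_slug")
        if slug not in status_by_slug:
            review.append(record["id"])  # gone from the directory, or the run missed it
        elif status_by_slug[slug] in dead_statuses:
            deactivate.append(record["id"])
    return deactivate, review


scrape = [
    {"slug": "acme", "status": "Active"},
    {"slug": "beta", "status": "Acquired"},
    {"slug": "gamma", "status": "Inactive"},
]
crm = [{"id": 1, "yc_slug": "acme"}, {"id": 2, "yc_slug": "beta"},
       {"id": 3, "yc_slug": "gamma"}, {"id": 4, "yc_slug": "delta"}]
print(triage_crm(crm, scrape))  # ([2, 3], [4])
```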

Get the Data: Run the YC Companies Directory Scraper

The fastest way to turn this playbook into pipeline is to run the actor and pull your first batch. The scraper is pay-per-result, so you pay only for the rows each run returns, and the output drops straight into CSV, JSON, Excel, or Google Sheets. Set it on a monthly schedule and you have a self-refreshing lead source that beats any static list you can buy.

Run the YC Companies Directory Scraper on Apify

Related Actors for Building a Complete Lead Stack

The YC scraper is the seed list. To turn it into a contactable, segmented outbound dataset, pair it with these:

Contact Info Scraper — pulls emails, phones, and social profiles from any company website.
Company Enrichment Tool — appends firmographic details (industry, size, tech stack hints, social presence) to the YC base record.
Lead List Enricher — converts company rows into person-level records with titles, LinkedIn URLs, and verified work emails.
B2B Leads Finder — discovery layer for decision-makers when YC profiles list only founders and you need VPs or directors.
Website Email Extractor — lightweight bulk email harvester for fast sweeps across YC website URLs.
Indie Hackers product trackers — complementary feed for bootstrapped founders outside the YC universe.
Founders Fund Portfolio Scraper — parallel feed for sellers targeting tier-one VC-backed startups beyond YC.
Lightspeed Portfolio Scraper — same pattern for the Lightspeed portfolio, useful for layering multiple VC feeds into one startup ICP.

For deeper workflows see our guide on exporting YC company directory data for VC sourcing, the startup funding data playbook, and our walkthrough on extracting contact information for lead-gen workflows.

FAQ

Is YC company data public?

Yes. The directory is published on ycombinator.com/companies and is publicly browsable. Scraping the same fields visible in the UI for research, sales, and recruiting is standard practice. Respect rate limits, do not republish the raw dataset, and use the data to inform outreach.

How fresh is the data?

The scraper pulls live from the directory each run. A monthly schedule catches new batches within weeks of launch and picks up status changes as companies get acquired, go public, or wind down.
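To see what a monthly run actually changed, diff it against the previous snapshot by slug. A minimal sketch, assuming both runs are lists of dicts with hypothetical `slug` and `status` keys:

```python
def diff_snapshots(previous, current):
    """Return (new slugs, {slug: (old_status, new_status)}) between two scrape runs."""
    prev = {row["slug"]: row for row in previous}
    curr = {row["slug"]: row for row in current}
    added = sorted(curr.keys() - prev.keys())
    status_changes = {
        slug: (prev[slug].get("status"), curr[slug].get("status"))
        for slug in sorted(curr.keys() & prev.keys())
        if prev[slug].get("status") != curr[slug].get("status")
    }
    return added, status_changes


previous = [{"slug": "acme", "status": "Active"}, {"slug": "beta", "status": "Active"}]
current = [{"slug": "acme", "status": "Active"}, {"slug": "beta", "status": "Acquired"},
           {"slug": "gamma", "status": "Active"}]
print(diff_snapshots(previous, current))  # (['gamma'], {'beta': ('Active', 'Acquired')})
```

New slugs are candidates for a fresh sequence; status changes are what should pull a company out of one.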

Can I filter by batch?

Yes — every row includes batch code (W25, S25, etc.) and batch year. Most outbound teams slice the two most recent batches plus the prior year to focus on the high-conversion window.

What about Demo Day-only listings?

Companies that present at Demo Day but stay off the public directory will not appear in this scrape, since the actor mirrors what is published. For Demo Day-specific intel you typically need an investor login or press partnership.

Can I get founder emails directly from the YC profile?

The YC profile lists founder names but not direct emails. Pipe the company list through contact-info-scraper or use lead-list-enricher to append verified work emails.

Is this against YC's terms of service?

Scraping publicly available pages for research is generally permitted under standard web norms, and the YC directory is a public marketing asset. The actor accesses only what is rendered to any anonymous visitor. Consult your own legal counsel for your specific use case.

How does this compare to Crunchbase Pro or Apollo?

Crunchbase and Apollo carry YC tags, but their batch-level metadata often lags the official directory by weeks or months. Scraping the source gives you the canonical, freshest version of every field at a fraction of the per-record cost. Most teams use the YC scrape as the authoritative base and layer Apollo or Crunchbase for person-level enrichment.

How often should I refresh the list?

Monthly is the sweet spot for most outbound teams. Weekly is overkill unless you run a high-velocity SDR motion or VC scout program. Quarterly is too slow and silently degrades reply rate.