Crunchbase is the go-to platform for startup funding data — investor profiles, funding rounds, valuations, acquisitions, and company financials. If you're in VC, sales intelligence, or market research, you've probably tried to pull data from it at scale.
The problem? Crunchbase's API starts at $29/month with strict rate limits, and their Enterprise tier costs thousands. Meanwhile, their anti-scraping measures have gotten aggressive — aggressive enough that a simple requests + BeautifulSoup script won't cut it anymore.
In this guide, I'll show you what works in 2026 for extracting Crunchbase data reliably, including the tools and techniques that handle their current protections.
What Data Can You Extract from Crunchbase?
Crunchbase holds structured data on 1M+ companies. Here's what's extractable:
- Company profiles — name, description, founded date, HQ location, employee count, website
- Funding rounds — round type (Seed, Series A–F, IPO), amount raised, date, lead investors
- Investor profiles — firm name, portfolio size, investment focus, partners, total investments
- Acquisitions — acquirer, target, price, date, terms
- People — founders, C-suite, board members, career history
- Financial metrics — revenue range, valuation (when disclosed), IPO data
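If you plan to store these records, it helps to decide on a schema first. The sketch below models two of the record types above as Python dataclasses. The field names, and the "Acme AI" and "Example Ventures" sample values, are hypothetical placeholders, not Crunchbase's or any scraper's actual output format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class FundingRound:
    # Hypothetical fields mirroring the "Funding rounds" bullet above
    round_type: str                  # e.g. "Seed", "Series A"
    amount_usd: Optional[int]        # None when the amount is undisclosed
    announced_on: Optional[date]
    lead_investors: list[str] = field(default_factory=list)


@dataclass
class Company:
    # Hypothetical fields mirroring the "Company profiles" bullet above
    name: str
    website: Optional[str] = None
    founded_on: Optional[date] = None
    hq_location: Optional[str] = None
    funding_rounds: list[FundingRound] = field(default_factory=list)

    def total_raised_usd(self) -> int:
        # Sum only the rounds whose amount was disclosed
        return sum(r.amount_usd for r in self.funding_rounds if r.amount_usd is not None)


# Hypothetical sample record
acme = Company(
    name="Acme AI",
    funding_rounds=[
        FundingRound("Seed", 2_000_000, date(2024, 3, 1), ["Example Ventures"]),
        FundingRound("Series A", None, date(2025, 6, 15)),
    ],
)
print(acme.total_raised_usd())  # 2000000
```

Keeping undisclosed amounts as `None` rather than `0` means totals stay honest about what was actually reported.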
Why Traditional Scraping Fails on Crunchbase
Crunchbase uses several layers of protection:
- JavaScript rendering — Much of the page content is rendered client-side, so static HTTP requests often return pages without the data you see in a browser.
- Fingerprinting — They detect headless browsers via WebDriver flags, navigator properties, and canvas fingerprinting.
- IP-based rate limiting — Datacenter IPs get blocked after a handful of requests. Residential proxies are almost mandatory.
- Login walls — Deep profile pages and full funding histories require authentication.
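If you roll your own scraper, IP-based rate limiting means you need to slow down when requests start failing (for example with HTTP 429 responses). A common pattern is capped exponential backoff with "full jitter". The sketch below only computes the delay schedule; the `base` and `cap` values are arbitrary assumptions, not limits Crunchbase publishes.

```python
import random
from typing import Optional


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0,
                   rng: Optional[random.Random] = None) -> list[float]:
    """Return a sleep time in seconds for each retry attempt.

    Capped exponential backoff with full jitter: the delay for attempt n
    is drawn uniformly from [0, min(cap, base * 2**n)].
    """
    rng = rng or random.Random()
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        delays.append(rng.uniform(0, ceiling))
    return delays


# Seeded RNG so the schedule is reproducible in this demo
for n, d in enumerate(backoff_delays(6, rng=random.Random(42))):
    print(f"retry {n}: sleep {d:.2f}s")
```

In a real fetch loop you would `time.sleep()` for each delay before retrying and stop as soon as a request succeeds. The random jitter keeps many workers from retrying in lockstep.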
The good news: Crunchbase delivers much of its data as structured JSON, both embedded in the page HTML (a JSON state blob inside a script tag) and in the responses to the internal API calls the page makes. If you can get past the bot detection, the data is well-structured.
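Once you have the page HTML, pulling out an embedded JSON payload doesn't require a browser. The sketch below uses only the standard library's `html.parser`. The script id `app-state` and the sample markup are hypothetical placeholders, so inspect the real page source to find the tag that actually holds the data.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class EmbeddedJSONExtractor(HTMLParser):
    """Collect the text of the <script> tag whose id matches target_id."""

    def __init__(self, target_id: str):
        super().__init__()
        self.target_id = target_id
        self._inside = False
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and dict(attrs).get("id") == self.target_id:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; keep them all
        if self._inside:
            self._chunks.append(data)

    def payload(self) -> Optional[dict]:
        raw = "".join(self._chunks).strip()
        return json.loads(raw) if raw else None


# Hypothetical page snippet; real Crunchbase markup will differ
html = """
<html><body>
<script id="app-state" type="application/json">
{"organization": {"name": "Acme AI", "num_funding_rounds": 2}}
</script>
</body></html>
"""

parser = EmbeddedJSONExtractor("app-state")
parser.feed(html)
data = parser.payload()
print(data["organization"]["name"])  # Acme AI
```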
The Easy Way: Use a Pre-Built Crunchbase Scraper
Rather than maintaining your own scraping infrastructure, the Crunchbase Scraper on Apify handles all of the above — browser rendering, proxy rotation, fingerprint evasion, and data extraction — in a single managed actor.
What it extracts:
- Company profiles with full metadata
- Funding rounds with investor details
- Investor/VC firm portfolios
- Search results by keyword, location, category, or funding stage
- People profiles and career data
Key features:
- Residential proxy rotation (built-in via Apify)
- Handles JavaScript rendering automatically
- Structured JSON output ready for analysis
- ~50 profiles per minute
- Usage-based pricing (~$0.01–0.03 per 100 results)
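To sanity-check a job before launching it, you can turn the two figures above (~50 profiles per minute, ~$0.01–0.03 per 100 results) into a rough runtime and cost estimate. These are the approximate rates quoted above, not guarantees, so treat the output as a ballpark.

```python
def estimate_run(results: int, per_minute: float = 50.0,
                 usd_per_100_low: float = 0.01, usd_per_100_high: float = 0.03):
    """Rough runtime (minutes) and cost range (USD) for scraping `results` items."""
    minutes = results / per_minute
    cost_low = results / 100 * usd_per_100_low
    cost_high = results / 100 * usd_per_100_high
    return minutes, cost_low, cost_high


minutes, low, high = estimate_run(10_000)
print(f"~{minutes:.0f} min, ${low:.2f}-${high:.2f}")  # ~200 min, $1.00-$3.00
```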
Quick Start with Python
Here's how to scrape Crunchbase investor data using the Apify Python client (`pip install apify-client`):
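```python
from apify_client import ApifyClient

# Replace with your Apify API token (ideally read from an environment variable)
client = ApifyClient("your_apify_token")

# Input keys are defined by the actor's input schema
run_input = {
    "startUrls": [
        "https://www.crunchbase.com/lists/investors-in-ai-startups",
        "https://www.crunchbase.com/search/funding_rounds/field/organizations/funding_total/artificial-intelligence",
    ],
    "maxItems": 200,
    "proxy": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
}

# Start the actor and wait for the run to finish
run = client.actor("cryptosignals/crunchbase-scraper").call(run_input=run_input)
if run is None:
    raise RuntimeError("Actor run did not return a result")

# Stream results from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    funding = item.get("funding_total")
    raised = f"${funding}" if funding is not None else "N/A"
    print(f"{item.get('name')}: {raised} raised")
```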
Example: Building an Investor Database
Let's say you want to build a database of active Series A investors in fintech. Here's a practical workflow:
```python
import json

from apify_client import ApifyClient

client = ApifyClient("your_apify_token")

# Step 1: Search for fintech companies with Series A funding
run_input = {
    "startUrls": [
        "https://www.crunchbase.com/search/funding_rounds/field/organizations/last_funding_type/fintech?roundType=series_a"
    ],
    "maxItems": 500,
    "proxy": {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]},
}
run = client.actor("cryptosignals/crunchbase-scraper").call(run_input=run_input)
if run is None:
    raise RuntimeError("Actor run did not return a result")

# Step 2: Extract and deduplicate investors
# (field names such as "funding_rounds" and "investors" depend on the actor's output)
investors = {}
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    company = item.get("name")
    for round_data in item.get("funding_rounds") or []:
        for investor in round_data.get("investors") or []:
            name = investor.get("name")
            if not name:
                continue
            entry = investors.setdefault(name, {
                "name": name,
                "type": investor.get("type"),
                "investments_count": 0,
                "example_portfolio": [],
            })
            entry["investments_count"] += 1
            # Don't list the same company twice if the investor joined several rounds
            if company and company not in entry["example_portfolio"]:
                entry["example_portfolio"].append(company)

# Step 3: Sort by activity and export the 50 most active investors
top_investors = sorted(
    investors.values(),
    key=lambda x: x["investments_count"],
    reverse=True,
)[:50]

with open("fintech_series_a_investors.json", "w", encoding="utf-8") as f:
    json.dump(top_investors, f, indent=2)

print(f"Found {len(investors)} investors; saved the top {len(top_investors)} by activity")
```
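If you'd rather open the list in a spreadsheet, the same records can be flattened to CSV with the standard library. This sketch assumes the list-of-dicts shape built in Step 2 and joins each portfolio list into a single cell; the sample record is a placeholder.

```python
import csv
import io


def investors_to_csv(rows: list[dict]) -> str:
    """Serialize investor records (shaped like Step 2's output) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=["name", "type", "investments_count", "example_portfolio"],
        extrasaction="ignore",  # skip any extra keys instead of raising
    )
    writer.writeheader()
    for row in rows:
        # Flatten the portfolio list into one semicolon-separated cell
        writer.writerow({**row, "example_portfolio": "; ".join(row.get("example_portfolio", []))})
    return buf.getvalue()


# Placeholder record in the same shape as top_investors
sample = [
    {"name": "Example Ventures", "type": "investor", "investments_count": 3,
     "example_portfolio": ["Acme AI", "Beta Pay"]},
]
csv_text = investors_to_csv(sample)
print(csv_text)
```

To write a file instead, pass the result to a file opened with `open("investors.csv", "w", newline="", encoding="utf-8")`.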
Use Cases for Crunchbase Data
- Sales prospecting — Target recently funded companies (they have budget and are hiring)
- Competitive intelligence — Track who's funding your competitors and at what valuation
- Market mapping — Build landscape maps of any vertical by funding stage, geography, and investor overlap
- LP research — Track which VCs are most active and what their hit rate looks like
- Trend analysis — Monitor funding flow into sectors like AI, climate tech, or biotech over time
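For market mapping, "investor overlap" can be made concrete with Jaccard similarity: the number of investors two companies share, divided by the number of distinct investors across both. The company names and investor sets below are invented placeholders; in practice you would build each set from a company's funding rounds.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder data: company -> set of investor names
backers = {
    "Acme AI": {"Fund A", "Fund B", "Fund C"},
    "Beta Pay": {"Fund B", "Fund C", "Fund D"},
    "Gamma Bio": {"Fund E"},
}

# Score every pair of companies, highest overlap first
pairs = sorted(
    ((x, y, jaccard(backers[x], backers[y])) for x, y in combinations(backers, 2)),
    key=lambda t: t[2],
    reverse=True,
)
for x, y, score in pairs:
    print(f"{x} / {y}: {score:.2f}")
```

Here Acme AI and Beta Pay share two of four distinct investors, so their score is 0.50, while Gamma Bio shares none with either.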
Tips for Large-Scale Extraction
- Use residential proxies — Crunchbase actively blocks datacenter IPs. The Apify actor handles this, but if you're rolling your own, budget for residential proxy costs.
- Paginate with search URLs — Rather than crawling from a single page, use Crunchbase's search URLs with filters to parallelize extraction.
- Respect rate limits — Even with proxies, don't hammer the site. The managed actor throttles automatically.
- Cache aggressively — Company profiles don't change hourly. Scrape once, cache locally, refresh weekly.
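The caching tip is easy to apply with a small file-based cache keyed by URL. The sketch below stores each profile as JSON with a fetch timestamp and treats entries older than seven days as stale. `fake_fetch` is a hypothetical stand-in for whatever actually retrieves the data (an actor run, your own scraper), and the profile URL is a made-up example.

```python
import json
import tempfile
import time
from hashlib import sha256
from pathlib import Path
from typing import Callable

WEEK_SECONDS = 7 * 24 * 3600


def cached_fetch(url: str, fetch_profile: Callable[[str], dict],
                 cache_dir: Path = Path(".crunchbase_cache"),
                 max_age: float = WEEK_SECONDS) -> dict:
    """Return a cached profile for `url`, refetching if it is older than max_age."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / (sha256(url.encode()).hexdigest() + ".json")
    if path.exists():
        entry = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - entry["fetched_at"] < max_age:
            return entry["data"]  # fresh enough: skip the network entirely
    data = fetch_profile(url)
    path.write_text(json.dumps({"fetched_at": time.time(), "data": data}), encoding="utf-8")
    return data


# Hypothetical fetcher that records how often it is called
calls = []


def fake_fetch(url: str) -> dict:
    calls.append(url)
    return {"url": url, "name": "Acme AI"}


url = "https://www.crunchbase.com/organization/acme-ai"  # hypothetical profile URL
with tempfile.TemporaryDirectory() as d:
    first = cached_fetch(url, fake_fetch, Path(d))
    second = cached_fetch(url, fake_fetch, Path(d))  # served from the cache

print(len(calls))  # 1
```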
Crunchbase API vs Scraping: When to Use Which
| Factor | Crunchbase API | Scraping via Apify |
|---|---|---|
| Cost | From $29/mo; Enterprise runs into the thousands | Usage-based, roughly $1–3 per 10K results at ~$0.01–0.03 per 100 |
| Rate limits | 200 req/min (Basic) | Managed by actor |
| Data depth | Limited on Basic tier | Full profile data |
| Real-time | Yes | Near real-time |
| Legal clarity | Clear TOS | Gray area — use responsibly |
For one-off research projects or startups on a budget, scraping is dramatically more cost-effective. For production pipelines where you need guaranteed uptime and legal clarity, consider the API for critical paths.
Try the Crunchbase Scraper on Apify
Ready to extract investor and funding data? The Crunchbase Scraper by cryptosignals is a quick way to get started: no infrastructure to manage, no proxy setup, and structured JSON output you can pipe directly into your database or spreadsheet.
Building something cool with Crunchbase data? Drop a comment — I'd love to hear about your use case.