How VC Analysts Use Public Data for Startup Due Diligence (Without Paying for a Bloomberg Terminal)
Due diligence in venture capital is an interesting data problem. You're evaluating companies that are mostly private, often pre-revenue, and frequently operating in markets where the standard signals — audited financials, public filings, established customer bases — don't exist yet.
But there's more public data about early-stage companies than most analysts realize. Legal entity registrations, patent filings, clinical trial registrations, SEC crowdfunding disclosures, founder employment histories — these are all public record. The gap is tooling that makes them queryable at the speed diligence actually moves.
This post covers the workflows I've seen work well for pre-investment research on startups, particularly in tech, biotech, and deep tech.
What You're Actually Trying to Answer
The questions in VC due diligence roughly cluster into:
- Does the team have the credentials they claim? (Founder background, patent authorship, prior employment)
- Is the technology real and defensible? (Patents filed, clinical trial registrations, regulatory submissions)
- Is the company properly formed and clean? (Legal entity status, cap table signals, Secretary of State (SOS) registrations)
- Who else is in this space and how far ahead are they? (Competitor patent analysis, public market comps, SEC filings for public players)
- What does the market structure look like? (Public company financials, contract awards, regulatory trends)
Public data addresses pieces of all five. It won't replace reference calls or technical diligence — but it frontloads the screening work so you spend reference call time on harder questions.
Workflow 1: Verifying the Legal Entity and Early Corporate History
Step 1: Confirm the Entity Is Properly Formed
This sounds basic but it catches problems. Check the startup's legal entity registration in its state of incorporation (usually Delaware) and its state of operation.
For California-based startups: california-business-leads
For Texas: texas-business-leads
For New York: new-york-ucc-lien-search (UCC liens show secured debt against company assets)
What to look for:
- Formation date vs. founding story — if the pitch deck says "founded 2021" but the entity was formed in 2023, ask why
- Entity type — most serious startups incorporate as Delaware C-corps for VC compatibility; an LLC or S-corp at Series A is unusual, since an S-corp can have only one class of stock and can't have a fund as a shareholder
- UCC filings — a UCC-1 filed against the company means a secured creditor has a senior claim on its assets; that belongs in the cap table conversation
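A minimal sketch of these three checks in Python, assuming you have already pulled the registry and UCC results into a simple record. The EntityRecord fields and the entity_flags helper are hypothetical, not the output schema of any of the actors above, so map real fields onto them before use.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EntityRecord:
    """Hypothetical shape for a state registry result; real actor output differs."""
    name: str
    formation_date: date
    entity_type: str  # e.g. "Corporation", "LLC"
    state: str        # two-letter state of incorporation
    ucc_secured_parties: list[str] = field(default_factory=list)


def entity_flags(record: EntityRecord, claimed_founding_year: int, priced_round: bool) -> list[str]:
    """Return human-readable flags for formation date, entity type, and UCC liens."""
    flags = []
    # Formation later than the deck's founding year deserves a question.
    if record.formation_date.year > claimed_founding_year:
        flags.append(
            f"Entity formed {record.formation_date.year}, deck says founded {claimed_founding_year}"
        )
    # At a priced round, anything other than a Delaware corporation is unusual.
    is_corporation = "corp" in record.entity_type.lower()
    if priced_round and not (is_corporation and record.state == "DE"):
        flags.append(f"{record.entity_type} in {record.state}; expected a Delaware C-corp")
    # Each secured party with a UCC-1 on file has a claim on company assets.
    for party in record.ucc_secured_parties:
        flags.append(f"UCC-1 on file by secured party: {party}")
    return flags


record = EntityRecord("Acme AI Inc.", date(2023, 3, 1), "LLC", "CA", ["Example Bank"])
for flag in entity_flags(record, claimed_founding_year=2021, priced_round=True):
    print(flag)
```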
Step 2: Check for Prior Entities Under Founder Names
Founders who've had previous ventures sometimes have entity registrations (or dissolutions) that don't appear on their LinkedIn. Running founder names through state SOS databases surfaces prior company affiliations, dissolution history, and sometimes litigation history.
This is slower to automate since you're searching by individual name rather than company name, but for lead partners doing final diligence on a term sheet, it's worth the 30 minutes.
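Name matching is where most of those 30 minutes go, because registries format officer names inconsistently ("GARCIA, JOSE" vs. "José A. García"). A rough sketch, assuming hypothetical filing records that carry an officers list: it normalizes accents, punctuation, word order, and middle initials before comparing. It will over-match common names, so treat hits as leads to verify rather than conclusions.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Reduce a personal name to a comparable key.

    Strips accents and punctuation, drops single-letter initials, and sorts the
    remaining tokens so "GARCIA, JOSE" and "José A. García" compare equal.
    """
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.sub(r"[^a-z\s]", " ", ascii_name.lower()).split()
    return " ".join(sorted(t for t in tokens if len(t) > 1))


def prior_affiliations(founder: str, filings: list[dict]) -> list[dict]:
    """Return filings (hypothetical dicts with an "officers" list) that name the founder."""
    target = normalize_name(founder)
    return [
        f for f in filings
        if any(normalize_name(officer) == target for officer in f.get("officers", []))
    ]


# Hypothetical registry results for illustration only.
filings = [
    {"entity": "Old Venture LLC", "status": "Dissolved", "officers": ["GARCIA, JOSE"]},
    {"entity": "Unrelated Inc.", "status": "Active", "officers": ["Smith, Jane"]},
]
for hit in prior_affiliations("José A. García", filings):
    print(hit["entity"], hit["status"])
```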
Workflow 2: Patent and IP Landscape Analysis
Step 1: Pull the Company's Patent Portfolio
For any startup where IP is part of the moat, verifying what's actually filed (vs. what the deck claims) is table stakes.
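uspto-patents-patentsview-search queries the USPTO PatentsView database, which covers granted US patents and published applications. Search by assignee (company name) to see their patent portfolio: what's granted, what's published but still pending, and when each was filed. US applications are generally published about 18 months after the earliest filing date, so anything filed more recently than that may not show up yet.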
Key questions:
- Are patents assigned to the company or to the founders personally? (Personal assignment is a red flag pre-investment — founders can walk with the IP)
- When were they filed? A patent filed two weeks before the pitch is a different signal than one filed two years ago
- What's the claims language? Broad, enforceable claims are very different from narrow ones
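The first two questions are mechanical enough to script; claims language still needs a human (or patent counsel). A sketch assuming hypothetical patent records with id, assignee, and filing_date fields. The suffix list used to normalize company names is an illustrative assumption, not an exhaustive one.

```python
import re
from datetime import date

# Illustrative, not exhaustive, list of corporate suffixes to ignore when matching.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}


def norm_org(name: str) -> str:
    """Lowercase, strip punctuation and common suffixes: "Acme AI, Inc." -> "acme ai"."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def patent_flags(patents: list[dict], company: str, pitch_date: date, recent_days: int = 90) -> list[str]:
    """Flag patents not assigned to the company and filings made shortly before the pitch."""
    flags = []
    for p in patents:
        assignee = p.get("assignee") or ""
        if not assignee:
            # With no recorded assignment, ownership typically remains with the inventors.
            flags.append(f"{p['id']}: no assignee recorded; rights may sit with the inventors")
        elif norm_org(assignee) != norm_org(company):
            flags.append(f"{p['id']}: assigned to {assignee!r}, not the company")
        age_days = (pitch_date - p["filing_date"]).days
        if 0 <= age_days <= recent_days:
            flags.append(f"{p['id']}: filed {age_days} days before the pitch")
    return flags
```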
Step 2: Map the Competitive Patent Landscape
This is where it gets more useful. Run the same search against competitor names and you get a picture of who holds what IP in the space. If a potential investment is building in an area where a large incumbent has 400 patents and your startup has 3, that's a diligence conversation you want to have explicitly.
You can also search by inventor name — if the startup's technical founders are listed as inventors on prior patents (at a previous employer), understanding what was licensed vs. what was independently developed matters.
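Both views reduce to simple aggregations once the search results are in hand. A sketch assuming hypothetical records with assignee and inventors fields; names are compared exactly here, so in practice you would normalize them first.

```python
from collections import Counter


def landscape(patents: list[dict], top_n: int = 5) -> list[tuple[str, int, float]]:
    """Top assignees by patent count, with each one's share of the result set."""
    counts = Counter(p["assignee"] for p in patents if p.get("assignee"))
    total = sum(counts.values())
    return [(name, n, round(n / total, 3)) for name, n in counts.most_common(top_n)]


def prior_employer_patents(patents: list[dict], inventor: str, startup: str) -> list[dict]:
    """Patents naming the inventor but assigned to someone other than the startup."""
    return [
        p for p in patents
        if inventor in p.get("inventors", []) and p.get("assignee") != startup
    ]
```

A large share for one incumbent in landscape() is the "400 patents vs. 3" conversation; a non-empty prior_employer_patents() list is the licensing question.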
Workflow 3: Clinical Trial and Regulatory Status (Biotech/MedTech)
For life sciences investments, clinical trial registration data is one of the highest-signal public datasets available.
Step 1: Pull the Trial Registry
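clinicaltrials-gov-search queries ClinicalTrials.gov, the NIH registry of clinical studies. Every trial that federal law requires to be registered appears here, including phase, enrollment status, primary endpoints, and study results if available. Phase 1 drug trials are generally exempt from that requirement, so early programs may be missing unless the sponsor registered them voluntarily.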
Search by sponsor name (the company) to pull all their registered trials. What to verify:
- Enrollment status — is the trial actually enrolling, or was it registered and then stalled?
- Primary endpoint — what they're measuring, and whether it's a regulatory endpoint the FDA will accept for approval
- Expected completion date vs. actual — delays are common; large delays without explanation warrant a question
- Results publications — if Phase 2 is complete, are results posted? Unpublished results from a completed trial are a yellow flag
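Three of these four checks can be expressed as rules over each trial record; the primary-endpoint question needs a human read, so it is left out. A sketch under assumptions: the record is a hypothetical flattened dict, the status strings are modeled on ClinicalTrials.gov's overall-status values, and completion_original assumes you saved the date from an earlier pull or the record's change history. The 180-day and one-year thresholds are arbitrary starting points.

```python
from datetime import date

# Status strings modeled on ClinicalTrials.gov overall-status values (assumption).
STALLED_STATUSES = {"SUSPENDED", "TERMINATED", "WITHDRAWN", "UNKNOWN"}


def trial_flags(trial: dict, today: date, max_slip_days: int = 180) -> list[str]:
    """Apply the enrollment, timeline, and results checks to one trial record."""
    flags = []
    nct, status = trial["nct_id"], trial["status"]
    # Enrollment status: stopped, or registered long ago and never started.
    if status in STALLED_STATUSES:
        flags.append(f"{nct}: status is {status}")
    if status == "NOT_YET_RECRUITING" and (today - trial["registered"]).days > 365:
        flags.append(f"{nct}: registered over a year ago and still not recruiting")
    # Timeline: how far the completion date has moved since it was first recorded.
    slip_days = (trial["completion_current"] - trial["completion_original"]).days
    if slip_days > max_slip_days:
        flags.append(f"{nct}: completion date slipped {slip_days} days")
    # Results: completed trials with nothing posted.
    if status == "COMPLETED" and not trial["has_results"]:
        flags.append(f"{nct}: completed, but no results posted")
    return flags
```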
Step 2: Check FDA 510(k) Clearances for Medical Devices
For medtech investments claiming cleared devices: fda-510k-medical-device-clearances lets you query the 510(k) clearance database by company name or device classification.
If a device startup claims FDA clearance and their device isn't in the 510(k) database, that conversation needs to happen before the term sheet. The most common explanation is that the device was cleared under a previous company name (post-acquisition or rebranding), but you want to confirm.
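The rebranding case is the one worth encoding, since a naive lookup by current company name misses it. A sketch assuming hypothetical records with k_number and applicant fields; the K-numbers in the test data are placeholders, not real clearances, and the suffix list is illustrative.

```python
import re

# Illustrative, not exhaustive, list of corporate suffixes to ignore when matching.
SUFFIXES = {"inc", "llc", "corp", "corporation", "co", "ltd", "company"}


def norm_org(name: str) -> str:
    """Lowercase, strip punctuation and common suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def verify_510k(claimed: list[str], records: list[dict],
                current_name: str, former_names: list[str]) -> dict[str, str]:
    """Classify each claimed 510(k) number against clearance records."""
    by_number = {r["k_number"].upper(): r for r in records}
    former = {norm_org(n) for n in former_names}
    results = {}
    for k in claimed:
        rec = by_number.get(k.upper())
        if rec is None:
            results[k] = "not found in clearance records"
        elif norm_org(rec["applicant"]) == norm_org(current_name):
            results[k] = "cleared to current entity"
        elif norm_org(rec["applicant"]) in former:
            results[k] = "cleared under a former name; confirm the transfer"
        else:
            results[k] = f"cleared to a different applicant: {rec['applicant']}"
    return results
```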
Workflow 4: Incumbent and Market Structure Analysis
Step 1: Pull Public Competitor Filings
For any market with public company comparables, SEC EDGAR tells you more about market structure than most analyst reports.
sec-edgar-company-filings lets you pull 10-Ks, 10-Qs, and 8-Ks by company name or ticker. For early-stage diligence, the most useful sections of a public company 10-K are:
- Business section — how they describe the competitive landscape (useful for understanding what they're worried about)
- Risk factors — what the public incumbent considers its existential risks (often maps directly to where startups are attacking)
- Customer concentration disclosures — which customers matter and what the loss of one would mean
This is also where you find material weakness disclosures, regulatory risk language, and forward guidance — all relevant for understanding where an incumbent is vulnerable.
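Once a 10-K's text is downloaded, pulling out Item 1A (Risk Factors) is a common first step. A heuristic sketch, not a filing parser: the heading appears in the table of contents as well as the body, so it keeps the longest span between an Item 1A heading and the next Item 1B or Item 2 heading. Cross-references inside the section can still cut it short.

```python
import re

ITEM_1A = re.compile(r"item\s+1a\.?", re.IGNORECASE)
NEXT_ITEM = re.compile(r"item\s+1b\.?|item\s+2\.?", re.IGNORECASE)


def risk_factors(text: str) -> str:
    """Return the longest span between an "Item 1A" heading and the next Item 1B/2 heading.

    The table of contents also mentions Item 1A, so the longest candidate is
    usually the real section. This is a heuristic, not a 10-K parser.
    """
    best = ""
    for start in ITEM_1A.finditer(text):
        end = NEXT_ITEM.search(text, start.end())
        candidate = text[start.end():end.start() if end else len(text)].strip()
        if len(candidate) > len(best):
            best = candidate
    return best
```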
Step 2: Track Insider Transactions
For public competitors, sec-insider-trading-tracker surfaces Form 4 filings showing when executives buy or sell shares. Heavy executive selling at a competitor is sometimes a leading indicator of problems that haven't hit public disclosure yet. It's circumstantial, but it's a data point.
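One way to quantify "heavy selling" is net open-market shares per insider over a recent window. A sketch assuming hypothetical transaction dicts; it counts only Form 4 transaction codes P (purchase) and S (sale), ignores grants and option exercises, and does not separate sales under pre-arranged trading plans, which are generally read as a weaker signal.

```python
from collections import defaultdict
from datetime import date, timedelta


def insider_net_shares(txns: list[dict], as_of: date, window_days: int = 90) -> dict[str, int]:
    """Net open-market shares per insider in the window: purchases (P) minus sales (S)."""
    start = as_of - timedelta(days=window_days)
    net = defaultdict(int)
    for t in txns:
        if not (start <= t["date"] <= as_of):
            continue  # outside the lookback window
        if t["code"] == "P":
            net[t["insider"]] += t["shares"]
        elif t["code"] == "S":
            net[t["insider"]] -= t["shares"]
        # Other codes (awards, exercises, gifts) are skipped on purpose.
    return dict(net)
```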
Step 3: Check Federal Contract Awards in the Sector
For enterprise, defense tech, or govtech plays, usaspending-federal-spending-search shows which companies are actually winning government contracts in the space. This tells you who the government is already paying and at what scale — useful for sizing the addressable market and understanding where your investment would need to compete.
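Sizing who the government already pays comes down to summing award amounts by recipient. A sketch assuming hypothetical award records with recipient and amount fields; real award data often lists the same company under several recipient names or parent entities, which this does not reconcile.

```python
from collections import defaultdict


def top_recipients(awards: list[dict], n: int = 5) -> list[tuple[str, float]]:
    """Sum award amounts by recipient and return the n largest."""
    totals = defaultdict(float)
    for award in awards:
        totals[award["recipient"]] += award["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def top_share(awards: list[dict], n: int = 5) -> float:
    """Fraction of total dollars captured by the top n recipients (a rough concentration signal)."""
    total = sum(a["amount"] for a in awards)
    return sum(v for _, v in top_recipients(awards, n)) / total if total else 0.0
```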
What Public Data Can't Tell You
Being clear about limits is part of doing this well:
- Private company financials — unless the startup is publicly traded or has filed with the SEC (Reg CF, Reg A), you're not getting its financials from public records. You'll need to rely on what the company provides.
- Customer references — no database tells you whether the product actually works or whether customers would re-buy. Reference calls remain irreplaceable.
- Team dynamics — whether the cofounders will still be speaking in 18 months doesn't show up in any filing.
What public data gives you is the ability to verify what you're told, surface what you weren't told, and ask better questions when you do speak to the team. In a world where founders have become very good at pitching, the ability to walk into a reference call already knowing the IP ownership structure, the trial status, and the SEC filings of their largest competitor is a real edge.
Practical Setup
If you're doing this regularly, set up an automated job that runs the SOS check, patent search, and clinical trial query in parallel as soon as a new deal enters your pipeline. The whole thing can run in under five minutes and flags the issues worth investigating before you've burned analyst time on a deeper review.
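The parallel run can be sketched with a thread pool, since each check mostly waits on a network call. The check functions passed in are placeholders for whatever calls your actors; the 300-second timeout mirrors the five-minute target, but it only stops waiting for a given result, and leaving the executor still waits for slow threads to finish.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A check takes a company name and returns a list of flag strings (hypothetical contract).
Check = Callable[[str], list[str]]


def run_screen(company: str, checks: dict[str, Check], timeout_s: float = 300) -> dict[str, list[str]]:
    """Run every check concurrently and collect each one's flags by name."""
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = {name: pool.submit(fn, company) for name, fn in checks.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout_s)
            except Exception as exc:  # a failing or slow check becomes a flag, not a crash
                results[name] = [f"check failed: {exc!r}"]
    # Exiting the with-block waits for any threads that are still running.
    return results
```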
All the actors referenced here are queryable via Apify's API, so they integrate directly into a deal tracking workflow: a CRM webhook triggers a pipeline run, outputs land in a Google Sheet or Notion database, and an associate reviews the flags before the first partner meeting.
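For the last hop, Google Sheets and Notion can both import CSV, so flattening flags into one row per finding keeps the format simple. A sketch assuming results arrive as a dict mapping each check name to a list of flag strings; it is independent of any particular CRM or webhook setup.

```python
import csv
import io


def flags_to_csv(company: str, results: dict[str, list[str]]) -> str:
    """One row per flag, ready to import into a sheet or database."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["company", "check", "flag"])
    for check, flags in results.items():
        for flag in flags:
            writer.writerow([company, check, flag])
    return buf.getvalue()
```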
Actors mentioned in this post are available on Apify under pink_comic. All query public government data sources.