If you've priced Crunchbase or PitchBook lately, you know company and funding data is mostly locked behind expensive seats. But a surprising amount of it is public and free — you just have to pull it from the primary sources instead of a reseller. This guide shows how to assemble a company profile (legal identity + real funding rounds + financials) from the SEC and the global LEI system, with no API key and no scraping of gated sites.
1. Funding rounds → SEC Form D
When a private US company raises a round under Regulation D (most venture and private placements), it files a Form D with the SEC. It's a public filing, and it contains the parts you actually want: total offering amount, amount sold, industry, and date.
Find a company's Form D filings via EDGAR full-text search, then pull the offering amounts from the filing's XML:
# 1) find Form D filings for an issuer
curl -s 'https://efts.sec.gov/LATEST/search-index?q=%22Databricks%22&forms=D' \
-H 'User-Agent: yourname you@example.com'
# 2) each hit has a CIK + accession; fetch the Form D primary_doc.xml
curl -s 'https://www.sec.gov/Archives/edgar/data/<CIK>/<ACCESSION_NODASHES>/primary_doc.xml' \
-H 'User-Agent: yourname you@example.com'
The XML carries <totalOfferingAmount>, <totalAmountSold>, and <industryGroupType>. A recent Databricks Form D, for example, discloses an offering north of $1B — real funding data, filed by the company, free to read.
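To make the extraction step concrete, here is a minimal Python sketch that builds the `primary_doc.xml` URL and pulls the three fields out of the XML. The helper names (`form_d_url`, `parse_form_d`) and the sample document are illustrative assumptions, not part of any SEC tooling, and the sample's nesting is only a guess at a typical layout; the parser matches element names wherever they appear, so it should tolerate differences in nesting or namespaces.

```python
import xml.etree.ElementTree as ET


def form_d_url(cik: str, accession: str) -> str:
    """Build the primary_doc.xml URL from a CIK and a dashed accession number."""
    # Archive paths conventionally use the CIK without zero padding
    # and the accession number with its dashes removed.
    return (
        "https://www.sec.gov/Archives/edgar/data/"
        f"{int(cik)}/{accession.replace('-', '')}/primary_doc.xml"
    )


def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix so lookups work with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def _to_amount(text):
    """Return an int for numeric amounts; keep non-numeric text (e.g. 'Indefinite') as-is."""
    if text is None:
        return None
    return int(text) if text.isdigit() else text


def parse_form_d(xml_text: str) -> dict:
    """Extract offering amount, amount sold, and industry from Form D XML."""
    wanted = {"totalOfferingAmount", "totalAmountSold", "industryGroupType"}
    found = {}
    for elem in ET.fromstring(xml_text).iter():
        name = _local(elem.tag)
        if name in wanted and name not in found:
            found[name] = (elem.text or "").strip()
    return {
        "total_offering_amount": _to_amount(found.get("totalOfferingAmount")),
        "total_amount_sold": _to_amount(found.get("totalAmountSold")),
        "industry_group": found.get("industryGroupType"),
    }


# Made-up sample for illustration only; real filings carry many more elements.
SAMPLE_FORM_D = """<edgarSubmission>
  <offeringData>
    <industryGroup><industryGroupType>Other Technology</industryGroupType></industryGroup>
    <offeringSalesAmounts>
      <totalOfferingAmount>5000000</totalOfferingAmount>
      <totalAmountSold>3200000</totalAmountSold>
    </offeringSalesAmounts>
  </offeringData>
</edgarSubmission>"""

print(parse_form_d(SAMPLE_FORM_D))
```

Keeping non-numeric amounts as text is deliberate: an open-ended offering should not be silently turned into zero.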
Two honest caveats (these matter, and most scrapers get them wrong):
- Match the issuer name strictly. Full-text search returns anything mentioning the term. Stripe Milton LLC is not Stripe; a fund named after a startup is not the startup. Normalize and require an exact legal-name match, and exclude SPV/fund vehicles.
- Coverage is partial. Plenty of famous startups raise through structures that never file a Form D under their own name. Form D is excellent where it exists; don't treat its absence as "no funding."
Always send a declared User-Agent with contact info; that's part of the SEC's fair-access policy, which also caps automated traffic at 10 requests per second.
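Strict issuer matching, as described in the first caveat, could look roughly like the Python sketch below. The suffix and vehicle word lists are hypothetical starting points, not an authoritative taxonomy, and the function names are made up for illustration.

```python
import re

# Hypothetical lists; extend them for your own data.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "lp", "co", "company"}
VEHICLE_WORDS = {"fund", "spv", "investors", "trust"}


def normalize(name: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()


def core_tokens(name: str) -> list[str]:
    """Tokens with trailing legal suffixes (Inc, LLC, ...) removed."""
    tokens = normalize(name)
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return tokens


def is_same_issuer(target: str, issuer: str) -> bool:
    """True only when the issuer's core name equals the target's and it is not a fund/SPV."""
    issuer_core = core_tokens(issuer)
    if any(tok in VEHICLE_WORDS for tok in issuer_core):
        return False
    return issuer_core == core_tokens(target)


for candidate in ["Stripe, Inc.", "Stripe Milton LLC", "Stripe Growth Fund LP"]:
    print(candidate, is_same_issuer("Stripe", candidate))
```

Exact token equality is intentionally conservative: it will miss some true matches (renamed entities, holding companies), which is usually the safer error when the output is presented as a company's funding history.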
2. Legal identity → GLEIF (the LEI system)
The Global Legal Entity Identifier Foundation publishes legal-entity data as fully open data — legal name, HQ country, registered address, status. Resolve a brand name to an entity:
# fuzzy-match a name to candidate LEIs
curl -s 'https://api.gleif.org/api/v1/fuzzycompletions?field=entity.legalName&q=Anthropic'
# then fetch the full record
curl -s 'https://api.gleif.org/api/v1/lei-records/<LEI>'
Caveat: common brand names collide (there can be several "Stripe" entities in different countries), so confirm the country/jurisdiction before trusting a match — ideally cross-check against a US SEC filing.
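One way to apply that country check in code is sketched below in Python. It assumes each record is shaped like the `data` object of a `lei-records` response, with fields under `attributes.entity` (`legalName.name`, `jurisdiction`, `headquartersAddress.country`, `status`); treat those paths as assumptions and verify them against a live response. The LEIs and entities in the sample are invented.

```python
def summarize(record: dict) -> dict:
    """Flatten the fields we care about from one lei-records 'data' object."""
    entity = record.get("attributes", {}).get("entity", {})
    return {
        "lei": record.get("id"),
        "legal_name": entity.get("legalName", {}).get("name"),
        "jurisdiction": entity.get("jurisdiction"),
        "hq_country": entity.get("headquartersAddress", {}).get("country"),
        "status": entity.get("status"),
    }


def filter_by_country(records: list, country: str) -> list:
    """Keep active entities whose HQ country or jurisdiction matches `country`."""
    matches = []
    for record in records:
        summary = summarize(record)
        # Jurisdictions may carry a region suffix such as 'US-DE'; compare the country part.
        jurisdiction_country = (summary["jurisdiction"] or "").split("-")[0]
        in_country = country in (summary["hq_country"], jurisdiction_country)
        if summary["status"] == "ACTIVE" and in_country:
            matches.append(summary)
    return matches


# Invented records for illustration; these are not real LEIs.
SAMPLE_RECORDS = [
    {"id": "EXAMPLELEI0000000001", "attributes": {"entity": {
        "legalName": {"name": "Example Payments, Inc."}, "jurisdiction": "US-DE",
        "headquartersAddress": {"country": "US"}, "status": "ACTIVE"}}},
    {"id": "EXAMPLELEI0000000002", "attributes": {"entity": {
        "legalName": {"name": "Example Payments Ltd"}, "jurisdiction": "GB",
        "headquartersAddress": {"country": "GB"}, "status": "ACTIVE"}}},
    {"id": "EXAMPLELEI0000000003", "attributes": {"entity": {
        "legalName": {"name": "Example Payments Old, Inc."}, "jurisdiction": "US-DE",
        "headquartersAddress": {"country": "US"}, "status": "INACTIVE"}}},
]

print(filter_by_country(SAMPLE_RECORDS, "US"))
```

If more than one candidate survives the filter, that is a signal to fall back on the SEC cross-check rather than pick the first result.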
3. Industry + financials → SEC EDGAR
For any company with an EDGAR CIK (public companies and Form D filers alike), the submissions API gives you SIC industry, state of incorporation, business address, former names, and tickers:
curl -s 'https://data.sec.gov/submissions/CIK0000320193.json' -H 'User-Agent: yourname you@example.com'
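The submissions JSON also lists the filer's recent filings, which gives a second route to Form D accession numbers once you know the CIK. The Python sketch below assumes the response keeps those filings as parallel arrays under `filings.recent` (`form`, `accessionNumber`, `filingDate`) and the profile fields at the top level; the sample payload is invented and heavily trimmed.

```python
def profile(submissions: dict) -> dict:
    """Pick the identity fields mentioned above out of a submissions response."""
    return {
        "name": submissions.get("name"),
        "sic_description": submissions.get("sicDescription"),
        "state_of_incorporation": submissions.get("stateOfIncorporation"),
        "tickers": submissions.get("tickers", []),
        "former_names": [n.get("name") for n in submissions.get("formerNames", [])],
    }


def form_d_filings(submissions: dict) -> list:
    """Return Form D and D/A rows from the column-oriented 'recent' filings table."""
    recent = submissions.get("filings", {}).get("recent", {})
    rows = zip(
        recent.get("form", []),
        recent.get("accessionNumber", []),
        recent.get("filingDate", []),
    )
    return [
        {"form": form, "accession": accession, "filed": filed}
        for form, accession, filed in rows
        if form in ("D", "D/A")
    ]


# Invented, trimmed payload for illustration only.
SAMPLE_SUBMISSIONS = {
    "name": "Example Labs, Inc.",
    "sicDescription": "Services-Prepackaged Software",
    "stateOfIncorporation": "DE",
    "tickers": [],
    "formerNames": [{"name": "Example Labs LLC"}],
    "filings": {"recent": {
        "form": ["D", "D/A", "8-K"],
        "accessionNumber": ["0001234567-24-000003", "0001234567-24-000002", "0001234567-24-000001"],
        "filingDate": ["2024-06-01", "2024-03-15", "2024-01-10"],
    }},
}

print(profile(SAMPLE_SUBMISSIONS))
print(form_d_filings(SAMPLE_SUBMISSIONS))
```

Because this route starts from a CIK rather than a text search, it sidesteps the name-collision problem for issuers you have already resolved.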
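For public companies, the XBRL company-concept endpoint returns reported numbers for one tag at a time (Apple's latest annual revenue comes back as ~$416B). Revenue tags vary by filer, so if Revenues comes back sparse or stale, try a related us-gaap tag such as RevenueFromContractWithCustomerExcludingAssessedTax: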
curl -s 'https://data.sec.gov/api/xbrl/companyconcept/CIK0000320193/us-gaap/Revenues.json' \
-H 'User-Agent: yourname you@example.com'
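The company-concept response contains every reported value for the tag (quarterly, annual, and restated comparatives), so "latest annual revenue" needs a small selection step. The Python sketch below assumes each fact under `units.USD` carries `start`, `end`, `val`, `form`, and `filed` keys; the 350 to 380 day window for "one fiscal year" is my own heuristic, and the sample numbers are invented.

```python
from datetime import date


def latest_annual_value(concept: dict, unit: str = "USD"):
    """Return the most recent full-year fact reported on a 10-K, or None."""
    best_key, best_fact = None, None
    for fact in concept.get("units", {}).get(unit, []):
        # Skip non-annual forms and point-in-time facts that have no start date.
        if fact.get("form") != "10-K" or "start" not in fact:
            continue
        days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
        if not 350 <= days <= 380:  # heuristic: roughly one fiscal year
            continue
        # Prefer the latest period end; break ties with the latest filing date.
        key = (fact["end"], fact.get("filed", ""))
        if best_key is None or key > best_key:
            best_key, best_fact = key, fact
    return best_fact


# Invented facts for illustration only.
SAMPLE_CONCEPT = {"units": {"USD": [
    {"start": "2022-01-01", "end": "2022-12-31", "val": 100, "form": "10-K", "filed": "2023-02-01"},
    {"start": "2023-01-01", "end": "2023-12-31", "val": 120, "form": "10-K", "filed": "2024-02-01"},
    {"start": "2023-10-01", "end": "2023-12-31", "val": 35, "form": "10-K", "filed": "2024-02-01"},
    {"start": "2024-01-01", "end": "2024-03-31", "val": 33, "form": "10-Q", "filed": "2024-05-01"},
]}}

fact = latest_annual_value(SAMPLE_CONCEPT)
print(fact["end"], fact["val"])
```

Filtering on the period length matters because a 10-K can also report shorter periods, which would otherwise be mistaken for a full-year figure.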
Putting it together
Stitch those three and you have a legitimate, official-source company profile — identity, funding signals, and financials — without paying for Crunchbase and without scraping anything gated. The hard part is the glue: entity resolution, strict issuer matching, XML parsing, and rate-limit-friendly EDGAR access.
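For the rate-limit-friendly access piece of that glue, a minimal Python sketch using only the standard library follows. The `Throttle` class and `fetch` helper are hypothetical names, and the 0.2-second spacing is an assumed conservative setting rather than an official figure; check the SEC's current fair-access guidance before tuning it.

```python
import time
import urllib.request


class Throttle:
    """Space successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Assumed setting: about 5 requests per second.
_throttle = Throttle(min_interval=0.2)


def fetch(url: str, user_agent: str = "yourname you@example.com") -> bytes:
    """GET a URL with a declared User-Agent, pausing first if we are going too fast."""
    _throttle.wait()
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Routing every EDGAR call through one `fetch` function keeps the User-Agent and pacing in a single place, so the parsing code never has to think about either.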
If you'd rather skip the plumbing, the Company Data Aggregator actor does exactly this in one call — give it a company name or domain and it returns GLEIF legal identity, SEC Form D funding signals, EDGAR industry/financials, and a domain/tech profile. No API key.
Useful neighbors if you're building company intelligence:
- SEC Form 8-K Material Events Scraper — every material corporate event
- 13F Holdings Delta Tracker — institutional position changes
- Business Registration Lookup — global registry data
All of this is informational, official-source data — not investment advice. Build responsibly, declare your User-Agent, and respect the SEC's rate limits.