If you work in compliance at a bank, fintech, or MSB, you know the pain: every new business customer triggers a manual verification process. Someone has to check the Secretary of State, look up the entity on SAM.gov, verify the registered agent, and cross-reference against enforcement databases.
Most teams pay $15-40K/year for tools like Middesk, Enigma, or Dun & Bradstreet to automate this. But most of the underlying data is public. These tools largely aggregate government databases behind a single API and a nice UI.
Here's how to build the same pipeline yourself using free public data APIs.
The KYB verification checklist
A standard Know Your Business check covers:
- Entity existence -- Is this actually a registered business?
- Good standing -- Is the registration current or dissolved?
- Registered agent -- Who is legally responsible?
- Officer/director names -- Do they match what the customer provided?
- Federal registration -- Is the entity registered for government contracts (SAM.gov)?
- Enforcement history -- Any SEC actions, OSHA violations, or federal debarments?
Every one of these is available from public sources.
Data sources and how to access them
Secretary of State filings (entity verification)
Every state maintains a business entity database. The challenge is that each state has a different portal, different format, and different search interface.
For the five largest states:
- California -- bizfileonline.sos.ca.gov (requires API key or browser session)
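- Texas -- mycpa.cpa.state.tx.us/coa (public API; note this is the Texas Comptroller's franchise tax account status search, not the Secretary of State's own system)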
- Florida -- dos.state.fl.us/sunbiz (daily SFTP files)
- New York -- appext20.dos.ny.gov/corp_public (public search)
- Illinois -- ilsos.gov/corporatellc (public search)
If you don't want to build and maintain scrapers for each state, there are pre-built Secretary of State scraper APIs on Apify that handle the extraction and return structured JSON.
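Whichever route you take, the per-state differences usually end up hidden behind a normalization layer. Here is a minimal sketch of that idea. The field names in `FIELD_MAPS` and the status vocabulary are hypothetical placeholders, not the real schema of any state portal, so treat them as assumptions to replace with whatever your sources actually return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntityRecord:
    state: str
    name: str
    status: str  # normalized to "active", "inactive", or "unknown"
    registered_agent: Optional[str]


# Hypothetical per-state field names; real portals differ and change over time.
FIELD_MAPS = {
    "CA": {"name": "entityName", "status": "statusDescription", "agent": "agentName"},
    "FL": {"name": "corp_name", "status": "status", "agent": "ra_name"},
}

# Assumed status vocabulary; extend it as you meet new values.
ACTIVE_WORDS = {"active", "in good standing", "in existence"}
INACTIVE_WORDS = {"dissolved", "inactive", "forfeited", "suspended", "revoked"}


def normalize_status(raw: str) -> str:
    """Map a state-specific status string onto a small shared vocabulary."""
    value = raw.strip().lower()
    if value in ACTIVE_WORDS:
        return "active"
    if any(word in value for word in INACTIVE_WORDS):
        return "inactive"
    return "unknown"


def normalize(state: str, raw: dict) -> EntityRecord:
    """Convert one raw state record into the shared EntityRecord shape."""
    fields = FIELD_MAPS[state]
    return EntityRecord(
        state=state,
        # Collapse whitespace and uppercase so names compare consistently.
        name=" ".join(str(raw.get(fields["name"], "")).split()).upper(),
        status=normalize_status(str(raw.get(fields["status"], ""))),
        registered_agent=raw.get(fields["agent"]) or None,
    )
```

Anything that maps to `"unknown"` is a good candidate for manual review rather than a silent pass.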
SAM.gov (federal entity registration)
SAM.gov is the System for Award Management -- any entity that wants to bid on federal contracts or receive federal awards must register here. It's also useful for KYB because registration requires a Unique Entity ID (UEI, which replaced the DUNS number in April 2022), a physical address, and officer information.
The SAM.gov API is free but requires registration. Pre-built options like the SAM.gov Federal Contracts Scraper can extract entity data without needing your own API key.
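If you do register for your own key, a lookup might be sketched like this. The endpoint, the `api_key` and `legalBusinessName` query parameters, and the `entityData` / `entityRegistration` response fields reflect my understanding of the v3 Entity Management API and should be treated as assumptions; check them against the current SAM.gov documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the SAM.gov Entity Management API (v3).
SAM_ENTITY_URL = "https://api.sam.gov/entity-information/v3/entities"


def build_sam_query(api_key: str, legal_name: str) -> str:
    """Build a search URL for an entity by legal business name."""
    params = {"api_key": api_key, "legalBusinessName": legal_name}
    return f"{SAM_ENTITY_URL}?{urlencode(params)}"


def summarize_sam_response(payload: dict) -> list:
    """Pull the fields a KYB report needs out of an (assumed) response shape."""
    summaries = []
    for entity in payload.get("entityData", []):
        reg = entity.get("entityRegistration", {})
        summaries.append(
            {
                "uei": reg.get("ueiSAM"),
                "legal_name": reg.get("legalBusinessName"),
                "registration_status": reg.get("registrationStatus"),
            }
        )
    return summaries
```

An empty list from `summarize_sam_response` only means no federal registration was found. Many legitimate businesses never register, so it is a data point rather than a red flag on its own.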
SEC EDGAR (public company filings)
For public companies or entities that have filed with the SEC, EDGAR provides officer names, filing history, and financial data. The SEC EDGAR Filing Search API returns structured filing data.
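As an alternative to a scraper, the SEC also publishes filing history directly as JSON on data.sec.gov. The sketch below uses that endpoint, not the actor mentioned above. It assumes you already know the entity's CIK, and the `contact` string is a placeholder for your own name and email, since the SEC asks automated clients to identify themselves in the User-Agent header.

```python
import json
import urllib.request


def submissions_url(cik) -> str:
    """Return the data.sec.gov submissions URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def fetch_submissions(cik, contact: str) -> dict:
    """Download the filing history for one CIK.

    `contact` should identify you (for example "Jane Doe jane@example.com").
    """
    request = urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": contact}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Only `submissions_url` is pure. `fetch_submissions` makes a live network call, so in a pipeline you would likely wrap it with rate limiting and retries.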
OSHA, FDA, and enforcement databases
Federal enforcement data helps flag businesses with compliance issues:
- OSHA violations -- workplace safety inspection results
- FDA warning letters -- regulatory actions against manufacturers
- Federal debarment -- entities barred from government contracts (via SAM.gov)
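Enforcement records rarely spell a business name exactly the way the customer did, so a name-matching step is usually needed before any of these lists are useful. Below is a minimal sketch using the standard library's `difflib`. The suffix list and the 0.9 threshold are assumptions to tune against your own data, and a production screen would likely also compare addresses or identifiers.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing names.
SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation",
            "co", "company", "ltd", "lp", "llp"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    cleaned = re.sub(r"[.']", "", name.lower())    # "L.L.C." -> "llc"
    tokens = re.sub(r"[^a-z0-9 ]", " ", cleaned).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def find_matches(candidate: str, enforcement_names: list, threshold: float = 0.9) -> list:
    """Return (listed_name, score) pairs at or above the threshold, best first."""
    target = normalize_name(candidate)
    hits = []
    for listed in enforcement_names:
        score = SequenceMatcher(None, target, normalize_name(listed)).ratio()
        if score >= threshold:
            hits.append((listed, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A hit here should open a manual review rather than trigger an automatic rejection, since unrelated businesses often share similar names.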
Building the pipeline
A typical automated KYB flow:
Customer submits business name + state
-> Query Secretary of State API for entity status
-> Query SAM.gov for federal registration
-> Query SEC EDGAR if public company
-> Query OSHA/FDA for enforcement history
-> Compile verification report
-> Flag discrepancies for manual review
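If you would rather express that flow in code than in a workflow tool, it can be sketched as a list of pluggable checks. `CheckResult`, `run_kyb`, and the check signature are hypothetical names for this illustration. Each real check would wrap one of the data sources above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    source: str
    ok: bool
    detail: str = ""


# A check takes (business_name, state) and returns a CheckResult.
Check = Callable[[str, str], CheckResult]


def run_kyb(name: str, state: str, checks: List[Check]) -> dict:
    """Run every check in order and compile a verification report."""
    results = []
    for check in checks:
        try:
            results.append(check(name, state))
        except Exception as exc:
            # A failed lookup is not a pass: record it so a human looks at it.
            results.append(
                CheckResult(
                    source=getattr(check, "__name__", "check"),
                    ok=False,
                    detail=f"lookup failed: {exc}",
                )
            )
    flagged = [result for result in results if not result.ok]
    return {
        "business": name,
        "state": state,
        "results": results,
        "needs_manual_review": bool(flagged),
    }
```

Treating lookup errors as flags is a deliberate choice in this sketch: government portals go down often enough that a silent skip could let an unverified entity through.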
With n8n or Make.com, you can wire these APIs together in an afternoon. Each Apify actor returns structured JSON, so parsing is straightforward.
Cost comparison
| Approach | Cost | Coverage |
|---|---|---|
| Middesk | $15,000-40,000/year | 50 states + federal |
| Manual verification | $50-100/entity (analyst time) | Varies |
| DIY with public APIs | $50-200/month (API compute), or $600-2,400/year | 50 states + federal |
On tooling cost alone, the DIY approach runs $600-2,400 per year, roughly 84-98% less than the $15,000-40,000 quoted for Middesk. The tradeoff is integration work upfront and maintenance when government portals change -- which is why using pre-built scrapers that someone else maintains is the sweet spot.
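The savings figure can be checked by annualizing the numbers from the table. This is plain arithmetic on the quoted ranges and nothing more: it ignores engineering and review time, which are real costs of the DIY route.

```python
import math

MIDDESK_ANNUAL = (15_000, 40_000)   # low, high, from the table
DIY_MONTHLY = (50, 200)             # low, high, from the table
MANUAL_PER_ENTITY = (50, 100)       # low, high, from the table

# Annualize the DIY compute cost.
diy_annual = tuple(12 * monthly for monthly in DIY_MONTHLY)  # (600, 2400)


def savings_pct(diy: float, vendor: float) -> float:
    """Percentage saved by paying `diy` instead of `vendor`."""
    return round(100 * (1 - diy / vendor), 1)


# Least favorable pairing: most expensive DIY vs. cheapest vendor plan.
worst_case = savings_pct(diy_annual[1], MIDDESK_ANNUAL[0])  # 84.0
# Most favorable pairing: cheapest DIY vs. most expensive vendor plan.
best_case = savings_pct(diy_annual[0], MIDDESK_ANNUAL[1])   # 98.5

# Entities per year at which manual analyst time alone equals the DIY bill.
breakeven_low = math.ceil(diy_annual[0] / MANUAL_PER_ENTITY[1])   # 6
breakeven_high = math.ceil(diy_annual[1] / MANUAL_PER_ENTITY[0])  # 48
```

In other words, the DIY compute bill matches the analyst cost of somewhere between 6 and 48 manual verifications per year.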
Getting started
- Pick your highest-volume states (probably CA, TX, FL, NY, IL)
- Set up structured API calls for each state's Secretary of State data
- Add SAM.gov and SEC EDGAR for federal cross-referencing
- Build a simple report template that flags missing or mismatched data
- Route flagged entities to a human reviewer
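The flagging step in that list can start very small. Here is a minimal sketch that compares what the customer submitted against what the registry returned. The field names (`legal_name`, `registered_agent`, `officers`) are hypothetical and should match whatever your normalized records use.

```python
def _canon(value):
    """Canonical form for comparison: lowercase, collapsed whitespace, sorted lists."""
    if isinstance(value, (list, tuple, set)):
        return sorted(_canon(item) for item in value)
    return " ".join(str(value or "").lower().split())


def flag_discrepancies(
    submitted: dict,
    registry: dict,
    fields=("legal_name", "registered_agent", "officers"),
) -> list:
    """Return human-readable flags for missing or mismatched fields."""
    flags = []
    for name in fields:
        registry_value = registry.get(name)
        if registry_value in (None, "", []):
            flags.append(f"{name}: missing from registry data")
        elif _canon(submitted.get(name)) != _canon(registry_value):
            flags.append(f"{name}: submitted value does not match registry")
    return flags
```

An empty list means nothing needs attention. Anything else goes to the human reviewer along with the raw records, so they can see why the flag fired.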
The full suite of government data scrapers covers Secretary of State filings across multiple states, SEC EDGAR, OSHA, FDA, and more -- all returning structured JSON ready for your compliance pipeline.
Building compliance automation tools with public government data. More at apify.com/pink_comic.