DEV Community

Ava Torres



A Data Engineer's Guide to Tracing Corporate Ownership and Political Money (With Public Records APIs)

I've done a fair amount of contract work for journalists and researchers over the years, helping them turn document dumps and public databases into structured, queryable datasets. The work is genuinely interesting — and it's made me appreciate how much signal is sitting in government databases that most people never touch programmatically.

This post is for investigative reporters, data journalists, and researchers who are comfortable with basic scripting and want to build repeatable workflows for the kinds of questions that come up constantly: Who owns this company? Who's funding this politician? What lobbyists are pushing this legislation? Which contractors are getting paid for what?

The good news: almost all of this data is public and accessible via API or scraping. The bad news: the government's idea of "accessible" and a journalist's idea of "accessible" are pretty different. Here's how to bridge that gap.


The Data Sources That Actually Matter

Before getting into workflows, here's a map of what's available and what it's good for:

| Source | What It Contains | Best For |
| --- | --- | --- |
| SEC EDGAR | Corporate filings: 10-Ks, proxies, 13F holdings, 8-Ks | Corporate ownership, financials, related-party transactions |
| FEC | Campaign contributions, PAC filings, independent expenditures | Political money, donor networks |
| Lobbying Disclosure Act (LDA) | Lobbyist registrations, quarterly activity reports | Influence mapping, issue tracking |
| USASpending.gov | Federal contracts and grants | Contractor spending, agency priorities |
| Secretary of State records | Business registrations | Shell company tracing, beneficial ownership clues |
| RECAP/PACER | Federal court dockets | Litigation history, enforcement actions, corporate disputes |

The challenge isn't that this data is hidden — it's that each database has its own interface, its own quirks, and its own way of representing the same underlying entities. Connecting them is the work.


Workflow 1: Tracing Corporate Ownership Through SEC Filings

When you need to understand who owns what, SEC EDGAR is the most reliable public source for US public companies.

Step 1: Pull the 13F Holdings

Institutional investment managers that exercise discretion over at least $100 million in Section 13(f) securities must file Form 13F within 45 days after each quarter ends, disclosing their equity holdings. This is how you trace which funds own significant stakes in which companies.

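sec-edgar-company-filings lets you query EDGAR by company name, CIK number, or filing type. Because a 13F is filed by the fund manager rather than by the portfolio company, look up the manager's CIK to pull its 13F filings. Then search for the company's own CIK and pull its latest proxy statement (DEF 14A) to see major shareholders, executive compensation, and related-party transactions.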
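For readers who want to see the moving parts, here is a minimal sketch of the same lookup made directly against EDGAR's public submissions endpoint rather than through the actor (whose input schema isn't shown in this post). The helper names and the contact string are placeholders of mine, and the parsing assumes the `filings.recent` layout of parallel arrays that the endpoint returns.

```python
import json
import urllib.request

# SEC asks automated clients to identify themselves; replace with your own contact.
USER_AGENT = "Example Newsroom research@example.org"


def submissions_url(cik):
    """Build the EDGAR submissions URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def fetch_submissions(cik):
    """Download a filer's submission history (this makes a network request)."""
    req = urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def filings_of_type(submissions, form_type):
    """Return recent filings of one form type, in the order EDGAR lists them.

    Assumes `filings.recent` holds parallel arrays keyed by column name.
    """
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form == form_type
    ]
```

Calling `filings_of_type(fetch_submissions(cik), "DEF 14A")` would list a company's proxy statements; passing `"13F-HR"` with a manager's CIK would list that manager's holdings reports.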

Step 2: Follow the Proxy Disclosures

Proxy statements are where the real ownership disclosures live. They name beneficial owners of more than 5% of the company's voting shares (required by SEC rules), the compensation arrangements for named executives, and any transactions between the company and parties related to its management.

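If you're investigating a private equity-backed company, there usually won't be a current proxy, because private companies generally don't file them. If the company was public before the buyout, though, the 8-K filed at the time of acquisition often names the acquirer and deal structure.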

Step 3: Cross-Reference with State Business Registrations

Public company filings show the structure at the top. But subsidiaries, operating entities, and holding companies are often registered at the state level and don't appear in EDGAR.

Use state Secretary of State (SOS) data to trace the full entity tree:

Search for the holding company name across multiple states. LLCs created to hold real assets are often formed in Delaware, Nevada, or Wyoming for liability and disclosure reasons — but they may have registered foreign qualifications in the states where they operate.
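State registries have no common API, so in practice the entity tree often gets assembled by hand from individual lookups. The sketch below shows one way to hold those findings and print them as a tree. The record fields (`formed_in`, `qualified_in`, `parent`) and the Acme entities are hypothetical, invented for illustration.

```python
from collections import defaultdict

# Hypothetical records, as you might transcribe them from state SOS lookups.
REGISTRATIONS = [
    {"name": "Acme Holdings LLC", "formed_in": "DE", "qualified_in": [], "parent": None},
    {"name": "Acme Operating LLC", "formed_in": "WY", "qualified_in": ["TX", "OH"],
     "parent": "Acme Holdings LLC"},
    {"name": "Acme Realty LLC", "formed_in": "NV", "qualified_in": ["TX"],
     "parent": "Acme Operating LLC"},
]


def children_by_parent(records):
    """Index records by the name of their parent entity (None marks the top)."""
    index = defaultdict(list)
    for rec in records:
        index[rec["parent"]].append(rec)
    return index


def render_tree(index, parent=None, depth=0, seen=None):
    """Return indented lines, one per entity, with formation and foreign-qualification states."""
    seen = set() if seen is None else seen
    lines = []
    for rec in sorted(index.get(parent, []), key=lambda r: r["name"]):
        if rec["name"] in seen:  # guard against circular ownership records
            continue
        seen.add(rec["name"])
        states = ", ".join([rec["formed_in"]] + rec["qualified_in"])
        lines.append("  " * depth + f"{rec['name']} [{states}]")
        lines.extend(render_tree(index, rec["name"], depth + 1, seen))
    return lines
```

`print("\n".join(render_tree(children_by_parent(REGISTRATIONS))))` gives a quick view of which entity sits under which, and where each one has registered to do business.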


Workflow 2: Mapping Political Donations and Influence Networks

Step 1: Start With FEC Filings

The FEC database contains itemized individual contributions (committees must itemize once a donor's giving passes $200 in aggregate), every PAC filing, and every independent expenditure. It's comprehensive, it's public, and it's deeply underused.

fec-campaign-finance-search lets you query contributions by donor name, employer, ZIP code, committee, or candidate. This is how you find the full picture of where an individual's or organization's political money is going — across multiple cycles, across multiple candidates, across direct contributions and PAC money.

Start with a company name in the employer field. You'll see which employees are giving to which candidates. Combine that with executive names to trace individual giving patterns.
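As a rough illustration of the employer-field search, the sketch below builds a query against the OpenFEC Schedule A endpoint and totals the returned rows by recipient committee. It goes to the FEC API directly rather than through the actor. The function names are mine, `DEMO_KEY` is the rate-limited demo key from api.data.gov, and the row fields (`committee.name`, `contribution_receipt_amount`) are my reading of the response shape, so check them against a live response. Pagination is left out.

```python
from collections import defaultdict
from urllib.parse import urlencode

FEC_SCHEDULE_A = "https://api.open.fec.gov/v1/schedules/schedule_a/"


def schedule_a_url(employer, cycle, api_key="DEMO_KEY", per_page=100):
    """Build a Schedule A (itemized receipts) query filtered by employer text."""
    params = {
        "api_key": api_key,
        "contributor_employer": employer,
        "two_year_transaction_period": cycle,
        "per_page": per_page,
    }
    return FEC_SCHEDULE_A + "?" + urlencode(params)


def total_by_committee(results):
    """Sum contribution amounts per recipient committee, largest total first."""
    totals = defaultdict(float)
    for row in results:
        committee = (row.get("committee") or {}).get("name", "UNKNOWN")
        totals[committee] += row.get("contribution_receipt_amount") or 0.0
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

Feeding the `results` list from each response page into `total_by_committee` shows at a glance where a company's employees are sending money.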

Step 2: Follow the Lobbying Money

Contributions are only part of the influence picture. Lobbying is often the more direct lever.

The Lobbying Disclosure Act requires lobbyists to register and file quarterly activity reports disclosing their clients, the issues they're lobbying on, and estimated spending. lobbying-disclosure-search makes this queryable.

Search by company name to see what issues they're paying lobbyists to push. Search by issue keyword to see who's lobbying on a particular bill or regulatory matter. The combination of FEC contributions and LDA lobbying disclosures gives you a fairly complete picture of a company's political engagement.
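Once you have a client's quarterly filings as JSON, a small summary helps answer "what are they lobbying on, and how much is being reported?" The sketch below assumes each filing carries `income` or `expenses` as a string (or null) plus a `lobbying_activities` list with a `general_issue_code`, which is how I understand the Senate LDA data to be shaped. Treat those field names as assumptions to verify.

```python
from collections import Counter
from decimal import Decimal


def reported_amount(filing):
    """Outside firms report income from the client; in-house filers report expenses."""
    raw = filing.get("income") or filing.get("expenses") or "0"
    return Decimal(str(raw))


def issue_counts(filings):
    """Count how many filings mention each general issue code at least once."""
    counts = Counter()
    for filing in filings:
        codes = {
            activity["general_issue_code"]
            for activity in filing.get("lobbying_activities", [])
        }
        counts.update(codes)
    return counts


def total_reported(filings):
    """Add up the reported amounts across filings."""
    return sum((reported_amount(f) for f in filings), Decimal("0"))
```

The amounts are reported per filing, not per issue, so the code keeps the issue counts and the dollar total separate instead of splitting dollars across issues.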

Step 3: Check Federal Contractor Spending

If the company also receives federal contracts, usaspending-federal-spending-search closes the loop. You can see who's paying lobbyists to influence legislation while simultaneously receiving government contracts, a pattern worth examining in a lot of industries.

USASpending data includes contract award amounts, agency, recipient, and place of performance. It's not perfect (there are known data quality issues with some agency reporting), but it's the most complete public picture of federal contractor spending.
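For a sense of what a contract lookup involves underneath, here is a sketch of a request body for the USASpending award search endpoint, plus a helper that totals the results by agency. This goes to the government API directly, not through the actor. The filter and field names reflect my understanding of the `spending_by_award` endpoint and should be checked against its documentation; the function names are mine.

```python
from collections import defaultdict

AWARD_SEARCH_URL = "https://api.usaspending.gov/api/v2/search/spending_by_award/"


def contract_search_payload(recipient, start_date, end_date, limit=50):
    """Build the JSON body for a POST to AWARD_SEARCH_URL (contracts only)."""
    return {
        "filters": {
            "recipient_search_text": [recipient],
            "award_type_codes": ["A", "B", "C", "D"],  # contract award types
            "time_period": [{"start_date": start_date, "end_date": end_date}],
        },
        "fields": ["Award ID", "Recipient Name", "Award Amount", "Awarding Agency"],
        "limit": limit,
        "page": 1,
    }


def total_by_agency(results):
    """Sum award amounts per awarding agency from a page of results."""
    totals = defaultdict(float)
    for row in results:
        totals[row["Awarding Agency"]] += row.get("Award Amount") or 0.0
    return dict(totals)
```

Send the payload as JSON with any HTTP client, then pass the response's `results` list to `total_by_agency`.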


Workflow 3: Tracking Legislation and Regulatory Actions

Step 1: Follow Bills Through Congress

congress-gov-legislation-tracker lets you search bills by keyword, sponsor, committee, or status. For a beat reporter covering a specific industry or issue, setting up a keyword alert on new bills introduced in a relevant committee is basic infrastructure.
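A keyword alert can be as simple as remembering which bills you have already seen. The sketch below works on a list of bill records you have already fetched. It assumes each record has `congress`, `type`, `number`, and `title` keys, and the function name is mine, not part of any tool named here.

```python
def new_matching_bills(bills, keywords, seen_ids):
    """Return bills whose title contains a keyword and that were not seen before.

    `seen_ids` is a set that the caller persists between runs; it is updated
    in place so that each matching bill triggers an alert only once.
    """
    lowered = [keyword.lower() for keyword in keywords]
    matches = []
    for bill in bills:
        bill_id = f"{bill['congress']}-{bill['type']}-{bill['number']}"
        if bill_id in seen_ids:
            continue
        title = bill["title"].lower()
        if any(keyword in title for keyword in lowered):
            matches.append(bill)
            seen_ids.add(bill_id)
    return matches
```

Run it on a schedule, store `seen_ids` in a file or database, and send whatever it returns to email or chat.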

Step 2: Watch the Federal Register

Regulations often have more real-world impact than legislation — and they move through a more predictable process. federal-register-search lets you search proposed and final rules by agency, topic, or date.

The notice-and-comment period is where lobbying money gets translated into regulatory language. Searching the Federal Register for rules that affect your coverage area, then cross-referencing who submitted comments (via regulations.gov) and who was lobbying on the same issues (via LDA), is a productive way to trace influence.
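To make the Federal Register search concrete, here is a sketch that builds a query URL for the public federalregister.gov documents API, filtered to proposed and final rules from one agency. The function name is mine, and the agency slug in the usage note is only an example; the actor in this post may expose different inputs.

```python
from urllib.parse import urlencode

FR_DOCUMENTS = "https://www.federalregister.gov/api/v1/documents.json"


def rules_url(term, agency_slug, since, doc_types=("PRORULE", "RULE")):
    """Build a search URL for proposed (PRORULE) and final (RULE) rules.

    `since` is an ISO date (YYYY-MM-DD); only documents published on or
    after that date are returned.
    """
    params = [
        ("conditions[term]", term),
        ("conditions[agencies][]", agency_slug),
        ("conditions[publication_date][gte]", since),
        ("order", "newest"),
        ("per_page", 100),
    ]
    params += [("conditions[type][]", doc_type) for doc_type in doc_types]
    return FR_DOCUMENTS + "?" + urlencode(params)
```

For example, `rules_url("climate disclosure", "securities-and-exchange-commission", "2024-01-01")` yields a URL you can fetch with any HTTP client. Each returned document includes a docket or document number you can carry over to regulations.gov and LDA searches.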

Step 3: Pull Court Dockets for Enforcement History

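Federal enforcement actions that reach court show up as civil or criminal cases in PACER. recap-federal-court-dockets gives you access to RECAP's archive of PACER documents, which is the best free alternative to paying PACER's per-page fees. The archive only holds dockets and documents that have already been retrieved and contributed, so coverage is partial.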

Search by party name to find federal litigation involving a company or individual. This is how you find the SEC enforcement case that settled quietly, the DOJ suit that never made the news, and the civil fraud suits that preceded the bankruptcy.
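A rough sketch of the same idea against CourtListener, which hosts the RECAP archive: build a search URL, then sort whatever comes back into a timeline. I am assuming the v4 search endpoint and `type=r` for RECAP dockets, and that result rows carry `dateFiled`, `caseName`, and `docketNumber`; verify all of that against the CourtListener API documentation before relying on it. The query here is a plain quoted phrase, not a true party-name filter.

```python
from urllib.parse import urlencode

CL_SEARCH = "https://www.courtlistener.com/api/rest/v4/search/"


def recap_search_url(name):
    """Build a RECAP search URL for an exact-phrase match on `name`."""
    params = {"type": "r", "q": f'"{name}"'}
    return CL_SEARCH + "?" + urlencode(params)


def litigation_timeline(results):
    """Return (date, case name, docket number) tuples, oldest filing first.

    Rows without a filing date are skipped; ISO dates sort correctly as text.
    """
    dated = [row for row in results if row.get("dateFiled")]
    dated.sort(key=lambda row: row["dateFiled"])
    return [
        (row["dateFiled"], row.get("caseName", ""), row.get("docketNumber", ""))
        for row in dated
    ]
```

A timeline view makes it easier to spot the pattern described above, where a string of civil suits precedes a bankruptcy filing.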


Connecting the Dots

The real value isn't any single data source; it's the network that emerges when you connect them. Picture a hypothetical chain: a lobbyist who represents a company also makes personal donations to the committee chair overseeing that company's regulatory environment, and an agency under that committee's jurisdiction awards a contract to a firm where the lobbyist is a senior partner.

None of that is in any single database. All of it is in public records. The work is joining them.
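Here is a small sketch of what "joining them" can look like in code: matching registered lobbyists against individual donors by a crude name key. The record shapes and names are hypothetical. FEC contributor names are typically written "LAST, FIRST", so the key sorts name tokens to ignore their order.

```python
import re
from collections import defaultdict


def person_key(name):
    """Crude match key: lowercase, strip punctuation, sort the name tokens.

    'DOE, JANE' and 'Jane Doe' produce the same key. Middle names, initials,
    and suffixes will break the match, so treat hits as leads to verify.
    """
    tokens = re.sub(r"[^a-z ]", " ", name.lower()).split()
    return " ".join(sorted(tokens))


def lobbyist_donor_overlap(lobbyists, contributions):
    """Return (lobbyist, client, committee, amount) for each name match."""
    by_person = defaultdict(list)
    for contribution in contributions:
        by_person[person_key(contribution["contributor_name"])].append(contribution)
    links = []
    for lobbyist in lobbyists:
        for contribution in by_person.get(person_key(lobbyist["name"]), []):
            links.append((
                lobbyist["name"],
                lobbyist["client"],
                contribution["committee"],
                contribution["amount"],
            ))
    return links
```

A name match is not proof of identity. Confirm each hit with employer, city, or ZIP before treating two records as the same person.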

A few practical notes:

  • Entity name matching is the hard part. "Goldman Sachs Group Inc." and "Goldman Sachs & Co. LLC" are related but distinct legal entities. Budget time for fuzzy matching and manual disambiguation.
  • FEC employer field is self-reported. "Goldman Sachs," "Goldman," "GS," and "Goldman Sachs Bank" all appear in FEC data for the same institution. Query broadly, normalize in post-processing.
  • LDA filings lag the activity they describe. Quarterly reports are due 20 days after the quarter closes, so Q4 lobbying activity doesn't surface until late January. Factor that into your publication timeline.
  • EDGAR filings are machine-readable but inconsistently formatted. XBRL data is cleaner than raw 10-K text; use the structured filing types where possible.
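The first two notes above can be sketched with nothing but the standard library: strip legal suffixes to get a comparison key, score similarity with `difflib`, and map known employer spellings to one canonical label. The suffix list and the alias table are illustrative assumptions, not a complete solution.

```python
import re
from difflib import SequenceMatcher

# Illustrative, incomplete list of legal-form words to ignore when comparing.
SUFFIXES = {"inc", "llc", "lp", "llp", "ltd", "corp", "co", "company",
            "corporation", "incorporated", "group", "holdings"}

# Hand-maintained aliases for self-reported employer strings (example entries).
EMPLOYER_ALIASES = {"gs": "goldman sachs", "goldman": "goldman sachs"}


def normalize(name):
    """Lowercase, drop punctuation, and remove legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in SUFFIXES)


def similarity(a, b):
    """Similarity of two names after normalization, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def canonical_employer(raw):
    """Map a self-reported employer string to a canonical label if one is known."""
    key = normalize(raw)
    return EMPLOYER_ALIASES.get(key, key)
```

Note that `normalize` maps both "Goldman Sachs Group Inc." and "Goldman Sachs & Co. LLC" to the same key. That is useful for finding candidates, but it erases the legal distinction, so keep the raw name alongside the key and disambiguate by hand.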

The tools are there. The data is public. Building the workflow is the journalist's actual edge.


Actors mentioned in this post run on Apify. Each one queries the underlying government API or database and returns structured JSON — no scraping gray areas, just public data made queryable.
