Quick answer: The IRS requires ~1.6 million US tax-exempt organisations to file Form 990 annually, and that data is public domain — but it arrives as deeply nested JSON from ProPublica's API, one envelope per organisation, with officer compensation buried in different fields depending on which form variant was filed. An IRS 990 scraper fans that envelope out into one flat, typed row per (EIN, tax year) filing, picks the correct compensation field for each form type, and computes a total-compensation-and-benefits figure you can load straight into Pandas or a warehouse. The Apify Actor below does that for $0.003 per row (~$3.05 per 1,000 rows), with the field-mapping, backoff handling, and Pydantic validation already wired in.
Nonprofit analysts, fundraising consultants, journalists, and academic researchers all want the same thing: a structured table of nonprofit officer compensation, cross-referenced with revenue and assets, across a multi-year window. The data is public, but the raw API output is built for display, not pipelines. Here is what the extraction involves and how I packaged it into one API call.
What is IRS Form 990? 📄
IRS Form 990 is the annual information return filed by US tax-exempt organisations — 501(c)(3) charities, 501(c)(4) social-welfare orgs, and private foundations. It discloses revenue, expenses, assets, and — most relevant here — aggregate officer and director compensation. Because 990s are filed under public-disclosure requirements, ProPublica has digitised and served them via Nonprofit Explorer since 2012, covering roughly 1.6 million organisations back to around 2010.
Three form variants exist, and they matter for any programmatic extract:
- Form 990: standard return for larger organisations. Officer comp is in compnsatncurrofcr.
- Form 990EZ: short form for smaller organisations. Same officer-comp field as 990.
- Form 990PF: private foundation return. Officer comp lives in compofficers, a different key entirely.
Write a single-field extractor without handling that split and you silently get null officer comp for every private foundation in your dataset.
Does the IRS have a bulk 990 data API? 🔎
No, not a structured one. The IRS publishes raw XML bulk downloads on AWS, but parsing them means navigating a deeply nested schema that shifts across filing years. ProPublica's Nonprofit Explorer API v2 is the practical structured surface — 60+ flat fields per filing for any EIN — but it wraps everything in an organization envelope with a filings_with_data array. There is no endpoint that hands you "officer comp for these 200 EINs as a flat table." That transformation is what this Actor does.
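As a rough illustration of that fan-out, the sketch below flattens one organisation envelope into one row per filing. The sample envelope is made up, and the filing key names (`tax_prd_yr`, `totrevenue`) and organisation keys (`ein`, `name`, `state`) are assumptions about the response shape rather than a verified schema; only `organization` and `filings_with_data` come from the description above.

```python
from typing import Any


def flatten_envelope(envelope: dict[str, Any]) -> list[dict[str, Any]]:
    """Fan one organisation envelope out into one flat row per filing."""
    org = envelope["organization"]
    rows = []
    for filing in envelope.get("filings_with_data", []):
        rows.append(
            {
                # EINs are nine digits; pad in case the value arrives as an int.
                "ein": str(org["ein"]).zfill(9),
                "organization_name": org.get("name"),
                "state": org.get("state"),
                # Assumed key names; check them against a live response.
                "tax_year": filing.get("tax_prd_yr"),
                "total_revenue_usd": filing.get("totrevenue"),
            }
        )
    return rows


# Hypothetical envelope, trimmed to a handful of fields.
sample = {
    "organization": {"ein": 123456789, "name": "Example Org", "state": "NY"},
    "filings_with_data": [
        {"tax_prd_yr": 2022, "totrevenue": 500000},
        {"tax_prd_yr": 2021, "totrevenue": 450000},
    ],
}

rows = flatten_envelope(sample)
print(len(rows))  # one row per filing
```

Organisation-level fields are repeated on every row, which is what makes the result loadable as a flat table.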
What the data looks like
Each row is one (EIN, tax year) filing — 21 fields total, Pydantic-validated before it hits the dataset. Here is a real output row:
{
  "ein": "131684331",
  "organization_name": "Ford Foundation",
  "organization_url": "https://projects.propublica.org/nonprofits/organizations/131684331",
  "state": "NY",
  "city": "New York",
  "ntee_code": "T20",
  "subsection_code": 4,
  "tax_year": 2022,
  "tax_period_end": "2022-12",
  "form_type": "990PF",
  "pdf_url": "https://s3.amazonaws.com/irs-form-990/202323199349300122_public.xml",
  "officer_comp_usd": 11240000,
  "other_salaries_wages_usd": null,
  "payroll_taxes_usd": 4120000,
  "pension_contributions_usd": 2380000,
  "other_employee_benefits_usd": 1950000,
  "total_comp_and_benefits_usd": 19690000,
  "total_revenue_usd": 1420000000,
  "total_functional_expenses_usd": 680000000,
  "total_assets_end_usd": 16800000000,
  "scraped_at": "2026-05-28T14:22:07+00:00"
}
Notice other_salaries_wages_usd is null on this 990PF row — the private-foundation form does not report that line separately, so the Actor sets it null rather than zero. total_comp_and_benefits_usd is the sum of the five comp/benefit components, treating nulls as zero but returning null when all five are null, so incomplete filings never report a false zero.
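The null-handling rule for the total can be sketched in a few lines. This is a minimal illustration of the rule as described, not the Actor's actual code; the function name is hypothetical.

```python
from typing import Optional


def total_comp_and_benefits(*components: Optional[int]) -> Optional[int]:
    """Sum the components, counting None as zero unless every one is None."""
    present = [c for c in components if c is not None]
    if not present:
        return None  # nothing reported: avoid a false zero
    return sum(present)


# Values from the Ford Foundation row above (other salaries is null).
total = total_comp_and_benefits(11_240_000, None, 4_120_000, 2_380_000, 1_950_000)
print(total)  # 19690000
print(total_comp_and_benefits(None, None, None, None, None))  # None
```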
The naive approach (and why it falls apart) 🔧
The obvious path: call GET .../organizations/{ein}.json, pull filings_with_data, read off the compensation fields. Three things break at scale.
1. The formtype integer is underdocumented. ProPublica returns formtype as 0, 1, or 2 (0 → 990, 1 → 990EZ, 2 → 990PF). An extractor that hardcodes if formtype == 2: use compofficers silently drops any future variant. We map the integer to a validated Literal["990", "990EZ", "990PF"] at parse time, and log a warning then drop the row for any unknown integer, so new filings can't corrupt your dataset.
2. Officer comp lives in a different key for private foundations. A 990PF does not file Part IX, so compnsatncurrofcr doesn't exist on those rows. Reuse one field name across all three form types and you get null officer comp for every foundation. We branch on form_type: 990 and 990EZ read compnsatncurrofcr; 990PF reads compofficers.
3. Search-mode pagination is quiet about its limits. search.json paginates at 25 results per page and hard-stops at the API's ceiling — 400 pages, 10,000 results. We surface that cap in the status message rather than handing you a silently truncated list.
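Items 1 and 2 above can be sketched together as a lookup plus a branch. This is an illustrative sketch, not the Actor's source: the function and logger names are hypothetical, and the sample filings are made up and trimmed to the relevant keys.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger("irs990")

# Integer-to-form mapping as described above.
FORM_TYPES = {0: "990", 1: "990EZ", 2: "990PF"}

# Officer-compensation key per form type, as described above.
OFFICER_COMP_KEY = {
    "990": "compnsatncurrofcr",
    "990EZ": "compnsatncurrofcr",
    "990PF": "compofficers",
}


def officer_comp(filing: dict[str, Any]) -> Optional[tuple[str, Optional[int]]]:
    """Return (form_type, officer_comp), or None for an unknown formtype."""
    form_type = FORM_TYPES.get(filing.get("formtype"))
    if form_type is None:
        logger.warning("unknown formtype %r, dropping row", filing.get("formtype"))
        return None
    return form_type, filing.get(OFFICER_COMP_KEY[form_type])


# Hypothetical filings, trimmed to the relevant keys.
print(officer_comp({"formtype": 2, "compofficers": 11_240_000}))
print(officer_comp({"formtype": 0, "compnsatncurrofcr": 750_000}))
print(officer_comp({"formtype": 7}))  # unknown variant, dropped
```

Keeping the mapping in data rather than in an if/else chain means a new variant fails visibly in one place instead of falling through to the wrong field.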
Beyond correctness: we rotate a Chrome 131 TLS fingerprint via curl-cffi so the handshake reads as a browser, not Python. We retry on 429 and 503 with exponential backoff starting at 2 seconds, doubling, capped at 30, up to 5 attempts per page, honouring Retry-After when present. Apify residential proxies are opt-in for when rate-limit pressure rises. And we fail loud on empty datasets — non-zero exit, clear status message — instead of handing you a green run with zero rows. No data, no charge.
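The retry schedule described above works out as follows. This sketch only computes the waits and makes no network calls; the function names are hypothetical, and treating a numeric Retry-After as overriding the schedule without a cap is an assumption about what "honouring" means here.

```python
from typing import Optional

BASE_DELAY = 2.0   # seconds before the first retry
MAX_DELAY = 30.0   # cap on any single computed wait
MAX_ATTEMPTS = 5   # attempts per page
RETRY_STATUSES = {429, 503}


def should_retry(status: int, attempt: int) -> bool:
    """True if a 0-based attempt that returned `status` may be retried."""
    return status in RETRY_STATUSES and attempt + 1 < MAX_ATTEMPTS


def retry_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # server-supplied wait wins
        except ValueError:
            pass  # HTTP-date form of Retry-After: fall back to the schedule
    return min(BASE_DELAY * 2 ** attempt, MAX_DELAY)


# Delay the schedule gives for each attempt index, showing the 30-second cap.
schedule = [retry_delay(n) for n in range(MAX_ATTEMPTS)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 30.0]
```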
The Actor
The Actor is on the Apify Store: apify.com/DevilScrapes/irs-990-officer-comp. Run it from the Console with a point-and-click form, or drive it programmatically:
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Mode 1: explicit EIN list
run = client.actor("DevilScrapes/irs-990-officer-comp").call(
    run_input={
        "eins": ["131684331", "133871360", "530196605"],
        "startYear": 2021,
        "endYear": 2023,
    }
)

# Mode 2: name search, state-scoped, capped at 50 orgs
# (this reassigns `run`, so the loop below reads the search-mode results)
run = client.actor("DevilScrapes/irs-990-officer-comp").call(
    run_input={
        "searchQuery": "community foundation",
        "stateFilter": "CA",
        "maxOrgs": 50,
        "startYear": 2022,
        "endYear": 2023,
    }
)

# Print every row from the most recent run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
Input is one of two mutually exclusive modes — eins (explicit list, hyphens optional) or searchQuery (name search across ProPublica's index). A Pydantic model_validator enforces the XOR before any network call, so passing both yields a clear validation error rather than a confusing empty dataset. The stateFilter (two-letter USPS code) narrows search mode to one state and is silently ignored in EIN mode; startYear and endYear default to the last three calendar years.
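To show the shape of that XOR check without pulling in a dependency, here is a sketch that swaps the Pydantic model_validator for a standard-library dataclass with `__post_init__`. The class and helper names are hypothetical, and the nine-digit EIN check is an assumption added for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


def normalise_ein(raw: str) -> str:
    """Strip optional hyphens and require exactly nine digits."""
    digits = raw.replace("-", "").strip()
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"not a valid EIN: {raw!r}")
    return digits


@dataclass
class ActorInput:
    eins: list[str] = field(default_factory=list)
    searchQuery: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one of the two modes must be supplied.
        if bool(self.eins) == bool(self.searchQuery):
            raise ValueError("provide exactly one of 'eins' or 'searchQuery'")
        self.eins = [normalise_ein(e) for e in self.eins]


print(ActorInput(eins=["13-1684331"]).eins)  # ['131684331']
```

Because the check runs at construction time, a bad input fails before any request is made, which matches the behaviour described above.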
What you would actually use this for 💡
Nonprofit executive-pay benchmarking. Pull officer compensation for 100-1,000 peer organisations to benchmark CEO or CFO pay against revenue and assets. The computed total_comp_and_benefits_usd — officer comp plus non-officer salaries, payroll tax, pension, and other benefits — lets you compare total employment cost, not just the headline officer number.
Foundation due diligence. A foundation's 990PF discloses officer compensation and total assets across multiple years. Pull three years of filings in one run and read the financial trajectory before investing weeks in a grant application.
Investigative journalism. Filter for organisations where officer_comp_usd / total_revenue_usd exceeds a threshold, or where officer comp jumped year-over-year while revenue fell. The pdf_url field links straight to the source IRS filing for every flagged row.
Academic panel datasets. Multi-year 990 data is the standard empirical base for nonprofit-sector economics research. This Actor delivers a structured panel — one row per (EIN, tax year) — that drops into pd.read_json() or SQLite with no field-mapping preprocessing.
State-level sector reports. Combine stateFilter with a broad searchQuery such as "hospital" or "university" and set maxOrgs high enough to enumerate every matching organisation in a state, up to the 10,000-result search ceiling.
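The investigative-journalism filter mentioned above, officer comp as a share of revenue, might look like this in plain Python. The rows and the 10% threshold are made up for illustration; only the field names come from the output schema.

```python
from typing import Any, Optional


def comp_ratio(row: dict[str, Any]) -> Optional[float]:
    """Officer comp as a share of revenue, or None when it cannot be computed."""
    comp, revenue = row.get("officer_comp_usd"), row.get("total_revenue_usd")
    if comp is None or not revenue:  # skip nulls and zero revenue
        return None
    return comp / revenue


def flag_rows(rows: list[dict[str, Any]], threshold: float) -> list[dict[str, Any]]:
    """Keep rows whose officer-comp-to-revenue ratio exceeds the threshold."""
    flagged = []
    for row in rows:
        ratio = comp_ratio(row)
        if ratio is not None and ratio > threshold:
            flagged.append(row)
    return flagged


# Made-up rows for illustration only.
rows = [
    {"ein": "000000001", "officer_comp_usd": 300_000, "total_revenue_usd": 1_000_000},
    {"ein": "000000002", "officer_comp_usd": 50_000, "total_revenue_usd": 5_000_000},
    {"ein": "000000003", "officer_comp_usd": None, "total_revenue_usd": 2_000_000},
]
print([r["ein"] for r in flag_rows(rows, threshold=0.10)])  # ['000000001']
```

Rows with null officer comp are skipped rather than treated as zero, so a missing figure never looks like a low ratio.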
Pricing — exact numbers 💰
Pay-per-event. You pay for rows that land in the dataset; you do not pay for EINs that return no filings in your year window.
| Event | Price |
|---|---|
| actor-start | $0.05 (once per run) |
| result-row | $0.003 per row |

| Run size | Estimated cost |
|---|---|
| 10 rows (a few EINs over a 3-year window) | $0.08 |
| 100 rows | $0.35 |
| 300 rows (100 EINs × 3 filings avg) | $0.95 |
| 1,000 rows | $3.05 |
| 10,000 rows | $30.05 |
A typical 100-EIN benchmarking run averaging three filings each returns 300 rows and costs $0.95 total. Apify's $5 free trial credit (no credit card required) covers your first ~1,650 rows.
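The cost figures follow directly from the two event prices. The sketch below reproduces the arithmetic with `Decimal` to avoid float rounding; the function names are hypothetical, and it assumes a single run that returns at least one row.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.05")   # charged once per run
RESULT_ROW = Decimal("0.003")   # charged per row in the dataset


def run_cost(rows: int) -> Decimal:
    """Estimated cost in USD for a single run that returns `rows` rows."""
    return ACTOR_START + RESULT_ROW * rows


def rows_within(budget: Decimal) -> int:
    """Largest row count one run can return without exceeding `budget`."""
    return int((budget - ACTOR_START) // RESULT_ROW)


print(run_cost(300))              # 0.950
print(run_cost(1000))             # 3.050
print(rows_within(Decimal("5")))  # 1650
```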
The technically interesting part
The detail other scrapers skip is the formtype-correct compensation field. ProPublica returns formtype as a bare integer — 0, 1, or 2 — and the mapping to 990 / 990EZ / 990PF is documented only in their API changelog, not the response itself. The officer-comp field name changes across that split: compnsatncurrofcr for 990/990EZ, compofficers for 990PF.
This matters specifically for private foundations, which file 990PF exclusively. A query for Ford Foundation (EIN 131684331) returns formtype=2; a field reader that always reads compnsatncurrofcr gets null. We verified the split against live API responses on 2026-05-16 for Red Cross (990), ACLU (990), and Ford Foundation (990PF), which exercises both field names (the organisations listed here do not include a 990EZ filer). That one branch is what makes the output trustworthy for foundation researchers.
Limitations 🚧
- No per-officer breakdown. The structured API exposes only aggregate officer compensation (Form 990 Part IX line 5 / Form 990PF Part I line 13). Per-officer name, title, hours, and individual comp from Part VII Section A live only in the 990 PDF and are out of scope for v1.
- No Schedule J detail. Compensation from related organisations (Schedule J Part II) is PDF-only, same reason.
- Coverage starts around 2010. ProPublica's structured coverage doesn't extend reliably earlier; older filings may exist as PDFs but won't produce rows here.
- Annual cadence, 12-18 month lag. Expect a given tax year to appear in ProPublica's index 12-18 months after the period ends.
- Search mode caps at 10,000 EINs (400 pages × 25 results). For larger queries, narrow with stateFilter or use explicit eins mode.
- Historical benchmarking data, not a real-time feed. If you need current executive compensation, 990 is the wrong source regardless of tool.
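For the search-mode cap in the list above, a small planning helper shows when a query would be truncated by the API ceiling instead of by maxOrgs. This is a sketch under the stated page size and page limit; `search_plan` and its parameters are hypothetical names, and how the total result count is obtained is left out.

```python
import math

PAGE_SIZE = 25     # results per search page, as stated above
MAX_PAGES = 400    # page ceiling, as stated above
MAX_RESULTS = PAGE_SIZE * MAX_PAGES  # 10,000


def search_plan(total_results: int, max_orgs: int) -> tuple[int, bool]:
    """Return (pages_to_fetch, truncated) for a search.

    `truncated` is True when the API ceiling, not `max_orgs`, limits the run.
    """
    wanted = min(total_results, max_orgs)
    reachable = min(wanted, MAX_RESULTS)
    pages = math.ceil(reachable / PAGE_SIZE)
    return pages, wanted > MAX_RESULTS


print(search_plan(total_results=180, max_orgs=50))         # (2, False)
print(search_plan(total_results=25_000, max_orgs=50_000))  # (400, True)
```

When `truncated` comes back True, narrowing by state or switching to an explicit EIN list is the way to reach the remaining organisations.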
FAQ ❓
Is scraping IRS 990 data legal?
IRS Form 990 returns are public-domain records the IRS itself requires to be publicly disclosed, and ProPublica's Nonprofit Explorer API serves them as a free public service. This Actor reads only what the public API exposes, collects no personal data beyond what organisations disclosed to the IRS, and bypasses no authentication. As always, review your jurisdiction and use case — but 990 data has been the backbone of journalism and nonprofit research for decades precisely because it is designed to be public.
Can I export the results to a spreadsheet or data warehouse?
Yes. The Apify Console exports CSV, Excel, JSON, or XML from any run. You can webhook the dataset on ACTOR.RUN.SUCCEEDED into Make, Zapier, or n8n, or pull it via the Apify Dataset API. Every row is Pydantic-validated, so schema-on-read stays consistent.
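As one possible warehouse-style load, the sketch below puts a few rows into an in-memory SQLite table using only the standard library. The rows are made up and trimmed to four columns, and the (ein, tax_year) primary key is an assumption that each organisation has at most one row per tax year.

```python
import sqlite3

# Made-up rows shaped like the Actor's output, trimmed to four columns.
rows = [
    {"ein": "000000001", "tax_year": 2022, "form_type": "990", "officer_comp_usd": 300_000},
    {"ein": "000000001", "tax_year": 2021, "form_type": "990", "officer_comp_usd": 280_000},
    {"ein": "000000002", "tax_year": 2022, "form_type": "990PF", "officer_comp_usd": None},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE filings (
        ein TEXT NOT NULL,
        tax_year INTEGER NOT NULL,
        form_type TEXT NOT NULL,
        officer_comp_usd INTEGER,
        PRIMARY KEY (ein, tax_year)
    )
    """
)
# Named placeholders map straight onto the row dictionaries.
conn.executemany(
    "INSERT INTO filings VALUES (:ein, :tax_year, :form_type, :officer_comp_usd)",
    rows,
)

latest = conn.execute(
    "SELECT ein, MAX(tax_year) FROM filings GROUP BY ein ORDER BY ein"
).fetchall()
print(latest)  # [('000000001', 2022), ('000000002', 2022)]
```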
Does ProPublica have an official API I could use instead?
Yes — ProPublica's Nonprofit Explorer API v2 is free and key-free. What it doesn't provide is the cross-form field mapping, year-window filtering, computed total-compensation field, or one-row-per-filing-year shape. For raw filings on one EIN, use ProPublica directly. For a structured panel across 50+ organisations with formtype-correct extraction, this Actor is the layer on top.
Why is other_salaries_wages_usd null for private foundations?
Form 990PF does not report non-officer salaries as a separate line the way Form 990 does. When the field is absent, the Actor returns null rather than zero, and total_comp_and_benefits_usd is still summed from the components that are present.
Try it
The Actor is live on the Apify Store: apify.com/DevilScrapes/irs-990-officer-comp.
Free $5 trial credit, no credit card. Run it on ["131684331", "133871360", "530196605"] (Ford Foundation, ACLU, American Red Cross) with the default window and you will have three organisations' filing history as clean, typed rows in under a minute. Need a field the structured ProPublica API exposes but this Actor doesn't surface yet? Drop it in the comments — per-officer PDF extraction is on the v2 roadmap.
Built by Devil Scrapes — Apify Actors with attitude. Pay-per-event, transparent pricing, no junk fields. 😈