The NPPES NPI Registry lists 9+ million US healthcare providers, both individual practitioners and organizations. Many healthcare data companies charge $500-5,000/month for access to this public dataset, yet the data itself is free. Here's how to use it.
What data is available
NPPES (National Plan and Provider Enumeration System) is a public CMS database of every US healthcare provider, individual or organization, that has been issued a National Provider Identifier (NPI). Records include:
- Provider name, NPI number, credentials
- Mailing and practice location addresses (primary + secondary locations)
- Specialties and taxonomy codes
- Phone/fax numbers
- License numbers by state
- Organization details for group NPIs (authorized official, parent organization)
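NPI numbers are ten digits. The last digit is a Luhn check digit, computed as if the prefix 80840 were prepended to the first nine digits. Because of this, you can cheaply reject mistyped NPIs before you query the API or join datasets. The sketch below is a minimal example; the function names are illustrative and are not part of any CMS tooling.

```python
def npi_check_digit(base9: str) -> int:
    """Luhn check digit for the first nine digits of an NPI.

    The digit is computed as if the prefix "80840" were prepended.
    """
    if len(base9) != 9 or not base9.isdigit():
        raise ValueError("expected exactly nine digits")
    digits = [int(d) for d in "80840" + base9]
    total = 0
    # Walk from the right. The check digit will be appended afterwards,
    # so the rightmost digit of this 14-digit string is the first one doubled.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def is_valid_npi(npi: str) -> bool:
    """True if npi has ten digits and its last digit matches the Luhn check."""
    return len(npi) == 10 and npi.isdigit() and npi_check_digit(npi[:9]) == int(npi[9])


print(is_valid_npi("1234567893"))
```

For example, 1234567893 passes the check. Because Luhn catches every single-digit error, changing any one of its digits makes it fail.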
Method 1: NPPES Free API
CMS provides a free REST API — no authentication required:
import requests
def search_nppes(params: dict) -> list:
    """Query the NPPES NPI Registry API (v2.1) and return its list of results."""
    url = "https://npiregistry.cms.hhs.gov/api/"
    defaults = {"version": "2.1", "limit": 200, "skip": 0}
    response = requests.get(url, params={**defaults, **params}, timeout=30)
    if response.status_code == 200:
        # Rejected queries come back without a "results" key, so this returns []
        return response.json().get("results", [])
    return []
# Search family medicine providers in San Francisco
providers = search_nppes({
    "taxonomy_description": "Family Medicine",
    "state": "CA",
    "city": "San Francisco",
})
for p in providers[:3]:
    basic = p.get("basic", {})
    # Pick the practice location explicitly; the first address may be the mailing address
    addr = next(
        (a for a in p.get("addresses", []) if a.get("address_purpose") == "LOCATION"),
        {},
    )
    print(f"{basic.get('first_name')} {basic.get('last_name')}, {basic.get('credential')}")
    print(f"  NPI: {p.get('number')}")
    print(f"  Phone: {addr.get('telephone_number')}")
    print(f"  Address: {addr.get('address_1')}, {addr.get('city')}, {addr.get('state')}")
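Each API result nests its addresses and taxonomies in lists, which gets awkward once you want one row per provider. The helper below flattens a single result into one row. It is a sketch that assumes these v2.1 response fields:

- enumeration_type: NPI-1 for an individual, NPI-2 for an organization
- addresses[].address_purpose: LOCATION or MAILING
- taxonomies[].primary

Confirm these against a live response before you depend on them. The name flatten_provider is hypothetical.

```python
from typing import Any, Dict


def flatten_provider(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one NPPES API v2.1 result into a single-level row (assumed field names)."""
    basic = result.get("basic", {})
    # Assumed: NPI-1 = individual provider, NPI-2 = organization
    if result.get("enumeration_type") == "NPI-2":
        name = basic.get("organization_name") or ""
    else:
        name = " ".join(filter(None, [basic.get("first_name"), basic.get("last_name")]))

    # Prefer the practice location; fall back to the first address listed
    addresses = result.get("addresses", [])
    location = next(
        (a for a in addresses if a.get("address_purpose") == "LOCATION"),
        addresses[0] if addresses else {},
    )

    # Prefer the taxonomy flagged as primary; fall back to the first one listed
    taxonomies = result.get("taxonomies", [])
    primary = next(
        (t for t in taxonomies if t.get("primary")),
        taxonomies[0] if taxonomies else {},
    )

    return {
        "npi": result.get("number"),
        "name": name,
        "credential": basic.get("credential"),
        "taxonomy_code": primary.get("code"),
        "taxonomy_desc": primary.get("desc"),
        "license": primary.get("license"),
        "license_state": primary.get("state"),
        "address": location.get("address_1"),
        "city": location.get("city"),
        "state": location.get("state"),
        "postal_code": location.get("postal_code"),
        "phone": location.get("telephone_number"),
    }
```

For example, rows = [flatten_provider(p) for p in providers] produces a list you can hand directly to csv.DictWriter or pandas.DataFrame.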
Paginate through the results:
import time
def get_all_providers(search_params: dict) -> list:
    all_results = []
    skip = 0
    while skip <= 1000:  # the API rejects skip values above 1,000
        batch = search_nppes({**search_params, "skip": skip})
        all_results.extend(batch)
        if len(batch) < 200:
            break  # last page (an error also returns [])
        skip += 200
        time.sleep(0.5)  # pause between requests
    return all_results
# Get cardiologists in Texas. One query returns at most 1,200 records,
# so split larger searches by city or postal_code.
cardiologists = get_all_providers({
    "taxonomy_description": "Cardiovascular Disease",
    "state": "TX"
})
print(f"Retrieved {len(cardiologists):,} cardiologists in Texas")
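The NPPES API caps limit at 200 and rejects skip values above 1,000. As a result, one query can return at most 1,200 records, however you paginate. The sketch below separates the paging logic from the HTTP call and reports when a result set may have been cut off at that ceiling. The fetch_page(skip, limit) callback is a hypothetical interface.

```python
from typing import Callable, List, Tuple

API_LIMIT = 200  # largest page size the NPPES API accepts
MAX_SKIP = 1000  # largest skip value the NPPES API accepts


def paginate(fetch_page: Callable[[int, int], List[dict]],
             limit: int = API_LIMIT,
             max_skip: int = MAX_SKIP) -> Tuple[List[dict], bool]:
    """Collect pages via fetch_page(skip, limit).

    Returns (results, maybe_truncated). maybe_truncated is True when paging
    stopped at the skip ceiling while pages were still coming back full.
    """
    results: List[dict] = []
    skip = 0
    while True:
        batch = fetch_page(skip, limit)
        results.extend(batch)
        if len(batch) < limit:
            return results, False  # short page: nothing left to fetch
        skip += limit
        if skip > max_skip:
            return results, True  # hit the ceiling; narrow the query
```

With the search_nppes function above, a fetcher could be lambda skip, limit: search_nppes({**params, "skip": skip, "limit": limit}). A True flag means you should narrow the query, for example by city or postal code.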
Method 2: Bulk NPPES monthly file
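For nationwide analysis, download the full replacement file, which is updated monthly. The ZIP is large and the extracted CSV runs to several gigabytes, so load only the columns you need: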
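import pandas as pd
# Download from: https://download.cms.gov/nppes/NPI_Files.html
# The extracted file name includes a date range; adjust the path to match.
# Reading only the needed columns keeps memory use manageable.
cols = [
    "NPI",
    "Entity Type Code",
    "NPI Deactivation Date",
    "Healthcare Provider Taxonomy Code_1",
]
df = pd.read_csv("npidata_pfile.csv", dtype=str, usecols=cols)
# Active individual providers only (Entity Type Code 1 = individual, 2 = organization)
active = df[
    (df["Entity Type Code"] == "1") &
    df["NPI Deactivation Date"].isna()
]
# Filter by taxonomy code (Family Medicine = 207Q*).
# Slot 1 is not necessarily the provider's primary taxonomy.
family_med = active[
    active["Healthcare Provider Taxonomy Code_1"].str.startswith("207Q", na=False)
]
print(f"Active family medicine providers: {len(family_med):,}")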
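The bulk-file approach can be refined in two ways, sketched here with only the standard library. First, stream rows out of the ZIP instead of loading a table into memory. Second, match on the taxonomy flagged as primary rather than on slot 1. The sketch assumes the following about the download; check each point against the release you use:

- The file has 15 "Healthcare Provider Taxonomy Code_N" / "Healthcare Provider Primary Taxonomy Switch_N" column pairs.
- The data is UTF-8 encoded.
- The main CSV inside the monthly ZIP is the npidata_pfile*.csv member that does not have "fileheader" in its name.

```python
import csv
import io
import zipfile
from typing import Iterable, Iterator

TAXONOMY_SLOTS = range(1, 16)  # assumed: 15 taxonomy code/switch column pairs


def iter_npi_rows_from_zip(zip_source) -> Iterator[dict]:
    """Stream rows of the main npidata_pfile CSV straight out of the monthly ZIP."""
    with zipfile.ZipFile(zip_source) as zf:
        # Assumed naming: skip the companion "..._fileheader.csv" member
        names = [n for n in zf.namelist()
                 if n.startswith("npidata_pfile") and n.endswith(".csv")
                 and "fileheader" not in n]
        if not names:
            raise FileNotFoundError("no npidata_pfile CSV found in the ZIP")
        with zf.open(names[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
            yield from csv.DictReader(text)


def primary_taxonomy(row: dict) -> str:
    """Return the taxonomy code whose primary switch is 'Y', or '' if none is."""
    for i in TAXONOMY_SLOTS:
        if row.get(f"Healthcare Provider Primary Taxonomy Switch_{i}") == "Y":
            return row.get(f"Healthcare Provider Taxonomy Code_{i}") or ""
    return ""


def active_individuals_with_taxonomy(rows: Iterable[dict], prefix: str) -> Iterator[dict]:
    """Keep active individual NPIs whose primary taxonomy code starts with prefix."""
    for row in rows:
        if (row.get("Entity Type Code") == "1"
                and not row.get("NPI Deactivation Date")
                and primary_taxonomy(row).startswith(prefix)):
            yield row
```

Usage would look like active_individuals_with_taxonomy(iter_npi_rows_from_zip(path), "207Q"), where path points at the ZIP you downloaded. Memory use stays flat because rows are processed one at a time.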
Method 3: Hospital directory scraping
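Hospital websites often publish provider directories without offering an export. A headless browser such as Playwright can extract them. The /find-a-doctor path and CSS selectors below are placeholders that differ on every site: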
from playwright.async_api import async_playwright
import asyncio
async def scrape_hospital_providers(hospital_url: str) -> list:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(f"{hospital_url}/find-a-doctor")
        await page.wait_for_selector(".provider-card", timeout=10000)
        providers = await page.evaluate(
            "Array.from(document.querySelectorAll('.provider-card'))"
            ".map(el => ({"
            "  name: el.querySelector('.provider-name')?.innerText,"
            "  specialty: el.querySelector('.specialty')?.innerText,"
            "  accepting: el.querySelector('.accepting-patients')?.innerText"
            "}))"
        )
        await browser.close()
        return providers
providers = asyncio.run(scrape_hospital_providers("https://hospital.example.com"))
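Scraped directory text is usually messy. Whitespace is irregular, fields are missing, and labels are free text such as "Not accepting new patients". A small normalization pass can turn the name/specialty/accepting dictionaries returned by the scraper into cleaner records. The keyword rules in parse_accepting are a heuristic assumption, and both function names are hypothetical. Each site words these labels differently, so extend the rules for each target.

```python
import re
from typing import Optional


def parse_accepting(text: Optional[str]) -> Optional[bool]:
    """Map a free-text 'accepting patients' label to True, False, or None (unknown)."""
    if not text:
        return None
    t = text.strip().lower()
    # Heuristic: negations or "closed" mean not accepting
    if re.search(r"\b(not|no longer)\b.*accepting|\bclosed\b", t):
        return False
    if "accepting" in t:
        return True
    return None


def normalize_card(card: dict) -> dict:
    """Collapse whitespace in scraped strings and convert the accepting label to a bool."""
    def clean(value: Optional[str]) -> Optional[str]:
        return " ".join(value.split()) if value else None

    return {
        "name": clean(card.get("name")),
        "specialty": clean(card.get("specialty")),
        "acceptingNewPatients": parse_accepting(card.get("accepting")),
    }
```

Returning None for unrecognized labels, rather than guessing, keeps "unknown" distinct from "not accepting" in downstream filters.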
Method 4: Pre-built healthcare scraper
The Healthcare Provider Scraper on Apify handles NPPES pagination, hospital directory scraping, and data normalization automatically.
Sample output:
{
"npi": "1234567890",
"name": "Dr. Jane Smith, MD",
"specialty": "Family Medicine",
"address": "123 Main St, San Francisco, CA 94102",
"phone": "415-555-0100",
"acceptingNewPatients": true,
"insuranceAccepted": ["Blue Cross", "Aetna", "Medicare"],
"languages": ["English", "Spanish"]
}
74+ production runs. Pay-per-result pricing.
Use cases
- Healthcare staffing: Find providers by specialty + accepting status + location
- Insurance network analysis: Map which providers accept which plans
- Market research: Provider density by specialty + region
- Referral network mapping: Link providers that share practice locations or parent organizations as a starting point for relationship graphs
- Sales prospecting: B2B outreach to medical practices by specialty
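Several of these use cases reduce to filtering and counting normalized records. The sketch below shows a staffing-style filter and a specialty-by-state density count. It assumes records shaped like the sample output above plus a separate "state" field, which is hypothetical because the sample only has a combined address string.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def staffing_matches(records: Iterable[dict], specialty: str, state: str,
                     plan: Optional[str] = None) -> List[dict]:
    """Providers in one specialty and state who accept new patients (and, optionally, a plan)."""
    return [
        r for r in records
        if r.get("specialty") == specialty
        and r.get("state") == state
        and r.get("acceptingNewPatients") is True
        and (plan is None or plan in (r.get("insuranceAccepted") or []))
    ]


def density(records: Iterable[dict]) -> Dict[Tuple[str, str], int]:
    """Count providers per (state, specialty) pair for market-research rollups."""
    return dict(Counter((r.get("state"), r.get("specialty")) for r in records))
```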
Important notes
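NPPES API limits: at most 200 results per request, and skip cannot exceed 1,000, so a single query returns at most 1,200 records. There is no formal rate limit, but add ~500ms delays between requests for bulk queries.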
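Some searches have more matches than one query can return. In that case, one option is to split the search into narrower queries and merge the results, for example one query per postal_code or city (both NPPES API search parameters). The sketch below runs the sub-queries with a delay between them and de-duplicates on the "number" (NPI) field. Here fetch is a hypothetical stand-in for any function that runs one query, such as a paginating wrapper around the API.

```python
import time
from typing import Callable, Iterable, List


def collect_split(fetch: Callable[[dict], List[dict]], sub_queries: Iterable[dict],
                  delay: float = 0.5) -> List[dict]:
    """Run narrower queries one after another, pausing between them and de-duplicating by NPI."""
    seen: dict = {}
    for i, query in enumerate(sub_queries):
        if i:
            time.sleep(delay)  # pause between requests (~500 ms by default)
        for result in fetch(query):
            # Keep the first copy of each NPI; overlapping queries can return duplicates
            seen.setdefault(result.get("number"), result)
    return list(seen.values())
```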
Hospital websites: extremely variable structure. Test on each target before building production scrapers.
HIPAA: NPPES data is public. It is providers' professional registration data, not patient data.
n8n AI Automation Pack ($39) — 5 production-ready workflows
Pre-built and maintained
Apify Scrapers Bundle — $29 one-time
35+ scrapers including the Healthcare Provider Scraper. Instant download.