I was building a health-tech prototype last week. Needed medical research data. Expected paywalls everywhere.
Then I found PubMed's E-utilities API. 36 million biomedical papers. Free. No API key. No signup.
What Is PubMed?
PubMed is the U.S. National Library of Medicine's database — the world's largest collection of biomedical literature. It's run by NIH (National Institutes of Health), and they provide free programmatic access to everything.
If you work with health data, drug research, clinical trials, or biomedical NLP — this is your goldmine.
Your First API Call (Zero Setup)
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=artificial+intelligence&retmode=json&retmax=3"
The response reports 364,000+ matches for "artificial intelligence" in the biomedical literature and, because of retmax=3, returns only the first three PubMed IDs (PMIDs).
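For orientation, here is a rough sketch of pulling the two most useful fields out of that JSON response. The sample payload is hand-written and trimmed for illustration (the PMIDs are placeholders, not real papers), and `summarize_esearch` is a hypothetical helper name. The one detail worth noticing is that `count` arrives as a string.

```python
def summarize_esearch(payload):
    """Return (total_matches, pmids) from an ESearch JSON payload."""
    result = payload["esearchresult"]
    # ESearch reports numeric fields such as "count" as strings.
    return int(result["count"]), list(result["idlist"])


# Hand-written sample shaped like an ESearch response (IDs are placeholders).
sample = {
    "esearchresult": {
        "count": "364000",
        "retmax": "3",
        "retstart": "0",
        "idlist": ["11111111", "22222222", "33333333"],
    }
}

total, pmids = summarize_esearch(sample)
print(total, pmids)
```

Converting `count` to an integer early makes it easier to format or compare later on.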
Full Python Example: Search and Fetch Papers
import requests

# Step 1: Search for papers
search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
search_params = {
    "db": "pubmed",
    "term": "machine learning cancer diagnosis",
    "retmode": "json",
    "retmax": 5,
    "sort": "relevance"
}
search = requests.get(search_url, params=search_params).json()
ids = search["esearchresult"]["idlist"]
total = search["esearchresult"]["count"]  # note: returned as a string
print(f"Found {total} papers. Fetching top {len(ids)}...")

# Step 2: Fetch paper details
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
fetch_params = {
    "db": "pubmed",
    "id": ",".join(ids),
    "retmode": "json"
}
details = requests.get(fetch_url, params=fetch_params).json()["result"]

for pmid in ids:
    paper = details[pmid]
    title = paper["title"]
    # Show at most the first three authors
    authors = ", ".join(a["name"] for a in paper.get("authors", [])[:3])
    journal = paper.get("source", "Unknown")
    date = paper.get("pubdate", "Unknown")
    print(f"\n[{pmid}] {title}")
    print(f" Authors: {authors}")
    print(f" Journal: {journal} | Date: {date}")
Output (illustrative):
Found 45231 papers. Fetching top 5...
[39187234] Machine Learning in Cancer Diagnosis: Current State and Future
Authors: Smith J, Chen L, Kumar R
Journal: Nature Reviews | Date: 2024 Mar
5 Things You Can Build
1. Drug Research Tracker
# Track publications about a specific drug
drugs = ["ozempic", "metformin", "ivermectin"]
for drug in drugs:
    params = {"db": "pubmed", "term": drug, "retmode": "json"}
    r = requests.get(search_url, params=params)
    count = r.json()["esearchresult"]["count"]
    print(f"{drug}: {count} papers")
2. Clinical Trial Monitor
# Find recent clinical trials
params = {
    "db": "pubmed",
    "term": "clinical trial[pt] AND 2024[dp] AND diabetes",
    "retmode": "json",
    "retmax": 10
}
trials = requests.get(search_url, params=params).json()
print(f"Diabetes clinical trials in 2024: {trials['esearchresult']['count']}")
3. Author Publication Tracker
# Find all papers by a specific author
params = {
    "db": "pubmed",
    "term": "Hinton GE[author]",
    "retmode": "json"
}
result = requests.get(search_url, params=params).json()
print(f"Geoffrey Hinton: {result['esearchresult']['count']} papers in PubMed")
4. Research Trend Analyzer
# Publications per year for a topic
for year in range(2019, 2026):
    params = {
        "db": "pubmed",
        "term": f"large language model AND {year}[dp]",
        "retmode": "json"
    }
    count = requests.get(search_url, params=params).json()["esearchresult"]["count"]
    print(f"{year}: {count} LLM papers")
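Examples 2 through 4 all lean on PubMed field tags: `[pt]` for publication type, `[dp]` for publication date, and `[author]` for author name. As a small sketch, the hypothetical helper below (`build_query` is my own name, not part of E-utilities) assembles such a `term` string so the tags are not retyped by hand. It simply joins the pieces with `AND`, which is an assumption that suits these examples but not every search.

```python
def build_query(*terms, pub_type=None, year=None, author=None):
    """Join free-text terms and optional field-tagged filters with AND."""
    parts = [t for t in terms if t]
    if pub_type:
        parts.append(f"{pub_type}[pt]")   # publication type
    if year:
        parts.append(f"{year}[dp]")       # date of publication
    if author:
        parts.append(f"{author}[author]")
    return " AND ".join(parts)


print(build_query("diabetes", pub_type="clinical trial", year=2024))
print(build_query(author="Hinton GE"))
```

The returned string can be passed as the `term` value in any of the `params` dictionaries above.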
5. Abstract Downloader for NLP Training
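# Get abstracts for NLP/ML training
# Assumption: reuses the `ids` list from the search example above
fetch_abstract_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
params = {
    "db": "pubmed",
    "id": ",".join(ids),
    "rettype": "abstract",
    "retmode": "text"
}
abstracts = requests.get(fetch_abstract_url, params=params).text
print(abstracts[:500])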
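Plain text is fine for eyeballing, but a training set usually wants one row per paper. The sketch below assumes you request `retmode=xml` from EFetch instead and parses the result with the standard library. `extract_abstracts` and `rows_to_csv` are hypothetical helper names, and the sample XML is a hand-written, heavily trimmed stand-in for a real response (the PMID and text are placeholders).

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_abstracts(xml_data):
    """Return one dict per PubmedArticle with pmid, title, and abstract."""
    root = ET.fromstring(xml_data)
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title_el = article.find(".//ArticleTitle")
        title = "".join(title_el.itertext()).strip() if title_el is not None else ""
        # Structured abstracts are split into several AbstractText sections.
        sections = [
            "".join(el.itertext()).strip()
            for el in article.findall(".//Abstract/AbstractText")
        ]
        abstract = " ".join(s for s in sections if s)
        rows.append({"pmid": pmid, "title": title, "abstract": abstract})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["pmid", "title", "abstract"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hand-written placeholder, trimmed to the elements used above.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">00000001</PMID>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""

rows = extract_abstracts(SAMPLE_XML)
print(rows_to_csv(rows))
```

With a live request you would pass `response.content` to `extract_abstracts`. Papers without an abstract simply come back with an empty `abstract` field, so filter those out before training.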
PubMed vs Other Medical APIs
| Feature | PubMed (E-utilities) | Semantic Scholar | Google Scholar | Scopus |
|---|---|---|---|---|
| API Key | ❌ Not required | Optional (higher limits) | No API | ✅ Required |
| Papers | 36M+ biomedical | 200M+ all fields | Unknown | 84M+ |
| Free | ✅ Completely | ✅ Basic tier | N/A | ❌ Paid |
| Abstracts | ✅ Full abstracts | ✅ Yes | ❌ No | ✅ Yes |
| MeSH Terms | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Clinical Trials | ✅ Filterable | ❌ No | ❌ No | Limited |
Pro Tips
- Use MeSH terms for precise medical queries: `"diabetes mellitus"[MeSH]` is more accurate than just `diabetes`
- Add your email as `&email=you@example.com`; NCBI recommends it for tracking
- Rate limit: 3 requests/second without an API key, 10/sec with a free key from NCBI
- Get a free API key at NCBI (optional but recommended)
- Use `retstart` for pagination: `&retstart=10&retmax=10` for page 2
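The pagination and rate-limit tips can be combined in a few lines. This is a minimal sketch, not an official client: `page_params` and `Throttle` are hypothetical names, pages are counted from 1 to match the tip above, and the clock and sleep functions are injectable only so the class can be exercised without actually waiting.

```python
import time


def page_params(term, page, page_size=10, email=None, api_key=None):
    """Build ESearch params for a 1-based page number."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retstart": (page - 1) * page_size,
        "retmax": page_size,
    }
    if email:
        params["email"] = email
    if api_key:
        params["api_key"] = api_key
    return params


class Throttle:
    """Space calls out so they never exceed `per_second` requests."""

    def __init__(self, per_second=3, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


print(page_params("diabetes", page=2))
```

In a real loop you would call `throttle.wait()` right before each `requests.get(search_url, params=page_params(...))`, using `Throttle(per_second=10)` once you have an API key.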
Combine with OpenAlex
PubMed is biomedical-only. For broader academic search (CS, physics, social science), check out my OpenAlex tutorial — 250M+ papers, also free.
What medical data would you extract from 36M papers? Share your use case in the comments.