You find a paper you need. It costs $35. You check another journal. $49.
But here's the thing: most paywalled papers have a free, legal version somewhere. Author's personal website, university repository, preprint server.
The problem is finding it. That's what Unpaywall does.
One API Call Per Paper
import requests
doi = "10.1038/nature12373" # A Nature paper
resp = requests.get(f"https://api.unpaywall.org/v2/{doi}?email=you@email.com")
data = resp.json()
if data["is_oa"]:
    print(f"FREE: {data['best_oa_location']['url_for_pdf']}")
else:
    print("No free version found")
That's it. One request, one answer.
How Does Unpaywall Work?
Unpaywall checks:
- Publisher websites — some papers become free after embargo
- Preprint servers — arXiv, bioRxiv, medRxiv
- Institutional repositories — university hosting
- Author pages — self-archived copies
- Government mandates — NIH-funded papers must be free
All of these are legal. Unpaywall doesn't do anything shady — it just knows where to look.
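You can see exactly where those copies live by looking at the `oa_locations` array in the API response: each entry carries a `host_type` field ("publisher" or "repository") alongside its links. Here's a minimal sketch that groups them — `summarize_locations` is my own helper name, not part of the API, and it assumes the response shape documented by Unpaywall:

```python
def summarize_locations(record):
    """Group a parsed Unpaywall record's oa_locations by host type."""
    summary = {}
    for loc in record.get("oa_locations", []):
        host = loc.get("host_type", "unknown")  # "publisher" or "repository"
        # Prefer the direct PDF link; fall back to the landing page URL
        summary.setdefault(host, []).append(loc.get("url_for_pdf") or loc.get("url"))
    return summary
```

Feed it the same `data` dict from the request above and you get a quick map of publisher copies vs. repository copies.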
Batch Check Your Reading List
Got 50 DOIs from a Crossref search? Check them all:
import time
dois = [
"10.1038/nature12373",
"10.1126/science.1252229",
"10.1016/j.cell.2014.11.021",
"10.1073/pnas.1318679111",
"10.1038/nbt.3122",
]
for doi in dois:
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}?email=you@email.com")
    data = resp.json()
    status = "FREE" if data["is_oa"] else "PAID"
    title = (data.get("title") or "")[:50]  # title can be null in the response
    print(f"[{status}] {title}")
    if data["is_oa"]:
        # url_for_pdf can be present but null, so `or` instead of a .get() default
        pdf = data["best_oa_location"].get("url_for_pdf") or "no direct PDF"
        print(f"    → {pdf}")
    time.sleep(1)  # be polite to the API
Sample output:
[FREE] RNA-guided human genome engineering via Cas9
    → https://europepmc.org/articles/pmc3969860?pdf=render
[FREE] Programmable editing of a target base in genomic DNA
    → https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873371
[PAID] Genome engineering using the CRISPR-Cas9 system
[FREE] Highly efficient Cas9-mediated transcriptional progr
    → https://www.biorxiv.org/content/10.1101/005967v1.full.pdf
3 out of 5 papers — free. That's $105 saved.
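Once you have the links, actually saving the PDFs is one more request each. A sketch of that download step, assuming the `best_oa_location` / `url_for_pdf` response shape shown above — `doi_to_filename` and `download_free_pdf` are helper names I made up for this post:

```python
import re
import requests

def doi_to_filename(doi):
    """Make a DOI safe to use as a filename: 10.1038/nature12373 -> 10.1038_nature12373.pdf"""
    return re.sub(r"[^\w.-]", "_", doi) + ".pdf"

def download_free_pdf(doi, email, folder="."):
    """Ask Unpaywall for a DOI's best OA location and save the PDF if one exists."""
    data = requests.get(f"https://api.unpaywall.org/v2/{doi}?email={email}").json()
    loc = data.get("best_oa_location") or {}
    pdf_url = loc.get("url_for_pdf")
    if not pdf_url:
        return None  # OA but no direct PDF link, or not OA at all
    resp = requests.get(pdf_url, timeout=30)
    resp.raise_for_status()
    path = f"{folder}/{doi_to_filename(doi)}"
    with open(path, "wb") as f:
        f.write(resp.content)
    return path
```

Note that some repositories serve PDFs behind redirects or rate limits, so in practice you'd want retries and the same `time.sleep(1)` pause between downloads.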
Combine With Crossref
Real workflow: search Crossref → check Unpaywall → download free PDFs:
# Step 1: Find papers via Crossref
results = requests.get("https://api.crossref.org/works", params={
    "query": "CRISPR gene therapy", "rows": 10, "sort": "is-referenced-by-count"
}).json()["message"]["items"]

# Step 2: Check each DOI against Unpaywall
for item in results:
    doi = item["DOI"]
    title = item["title"][0][:50] if item.get("title") else "Untitled"
    oa = requests.get(f"https://api.unpaywall.org/v2/{doi}?email=you@email.com").json()
    status = "FREE" if oa.get("is_oa") else "PAID"
    print(f"[{status}] {title}")
    time.sleep(1)
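For the final step — download free PDFs — you can collect the hits from the loop above into a list first. A small sketch over the parsed Unpaywall responses; `free_pdf_links` is my own helper name, not part of either API:

```python
def free_pdf_links(records):
    """From parsed Unpaywall records, collect (doi, pdf_url) pairs for open-access papers."""
    links = []
    for rec in records:
        if not rec.get("is_oa"):
            continue
        # best_oa_location can be null even when is_oa is true in edge cases
        loc = rec.get("best_oa_location") or {}
        url = loc.get("url_for_pdf")
        if url:
            links.append((rec.get("doi"), url))
    return links
```

Append each `oa` dict from the loop to a list, pass it through this, and you have a download queue.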
The Numbers
According to Unpaywall's own data:
- ~30% of all papers have a free version somewhere
- For recent papers (2020+), it's closer to 50%
- NIH-funded papers: >90% are free (by mandate)
I Built a Toolkit
unpaywall-toolkit — Python wrapper for the Unpaywall API:
from unpaywall_toolkit import UnpaywallClient
client = UnpaywallClient(email="you@email.com")
# Single check
result = client.check("10.1038/nature12373")
# Batch check + export to CSV
client.export_csv(["10.1038/nature12373", "10.1126/science.1252229"], "results.csv")
Part of the Research API Suite
- Unpaywall Toolkit — find free PDFs (this post)
- Research Paper CLI — search 800M+ papers
- Crossref Toolkit — 150M+ DOIs
- OpenAlex Toolkit — 250M+ papers
- Semantic Scholar — AI summaries
All on GitHub. All free and open source.
How do you access paywalled papers? And do you use any tools to find free versions?