You find a paper you need. It costs $35. You check another journal. $49.
But here's the thing: most paywalled papers have a free, legal version somewhere. Author's personal website, university repository, preprint server.
The problem is finding it. That's what Unpaywall does.
One API Call Per Paper
import requests
doi = "10.1038/nature12373" # A Nature paper
resp = requests.get(f"https://api.unpaywall.org/v2/{doi}?email=you@email.com")
data = resp.json()
if data["is_oa"]:
    print(f"FREE: {data['best_oa_location']['url_for_pdf']}")
else:
    print("No free version found")
That's it. One request, one answer.
How Does Unpaywall Work?
Unpaywall checks:
- Publisher websites — some papers become free after embargo
- Preprint servers — arXiv, bioRxiv, medRxiv
- Institutional repositories — university hosting
- Author pages — self-archived copies
- Government mandates — NIH-funded papers must be free
All of these are legal. Unpaywall doesn't do anything shady — it just knows where to look.
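You can see exactly where those copies live by looking at the `oa_locations` array in the API response: each entry carries a `host_type` field ("publisher" or "repository") alongside its links. Here's a minimal sketch that groups them — `summarize_locations` is my own helper name, not part of the API, and it assumes the response shape documented by Unpaywall:

```python
def summarize_locations(record):
    """Group a parsed Unpaywall record's oa_locations by host type."""
    summary = {}
    for loc in record.get("oa_locations", []):
        host = loc.get("host_type", "unknown")  # "publisher" or "repository"
        # Prefer the direct PDF link; fall back to the landing page URL
        summary.setdefault(host, []).append(loc.get("url_for_pdf") or loc.get("url"))
    return summary
```

Feed it the same `data` dict from the request above and you get a quick map of publisher copies vs. repository copies.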
Batch Check Your Reading List
Got 50 DOIs from a Crossref search? Check them all:
import time
dois = [
"10.1038/nature12373",
"10.1126/science.1252229",
"10.1016/j.cell.2014.11.021",
"10.1073/pnas.1318679111",
"10.1038/nbt.3122",
]
for doi in dois:
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}?email=you@email.com")
    data = resp.json()
    status = "FREE" if data["is_oa"] else "PAID"
    title = (data.get("title") or "")[:50]  # title can be null in the response
    print(f"[{status}] {title}")
    if data["is_oa"]:
        # url_for_pdf can be present but null, so `or` instead of a .get() default
        pdf = data["best_oa_location"].get("url_for_pdf") or "no direct PDF"
        print(f"    → {pdf}")
    time.sleep(1)  # be polite to the API
Sample output:
[FREE] RNA-guided human genome engineering via Cas9
    → https://europepmc.org/articles/pmc3969860?pdf=render
[FREE] Programmable editing of a target base in genomic DNA
    → https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4873371
[PAID] Genome engineering using the CRISPR-Cas9 system
[FREE] Highly efficient Cas9-mediated transcriptional progr
    → https://www.biorxiv.org/content/10.1101/005967v1.full.pdf
3 out of 5 papers — free. That's $105 saved.
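Once you have the links, actually saving the PDFs is one more request each. A sketch of that download step, assuming the `best_oa_location` / `url_for_pdf` response shape shown above — `doi_to_filename` and `download_free_pdf` are helper names I made up for this post:

```python
import re
import requests

def doi_to_filename(doi):
    """Make a DOI safe to use as a filename: 10.1038/nature12373 -> 10.1038_nature12373.pdf"""
    return re.sub(r"[^\w.-]", "_", doi) + ".pdf"

def download_free_pdf(doi, email, folder="."):
    """Ask Unpaywall for a DOI's best OA location and save the PDF if one exists."""
    data = requests.get(f"https://api.unpaywall.org/v2/{doi}?email={email}").json()
    loc = data.get("best_oa_location") or {}
    pdf_url = loc.get("url_for_pdf")
    if not pdf_url:
        return None  # OA but no direct PDF link, or not OA at all
    resp = requests.get(pdf_url, timeout=30)
    resp.raise_for_status()
    path = f"{folder}/{doi_to_filename(doi)}"
    with open(path, "wb") as f:
        f.write(resp.content)
    return path
```

Note that some repositories serve PDFs behind redirects or rate limits, so in practice you'd want retries and the same `time.sleep(1)` pause between downloads.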
Combine With Crossref
Real workflow: search Crossref → check Unpaywall → download free PDFs:
# Step 1: Find papers via Crossref
results = requests.get("https://api.crossref.org/works", params={
    "query": "CRISPR gene therapy", "rows": 10, "sort": "is-referenced-by-count"
}).json()["message"]["items"]

# Step 2: Check each DOI against Unpaywall
for item in results:
    doi = item["DOI"]
    title = item["title"][0][:50] if item.get("title") else "Untitled"
    oa = requests.get(f"https://api.unpaywall.org/v2/{doi}?email=you@email.com").json()
    status = "FREE" if oa.get("is_oa") else "PAID"
    print(f"[{status}] {title}")
    time.sleep(1)
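For the final step — download free PDFs — you can collect the hits from the loop above into a list first. A small sketch over the parsed Unpaywall responses; `free_pdf_links` is my own helper name, not part of either API:

```python
def free_pdf_links(records):
    """From parsed Unpaywall records, collect (doi, pdf_url) pairs for open-access papers."""
    links = []
    for rec in records:
        if not rec.get("is_oa"):
            continue
        # best_oa_location can be null even when is_oa is true in edge cases
        loc = rec.get("best_oa_location") or {}
        url = loc.get("url_for_pdf")
        if url:
            links.append((rec.get("doi"), url))
    return links
```

Append each `oa` dict from the loop to a list, pass it through this, and you have a download queue.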
The Numbers
According to Unpaywall's own data:
- ~30% of all papers have a free version somewhere
- For recent papers (2020+), it's closer to 50%
- NIH-funded papers: >90% are free (by mandate)
I Built a Toolkit
unpaywall-toolkit — Python wrapper for the Unpaywall API:
from unpaywall_toolkit import UnpaywallClient
client = UnpaywallClient(email="you@email.com")
# Single check
result = client.check("10.1038/nature12373")
# Batch check + export to CSV
client.export_csv(["10.1038/nature12373", "10.1126/science.1252229"], "results.csv")
Part of the Research API Suite
- Unpaywall Toolkit — find free PDFs (this post)
- Research Paper CLI — search 800M+ papers
- Crossref Toolkit — 150M+ DOIs
- OpenAlex Toolkit — 250M+ papers
- Semantic Scholar — AI summaries
All on GitHub. All free and open source.
How do you access paywalled papers? And do you use any tools to find free versions?