I was building a research tool and needed article metadata — titles, authors, citations, DOIs.
I tried Google Scholar (no API), Semantic Scholar (rate limited), Web of Science (expensive). Then I found Crossref — and it changed everything.
## What is Crossref?
Crossref is the largest DOI registration agency for scholarly publishing. When a publisher assigns a Crossref DOI to an article, the metadata is deposited with Crossref. That means:
- 150M+ scholarly works — journal articles, books, conference papers, datasets
- No API key — just send a request
- Generous rate limits — and more consistent service via the polite pool (add your email)
- Rich metadata — citations, references, funders, licenses
## Quick Example
```python
import requests

response = requests.get(
    "https://api.crossref.org/works",
    params={
        "query": "machine learning healthcare",
        "rows": 5,
        "mailto": "your@email.com",  # polite pool = faster
    },
)

for item in response.json()["message"]["items"]:
    title = item["title"][0]
    citations = item.get("is-referenced-by-count", 0)
    doi = item["DOI"]
    print(title)
    print(f"  Citations: {citations} | DOI: {doi}")
```
Output:

```
Machine Learning in Healthcare: A Review
  Citations: 1247 | DOI: 10.1016/j.artmed.2023.102456
Deep Learning for Medical Image Analysis
  Citations: 892 | DOI: 10.1038/s41591-023-02354-1
```
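When you want more than a page or two of results, Crossref supports deep paging with a `cursor` parameter: start with `*`, then follow the `next-cursor` value in each response (plain `offset` paging is capped, cursors are not). A minimal sketch — the `fetch_all` helper and its injectable `get` argument are my own, not part of any library:

```python
import requests

def fetch_all(query, max_records=200, mailto="you@email.com", get=requests.get):
    """Deep-page through /works with Crossref cursors.

    `get` is injectable so the loop can be exercised without network access.
    """
    items, cursor = [], "*"
    while len(items) < max_records:
        resp = get(
            "https://api.crossref.org/works",
            params={
                "query": query,
                "rows": min(100, max_records - len(items)),
                "cursor": cursor,
                "mailto": mailto,
            },
        )
        message = resp.json()["message"]
        if not message["items"]:
            break  # no more results
        items.extend(message["items"])
        cursor = message["next-cursor"]
    return items[:max_records]
```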
## Look Up Any DOI
```python
# Get full metadata for any DOI
resp = requests.get("https://api.crossref.org/works/10.1038/nature12373")
article = resp.json()["message"]

print(f"Title: {article['title'][0]}")
print(f"Journal: {article['container-title'][0]}")
print(f"Cited by: {article['is-referenced-by-count']} papers")
print(f"Authors: {', '.join(a['family'] for a in article['author'][:3])}")
```
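Not every DOI you meet in the wild is registered with Crossref (DataCite DOIs and typos, for instance), and the API answers those with a 404. A small wrapper keeps that case from raising — `lookup_doi` is a hypothetical helper of my own, with a `get` parameter injected for testability:

```python
import requests

def lookup_doi(doi, mailto="you@email.com", get=requests.get):
    """Return the Crossref metadata dict for a DOI, or None if unknown."""
    resp = get(f"https://api.crossref.org/works/{doi}",
               params={"mailto": mailto})
    if resp.status_code == 404:
        return None  # DOI not registered with Crossref (or mistyped)
    resp.raise_for_status()  # surface other HTTP errors
    return resp.json()["message"]
```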
## Filter by Type, Date, Funder
Crossref supports powerful filters:
```python
# Only journal articles published since 2024, most-cited first
params = {
    "query": "artificial intelligence",
    "filter": "type:journal-article,from-pub-date:2024-01-01",
    "sort": "is-referenced-by-count",
    "order": "desc",
    "rows": 10,
}
resp = requests.get("https://api.crossref.org/works", params=params)
```
Available filters include:

- `type:journal-article` — journal articles only
- `from-pub-date:2024-01-01` — published on or after a date
- `has-abstract:true` — only records with abstracts
- `funder:10.13039/100000001` — funded by a specific org (NSF in this case)
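Multiple filters go in a single comma-separated `filter` string, which is easy to mistype by hand. A tiny helper makes that less error-prone — `build_filter` is my own sketch, not a Crossref or `requests` feature; it leans on the convention that filter names use hyphens where Python identifiers use underscores:

```python
def build_filter(**filters):
    """Build a Crossref `filter` value from keyword arguments.

    Underscores become hyphens and booleans are lowercased, so
    build_filter(type="journal-article", has_abstract=True) yields
    "type:journal-article,has-abstract:true".
    """
    parts = []
    for name, value in filters.items():
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{name.replace('_', '-')}:{value}")
    return ",".join(parts)

params = {
    "query": "artificial intelligence",
    "filter": build_filter(type="journal-article",
                           from_pub_date="2024-01-01",
                           has_abstract=True),
}
```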
## Export to CSV
```python
import csv
import requests

results = requests.get(
    "https://api.crossref.org/works",
    params={"query": "quantum computing", "rows": 100,
            "mailto": "your@email.com"},
).json()["message"]["items"]

with open("papers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Authors", "Year", "Journal", "DOI", "Citations"])
    for item in results:
        title = item["title"][0] if item.get("title") else ""
        authors = "; ".join(
            f"{a.get('given', '')} {a.get('family', '')}".strip()
            for a in item.get("author", [])[:3]
        )
        date = item.get("published-print") or item.get("published-online") or {}
        year = date.get("date-parts", [[None]])[0][0]
        journal = (item.get("container-title") or [""])[0]
        writer.writerow([title, authors, year, journal,
                         item["DOI"], item.get("is-referenced-by-count", 0)])
```
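One gotcha in that export: Crossref records don't all carry the same date field. `published-print`, `published-online`, and `issued` each appear on different records, and any of them can hold partial dates. A defensive helper is worth keeping around — `pub_year` is my own sketch, and the field preference order is an assumption, not a Crossref rule:

```python
def pub_year(item):
    """Best-effort publication year from a Crossref work record.

    Each date field holds a {"date-parts": [[year, month, day]]}
    structure where month and day may be missing.
    """
    for field in ("published-print", "published-online", "issued", "created"):
        parts = item.get(field, {}).get("date-parts", [[]])
        if parts and parts[0] and parts[0][0]:
            return parts[0][0]
    return None
```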
## Crossref vs Other Academic APIs
| Feature | Crossref | OpenAlex | PubMed | arXiv |
|---|---|---|---|---|
| Records | 150M+ | 250M+ | 36M+ | 2M+ |
| API key | No | No | No | No |
| Scope | All fields | All fields | Biomedical | Physics/CS/Math |
| Citations | Yes | Yes | Limited | No |
| DOI lookup | Yes (native) | Yes | Via DOI | No |
| Abstracts | Some | Yes | Yes | Yes |
| Full text | No | No | No | Yes (PDF) |
## The "Polite Pool" Trick
Add the `mailto` parameter to your requests and Crossref routes them to a pool of servers reserved for polite users:
```python
# Anonymous pool
requests.get("https://api.crossref.org/works?query=ai")

# Polite pool — more reliable, often noticeably faster
requests.get("https://api.crossref.org/works?query=ai&mailto=you@email.com")
```
This is documented and encouraged by Crossref. No spam, no tracking — just better service.
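Crossref also accepts the contact address inside a `User-Agent` header, which is convenient when you route many calls through one `requests.Session` instead of tacking `mailto` onto every URL. A sketch — the tool name and version string are placeholders:

```python
import requests

def polite_session(mailto="you@email.com"):
    """Session whose User-Agent puts every request in the polite pool."""
    session = requests.Session()
    session.headers["User-Agent"] = f"my-research-tool/0.1 (mailto:{mailto})"
    return session

session = polite_session()
# resp = session.get("https://api.crossref.org/works", params={"query": "ai"})
```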
## I Built a Toolkit
I wrapped all of this into a Python toolkit: `crossref-research-toolkit`.
```python
from crossref_toolkit import CrossrefClient

client = CrossrefClient()
results = client.search_works("CRISPR gene editing", rows=5)
for r in results:
    print(f"{r['title']} — {r['citations']} citations")

# Export 100 results to CSV
client.export_csv("quantum computing", rows=100)
```
## Part of the Research API Suite
This is part of my open-source Research API Suite:
- 🔬 Crossref Toolkit — 150M+ articles (this post)
- 📚 OpenAlex Toolkit — 250M+ papers
- 🏥 PubMed Toolkit — 36M+ medical papers
- 📄 arXiv Searcher — 2M+ preprints
All free. All open source. All no-API-key.
What academic API would you want a tutorial for next? Semantic Scholar? CORE? Let me know in the comments.
Need custom research tools or data pipelines? Check my GitHub or reach out.