I was building a research tool and needed article metadata — titles, authors, citations, DOIs.
I tried Google Scholar (no API), Semantic Scholar (rate limited), Web of Science (expensive). Then I found Crossref — and it changed everything.
## What is Crossref?
Crossref is the largest DOI registration agency for scholarly publishing. When a publisher assigns a Crossref DOI to an article, the metadata is deposited with Crossref. That means:
- 150M+ scholarly works — journal articles, books, conference papers, datasets
- No API key — just send a request
- Generous rate limits — and more consistent service via the polite pool (add your email)
- Rich metadata — citations, references, funders, licenses
## Quick Example
```python
import requests

response = requests.get(
    "https://api.crossref.org/works",
    params={
        "query": "machine learning healthcare",
        "rows": 5,
        "mailto": "your@email.com",  # polite pool = faster
    },
)

for item in response.json()["message"]["items"]:
    title = item["title"][0]
    citations = item.get("is-referenced-by-count", 0)
    doi = item["DOI"]
    print(title)
    print(f"  Citations: {citations} | DOI: {doi}")
```
Output:

```
Machine Learning in Healthcare: A Review
  Citations: 1247 | DOI: 10.1016/j.artmed.2023.102456
Deep Learning for Medical Image Analysis
  Citations: 892 | DOI: 10.1038/s41591-023-02354-1
```
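When you want more than a page or two of results, Crossref supports deep paging with a `cursor` parameter: start with `*`, then follow the `next-cursor` value in each response (plain `offset` paging is capped, cursors are not). A minimal sketch — the `fetch_all` helper and its injectable `get` argument are my own, not part of any library:

```python
import requests

def fetch_all(query, max_records=200, mailto="you@email.com", get=requests.get):
    """Deep-page through /works with Crossref cursors.

    `get` is injectable so the loop can be exercised without network access.
    """
    items, cursor = [], "*"
    while len(items) < max_records:
        resp = get(
            "https://api.crossref.org/works",
            params={
                "query": query,
                "rows": min(100, max_records - len(items)),
                "cursor": cursor,
                "mailto": mailto,
            },
        )
        message = resp.json()["message"]
        if not message["items"]:
            break  # no more results
        items.extend(message["items"])
        cursor = message["next-cursor"]
    return items[:max_records]
```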
## Look Up Any DOI
```python
# Get full metadata for any DOI
resp = requests.get("https://api.crossref.org/works/10.1038/nature12373")
article = resp.json()["message"]

print(f"Title: {article['title'][0]}")
print(f"Journal: {article['container-title'][0]}")
print(f"Cited by: {article['is-referenced-by-count']} papers")
print(f"Authors: {', '.join(a['family'] for a in article['author'][:3])}")
```
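Not every DOI you meet in the wild is registered with Crossref (DataCite DOIs and typos, for instance), and the API answers those with a 404. A small wrapper keeps that case from raising — `lookup_doi` is a hypothetical helper of my own, with a `get` parameter injected for testability:

```python
import requests

def lookup_doi(doi, mailto="you@email.com", get=requests.get):
    """Return the Crossref metadata dict for a DOI, or None if unknown."""
    resp = get(f"https://api.crossref.org/works/{doi}",
               params={"mailto": mailto})
    if resp.status_code == 404:
        return None  # DOI not registered with Crossref (or mistyped)
    resp.raise_for_status()  # surface other HTTP errors
    return resp.json()["message"]
```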
## Filter by Type, Date, Funder
Crossref supports powerful filters:
```python
# Only journal articles published since 2024, most-cited first
params = {
    "query": "artificial intelligence",
    "filter": "type:journal-article,from-pub-date:2024-01-01",
    "sort": "is-referenced-by-count",
    "order": "desc",
    "rows": 10,
}
resp = requests.get("https://api.crossref.org/works", params=params)
```
Available filters include:

- `type:journal-article` — journal articles only
- `from-pub-date:2024-01-01` — published on or after a date
- `has-abstract:true` — only records with abstracts
- `funder:10.13039/100000001` — funded by a specific org (NSF in this case)
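Multiple filters go in a single comma-separated `filter` string, which is easy to mistype by hand. A tiny helper makes that less error-prone — `build_filter` is my own sketch, not a Crossref or `requests` feature; it leans on the convention that filter names use hyphens where Python identifiers use underscores:

```python
def build_filter(**filters):
    """Build a Crossref `filter` value from keyword arguments.

    Underscores become hyphens and booleans are lowercased, so
    build_filter(type="journal-article", has_abstract=True) yields
    "type:journal-article,has-abstract:true".
    """
    parts = []
    for name, value in filters.items():
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{name.replace('_', '-')}:{value}")
    return ",".join(parts)

params = {
    "query": "artificial intelligence",
    "filter": build_filter(type="journal-article",
                           from_pub_date="2024-01-01",
                           has_abstract=True),
}
```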
## Export to CSV
```python
import csv
import requests

results = requests.get(
    "https://api.crossref.org/works",
    params={"query": "quantum computing", "rows": 100,
            "mailto": "your@email.com"},
).json()["message"]["items"]

with open("papers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Authors", "Year", "Journal", "DOI", "Citations"])
    for item in results:
        title = item["title"][0] if item.get("title") else ""
        authors = "; ".join(
            f"{a.get('given', '')} {a.get('family', '')}".strip()
            for a in item.get("author", [])[:3]
        )
        date = item.get("published-print") or item.get("published-online") or {}
        year = date.get("date-parts", [[None]])[0][0]
        journal = (item.get("container-title") or [""])[0]
        writer.writerow([title, authors, year, journal,
                         item["DOI"], item.get("is-referenced-by-count", 0)])
```
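One gotcha in that export: Crossref records don't all carry the same date field. `published-print`, `published-online`, and `issued` each appear on different records, and any of them can hold partial dates. A defensive helper is worth keeping around — `pub_year` is my own sketch, and the field preference order is an assumption, not a Crossref rule:

```python
def pub_year(item):
    """Best-effort publication year from a Crossref work record.

    Each date field holds a {"date-parts": [[year, month, day]]}
    structure where month and day may be missing.
    """
    for field in ("published-print", "published-online", "issued", "created"):
        parts = item.get(field, {}).get("date-parts", [[]])
        if parts and parts[0] and parts[0][0]:
            return parts[0][0]
    return None
```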
## Crossref vs Other Academic APIs
| Feature | Crossref | OpenAlex | PubMed | arXiv |
|---|---|---|---|---|
| Records | 150M+ | 250M+ | 36M+ | 2M+ |
| API key | No | No | No | No |
| Scope | All fields | All fields | Biomedical | Physics/CS/Math |
| Citations | Yes | Yes | Limited | No |
| DOI lookup | Yes (native) | Yes | Via DOI | No |
| Abstracts | Some | Yes | Yes | Yes |
| Full text | No | No | No | Yes (PDF) |
## The "Polite Pool" Trick
Add the `mailto` parameter to your requests and Crossref routes them to a pool of servers reserved for polite users:
```python
# Anonymous pool
requests.get("https://api.crossref.org/works?query=ai")

# Polite pool — more reliable, often noticeably faster
requests.get("https://api.crossref.org/works?query=ai&mailto=you@email.com")
```
This is documented and encouraged by Crossref. No spam, no tracking — just better service.
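Crossref also accepts the contact address inside a `User-Agent` header, which is convenient when you route many calls through one `requests.Session` instead of tacking `mailto` onto every URL. A sketch — the tool name and version string are placeholders:

```python
import requests

def polite_session(mailto="you@email.com"):
    """Session whose User-Agent puts every request in the polite pool."""
    session = requests.Session()
    session.headers["User-Agent"] = f"my-research-tool/0.1 (mailto:{mailto})"
    return session

session = polite_session()
# resp = session.get("https://api.crossref.org/works", params={"query": "ai"})
```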
## I Built a Toolkit
I wrapped all of this into a Python toolkit: `crossref-research-toolkit`.
```python
from crossref_toolkit import CrossrefClient

client = CrossrefClient()
results = client.search_works("CRISPR gene editing", rows=5)
for r in results:
    print(f"{r['title']} — {r['citations']} citations")

# Export 100 results to CSV
client.export_csv("quantum computing", rows=100)
```
## Part of the Research API Suite
This is part of my open-source Research API Suite:
- 🔬 Crossref Toolkit — 150M+ articles (this post)
- 📚 OpenAlex Toolkit — 250M+ papers
- 🏥 PubMed Toolkit — 36M+ medical papers
- 📄 arXiv Searcher — 2M+ preprints
All free. All open source. All no-API-key.
What academic API would you want a tutorial for next? Semantic Scholar? CORE? Let me know in the comments.
Need custom research tools or data pipelines? Check my GitHub or reach out.