DEV Community

Alex Spinov

Crossref Has a Free API — Search 150M+ Scholarly Articles (No Key Required)

I was building a research tool and needed article metadata — titles, authors, citations, DOIs.

I tried Google Scholar (no API), Semantic Scholar (rate limited), Web of Science (expensive). Then I found Crossref — and it changed everything.

What is Crossref?

Crossref is an official DOI registration agency, and the main one for scholarly publishing. When a member publisher registers a DOI for an article, the metadata goes to Crossref. That means:

  • 150M+ scholarly works: journal articles, books, conference papers, datasets
  • No API key: just send a request
  • No hard quota: rate limits still apply, but adding your email puts you in the better-served "polite pool"
  • Rich metadata: citations, references, funders, licenses

Quick Example

import requests

response = requests.get(
    "https://api.crossref.org/works",
    params={
        "query": "machine learning healthcare",
        "rows": 5,
        "mailto": "your@email.com",  # identifies you for the polite pool
    },
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors

for item in response.json()["message"]["items"]:
    # Some records have no title, so fall back to a placeholder
    title = (item.get("title") or ["(untitled)"])[0]
    citations = item.get("is-referenced-by-count", 0)
    doi = item["DOI"]
    print(title)
    print(f"  Citations: {citations} | DOI: {doi}")

Example output (truncated; your results will differ):

Machine Learning in Healthcare: A Review
  Citations: 1247 | DOI: 10.1016/j.artmed.2023.102456
Deep Learning for Medical Image Analysis
  Citations: 892 | DOI: 10.1038/s41591-023-02354-1
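
Not every record carries every field: a work can lack a title, an author list, or a citation count. Here is a minimal sketch of a defensive extractor, assuming the response shape used in the example above (`title` as a list, `is-referenced-by-count` as an integer). The helper name `summarize_work` and both sample records are hypothetical, made up for illustration:

```python
def summarize_work(item):
    """Return a flat summary of one Crossref work record.

    Missing fields fall back to placeholders instead of raising.
    """
    titles = item.get("title") or []
    return {
        "title": titles[0] if titles else "(untitled)",
        "doi": item.get("DOI", ""),
        "citations": item.get("is-referenced-by-count", 0),
    }


# Hypothetical records for illustration (not real API output).
complete = {
    "title": ["An Example Paper"],
    "DOI": "10.1234/example",
    "is-referenced-by-count": 7,
}
sparse = {"DOI": "10.1234/no-title"}

print(summarize_work(complete))
print(summarize_work(sparse))
```

In the loop above, you could call `summarize_work(item)` on each entry of `response.json()["message"]["items"]` instead of indexing the fields directly.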

Look Up Any DOI

# Get full metadata for any DOI
resp = requests.get("https://api.crossref.org/works/10.1038/nature12373", timeout=30)
resp.raise_for_status()  # an unknown DOI returns 404
article = resp.json()["message"]

print(f"Title: {article['title'][0]}")
print(f"Journal: {article['container-title'][0]}")
print(f"Cited by: {article['is-referenced-by-count']} papers")
# Organizational authors carry "name" instead of "family"
print(f"Authors: {', '.join(a.get('family', a.get('name', '')) for a in article.get('author', [])[:3])}")
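
DOIs often arrive pasted as full `https://doi.org/...` links or with a `doi:` prefix, and some contain characters that need escaping in a URL path. The sketch below shows one way to build the lookup URL safely. The helper names `normalize_doi` and `doi_lookup_url` and the prefix list are assumptions for illustration, not part of Crossref's API:

```python
from urllib.parse import quote

API_ROOT = "https://api.crossref.org/works/"

# Common ways a DOI gets pasted; this list is an assumption, extend as needed.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw):
    """Strip whitespace and a leading resolver URL or 'doi:' prefix."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def doi_lookup_url(raw):
    """Build the /works/{doi} URL, percent-encoding unsafe characters."""
    # quote() leaves "/" alone by default; safe="/" just makes that explicit.
    return API_ROOT + quote(normalize_doi(raw), safe="/")


print(doi_lookup_url("https://doi.org/10.1038/nature12373"))
print(doi_lookup_url("doi:10.1016/S0140-6736(20)30183-5"))
```

The result can be passed straight to `requests.get(...)` in place of the hard-coded URL above.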

Filter by Type, Date, Funder

Crossref supports powerful filters:

# Only journal articles from 2024+
params = {
    "query": "artificial intelligence",
    "filter": "type:journal-article,from-pub-date:2024-01-01",
    "sort": "is-referenced-by-count",
    "order": "desc",
    "rows": 10
}
resp = requests.get("https://api.crossref.org/works", params=params)

Available filters:

  • type:journal-article (journal articles only)
  • from-pub-date:2024-01-01 (published on or after this date)
  • has-abstract:true (only works with abstracts)
  • funder:10.13039/100000001 (funded by a specific organization, NSF in this case)
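
Filters are joined into a single comma-separated string of `name:value` pairs, which is easy to mistype by hand. Below is a small sketch of a builder for that string. The function name `build_filter` and its keyword-argument convention (underscores standing in for hyphens) are my own assumptions, not something Crossref provides:

```python
def build_filter(**conditions):
    """Turn keyword arguments into a Crossref filter string.

    Underscores in names become hyphens, and booleans become
    the lowercase "true"/"false" that the API expects.
    """
    parts = []
    for name, value in conditions.items():
        key = name.replace("_", "-")
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key}:{value}")
    return ",".join(parts)


params = {
    "query": "artificial intelligence",
    "filter": build_filter(
        type="journal-article",
        from_pub_date="2024-01-01",
        has_abstract=True,
    ),
    "rows": 10,
}
print(params["filter"])
```

The resulting `params` dict can be passed to `requests.get("https://api.crossref.org/works", params=params)` exactly like the hand-written one.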

Export to CSV

import csv
import requests

resp = requests.get(
    "https://api.crossref.org/works",
    params={"query": "quantum computing", "rows": 100},
    timeout=30,
)
resp.raise_for_status()
results = resp.json()["message"]["items"]

# utf-8 keeps non-ASCII titles and author names intact
with open("papers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Authors", "Year", "Journal", "DOI", "Citations"])
    for item in results:
        title = (item.get("title") or [""])[0]
        authors = "; ".join(
            f"{a.get('given', '')} {a.get('family', '')}".strip()
            for a in item.get("author", [])[:3]
        )
        # Prefer the print date, then fall back to the online date
        published = item.get("published-print") or item.get("published-online") or {}
        year = published.get("date-parts", [[None]])[0][0]
        journal = (item.get("container-title") or [""])[0]
        writer.writerow([title, authors, year, journal, item["DOI"], item.get("is-referenced-by-count", 0)])
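
The export above fetches a single page of 100 rows. For larger result sets, Crossref documents cursor-based paging: send `cursor=*` on the first request, then pass back the `next-cursor` value from each response. Here is a rough sketch of that loop. The names `iter_works`, `fetch_page`, and `fake_fetch` are hypothetical, and the fetch function is injected so the logic can be tried offline with made-up data:

```python
def iter_works(fetch_page, query, rows=100, max_items=500):
    """Yield work records across pages using cursor paging.

    fetch_page(params) must return the parsed JSON body of a /works request.
    """
    cursor = "*"  # "*" asks the API to start a new cursor
    yielded = 0
    while yielded < max_items:
        body = fetch_page({"query": query, "rows": rows, "cursor": cursor})
        message = body["message"]
        items = message.get("items", [])
        if not items:
            break  # an empty page means the result set is exhausted
        for item in items:
            yield item
            yielded += 1
            if yielded >= max_items:
                return
        cursor = message.get("next-cursor")
        if not cursor:
            break


def fake_fetch(params):
    """Offline stand-in that serves 5 made-up records."""
    data = [{"DOI": f"10.1234/demo.{i}"} for i in range(5)]
    start = 0 if params["cursor"] == "*" else int(params["cursor"])
    page = data[start:start + params["rows"]]
    return {"message": {"items": page, "next-cursor": str(start + len(page))}}


dois = [work["DOI"] for work in iter_works(fake_fetch, "demo", rows=2)]
print(dois)
```

Against the live API, `fetch_page` could be something like `lambda p: requests.get("https://api.crossref.org/works", params=p, timeout=30).json()`. The `max_items` cap is there so a broad query does not page forever.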

Crossref vs Other Academic APIs

| Feature | Crossref | OpenAlex | PubMed | arXiv |
|---|---|---|---|---|
| Records | 150M+ | 250M+ | 36M+ | 2M+ |
| API key | No | No | No | No |
| Scope | All fields | All fields | Biomedical | Physics/CS/Math |
| Citations | Yes | Yes | Limited | No |
| DOI lookup | Yes (native) | Yes | Via DOI | No |
| Abstracts | Some | Yes | Yes | Yes |
| Full text | No | No | No | Yes (PDF) |

The "Polite Pool" Trick

Add a mailto parameter to your requests. Crossref routes identified traffic to a separate pool of servers that is generally more responsive:

# Anonymous (public pool)
requests.get("https://api.crossref.org/works?query=ai")

# Identified (polite pool, usually more responsive)
requests.get("https://api.crossref.org/works?query=ai&mailto=you@email.com")

This is documented and encouraged by Crossref. No spam, no tracking — just better service.
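
Besides the query parameter, Crossref's etiquette guidance also accepts a `mailto:` address in the `User-Agent` header. The sketch below builds both in one place so every request is identified the same way. The function name `polite_request_kwargs` and the tool name, version, and email are placeholders I made up:

```python
def polite_request_kwargs(email, tool="my-research-tool", version="0.1"):
    """Build keyword arguments that identify the caller to Crossref.

    The returned dict is meant to be unpacked into requests.get(url, **kwargs).
    """
    if "@" not in email:
        raise ValueError("a contact email is needed for the polite pool")
    return {
        "params": {"mailto": email},
        "headers": {"User-Agent": f"{tool}/{version} (mailto:{email})"},
        "timeout": 30,
    }


kwargs = polite_request_kwargs("you@example.org")
kwargs["params"]["query"] = "ai"  # add the actual search on top
print(kwargs["headers"]["User-Agent"])
```

With `requests` installed, the call becomes `requests.get("https://api.crossref.org/works", **kwargs)`.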

I Built a Toolkit

I wrapped all of this into a Python toolkit: crossref-research-toolkit

from crossref_toolkit import CrossrefClient

client = CrossrefClient()
results = client.search_works("CRISPR gene editing", rows=5)
for r in results:
    print(f"{r['title']}: {r['citations']} citations")

# Export 100 results to CSV
client.export_csv("quantum computing", rows=100)

Part of the Research API Suite

This is part of my open-source Research API Suite:

All free. All open source. All no-API-key.


What academic API would you want a tutorial for next? Semantic Scholar? CORE? Let me know in the comments.

Need custom research tools or data pipelines? Check my GitHub or reach out.
