Last month I needed citation data for a research project. Google Scholar blocks scraping. Semantic Scholar has rate limits. Then I found OpenAlex — and it changed everything.
What Is OpenAlex?
OpenAlex is a free, open catalog of 250M+ academic papers, authors, institutions, and concepts. No API key. No authentication. No rate limits (well, 100K/day, but that's generous). It's maintained by the nonprofit OurResearch (the folks behind Unpaywall).
Think of it as the Wikipedia of academic metadata.
Why Should You Care?
If you work with:
- Academic research — find papers, citations, co-author networks
- Market research — track R&D trends by analyzing publication patterns
- AI/ML — build training datasets from paper abstracts
- Competitive intelligence — see what universities/companies are publishing
...then OpenAlex is your new best friend.
Quick Start: Your First API Call
No setup. No signup. Just curl:
curl "https://api.openalex.org/works?search=machine+learning&per_page=3"
That query matches about 3.7 million papers and returns the first 3. Each record includes title, authors, DOI, citation count, abstract, topics, and more.
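One quirk worth knowing before you parse results: OpenAlex serves abstracts as an `abstract_inverted_index` (a word-to-positions map), not plain text. A minimal helper to rebuild readable text — the `sample` dict here is a toy stand-in for a real response:

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from OpenAlex's abstract_inverted_index."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))

# Toy example shaped like the API's abstract_inverted_index field:
sample = {"Machine": [0], "learning": [1], "is": [2], "everywhere.": [3]}
print(rebuild_abstract(sample))  # Machine learning is everywhere.
```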
Real-World Example: Finding the Most-Cited ML Papers
import requests

# Search for machine learning papers, sorted by citation count
url = "https://api.openalex.org/works"
params = {
    "search": "machine learning",
    "sort": "cited_by_count:desc",
    "per_page": 10
}
response = requests.get(url, params=params)
data = response.json()

print(f"Total papers found: {data['meta']['count']:,}")
print()

for paper in data["results"]:
    title = paper["title"]
    citations = paper["cited_by_count"]
    year = paper["publication_year"]
    print(f"{citations:,} citations | {year} | {title[:80]}")
Output:
Total papers found: 3,750,848
63,373 citations | 2011 | Scikit-learn: Machine Learning in Python
49,282 citations | 1989 | Genetic algorithms in search, optimization, and machine learning
42,418 citations | 2015 | Deep Learning
That took 0.3 seconds. No API key needed.
5 Powerful Things You Can Do
1. Track Research Trends Over Time
# How many AI papers per year?
for year in range(2020, 2027):
    url = f"https://api.openalex.org/works?filter=concepts.id:C154945302,publication_year:{year}"
    count = requests.get(url).json()["meta"]["count"]
    print(f"{year}: {count:,} AI papers")
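The loop above makes one request per year; OpenAlex's `group_by` parameter can return the same counts in a single call. A sketch that builds (but doesn't send) the grouped request so you can inspect the final URL:

```python
import requests

# Prepare the grouped request without sending it, to see the final URL.
req = requests.Request(
    "GET",
    "https://api.openalex.org/works",
    params={
        "filter": "concepts.id:C154945302",
        "group_by": "publication_year",
    },
).prepare()
print(req.url)
```

Each bucket in the response's `group_by` list carries a `key` (the year) and a `count` — the same numbers the loop fetched one request at a time.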
2. Find an Author's Full Publication List
# Search by author name
url = "https://api.openalex.org/authors?search=yann+lecun"
author = requests.get(url).json()["results"][0]
print(f"{author['display_name']}: {author['works_count']} papers, {author['cited_by_count']:,} citations")
3. Map Institution Research Output
# MIT's publications
url = "https://api.openalex.org/institutions?search=MIT"
mit = requests.get(url).json()["results"][0]
print(f"{mit['display_name']}: {mit['works_count']:,} papers")
4. Build a Citation Network
# Get papers that cite a specific paper
paper_id = "W2741809807"  # "Attention Is All You Need"
url = f"https://api.openalex.org/works?filter=cites:{paper_id}&per_page=5"
citing = requests.get(url).json()
print(f'{citing["meta"]["count"]:,} papers cite "Attention Is All You Need"')
5. Export Data for Analysis
import csv
import requests

url = "https://api.openalex.org/works?search=web+scraping&per_page=50"
papers = requests.get(url).json()["results"]

with open("papers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Year", "Citations", "DOI"])
    for p in papers:
        writer.writerow([p["title"], p["publication_year"], p["cited_by_count"], p.get("doi", "")])

print(f"Exported {len(papers)} papers to papers.csv")
OpenAlex vs Other Academic APIs
| Feature | OpenAlex | Google Scholar | Semantic Scholar | Scopus |
|---|---|---|---|---|
| API Key Required | ❌ No | ❌ No API | ✅ Yes | ✅ Yes |
| Rate Limit | 100K/day | Blocked | 100/5min | Varies |
| Papers | 250M+ | Unknown | 200M+ | 84M+ |
| Free | ✅ Yes | N/A | ✅ Yes | ❌ Paid |
| Open Source | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Bulk Download | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
Pro Tips
- Add your email to get into the "polite pool" (faster responses): `?mailto=you@example.com`
- Use filters instead of search for precise queries: `?filter=concepts.id:C41008148,publication_year:2024` (Computer Science papers from 2024)
- Pagination: use `cursor` for large result sets (faster than offset)
- Group by: get aggregated stats in one call: `/works?group_by=publication_year&filter=concepts.id:C154945302`
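Cursor paging starts at `cursor=*` and follows `meta.next_cursor` until it disappears. A minimal sketch — the `fetch` parameter is injected so the helper can be exercised without the network; pass `requests.get` for real use:

```python
import requests

def iter_works(url, params, fetch=requests.get):
    """Yield every result across pages using OpenAlex cursor paging."""
    params = dict(params, cursor="*")  # "*" starts a cursor session
    while True:
        page = fetch(url, params=params).json()
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")
        if not cursor:  # no next_cursor means we've seen every page
            return
        params["cursor"] = cursor
```

Usage: `for work in iter_works("https://api.openalex.org/works", {"search": "web scraping", "per_page": 200, "mailto": "you@example.com"}): ...` — 200 is the per-page maximum, and the `mailto` keeps you in the polite pool.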
When NOT to Use OpenAlex
- You need full-text PDFs → Use Unpaywall or Sci-Hub
- You need patent data → Use Google Patents or Lens.org
- You need real-time preprints → Use arXiv API (my previous article covers this)
Build Something With It
I used OpenAlex to build a research trend analyzer that tracks how AI subfields grow year over year. The entire thing is 50 lines of Python.
The data is there. It's free. No gatekeepers. Go build.
What would you build with 250M papers? Drop your idea in the comments — I'll pick the most interesting one and build a prototype.
More free API tutorials: My API series on Dev.to
Need custom data extraction? I build scrapers professionally. Contact me or check my 77 ready-made scrapers on Apify.