Last month I needed citation data for a research project. Google Scholar blocks scraping. Semantic Scholar has rate limits. Then I found OpenAlex — and it changed everything.
What Is OpenAlex?
OpenAlex is a free, open catalog of 250M+ academic papers, authors, institutions, and concepts. No API key. No authentication. No rate limits (well, 100K/day, but that's generous). It's maintained by the nonprofit OurResearch (the folks behind Unpaywall).
Think of it as the Wikipedia of academic metadata.
Why Should You Care?
If you work with:
- Academic research — find papers, citations, co-author networks
- Market research — track R&D trends by analyzing publication patterns
- AI/ML — build training datasets from paper abstracts
- Competitive intelligence — see what universities/companies are publishing
...then OpenAlex is your new best friend.
Quick Start: Your First API Call
No setup. No signup. Just curl:
curl "https://api.openalex.org/works?search=machine+learning&per_page=3"
That query matches about 3.7 million papers and returns the first 3. Each record includes title, authors, DOI, citation count, abstract, topics, and more.
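One quirk worth knowing before you parse results: OpenAlex serves abstracts as an `abstract_inverted_index` (a word-to-positions map), not plain text. A minimal helper to rebuild readable text — the `sample` dict here is a toy stand-in for a real response:

```python
def rebuild_abstract(inverted_index):
    """Rebuild plain text from OpenAlex's abstract_inverted_index."""
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word
    return " ".join(positions[i] for i in sorted(positions))

# Toy example shaped like the API's abstract_inverted_index field:
sample = {"Machine": [0], "learning": [1], "is": [2], "everywhere.": [3]}
print(rebuild_abstract(sample))  # Machine learning is everywhere.
```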
Real-World Example: Finding the Most-Cited ML Papers
import requests

# Search for machine learning papers, sorted by citation count
url = "https://api.openalex.org/works"
params = {
    "search": "machine learning",
    "sort": "cited_by_count:desc",
    "per_page": 10
}
response = requests.get(url, params=params)
data = response.json()

print(f"Total papers found: {data['meta']['count']:,}")
print()

for paper in data["results"]:
    title = paper["title"]
    citations = paper["cited_by_count"]
    year = paper["publication_year"]
    print(f"{citations:,} citations | {year} | {title[:80]}")
Output:
Total papers found: 3,750,848
63,373 citations | 2011 | Scikit-learn: Machine Learning in Python
49,282 citations | 1989 | Genetic algorithms in search, optimization, and machine learning
42,418 citations | 2015 | Deep Learning
That took 0.3 seconds. No API key needed.
5 Powerful Things You Can Do
1. Track Research Trends Over Time
# How many AI papers per year?
for year in range(2020, 2027):
    url = f"https://api.openalex.org/works?filter=concepts.id:C154945302,publication_year:{year}"
    count = requests.get(url).json()["meta"]["count"]
    print(f"{year}: {count:,} AI papers")
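The loop above makes one request per year; OpenAlex's `group_by` parameter can return the same counts in a single call. A sketch that builds (but doesn't send) the grouped request so you can inspect the final URL:

```python
import requests

# Prepare the grouped request without sending it, to see the final URL.
req = requests.Request(
    "GET",
    "https://api.openalex.org/works",
    params={
        "filter": "concepts.id:C154945302",
        "group_by": "publication_year",
    },
).prepare()
print(req.url)
```

Each bucket in the response's `group_by` list carries a `key` (the year) and a `count` — the same numbers the loop fetched one request at a time.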
2. Find an Author's Full Publication List
# Search by author name
url = "https://api.openalex.org/authors?search=yann+lecun"
author = requests.get(url).json()["results"][0]
print(f"{author['display_name']}: {author['works_count']} papers, {author['cited_by_count']:,} citations")
3. Map Institution Research Output
# MIT's publications
url = "https://api.openalex.org/institutions?search=MIT"
mit = requests.get(url).json()["results"][0]
print(f"{mit['display_name']}: {mit['works_count']:,} papers")
4. Build a Citation Network
# Get papers that cite a specific paper
paper_id = "W2741809807"  # "Attention Is All You Need"
url = f"https://api.openalex.org/works?filter=cites:{paper_id}&per_page=5"
citing = requests.get(url).json()
print(f'{citing["meta"]["count"]:,} papers cite "Attention Is All You Need"')
5. Export Data for Analysis
import csv
import requests

url = "https://api.openalex.org/works?search=web+scraping&per_page=50"
papers = requests.get(url).json()["results"]

with open("papers.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Year", "Citations", "DOI"])
    for p in papers:
        writer.writerow([p["title"], p["publication_year"], p["cited_by_count"], p.get("doi", "")])

print(f"Exported {len(papers)} papers to papers.csv")
OpenAlex vs Other Academic APIs
| Feature | OpenAlex | Google Scholar | Semantic Scholar | Scopus |
|---|---|---|---|---|
| API Key Required | ❌ No | ❌ No API | ✅ Yes | ✅ Yes |
| Rate Limit | 100K/day | Blocked | 100/5min | Varies |
| Papers | 250M+ | Unknown | 200M+ | 84M+ |
| Free | ✅ Yes | N/A | ✅ Yes | ❌ Paid |
| Open Source | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Bulk Download | ✅ Yes | ❌ No | ✅ Yes | ❌ No |
Pro Tips
- Add your email to get into the "polite pool" (faster responses): `?mailto=you@example.com`
- Use filters instead of search for precise queries: `?filter=concepts.id:C41008148,publication_year:2024` (Computer Science papers from 2024)
- Pagination: use `cursor` for large result sets (faster than offset)
- Group by: get aggregated stats in one call: `/works?group_by=publication_year&filter=concepts.id:C154945302`
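Cursor paging starts at `cursor=*` and follows `meta.next_cursor` until it disappears. A minimal sketch — the `fetch` parameter is injected so the helper can be exercised without the network; pass `requests.get` for real use:

```python
import requests

def iter_works(url, params, fetch=requests.get):
    """Yield every result across pages using OpenAlex cursor paging."""
    params = dict(params, cursor="*")  # "*" starts a cursor session
    while True:
        page = fetch(url, params=params).json()
        yield from page["results"]
        cursor = page["meta"].get("next_cursor")
        if not cursor:  # no next_cursor means we've seen every page
            return
        params["cursor"] = cursor
```

Usage: `for work in iter_works("https://api.openalex.org/works", {"search": "web scraping", "per_page": 200, "mailto": "you@example.com"}): ...` — 200 is the per-page maximum, and the `mailto` keeps you in the polite pool.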
When NOT to Use OpenAlex
- You need full-text PDFs → Use Unpaywall or Sci-Hub
- You need patent data → Use Google Patents or Lens.org
- You need real-time preprints → Use arXiv API (my previous article covers this)
Build Something With It
I used OpenAlex to build a research trend analyzer that tracks how AI subfields grow year over year. The entire thing is 50 lines of Python.
The data is there. It's free. No gatekeepers. Go build.
What would you build with 250M papers? Drop your idea in the comments — I'll pick the most interesting one and build a prototype.
More free API tutorials: My API series on Dev.to
Need custom data extraction? I build scrapers professionally. Contact me or check my 77 ready-made scrapers on Apify.