Ever tried to build something with academic data? Google Scholar blocks you, Scopus costs thousands, and Web of Science requires institutional access.
Then I found OpenAlex: a completely free, open-source catalog of 250M+ academic works, 90M+ authors, and 100K+ institutions. No API key. No authentication. Generous rate limits (100,000 requests per day).
Let me show you what you can actually build with it.
## What Is OpenAlex?
OpenAlex is the open replacement for Microsoft Academic Graph (which shut down in 2021). It indexes:
- 250M+ works (papers, books, datasets)
- 90M+ authors with disambiguation
- 100K+ institutions
- 65K+ journals and venues
- Citation graphs, concepts, topics — all linked
And it is 100% free. No API key needed.
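The "all linked" part works through IDs. Every entity has an ID like `https://openalex.org/W...`, and the leading letter identifies its type. Below is a small sketch that maps any such ID to its API URL. It assumes the prefix letters listed in the comment (W for works, A for authors, and so on). `short_id` and `api_url` are hypothetical helper names, not part of an official client, and the IDs in the example are placeholders.

```python
# Hypothetical helpers (not part of any OpenAlex client library).
# Assumed prefix convention: W=work, A=author, S=source, I=institution,
# C=concept, T=topic, F=funder, P=publisher.
ENTITY_PREFIXES = {
    "W": "works",
    "A": "authors",
    "S": "sources",
    "I": "institutions",
    "C": "concepts",
    "T": "topics",
    "F": "funders",
    "P": "publishers",
}


def short_id(openalex_id: str) -> str:
    """Strip the URL part: 'https://openalex.org/W123' -> 'W123'."""
    return openalex_id.rstrip("/").rsplit("/", 1)[-1]


def api_url(openalex_id: str) -> str:
    """Map an OpenAlex ID (full URL or short form) to its REST endpoint URL."""
    sid = short_id(openalex_id)
    endpoint = ENTITY_PREFIXES.get(sid[:1].upper())
    if endpoint is None:
        raise ValueError(f"Unknown OpenAlex ID prefix: {openalex_id!r}")
    return f"https://api.openalex.org/{endpoint}/{sid}"


print(api_url("https://openalex.org/W123456789"))  # → https://api.openalex.org/works/W123456789
```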
## Quick Start: Search Papers by Topic
```python
import requests

# Search for papers about "transformer architecture"
url = "https://api.openalex.org/works"
params = {
    "search": "transformer architecture neural networks",
    "per_page": 5,
    "sort": "cited_by_count:desc",
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()
data = response.json()

for work in data["results"]:
    title = work["title"]
    citations = work["cited_by_count"]
    year = work["publication_year"]
    doi = work.get("doi") or "No DOI"  # "doi" can be present but null
    print(f"[{year}] {title}")
    print(f" Citations: {citations:,} | DOI: {doi}")
    print()
```
Output (abridged; counts rounded, and they change as the index updates):

```
[2017] Attention Is All You Need
 Citations: 120,000+ | DOI: https://doi.org/10....

[2018] BERT: Pre-training of Deep Bidirectional Transformers
 Citations: 85,000+ | DOI: https://doi.org/10..
```
No key. No signup. Just works.
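One thing you won't find in those results is a plain-text abstract. OpenAlex ships abstracts as `abstract_inverted_index`, a mapping from each word to the positions where it appears, or `null` when no abstract is available. Here is a minimal sketch for turning it back into text, assuming that shape. `rebuild_abstract` is a hypothetical helper name, and the toy index is made up.

```python
from typing import Optional


def rebuild_abstract(inverted_index: Optional[dict]) -> str:
    """Rebuild plain text from an OpenAlex abstract_inverted_index.

    Assumes the shape {word: [position, ...]}; returns an empty string
    when the work has no abstract (the field is null).
    """
    if not inverted_index:
        return ""
    # Flatten to (position, word) pairs, one per occurrence
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    # Sort by position and join the words back together
    return " ".join(word for _, word in sorted(positioned))


# With a work from the quick-start loop (requires network):
# abstract = rebuild_abstract(work.get("abstract_inverted_index"))

example = {"is": [1], "Attention": [0], "all": [2], "you": [3], "need": [4]}
print(rebuild_abstract(example))  # → Attention is all you need
```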
## Find the Most-Cited Authors in Any Field
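```python
import requests

base = "https://api.openalex.org"

# Note: /authors?search=... matches author *names*, not research fields.
# So look up the "machine learning" concept first, then filter authors by it.
concept_resp = requests.get(
    f"{base}/concepts",
    params={"search": "machine learning", "per_page": 1},
    timeout=30,
)
concept_resp.raise_for_status()
concept_url = concept_resp.json()["results"][0]["id"]  # "https://openalex.org/C..."
concept_id = concept_url.rsplit("/", 1)[-1]  # short form, "C..."

# Top authors in "machine learning", most-cited first
params = {
    "filter": f"x_concepts.id:{concept_id}",
    "sort": "cited_by_count:desc",
    "per_page": 5,
}
response = requests.get(f"{base}/authors", params=params, timeout=30)
response.raise_for_status()

for author in response.json()["results"]:
    name = author["display_name"]
    citations = author["cited_by_count"]
    works = author["works_count"]
    # The field may be missing or an empty list
    insts = author.get("last_known_institutions") or []
    inst_name = insts[0]["display_name"] if insts else "Unknown"
    print(f"{name} ({inst_name})")
    print(f" {works:,} works | {citations:,} citations")
```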
## Track Publication Trends Over Time
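```python
import requests

# How many papers mention "large language models" per year?
url = "https://api.openalex.org/works"
params = {
    "search": "large language models",
    "group_by": "publication_year",
}
response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# Groups come back ordered by count, so sort them chronologically
groups = sorted(response.json()["group_by"], key=lambda g: int(g["key"]))
for group in groups:
    year = int(group["key"])
    count = group["count"]
    if year >= 2019:
        bar = "█" * (count // 500)  # one block per 500 papers
        print(f"{year}: {count:>6,} papers {bar}")
```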
This reveals the explosion of LLM research (approximate counts; yours will differ as the index grows):

```
2019:  1,200 papers ██
2020:  2,800 papers █████
2021:  5,400 papers ██████████
2022: 12,000 papers ████████████████████████
2023: 35,000 papers ██████████████████████████████████████████████████████████████████████
```
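To put a number on that growth instead of eyeballing bars, you can compute year-over-year change directly from the `group_by` list. The sketch below assumes each group has the string `key` and integer `count` fields used above. `yearly_growth` is a hypothetical helper, and the sample input reuses the approximate counts from the output above rather than live data.

```python
def yearly_growth(groups, since=None):
    """Return [(year, count, growth)] sorted by year.

    `groups` is the "group_by" list from a works query grouped by
    publication_year. Growth is the fractional change vs. the previous
    listed year (None for the first year or when the previous count is 0).
    """
    rows = sorted((int(g["key"]), g["count"]) for g in groups)
    if since is not None:
        rows = [(y, c) for y, c in rows if y >= since]
    result = []
    prev = None
    for year, count in rows:
        growth = (count / prev - 1) if prev else None
        result.append((year, count, growth))
        prev = count
    return result


# Approximate counts from the sample output above (not live data)
sample = [
    {"key": "2023", "count": 35000},
    {"key": "2019", "count": 1200},
    {"key": "2021", "count": 5400},
    {"key": "2020", "count": 2800},
    {"key": "2022", "count": 12000},
]
for year, count, growth in yearly_growth(sample):
    label = "n/a" if growth is None else f"{growth:+.0%}"
    print(f"{year}: {count:>6,} ({label})")
```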
## Build a Research Dashboard
Combine several OpenAlex queries into a one-shot overview of a topic:
```python
import requests

BASE = "https://api.openalex.org"


def get_json(path, **params):
    """GET an OpenAlex endpoint and return the parsed JSON body."""
    response = requests.get(f"{BASE}/{path}", params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def research_landscape(topic):
    """Get a quick overview of any research topic."""
    # Total papers (meta.count is the full number of matches)
    works = get_json("works", search=topic, per_page=1)
    total = works["meta"]["count"]

    # Top institutions: group the matching works by author affiliation.
    # (/institutions?search=... would match institution *names* instead.)
    inst = get_json("works", search=topic, group_by="authorships.institutions.id")

    # Growth trend: matching works per publication year
    trend = get_json("works", search=topic, group_by="publication_year")

    print(f"Topic: {topic}")
    print(f"Total papers: {total:,}")
    print("\nTop institutions:")
    for group in inst["group_by"][:3]:
        print(f"  - {group['key_display_name']} ({group['count']:,} papers)")

    recent = [g for g in trend["group_by"] if int(g["key"]) >= 2023]
    if recent:
        print(f"\n2023+ papers: {sum(g['count'] for g in recent):,}")


research_landscape("artificial intelligence safety")
```
## Why OpenAlex Over Alternatives?
| Feature | OpenAlex | Google Scholar | Scopus | Semantic Scholar |
|---|---|---|---|---|
| API Key Required | No | No API | Yes ($$$) | Optional (free) |
| Rate Limits | 100K/day, 10/sec | Blocked | Strict | 100/5min |
| Data Download | Full dump | No | No | Yes |
| Author IDs | ORCID-linked | No | Scopus ID | S2 ID |
| Open Source | Yes | No | No | Partial |
| Works Indexed | 250M+ | Unknown | 90M+ | 200M+ |
OpenAlex wins on openness and scale.
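The rate-limit row is worth respecting in code. At the time of writing, OpenAlex documents 100,000 requests per day and 10 requests per second, but check the current docs before relying on those numbers. If you loop over many queries, a small client-side throttle keeps you under the per-second cap. Here is a minimal sketch. `Throttle` is a hypothetical helper, and it does not track the daily quota.

```python
import time


class Throttle:
    """Space out calls by at least 1/per_second seconds.

    A minimal client-side sketch; it does not track the daily quota.
    """

    def __init__(self, per_second: float = 10):
        self.min_interval = 1.0 / per_second
        self._last = float("-inf")  # no call made yet

    def wait(self) -> None:
        """Sleep just long enough to keep the minimum spacing, then record the call."""
        now = time.monotonic()
        delay = self._last + self.min_interval - now
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


throttle = Throttle(per_second=10)  # assumed limit: 10 requests/second

# Usage: call throttle.wait() right before each request, e.g.
# throttle.wait()
# response = requests.get("https://api.openalex.org/works", params=params, timeout=30)
```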
## What You Can Build
- Research trend tracker — monitor any field in real-time
- Author discovery tool — find experts in niche topics
- Citation network visualizer — map how ideas spread
- Literature review automation — find related papers systematically
- Grant landscape analyzer — see where funding goes
- Competitor intelligence for R&D — track what rival labs publish
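For the citation network idea, each work record carries a `referenced_works` list of OpenAlex IDs, and the `cites:` filter returns the works that cite a given one. The sketch below builds on those two features and uses only the standard library. The helper names are hypothetical, and the toy records use made-up short IDs (real records use full `https://openalex.org/W...` URLs).

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def fetch_citing_works(work_id: str, per_page: int = 25) -> list:
    """Works that cite `work_id` (short ID like 'W123'), via filter=cites: (requires network)."""
    query = urllib.parse.urlencode({"filter": f"cites:{work_id}", "per_page": per_page})
    with urllib.request.urlopen(f"https://api.openalex.org/works?{query}", timeout=30) as resp:
        return json.load(resp)["results"]


def citation_edges(works: list) -> list:
    """(citing_id, cited_id) pairs from each work's referenced_works list."""
    return [
        (work["id"], ref)
        for work in works
        for ref in work.get("referenced_works") or []
    ]


def most_cited_within(works: list, top: int = 5) -> list:
    """Rank references by how often this set of works cites them."""
    counts = Counter(cited for _, cited in citation_edges(works))
    return counts.most_common(top)


# Toy records with made-up IDs
toy = [
    {"id": "W1", "referenced_works": ["W9", "W8"]},
    {"id": "W2", "referenced_works": ["W9"]},
    {"id": "W3", "referenced_works": None},
]
print(most_cited_within(toy))  # → [('W9', 2), ('W8', 1)]
```

The edge list from `citation_edges` can be handed to a graph library (for example `networkx.DiGraph(edges)`) for layout and drawing.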
## The API Endpoints
- `/works`: papers, articles, books, datasets
- `/authors`: researchers with disambiguation
- `/institutions`: universities, labs, companies
- `/sources`: journals, conferences, repositories
- `/concepts`: hierarchical topic taxonomy
- `/topics`: fine-grained research topics
- `/funders`: funding organizations
- `/publishers`: publishing companies
All support filtering, sorting, grouping, and pagination.
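A note on pagination. Plain `page`-based paging only reaches the first 10,000 results. For larger result sets, OpenAlex offers cursor paging: send `cursor=*` on the first request, then follow `meta.next_cursor` until it comes back null. Below is a sketch of a generator that follows cursors. `iterate_all` and `fetch_json` are hypothetical names. The `fetch` argument is injectable so you can swap in `requests` or a test stub, and `max_results` is a safety cap added for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str, params: dict) -> dict:
    """GET an OpenAlex URL with query params and parse the JSON body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}", timeout=30) as resp:
        return json.load(resp)


def iterate_all(url: str, params: dict,
                fetch: Callable[[str, dict], dict] = fetch_json,
                max_results: int = 1000) -> Iterator[dict]:
    """Yield results page by page using cursor pagination.

    Starts with cursor='*' and follows meta.next_cursor until it is null,
    a page comes back empty, or max_results records have been yielded.
    """
    cursor = "*"
    yielded = 0
    while cursor:
        page = fetch(url, {**params, "per_page": 200, "cursor": cursor})
        if not page["results"]:
            break
        for item in page["results"]:
            yield item
            yielded += 1
            if yielded >= max_results:
                return
        cursor = page["meta"].get("next_cursor")


# Usage (requires network):
# for work in iterate_all("https://api.openalex.org/works",
#                         {"search": "graph neural networks"}, max_results=500):
#     print(work["display_name"])
```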
## Pro Tips
- Add `mailto=your@email.com` to your requests to get into the polite pool (faster responses)
- Use `filter` instead of `search` for exact matches
- Download monthly snapshots from AWS S3 for bulk analysis
- Combine with Semantic Scholar for citation context
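Here is how the first two tips look in practice. OpenAlex filters are `field:value` pairs joined by commas (AND), with `|` for OR inside a value, `!` for negation, and `>`/`<` for ranges. The `build_filter` helper below is a hypothetical convenience, not part of any library. The field names are standard works filters, and the email address is a placeholder.

```python
from urllib.parse import urlencode


def build_filter(conditions: dict) -> str:
    """Turn {field: value} into OpenAlex filter syntax.

    Lists become OR (value1|value2); strings are passed through as-is,
    so '>2020' or '!article' keep their OpenAlex meaning.
    """
    parts = []
    for field, value in conditions.items():
        if isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{field}:{value}")
    return ",".join(parts)


params = {
    "filter": build_filter({
        "publication_year": ">2020",
        "type": "article",
        "open_access.is_oa": "true",
    }),
    "mailto": "your@email.com",  # placeholder: use your real address for the polite pool
    "per_page": 25,
}
url = "https://api.openalex.org/works?" + urlencode(params)
print(params["filter"])  # → publication_year:>2020,type:article,open_access.is_oa:true
print(url)
```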
I have been building data collection tools for 2+ years. OpenAlex is hands-down the best free academic API I have found.
Have you used OpenAlex? What did you build with it? I would love to hear about your projects in the comments.
If you need custom data pipelines for academic research, check out my data collection tools on GitHub. I build scrapers and API wrappers for research workflows.

---

## More Free Research APIs

This is part of my series on free APIs for researchers and data scientists:

- OpenAlex API: 250M+ Academic Works
- CORE API: 260M+ Scientific Papers
- Crossref API: DOI Metadata for 150M+ Papers
- Unpaywall API: Find Free Paper Versions
- Europe PMC: 40M+ Biomedical Papers
- World Bank API: GDP & Economic Data
- ORCID API: 18M+ Researcher Profiles
- DBLP API: 6M+ CS Publications
- NASA APIs: 20+ Free Space Data APIs
- FRED API: 800K+ US Economic Time Series
- All 30+ Research APIs Mapped

*Tools: Academic Research Toolkit on GitHub*