Most researchers pay for Scopus ($10,000+/year) or Web of Science ($50,000+/year) to search academic literature. But OpenAlex — a free, open-source index of 250M+ scholarly works — gives you the same data through a simple REST API. No API key. No paywall. Generous rate limits, especially if you identify yourself as a polite user (more on that below).
I replaced a $500/month research data pipeline with 30 lines of Python using this API. Here's how.
What Is OpenAlex?
OpenAlex is a free, open catalog of the global research system. It indexes:
- 250M+ works (papers, books, datasets)
- 100M+ authors with citation metrics
- 100K+ institutions worldwide
- 50K+ journals and conferences
- Topics, concepts, and citation graphs
Think of it as a free alternative to Scopus, Web of Science, and Google Scholar — but with a proper API.
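Beyond keyword search, OpenAlex records resolve by external IDs too. For works, that means you can skip search entirely and address a paper directly by its DOI via `/works/doi:<DOI>`. A minimal sketch, using my own helper names (`doi_url`, `work_by_doi`, and the example DOI are mine, not from any SDK):

```python
import json
import urllib.request

def doi_url(doi):
    # Build the lookup URL for a single work, addressed by DOI.
    return f"https://api.openalex.org/works/doi:{doi}"

def work_by_doi(doi):
    # Fetch and decode one work's full metadata record (requires network).
    with urllib.request.urlopen(doi_url(doi)) as response:
        return json.loads(response.read())

# Example (network call, so commented out):
# paper = work_by_doi("10.7717/peerj.4375")
# print(paper["title"], paper["cited_by_count"])
```

Single-record lookups like this are handy when you already have a reading list of DOIs and just want citation counts or open-access status for each.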
Quick Start: Search Papers in 5 Lines
```python
import urllib.request
import json

url = "https://api.openalex.org/works?search=large+language+models&sort=cited_by_count:desc&per_page=5"
response = urllib.request.urlopen(url)
papers = json.loads(response.read())

for paper in papers["results"]:
    print(f"{paper['title']}")
    print(f"  Citations: {paper['cited_by_count']}")
    print(f"  Year: {paper['publication_year']}")
    print(f"  DOI: {paper.get('doi')}\n")
```
No pip install. No API key. Just stdlib Python.
5 Practical Use Cases
1. Find the Most-Cited Papers on Any Topic
```python
import urllib.parse

def top_papers(topic, limit=10):
    # quote_plus handles spaces and special characters in the search term
    query = urllib.parse.quote_plus(topic)
    url = f"https://api.openalex.org/works?search={query}&sort=cited_by_count:desc&per_page={limit}"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read())
    return [(p["title"], p["cited_by_count"], p["publication_year"]) for p in data["results"]]

for title, citations, year in top_papers("artificial intelligence"):
    print(f"[{year}] {title} — {citations} citations")
```
2. Track Research Trends Over Time
```python
def trend(topic, start_year=2020, end_year=2025):
    query = urllib.parse.quote_plus(topic)
    for year in range(start_year, end_year + 1):
        url = f"https://api.openalex.org/works?search={query}&filter=publication_year:{year}&per_page=1"
        response = urllib.request.urlopen(url)
        data = json.loads(response.read())
        # meta.count gives the total match count without fetching all results
        print(f"{year}: {data['meta']['count']:,} papers")

trend("large language models")
# 2020: 1,204 papers
# 2021: 2,891 papers
# 2022: 8,445 papers
# 2023: 28,102 papers  <- explosion
# 2024: 61,334 papers
```
This alone replaces expensive research analytics tools.
3. Map Author Networks
```python
def author_info(name):
    query = urllib.parse.quote_plus(name)
    url = f"https://api.openalex.org/authors?search={query}&per_page=1"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read())
    if data["results"]:
        a = data["results"][0]
        return {
            "name": a["display_name"],
            "works": a["works_count"],
            "citations": a["cited_by_count"],
            "h_index": a["summary_stats"]["h_index"],
        }

print(author_info("Yann LeCun"))
```
4. Competitive Intelligence for Startups
```python
def company_research(name):
    query = urllib.parse.quote_plus(name)
    url = f"https://api.openalex.org/institutions?search={query}&per_page=1"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read())
    if data["results"]:
        inst = data["results"][0]
        inst_id = inst["id"].split("/")[-1]
        papers_url = (
            f"https://api.openalex.org/works"
            f"?filter=authorships.institutions.id:{inst_id},publication_year:2024-2025"
            f"&sort=cited_by_count:desc&per_page=5"
        )
        response = urllib.request.urlopen(papers_url)
        papers = json.loads(response.read())
        print(f"{inst['display_name']} — {inst['works_count']} total works")
        for p in papers["results"]:
            print(f"  {p['title']} ({p['cited_by_count']} citations)")

company_research("Google DeepMind")
```
5. Build a Research Dashboard
```python
def dashboard(topics):
    for topic in topics:
        query = urllib.parse.quote_plus(topic)
        url = f"https://api.openalex.org/works?search={query}&filter=publication_year:2025&per_page=1"
        response = urllib.request.urlopen(url)
        data = json.loads(response.read())
        print(f"{topic:30} | {data['meta']['count']:>8} papers in 2025")

dashboard(["large language models", "computer vision", "quantum computing", "web scraping"])
```
API Endpoints Cheat Sheet
| Endpoint | What it returns | Example |
|---|---|---|
| `/works` | Papers, books, datasets | `?search=topic&sort=cited_by_count:desc` |
| `/authors` | Researchers with h-index | `?search=name` |
| `/institutions` | Universities, companies | `?search=MIT` |
| `/topics` | Research topics with trends | `?search=machine+learning` |
| `/sources` | Journals, conferences | `?search=Nature` |
Rate Limits and Best Practices
- No API key needed for basic usage
- Add a `mailto=your@email.com` parameter to join the "polite pool", which gets faster, more consistent response times
- Rate limit: ~10 requests/second, up to ~100,000 calls/day
- All responses are JSON
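To make every call from the scripts above polite, a tiny helper (my own, not part of any SDK) can append the `mailto` parameter to any OpenAlex URL:

```python
import urllib.parse

MAILTO = "you@example.com"  # assumption: replace with your real address

def polite(url, email=MAILTO):
    """Append mailto=<email> so the request lands in OpenAlex's polite pool."""
    sep = "&" if "?" in url else "?"  # append vs. start the query string
    return f"{url}{sep}mailto={urllib.parse.quote(email)}"

# polite("https://api.openalex.org/works?search=x")
# -> "https://api.openalex.org/works?search=x&mailto=you%40example.com"
```

Wrap each URL in `polite(...)` before passing it to `urlopen` and you get the better service tier for free.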
When NOT to Use OpenAlex
- You need full-text PDFs (metadata only)
- You need real-time data (1-2 day delay)
- You need patent data (use Google Patents instead)
Conclusion
OpenAlex gives you what Scopus charges $10,000/year for — free. The API is clean, fast, and requires zero setup.
Full code examples: github.com/spinov001-art
What research API do you use? Have you tried OpenAlex? Let me know in the comments — I am building a collection of free research data tools.