Google Scholar doesn't have an API. Semantic Scholar has rate limits. Web of Science costs $10K+/year.
But there's a free alternative that most researchers don't know about: OpenAlex.
It indexes 250M+ research works, 100K+ journals, and 200M+ authors — and the API needs no key, no signup, and no payment.
## What OpenAlex Gives You
- 250M+ research papers with abstracts
- Citation counts and references
- Author profiles with h-index
- Institution data
- Concept/topic classification
- Open access status
- Everything downloadable as JSON
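To get a feel for what one of those JSON records looks like, here's a small sketch that fetches a single work and pulls out a few of these fields (the helper names are mine; the field names follow OpenAlex's documented work schema):

```python
import requests

def fetch_work(openalex_id):
    """Fetch one work record by its OpenAlex ID, e.g. 'W2100837269'."""
    resp = requests.get(f"https://api.openalex.org/works/{openalex_id}")
    resp.raise_for_status()
    return resp.json()

def summarize_work(work):
    """Pull a few headline fields out of an OpenAlex work record."""
    return {
        "title": work.get("title"),
        "year": work.get("publication_year"),
        "citations": work.get("cited_by_count"),
        "is_oa": (work.get("open_access") or {}).get("is_oa"),
        "doi": work.get("doi"),
    }
```

Something like `summarize_work(fetch_work("W2100837269"))` gives you a flat dict that drops straight into a CSV or DataFrame.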
## Quick Start

```python
import requests

def search_papers(query, limit=5):
    """Search 250M+ papers. No API key needed."""
    url = "https://api.openalex.org/works"
    params = {
        "search": query,
        "per-page": limit,              # note: hyphenated in the OpenAlex API
        "sort": "cited_by_count:desc",  # most cited first
    }
    resp = requests.get(url, params=params)
    resp.raise_for_status()
    return resp.json()["results"]

# Search for papers about web scraping
papers = search_papers("web scraping machine learning")
for p in papers:
    print(f'{p["cited_by_count"]:5,d} citations | {p["title"]}')
    print(f'      {p["doi"]}')
    print()
```
Output:

```text
1,247 citations | A Survey on Web Scraping Technologies
      https://doi.org/10.1145/xxxxx
  892 citations | Machine Learning for Web Data Extraction
      https://doi.org/10.1007/xxxxx
```
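One catch worth knowing before you go further: OpenAlex returns abstracts as an `abstract_inverted_index` (word → list of positions) rather than plain text, so you rebuild the string yourself. A minimal sketch (the helper name is mine):

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from OpenAlex's abstract_inverted_index field."""
    if not inverted_index:
        return None
    positions = {}
    for word, idxs in inverted_index.items():
        for i in idxs:
            positions[i] = word  # place each word at its position(s)
    return " ".join(word for _, word in sorted(positions.items()))
```

Call it as `reconstruct_abstract(paper["abstract_inverted_index"])` on any result from the search above.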
## 5 Useful Queries

### Find papers by topic

```python
# Most cited papers about LLMs published in 2024 or 2025
params = {
    "search": "large language models",
    "filter": "publication_year:2024|2025",  # "|" means OR in OpenAlex filters
    "sort": "cited_by_count:desc",
    "per-page": 10,
}
```
### Find papers by author

```python
# Search for an author
url = "https://api.openalex.org/authors"
params = {"search": "Geoffrey Hinton"}
resp = requests.get(url, params=params)
author = resp.json()["results"][0]

print(f'Name: {author["display_name"]}')
print(f'h-index: {author["summary_stats"]["h_index"]}')
print(f'Total works: {author["works_count"]}')
print(f'Total citations: {author["cited_by_count"]}')
```
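The `author["id"]` returned above is a full URL (`https://openalex.org/A...`), and its tail can feed a `works` filter to list that author's papers. A sketch, with a helper name of my own, using OpenAlex's documented `author.id` filter:

```python
def author_works_params(author_id, limit=5):
    """Query params for an author's most-cited works.

    author_id may be a bare ID ('A1234') or the full URL OpenAlex returns.
    """
    bare_id = author_id.rsplit("/", 1)[-1]  # strip the URL prefix if present
    return {
        "filter": f"author.id:{bare_id}",
        "sort": "cited_by_count:desc",
        "per-page": limit,
    }
```

Usage: `requests.get("https://api.openalex.org/works", params=author_works_params(author["id"])).json()["results"]`.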
### Find open access papers only

```python
params = {
    "search": "web scraping",
    "filter": "is_oa:true",  # open access only
    "per-page": 5,
}
```
### Find citing papers

```python
# Get all papers that cite a specific paper
paper_id = "W2100837269"  # OpenAlex ID
url = f"https://api.openalex.org/works?filter=cites:{paper_id}"
resp = requests.get(url)
print(f'This paper is cited by {resp.json()["meta"]["count"]} papers')
```
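Going the other direction (the papers a work itself cites) needs no extra query: each work record already carries a `referenced_works` list of OpenAlex URLs. A small helper (the name is mine) to strip them down to bare IDs:

```python
def reference_ids(work):
    """Bare OpenAlex IDs ('W...') of every paper this work cites."""
    return [url.rsplit("/", 1)[-1] for url in work.get("referenced_works", [])]
```

Feed it any work record from the earlier queries, e.g. `reference_ids(papers[0])`.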
### Analyze trending topics

```python
# How many papers about LLMs per year?
for year in range(2020, 2027):
    params = {
        "search": "large language model",
        "filter": f"publication_year:{year}",
    }
    resp = requests.get("https://api.openalex.org/works", params=params)
    count = resp.json()["meta"]["count"]
    print(f'{year}: {count:,} papers about LLMs')
```
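The loop above makes one request per year; OpenAlex's `group_by=publication_year` parameter returns the same counts in a single call. A sketch (the parsing helper is mine; `group_by` is a documented OpenAlex parameter):

```python
def yearly_counts(response_json):
    """Turn an OpenAlex group_by response into {year: count}."""
    return {int(row["key"]): row["count"] for row in response_json["group_by"]}

# One request replacing the whole loop:
# resp = requests.get("https://api.openalex.org/works",
#                     params={"search": "large language model",
#                             "group_by": "publication_year"})
# print(yearly_counts(resp.json()))
```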
## OpenAlex vs Alternatives
| Feature | OpenAlex | Google Scholar | Semantic Scholar | Web of Science |
|---|---|---|---|---|
| API | ✅ Free | ❌ None | ✅ Free (limited) | ✅ Paid ($10K+) |
| Papers | 250M+ | Unknown | 200M+ | 100M+ |
| Rate limit | Generous | N/A | 100 req/5min | Varies |
| API key | Not needed | N/A | Recommended | Required |
| Open data | ✅ CC0 | ❌ | Partial | ❌ |
## Rate Limits

- No API key: ~10 requests/second
- With the `mailto` param: higher limits (just add your email)
- Bulk download: the full dataset is available as a snapshot

```python
# Add your email for better rate limits (recommended)
params["mailto"] = "your@email.com"
```
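For result sets deeper than one page, OpenAlex supports cursor pagination: pass `cursor=*` on the first request, then follow `meta.next_cursor` until it comes back empty. A sketch with the page-fetcher injected so the walking logic stays self-contained (the function names are mine):

```python
def paginate(get_page):
    """Yield every result across pages.

    get_page(cursor) must return (results, next_cursor);
    next_cursor is None/empty once the last page is reached.
    """
    cursor = "*"  # OpenAlex's starting cursor
    while cursor:
        results, cursor = get_page(cursor)
        yield from results

# A real fetcher would look roughly like:
# def get_page(cursor):
#     data = requests.get("https://api.openalex.org/works",
#                         params={"search": "web scraping",
#                                 "per-page": 200, "cursor": cursor}).json()
#     return data["results"], data["meta"]["next_cursor"]
```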
## What research APIs do you use?
If you work with academic data, I'd love to know what tools you use. Are you scraping Google Scholar? Using a paid service? Built your own pipeline?
Drop your workflow in the comments — especially if you've found a free alternative I don't know about.
I built tools for academic research: OpenAlex Research Tools
All free APIs: Awesome Free APIs 2026 — 300+ APIs, no key needed