Most researchers pay for Scopus ($10,000+/year) or Web of Science ($50,000+/year) to search academic literature. But OpenAlex — a free, open-source index of 250M+ scholarly works — gives you the same data through a simple REST API. No API key. No paywall. Generous rate limits, especially if you identify yourself as a polite user (more on that below).
I replaced a $500/month research data pipeline with 30 lines of Python using this API. Here's how.
What Is OpenAlex?
OpenAlex is a free, open catalog of the global research system. It indexes:
- 250M+ works (papers, books, datasets)
- 100M+ authors with citation metrics
- 100K+ institutions worldwide
- 50K+ journals and conferences
- Topics, concepts, and citation graphs
Think of it as a free alternative to Scopus, Web of Science, and Google Scholar — but with a proper API.
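Beyond keyword search, OpenAlex records resolve by external IDs too. For works, that means you can skip search entirely and address a paper directly by its DOI via `/works/doi:<DOI>`. A minimal sketch, using my own helper names (`doi_url`, `work_by_doi`, and the example DOI are mine, not from any SDK):

```python
import json
import urllib.request

def doi_url(doi):
    # Build the lookup URL for a single work, addressed by DOI.
    return f"https://api.openalex.org/works/doi:{doi}"

def work_by_doi(doi):
    # Fetch and decode one work's full metadata record (requires network).
    with urllib.request.urlopen(doi_url(doi)) as response:
        return json.loads(response.read())

# Example (network call, so commented out):
# paper = work_by_doi("10.7717/peerj.4375")
# print(paper["title"], paper["cited_by_count"])
```

Single-record lookups like this are handy when you already have a reading list of DOIs and just want citation counts or open-access status for each.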
Quick Start: Search Papers in 5 Lines
```python
import urllib.request
import json

url = "https://api.openalex.org/works?search=large+language+models&sort=cited_by_count:desc&per_page=5"
response = urllib.request.urlopen(url)
papers = json.loads(response.read())

for paper in papers["results"]:
    print(f"{paper['title']}")
    print(f"  Citations: {paper['cited_by_count']}")
    print(f"  Year: {paper['publication_year']}")
    print(f"  DOI: {paper.get('doi')}\n")
```
No pip install. No API key. Just stdlib Python.
5 Practical Use Cases
1. Find the Most-Cited Papers on Any Topic
```python
import urllib.parse

def top_papers(topic, limit=10):
    # quote_plus handles spaces and special characters in the search term
    query = urllib.parse.quote_plus(topic)
    url = f"https://api.openalex.org/works?search={query}&sort=cited_by_count:desc&per_page={limit}"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read())
    return [(p["title"], p["cited_by_count"], p["publication_year"]) for p in data["results"]]

for title, citations, year in top_papers("artificial intelligence"):
    print(f"[{year}] {title} — {citations} citations")
```
2. Track Research Trends Over Time
```python
def trend(topic, start_year=2020, end_year=2025):
    query = urllib.parse.quote_plus(topic)
    for year in range(start_year, end_year + 1):
        url = f"https://api.openalex.org/works?search={query}&filter=publication_year:{year}&per_page=1"
        response = urllib.request.urlopen(url)
        data = json.loads(response.read())
        # meta.count gives the total match count without fetching all results
        print(f"{year}: {data['meta']['count']:,} papers")

trend("large language models")
# 2020: 1,204 papers
# 2021: 2,891 papers
# 2022: 8,445 papers
# 2023: 28,102 papers  <- explosion
# 2024: 61,334 papers
```
This alone replaces expensive research analytics tools.
3. Map Author Networks
```python
def author_info(name):
    query = urllib.parse.quote_plus(name)
    url = f"https://api.openalex.org/authors?search={query}&per_page=1"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read())
    if data["results"]:
        a = data["results"][0]
        return {
            "name": a["display_name"],
            "works": a["works_count"],
            "citations": a["cited_by_count"],
            "h_index": a["summary_stats"]["h_index"],
        }

print(author_info("Yann LeCun"))
```
4. Competitive Intelligence for Startups
```python
def company_research(name):
    query = urllib.parse.quote_plus(name)
    url = f"https://api.openalex.org/institutions?search={query}&per_page=1"
    response = urllib.request.urlopen(url)
    data = json.loads(response.read())
    if data["results"]:
        inst = data["results"][0]
        inst_id = inst["id"].split("/")[-1]
        papers_url = (
            f"https://api.openalex.org/works"
            f"?filter=authorships.institutions.id:{inst_id},publication_year:2024-2025"
            f"&sort=cited_by_count:desc&per_page=5"
        )
        response = urllib.request.urlopen(papers_url)
        papers = json.loads(response.read())
        print(f"{inst['display_name']} — {inst['works_count']} total works")
        for p in papers["results"]:
            print(f"  {p['title']} ({p['cited_by_count']} citations)")

company_research("Google DeepMind")
```
5. Build a Research Dashboard
```python
def dashboard(topics):
    for topic in topics:
        query = urllib.parse.quote_plus(topic)
        url = f"https://api.openalex.org/works?search={query}&filter=publication_year:2025&per_page=1"
        response = urllib.request.urlopen(url)
        data = json.loads(response.read())
        print(f"{topic:30} | {data['meta']['count']:>8} papers in 2025")

dashboard(["large language models", "computer vision", "quantum computing", "web scraping"])
```
API Endpoints Cheat Sheet
| Endpoint | What it returns | Example |
|---|---|---|
| `/works` | Papers, books, datasets | `?search=topic&sort=cited_by_count:desc` |
| `/authors` | Researchers with h-index | `?search=name` |
| `/institutions` | Universities, companies | `?search=MIT` |
| `/topics` | Research topics with trends | `?search=machine+learning` |
| `/sources` | Journals, conferences | `?search=Nature` |
Rate Limits and Best Practices
- No API key needed for basic usage
- Add a `mailto=your@email.com` parameter to join the "polite pool", which gets faster, more consistent response times
- Rate limit: ~10 requests/second, up to ~100,000 calls/day
- All responses are JSON
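To make every call from the scripts above polite, a tiny helper (my own, not part of any SDK) can append the `mailto` parameter to any OpenAlex URL:

```python
import urllib.parse

MAILTO = "you@example.com"  # assumption: replace with your real address

def polite(url, email=MAILTO):
    """Append mailto=<email> so the request lands in OpenAlex's polite pool."""
    sep = "&" if "?" in url else "?"  # append vs. start the query string
    return f"{url}{sep}mailto={urllib.parse.quote(email)}"

# polite("https://api.openalex.org/works?search=x")
# -> "https://api.openalex.org/works?search=x&mailto=you%40example.com"
```

Wrap each URL in `polite(...)` before passing it to `urlopen` and you get the better service tier for free.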
When NOT to Use OpenAlex
- You need full-text PDFs (metadata only)
- You need real-time data (1-2 day delay)
- You need patent data (use Google Patents instead)
Conclusion
OpenAlex gives you what Scopus charges $10,000/year for — free. The API is clean, fast, and requires zero setup.
Full code examples: github.com/spinov001-art
What research API do you use? Have you tried OpenAlex? Let me know in the comments — I am building a collection of free research data tools.