Two weeks ago, I started a research project that required:
- Academic papers from multiple databases
- Patent data
- Clinical trial information
- Security checks on all downloaded files
Manually, this would take days. With 10 free APIs, I automated it in an afternoon.
Here's the stack I built.
The Research Pipeline
```
Query → OpenAlex (papers) → Crossref (metadata) → Unpaywall (free PDFs)
      → PubMed (medical) → ClinicalTrials.gov (trials) → Patents (USPTO)
      → Semantic Scholar (AI summaries) → Export → Analyze
```
Each step is one Python function. Total code: ~200 lines.
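Since every step hits a public API, a small retry wrapper around the calls pays for itself quickly. Here's a minimal sketch; the helper and its parameters are my own addition, not part of the ~200 lines above:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, wait with exponential backoff and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Usage: papers = with_retries(lambda: find_papers('CRISPR'))
```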
Step 1: Find Papers (OpenAlex)
```python
import requests

def find_papers(topic, limit=20):
    resp = requests.get('https://api.openalex.org/works', params={
        'search': topic, 'per-page': limit,  # OpenAlex uses 'per-page', not 'per_page'
        'sort': 'cited_by_count:desc'
    })
    return [{
        'title': w['title'],
        'doi': w.get('doi'),
        'citations': w['cited_by_count'],
        'year': w.get('publication_year')
    } for w in resp.json()['results']]

papers = find_papers('CRISPR gene editing therapy')
print(f"Found {len(papers)} papers, top cited: {papers[0]['citations']}")
```
Step 2: Enrich Metadata (Crossref)
```python
def get_metadata(doi):
    if not doi:
        return {}
    doi_id = doi.replace('https://doi.org/', '')
    resp = requests.get(f'https://api.crossref.org/works/{doi_id}')
    if resp.status_code != 200:
        return {}
    item = resp.json()['message']
    return {
        'publisher': item.get('publisher'),
        'journal': (item.get('container-title') or [''])[0],  # may be an empty list
        'references': item.get('reference-count', 0)  # Crossref field is 'reference-count'
    }
```
Step 3: Find Free PDFs (Unpaywall)
```python
def find_pdf(doi):
    if not doi:
        return None
    doi_id = doi.replace('https://doi.org/', '')
    resp = requests.get(f'https://api.unpaywall.org/v2/{doi_id}',
                        params={'email': 'research@example.com'})
    data = resp.json()
    if data.get('is_oa'):
        # best_oa_location can be null even for OA records
        return (data.get('best_oa_location') or {}).get('url_for_pdf')
    return None
```
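Both helpers above strip the `https://doi.org/` prefix by hand, since OpenAlex returns DOIs as full URLs. If you want that logic in one place, a small normalizer (my addition, not in the original stack) could handle the common variants:

```python
def normalize_doi(doi):
    """Reduce a DOI URL or prefixed string to the bare '10.x/...' identifier."""
    if not doi:
        return None
    for prefix in ('https://doi.org/', 'http://doi.org/', 'doi:'):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    return doi.strip()
```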
Step 4: Get AI Summaries (Semantic Scholar)
```python
def get_tldr(title):
    resp = requests.get('https://api.semanticscholar.org/graph/v1/paper/search',
                        params={'query': title, 'limit': 1, 'fields': 'tldr'})
    papers = resp.json().get('data', [])
    if papers and papers[0].get('tldr'):
        return papers[0]['tldr']['text']
    return 'No summary available'
```
Step 5: Check Related Trials (ClinicalTrials.gov)
```python
def find_trials(topic, limit=5):
    resp = requests.get('https://clinicaltrials.gov/api/v2/studies', params={
        'query.term': topic, 'pageSize': limit, 'format': 'json'
    })
    return [{
        'nct_id': s['protocolSection']['identificationModule']['nctId'],
        'title': s['protocolSection']['identificationModule']['briefTitle'],
        'status': s['protocolSection']['statusModule']['overallStatus']
    } for s in resp.json().get('studies', [])]
```
Step 6: Check Patents (USPTO)
```python
def find_patents(topic, limit=5):
    # NB: legacy PatentsView endpoint; newer deployments may need to migrate
    # to search.patentsview.org, which requires a free API key
    resp = requests.post('https://api.patentsview.org/patents/query', json={
        'q': {'_text_any': {'patent_abstract': topic}},
        'f': ['patent_number', 'patent_title', 'patent_date'],
        'o': {'per_page': limit},
        's': [{'patent_date': 'desc'}]
    })
    return resp.json().get('patents') or []
```
The Full Pipeline
```python
def research(topic):
    print(f"Researching: {topic}\n")

    # Papers
    papers = find_papers(topic, limit=10)
    print(f"📚 {len(papers)} papers found")

    # Enrich top 5 with metadata + PDFs
    for p in papers[:5]:
        meta = get_metadata(p['doi'])
        pdf = find_pdf(p['doi'])
        tldr = get_tldr(p['title'])
        print(f"  • {p['title'][:60]}")
        print(f"    Citations: {p['citations']} | Journal: {meta.get('journal') or 'N/A'}")
        print(f"    PDF: {'✅' if pdf else '❌'} | TLDR: {tldr[:80]}...")

    # Clinical trials
    trials = find_trials(topic)
    print(f"\n🏥 {len(trials)} clinical trials")
    for t in trials:
        print(f"  [{t['status']}] {t['title'][:60]}")

    # Patents
    patents = find_patents(topic)
    print(f"\n📜 {len(patents)} patents")
    for p in patents:
        print(f"  [{p['patent_date']}] {p['patent_title'][:60]}")

research('CRISPR gene editing therapy')
```
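The diagram at the top ends with Export → Analyze, which the pipeline above only prints. A minimal CSV export of the paper records might look like this; the field names follow `find_papers`, and the filename is arbitrary:

```python
import csv

def export_papers(papers, path='papers.csv'):
    """Write the paper dicts from find_papers() to a CSV file."""
    fields = ['title', 'doi', 'citations', 'year']
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(papers)

# Usage: export_papers(find_papers('CRISPR gene editing therapy'))
```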
Results
For one query, I got:
- 10 highly-cited papers with metadata
- 4 free PDFs (via Unpaywall)
- AI summaries for all papers
- 5 active clinical trials
- 5 related patents
All in under 30 seconds.
All Toolkits (Open Source)
I packaged each step into its own toolkit:
| # | Toolkit | What it does |
|---|---|---|
| 1 | OpenAlex | 250M+ academic works |
| 2 | Crossref | 150M+ article metadata |
| 3 | PubMed | 36M+ medical papers |
| 4 | Semantic Scholar | AI summaries |
| 5 | arXiv | 2.4M+ preprints |
| 6 | CORE | 300M+ open access |
| 7 | Unpaywall | Find free PDFs |
| 8 | ClinicalTrials.gov | 500K+ trials |
| 9 | USPTO Patents | 8M+ patents |
| 10 | Security Scanner | 5 security APIs |
Full collection: awesome-free-research-apis
What would you automate if you had all these APIs in one pipeline? I'm curious about creative use cases.