DEV Community

Alex Spinov

I Built a Tool to Track What Big Pharma Is Testing (Using Free Data)

Last month, a friend working in biotech asked me: "How can I quickly see what trials Pfizer is running right now?"

He was spending hours on ClinicalTrials.gov, clicking through pages manually.

I said: "Give me 20 minutes."

The Problem

ClinicalTrials.gov has 500,000+ trials. The website works, but if you want to:

  • Monitor a specific company's pipeline
  • Track a disease area over time
  • Export data for analysis

...you're stuck clicking around.

The Solution: 15 Lines of Python

import requests

def track_company(sponsor, status='RECRUITING'):
    resp = requests.get('https://clinicaltrials.gov/api/v2/studies', params={
        'query.spons': sponsor,          # sponsor/collaborator search field
        'filter.overallStatus': status,
        'pageSize': 20,                  # first page of results only
        'format': 'json'
    }, timeout=30)
    resp.raise_for_status()              # fail loudly on 4xx/5xx instead of parsing an error body
    trials = resp.json().get('studies', [])

    print(f"\n{sponsor} — {len(trials)} {status.lower()} trials:\n")
    for s in trials:
        p = s['protocolSection']
        title = p['identificationModule']['briefTitle']
        # Some studies omit phases or list none; fall back to N/A in both cases
        phase = ', '.join(p.get('designModule', {}).get('phases') or ['N/A'])
        print(f"  [{phase}] {title}")

    return trials

# Check what the big players are testing
for company in ['Pfizer', 'Moderna', 'Novartis', 'Roche']:
    track_company(company)
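track_company reads only the first page of results, so each sponsor's count can never exceed pageSize. The v2 API returns a nextPageToken with each page, and sending that value back as the pageToken parameter fetches the next page. Below is a minimal sketch of following those tokens. It assumes the standard library's urllib instead of requests so that it has no third-party dependency. fetch_page, iter_studies, and the max_pages safety cap are hypothetical names for this sketch and are not part of the API.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

API_URL = 'https://clinicaltrials.gov/api/v2/studies'

def fetch_page(params: dict) -> dict:
    """Fetch one page of results from the ClinicalTrials.gov v2 API."""
    url = f"{API_URL}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def iter_studies(params: dict,
                 fetch: Callable[[dict], dict] = fetch_page,
                 max_pages: int = 10) -> Iterator[dict]:
    """Yield studies across pages by following nextPageToken (hypothetical helper)."""
    params = dict(params)  # copy so the caller's dict is not modified
    for _ in range(max_pages):  # hard cap so a huge query can't run forever
        page = fetch(params)
        yield from page.get('studies', [])
        token = page.get('nextPageToken')
        if not token:  # no token means this was the last page
            break
        params['pageToken'] = token
```

For example, list(iter_studies({'query.spons': 'Pfizer', 'filter.overallStatus': 'RECRUITING', 'pageSize': 100})) would collect at most 1,000 studies (10 pages of 100). Because fetch is passed in as an argument, the loop can be tested offline with a fake page source.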

Output (trimmed; each count is capped at 20 by pageSize):

Pfizer — 20 recruiting trials:

  [PHASE3] Study of PF-07321332 in Non-Hospitalized Adults
  [PHASE2] Novel mRNA Cancer Vaccine + Pembrolizumab
  [PHASE1] Gene Therapy for Hemophilia B
  ...

Moderna — 20 recruiting trials:

  [PHASE3] mRNA-1283 COVID-19 Next-Gen Vaccine
  [PHASE2] Personalized Cancer Vaccine (mRNA-4157)
  ...
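For the "export data for analysis" case, most of the work is flattening each nested study record into a single row. Below is a minimal sketch using the standard csv module. The four columns are an arbitrary choice, and study_to_row and write_csv are hypothetical helper names. briefTitle and phases are the same fields the script above reads. nctId (in identificationModule) and overallStatus (in statusModule) also come from the v2 study record's protocolSection.

```python
import csv
from typing import Iterable, TextIO

COLUMNS = ['nct_id', 'title', 'status', 'phases']

def study_to_row(study: dict) -> dict:
    """Flatten one v2 study record into a CSV-friendly dict; missing fields become ''."""
    p = study.get('protocolSection', {})
    ident = p.get('identificationModule', {})
    return {
        'nct_id': ident.get('nctId', ''),
        'title': ident.get('briefTitle', ''),
        'status': p.get('statusModule', {}).get('overallStatus', ''),
        # Multiple phases (e.g. PHASE1 and PHASE2) share one cell, ';'-separated
        'phases': ';'.join(p.get('designModule', {}).get('phases') or []),
    }

def write_csv(studies: Iterable[dict], out: TextIO) -> int:
    """Write a header plus one row per study to out; return the row count."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    n = 0
    for s in studies:
        writer.writerow(study_to_row(s))
        n += 1
    return n
```

With the list that track_company returns, opening a file with open('pfizer.csv', 'w', newline='') and passing it to write_csv produces a sheet that imports cleanly into a spreadsheet. The csv module documentation recommends newline='' for files handed to a writer. Titles that contain commas are quoted automatically.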

Making It Useful: Daily Email Alert

import smtplib
from email.mime.text import MIMEText
from datetime import datetime, timedelta

import requests

def get_new_trials(query, days=1):
    # Earliest first-posted date to include, as YYYY-MM-DD (the range is inclusive)
    since = (datetime.now() - timedelta(days=days)).strftime('%Y-%m-%d')
    resp = requests.get('https://clinicaltrials.gov/api/v2/studies', params={
        'query.term': query,
        'filter.advanced': f'AREA[StudyFirstPostDate]RANGE[{since},MAX]',
        'pageSize': 50,
        'format': 'json'
    }, timeout=30)
    resp.raise_for_status()
    return resp.json().get('studies', [])

# Run daily via cron
new_trials = get_new_trials('artificial intelligence', days=1)
if new_trials:
    print(f"{len(new_trials)} new AI trials since yesterday!")
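The alert script above finds new trials but stops at a print. Below is a minimal sketch of the missing email step, plus de-duplication. With days=1, the inclusive date window covers both yesterday and today, so two consecutive daily runs can report the same trial. Everything here is an illustrative assumption: the JSON file of already-seen NCT IDs, the helper names, and the SMTP host, port, sender, and credentials are all hypothetical placeholders.

```python
import json
import smtplib
from email.mime.text import MIMEText
from pathlib import Path

def nct_id(study: dict) -> str:
    """NCT number of a v2 study record, used as a stable de-duplication key."""
    return study['protocolSection']['identificationModule']['nctId']

def load_seen(path: Path) -> set:
    """IDs already alerted on (hypothetical JSON store); empty on the first run."""
    return set(json.loads(path.read_text())) if path.exists() else set()

def save_seen(path: Path, seen: set) -> None:
    path.write_text(json.dumps(sorted(seen)))

def unseen(studies: list, seen: set) -> list:
    """Keep only studies whose NCT ID has not been reported yet."""
    return [s for s in studies if nct_id(s) not in seen]

def build_alert(studies: list, query: str, sender: str, to: str) -> MIMEText:
    """Plain-text email with one 'NCT ID: title' line per trial."""
    body = '\n'.join(
        f"{nct_id(s)}: {s['protocolSection']['identificationModule']['briefTitle']}"
        for s in studies)
    msg = MIMEText(body)
    msg['Subject'] = f"{len(studies)} new trial(s) for {query!r}"
    msg['From'] = sender
    msg['To'] = to
    return msg

def send_alert(msg: MIMEText, host: str, port: int,
               user: str, password: str) -> None:
    """Send over SMTP with STARTTLS; host and credentials are placeholders."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

In the cron job, order matters. Load the seen set, pass the result of get_new_trials through unseen, and call send_alert only if anything is left. Add the new IDs and call save_seen only after the send succeeds, so a failed send is retried on the next run. Port 587 is the usual STARTTLS submission port, but check your mail provider's settings.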

What My Friend Said

"I was paying $200/month for a clinical trial monitoring service. This does 80% of what it does."

That's $2,400/year saved with a Python script.

The Full Toolkit

I packaged this into a proper CLI tool:

python search_trials.py 'cancer immunotherapy' --status RECRUITING --format csv --output trials.csv
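The repo's source isn't reproduced in this post. The following is therefore only a hypothetical sketch of how a script could accept the same arguments as the command above using argparse. The flag names come from that command. The --format choices, the defaults, the 100-row page size, and mapping the positional query to query.term are all assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags in the example command (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description='Search ClinicalTrials.gov studies (sketch).')
    parser.add_argument('query', help="search terms, e.g. 'cancer immunotherapy'")
    parser.add_argument('--status', default='RECRUITING',
                        help='overall status filter, e.g. RECRUITING or COMPLETED')
    parser.add_argument('--format', choices=['json', 'csv'], default='json',
                        help='output format (assumed choices)')
    parser.add_argument('--output', help='file to write; stdout if omitted')
    return parser

def to_api_params(args: argparse.Namespace) -> dict:
    """Translate parsed CLI arguments into v2 API query parameters."""
    return {
        'query.term': args.query,
        'filter.overallStatus': args.status,
        'format': args.format,
        'pageSize': 100,
    }
```

Calling build_parser().parse_args() with no arguments reads sys.argv. to_api_params then produces the params dict for a request like the ones earlier in the post. If --format is passed straight through to the API's format parameter, csv output arrives as CSV text rather than JSON, and the caller would write it to --output unchanged.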

👉 GitHub repo

It's part of my Research API Suite — 9 free API toolkits for research automation.


What data would you track if you had easy access to 500K clinical trials? I'm curious about non-obvious use cases.


More tools: Apify scrapers | GitHub
