The Wayback Machine has been archiving the internet since 1996 and now stores over 800 billion web pages.

What most developers don't know: it has a free API. No key required. You can programmatically check how any URL looked on any date.
## Quick Start
```python
import requests

def wayback_check(url):
    """Check if a URL exists in the Wayback Machine."""
    api = f'https://archive.org/wayback/available?url={url}'
    resp = requests.get(api)
    data = resp.json()
    snapshot = data.get('archived_snapshots', {}).get('closest', {})
    if snapshot:
        return {
            'archived': True,
            'url': snapshot['url'],
            'timestamp': snapshot['timestamp'],
            'status': snapshot['status'],
        }
    return {'archived': False}

result = wayback_check('google.com')
print(f'Archived: {result["archived"]}')
if result['archived']:
    print(f'Snapshot URL: {result["url"]}')
```
## Use Cases

### 1. Competitive Intelligence

See how a competitor's pricing page changed over time:
```python
def get_snapshots(url, limit=10):
    """Get archived snapshots of a URL via the CDX API."""
    cdx_api = f'https://web.archive.org/cdx/search/cdx?url={url}&output=json&limit={limit}'
    resp = requests.get(cdx_api)
    rows = resp.json()
    if len(rows) < 2:  # first row is the header; nothing after it means no snapshots
        return []
    headers = rows[0]
    return [dict(zip(headers, row)) for row in rows[1:]]

# See every version of a competitor's pricing page
snapshots = get_snapshots('competitor.com/pricing', limit=20)
for s in snapshots:
    print(f'{s["timestamp"]} — {s["statuscode"]} — {s["length"]} bytes')
```
### 2. Check If a Domain Was Previously Used

Buying a domain? Check what it was used for before:

```python
snapshots = get_snapshots('domain-you-want-to-buy.com', limit=5)
for s in snapshots:
    archive_url = f'https://web.archive.org/web/{s["timestamp"]}/{s["original"]}'
    print(f'{s["timestamp"][:4]}: {archive_url}')
```
### 3. Recover Lost Content

Website went down? Blog post deleted? Check whether the Wayback Machine has it:

```python
result = wayback_check('deleted-blog.com/important-article')
if result['archived']:
    print(f'Found it! {result["url"]}')
    # Download the page
    page = requests.get(result['url'])
    with open('recovered.html', 'w', encoding='utf-8') as f:
        f.write(page.text)
```
### 4. SEO: Track How a Site Evolved

```python
# How many times was a URL archived per year?
from collections import Counter

snapshots = get_snapshots('example.com', limit=1000)
years = Counter(s['timestamp'][:4] for s in snapshots)
for year, count in sorted(years.items()):
    bar = '█' * (count // 5)
    print(f'{year}: {count:4d} snapshots {bar}')
```
## CDX API (The Power User API)

The CDX API is more powerful than the basic availability API:

```
https://web.archive.org/cdx/search/cdx?url=example.com&output=json
```
Parameters:

- `url`: URL to search
- `matchType`: exact, prefix, host, domain
- `from` / `to`: date range (YYYYMMDD)
- `limit`: max results
- `output`: json, text, csv
- `filter`: statuscode, mimetype, etc.
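As a sketch of how these parameters combine (the helper names and the pricing URL and date range in the comment are made up for illustration, not part of the API):

```python
import requests

def parse_cdx(rows):
    """Turn a CDX JSON response (header row + data rows) into dicts."""
    if len(rows) < 2:
        return []
    headers = rows[0]
    return [dict(zip(headers, row)) for row in rows[1:]]

def cdx_query(params):
    """Query the CDX API; 'params' uses the parameter names listed above."""
    resp = requests.get('https://web.archive.org/cdx/search/cdx',
                        params=dict(params, output='json'))
    resp.raise_for_status()
    return parse_cdx(resp.json())

# e.g. 200-status snapshots of a (hypothetical) pricing page, 2020-2023:
# cdx_query({'url': 'example.com/pricing', 'matchType': 'exact',
#            'from': '20200101', 'to': '20231231',
#            'filter': 'statuscode:200', 'limit': 10})
```

Passing the parameters as a dict lets you keep `from` (a Python keyword) as a plain string key, and `requests` handles the URL encoding for you.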
## Rate Limits

- No API key needed
- Be respectful: 1-2 requests/second
- Large queries may time out; use the `limit` parameter
- Full dataset available for bulk download
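One simple way to stay under that courtesy limit is to throttle your own calls. This small wrapper (the class name and default interval are my own, not part of any Wayback API) sleeps just enough that consecutive requests are at least a second apart:

```python
import time
import requests

class PoliteSession:
    """Wraps requests.get with a minimum delay between calls."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def get(self, url, **kwargs):
        # Sleep only if we're calling again too soon after the last request.
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()
        return requests.get(url, **kwargs)
```

It drops into the snippets above: create one `session = PoliteSession()` and call `session.get(api)` wherever they call `requests.get(api)`.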
## What do you use the Wayback Machine for?
I've used it for competitive analysis and recovering deleted blog posts. What's the most creative use you've seen? Share in the comments.
- More free APIs: Awesome Free APIs 2026 — 300+ APIs, no key needed
- Web scraping tools: Awesome Web Scraping 2026