DEV Community

Alex Spinov

The Wayback Machine Has a Free API — Check How Any Website Looked 10 Years Ago

A client once asked me to prove that a competitor had copied their landing page design. I needed to show what both sites looked like two years earlier.

The Internet Archive's Wayback Machine has a free API that lets you retrieve any website's snapshot history — programmatically.

The API

Base URL: https://archive.org/wayback/available

No API key. No auth. No rate limit docs (but be reasonable).

1. Check If a URL Has Been Archived

import requests

def check_archive(url):
    resp = requests.get(
        'https://archive.org/wayback/available',
        params={'url': url}
    ).json()

    snapshot = resp.get('archived_snapshots', {}).get('closest', {})
    if snapshot:
        return {
            'available': True,
            'url': snapshot['url'],
            'timestamp': snapshot['timestamp'],
            'status': snapshot['status']
        }
    return {'available': False}

# Check any website
result = check_archive('github.com')
print(result)
# {'available': True, 'url': 'http://web.archive.org/web/20260324.../github.com', ...}

2. Get a Specific Date's Snapshot

def get_snapshot(url, date='20200101'):
    """Get closest snapshot to a specific date. Format: YYYYMMDD."""
    resp = requests.get(
        'https://archive.org/wayback/available',
        params={'url': url, 'timestamp': date}
    ).json()

    snapshot = resp.get('archived_snapshots', {}).get('closest', {})
    return snapshot.get('url', 'Not found')

# What did Google look like on Jan 1, 2015?
print(get_snapshot('google.com', '20150101'))

3. Track Design Changes Over Time

def track_changes(url, years=range(2015, 2027)):
    """Get one snapshot per year to track evolution."""
    snapshots = []
    for year in years:
        resp = requests.get(
            'https://archive.org/wayback/available',
            params={'url': url, 'timestamp': f'{year}0601'}
        ).json()
        snap = resp.get('archived_snapshots', {}).get('closest', {})
        if snap:
            snapshots.append({
                'year': year,
                'url': snap['url'],
                'timestamp': snap['timestamp'][:8]
            })
    return snapshots

# Track how a company's site evolved
for s in track_changes('stripe.com'):
    print(f"  {s['year']}: {s['url'][:70]}...")

4. CDX API for Bulk Queries

For serious research, use the CDX Server API:

def get_all_snapshots(url, limit=20):
    """Get all archived snapshots of a URL."""
    resp = requests.get(
        'https://web.archive.org/cdx/search/cdx',
        params={
            'url': url,
            'output': 'json',
            'limit': limit,
            'fl': 'timestamp,statuscode,original'
        }
    )
    if not resp.text.strip():  # CDX returns an empty body (not []) when nothing matches
        return []

    rows = resp.json()
    if len(rows) <= 1:  # First row is the header
        return []

    headers = rows[0]
    return [dict(zip(headers, row)) for row in rows[1:]]

# How many times was python.org archived?
snaps = get_all_snapshots('python.org')
print(f"Found {len(snaps)} snapshots")
for s in snaps[:5]:
    print(f"  {s['timestamp'][:8]} — status {s['statuscode']}")
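The per-year loop from section 3 can also be collapsed into a single CDX request. The CDX server supports `filter` and `collapse` parameters; `collapse=timestamp:4` dedupes captures on the first four timestamp digits (the year), and `filter=statuscode:200` keeps only successful captures. Here's a sketch (`get_yearly_snapshots` and `parse_cdx_json` are my own helper names, not part of the API):

```python
import requests

def parse_cdx_json(rows):
    """Turn CDX JSON rows (header row first) into a list of dicts."""
    if len(rows) <= 1:
        return []
    header = rows[0]
    return [dict(zip(header, r)) for r in rows[1:]]

def get_yearly_snapshots(url, from_year=2015, to_year=2025):
    """One successful capture per year, in a single CDX request."""
    resp = requests.get(
        'https://web.archive.org/cdx/search/cdx',
        params={
            'url': url,
            'output': 'json',
            'fl': 'timestamp,original',
            'filter': 'statuscode:200',  # only HTTP 200 captures
            'collapse': 'timestamp:4',   # first 4 digits = year -> one per year
            'from': str(from_year),
            'to': str(to_year),
        }
    )
    if not resp.text.strip():  # empty body means no matches
        return []
    return parse_cdx_json(resp.json())
```

One request instead of a dozen, which is also kinder to the Archive's servers.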

5. Monitor Competitor Changes

def compare_snapshots(url, date1, date2):
    """Get two snapshots for comparison."""
    snap1 = get_snapshot(url, date1)
    snap2 = get_snapshot(url, date2)

    print(f"Before ({date1}): {snap1}")
    print(f"After  ({date2}): {snap2}")
    print("\nOpen both in browser to compare visually.")

compare_snapshots('airbnb.com', '20200101', '20250101')
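If eyeballing two tabs isn't enough, you can get a rough automated signal too. A sketch: fetch both snapshots' HTML and compare the visible text with `difflib`. Inserting `id_` after the timestamp in a Wayback URL asks for the raw capture, without the Archive's injected toolbar markup (the tag-stripping here is deliberately crude, not a real HTML parser):

```python
import difflib
import re
import requests

def strip_tags(html):
    """Crude tag stripper -- good enough for a rough textual comparison."""
    return re.sub(r'\s+', ' ', re.sub(r'<[^>]+>', ' ', html)).strip()

def raw_capture_url(wayback_url):
    """Append the 'id_' modifier to get the capture without Wayback's toolbar."""
    return re.sub(r'(/web/\d+)/', r'\1id_/', wayback_url, count=1)

def similarity(snap_url1, snap_url2):
    """Rough similarity ratio (0..1) between two archived pages' text."""
    texts = [
        strip_tags(requests.get(raw_capture_url(u)).text)
        for u in (snap_url1, snap_url2)
    ]
    return difflib.SequenceMatcher(None, texts[0], texts[1]).ratio()
```

A ratio near 1.0 means the page barely changed; a big drop usually means a redesign or rewrite worth looking at by hand.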

Use Cases

  1. Legal: Prove what a website showed on a specific date
  2. Competitive analysis: Track competitor redesigns and messaging changes
  3. Due diligence: Verify claims about company history
  4. Research: Study how the web evolves over time
  5. SEO: Check if a domain previously hosted different content
  6. Journalism: Find deleted pages or changed statements

Tips

  • Add 1-second delays between requests — be nice to the Archive
  • Not every page is archived — popular sites have more snapshots
  • Some sites block archiving via robots.txt
  • Snapshots may not include all images/CSS (pages may look broken)
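That first tip is easy to enforce in code rather than by discipline. A minimal sketch (`RateLimiter` and `polite_get` are hypothetical helpers of mine, not anything the Archive provides):

```python
import time
import requests

class RateLimiter:
    """Enforce a minimum interval between successive calls."""
    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        delay = self.min_interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()

limiter = RateLimiter(min_interval=1.0)

def polite_get(url, **kwargs):
    """requests.get that waits at least one second between requests."""
    limiter.wait()
    return requests.get(url, timeout=30, **kwargs)
```

Swap `polite_get` in for `requests.get` in any of the snippets above and the delay handling is done once, in one place.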

I explore free APIs that most developers miss. More: GitHub | Writing inquiries: Spinov001@gmail.com

More free tools: 77 Web Scraping Tools & APIs

What's the oldest or most interesting website snapshot you've found on the Wayback Machine? I love finding how big sites looked in their early days. 👇
