A client once asked me to prove that a competitor copied their landing page design. I needed to show what both sites looked like 2 years ago.
The Internet Archive's Wayback Machine has a free API that lets you retrieve any website's snapshot history — programmatically.
## The API

Base URL: `https://archive.org/wayback/available`

No API key. No auth. No documented rate limits (but be reasonable).
## 1. Check If a URL Has Been Archived

```python
import requests

def check_archive(url):
    resp = requests.get(
        'https://archive.org/wayback/available',
        params={'url': url}
    ).json()
    snapshot = resp.get('archived_snapshots', {}).get('closest', {})
    if snapshot:
        return {
            'available': True,
            'url': snapshot['url'],
            'timestamp': snapshot['timestamp'],
            'status': snapshot['status']
        }
    return {'available': False}

# Check any website
result = check_archive('github.com')
print(result)
# {'available': True, 'url': 'http://web.archive.org/web/20260324.../github.com', ...}
```
## 2. Get a Specific Date's Snapshot

```python
def get_snapshot(url, date='20200101'):
    """Get the closest snapshot to a specific date. Format: YYYYMMDD."""
    resp = requests.get(
        'https://archive.org/wayback/available',
        params={'url': url, 'timestamp': date}
    ).json()
    snapshot = resp.get('archived_snapshots', {}).get('closest', {})
    return snapshot.get('url', 'Not found')

# What did Google look like on Jan 1, 2015?
print(get_snapshot('google.com', '20150101'))
```
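Note that while you query with an 8-digit date, the API returns full 14-digit timestamps (YYYYMMDDhhmmss). Here is a small sketch for converting between them with the standard library — the helper names are mine, not part of the API:

```python
from datetime import datetime

def parse_wayback_ts(ts):
    """Parse a Wayback timestamp prefix (YYYY, YYYYMMDD, or the full
    14-digit YYYYMMDDhhmmss) into a datetime."""
    formats = {4: '%Y', 8: '%Y%m%d', 14: '%Y%m%d%H%M%S'}
    return datetime.strptime(ts, formats[len(ts)])

def to_wayback_date(dt):
    """Format a datetime as the YYYYMMDD string get_snapshot() expects."""
    return dt.strftime('%Y%m%d')

print(parse_wayback_ts('20150101120030'))   # 2015-01-01 12:00:30
print(to_wayback_date(datetime(2020, 6, 1)))  # 20200601
```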
## 3. Track Design Changes Over Time

```python
def track_changes(url, years=range(2015, 2027)):
    """Get one snapshot per year to track a site's evolution."""
    snapshots = []
    for year in years:
        resp = requests.get(
            'https://archive.org/wayback/available',
            params={'url': url, 'timestamp': f'{year}0601'}
        ).json()
        snap = resp.get('archived_snapshots', {}).get('closest', {})
        if snap:
            snapshots.append({
                'year': year,
                'url': snap['url'],
                'timestamp': snap['timestamp'][:8]
            })
    return snapshots

# Track how a company's site evolved
for s in track_changes('stripe.com'):
    print(f"  {s['year']}: {s['url'][:70]}...")
```
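If you're putting the results into a report, the list that `track_changes` returns is easy to render. A small helper of my own (not part of any API) that turns it into a Markdown table:

```python
def to_markdown_table(snapshots):
    """Render track_changes() output as a Markdown table of yearly links."""
    lines = ['| Year | Snapshot |', '|---|---|']
    lines += [f"| {s['year']} | {s['url']} |" for s in snapshots]
    return '\n'.join(lines)

# Toy input in the same shape track_changes() produces
sample = [
    {'year': 2015, 'url': 'http://web.archive.org/web/20150601/http://stripe.com/',
     'timestamp': '20150601'},
]
print(to_markdown_table(sample))
```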
## 4. CDX API for Bulk Queries

For serious research, use the CDX Server API:

```python
def get_all_snapshots(url, limit=20):
    """Get archived snapshots of a URL (up to limit)."""
    resp = requests.get(
        'https://web.archive.org/cdx/search/cdx',
        params={
            'url': url,
            'output': 'json',
            'limit': limit,
            'fl': 'timestamp,statuscode,original'
        }
    ).json()
    if len(resp) <= 1:  # First row is the header
        return []
    headers = resp[0]
    return [dict(zip(headers, row)) for row in resp[1:]]

# Sample python.org's archive history
snaps = get_all_snapshots('python.org')
print(f"Found {len(snaps)} snapshots")
for s in snaps[:5]:
    print(f"  {s['timestamp'][:8]} — status {s['statuscode']}")
```
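The CDX JSON output is simple to post-process. For example, counting snapshots per year straight from the raw rows (header first, then data) — this helper is my own, not part of the API:

```python
from collections import Counter

def snapshots_per_year(cdx_rows):
    """Count snapshots per year from a CDX JSON response.
    The first row is the header; 'timestamp' values are 14-digit strings."""
    if len(cdx_rows) <= 1:
        return {}
    headers = cdx_rows[0]
    ts_idx = headers.index('timestamp')
    return dict(Counter(row[ts_idx][:4] for row in cdx_rows[1:]))

# Toy response in the CDX JSON shape
sample = [
    ['timestamp', 'statuscode', 'original'],
    ['20150101000000', '200', 'http://python.org/'],
    ['20150601000000', '200', 'http://python.org/'],
    ['20160101000000', '200', 'http://python.org/'],
]
print(snapshots_per_year(sample))  # {'2015': 2, '2016': 1}
```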
## 5. Monitor Competitor Changes

```python
def compare_snapshots(url, date1, date2):
    """Get two snapshots for comparison."""
    snap1 = get_snapshot(url, date1)
    snap2 = get_snapshot(url, date2)
    print(f"Before ({date1}): {snap1}")
    print(f"After ({date2}): {snap2}")
    print("\nOpen both in browser to compare visually.")

compare_snapshots('airbnb.com', '20200101', '20250101')
```
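Eyeballing both in a browser works, but for a rough automated signal you could download each snapshot's HTML (e.g. with `requests.get(snapshot_url).text`) and compare the two strings. A minimal sketch using `difflib` from the standard library:

```python
from difflib import SequenceMatcher

def html_similarity(html1, html2):
    """Rough 0..1 similarity ratio between two HTML strings.
    Identical pages score 1.0; a full redesign scores much lower."""
    return SequenceMatcher(None, html1, html2).ratio()

# Toy strings; in practice pass each snapshot's downloaded HTML.
before = '<h1>Welcome</h1><p>Book a home</p>'
after = '<h1>Welcome</h1><p>Book an experience</p>'
print(f"{html_similarity(before, after):.2f}")
```

This is a crude character-level measure — boilerplate and injected archive toolbars inflate it — but it's enough to flag which dates are worth a visual diff.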
## Use Cases
- Legal: Prove what a website showed on a specific date
- Competitive analysis: Track competitor redesigns and messaging changes
- Due diligence: Verify claims about company history
- Research: Study how the web evolves over time
- SEO: Check if a domain previously hosted different content
- Journalism: Find deleted pages or changed statements
## Tips
- Add 1-second delays between requests — be nice to the Archive
- Not every page is archived — popular sites have more snapshots
- Some sites block archiving via robots.txt
- Snapshots may not include all images/CSS (pages may look broken)
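The first tip is easy to bake into your code. A sketch of a throttled wrapper around `requests.get` — the User-Agent string is a placeholder, identify your own script:

```python
import time
import requests

_last_call = [0.0]

def throttle(min_interval=1.0):
    """Sleep just enough to keep at least min_interval seconds between calls."""
    wait = min_interval - (time.monotonic() - _last_call[0])
    if wait > 0:
        time.sleep(wait)
    _last_call[0] = time.monotonic()

def polite_get(url, params=None):
    """requests.get with a 1-second throttle and an identifying User-Agent."""
    throttle(1.0)
    return requests.get(
        url, params=params,
        headers={'User-Agent': 'wayback-research-script'},  # placeholder
        timeout=30,
    )
```

Swap `polite_get` in wherever the snippets above call `requests.get` and you won't hammer the Archive inside a loop.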
I explore free APIs that most developers miss. More: GitHub | Writing inquiries: Spinov001@gmail.com
More free tools: 77 Web Scraping Tools & APIs
What's the oldest or most interesting website snapshot you've found on the Wayback Machine? I love finding how big sites looked in their early days. 👇