The Wayback Machine has been archiving the internet since 1996 and now stores over 800 billion web pages.

What most developers don't know: it has a free API. No key required. You can programmatically check how any URL looked on any date.
## Quick Start
```python
import requests

def wayback_check(url):
    """Check if a URL exists in the Wayback Machine."""
    api = f'https://archive.org/wayback/available?url={url}'
    resp = requests.get(api)
    data = resp.json()
    snapshot = data.get('archived_snapshots', {}).get('closest', {})
    if snapshot:
        return {
            'archived': True,
            'url': snapshot['url'],
            'timestamp': snapshot['timestamp'],
            'status': snapshot['status'],
        }
    return {'archived': False}

result = wayback_check('google.com')
print(f'Archived: {result["archived"]}')
if result['archived']:
    print(f'Snapshot URL: {result["url"]}')
```
## Use Cases

### 1. Competitive Intelligence

See how a competitor's pricing page changed over time:
```python
def get_snapshots(url, limit=10):
    """Get archived snapshots of a URL via the CDX API."""
    cdx_api = f'https://web.archive.org/cdx/search/cdx?url={url}&output=json&limit={limit}'
    resp = requests.get(cdx_api)
    rows = resp.json()
    if len(rows) < 2:  # first row is the header; nothing after it means no snapshots
        return []
    headers = rows[0]
    return [dict(zip(headers, row)) for row in rows[1:]]

# See every version of a competitor's pricing page
snapshots = get_snapshots('competitor.com/pricing', limit=20)
for s in snapshots:
    print(f'{s["timestamp"]} — {s["statuscode"]} — {s["length"]} bytes')
```
### 2. Check If a Domain Was Previously Used

Buying a domain? Check what it was used for before:

```python
snapshots = get_snapshots('domain-you-want-to-buy.com', limit=5)
for s in snapshots:
    archive_url = f'https://web.archive.org/web/{s["timestamp"]}/{s["original"]}'
    print(f'{s["timestamp"][:4]}: {archive_url}')
```
### 3. Recover Lost Content

Website went down? Blog post deleted? Check whether the Wayback Machine has it:

```python
result = wayback_check('deleted-blog.com/important-article')
if result['archived']:
    print(f'Found it! {result["url"]}')
    # Download the page
    page = requests.get(result['url'])
    with open('recovered.html', 'w', encoding='utf-8') as f:
        f.write(page.text)
```
### 4. SEO: Track How a Site Evolved

```python
# How many times was a URL archived per year?
from collections import Counter

snapshots = get_snapshots('example.com', limit=1000)
years = Counter(s['timestamp'][:4] for s in snapshots)
for year, count in sorted(years.items()):
    bar = '█' * (count // 5)
    print(f'{year}: {count:4d} snapshots {bar}')
```
## CDX API (The Power User API)

The CDX API is more powerful than the basic availability API:

```
https://web.archive.org/cdx/search/cdx?url=example.com&output=json
```
Parameters:

- `url`: URL to search
- `matchType`: exact, prefix, host, domain
- `from` / `to`: date range (YYYYMMDD)
- `limit`: max results
- `output`: json, text, csv
- `filter`: statuscode, mimetype, etc.
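As a sketch of how these parameters combine (the helper names and the pricing URL and date range in the comment are made up for illustration, not part of the API):

```python
import requests

def parse_cdx(rows):
    """Turn a CDX JSON response (header row + data rows) into dicts."""
    if len(rows) < 2:
        return []
    headers = rows[0]
    return [dict(zip(headers, row)) for row in rows[1:]]

def cdx_query(params):
    """Query the CDX API; 'params' uses the parameter names listed above."""
    resp = requests.get('https://web.archive.org/cdx/search/cdx',
                        params=dict(params, output='json'))
    resp.raise_for_status()
    return parse_cdx(resp.json())

# e.g. 200-status snapshots of a (hypothetical) pricing page, 2020-2023:
# cdx_query({'url': 'example.com/pricing', 'matchType': 'exact',
#            'from': '20200101', 'to': '20231231',
#            'filter': 'statuscode:200', 'limit': 10})
```

Passing the parameters as a dict lets you keep `from` (a Python keyword) as a plain string key, and `requests` handles the URL encoding for you.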
## Rate Limits

- No API key needed
- Be respectful: 1-2 requests/second
- Large queries may time out; use the `limit` parameter
- Full dataset available for bulk download
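One simple way to stay under that courtesy limit is to throttle your own calls. This small wrapper (the class name and default interval are my own, not part of any Wayback API) sleeps just enough that consecutive requests are at least a second apart:

```python
import time
import requests

class PoliteSession:
    """Wraps requests.get with a minimum delay between calls."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous request

    def get(self, url, **kwargs):
        # Sleep only if we're calling again too soon after the last request.
        wait = self.min_interval - (time.monotonic() - self._last)
        if wait > 0:
            time.sleep(wait)
        self._last = time.monotonic()
        return requests.get(url, **kwargs)
```

It drops into the snippets above: create one `session = PoliteSession()` and call `session.get(api)` wherever they call `requests.get(api)`.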
## What do you use the Wayback Machine for?
I've used it for competitive analysis and recovering deleted blog posts. What's the most creative use you've seen? Share in the comments.
- More free APIs: Awesome Free APIs 2026 — 300+ APIs, no key needed
- Web scraping tools: Awesome Web Scraping 2026