Last month, I discovered something strange while building a security tool.
VirusTotal lets you scan files for malware — for free. They analyze each file against 70+ antivirus engines. The compute cost per scan must be enormous. Yet their API is free for up to 500 requests per day.
Why would a company give that away?
I went down a rabbit hole and found a pattern that explains why some of the most valuable data on the internet is available through free APIs.
## The Pattern: Free Data, Paid Intelligence
Here are real examples:
| Company | Free API | What They Sell |
|---|---|---|
| VirusTotal (Google) | 500 scans/day | Enterprise threat intelligence |
| Shodan | 1 query/sec | Bulk data + monitoring |
| Have I Been Pwned | Email breach checks | Domain-wide monitoring |
| GitHub | Full repo/user data | Copilot, Actions, Enterprise |
| Reddit | Public JSON endpoints | Advertising platform |
The free API is not the product. You are the product — or rather, your data enriches their product.
Every time you scan a file on VirusTotal, you're training their malware detection. Every Shodan query helps them map the internet. Every HIBP check validates their breach database.
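To make the free tier concrete, here's a minimal sketch against VirusTotal's v3 API. The endpoint and `x-apikey` header follow VirusTotal's public v3 documentation; `detection_ratio` is a hypothetical helper I've added for summarizing the analysis `stats` dict, not part of their API:

```python
import requests

VT_URL = "https://www.virustotal.com/api/v3"

def upload_file(path, api_key):
    # Upload a file for analysis; returns an analysis id you poll
    # later at /analyses/{id}. Free tier: ~500 requests/day.
    with open(path, "rb") as f:
        resp = requests.post(
            f"{VT_URL}/files",
            headers={"x-apikey": api_key},
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()["data"]["id"]

def detection_ratio(stats):
    # 'stats' is the analysis stats dict, with counts keyed by verdict
    # (malicious, suspicious, undetected, harmless, ...).
    total = sum(stats.values())
    return stats.get("malicious", 0), total
```

Every upload like this is also a data point for them, which is exactly the trade described above.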
## What This Means for Developers
This is actually great news. These companies are incentivized to keep their APIs free and reliable because:
- More users = better data — network effects work in your favor
- API stability matters — breaking changes lose them data contributors
- Documentation stays good — they want you to integrate, not give up
## The APIs I Use Most (and Why)

### For Security Projects
```python
import requests

# Check whether an email has appeared in a known data breach.
# Note: HIBP's v3 API requires a key for this endpoint,
# sent as the "hibp-api-key" header.
def check_breach(email, api_key):
    resp = requests.get(
        f"https://haveibeenpwned.com/api/v3/breachedaccount/{email}",
        headers={"User-Agent": "SecurityAudit/1.0", "hibp-api-key": api_key},
    )
    return resp.json() if resp.status_code == 200 else []
```
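HIBP, like many free tiers, enforces its rate limit with HTTP 429 responses that carry a `Retry-After` header. A small wrapper that honors it keeps you on the right side of the limit; this is my own generic sketch, not an HIBP-specific client:

```python
import time
import requests

def get_with_retry(url, headers=None, attempts=3):
    # Retry on HTTP 429, sleeping for the server-suggested delay
    # from Retry-After (fall back to 2 seconds if it's absent).
    for _ in range(attempts):
        resp = requests.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        time.sleep(int(resp.headers.get("Retry-After", 2)))
    return resp
```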
### For Data Projects
```python
import requests

# Get Reddit posts without an API key via the public JSON endpoints
def get_reddit_posts(subreddit, sort="hot", limit=100):
    url = f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"
    resp = requests.get(url, headers={"User-Agent": "DataResearch/1.0"})
    return [p["data"] for p in resp.json()["data"]["children"]]
```
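One listing page caps out at 100 posts; Reddit's listing JSON paginates via an `after` cursor in the payload. A hedged sketch of walking a few pages — `parse_listing` and `get_all_posts` are my own helper names, but the `data.children` / `data.after` fields match the listing format used above:

```python
import requests

def parse_listing(listing):
    # A listing wraps posts in data.children and exposes the
    # next-page cursor as data.after (None on the last page).
    data = listing["data"]
    return [c["data"] for c in data["children"]], data.get("after")

def get_all_posts(subreddit, pages=3, limit=100):
    # Walk several pages of /hot by threading the cursor through.
    posts, after = [], None
    for _ in range(pages):
        params = {"limit": limit, **({"after": after} if after else {})}
        resp = requests.get(
            f"https://www.reddit.com/r/{subreddit}/hot.json",
            params=params,
            headers={"User-Agent": "DataResearch/1.0"},
        )
        page, after = parse_listing(resp.json())
        posts.extend(page)
        if not after:
            break
    return posts
```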
### For Competitive Intelligence
```python
import requests

# Search npm packages for market research
def search_npm(keyword):
    resp = requests.get(
        f"https://registry.npmjs.org/-/v1/search?text={keyword}&size=20"
    )
    packages = resp.json()["objects"]
    # score.detail.popularity is a 0-1 sub-score, not a download count
    return [{"name": p["package"]["name"],
             "popularity": p["score"]["detail"]["popularity"]}
            for p in packages]
```
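For market research you usually want the raw results ranked. A small helper over the registry's raw `objects` array (the `score.detail.popularity` path matches the response shape used above; `rank_packages` is my own name):

```python
def rank_packages(objects, n=5):
    # Sort raw npm search "objects" by their popularity sub-score,
    # a 0-1 value under score.detail.
    key = lambda o: o["score"]["detail"]["popularity"]
    return [o["package"]["name"]
            for o in sorted(objects, key=key, reverse=True)[:n]]
```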
## The Uncomfortable Truth
Free APIs are a business model, not charity. Understanding this helps you:
- Predict which APIs will stay free — if your usage enriches their core product, it's safe
- Predict which will get locked down — if they're losing money on every request, expect changes (looking at you, Twitter)
- Build more resilient products — always have a fallback for APIs that might go paid
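That last point is worth making concrete. A tiny, source-agnostic fallback wrapper (my own sketch, not tied to any particular API) keeps a product alive when a free tier disappears:

```python
def with_fallback(primary, fallback, default=None):
    # Try the live API first; on any failure, use the fallback
    # (e.g. a cached copy or a second provider), then a safe default.
    for source in (primary, fallback):
        try:
            return source()
        except Exception:
            continue
    return default
```

Wiring every external API call through something like this means a pricing change is a degraded day, not an outage.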
Reddit's recent API pricing change? They realized the data was worth more than the goodwill. Twitter's API lockdown? Same story.
The APIs that stay free are the ones where free users create more value than they consume.
## What's Your Experience?
Have you been burned by an API going paid? Found a surprisingly generous free API? I'd love to hear about it.
I maintain a list of 300+ free APIs — always looking for ones I've missed.
More API deep-dives on my GitHub and Dev.to profile.