Last year, a malicious package on PyPI stole AWS credentials from thousands of developers. The package name was one typo away from a popular library.
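I wanted to check whether my own projects were at risk. It turns out you can build a surprisingly effective supply chain scanner using three free APIs. PyPI needs no authentication, GitHub allows a limited number of anonymous requests, and Libraries.io asks for a free API key.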
The Three Free APIs
- PyPI JSON API — package metadata, versions, maintainers
- GitHub API — repo health, contributor count, last commit
- Libraries.io API — dependency trees, SourceRank scores
Step 1: Check Package Health via PyPI
import requests
from datetime import datetime

def check_pypi_health(package_name):
    """Score a package using only its PyPI metadata."""
    resp = requests.get(f"https://pypi.org/pypi/{package_name}/json", timeout=10)
    if resp.status_code != 200:
        # Same keys as the normal return, so callers can sort on risk_level
        return {"package": package_name, "risk_level": "HIGH", "signals": ["Not found"]}

    data = resp.json()
    info = data["info"]
    releases = data["releases"]
    risks = []

    if not info.get("home_page") and not info.get("project_urls"):
        risks.append("No homepage or repository link")
    if len(releases) < 3:
        risks.append(f"Only {len(releases)} releases")
    if len(info.get("summary") or "") < 10:
        risks.append("Missing description")
    # Newer packages often fill in author_email and leave author empty
    author = info.get("author") or info.get("author_email")
    if not author or author == "UNKNOWN":
        risks.append("No author information")

    risk_level = "LOW" if len(risks) == 0 else "MEDIUM" if len(risks) <= 2 else "HIGH"
    return {"package": package_name, "risk_level": risk_level, "signals": risks}
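The checks above look only at static metadata. Package age is another signal that could be derived from the same response, since a typosquat is usually newly published. The sketch below is an addition, not part of the original scanner: `days_since_first_upload` is a hypothetical helper, and it assumes each file entry under `releases` carries an `upload_time` string in `YYYY-MM-DDTHH:MM:SS` form (UTC).

```python
from datetime import datetime, timezone

def days_since_first_upload(releases, now=None):
    """Return the age in days of the oldest uploaded file, or None if there are no files.

    `releases` is the dict under data["releases"]: version string -> list of file entries.
    """
    if now is None:
        # Naive UTC timestamp, to match the naive timestamps parsed below
        now = datetime.now(timezone.utc).replace(tzinfo=None)
    uploads = [
        datetime.strptime(entry["upload_time"], "%Y-%m-%dT%H:%M:%S")
        for files in releases.values()
        for entry in files
        if entry.get("upload_time")
    ]
    if not uploads:
        return None
    return (now - min(uploads)).days
```

A result of only a few days could be appended to `risks` as one more signal. The cutoff is a judgment call, because legitimate new packages exist too.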
Step 2: Cross-Reference with GitHub
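from datetime import datetime, timezone

def check_github_health(owner, repo):
    """Check basic activity signals for a GitHub repository."""
    resp = requests.get(f"https://api.github.com/repos/{owner}/{repo}", timeout=10)
    if resp.status_code != 200:
        # 404 means missing or private; 403 often means the anonymous rate limit was hit
        return {"stars": None, "days_since_push": None,
                "risks": [f"Repo lookup failed (HTTP {resp.status_code})"]}

    data = resp.json()
    risks = []

    if data["stargazers_count"] < 10:
        risks.append(f"Only {data['stargazers_count']} stars")
    if data.get("archived"):
        risks.append("Repository is archived")

    # pushed_at is in UTC, so compare against the current UTC time
    pushed = datetime.strptime(data["pushed_at"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    days = (datetime.now(timezone.utc) - pushed).days
    if days > 365:
        risks.append(f"No commits in {days} days")

    return {"stars": data["stargazers_count"], "days_since_push": days, "risks": risks}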
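`check_github_health` needs an owner and a repo name, which the PyPI response does not hand over directly. One way to bridge the two steps is to look for a GitHub link in the `info` block. This is a minimal sketch: `find_github_repo` is a hypothetical helper, and it assumes the link appears in `project_urls` or `home_page`.

```python
import re

# Matches https://github.com/<owner>/<repo>, ignoring anything after the repo name
GITHUB_URL = re.compile(r"https?://(?:www\.)?github\.com/([^/\s]+)/([^/\s#?]+)")

def find_github_repo(info):
    """Return (owner, repo) from PyPI `info` metadata, or None if no GitHub link is listed."""
    # project_urls can be null in the JSON response, so fall back to an empty dict
    candidates = list((info.get("project_urls") or {}).values())
    candidates.append(info.get("home_page") or "")
    for url in candidates:
        match = GITHUB_URL.match(url or "")
        if match:
            owner, repo = match.groups()
            if repo.endswith(".git"):
                repo = repo[:-4]
            return owner, repo
    return None
```

Keep in mind that these links are declared by whoever published the package. A malicious upload can point at a reputable repository, so a healthy repo does not vouch for the package by itself.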
Step 3: Scan Your requirements.txt
import re
import time

def scan_requirements(filepath="requirements.txt"):
    with open(filepath) as f:
        packages = [
            # Keep only the name: drop version specifiers, extras, markers, inline comments
            re.split(r"[=<>!~;\[\s]", line.strip(), maxsplit=1)[0]
            for line in f
            # Skip blank lines, comments, and pip options such as -r or -e
            if line.strip() and not line.strip().startswith(("#", "-"))
        ]

    results = []
    for pkg in packages:
        results.append(check_pypi_health(pkg))
        time.sleep(0.5)  # Be nice to PyPI

    risk_order = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}
    results.sort(key=lambda r: risk_order.get(r["risk_level"], 3))

    for r in results:
        icon = {"HIGH": "!!!", "MEDIUM": "[!]", "LOW": "[ok]"}[r["risk_level"]]
        signals = ", ".join(r.get("signals", []))
        print(f" {icon} {r['package']:<25} {r['risk_level']:<8} {signals}")

    high = sum(1 for r in results if r["risk_level"] == "HIGH")
    print(f"\nResult: {high} HIGH, {len(results)-high} OK")
    return results
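As written, `scan_requirements` only calls the PyPI check. To fold the GitHub signals from Step 2 into the same report, the two results can be merged before printing. This is a sketch under assumptions: `merge_signals` is a hypothetical helper, and it reuses the thresholds from Step 1 (no signals is LOW, one or two is MEDIUM, more is HIGH).

```python
def merge_signals(pypi_result, github_result=None):
    """Combine PyPI signals with optional GitHub risks and recompute the risk level."""
    signals = list(pypi_result.get("signals", []))
    if github_result:
        signals.extend(github_result.get("risks", []))

    # Keep an existing HIGH (for example, package not found); otherwise count signals
    if pypi_result.get("risk_level") == "HIGH" or len(signals) > 2:
        level = "HIGH"
    elif signals:
        level = "MEDIUM"
    else:
        level = "LOW"
    return {"package": pypi_result["package"], "risk_level": level, "signals": signals}
```

The returned dict has the same keys as the output of `check_pypi_health`, so the sorting and printing code in `scan_requirements` would work on it unchanged. Each GitHub lookup also counts against the hourly request quota.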
Real Output
I ran this on a real project with 12 dependencies:
!!! obscure-utils HIGH Only 1 releases, No author information
[!] some-old-lib MEDIUM No commits in 890 days
[ok] requests LOW
[ok] flask LOW
[ok] pandas LOW
Result: 1 HIGH, 11 OK
That obscure-utils? A typosquat. Removed immediately.
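The scanner flags thin metadata, not name similarity, so a well-dressed typosquat could still slip through. A name check against packages you already trust is one possible complement. The sketch below is an illustration only: `KNOWN_PACKAGES` is a hypothetical allowlist, the name normalization is simplified, and plain Levenshtein distance stands in for whatever similarity measure suits your project.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

# Hypothetical allowlist; in practice, build it from dependencies you have vetted
KNOWN_PACKAGES = ["requests", "flask", "pandas", "numpy"]

def possible_typosquat(name, known=KNOWN_PACKAGES, max_distance=1):
    """Return the known package that `name` nearly matches, or None."""
    normalized = name.lower().replace("_", "-")  # simplified name normalization
    for target in known:
        if normalized != target and edit_distance(normalized, target) <= max_distance:
            return target
    return None
```

With `max_distance=1`, a name like `requets` is matched to `requests`. Two swapped letters count as distance 2 here, so catching those means raising the threshold and accepting more false positives.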
Why This Matters
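Sonatype's 2022 State of the Software Supply Chain report found a 742% average annual increase in software supply chain attacks over the preceding three years. Most developers run pip install without checking who published the package, when it was last updated, or whether it has a real repo.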
This scanner catches obvious red flags in under 60 seconds.
Limitations
- PyPI API has no official rate limit (but add delays)
- GitHub: 60 req/hour without auth, 5000 with token
- Catches obvious risks, not sophisticated backdoors
- For production: combine with pip-audit and safety
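For the GitHub limit in particular, a token and a look at the rate-limit response headers go a long way. The helpers below are a sketch, not part of the scanner above: `github_headers` and `seconds_until_reset` are hypothetical names, and reading the token from a `GITHUB_TOKEN` environment variable is an assumption about your setup.

```python
import os
import time

def github_headers(token=None):
    """Build request headers, adding a bearer token when one is available."""
    token = token or os.environ.get("GITHUB_TOKEN")  # assumed variable name
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

def seconds_until_reset(response_headers, now=None):
    """Return how many seconds to wait if the quota is used up, else 0.

    Pass resp.headers from requests, whose lookup is case-insensitive;
    a plain dict must use the exact header names shown here.
    """
    remaining = int(response_headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(response_headers.get("X-RateLimit-Reset", 0))  # epoch seconds
    now = time.time() if now is None else now
    return max(0, int(reset_at - now))
```

In `check_github_health`, this would mean passing `headers=github_headers()` to `requests.get` and sleeping for `seconds_until_reset(resp.headers)` before the next call.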
I build security tools with free APIs. More projects on GitHub. Writing opportunities: Spinov001@gmail.com