Bryan MARTIN

Posted on Jun 29 • Originally published at rektradar.io

We got scraped, so we built a free Ethereum scam API


By Bryan Martin, founder of RektRadar. Ethereum scam-detection infrastructure since 2024.

A few weeks ago a single machine on Alibaba Cloud (autonomous system AS45102) found our public /v1 endpoints and decided to take everything. Over three sessions it fired 1,911 requests, most of them automated fuzzing: random hex strings where a contract address should be, malformed query params, the usual "let's see what this thing returns" sweep. Our per-IP rate limiter answered 96% of those calls with HTTP 429. Fewer than 1 in 20 got data back.

The easy move is to block the IP and move on. We did block it. But the incident made an obvious point: people clearly want programmatic access to our scam data, and until now the only "API" was whatever public endpoints they reverse-engineered from the app. So we did the other thing too. We documented the surface, put real rate limits and tiers behind it, and shipped it as a proper free API.

This post is the data behind that decision: what the scraper actually did, what the dataset it was reaching for contains, and how the free tier is designed so that abuse like this stays cheap to absorb while honest developers get a useful amount of access for nothing.

Dataset snapshot

Snapshot: 2026-06-29, 13:44 UTC. Counts are straight SELECTs against our token_analysis table on the production database. No estimates.

N_total analyzed Ethereum tokens: 103,954
N flagged scam (risk_score >= 70): 60,328 (58.0%)
N high-confidence (risk_score >= 90): 3,611
Analyzed in the last 7 days: 3,529, of which 2,870 (81.3%) scored as scams

Two numbers stand out. First, 58% of all the tokens we have ever fully scored are scams by our 70-or-above threshold. Second, on fresh launches the rate is far worse: 81% of the tokens analyzed in the past week crossed the scam line. New ERC-20s on Ethereum skew overwhelmingly toward fraud, and that skew is exactly what the free API exposes to anyone who wants to check an address before they touch it.
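The percentages follow directly from the raw counts in the snapshot. A quick check in Python, using only the numbers listed above (the variable names are ours, and the high-confidence share is derived here rather than quoted from the snapshot):

```python
# Counts copied from the snapshot above (2026-06-29, 13:44 UTC).
TOTAL_ANALYZED = 103_954
FLAGGED_SCAM = 60_328      # risk_score >= 70
HIGH_CONFIDENCE = 3_611    # risk_score >= 90
LAST_7D_ANALYZED = 3_529
LAST_7D_SCAM = 2_870


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


all_time_rate = share(FLAGGED_SCAM, TOTAL_ANALYZED)
last_week_rate = share(LAST_7D_SCAM, LAST_7D_ANALYZED)
high_conf_rate = share(HIGH_CONFIDENCE, TOTAL_ANALYZED)

print(all_time_rate, last_week_rate, high_conf_rate)  # 58.0 81.3 3.5
```

The last figure shows how much the threshold matters: at 90-or-above, only about 3.5% of all analyzed tokens qualify.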

The scraper that started this

Here is what 1,911 calls from one IP looks like in aggregate:

Metric	Value
Total requests	1,911
Distinct sessions	3
HTTP 429 (rate-limited)	~96%
Successful data responses	~4%
Source	one IP, Alibaba Cloud AS45102

The pattern was not a careful integration. It was a fuzzer: garbage in the address slot, repeated hammering of the same feed endpoints, no backoff when the 429s started. That is the signature of someone trying to scrape a dataset wholesale rather than look up a token they actually care about.
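For contrast, here is roughly what a polite client does when it starts seeing 429s. This is a minimal sketch, not our SDK: `get_with_backoff`, its parameters, and the retry schedule are all illustrative, and it does not assume the API sends any particular retry header. The HTTP call itself is passed in as `send`, so the retry logic can be read (and tested) without touching the network.

```python
import random
import time
from typing import Callable, Tuple

# (status_code, body) as returned by whatever HTTP call you plug in.
Response = Tuple[int, str]


def get_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on HTTP 429 with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the last 429 back to the caller
        # Double the wait on every retry, cap it, and add up to 10% jitter.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    return status, body
```

With the defaults, a client that keeps getting throttled waits about 1, 2, 4, then 8 seconds and gives up after five attempts, instead of hammering the same endpoint.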

The rate limiter held. But "the limiter held" is not a product. A developer who genuinely wants our risk score for one contract should not have to guess at undocumented endpoints and get throttled alongside a fuzzer. So the limits got formalized into tiers, and the endpoints got a docs page at rektradar.io/developers.

The tiers: anonymous, free key, paid

Three levels of access, and you can start at the bottom with nothing:

Access	Rate	Monthly quota	Data freshness
Anonymous (no key)	10 req/min	best-effort	feeds delayed ~10 min, token lookups real-time
Free key (one email)	40 req/min	10,000 calls	feeds delayed ~10 min, token lookups real-time
Paid (from 19.99/mo)	up to 300 req/min	50k to 1M	real-time everywhere

Anonymous is deliberately strict at 10 requests per minute: enough to look up the token you are about to ape into, nowhere near enough to mirror our database. That is the level the scraper was hitting, and 10 req/min is why 96% of its calls bounced.

A free email-verified key lifts you to 40 req/min and 10,000 calls per month. That is a real allowance, not a teaser. It comfortably covers a Telegram bot for a small group, a personal dashboard, or a hobby project that checks every new pair in a feed.
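To make the per-minute limits concrete, here is a minimal sliding-window limiter in Python. It is a sketch of the general technique, not a description of our production limiter; the class name, the in-memory storage, and the example burst are all assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each client key."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer HTTP 429 here
        hits.append(now)
        return True


# Synthetic burst (not the real traffic): one IP sends 250 requests
# inside a single minute against the anonymous limit of 10 req/min.
limiter = SlidingWindowLimiter(limit=10)
results = [limiter.allow("203.0.113.7", now=i * 0.24) for i in range(250)]
rejected = results.count(False)
print(rejected, len(results))  # 240 250
```

A client that never slows down gets its first 10 requests through and nothing else until the window rolls over, which is how a low limit turns a bulk scrape into a wall of 429s.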

The delay is the paywall

We thought hard about how to keep a free tier genuinely useful without giving away the part that costs us the most to produce: real-time intelligence. The answer is freshness, not feature-gating.

Targeted token lookups are real-time for everyone. GET /v1/token/:address and /v1/token/:address/full return the current score and flags with no delay, on anonymous and free keys alike. If you have a specific contract in hand, you get the live verdict.
The activity feeds are delayed on free. GET /v1/rugs, /v1/recent, and /v1/trends run roughly 10 minutes behind on anonymous and free access. You can build a "recent scams" or "fresh rugs" view for free; you just see it 10 minutes after a paid key does.
Paid removes the delay everywhere, adds WebSocket streams and signed webhooks, and raises the monthly quota to between 50,000 and 1,000,000 calls, depending on the plan.

The logic is simple. The value of a rug-pull alert decays by the minute. Charging for the 10 minutes that matter is honest pricing, and it means the free tier is never crippled, only slightly behind.
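The freshness rule for feeds is easy to express in code. The sketch below assumes a hypothetical `FeedItem` shape and hypothetical tier labels ("anonymous", "free", "paid"); only the 10-minute delay and the paid/free split come from the description above.

```python
from dataclasses import dataclass
from typing import List

FEED_DELAY_SECONDS = 10 * 60  # "roughly 10 minutes" on anonymous and free


@dataclass(frozen=True)
class FeedItem:
    address: str        # token contract address
    analyzed_at: float  # Unix timestamp of the analysis


def visible_feed(items: List[FeedItem], tier: str, now: float) -> List[FeedItem]:
    """Return the feed items a caller on `tier` may see at time `now`."""
    if tier == "paid":
        return list(items)  # real-time everywhere
    cutoff = now - FEED_DELAY_SECONDS
    # Anonymous and free callers only see items at least 10 minutes old.
    return [item for item in items if item.analyzed_at <= cutoff]
```

Nothing is ever hidden for good: an item that a free caller cannot see now becomes visible once it is 10 minutes old. Targeted lookups by address skip this filter entirely.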

Passwordless: a key in one email

Getting a key takes one field. You enter an email at app.rektradar.io/api-key, we send a magic link, you click it, the key appears. No password to create, no card to enter, no sales call. The whole point is that the friction of getting a key should be lower than the friction of writing a scraper, so people use the front door.

Anti-abuse that does not punish developers

The scraper incident shaped the abuse controls, and the design rule was: stop wholesale mirroring without adding friction for a normal integrator.

Per-IP key-creation cap of 3 keys per day. You cannot spin up a hundred free keys from one box to multiply your quota. One developer, a handful of keys, is fine; a key farm is not.
Disposable-email block. Throwaway inbox domains are rejected at signup, so the "one email = 10,000 calls" math cannot be gamed with an infinite supply of burner addresses.
Optional per-key IP allowlist, tier-scaled. You can pin a key to the IPs that are allowed to use it. If a key leaks, it is useless from anywhere you did not authorize. Higher tiers get more allowlist entries.

None of these touch the happy path. A developer who signs up with a real email and calls from their server never sees any of it. They exist to make the scraper's economics worse, not the integrator's.
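The three controls above are simple checks. A minimal Python sketch, under stated assumptions: the class and function names are ours, the blocklist entries are placeholders, the counters live in memory, and CIDR support in the allowlist is an assumption (the feature is described only as a list of IPs).

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

MAX_KEYS_PER_IP_PER_DAY = 3
# Placeholder entries only; a real blocklist would be much longer.
DISPOSABLE_DOMAINS = {"burner.example", "throwaway.example"}


class SignupGuard:
    """Checks run before a free key is issued."""

    def __init__(self) -> None:
        self._keys_issued: Counter = Counter()

    def check(self, email: str, ip: str, day: str) -> Tuple[bool, str]:
        domain = email.rsplit("@", 1)[-1].lower()
        if domain in DISPOSABLE_DOMAINS:
            return False, "disposable email domain"
        if self._keys_issued[(ip, day)] >= MAX_KEYS_PER_IP_PER_DAY:
            return False, "daily key limit reached for this IP"
        self._keys_issued[(ip, day)] += 1
        return True, "ok"


def ip_allowed(client_ip: str, allowlist: Iterable[str]) -> bool:
    """True if the allowlist is empty (feature off) or covers client_ip."""
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    if not networks:
        return True
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```

A developer with a real email address passes `check` on the first try, and `ip_allowed` is a no-op until they opt in by adding entries.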

What you can actually query

The free API exposes the same dataset the scraper was reaching for, the 103,954 analyzed tokens in the snapshot above, through a small set of documented endpoints:

GET /v1/token/:address returns a 0-100 risk score and the on-chain red-flag list (honeypot simulation result, ownership state, liquidity, deployer reputation, and more).
GET /v1/token/:address/full adds liquidity and holder distribution.
GET /v1/rugs?since=14d lists recent rug pulls.
GET /v1/recent is the feed of the latest analyses (live on paid, roughly 10 minutes behind on anonymous and free access).
GET /v1/deployers/top is the leaderboard of the wallets that ship the most scams.
GET /v1/stats returns the platform-wide counters (tokens scanned, scams detected, deployers mapped).

There is also an official TypeScript SDK if you would rather not write the HTTP plumbing yourself. With a free key, every one of those is callable today.
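If you prefer plain HTTP, a token lookup needs only the Python standard library. Treat this as a sketch: the base URL is a placeholder, the `Authorization: Bearer` header is an assumption, and the response is assumed to be JSON. Take the real host, header name, and response schema from the docs page.

```python
import json
import re
import urllib.request
from typing import Optional

# Placeholder host: substitute the real base URL from the docs.
BASE_URL = "https://api.example.invalid"
ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")


def token_url(address: str, full: bool = False) -> str:
    """Build the URL for GET /v1/token/:address or /v1/token/:address/full."""
    if not ADDRESS_RE.fullmatch(address):
        raise ValueError(f"not a 20-byte hex address: {address!r}")
    suffix = "/full" if full else ""
    return f"{BASE_URL}/v1/token/{address}{suffix}"


def fetch_token(address: str, api_key: Optional[str] = None, full: bool = False) -> dict:
    """Fetch the risk report for one contract and return the decoded JSON."""
    request = urllib.request.Request(token_url(address, full))
    if api_key:
        # Header name is an assumption; check the docs for the real one.
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Validating the address locally before sending it means a typo costs you nothing against your rate limit.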

Limits of our data

Scorer-conditional, not isolated truth. "58% are scams" means 58% of tokens scored 70-or-above on our multi-signal scale. That is our classifier's judgment, not a court verdict. The threshold is a product choice; move it and the percentage moves.
Selection bias in what gets analyzed. A contract enters token_analysis only when our mempool-watcher or factory-watcher sees it deploy with enough liquidity to matter. Tokens launched through obscure paths, or with no real pool, are under-represented. The dataset describes the tradeable long tail of Ethereum, not literally every contract.
A moving target. The 81% scam rate on last-week launches is a 7-day window ending 2026-06-29. Scam techniques drift, campaigns spike and fade, and these numbers will read differently next month. Treat them as a snapshot, not a constant.
Rate-limit counts are approximate. The 1,911 calls and 96% 429 figures come from our nginx access logs for one source IP across three observed sessions. Session boundaries are inferred from gaps in activity, so "3 sessions" is a reasonable grouping, not a hard count.
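Gap-based session inference, as mentioned in the last point, works like this. The sketch is illustrative only: the 30-minute gap is an assumed cutoff, not the value we used, and the timestamps are synthetic.

```python
from typing import List


def split_sessions(timestamps: List[float], max_gap: float = 1800.0) -> List[List[float]]:
    """Group request timestamps into sessions separated by more than max_gap seconds."""
    sessions: List[List[float]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)  # close enough to the previous request
        else:
            sessions.append([ts])    # a long silence starts a new session
    return sessions


# Synthetic timestamps in seconds, not taken from our logs.
example = [0, 10, 20, 5000, 5010, 20000]
print(len(split_sessions(example)))  # 3
```

Pick a different `max_gap` and the same log can split into more or fewer sessions, which is why the session count is a grouping rather than a measurement.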

TL;DR

A single Alibaba Cloud IP fuzzed our public API 1,911 times across 3 sessions; 96% were rate-limited (HTTP 429).
Instead of only blocking it, we shipped a documented free Ethereum scam-detection API over the same dataset: 103,954 analyzed tokens, 60,328 (58%) flagged scam.
Three tiers: anonymous 10 req/min (no signup), free key 40 req/min + 10,000 calls/month (one email, magic link, no card), paid real-time from 19.99/mo.
The delay is the paywall: targeted token lookups are real-time for everyone; activity feeds run ~10 min behind on free and live on paid.
Anti-abuse (3 keys/IP/day, disposable-email block, optional IP allowlist) targets scrapers, not developers.

Get a free key at rektradar.io/developers to query the score, flags, and deployer history for any Ethereum contract in one request, or start anonymously with no signup at all.
