
Davis


I built a link preview API on Cloudflare Workers — and learned KV is not a counter

I shipped a link preview API in a day on Cloudflare Workers — and the most interesting bug had nothing to do with HTML parsing.

What I built

LinkPeek does one thing: give it a URL, get back clean JSON with everything you need to render a link card — title, description, images, favicon, site name, canonical URL, RSS/Atom feeds, oEmbed endpoint, and the full OpenGraph + Twitter Card maps.

curl "https://linkpeek.dpears.workers.dev/v1/preview?url=https://github.com"
{
  "title": "GitHub · Change is constant. GitHub keeps you ahead.",
  "siteName": "GitHub",
  "image": "https://images.ctfassets.net/.../GH-Homepage-Universe-img.png",
  "favicon": "https://github.com/fluidicon.png",
  "og": { "site_name": "GitHub", "...": "..." },
  "twitter": { "card": "summary_large_image", "...": "..." }
}

There's also a tiny zero-dep client on npm: linkpeek-client.

Why Workers is a great fit for this

HTMLRewriter is the killer feature. Parsing arbitrary HTML on an edge function sounds expensive, but HTMLRewriter is a streaming parser — you register element handlers and it processes the response body as it flows through:

const rewriter = new HTMLRewriter()
  .on("meta", metaHandler)
  .on("title", { text(t) { titleText += t.text; } })
  .on("link", feedAndFaviconHandler);
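To make the handler idea concrete, here is a minimal sketch of what a meta handler could look like. It is not LinkPeek's actual code: `createMetaHandler`, `ElementLike`, and `MetaResult` are hypothetical names, and the only part of the HTMLRewriter element API it relies on is `getAttribute`.

```typescript
// Minimal shape of what an HTMLRewriter element handler receives.
// The real Element has more methods; only getAttribute is used here.
interface ElementLike {
  getAttribute(name: string): string | null;
}

interface MetaResult {
  og: Record<string, string>;
  twitter: Record<string, string>;
  description?: string;
}

// Store a value only if the key has not been seen yet (first tag wins).
function setOnce(map: Record<string, string>, key: string, value: string): void {
  if (!(key in map)) map[key] = value;
}

// Hypothetical factory: returns a handler plus the object it fills in.
function createMetaHandler(): {
  handler: { element(el: ElementLike): void };
  result: MetaResult;
} {
  const result: MetaResult = { og: {}, twitter: {} };
  const handler = {
    element(el: ElementLike): void {
      // OpenGraph tags use property="og:*"; Twitter Cards usually use name="twitter:*".
      const key = (el.getAttribute("property") ?? el.getAttribute("name") ?? "").toLowerCase();
      const content = el.getAttribute("content");
      if (!key || content === null) return;
      if (key.startsWith("og:")) setOnce(result.og, key.slice(3), content);
      else if (key.startsWith("twitter:")) setOnce(result.twitter, key.slice(8), content);
      else if (key === "description" && result.description === undefined) {
        result.description = content;
      }
    },
  };
  return { handler, result };
}
```

In a Worker, `handler` would be passed as the second argument to `.on("meta", ...)`, and `result` read once the transformed response body has been fully consumed.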

I cap parsing at 1MB and cancel the stream after that — a page's metadata lives in <head>, so there's no reason to chew through a 20MB page.
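One way to enforce a cap like this is to wrap the response body in a stream that stops after a byte budget. The sketch below uses only the standard Web Streams API; `limitBytes` is a hypothetical helper, not necessarily how LinkPeek does it.

```typescript
// Wraps a byte stream so that at most maxBytes are passed through. Once the
// cap is reached, the source is cancelled so the rest is never read.
function limitBytes(
  source: ReadableStream<Uint8Array>,
  maxBytes: number,
): ReadableStream<Uint8Array> {
  const reader = source.getReader();
  let seen = 0;
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { done, value } = await reader.read();
      if (done) {
        controller.close();
        return;
      }
      const remaining = maxBytes - seen;
      if (value.byteLength >= remaining) {
        // Emit only the part that still fits, then stop reading upstream.
        if (remaining > 0) controller.enqueue(value.subarray(0, remaining));
        seen = maxBytes;
        controller.close();
        await reader.cancel("byte cap reached");
        return;
      }
      seen += value.byteLength;
      controller.enqueue(value);
    },
    cancel(reason) {
      // If the consumer gives up early, propagate that to the source.
      return reader.cancel(reason);
    },
  });
}
```

Assuming the upstream response has a non-null `body`, the capped stream can be wrapped in a `new Response(...)` and handed to `rewriter.transform(...)`, so the parser never sees more than the first megabyte.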

KV as a response cache works great. 24h TTL, keyed by URL. Repeat lookups return in ~30ms globally.
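The caching pattern is plain cache-aside. Here is a rough sketch under a few assumptions: `KVLike` is a cut-down stand-in for the KV binding (the real binding's `get` and `put` with `expirationTtl` have this shape), while `cachedPreview`, `fetchPreview`, and the `preview:` key prefix are made up for illustration.

```typescript
// Subset of the Workers KV binding used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const CACHE_TTL_SECONDS = 24 * 60 * 60; // 24h, as in the post

async function cachedPreview(
  kv: KVLike,
  url: string,
  fetchPreview: (url: string) => Promise<unknown>, // hypothetical: real fetch + parse
): Promise<{ preview: unknown; cached: boolean }> {
  const key = `preview:${url}`; // assumed key scheme
  const hit = await kv.get(key);
  if (hit !== null) return { preview: JSON.parse(hit), cached: true };

  // Cache miss: do the expensive work once, then store it with a TTL.
  const preview = await fetchPreview(url);
  await kv.put(key, JSON.stringify(preview), { expirationTtl: CACHE_TTL_SECONDS });
  return { preview, cached: false };
}
```

Stale reads are harmless here, which is why KV suits this job: the worst case is one extra fetch of a page that was already cached elsewhere. One caveat is that KV keys are limited to 512 bytes, so very long URLs would need hashing before being used as keys.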

The bug worth writing about: KV is not a counter

For the free tier I wanted a simple per-IP daily quota. First implementation: read a counter from KV, increment, write it back.

It enforced nothing. I fired 27 sequential requests at it during verification and every one returned 200.

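KV is eventually consistent: a write can take 60 seconds or more to become visible everywhere, and until then reads may be served from a stale edge cache. It also has no atomic increment, so read-modify-write is a race even when reads are fresh. A burst of requests all read the same stale "0", each writes back "1", and the last write wins. Your counter crawls while traffic flies.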
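The failure is easy to reproduce without Cloudflare at all. The toy below is not KV: `StaleReadStore` is a made-up stand-in whose reads come from a snapshot that lags behind writes, which is the only property needed to show the lost updates.

```typescript
// Toy stand-in for an eventually consistent store: writes land in `latest`,
// but reads are served from `cached`, which only catches up on refresh().
class StaleReadStore {
  private latest = 0;
  private cached = 0;
  async get(): Promise<number> {
    return this.cached;
  }
  async put(value: number): Promise<void> {
    this.latest = value;
  }
  refresh(): void {
    this.cached = this.latest;
  }
  current(): number {
    return this.latest;
  }
}

// The naive quota logic: read, add one, write back.
async function naiveIncrement(store: StaleReadStore): Promise<number> {
  const count = await store.get();
  await store.put(count + 1);
  return count + 1;
}

// Fire `requests` sequential increments before the cache ever refreshes.
async function simulateBurst(requests: number): Promise<number> {
  const store = new StaleReadStore();
  for (let i = 0; i < requests; i++) {
    await naiveIncrement(store);
  }
  return store.current();
}
```

`simulateBurst(27)` resolves to 1: every request read 0 and wrote 1, so a quota check based on this counter never trips during the burst.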

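The fix: Workers has a purpose-built Rate Limiting binding that counts per Cloudflare location (colo) and reacts within a burst. Cloudflare describes it as permissive and eventually consistent, so treat it as abuse protection rather than exact accounting: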

[[unsafe.bindings]]
name = "ANON_LIMITER"
type = "ratelimit"
namespace_id = "1001"
simple = { limit = 10, period = 60 }

Then, in the Worker:

const { success } = await env.ANON_LIMITER.limit({ key: clientIP });
if (!success) return json({ error: "Rate limit exceeded…" }, 429);

After the fix, a parallel burst of 16 requests: 5×200, 11×429. I kept the KV daily counter as a slow backstop — it does converge, just not fast enough to stop bursts on its own.
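Putting the two layers together might look roughly like this. It is a sketch under assumptions, not the deployed code: `checkQuota`, the `quota:` key scheme, and the daily limit value are all hypothetical, and the two interfaces only mirror the parts of the rate-limit and KV bindings used above.

```typescript
// Shape of the rate-limit binding call shown above.
interface RateLimitLike {
  limit(options: { key: string }): Promise<{ success: boolean }>;
}

// Subset of the KV binding used for the slow daily counter.
interface QuotaKV {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

type QuotaResult = "ok" | "burst" | "daily";

async function checkQuota(
  limiter: RateLimitLike,
  kv: QuotaKV,
  clientIP: string,
  dailyLimit: number, // hypothetical: the post does not state the daily quota
  now: Date = new Date(),
): Promise<QuotaResult> {
  // Fast layer: stops bursts within a colo.
  const { success } = await limiter.limit({ key: clientIP });
  if (!success) return "burst";

  // Slow layer: approximate daily count in KV. It may undercount during
  // bursts, which is acceptable because the fast layer handles those.
  const day = now.toISOString().slice(0, 10); // UTC date, e.g. "2024-01-02"
  const key = `quota:${day}:${clientIP}`;
  const used = Number((await kv.get(key)) ?? "0");
  if (used >= dailyLimit) return "daily";
  await kv.put(key, String(used + 1), { expirationTtl: 2 * 24 * 60 * 60 });
  return "ok";
}
```

Checking the limiter first means a rejected burst never touches KV, which keeps writes to a single counter key low.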

Lesson: use KV for caching, use the rate-limit binding (or Durable Objects) for counting.

Other things that mattered

  • SSRF guard: reject localhost, RFC-1918 ranges, .local/.internal hosts before fetching. A URL-fetching API is an SSRF machine if you skip this.
  • Honest status reporting: sites behind aggressive bot protection (e.g. Stack Overflow) return their challenge page. LinkPeek reports the target's real status instead of pretending.
  • Non-HTML targets: a HEAD-ish fallback returns type: "file" with content type for images/PDFs instead of erroring.
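For the SSRF guard in the first bullet, a hostname check along these lines covers the cases listed. This is a minimal sketch, not LinkPeek's implementation: `isBlockedHost` and `isAllowedTarget` are hypothetical names, and treating 127.0.0.0/8 as "localhost" and allowing only http/https are my assumptions.

```typescript
// Returns true when a hostname should be refused before any fetch is made.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/\.$/, ""); // drop a trailing dot
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.endsWith(".local") || host.endsWith(".internal")) return true;

  // Dotted-quad IPv4 literals: check loopback and the RFC 1918 ranges.
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false;
  const a = Number(m[1]);
  const b = Number(m[2]);
  if (a === 127) return true; // loopback (assumed to count as "localhost")
  if (a === 10) return true; // 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // 172.16.0.0/12
  if (a === 192 && b === 168) return true; // 192.168.0.0/16
  return false;
}

// Parses the URL first so the check runs on the normalized hostname.
function isAllowedTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  return !isBlockedHost(url.hostname);
}
```

This only inspects the name in the URL. It does not cover IPv6 literals, link-local ranges, redirects to internal hosts, or a public hostname that resolves to a private address, so a production guard would need more than this.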

Try it

Happy to answer questions about HTMLRewriter, the rate-limit binding, or Workers KV quirks in the comments.
