Zenovay
How We Built a Real-Time Analytics Platform on Cloudflare Workers (Architecture Deep-Dive)

Most analytics platforms follow a familiar pattern: events land on a centralized server, get queued, and eventually show up in a dashboard minutes later. When we built zenovay.com, we wanted something fundamentally different: sub-100ms event ingestion globally, real-time dashboards with live visitor counts, and zero cookies. This article walks through the architecture that makes that work.

This is not a product walkthrough. It is a technical deep-dive into the design decisions, tradeoffs, and production lessons of running an analytics platform on Cloudflare Workers.


1. Why Edge-First?

Traditional analytics architectures have a structural problem: the event producer (the visitor's browser) and the event consumer (the analytics server) are physically far apart.

Consider a visitor in Tokyo hitting a site tracked by an analytics service hosted in us-east-1. The tracking pixel request has to cross the Pacific Ocean, hit a load balancer, wait for an available process (or suffer a cold start), write to a database, and return a response. Round-trip latency: 300-800ms. During that time, the browser is potentially blocking on the request, degrading the user experience of the site being tracked.

The edge-first approach eliminates this entirely. With Cloudflare Workers, your event ingestion code runs in 300+ data centers worldwide. That visitor in Tokyo hits a Worker in Tokyo. The response comes back in single-digit milliseconds. No cold starts (Workers use V8 isolates, not containers), no cross-ocean roundtrips, no load balancer hop.

The numbers matter because analytics tracking scripts run on every page load of every tracked site. If your tracking script adds 500ms of latency to page loads, you are degrading the Core Web Vitals of every site that uses your product. At the edge, the overhead drops to effectively zero.

But edge-first introduces its own set of problems. You cannot just INSERT INTO Postgres from 300 data centers simultaneously; you need session identification without centralized state; and you need to get real-time data back to dashboards even though your event source is globally distributed. The rest of this article covers how we solved each of these.


2. The Architecture

Here is the high-level data flow:

Browser (tracking script)
    |
    v
Cloudflare Worker (Hono.js)  <-- Edge: event validation, bot detection, geo lookup
    |
    ├──> Cloudflare KV          <-- Hot data: deduplication, rate limits, live counts
    ├──> Workers Analytics Engine <-- High-cardinality event stream (dual-write)
    └──> Supabase (PostgreSQL)   <-- Persistent storage: visitors, page_views, analytics
            |
            v
        Supabase Realtime (WebSocket)
            |
            v
        Dashboard (Next.js)      <-- Live updates via subscription

The API is a single Cloudflare Worker running Hono.js, an ultra-fast web framework designed for edge runtimes. Hono gives us Express-style route handlers with middleware chaining, but with zero Node.js dependencies and sub-millisecond router overhead.

The Worker handles everything: tracking event ingestion, authentication (JWT validation via Supabase), analytics queries, billing (Stripe), and cron jobs (daily aggregation, retention cleanup). It is a monolith at the edge, and that is intentional. V8 isolates are cheap, and co-locating concerns eliminates inter-service latency.

Here is a simplified version of the tracking route handler:

import { Hono } from 'hono';
import { createSupabaseClient } from '../services/supabase';

const app = new Hono<{ Bindings: Env }>();

app.post('/:trackingCode', async (c) => {
  const trackingCode = c.req.param('trackingCode');
  const data = await c.req.json();

  // 1. Bot detection (User-Agent + Cloudflare Bot Management signals)
  const cfData = (c.req.raw as any).cf;
  if (cfData?.botManagement?.verifiedBot || isBot(data.user_agent)) {
    return c.json({ error: 'Bot traffic blocked' }, 403);
  }

  // 2. Rate limiting via KV (60 req/10s burst, 5000 req/hr sustained)
  const clientIP = c.req.header('CF-Connecting-IP') || 'unknown';
  const rateLimitKey = `tracking_ratelimit:${clientIP}`;
  // ... KV-based sliding window check ...

  // 3. Geolocation from Cloudflare's edge network (free, no API call)
  const geoData = {
    country_code: cfData?.country,
    city: cfData?.city,
    latitude: cfData?.latitude,
    longitude: cfData?.longitude,
  };

  // 4. Deduplication via KV (5-second window per session+URL)
  // (websiteId is resolved from the trackingCode lookup, elided here)
  const dedupeKey = `dedupe:${websiteId}:${data.session_id}:${data.url}`;
  const recent = await c.env.CACHE.get(dedupeKey);
  if (recent) return c.json(JSON.parse(recent));

  // 5. Write to Supabase + dual-write to Analytics Engine
  // (visitorRecord assembly and the Analytics Engine write are elided)
  const supabase = createSupabaseClient(c.env.SUPABASE_URL, c.env.SUPABASE_SERVICE_KEY);
  await supabase.from('visitors').upsert(visitorRecord);

  // 6. Cache the response for deduplication
  await c.env.CACHE.put(dedupeKey, JSON.stringify(response), {
    expirationTtl: 5
  });

  return c.json(response);
});
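The KV-based sliding-window check elided at step 2 might look like the sketch below. This is illustrative, not the production code: the `KVLike` interface stands in for the Workers KV binding, and a real deployment would also enforce the sustained hourly limit.

```typescript
// Minimal interface matching the subset of Workers KV we rely on.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, opts?: { expirationTtl?: number }): Promise<void>;
}

// Sliding-window rate limit: keep recent request timestamps per IP,
// drop those outside the window, reject when the count exceeds the limit.
async function checkRateLimit(
  kv: KVLike,
  ip: string,
  limit = 60,
  windowMs = 10_000,
  now = Date.now()
): Promise<boolean> {
  const key = `tracking_ratelimit:${ip}`;
  const raw = await kv.get(key);
  const timestamps: number[] = raw ? JSON.parse(raw) : [];
  const recent = timestamps.filter((t) => now - t < windowMs);
  if (recent.length >= limit) return false; // over the burst limit
  recent.push(now);
  // TTL lets stale windows clean themselves up, mirroring the dedupe keys.
  await kv.put(key, JSON.stringify(recent), {
    expirationTtl: Math.ceil(windowMs / 1000),
  });
  return true;
}
```

Because KV is eventually consistent, two isolates can briefly see different windows for the same IP; we accept that slack, as discussed in the lessons section.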

A few things to note:

Custom Supabase client. We do not use the official @supabase/supabase-js SDK at runtime. It pulls in too many Node.js dependencies that do not work in the Workers runtime. Instead, we wrote a lightweight PostgrestBuilder class that wraps fetch() calls to Supabase's PostgREST API directly. Same chainable .from().select().eq() syntax, but ~50KB smaller and fully edge-compatible.
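For illustration, here is a minimal sketch of what such a fetch-based wrapper can look like. The chainable surface mirrors supabase-js, but the internals are our own simplification, not the actual client.

```typescript
// Illustrative sketch of a fetch-based PostgREST client with a chainable
// surface like supabase-js, but no Node.js dependencies.
class PostgrestBuilder<T = unknown> {
  private filters: string[] = [];
  private columns = '*';

  constructor(
    private baseUrl: string,
    private apiKey: string,
    private table: string,
  ) {}

  select(columns: string): this {
    this.columns = columns;
    return this;
  }

  eq(column: string, value: string | number): this {
    this.filters.push(`${column}=eq.${encodeURIComponent(String(value))}`);
    return this;
  }

  // Build the PostgREST URL; separated out so it is easy to test.
  toUrl(): string {
    const params = [`select=${this.columns}`, ...this.filters].join('&');
    return `${this.baseUrl}/rest/v1/${this.table}?${params}`;
  }

  async execute(): Promise<{ data: T[] | null; error: string | null }> {
    const res = await fetch(this.toUrl(), {
      headers: {
        apikey: this.apiKey,
        Authorization: `Bearer ${this.apiKey}`,
      },
    });
    if (!res.ok) return { data: null, error: `HTTP ${res.status}` };
    return { data: (await res.json()) as T[], error: null };
  }
}

const createClient = (url: string, key: string) => ({
  from: (table: string) => new PostgrestBuilder(url, key, table),
});
```

The real class also supports `.or()`, `.gte()`, upserts, and `.maybeSingle()`; the point is that each chained call just accumulates PostgREST query parameters onto a URL.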

Geolocation for free. Cloudflare attaches geolocation data to every request via the cf object, including country, city, latitude, longitude, timezone, and ASN. No external API call needed. This is one of the biggest advantages of running on Workers: you get production-grade geolocation at zero cost and zero latency.

Three KV namespaces. We separate concerns across dedicated KV namespaces: CACHE for real-time data and deduplication, RATE_LIMIT for rate limiting state, and SECURITY for IP reputation scoring. This prevents hot keys in one concern from affecting read latency in another.
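The bindings themselves are declared in `wrangler.toml`; a sketch with placeholder namespace IDs:

```toml
# wrangler.toml (illustrative; namespace IDs are placeholders)
[[kv_namespaces]]
binding = "CACHE"
id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

[[kv_namespaces]]
binding = "RATE_LIMIT"
id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

[[kv_namespaces]]
binding = "SECURITY"
id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```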


3. Session Identification Without Cookies

Privacy-focused analytics means no cookies and no fingerprinting. But you still need to know whether two page views came from the same visitor. Here is how we handle it.

The tracking script generates two IDs on the client side:

  1. session_id: A crypto.randomUUID() stored in sessionStorage. Dies when the tab closes. Represents a single browsing session.
  2. visitor_id: A crypto.randomUUID() stored in localStorage. Persists for 365 days. Represents a returning visitor across sessions.

// Client-side tracking script (simplified)
const sessionId = sessionStorage.getItem('zv_sid')
  || (() => {
    const id = crypto.randomUUID();
    sessionStorage.setItem('zv_sid', id);
    return id;
  })();

const visitorId = localStorage.getItem('zv_vid')
  || (() => {
    const id = crypto.randomUUID();
    localStorage.setItem('zv_vid', id);
    return id;
  })();

This approach has important tradeoffs:

No cross-device tracking. A visitor on their phone and their laptop shows as two different visitors. This is by design. Cross-device tracking requires either login-based identity or invasive fingerprinting, and we chose to avoid both.

No cross-browser tracking. Different browsers on the same device get different visitor_id values because localStorage is browser-scoped. Again, by design.

Private browsing breaks session continuity. Incognito mode clears localStorage on close, so every incognito session is a "new" visitor. This slightly inflates unique visitor counts but maintains privacy.

The accuracy tradeoff is acceptable. In practice, the slight overcounting of unique visitors (estimated 5-15% depending on the audience) is a worthwhile trade for not setting any cookies and not doing any fingerprinting. Site owners still get accurate trend data: they can see whether traffic is going up or down, which pages perform best, and which countries their visitors come from, all without compromising visitor privacy.

For session linking on the server side, when a tracking event arrives, the Worker looks for an existing visitor record matching either the session_id or the visitor_id:

// Server-side session resolution (simplified)
const { data: existingVisitors } = await supabase
  .from('visitors')
  .select('id, visited_at, visitor_id, session_id')
  .eq('website_id', websiteId)
  .gte('visited_at', twentyFourHoursAgo)
  .or(`session_id.eq.${data.session_id},visitor_id.eq.${data.visitor_id}`)
  .order('visited_at', { ascending: false })
  .limit(2);

// Prefer session_id match; fall back to visitor_id for cross-subdomain continuity
const match = existingVisitors?.find(v => v.session_id === data.session_id)
  || existingVisitors?.find(v => v.visitor_id === data.visitor_id);

The OR query avoids a common two-query pattern where you would first query by session_id, then query again by visitor_id if no match was found. One round trip, two potential match paths.


4. Batching Writes and Managing Database Load

A naive implementation would write every tracking event directly to PostgreSQL. At scale, this falls apart quickly. Here is what happens:

  • Connection exhaustion. Cloudflare Workers can spawn thousands of isolates simultaneously. Each one opening a direct Postgres connection would overwhelm connection limits.
  • Write amplification. A single visitor session generates many events: initial pageview, heartbeats every 30 seconds, scroll depth updates, page navigation, exit event. Writing each one as a separate INSERT is wasteful.
  • Hot rows. Updating a visitor record on every heartbeat creates lock contention on the same row.

Our approach combines several strategies:

  1. Upsert instead of insert. Instead of creating a new record for every event, we upsert on session_id. The first event for a session creates the visitor record, subsequent events (heartbeats, page navigations) update it in place:
// Session continuation: UPDATE existing record
await supabase
  .from('visitors')
  .update({
    visited_at: new Date().toISOString(),
    page_url: trackingData.url,
    time_on_page: Math.max(trackingData.time_on_page, existing.time_on_page),
    scroll_depth_percentage: Math.max(trackingData.scroll_depth, existing.scroll_depth),
    had_interaction: trackingData.had_interaction || existing.had_interaction,
  })
  .eq('id', existingVisitor.id);

Note the Math.max calls. This prevents race conditions where a late-arriving heartbeat with older data could overwrite newer values.

  2. KV-based deduplication. Before writing to Supabase, we check Cloudflare KV for a recent write with the same session_id + URL combination. The dedup window is 5 seconds with a TTL, so KV entries clean themselves up:
const dedupeKey = `dedupe:${websiteId}:${sessionId}:${url}`;
const cached = await c.env.CACHE.get(dedupeKey);
if (cached) return c.json(JSON.parse(cached)); // Return cached response

// ... process and write ...

await c.env.CACHE.put(dedupeKey, JSON.stringify(response), {
  expirationTtl: 5
});
  3. Conditional ML recalculation. Each visitor gets a real-time value score (a weighted prediction based on country, device, browser, session behavior). Recalculating this on every heartbeat is expensive. We only recalculate when there is a meaningful signal change:
const shouldRecalculate =
  event !== 'heartbeat' ||           // Always recalc on pageviews
  heartbeatCount % 5 === 0 ||        // Every 5th heartbeat
  scrollDelta >= 20 ||               // Significant scroll change
  (hasInteraction && !hadPrior);     // First interaction detected

This reduces ML computation by roughly 80% on active sessions while keeping scores fresh.

  4. Dual-write to Workers Analytics Engine. For high-cardinality queries (unique visitors by country over 90 days, for example), Supabase can be slow. We dual-write every event to Cloudflare's Workers Analytics Engine (WAE), which is designed for exactly this workload:
export function writeTrackingEvent(
  analytics: AnalyticsEngineDataset,
  event: Partial<WAETrackingEvent> & { trackingCode: string }
): void {
  analytics.writeDataPoint({
    indexes: [event.trackingCode],
    blobs: [
      event.eventType, event.url, event.referrer,
      event.country, event.city, event.browser,
      event.os, event.deviceType, event.sessionId,
      event.visitorId, event.isBot ? 'bot' : 'human',
    ],
    doubles: [
      event.loadTimeMs, event.scrollDepth,
      event.viewportWidth, event.viewportHeight,
      event.pagesViewed, event.valueScore,
    ],
  });
}

WAE writes are fire-and-forget with no backpressure. Cloudflare handles the buffering internally. Queries use SQL via their API, and they are fast even over billions of events because WAE uses columnar storage under the hood.
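As a sketch of what querying WAE looks like from TypeScript: the blob positions mirror the `writeDataPoint` call above (blob4 = country, blob10 = visitorId), but the dataset name `tracking_events`, the account ID, and the API token are placeholders. Note also that WAE samples high-volume data, so exact counts may need weighting by `_sample_interval`; this is a simplified example.

```typescript
// Build the SQL for "unique visitors by country over the last N days".
// trackingCode is assumed to be validated upstream before interpolation.
function uniqueVisitorsByCountrySql(trackingCode: string, days: number): string {
  return [
    `SELECT blob4 AS country, COUNT(DISTINCT blob10) AS visitors`,
    `FROM tracking_events`,
    `WHERE index1 = '${trackingCode}'`,
    `AND timestamp > NOW() - INTERVAL '${days}' DAY`,
    `GROUP BY country`,
    `ORDER BY visitors DESC`,
  ].join('\n');
}

// POST the SQL text to Cloudflare's Analytics Engine SQL endpoint.
async function queryWAE(accountId: string, apiToken: string, sql: string) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiToken}` },
      body: sql,
    },
  );
  return res.json();
}
```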


5. Real-Time Dashboard

The dashboard needs to show live data: visitors currently on the site, a 3D globe with real-time markers, and analytics that update without page refresh. We use a two-layer approach.

Layer 1: KV for instant counts. When a visitor arrives or leaves, the Worker updates a KV counter. The dashboard reads this counter for the live visitor badge:

// In the tracking handler - update live count
const visitorKey = `visitor:${websiteId}:${sessionId}`;
await c.env.CACHE.put(visitorKey, JSON.stringify({
  country: geoData.country_code,
  page: trackingData.url,
  value_score: valueScore,
  last_seen: Date.now()
}), { expirationTtl: 300 }); // Auto-expires after 5 minutes of inactivity

KV reads are globally fast (sub-5ms) because they are served from Cloudflare's edge cache. The 5-minute TTL means inactive visitors automatically disappear without any cleanup logic.
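A sketch of how the live count can be derived from those KV entries; the `CACHE.list` + `CACHE.get` plumbing that fetches entries under the `visitor:{websiteId}:` prefix is elided, and the names here are illustrative.

```typescript
interface LiveVisitor {
  country: string;
  page: string;
  value_score: number;
  last_seen: number; // epoch ms, written by the tracking handler
}

// Count visitors seen within the activity window. The KV TTL already
// expires stale keys; this filter just tightens "live right now".
function countLiveVisitors(
  entries: LiveVisitor[],
  now = Date.now(),
  activeWindowMs = 5 * 60_000,
): number {
  return entries.filter((v) => now - v.last_seen < activeWindowMs).length;
}
```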

Layer 2: Supabase Realtime for detailed updates. For the full visitor list, the globe visualization, and the activity stream, the dashboard subscribes to Supabase Realtime channels:

// Dashboard component (simplified)
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);

// Subscribe to new visitors for this website
const channel = supabase
  .channel(`visitors:${websiteId}`)
  .on(
    'postgres_changes',
    {
      event: 'INSERT',
      schema: 'public',
      table: 'visitors',
      filter: `website_id=eq.${websiteId}`,
    },
    (payload) => {
      // New visitor arrived - add marker to globe, update sidebar
      addGlobeMarker({
        lat: payload.new.latitude,
        lng: payload.new.longitude,
        country: payload.new.country_name,
        value_score: payload.new.value_score,
      });
    }
  )
  .subscribe();

Supabase Realtime uses WebSockets under the hood, streaming changes from PostgreSQL via logical replication. When the Worker inserts a new visitor record, the Realtime server picks the change up from the replication stream and pushes it through the WebSocket to every subscribed dashboard client. Latency from write to dashboard update is typically under 500ms.

The 3D globe itself uses Mapbox GL JS with custom markers. Each new visitor gets a marker at their geographic coordinates, color-coded by value score (green for premium visitors, blue for medium, gray for low). Markers animate in and fade out based on session activity.


6. Performance Results

After running this architecture in production, here are the numbers:

Event ingestion P95: 47ms globally. This includes JSON parsing, bot detection, geolocation lookup (from the cf object, so free), KV dedup check, Supabase write, and response serialization. The Supabase write is the long pole at ~30-40ms, but it happens asynchronously relative to the response in most cases.

Event ingestion P99: 89ms. The tail latency comes from cold KV reads (first read after a key expires) and occasional Supabase connection resets.

Dashboard real-time latency: ~400ms. From the moment a visitor lands on a tracked site to the moment their marker appears on the dashboard globe. This includes Worker processing (~50ms), Supabase write (~40ms), change-stream propagation (~100ms), WebSocket delivery (~100ms), and client-side rendering (~100ms).

Zero cold starts. V8 isolates spin up in under 5ms compared to 200-2000ms for container-based serverless. In practice, Workers are almost always warm because analytics tracking generates continuous traffic across all data centers.

Rate limiting accuracy: 99.7%. KV-based rate limiting has eventual consistency, so there is a small window where a burst can exceed limits. We accept this tradeoff because the alternative (centralized rate limiting) would add 50-100ms of latency to every request.

For comparison, a traditional setup with a Node.js server in us-east-1, writing to RDS PostgreSQL, would see P95 latencies of 200-600ms for visitors outside North America, with cold starts adding 500ms+ after periods of low traffic.


7. Lessons Learned

What worked well:

  • Hono.js is excellent for Workers. Express-style ergonomics with zero Node.js runtime overhead. The middleware chain pattern (CORS, IP blocklist, rate limit, bot protection, handler) is clean and composable.
  • Cloudflare's cf object for geolocation. Eliminating the external GeoIP API call removed ~100ms and an external dependency from every request. The data quality is comparable to MaxMind.
  • KV for deduplication and rate limiting. The eventual consistency model (a write at one edge becomes readable everywhere within ~60s) is perfect for these use cases. We do not need strict consistency. If a dedup check misses in a 60-second window, the worst case is one duplicate write.
  • Supabase as the persistence layer. PostgreSQL is boring in the best way. RLS policies, real-time subscriptions, and a REST API that works from edge runtimes without an SDK.

What was harder than expected:

  • Custom Supabase client. The official SDK does not work in Workers due to Node.js dependencies (node:crypto, node:events). Writing a compatible PostgREST client from scratch took a week and introduced subtle bugs around .maybeSingle() behavior and error code parsing.
  • KV write limits. Cloudflare KV has a 1,000 writes/second limit per namespace. Our rate limiter was writing on every request, quickly hitting this limit. We fixed it by only writing to KV once a client's request count reaches 90% of its limit, or when there are active violations, reducing writes by ~95%.
  • Connection resets to Supabase. Workers do not keep persistent HTTP connections between invocations. Every Supabase call opens a new TCP+TLS connection. At high traffic, this creates connection churn. Supabase handles it well, but P99 latency spikes during connection storms.
  • Time zone handling at the edge. JavaScript Date in Workers always uses UTC. Converting to the visitor's local timezone for "business hours" detection requires the timezone string from the cf object, and it is not always present.
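The write-reduction fix for the rate limiter can be sketched as follows: counts accumulate in isolate memory, and KV is only touched when a client nears the limit. The 90% threshold matches the description above; the data structure and names are illustrative.

```typescript
// In-memory counters, scoped to the current isolate. KV is only written
// when a client approaches the limit or already has a violation, trading
// slightly looser cross-isolate enforcement for ~95% fewer KV writes.
const localCounts = new Map<string, number>();

function shouldPersistToKV(ip: string, limit: number, hasViolation: boolean): boolean {
  const count = (localCounts.get(ip) ?? 0) + 1;
  localCounts.set(ip, count);
  return hasViolation || count >= limit * 0.9;
}
```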

What we would change:

  • Workers Analytics Engine from day one. We added WAE as a dual-write later, but it should have been the primary analytics store from the start. SQL queries over WAE are faster than aggregating raw visitor rows in PostgreSQL for any non-trivial time range.
  • Durable Objects for live counts. We started with KV for live visitor counting, but KV's eventual consistency means the count can be stale by up to 60 seconds. Durable Objects provide strong consistency for counters; we are migrating to them now.
  • Fewer Supabase queries per tracking request. The current flow makes 3-5 Supabase calls per event (website lookup, existing visitor check, upsert, page view insert, goal detection). Batching these into a single RPC call or using Supabase Edge Functions co-located with the database would cut latency significantly.
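The counter at the heart of that Durable Objects migration can be sketched as plain logic. In production it would live inside a Durable Object (one instance per website), which serializes all mutations and so gives a strongly consistent count; the class and method names here are illustrative.

```typescript
// Core of a strongly consistent live-visitor counter. A Durable Object
// would wrap this, routing all heartbeats for one website through a
// single instance so increments and expiries never race.
class LiveVisitorCounter {
  private sessions = new Map<string, number>(); // sessionId -> last_seen (ms)

  // Record activity for a session and return the current live count.
  heartbeat(sessionId: string, now = Date.now()): number {
    this.sessions.set(sessionId, now);
    return this.prune(now);
  }

  // Drop sessions idle longer than the window; return the live count.
  prune(now: number, windowMs = 5 * 60_000): number {
    for (const [id, seen] of this.sessions) {
      if (now - seen >= windowMs) this.sessions.delete(id);
    }
    return this.sessions.size;
  }
}
```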

Wrapping Up

Building analytics at the edge is not just about speed; it changes what is architecturally possible. When your ingestion latency is under 50ms, you can show visitors on a live 3D globe as they arrive. When there are no cold starts, your tracking script never degrades the sites it monitors. When you run in 300+ locations, geolocation comes free.

The stack (Cloudflare Workers + Hono.js + Supabase + KV) is production-ready for this workload. The rough edges are real (custom SDK clients, KV write limits, connection churn), but they are solvable problems, not architectural dead ends.

If you are building something that needs global sub-100ms response times with real-time downstream updates, this pattern works. The event source runs at the edge, hot state lives in KV, cold state lives in PostgreSQL, and WebSockets bridge the gap to the client.

This architecture powers zenovay.com, the real-time analytics platform we built for privacy-focused website analytics with 3D globe visualization.
