Alex

Posted on Oct 16

How I Built a Serverless, Privacy-First Analytics Tool on the Cloudflare Stack

#webdev #javascript #serverless #showdev

I've always been frustrated by the state of web analytics. You're either stuck with a complex, privacy-invasive giant like Google Analytics or a simpler tool that might not give you the deep insights you need. I wanted to build something that hit the sweet spot: powerful, a joy to use, and built on a foundation of absolute privacy.

Today, I want to show you how I built the backend for my project, Gridpoint Analytics, on a completely serverless Cloudflare stack.

Why Go All-In on Cloudflare?

Before diving into the code, let's talk about the "why." I chose this stack for three main reasons:

Performance: Cloudflare's edge network is everywhere. By running my logic as close to the user as possible, data ingestion is incredibly fast.
Simplicity & Cost: No servers to manage, no containers to configure. It's a true serverless experience, and the cost scales beautifully from zero.
Trust: I'm building a privacy-first tool. It felt right to build it on infrastructure that is also deeply invested in building a better, more private web.

The Core Architecture: A Quick Tour

The backend has three main jobs: ingest data, anonymize it, and serve it back to the dashboard. Here’s how each piece works.

1. Data Ingestion (Cloudflare Workers)

When a visitor loads a site with Gridpoint Analytics, our tiny tracking script sends a beacon with non-sensitive data (URL, referrer, screen size) to a Cloudflare Worker endpoint.
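
As a rough illustration of what that beacon could look like, here is a minimal sketch of how a tracking script might assemble its request. The `buildBeaconUrl` helper, the `collect.example.com` endpoint, and the query parameter names (`url`, `referrer`, `w`, `h`) are all assumptions made up for this example, not the actual Gridpoint script.

```javascript
// Hypothetical helper: builds the beacon URL from non-sensitive page data.
// The parameter names (url, referrer, w, h) are assumptions for this sketch.
function buildBeaconUrl(endpoint, { url, referrer, width, height }) {
  const beacon = new URL(endpoint);
  beacon.searchParams.set('url', url);
  // Skip the referrer when the browser reports none (empty string)
  if (referrer) beacon.searchParams.set('referrer', referrer);
  beacon.searchParams.set('w', String(width));
  beacon.searchParams.set('h', String(height));
  return beacon.toString();
}

// In a browser, read the values from the page and request the URL as an
// image, which pairs naturally with an endpoint that answers with a GIF.
if (typeof window !== 'undefined') {
  new Image().src = buildBeaconUrl('https://collect.example.com/p', {
    url: window.location.href,
    referrer: document.referrer,
    width: window.screen.width,
    height: window.screen.height,
  });
}
```

Using `URL` and `searchParams` means the page URL is percent-encoded for you, so a tracked URL with its own query string survives the trip intact.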

The Worker is the front door. It grabs the request data, including the IP address and User-Agent, and prepares it for the most important step: anonymization.

JavaScript

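// A simplified look inside the Cloudflare Worker
// Transparent 1x1 GIF, decoded once at startup
const GIF_BASE64 = 'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7';
const pixel = Uint8Array.from(atob(GIF_BASE64), (c) => c.charCodeAt(0));

export default {
  async fetch(request, env) {
    // Assumption: the beacon carries the page URL and referrer as `url` and
    // `referrer` query parameters; request.url is only the Worker endpoint.
    const { searchParams } = new URL(request.url);
    const referrer = searchParams.get('referrer');
    let pathname;
    try {
      pathname = new URL(searchParams.get('url')).pathname;
    } catch {
      return new Response('Bad Request', { status: 400 });
    }
    const ip = request.headers.get('CF-Connecting-IP');
    const userAgent = request.headers.get('User-Agent');

    // Pass this data on for processing and storage (helper defined elsewhere)
    await processAnalyticsData({ ip, userAgent, pathname, referrer, env });

    // Return a 1x1 pixel gif to the client
    return new Response(pixel, { headers: { 'Content-Type': 'image/gif' } });
  }
};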

2. The Privacy Layer: Hashing and Anonymization

This is the heart of the system. We never store IP addresses or raw User-Agents. To count unique visitors for a 24-hour period, we create a hash using a daily-rotating salt.

The process is simple:

Concatenate the anonymized IP, User-Agent, and the daily salt.

Hash the resulting string using a fast, non-reversible algorithm like SHA-256.

Store the hash, and immediately discard the original PII.

This hash is unique enough to identify a visitor for one day without ever knowing who they are.
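
The three steps above can be sketched with the Web Crypto API, which Workers expose as a global. This is a minimal sketch, not the production code: the `hashVisitor` name, the `|` separator, and the hex output format are assumptions, and how the daily salt is generated and rotated is left out because it is not covered here.

```javascript
// Web Crypto is a global in Workers and modern Node; the fallback only
// matters when running this sketch on older Node versions.
const webCrypto = globalThis.crypto ?? require('node:crypto').webcrypto;

// Hypothetical helper: hex SHA-256 of the anonymized IP, User-Agent, and daily salt.
async function hashVisitor(anonymizedIp, userAgent, dailySalt) {
  // A separator keeps ("ab", "c") and ("a", "bc") from producing the same input
  const input = [anonymizedIp, userAgent, dailySalt].join('|');
  const bytes = new TextEncoder().encode(input);
  const digest = await webCrypto.subtle.digest('SHA-256', bytes);
  // Convert the ArrayBuffer to a lowercase hex string for storage as TEXT
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}
```

With the same inputs and the same salt, the function returns the same 64-character string, which is what lets repeat visits within a day collapse into one visitor. Once the salt rotates, the same visitor produces an unrelated hash, so visits cannot be linked across days.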

3. Data Storage (Cloudflare D1)

Once the data is anonymized, the Worker writes it to Cloudflare D1, which is essentially SQLite on the edge. For an analytics tool, where you're doing a high volume of writes and aggregated reads, it's a surprisingly good fit.

My primary table schema looks something like this:

SQL

CREATE TABLE pageviews (
  id TEXT PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  pathname TEXT NOT NULL,
  referrer TEXT,
  user_hash TEXT NOT NULL,
  site_id TEXT NOT NULL
);
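
To show how a row might land in this table, here is a minimal sketch using D1's `prepare` / `bind` / `run` calls. The `storePageview` helper is hypothetical, and generating the `id` with `crypto.randomUUID()` and storing `timestamp` as epoch milliseconds are assumptions; the schema only says `TEXT` and `INTEGER`.

```javascript
// Hypothetical helper: inserts one anonymized pageview row into D1.
// `db` is a D1 binding such as env.DB.
async function storePageview(db, { pathname, referrer, userHash, siteId }) {
  // Assumption: random UUID ids; the fallback is only for older Node versions
  const id = (globalThis.crypto ?? require('node:crypto')).randomUUID();
  const timestamp = Date.now(); // Assumption: epoch milliseconds

  await db
    .prepare(
      'INSERT INTO pageviews (id, timestamp, pathname, referrer, user_hash, site_id) ' +
      'VALUES (?, ?, ?, ?, ?, ?)'
    )
    // D1 rejects undefined bind values, so a missing referrer becomes NULL
    .bind(id, timestamp, pathname, referrer ?? null, userHash, siteId)
    .run();

  return id;
}
```

Only the hash reaches this function. The IP address and User-Agent never appear in the insert, which keeps the "discard the original PII" step honest.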

4. The API (Cloudflare Workers Again)

The dashboard frontend needs a way to get this data back. Another Cloudflare Worker acts as our API. It authenticates the user, takes a request for a specific site and date range, and queries D1 to pull the aggregated data.

JavaScript

// A simplified API endpoint to get pageviews
async function handleApiRequest(request, env) {
  // ... authentication logic ...

  // Assumption: dateRange is { start, end }, in the same units as the timestamp column
  const { siteId, dateRange } = await request.json();
  const ps = env.DB.prepare(
    'SELECT pathname, COUNT(*) AS views FROM pageviews ' +
    'WHERE site_id = ? AND timestamp >= ? AND timestamp < ? ' +
    'GROUP BY pathname'
  );
  const { results } = await ps.bind(siteId, dateRange.start, dateRange.end).all();

  return new Response(JSON.stringify(results), { headers: { 'Content-Type': 'application/json' } });
}

See the official Cloudflare D1 docs for more info on querying.
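
For a dashboard chart, the same `prepare` / `bind` / `all` pattern extends to time-bucketed queries. The sketch below groups pageviews by UTC day and counts unique visitors per day. The `getDailyStats` helper is hypothetical, and it assumes `timestamp` holds epoch milliseconds; if it holds seconds, drop the division by 1000.

```javascript
// Hypothetical helper: daily views and unique visitors for one site.
// Assumes `timestamp` is stored as epoch milliseconds.
async function getDailyStats(db, siteId, startMs, endMs) {
  const sql = `
    SELECT date(timestamp / 1000, 'unixepoch') AS day,
           COUNT(*) AS views,
           COUNT(DISTINCT user_hash) AS visitors
    FROM pageviews
    WHERE site_id = ? AND timestamp >= ? AND timestamp < ?
    GROUP BY day
    ORDER BY day`;
  const { results } = await db.prepare(sql).bind(siteId, startMs, endMs).all();
  return results;
}
```

Counting distinct hashes per day lines up with the daily salt rotation. Counting them across a multi-day range would count a returning visitor once per day, since their hash changes with each new salt.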

Challenges & Next Steps

Building on the edge has been an amazing experience, but it's not without its challenges. Learning the nuances of querying SQLite for complex time-series data was a big one.

This covers the backend, but what about the user-facing part? A great analytics tool should be a joy to use.

In Part 2, I'll break down how I built a fast, beautiful, and insightful dashboard with React, Recharts, and Tailwind CSS. Follow me to get notified when it drops!

Thanks for reading.
