Zenovay

Scoring anonymous web visitors without storing personal data

Most lead scoring assumes you know who the visitor is. We wanted to score how likely an anonymous session is to convert without storing anything that identifies a person: no third-party cookies, no enrichment vendor, no PII. Here is how we did it for our analytics tool, including the parts that did not work.

The goal

Given a live, anonymous session, output a number from 0 to 100 that says "pay attention to this one". The constraint: we never store who the person is. We score the behavior of a session id that rotates, not a profile.

The signals that actually correlate

We tested a lot of inputs against real conversion data. The ones that held up:

type SessionFeatures = {
  returnSessions: number;       // distinct prior sessions. strongest signal by far
  viewedPricing: boolean;
  pricingThenReturned: boolean; // viewed pricing, left, came back
  docsDepth: number;            // scroll + pages on docs/integrations
  pathShape: "browse" | "direct" | "compare";
};

The ones we dropped because they were noise: raw time on page (a backgrounded tab destroys it), single-session scroll depth on its own, and device type (much weaker than people claim).

The scoring function

For low-volume sites we do not use a model. Weighted rules beat a model until you have enough data to train on:

function scoreSession(f: SessionFeatures): number {
  let score = 0;
  score += Math.min(f.returnSessions, 5) * 12;   // return visits dominate
  if (f.viewedPricing) score += 10;
  if (f.pricingThenReturned) score += 20;         // intent
  score += Math.min(f.docsDepth, 100) * 0.2;
  if (f.pathShape === "compare") score += 8;
  return Math.min(Math.round(score), 100);
}

It is deliberately boring. A heuristic you can explain beats a black box you cannot, especially when a customer asks "why is this session a 78".
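
To make the "why is this session a 78" question concrete, here is a minimal sketch that returns the points each rule contributed alongside the total. The weights are copied from `scoreSession` above; the `explainScore` name and the `Contribution` shape are hypothetical, chosen only for illustration.

```typescript
type SessionFeatures = {
  returnSessions: number;
  viewedPricing: boolean;
  pricingThenReturned: boolean;
  docsDepth: number;
  pathShape: "browse" | "direct" | "compare";
};

// Hypothetical shape: one entry per rule that added points.
type Contribution = { rule: string; points: number };

function explainScore(f: SessionFeatures): { score: number; parts: Contribution[] } {
  // Same weights and caps as scoreSession; each rule records what it would add.
  const candidates: Contribution[] = [
    { rule: "return sessions (capped at 5)", points: Math.min(f.returnSessions, 5) * 12 },
    { rule: "viewed pricing", points: f.viewedPricing ? 10 : 0 },
    { rule: "viewed pricing, then returned", points: f.pricingThenReturned ? 20 : 0 },
    { rule: "docs depth (capped at 100)", points: Math.min(f.docsDepth, 100) * 0.2 },
    { rule: "compare-shaped path", points: f.pathShape === "compare" ? 8 : 0 },
  ];
  // Keep only the rules that fired, then clamp the rounded total to 100.
  const parts = candidates.filter((c) => c.points > 0);
  const raw = parts.reduce((sum, c) => sum + c.points, 0);
  return { score: Math.min(Math.round(raw), 100), parts };
}

const example = explainScore({
  returnSessions: 3,
  viewedPricing: true,
  pricingThenReturned: true,
  docsDepth: 60,
  pathShape: "browse",
});
// 36 (return sessions) + 10 (pricing) + 20 (pricing then returned) + 12 (docs) = 78
console.log(example.score, example.parts);
```

Because every rule is a plain addition, the breakdown is the explanation: the list of parts can be shown to a customer as is.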

Running it at the edge on the event stream

Scoring runs on each event as it arrives, so the score updates in near real time instead of in a nightly batch:

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const event = await req.json<AnalyticsEvent>();
    // Load what we know about this session so far, then fold in the new event
    const features = await loadSessionFeatures(env, event.sessionId);
    const updated = applyEvent(features, event);
    const score = scoreSession(updated);
    await env.SESSIONS.put(event.sessionId, JSON.stringify({ ...updated, score }), {
      expirationTtl: 60 * 30, // seconds (30 minutes): sessions expire, nothing persists about a person
    });
    // Label the body as JSON so clients parse it correctly
    return new Response(JSON.stringify({ score }), {
      headers: { "content-type": "application/json" },
    });
  },
};
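
The worker calls `applyEvent` without showing it. Below is a minimal sketch of what such a reducer could look like, assuming a hypothetical `AnalyticsEvent` shape (`kind`, `path`, `scrollPercent`) and made-up increments for `docsDepth`. None of these details come from the shipped code. It leaves `returnSessions`, `pricingThenReturned`, and `pathShape` untouched, since they depend on information this sketch does not model.

```typescript
type SessionFeatures = {
  returnSessions: number;
  viewedPricing: boolean;
  pricingThenReturned: boolean;
  docsDepth: number;
  pathShape: "browse" | "direct" | "compare";
};

// Hypothetical event shape, assumed for this sketch only.
type AnalyticsEvent = {
  sessionId: string;
  kind: "pageview" | "scroll";
  path: string;           // e.g. "/pricing" or "/docs/install"
  scrollPercent?: number; // 0 to 100, present on scroll events
};

function emptyFeatures(): SessionFeatures {
  return {
    returnSessions: 0,
    viewedPricing: false,
    pricingThenReturned: false,
    docsDepth: 0,
    pathShape: "direct",
  };
}

// Pure reducer: returns a new features object and never mutates its input.
function applyEvent(f: SessionFeatures, e: AnalyticsEvent): SessionFeatures {
  const next: SessionFeatures = { ...f };
  const onDocs = e.path.startsWith("/docs") || e.path.startsWith("/integrations");
  if (e.kind === "pageview" && e.path.startsWith("/pricing")) next.viewedPricing = true;
  // Assumed increments: 10 per docs page, plus a tenth of the scroll percentage.
  if (onDocs && e.kind === "pageview") next.docsDepth += 10;
  if (onDocs && e.kind === "scroll") next.docsDepth += (e.scrollPercent ?? 0) / 10;
  return next;
}

const events: AnalyticsEvent[] = [
  { sessionId: "s1", kind: "pageview", path: "/pricing" },
  { sessionId: "s1", kind: "pageview", path: "/docs/install" },
  { sessionId: "s1", kind: "scroll", path: "/docs/install", scrollPercent: 50 },
];

let features = emptyFeatures();
for (const e of events) features = applyEvent(features, e);
console.log(features); // viewedPricing is true, docsDepth is 15
```

Keeping the reducer pure means the same function can be replayed over recorded events in a test, with no storage or network involved.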

When we switch to a learned model

We switch only once a site has enough conversions to train on. Below that, a model overfits to a handful of events and does worse than the rules. We gate it on volume, not on whether it sounds impressive.
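
A minimal sketch of such a volume gate, assuming a per-site count of observed conversions is available. The names `chooseScorer` and `MIN_CONVERSIONS_FOR_MODEL` are hypothetical, and the threshold value is a placeholder, not a real cut-off.

```typescript
type Scorer = "rules" | "model";

// Placeholder threshold: an assumption for illustration, not a recommended value.
const MIN_CONVERSIONS_FOR_MODEL = 500;

// Use the learned model only when the site has enough conversions to train on.
function chooseScorer(
  conversionsObserved: number,
  minConversions: number = MIN_CONVERSIONS_FOR_MODEL,
): Scorer {
  return conversionsObserved >= minConversions ? "model" : "rules";
}

console.log(chooseScorer(12));   // "rules"
console.log(chooseScorer(5000)); // "model"
```

The gate depends only on data volume, so a site moves to the model when its own conversion count crosses the threshold, not when someone flips a flag.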

The honest limitations

  • You are scoring a session, not a person. A high score might be a competitor doing research.
  • No identity means no true cross-device tracking. We do not pretend otherwise.
  • Low-volume sites get scores barely better than a coin flip, and the product says so rather than faking confidence.

Takeaway

You can get genuinely useful visitor scoring without cookies or PII, but start with explainable weighted rules, not a model. The model is the last step, not the first.


Disclosure: my co-founder and I build a web analytics tool called Zenovay that does this. The approach above is what we actually shipped. See zenovay.com.