
Yuchen Lin

Building FlowLens-Web: A HAR-Driven Data-Flow Observatory for Tracking Research

I wanted a practical answer to one question:

How do we measure web tracking signals in a way that is reproducible, explainable, and non-invasive?

This post walks through the approach, what we built, and what we learned from a 10-site batch run.

TL;DR

FlowLens-Web is a TypeScript CLI that:

  • records browser sessions with Playwright + HAR,
  • extracts identifier-like request signals,
  • scores evidence levels (L1-L5),
  • reports cross-domain reuse and cross-run persistence,
  • outputs Markdown + Mermaid summaries.

It is a research/measurement tool, not a blocker.

Architecture

Core stack:

  • Node.js + TypeScript
  • Playwright (Chromium)
  • tldts (eTLD+1 classification)
  • SHA-256 hashing for safe identifier matching

Pipeline:

  1. run scripted browsing scenario
  2. save HAR
  3. parse entries + normalize request metadata
  4. extract candidate identifier fields
  5. compute reuse/persistence signals
  6. assign evidence levels
  7. generate reports (case, matrix, A/B, funnel, longitudinal)
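Steps 3-4 can be sketched as follows. The types mirror the HAR 1.2 shape; the `ID_PARAM_HINTS` heuristic and the function name are illustrative assumptions, not FlowLens's real extraction rules:

```typescript
// Sketch of HAR parsing + candidate-identifier extraction (pipeline steps 3-4).
interface HarQueryParam { name: string; value: string; }
interface HarEntry { request: { url: string; queryString: HarQueryParam[] }; }
interface Har { log: { entries: HarEntry[] }; }

// Rough heuristic: id-like parameter names carrying reasonably long values.
const ID_PARAM_HINTS = /(^|_)(id|uid|cid|gclid|fbclid|visitor)/i;

export function extractCandidates(
  har: Har,
): { url: string; param: string; value: string }[] {
  const out: { url: string; param: string; value: string }[] = [];
  for (const entry of har.log.entries) {
    for (const { name, value } of entry.request.queryString) {
      if (ID_PARAM_HINTS.test(name) && value.length >= 8) {
        out.push({ url: entry.request.url, param: name, value });
      }
    }
  }
  return out;
}
```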

Evidence Model

We use explicit confidence tiers:

  • L1: third-party domain observed
  • L2: identifier-like field observed
  • L3: repeated within run
  • L4: cross-domain hash reuse
  • L5: cross-run persistence

This keeps interpretation honest: a higher level means stronger network-level evidence, not proof of platform-internal ad decisions.
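The tiers above can be expressed as a simple scoring function. The field names here are hypothetical; FlowLens's actual scoring may differ:

```typescript
// Sketch: mapping per-signal observations onto the L1-L5 evidence tiers.
interface SignalObservation {
  thirdPartyDomain: boolean;     // L1
  identifierLikeField: boolean;  // L2
  repeatedWithinRun: boolean;    // L3
  crossDomainHashReuse: boolean; // L4
  crossRunPersistence: boolean;  // L5
}

export function evidenceLevel(s: SignalObservation): number {
  // Return the highest satisfied tier; 0 means no signal observed.
  if (s.crossRunPersistence) return 5;
  if (s.crossDomainHashReuse) return 4;
  if (s.repeatedWithinRun) return 3;
  if (s.identifierLikeField) return 2;
  if (s.thirdPartyDomain) return 1;
  return 0;
}
```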

CLI Workflows

Matrix (multi-site)

```shell
npm run flowlens -- study-matrix \
  --sites https://www.google.com,https://www.youtube.com \
  --scenarios baseline,engaged,ad-click \
  --runs 3
```

A/B (causal contrast)

```shell
npm run flowlens -- study-ab \
  --url https://www.youtube.com \
  --control baseline \
  --treatment ad-click \
  --runs 3
```

Funnel (stage deltas)

```shell
npm run flowlens -- study-funnel \
  --url https://www.google.com \
  --query running+shoes \
  --runs 3
```

Longitudinal (stability over samples)

```shell
npm run flowlens -- study-longitudinal \
  --url https://www.wikipedia.org \
  --samples 7 \
  --runs 1
```

Full-Batch Findings (Current Run)

Batch design:

  • 10 sites
  • 3 scenarios
  • target 3 runs/scenario

Outcome:

  • 9/10 sites produced complete scenario outputs
  • Amazon repeatedly failed under this environment's runtime constraints (timeouts and session closures); rather than being dropped, it was recorded as explicit failure evidence

Pattern-level observations:

  • signal intensity varied strongly by site/scenario
  • deeper interaction stages often increased observed signal metrics
  • some content-centric cases remained low-signal across repeated runs

Why the Redaction Layer Matters

Raw tokens are not published.
Instead, FlowLens stores:

  • redacted preview
  • token length
  • stable hash for equality/reuse checks

That gives us reproducibility without leaking sensitive raw values.
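A minimal sketch of such a record, assuming `node:crypto` for the stable hash; the field and function names are illustrative:

```typescript
import { createHash } from "node:crypto";

// Redaction-safe token record: enough to detect reuse, never the raw value.
interface RedactedToken {
  preview: string; // first few characters plus an ellipsis
  length: number;  // original token length
  hash: string;    // SHA-256 hex digest, used for equality/reuse checks
}

export function redactToken(raw: string, previewChars = 4): RedactedToken {
  return {
    preview: raw.slice(0, previewChars) + "…",
    length: raw.length,
    hash: createHash("sha256").update(raw).digest("hex"),
  };
}
```

Two sightings of the same identifier produce the same hash, so cross-domain reuse (L4) and cross-run persistence (L5) can be checked without ever comparing or publishing raw values.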

What You Can Claim Responsibly

From this tooling and dataset, you can claim:

  • network-observed data-flow signals vary by context,
  • controlled behavior changes can shift measured signals,
  • reuse/persistence patterns are measurable in a repeatable way.

You cannot claim from network traces alone:

  • definitive platform-internal ad decision logic,
  • person-level identity resolution.

Engineering Notes

What worked well:

  • modular analysis pipeline
  • evidence-level abstraction for communication quality
  • the matrix, funnel, A/B, and longitudinal studies complement each other

What remains hard:

  • large-site reliability under fixed timeouts
  • anti-bot/session constraints
  • balancing coverage vs runtime cost

Read the Full Materials

  • Repository: https://github.com/yul761/FlowLens
  • Full-batch summary: data/reports/published/formal-v1-full-overall-summary.md
  • Academic-style article: data/reports/published/public-v1-academic-article.md

If You Want to Build on This

Next useful extensions:

  1. stronger single-variable controls (consent, login, click-id toggles)
  2. bootstrap confidence intervals on key deltas
  3. cross-environment runs (device profile/region)
  4. publication-grade data manifests
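Extension 2 could start from a plain percentile bootstrap over per-run metrics. This is a generic sketch, not FlowLens code; `bootstrapDeltaCI` is a hypothetical name:

```typescript
// Percentile-bootstrap confidence interval for the mean delta
// between treatment and control run groups.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function resample(xs: number[]): number[] {
  // Sample with replacement, same size as the original group.
  return xs.map(() => xs[Math.floor(Math.random() * xs.length)]);
}

export function bootstrapDeltaCI(
  treatment: number[],
  control: number[],
  iterations = 2000,
  alpha = 0.05,
): [number, number] {
  const deltas: number[] = [];
  for (let i = 0; i < iterations; i++) {
    deltas.push(mean(resample(treatment)) - mean(resample(control)));
  }
  deltas.sort((a, b) => a - b);
  const lo = deltas[Math.floor((alpha / 2) * iterations)];
  const hi = deltas[Math.floor((1 - alpha / 2) * iterations)];
  return [lo, hi];
}
```

With only three runs per scenario the intervals will be wide, which is itself useful: it makes the uncertainty in any reported delta explicit.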

Closing

A lot of tracking debates are stuck between oversimplified claims and opaque internals.
A HAR-first, evidence-tier approach gives a practical middle path: measurable, repeatable, and honest about uncertainty.
