DEV Community

Cover image for How I Finally Finished InsightTrack — My Abandoned Self-Hosted Analytics Platform
Nishikanta Ray
Nishikanta Ray

Posted on

How I Finally Finished InsightTrack — My Abandoned Self-Hosted Analytics Platform

GitHub “Finish-Up-A-Thon” Challenge Submission

This is a submission for the GitHub Finish-Up-A-Thon Challenge


Project Details

Name InsightTrack
GitHub github.com/nishikantaray/insighttrack
Stack React 18 · Vite 5 · Tailwind CSS · Express 4 · Node.js 20 · PostgreSQL · DuckDB

Demo

Demo1

Demo 2

Demo 3

What I Built

InsightTrack is a self-hosted, privacy-first web analytics platform — my personal answer to the question "Why do I have to send all my users' data to Google just to know how many people visited my site?"

It tracks pageviews, sessions, conversions, user flows, and engagement — all on your own infrastructure, with no cookies and no third-party data sharing. The architecture uses PostgreSQL for all writes and DuckDB as an embedded columnar OLAP engine for analytics reads, achieving 10-100× the query speed of a traditional row-store setup.

I started InsightTrack during a long weekend when I was frustrated with the privacy implications of Google Analytics and the price tags on alternatives like Plausible and Fathom. The idea was simple: build a clean, self-hostable dashboard that you could drop a <script> tag into and be live in 5 minutes.

I got a working v1 out. It had a dashboard, auth, an onboarding wizard, basic charts, funnels, and a Docker Compose file that actually ran. Then I stopped. The project sat in a local folder for months — almost good enough, but not quite. The DuckDB sync was fragile. There was no dark mode on half the pages. Dozens of planned features were never built. Query performance fell apart past 30 days of data.

The bones were good. I just never gave it the finish it deserved.

How it works

Your Website
  └── analytics.js (tracking script, ~2 KB)
        │  POST /api/track/event
        ▼
  Express API (Node.js)
        │  Writes
        ▼
  PostgreSQL ──── Incremental sync worker (every 5 min)
                        │
                        ▼
                    DuckDB (OLAP reads)
                        │
                        ▼
  React Dashboard ◀─── GET /api/analytics/*
Enter fullscreen mode Exit fullscreen mode
  • Write path: The tracking script sends events via navigator.sendBeacon. Express validates and inserts into PostgreSQL. Response is immediate — DuckDB is never in the write path.
  • Sync path: A high-watermark worker copies new rows from PostgreSQL → DuckDB incrementally, never rebuilding from scratch.
  • Read path: All 17 analytics endpoints read exclusively from DuckDB, running columnar aggregations 10-100× faster than a row-store query.
  • Auth: JWT-based authentication. All analytics routes are protected. Only /api/track/* and /api/auth/* are public.

PostgreSQL schema (source of truth)

Table Purpose
events Raw event log — pageviews, clicks, custom events
sessions Session metadata — duration, page count, entry/exit pages
sites Registered websites and their tracking keys
funnels Funnel definitions with ordered steps
daily_stats Pre-aggregated daily rollups
users Dashboard user accounts

Demo

Landing Page

Hero
Landing Hero

Features Grid
Features

How It Works
How It Works

Tech Stack
Tech Stack

Comparison Table vs Google Analytics
Comparison Table

Footer & CTA
Footer

Full landing page ![Full Landing Page](https://raw.githubusercontent.com/nishikantaray/insighttrack/main/screenshots/01-landing-full.png)

Authentication & Onboarding

Login Register
Login Register

Onboarding


Dashboard

Light Mode Dark Mode
Dashboard Dark Mode
KPI Cards + Sparklines Traffic Charts
KPI Cards Charts

Dashboard Bottom — Countries & Funnels


Analytics Pages

User Flow (Sankey) Real-Time
User Flow Realtime
Funnels Settings
Funnels Settings
Top Pages Profile
Pages Profile
Documentation API Reference
Docs API Reference
Sidebar Dark Landing
Sidebar Dark Landing

Stack: React 18 + Vite 5 + Tailwind CSS 3 · Express 4 + Node.js 20 · PostgreSQL + DuckDB · Docker Compose


The Comeback Story

Where it was (v1)

v1 had the core loop working: embed a tracking script → events land in PostgreSQL → a sync worker copies rows into a flat DuckDB table → the React dashboard reads from DuckDB. The fundamental architecture was sound.

What v1 shipped:

  • Dashboard with 4 KPI cards (Visitors, Pageviews, Bounce Rate, Avg. Session)
  • Traffic and pageview time-series charts (7d / 30d / 90d presets)
  • Top pages table
  • Basic conversion funnels
  • Device and country breakdown
  • JWT authentication, register/login, onboarding wizard with tracking snippet
  • Docker Compose deployment

But it had serious rough edges.

Performance wall. The DuckDB sync layer used a single flat table. On 90+ days of data, a KPI query scanned every row — ~620 ms on a local M1 machine. There was no partitioning, no archival strategy, and a crash meant rebuilding the entire table from scratch.

Half-finished features. There was a long list of planned but never-built ideas: UTM campaign tracking, comparison mode on charts, user flow analysis, a live visitor map on the Realtime page, engagement tracking, goal tracking, audience analytics, export to CSV/JSON — all unshipped.

UX gaps. The Settings page was a single form. The Documentation page was a placeholder. Half the pages lacked dark mode. There was no contextual help anywhere — a new user landing on the Funnels page had no idea what they were looking at.

No marketing story. The project had no landing page, no comparison table, no "why this instead of Google Analytics" pitch. It existed but didn't explain itself.


What changed (v1 → v2)

1. Hot+Cold Analytics Architecture

This was the most technically significant change. I replaced the flat DuckDB sync with a two-tier data lake:

  • Hot tier — DuckDB in-memory tables holding the last 30 days of events and sessions
  • Cold tier — Hive-partitioned Parquet files on disk for historical data, organized by site_id and event_date
data-lake/events/
  site_id=abc/
    event_date=2026-01-01/
      part-0001.parquet
    event_date=2026-01-02/
      part-0001.parquet
Enter fullscreen mode Exit fullscreen mode

A transparent UNION ALL view sits on top of both tiers, so every existing dashboard query kept working without modification:

CREATE OR REPLACE VIEW events AS
  SELECT * FROM events_hot
  UNION ALL
  SELECT * FROM read_parquet(
    'data-lake/events/site_id=*/event_date=*/part-0001.parquet',
    hive_partitioning = true
  );
Enter fullscreen mode Exit fullscreen mode

A dual-watermark incremental sync worker runs every 5 minutes, fetching only new rows from PostgreSQL using high-water marks (last_event_id for events, last_synced timestamp for sessions), then routing them to the correct tier based on age.

Result: 90-day queries dropped from ~620 ms → ~25 ms. A 25× speedup.

Recovery after a crash now resumes from the watermark instead of rebuilding the entire table.


2. Tracking Script — what it captures automatically

The auto-generated analytics.js (~2 KB) handles everything without configuration:

Data Point How
Page URL & path window.location.href
Referrer document.referrer
Browser, OS, device navigator.userAgent + screen width heuristics
Country Timezone via Intl.DateTimeFormat
Session Auto UUID in sessionStorage
Anonymous user ID Random UUID in localStorage — no cookies
Outbound link clicks click event listener
Scroll depth 25/50/75/100% milestones on page unload
Time on page Seconds per page sent on unload
Heatmap clicks x/y coordinates + CSS selector of every click
Rage clicks 3+ rapid clicks on same element within 1 second
Core Web Vitals LCP, FID, CLS, INP, TTFB via PerformanceObserver
JS errors window.onerror + unhandledrejection capture
Site search Form submit interception
UTM parameters utm_source, utm_medium, utm_campaign, utm_term, utm_content
SPA navigation popstate + pushState / replaceState listeners

Privacy built-in: DNT (navigator.doNotTrack) and GPC (navigator.globalPrivacyControl) are respected — if either is set, no data is collected.

Custom events are available via window.analytics.track('event_name', { ...properties }).


3. Complete v2 feature set

Core Dashboard

The main dashboard now has comparison mode — every chart can overlay the current period against the previous period. All 4 KPI cards have sparkline mini-charts. Date picker supports custom ranges in addition to presets.

Real-Time

  • Active visitor count (users in last 5 minutes, with pulsing indicator)
  • Interactive world map pinpointing active visitors by country
  • Live event stream — feed of the last 50–100 events as they happen
  • Active pages ranked by current visitor count
  • Real-time device breakdown

Engagement Analytics

Four metrics measured automatically by the tracking script:

Metric What it shows
Scroll Depth Per-page average scroll %; milestone table (25/50/75/100%)
Click Heatmaps x/y dot map per page, top clicked elements ranked
Rage Click Detection Pages and elements triggering 3+ rapid clicks within 1 second
Time on Page Average, median, min/max read time per page

User Flow & Funnels

  • Sankey diagram showing entry pages → transitions → exit pages with proportional path thickness
  • Multi-step funnel builder (e.g., Landing → Signup → Checkout → Purchase) with per-stage drop-off and conversion rate
  • Saved funnel definitions

Conversion & Goals

  • Goal types: Pageview (URL match), Event (custom event), Duration (time on page), Scroll Depth (threshold)
  • A/B test tracking — create experiments, assign variants, compare conversion rates
  • Revenue attribution — attach dollar values to purchase events, see total revenue and average order value by traffic source

Audience Analytics

  • New vs. returning visitors — pie/donut chart + daily trend
  • Cohort analysis — heatmap grid of Day 1 / Day 7 / Day 30 retention by first-visit week
  • Visitor segments — filter by device, browser, OS, country and compare KPIs side by side

Content Analytics

  • Entry pages — top landing pages ranked by visitor count and entry rate
  • Exit pages — top departure pages ranked by exit count and exit rate
  • Page flow — where visitors go after each page
  • Site search — queries captured from form submit interception with frequency counts

Acquisition Analytics

  • Full UTM campaign dashboard — source, medium, campaign, term, content breakdowns with visitor counts, bounce rates, and session duration
  • Social media tab — automatic classification of Facebook, Twitter/X, LinkedIn, Instagram, YouTube, Reddit, Pinterest, TikTok traffic
  • Search keyword tracking — organic and paid keywords driving traffic
  • UTM link builder — generate tagged URLs directly in the dashboard

Performance Monitoring

Core Web Vitals collected automatically:

Metric Good Needs Improvement Poor
LCP (Largest Contentful Paint) ≤ 2.5s ≤ 4.0s > 4.0s
FID (First Input Delay) ≤ 100ms ≤ 300ms > 300ms
CLS (Cumulative Layout Shift) ≤ 0.1 ≤ 0.25 > 0.25
INP (Interaction to Next Paint) ≤ 200ms ≤ 500ms > 500ms
TTFB (Time to First Byte) ≤ 800ms ≤ 1800ms > 1800ms

Color-coded metric cards (green/yellow/red) + per-page breakdown table.

JavaScript error tracking: message, source file, line/column, stack trace, error trend chart over time.

Reporting & Export

  • Data export — JSON/CSV download for KPI, traffic, pages, sources data; PNG export for charts
  • Annotations — pin notes to specific dates on charts (e.g., "launched new homepage") with category badges (deployment/marketing/incident)
  • Scheduled reports — configure daily/weekly/monthly email delivery with recipient list
  • Custom dashboards — save personalized widget layouts

Multi-Site Management

  • Add and manage unlimited websites under one account
  • Per-site tracking snippets with one-click copy
  • Per-site data retention policies (30/90/180/365 days)
  • Automated cleanup of expired events and sessions

4. UX and documentation overhaul

  • Settings page rebuilt as a 4-tab layout: General, Notifications/Alerts, Site Manager, and Privacy
  • All 13 analytics pages gained a collapsible PageNote banner with a business explanation and a developer tip per page
  • InfoTooltip on every MetricCard and ChartCard
  • Documentation page rebuilt with dual-audience tabs (Business Owner / Developer) and an interactive 4-view architecture diagram
  • Dark mode enforced consistently across every page and component

5. Marketing landing page

Hero, features grid, 3-step "How It Works", tech stack cards, and a side-by-side comparison table vs. Google Analytics across 9 dimensions.


Before vs. After

Dimension v1 (Before) v2 (After)
90-day query time ~620 ms ~25 ms (25×)
DuckDB architecture Single flat table, full scan Hot (in-memory) + Cold (Parquet), UNION view
Sync on crash Full table rebuild Resumes from dual watermark
Analytics pages 7 13
Tracking data points 8 19
Real-Time page Active count only Map + event stream + device breakdown
Engagement tracking None Scroll depth, heatmaps, rage clicks, time on page
Conversion & Goals Funnels only Goals, A/B testing, revenue attribution
Audience analytics None New/returning, cohorts, segments
Content analytics Top pages only Entry/exit pages, page flow, site search
Acquisition analytics None UTM campaigns, social, keywords, link builder
Performance monitoring None Core Web Vitals + JS error tracking
Reporting None Export, annotations, scheduled reports, custom dashboards
Multi-site Single site Unlimited sites with per-site settings
Comparison mode None Previous-period overlay on all charts
Export None CSV / JSON / PNG
Settings page Single form 4-tab layout with site manager
Contextual help None PageNote + InfoTooltip on every page
Dark mode coverage Partial Complete
Marketing landing page None Full page with comparison table

My Experience with GitHub Copilot

I'll be direct: I would not have shipped v2 without GitHub Copilot. Not because the individual problems were too hard — but because there were so many of them, and I was coming back to code I hadn't touched in months.

The architecture lift. The Hot+Cold DuckDB design required a new sync worker, new schema migrations, new UNION views, and updates to every existing query file simultaneously. Copilot helped me work through design decisions conversationally — I'd describe the constraint ("crash recovery must resume from a watermark, not rebuild the whole table") and it would walk me through the dual-watermark pattern, flag edge cases I hadn't considered (what happens to a row that ages from hot to cold between two sync cycles?), and generate the implementation that fit the existing codebase style.

The feature sweep. Going from 8 tracked data points to 19, from 7 pages to 13, from basic funnels to engagement tracking, cohort analysis, Web Vitals, and revenue attribution is a context-switching marathon. Copilot compressed the ramp-up time on each feature dramatically. When I opened the Acquisition page for the first time in months, Copilot immediately understood the existing data model, knew which UTM fields were already being captured in the tracking script, and scaffolded the campaign dashboard components that matched the existing useAnalytics hook pattern — not generic code, but code that fit this project.

The consistency work. The PageNote and InfoTooltip additions touched 13 pages. Adding Core Web Vitals collection to the tracking script required coordinating between analytics.js, the Express ingest route, the DuckDB schema, and the React performance page — four files that had to stay in sync. Copilot held the full context across all of them.

The documentation. Writing accurate technical docs for a system you know deeply is surprisingly tedious. Copilot drafted the architecture document for the hot/cold system from the actual code — referencing the real SQL, the real file paths, the real config variables — and I edited for clarity rather than writing from scratch.

The interaction style that worked best wasn't "write this for me." It was: here's the constraint, here's the existing pattern, here's where I'm confused. Copilot as a collaborator who has read your entire codebase is a genuinely different experience from Copilot as autocomplete. The finish-up phase of a project — juggling context across a dozen half-done things — is exactly where that kind of always-available, always-context-aware pair programmer pays off most.

InsightTrack v2 is the project I always meant to ship. It just needed time, a clear plan, and a better way to work through the backlog.


InsightTrack is open source — self-host your own privacy-first analytics in minutes.

Top comments (0)