This is a submission for the GitHub Finish-Up-A-Thon Challenge
Project Details
| Name | InsightTrack |
| GitHub | github.com/nishikantaray/insighttrack |
| Stack | React 18 · Vite 5 · Tailwind CSS · Express 4 · Node.js 20 · PostgreSQL · DuckDB |
Demo
What I Built
InsightTrack is a self-hosted, privacy-first web analytics platform — my personal answer to the question "Why do I have to send all my users' data to Google just to know how many people visited my site?"
It tracks pageviews, sessions, conversions, user flows, and engagement — all on your own infrastructure, with no cookies and no third-party data sharing. The architecture uses PostgreSQL for all writes and DuckDB as an embedded columnar OLAP engine for analytics reads, achieving 10-100× the query speed of a traditional row-store setup.
I started InsightTrack during a long weekend when I was frustrated with the privacy implications of Google Analytics and the price tags on alternatives like Plausible and Fathom. The idea was simple: build a clean, self-hostable dashboard that you could drop a <script> tag into and be live in 5 minutes.
I got a working v1 out. It had a dashboard, auth, an onboarding wizard, basic charts, funnels, and a Docker Compose file that actually ran. Then I stopped. The project sat in a local folder for months — almost good enough, but not quite. The DuckDB sync was fragile. There was no dark mode on half the pages. Dozens of planned features were never built. Query performance fell apart past 30 days of data.
The bones were good. I just never gave it the finish it deserved.
How it works
Your Website
└── analytics.js (tracking script, ~2 KB)
│ POST /api/track/event
▼
Express API (Node.js)
│ Writes
▼
PostgreSQL ──── Incremental sync worker (every 5 min)
│
▼
DuckDB (OLAP reads)
│
▼
React Dashboard ◀─── GET /api/analytics/*
-
Write path: The tracking script sends events via
navigator.sendBeacon. Express validates and inserts into PostgreSQL. Response is immediate — DuckDB is never in the write path. - Sync path: A high-watermark worker copies new rows from PostgreSQL → DuckDB incrementally, never rebuilding from scratch.
- Read path: All 17 analytics endpoints read exclusively from DuckDB, running columnar aggregations 10-100× faster than a row-store query.
-
Auth: JWT-based authentication. All analytics routes are protected. Only
/api/track/*and/api/auth/*are public.
PostgreSQL schema (source of truth)
| Table | Purpose |
|---|---|
events |
Raw event log — pageviews, clicks, custom events |
sessions |
Session metadata — duration, page count, entry/exit pages |
sites |
Registered websites and their tracking keys |
funnels |
Funnel definitions with ordered steps |
daily_stats |
Pre-aggregated daily rollups |
users |
Dashboard user accounts |
Demo
Landing Page
Comparison Table vs Google Analytics

Authentication & Onboarding
| Login | Register |
|---|---|
![]() |
![]() |
Dashboard
| Light Mode | Dark Mode |
|---|---|
![]() |
![]() |
| KPI Cards + Sparklines | Traffic Charts |
|---|---|
![]() |
![]() |
Analytics Pages
| User Flow (Sankey) | Real-Time |
|---|---|
![]() |
![]() |
| Funnels | Settings |
|---|---|
![]() |
![]() |
| Top Pages | Profile |
|---|---|
![]() |
![]() |
| Documentation | API Reference |
|---|---|
![]() |
![]() |
| Sidebar | Dark Landing |
|---|---|
![]() |
![]() |
Stack: React 18 + Vite 5 + Tailwind CSS 3 · Express 4 + Node.js 20 · PostgreSQL + DuckDB · Docker Compose
The Comeback Story
Where it was (v1)
v1 had the core loop working: embed a tracking script → events land in PostgreSQL → a sync worker copies rows into a flat DuckDB table → the React dashboard reads from DuckDB. The fundamental architecture was sound.
What v1 shipped:
- Dashboard with 4 KPI cards (Visitors, Pageviews, Bounce Rate, Avg. Session)
- Traffic and pageview time-series charts (7d / 30d / 90d presets)
- Top pages table
- Basic conversion funnels
- Device and country breakdown
- JWT authentication, register/login, onboarding wizard with tracking snippet
- Docker Compose deployment
But it had serious rough edges.
Performance wall. The DuckDB sync layer used a single flat table. On 90+ days of data, a KPI query scanned every row — ~620 ms on a local M1 machine. There was no partitioning, no archival strategy, and a crash meant rebuilding the entire table from scratch.
Half-finished features. There was a long list of planned but never-built ideas: UTM campaign tracking, comparison mode on charts, user flow analysis, a live visitor map on the Realtime page, engagement tracking, goal tracking, audience analytics, export to CSV/JSON — all unshipped.
UX gaps. The Settings page was a single form. The Documentation page was a placeholder. Half the pages lacked dark mode. There was no contextual help anywhere — a new user landing on the Funnels page had no idea what they were looking at.
No marketing story. The project had no landing page, no comparison table, no "why this instead of Google Analytics" pitch. It existed but didn't explain itself.
What changed (v1 → v2)
1. Hot+Cold Analytics Architecture
This was the most technically significant change. I replaced the flat DuckDB sync with a two-tier data lake:
- Hot tier — DuckDB in-memory tables holding the last 30 days of events and sessions
-
Cold tier — Hive-partitioned Parquet files on disk for historical data, organized by
site_idandevent_date
data-lake/events/
site_id=abc/
event_date=2026-01-01/
part-0001.parquet
event_date=2026-01-02/
part-0001.parquet
A transparent UNION ALL view sits on top of both tiers, so every existing dashboard query kept working without modification:
CREATE OR REPLACE VIEW events AS
SELECT * FROM events_hot
UNION ALL
SELECT * FROM read_parquet(
'data-lake/events/site_id=*/event_date=*/part-0001.parquet',
hive_partitioning = true
);
A dual-watermark incremental sync worker runs every 5 minutes, fetching only new rows from PostgreSQL using high-water marks (last_event_id for events, last_synced timestamp for sessions), then routing them to the correct tier based on age.
Result: 90-day queries dropped from ~620 ms → ~25 ms. A 25× speedup.
Recovery after a crash now resumes from the watermark instead of rebuilding the entire table.
2. Tracking Script — what it captures automatically
The auto-generated analytics.js (~2 KB) handles everything without configuration:
| Data Point | How |
|---|---|
| Page URL & path | window.location.href |
| Referrer | document.referrer |
| Browser, OS, device |
navigator.userAgent + screen width heuristics |
| Country | Timezone via Intl.DateTimeFormat
|
| Session | Auto UUID in sessionStorage
|
| Anonymous user ID | Random UUID in localStorage — no cookies |
| Outbound link clicks |
click event listener |
| Scroll depth | 25/50/75/100% milestones on page unload |
| Time on page | Seconds per page sent on unload |
| Heatmap clicks | x/y coordinates + CSS selector of every click |
| Rage clicks | 3+ rapid clicks on same element within 1 second |
| Core Web Vitals | LCP, FID, CLS, INP, TTFB via PerformanceObserver
|
| JS errors |
window.onerror + unhandledrejection capture |
| Site search | Form submit interception |
| UTM parameters |
utm_source, utm_medium, utm_campaign, utm_term, utm_content
|
| SPA navigation |
popstate + pushState / replaceState listeners |
Privacy built-in: DNT (navigator.doNotTrack) and GPC (navigator.globalPrivacyControl) are respected — if either is set, no data is collected.
Custom events are available via window.analytics.track('event_name', { ...properties }).
3. Complete v2 feature set
Core Dashboard
The main dashboard now has comparison mode — every chart can overlay the current period against the previous period. All 4 KPI cards have sparkline mini-charts. Date picker supports custom ranges in addition to presets.
Real-Time
- Active visitor count (users in last 5 minutes, with pulsing indicator)
- Interactive world map pinpointing active visitors by country
- Live event stream — feed of the last 50–100 events as they happen
- Active pages ranked by current visitor count
- Real-time device breakdown
Engagement Analytics
Four metrics measured automatically by the tracking script:
| Metric | What it shows |
|---|---|
| Scroll Depth | Per-page average scroll %; milestone table (25/50/75/100%) |
| Click Heatmaps | x/y dot map per page, top clicked elements ranked |
| Rage Click Detection | Pages and elements triggering 3+ rapid clicks within 1 second |
| Time on Page | Average, median, min/max read time per page |
User Flow & Funnels
- Sankey diagram showing entry pages → transitions → exit pages with proportional path thickness
- Multi-step funnel builder (e.g., Landing → Signup → Checkout → Purchase) with per-stage drop-off and conversion rate
- Saved funnel definitions
Conversion & Goals
- Goal types: Pageview (URL match), Event (custom event), Duration (time on page), Scroll Depth (threshold)
- A/B test tracking — create experiments, assign variants, compare conversion rates
- Revenue attribution — attach dollar values to purchase events, see total revenue and average order value by traffic source
Audience Analytics
- New vs. returning visitors — pie/donut chart + daily trend
- Cohort analysis — heatmap grid of Day 1 / Day 7 / Day 30 retention by first-visit week
- Visitor segments — filter by device, browser, OS, country and compare KPIs side by side
Content Analytics
- Entry pages — top landing pages ranked by visitor count and entry rate
- Exit pages — top departure pages ranked by exit count and exit rate
- Page flow — where visitors go after each page
- Site search — queries captured from form submit interception with frequency counts
Acquisition Analytics
- Full UTM campaign dashboard — source, medium, campaign, term, content breakdowns with visitor counts, bounce rates, and session duration
- Social media tab — automatic classification of Facebook, Twitter/X, LinkedIn, Instagram, YouTube, Reddit, Pinterest, TikTok traffic
- Search keyword tracking — organic and paid keywords driving traffic
- UTM link builder — generate tagged URLs directly in the dashboard
Performance Monitoring
Core Web Vitals collected automatically:
| Metric | Good | Needs Improvement | Poor |
|---|---|---|---|
| LCP (Largest Contentful Paint) | ≤ 2.5s | ≤ 4.0s | > 4.0s |
| FID (First Input Delay) | ≤ 100ms | ≤ 300ms | > 300ms |
| CLS (Cumulative Layout Shift) | ≤ 0.1 | ≤ 0.25 | > 0.25 |
| INP (Interaction to Next Paint) | ≤ 200ms | ≤ 500ms | > 500ms |
| TTFB (Time to First Byte) | ≤ 800ms | ≤ 1800ms | > 1800ms |
Color-coded metric cards (green/yellow/red) + per-page breakdown table.
JavaScript error tracking: message, source file, line/column, stack trace, error trend chart over time.
Reporting & Export
- Data export — JSON/CSV download for KPI, traffic, pages, sources data; PNG export for charts
- Annotations — pin notes to specific dates on charts (e.g., "launched new homepage") with category badges (deployment/marketing/incident)
- Scheduled reports — configure daily/weekly/monthly email delivery with recipient list
- Custom dashboards — save personalized widget layouts
Multi-Site Management
- Add and manage unlimited websites under one account
- Per-site tracking snippets with one-click copy
- Per-site data retention policies (30/90/180/365 days)
- Automated cleanup of expired events and sessions
4. UX and documentation overhaul
- Settings page rebuilt as a 4-tab layout: General, Notifications/Alerts, Site Manager, and Privacy
-
All 13 analytics pages gained a collapsible
PageNotebanner with a business explanation and a developer tip per page -
InfoTooltipon everyMetricCardandChartCard - Documentation page rebuilt with dual-audience tabs (Business Owner / Developer) and an interactive 4-view architecture diagram
- Dark mode enforced consistently across every page and component
5. Marketing landing page
Hero, features grid, 3-step "How It Works", tech stack cards, and a side-by-side comparison table vs. Google Analytics across 9 dimensions.
Before vs. After
| Dimension | v1 (Before) | v2 (After) |
|---|---|---|
| 90-day query time | ~620 ms | ~25 ms (25×) |
| DuckDB architecture | Single flat table, full scan | Hot (in-memory) + Cold (Parquet), UNION view |
| Sync on crash | Full table rebuild | Resumes from dual watermark |
| Analytics pages | 7 | 13 |
| Tracking data points | 8 | 19 |
| Real-Time page | Active count only | Map + event stream + device breakdown |
| Engagement tracking | None | Scroll depth, heatmaps, rage clicks, time on page |
| Conversion & Goals | Funnels only | Goals, A/B testing, revenue attribution |
| Audience analytics | None | New/returning, cohorts, segments |
| Content analytics | Top pages only | Entry/exit pages, page flow, site search |
| Acquisition analytics | None | UTM campaigns, social, keywords, link builder |
| Performance monitoring | None | Core Web Vitals + JS error tracking |
| Reporting | None | Export, annotations, scheduled reports, custom dashboards |
| Multi-site | Single site | Unlimited sites with per-site settings |
| Comparison mode | None | Previous-period overlay on all charts |
| Export | None | CSV / JSON / PNG |
| Settings page | Single form | 4-tab layout with site manager |
| Contextual help | None |
PageNote + InfoTooltip on every page |
| Dark mode coverage | Partial | Complete |
| Marketing landing page | None | Full page with comparison table |
My Experience with GitHub Copilot
I'll be direct: I would not have shipped v2 without GitHub Copilot. Not because the individual problems were too hard — but because there were so many of them, and I was coming back to code I hadn't touched in months.
The architecture lift. The Hot+Cold DuckDB design required a new sync worker, new schema migrations, new UNION views, and updates to every existing query file simultaneously. Copilot helped me work through design decisions conversationally — I'd describe the constraint ("crash recovery must resume from a watermark, not rebuild the whole table") and it would walk me through the dual-watermark pattern, flag edge cases I hadn't considered (what happens to a row that ages from hot to cold between two sync cycles?), and generate the implementation that fit the existing codebase style.
The feature sweep. Going from 8 tracked data points to 19, from 7 pages to 13, from basic funnels to engagement tracking, cohort analysis, Web Vitals, and revenue attribution is a context-switching marathon. Copilot compressed the ramp-up time on each feature dramatically. When I opened the Acquisition page for the first time in months, Copilot immediately understood the existing data model, knew which UTM fields were already being captured in the tracking script, and scaffolded the campaign dashboard components that matched the existing useAnalytics hook pattern — not generic code, but code that fit this project.
The consistency work. The PageNote and InfoTooltip additions touched 13 pages. Adding Core Web Vitals collection to the tracking script required coordinating between analytics.js, the Express ingest route, the DuckDB schema, and the React performance page — four files that had to stay in sync. Copilot held the full context across all of them.
The documentation. Writing accurate technical docs for a system you know deeply is surprisingly tedious. Copilot drafted the architecture document for the hot/cold system from the actual code — referencing the real SQL, the real file paths, the real config variables — and I edited for clarity rather than writing from scratch.
The interaction style that worked best wasn't "write this for me." It was: here's the constraint, here's the existing pattern, here's where I'm confused. Copilot as a collaborator who has read your entire codebase is a genuinely different experience from Copilot as autocomplete. The finish-up phase of a project — juggling context across a dozen half-done things — is exactly where that kind of always-available, always-context-aware pair programmer pays off most.
InsightTrack v2 is the project I always meant to ship. It just needed time, a clear plan, and a better way to work through the backlog.
InsightTrack is open source — self-host your own privacy-first analytics in minutes.


























Top comments (0)