DEV Community

Jason Agostoni
Vector Similarity, Zero Client JS: Decoupled Analytics on a Side Project Budget

A leaderboard for DumbQuestion.ai sounds simple. Track the most asked questions, display them. Done. Except people never ask the same question the same way twice.

I was curious about how creative users of DumbQuestion.ai got with their questions, and I thought others might be as well. So I built a leaderboard of the most frequently asked dumb questions.

The Overqualified persona calls it THE ARCHIVE OF INCOMPETENCE.
The Weary persona calls it THE WALL OF REGRET.
[REDACTED] calls it THE WATCHLIST.
The Compliant calls it THE WALL OF EXCELLENCE (bless its reprogrammed heart).

Building it turned out more interesting than it sounds.


The Product Challenge

People ask the same dumb question in a hundred different ways. "What is 2+2?" and "can you add two plus two for me?" are functionally identical. A simple string counter would give you noise, not signal. I needed semantic matching, not string matching.

This is a solved problem in the ML world, but the typical solutions come with tradeoffs: heavyweight models, expensive APIs, or significant latency added to the critical path. None of those fit a "brutally efficient" side project.

The Solution: Vector Similarity on a Budget

Each question gets run through an embedding model and compared against a Qdrant vector database. Qdrant's free tier is remarkably generous for a side project workload, but self-hosting is trivially easy if you need it.

The matching logic is straightforward:

  • Generate an embedding for the incoming question
  • Compare against existing embeddings using cosine similarity
  • If similarity exceeds a threshold, increment that question's counter
  • If it's new, add it to the database
  • The first instance of a question becomes the official display version
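The steps above can be sketched in plain Go, with an in-memory slice standing in for Qdrant (which does the nearest-neighbor search server-side) and a made-up similarity threshold; the real embeddings come from an embedding model, not the toy 2-D vectors used here:

```go
package main

import (
	"fmt"
	"math"
)

// Entry is one leaderboard row: the first-seen phrasing plus its embedding.
type Entry struct {
	Display string    // official display version (first instance wins)
	Vec     []float64 // embedding of that first instance
	Count   int
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// record matches an incoming question against existing entries; above the
// threshold it increments the matched counter, otherwise it adds a new entry.
func record(entries []Entry, question string, vec []float64, threshold float64) []Entry {
	best, bestSim := -1, -1.0
	for i, e := range entries {
		if sim := cosine(vec, e.Vec); sim > bestSim {
			best, bestSim = i, sim
		}
	}
	if best >= 0 && bestSim >= threshold {
		entries[best].Count++ // semantically the same question
		return entries
	}
	return append(entries, Entry{Display: question, Vec: vec, Count: 1})
}

func main() {
	// Toy 2-D "embeddings": the first two questions point the same way,
	// the third points somewhere else entirely.
	entries := record(nil, "What is 2+2?", []float64{0.9, 0.1}, 0.95)
	entries = record(entries, "can you add two plus two for me?", []float64{0.88, 0.12}, 0.95)
	entries = record(entries, "why is the sky blue?", []float64{0.1, 0.9}, 0.95)
	for _, e := range entries {
		fmt.Printf("%d x %s\n", e.Count, e.Display)
	}
}
```

Swapping the slice for a Qdrant collection changes the search call, not the logic: the threshold comparison and "first phrasing wins" rule stay the same.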

The embedding call costs fractions of a cent. The similarity comparison is fast. The result is a leaderboard that actually understands context rather than just matching strings.

The key architectural decision: None of this runs in the main app.

Adding vector similarity matching to every request would add latency, bloat the container, and burn more compute, which is an anti-pattern under the "brutally efficient" principle I've been following throughout. Instead, every question flows to console output, gets picked up by a Vector sidecar container, is routed through GCP Pub/Sub, and is processed asynchronously on my Mac Mini home server (more on that later).

The Mac Mini handles the Qdrant comparisons and updates a JSON file in Cloudflare R2 storage. When a user hits the leaderboard page it loads directly from R2. No live database queries. No per-request costs. Essentially free page loads at any scale.
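The publish step amounts to serializing the current standings to a static JSON file; a minimal sketch (the `Row` shape is illustrative, and the R2 upload itself, an S3-compatible `PutObject` against the R2 endpoint, is omitted):

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Row is one line of the public leaderboard file (illustrative shape).
type Row struct {
	Question string `json:"question"`
	Count    int    `json:"count"`
}

// renderLeaderboard sorts rows by count, highest first, and serializes
// them to the JSON blob the page fetches straight from R2.
func renderLeaderboard(rows []Row) ([]byte, error) {
	sort.Slice(rows, func(i, j int) bool { return rows[i].Count > rows[j].Count })
	return json.MarshalIndent(rows, "", "  ")
}

func main() {
	b, err := renderLeaderboard([]Row{
		{"why is the sky blue?", 3},
		{"What is 2+2?", 7},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// Upload step (S3-compatible PutObject against the R2 endpoint) omitted.
}
```

Because the page reads this file directly, the serving path has no database and no compute: the cost of a page view is a single static fetch.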

What Ended Up on the Leaderboard?

As early users adopted the app, the leaderboard filled up with exactly what you'd expect: actual dumb questions, a handful of self-awareness probes, and more than a few prompt injection attempts.

Apparently people read this series and went straight for the easter eggs.


The leaderboard was just one piece of a larger analytics picture. Building it taught me something useful: the most interesting features don't always belong in your main app. That same principle shaped the entire analytics stack.


The Observability Problem

Running a side project means making real product decisions with limited data. Are people actually asking questions or just bouncing off the homepage? Which sites are driving traffic? Are ads being seen, clicked, ignored?

Two constraints shaped the solution: no client-side JavaScript (page bloat is the enemy of brutal efficiency) and no SaaS analytics bill that spikes with usage.

So I built (assembled, really) my own stack from open source tools. On a Mac Mini sitting at home.

The Full Pipeline

Every event in DumbQuestion.ai emits structured telemetry to standard console output:

  • HTTP requests (method, path, status, duration)
  • Questions asked (anonymized)
  • Searches performed
  • LLM operations (model, token counts, duration, cost)
  • Prompt injection attempts
  • Custom product events (Question Asked, Shared, Ad Shown, Ad Clicked)
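A minimal sketch of what one of those structured events can look like in Go; the `Event` shape and field names here are illustrative, not the app's actual schema:

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// Event is one structured telemetry line (illustrative schema).
type Event struct {
	Time  time.Time      `json:"time"`
	Kind  string         `json:"kind"`  // e.g. "http_request", "question_asked"
	Attrs map[string]any `json:"attrs"` // event-specific fields
}

// emit writes one JSON object per line to stdout; Encode appends the
// newline, which is exactly what a line-oriented log shipper wants.
func emit(kind string, attrs map[string]any) error {
	return json.NewEncoder(os.Stdout).Encode(Event{
		Time:  time.Now().UTC(),
		Kind:  kind,
		Attrs: attrs,
	})
}

func main() {
	emit("http_request", map[string]any{
		"method": "GET", "path": "/", "status": 200, "duration_ms": 12,
	})
	emit("question_asked", map[string]any{"anonymized": true})
}
```

That's the whole integration surface on the app side: one JSON line per event, and everything downstream is someone else's problem.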

The Go/Gin framework handles much of the HTTP telemetry automatically. The rest is custom instrumentation added deliberately at key points in the application.

A Vector sidecar container picks up the console output and routes it to GCP Pub/Sub. This is the critical architectural decision: Pub/Sub acts as a resilient buffer between the main app and everything downstream. The Mac Mini can go down, lose power, or restart. Once it comes back up, the stack picks up exactly where it left off. No data loss, no backfill scripts, no drama.
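The sidecar's job can be expressed as a small Vector config using its `docker_logs` source and `gcp_pubsub` sink; the container name, project, and topic below are placeholders, and the exact option names should be checked against your Vector version's docs:

```toml
# vector.toml for the sidecar (placeholder names throughout)

[sources.app_logs]
type = "docker_logs"
include_containers = ["dumbquestion"]   # hypothetical container name

[sinks.pubsub]
type = "gcp_pubsub"
inputs = ["app_logs"]
project = "my-gcp-project"              # hypothetical GCP project
topic = "telemetry"                     # hypothetical topic name
encoding.codec = "json"
```

The second Vector instance on the Mac Mini is the mirror image: a Pub/Sub source fanning out to the Plausible and Parseable sinks.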

From Pub/Sub, a second Vector instance on the Mac Mini routes to two primary targets:

Plausible handles user behavior and product analytics:

  • Page views and session depth
  • UTM tag tracking (know exactly which article drove which visit)
  • User journey depth (did they just hit the root page or actually ask a question?)
  • Browser, device type, country of origin
  • Custom events: Question Asked, Shared, Ad Shown, Ad Clicked

All of this without a single line of client-side JavaScript. No tracking scripts, no page weight, no GDPR cookie banners for analytics. Pure server-side telemetry piped through the same pipeline as everything else.
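Plausible accepts events server-side via `POST /api/event`, which is what makes the no-client-JS approach possible; in this stack the events arrive via Vector rather than app code, but a hand-rolled sketch of that API call looks roughly like this (the base URL, domain, and helper name are hypothetical):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// plausibleEvent is the JSON body for Plausible's events API.
type plausibleEvent struct {
	Name   string `json:"name"` // "pageview" or a custom event name
	URL    string `json:"url"`
	Domain string `json:"domain"`
}

// newEventRequest builds a server-side event request. The original
// visitor's User-Agent and IP are forwarded so Plausible can count
// unique visitors without any client-side script.
func newEventRequest(base, domain, name, pageURL, userAgent, clientIP string) (*http.Request, error) {
	body, err := json.Marshal(plausibleEvent{Name: name, URL: pageURL, Domain: domain})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/api/event", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("User-Agent", userAgent)
	req.Header.Set("X-Forwarded-For", clientIP)
	return req, nil
}

func main() {
	req, err := newEventRequest("https://plausible.example.com",
		"dumbquestion.ai", "Question Asked", "https://dumbquestion.ai/",
		"Mozilla/5.0", "203.0.113.9")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /api/event
}
```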

Parseable handles the operational side:

  • LLM performance metrics and cost tracking by day
  • Ad CTR dashboards
  • Log aggregation for debugging and incident investigation

Think of it as Plausible for the product lens, Parseable for the business and ops lens.

The Resilience Payoff

I've had power outages. Slowdowns. The occasional restart. Every time, the stack catches up from where Pub/Sub left off without any manual intervention.

This isn't accidental. Designing around failure rather than pretending it won't happen is the difference between a toy and a production system. The GCP Pub/Sub buffer was a deliberate choice specifically because I knew the downstream consumers (Mac Mini, Qdrant, Plausible, Parseable) were running on non-guaranteed infrastructure.

Even on a Mac Mini, you can build something production-grade. You just have to design for it.

What I Learned

Two things surprised me building this:

First: How much you can accomplish by treating console output as a first-class telemetry stream. No SDKs, no agents baked into the app, no client-side scripts. Just structured logging and a pipeline that knows what to do with it.

Second: How much the "keep it off the critical path" principle scales. It started as a constraint (keep the main container lean) and became a design philosophy. The leaderboard, the analytics - none of it runs in the main app. All of it works reliably because the main app doesn't have to care about it.

AI helped build all of it. But knowing what to measure, where to put the seams, and how to design for failure? Still the interesting (and super fun) part.

dumbquestion.ai
