How I built 10xInterview.com solo — the dual-backend AI router, SSE token streaming, pgvector for embeddings, and webhook-driven billing. A Google Interview Warmup alternative.
When Google quietly retired Interview Warmup, a small but useful tool that let you record spoken interview answers and get AI feedback, I had two reactions. First: that's a shame, I used it. Second: I can build something better.
A year later, 10xInterview is live. Solo-built. One Go binary, one React SPA, one Postgres database, one shell script to deploy the whole thing.
This post is the engineering walkthrough — the four architectural decisions that made the codebase still feel small after a year of shipping features. If you're building anything AI-heavy as a solo founder, the patterns below are the ones I'd reach for again.
TL;DR: Boring stack, opinionated routing, stream everything, single database, webhooks as source of truth. Code samples inline.
The stack (deliberately boring)
Backend Go 1.23, Chi router, single binary
Database Postgres 17 + pgvector
Frontend React 19, Vite, TanStack Query, shadcn/ui
AI Vertex AI (free tier) + Gemini API (Pro)
Speech Google STT + TTS
Infra Cloud Run x2, Cloud SQL, HTTPS LB, Secret Manager
Billing Razorpay Subscriptions (webhook-driven)
Deploy One idempotent shell script
No microservices. No event bus. No Kafka. No Redis (yet). No second database for vectors. No SSR. No global client store. No framework-of-the-month.
The constraint of "one person has to be able to operate this at 3am" drove every choice.
Decision 1: The dual-backend AI Router
Problem: Free users will exhaust an expensive LLM API instantly. But Pro users expect a meaningfully better experience. How do you serve both from the same handler code?
Solution: Every AI capability is an interface with two implementations. A Router struct picks at request time based on the plan in context.
// services/agent/router.go
type Reviewer interface {
	Review(ctx context.Context, in ReviewInput) (ReviewOutput, error)
}

type ReviewerRouter struct {
	Free Reviewer // Vertex AI, Gemini 2.5 Flash
	Paid Reviewer // Gemini API, stronger model
}

func (r *ReviewerRouter) Review(ctx context.Context, in ReviewInput) (ReviewOutput, error) {
	// Paid is nil when GEMINI_API_KEY isn't configured; fall back to Free.
	if auth.PlanFromContext(ctx) == auth.PlanPro && r.Paid != nil {
		return r.Paid.Review(ctx, in)
	}
	return r.Free.Review(ctx, in)
}
Handlers never check the plan. They never know which backend they hit. The auth middleware puts the plan in context; the router does the right thing.
Three benefits compound from this pattern:
- Graceful degradation by default. If GEMINI_API_KEY isn't set, the Paid implementation is nil and the router falls back to Free. Pro users get the cheaper model silently rather than a 500. A rotated key doesn't take the site down.
- Local dev with zero credentials. If AGENT_ENABLED=false, every agent becomes a deterministic stub returning canned data. New contributors clone the repo, go run ./cmd/server, and have a working app without ever touching Google Cloud.
- Adding a third tier is a struct field. When (if) I add an Enterprise tier with a different model, it's r.Enterprise = ... and one extra case in the dispatch. No handler changes.

I have ~6 of these routers now — reviewer, generator, designer, live interviewer, explainer, resume parser. The pattern paid for itself by Router #2.
Decision 2: SSE token streaming over a tiny in-process broker
Problem: The first version of answer review was a synchronous REST call. Upload audio → wait 8–14 seconds → render the JSON response. It worked. It also felt awful.
Solution: Stream the LLM output token-by-token over Server-Sent Events. The frontend renders score and feedback as they generate.
The architecture is intentionally minimal:
// broker.go — ~150 lines total
type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan Event // submissionID -> subscribers
}

func (b *Broker) Subscribe(id string) (<-chan Event, func()) {
	ch := make(chan Event, 32)
	b.mu.Lock()
	b.subs[id] = append(b.subs[id], ch)
	b.mu.Unlock()
	return ch, func() { /* unsubscribe + close */ }
}

func (b *Broker) Publish(id string, e Event) {
	b.mu.RLock()
	for _, ch := range b.subs[id] {
		select {
		case ch <- e:
		default: // drop if subscriber is slow
		}
	}
	b.mu.RUnlock()
}
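The unsubscribe closure is elided above. Here's a self-contained sketch of one way it could look — this is an assumed implementation, not the production code; the sync.Once guard against double-cancel is my addition:

```go
package main

import "sync"

// Event mirrors the article's event type (assumed shape).
type Event struct {
	Type string
	Data string
}

type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBroker() *Broker {
	return &Broker{subs: make(map[string][]chan Event)}
}

func (b *Broker) Subscribe(id string) (<-chan Event, func()) {
	ch := make(chan Event, 32)
	b.mu.Lock()
	b.subs[id] = append(b.subs[id], ch)
	b.mu.Unlock()

	var once sync.Once
	cancel := func() {
		once.Do(func() {
			b.mu.Lock()
			subs := b.subs[id]
			for i, c := range subs {
				if c == ch {
					// Remove this subscriber from the slice.
					b.subs[id] = append(subs[:i], subs[i+1:]...)
					break
				}
			}
			if len(b.subs[id]) == 0 {
				delete(b.subs, id) // drop the key so the map doesn't grow forever
			}
			b.mu.Unlock()
			close(ch) // readers see channel close as end-of-stream
		})
	}
	return ch, cancel
}

func (b *Broker) Publish(id string, e Event) {
	b.mu.RLock()
	for _, ch := range b.subs[id] {
		select {
		case ch <- e:
		default: // drop if subscriber is slow
		}
	}
	b.mu.RUnlock()
}
```

Removing the channel from the map before closing it matters: Publish only sends to channels still in the map, so a closed channel can never receive a send.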
The HTTP handler:
func (h *Handler) StreamSubmission(w http.ResponseWriter, r *http.Request) {
	submissionID := chi.URLParam(r, "id") // /submissions/{id}/stream
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	events, cancel := h.broker.Subscribe(submissionID)
	defer cancel()
	for {
		select {
		case e := <-events:
			fmt.Fprintf(w, "data: %s\n\n", e.JSON())
			flusher.Flush()
			if e.Type == "complete" {
				return
			}
		case <-r.Context().Done():
			return
		}
	}
}
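The producer side — the worker that drains the LLM stream and feeds the broker — isn't shown in the post. A minimal sketch; streamReview and the callback-style publish parameter are assumptions introduced here, not the real API:

```go
package main

// Event mirrors the broker's event type (assumed shape).
type Event struct {
	Type string
	Data string
}

// streamReview drains LLM tokens and publishes each one, ending with the
// "complete" event that the HTTP handler uses to terminate its SSE loop.
// In production the publish argument would be the broker's Publish method.
func streamReview(publish func(id string, e Event), submissionID string, tokens <-chan string) {
	for tok := range tokens {
		publish(submissionID, Event{Type: "token", Data: tok})
	}
	publish(submissionID, Event{Type: "complete"})
}
```

The invariant to preserve: "complete" is always the last event, even if the token stream ends early, so no subscriber hangs forever.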
The frontend opens an EventSource:
const es = new EventSource(`/api/v1/submissions/${id}/stream`);
es.onmessage = (e) => {
  const event = JSON.parse(e.data);
  setReview((prev) => mergeEvent(prev, event));
};
Total wall-clock time is identical to the synchronous version. Perceived time is roughly half. Users start reading feedback at token ~30 instead of waiting for token ~400.
Cost of this UX upgrade: ~200 lines of Go, ~40 lines of TS, zero new infrastructure. No Redis, no NATS, no Kafka. Pub/sub inside a monolith doesn't need a message queue.
Caveat: This pattern only works because the backend is a single Cloud Run instance during a session — submissions and subscribers live in the same process. The moment I need to scale to multiple instances per user session, I'll either swap the broker for Redis pub/sub or pin sessions with sticky cookies. Not today's problem.
Decision 3: Postgres + pgvector instead of a vector database
Problem: Two features need vector similarity:
- Question deduplication — when admins or the AI bulk-generate new questions, "What is a closure?" worded 14 different ways shouldn't all enter the catalog.
- Resume-aware recommendations — given an uploaded resume's embedding, surface the questions most similar to the candidate's stated skills.

The almost-mistake: I nearly reached for Pinecone.
The actual solution: CREATE EXTENSION vector; in a migration, one extra column on the questions table, one extra column on resumes, done.
-- migrations/0007_add_embeddings.sql
CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE questions
ADD COLUMN embedding vector(768);
CREATE INDEX questions_embedding_idx
ON questions
USING hnsw (embedding vector_cosine_ops);
The dedup check before insert:
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM questions
WHERE topic_id = $2
ORDER BY embedding <=> $1
LIMIT 1;
-- reject if similarity > 0.92
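On the Go side, the dedup query might be wrapped like this. Hedged sketch: findNearest, the nearest struct, and the string-encoded vector parameter (pgvector accepts its text literal form, e.g. "[0.1,0.2,...]") are all assumptions, not the production code:

```go
package main

import (
	"context"
	"database/sql"
)

// dupThreshold matches the 0.92 cutoff from the SQL comment.
const dupThreshold = 0.92

type nearest struct {
	ID         int64
	Title      string
	Similarity float64
}

// findNearest runs the same query as above: <=> is pgvector's cosine
// distance operator, so similarity = 1 - distance.
func findNearest(ctx context.Context, db *sql.DB, embedding string, topicID int64) (nearest, error) {
	const q = `
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM questions
WHERE topic_id = $2
ORDER BY embedding <=> $1
LIMIT 1`
	var n nearest
	err := db.QueryRowContext(ctx, q, embedding, topicID).Scan(&n.ID, &n.Title, &n.Similarity)
	return n, err
}

// isDuplicate applies the cutoff before inserting a new question.
func isDuplicate(n nearest) bool {
	return n.Similarity > dupThreshold
}
```

sql.ErrNoRows from an empty topic means "no neighbor at all," which the caller should treat as not-a-duplicate.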
The recommendation query:
SELECT q.id, q.title, q.topic_id
FROM questions q
JOIN topics t ON t.id = q.topic_id
WHERE t.id = ANY($2) -- relevant topics from resume
ORDER BY q.embedding <=> $1 -- resume embedding
LIMIT 20;
One database. One backup. One thing to monitor. One connection pool.
The argument against pgvector is usually "it doesn't scale past N vectors." For 10xInterview's workload — currently ~50k question embeddings, growing slowly — that ceiling is years away. The day it stops being the right call, swapping it out is a localized refactor in one file. The optionality is preserved.
If you're at sub-1M vectors and you're considering a separate vector DB: try pgvector first. You'll likely never need to migrate.
Decision 4: Webhooks are the single source of truth for billing
Problem: Billing bugs are the worst bugs. Users pay, the system doesn't notice, support tickets pile up. Trust evaporates.
The rule: The checkout endpoint never marks a user Pro. Only webhooks do.
POST /api/v1/billing/checkout → Mints Razorpay subscription, returns IDs.
Does NOT change user.plan.
POST /webhooks/razorpay → Razorpay-initiated. HMAC verified.
Inserts to payment_events.
Updates user.plan ONLY if insert succeeded.
The webhook handler in full:
func (h *Handler) RazorpayWebhook(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "read error", 400)
		return
	}
	sig := r.Header.Get("X-Razorpay-Signature")
	if !verifyHMAC(body, sig, h.cfg.RazorpayWebhookSecret) {
		http.Error(w, "bad signature", 401)
		return
	}

	var evt RazorpayEvent
	if err := json.Unmarshal(body, &evt); err != nil {
		http.Error(w, "bad json", 400)
		return
	}

	// The idempotency key: provider_event_id has a UNIQUE constraint,
	// so a retried webhook fails the insert instead of double-processing.
	err = h.db.InsertPaymentEvent(r.Context(), PaymentEvent{
		ProviderEventID: evt.ID,
		Type:            evt.Event,
		Payload:         body,
	})
	if errors.Is(err, ErrDuplicate) {
		w.WriteHeader(200) // already processed; no-op
		return
	}
	if err != nil {
		http.Error(w, "db error", 500)
		return
	}

	switch evt.Event {
	case "subscription.activated", "subscription.charged":
		h.db.UpgradeUser(r.Context(), evt.Subscription.UserID, "pro", evt.Subscription.EndAt)
	case "subscription.cancelled", "subscription.halted":
		// Don't downgrade immediately; let Pro expire naturally at pro_until.
		h.db.MarkCancelled(r.Context(), evt.Subscription.UserID)
	}
	w.WriteHeader(200)
}
Four properties this gives you:
- Idempotent by construction. Razorpay can retry the same webhook 10 times. The unique constraint on provider_event_id means the first one wins and the rest are no-ops. No application-level dedup logic required.
- Cookie-auth-free endpoint. The webhook route is mounted at the root, outside the cookie middleware. Razorpay doesn't bring cookies, and there's nothing for an attacker to forge — the HMAC is the auth.
- Audit trail for free. Every payment event ever received is in payment_events. Disputes, refunds, "why was I charged" tickets — all answerable with one SQL query.
- No nightly cron. Downgrade-on-expiry is handled by the auth middleware: when a Pro user's pro_until is in the past on an incoming request, the middleware downgrades them in-flight. State is always correct because state is always checked, not swept.

One operational note: Razorpay Subscription Plans only support INR without a support-ticket process to enable multi-currency. So the pricing page shows USD as a display-only conversion for non-Indian visitors (detected via browser timezone) and charges INR. Small detail. Saves a lot of confused support tickets.
The deploy story
PROJECT_ID=my-prj DOMAIN=10xinterview.com ADMIN_EMAILS=me@x.com \
GOOGLE_CLIENT_ID=… GOOGLE_CLIENT_SECRET=… \
./deploy.sh
That's a fresh GCP project to a live deployment in ~12 minutes. deploy.sh provisions:
- Cloud SQL instance with pgvector enabled
- Two Cloud Run services (api + web)
- HTTPS Load Balancer with a Google-managed cert
- Cloud Run Job for the embeddings backfill
- Secret Manager entries for every credential
Every step uses describe-or-create. Re-running is safe. Mid-script failures are recoverable just by running again.
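Stripped to its skeleton, describe-or-create is just "probe first, create only on miss." The describe_or_create helper and the gcloud lines in the comment are illustrative, not lifted from deploy.sh:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Generic describe-or-create: run the probe command; create only if it fails.
# $1 = probe command, $2 = create command (passed as strings and eval'd
# purely for illustration).
describe_or_create() {
  if eval "$1" >/dev/null 2>&1; then
    echo "exists"
  else
    eval "$2" >/dev/null 2>&1
    echo "created"
  fi
}

# Hypothetical gcloud-flavored usage, one call per resource:
#   describe_or_create \
#     "gcloud secrets describe razorpay-key" \
#     "gcloud secrets create razorpay-key --replication-policy=automatic"
```

Because the probe runs first on every invocation, re-running the script after a mid-run failure just fast-forwards through the resources that already exist.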
Razorpay and Gemini API keys are mounted only after their secrets exist — a first-run deploy without them is fine. The AI features degrade to free-tier behavior; billing endpoints return 503 Service Unavailable until you add the key.
Idempotent deploys + graceful degradation = a one-person SaaS you can actually sleep through the night with.
What I'd do differently
A year in, three things I'd change if I started over:
- Adopt sqlc earlier. I started with hand-written database/sql and migrated to sqlc around month 4. The codebase would be cleaner if I'd started there.
- Use the same model for embeddings on both tiers. I briefly tried using a stronger embedding model for Pro users. The recall difference wasn't worth the cost and the dual-embedding bookkeeping was a nightmare. Now both tiers use the same Vertex embedding model.
- Skip the design system experimentation. I tried Tailwind UI, then Park UI, then shadcn/ui. Should've started with shadcn/ui. Lost two weekends.
If you used to use Google's Interview Warmup
A quick aside, because I keep seeing this question on Reddit.
Google retired Interview Warmup — the experimental tool from Grow with Google that let you record an answer and get AI feedback. A lot of people liked it. It's gone.
10xInterview isn't a clone. The mapping:
- What Warmup did: record a spoken answer → get analysis on insights mentioned, vocabulary, talking points.
- What 10xInterview overlaps on: record a spoken answer → get a 0–100 score and specific suggestions, streamed in real time.
- What 10xInterview adds: curated question library by topic and skill, mock interviews with aggregated final reports, a live AI interviewer that asks adaptive follow-ups, resume-aware question recommendations, on-demand explanations with mermaid diagrams.
- What it doesn't do yet: STAR-format behavioural scoring (on the near-term roadmap).

Not affiliated with Google. Just filling a gap they left.
The free tier covers the core "record and get scored" loop with weekly limits. The Pro tier unlocks the live interactive interviewer — the harder pressure test that Warmup never offered.
Try it
Sign in with Google. Record one answer. The free tier gives you the full library, scored answers, and mock interviews without paying anything.
If you spot an architectural choice in this post you'd have made differently — I want to hear it. Reply here or grab my email from the site.
Built with Go, React, Postgres, and Google Cloud. The entire backend is small enough to read in an afternoon. The Router pattern, SSE broker, pgvector queries, and webhook handler shown above are the load-bearing pieces — everything else is glue.
*If this was useful, a ❤️ helps the post reach more devs. The next post in this series will be a deep-dive on the SSE broker pattern — implementing it in Go without external dependencies, edge cases I hit, and when you should reach for a real message broker instead.*