I'm an ML/algorithm engineer. My day job involves training models, tuning embeddings, and writing Python scripts. I had never built a web app. No React. No CSS. No "deploy to production" experience beyond python app.py.
But I had an idea: what if I could turn semantic similarity — the thing I work with every day — into a game anyone could play?
The result is Hot and Cold Game, a daily word-guessing puzzle where AI tells you how "hot" or "cold" your guess is based on meaning, not spelling. Think Wordle meets word embeddings.
I built the entire thing through vibe coding — using AI tools to handle the parts I didn't know, while leveraging my ML expertise for the parts I did. Here's how.
The Idea: Making Embeddings Fun
In my day job, I use text embeddings constantly. One day I was debugging a similarity search and noticed something: the distance between "ocean" and "wave" was small, while "ocean" and "bicycle" was huge.
That's... basically a game.
I knew about Contexto and Semantle — similar games that already had thousands of players. But I thought I could do it differently: better feedback UI, a hint system, daily leaderboards, and a hot/cold temperature metaphor that makes the abstract concept of "semantic distance" intuitive.
The problem? I'd need a full-stack web app. And I'd never built one.
Phase 0: Requirements with Claude Code's AskUserQuestion
My first instinct was to just start coding. Bad idea. Instead, I opened Claude Code and let it interview me about what I actually wanted to build.
Using Claude Code's AskUserQuestion feature, we went through a structured requirements session:
- "What are the core game mechanics?" → Guess a word, get a rank (1 = correct, 60,000+ = cold). Temperature gradient visualization.
- "What's the auth strategy?" → Google OAuth for simplicity. But also support anonymous play — don't gate the fun.
- "What's the MVP scope?" → 3 pages only: Game, How to Play, Archive. No settings, no profile, no social features. Ship fast.
- "What's your deployment constraint?" → Self-hosted server, I already have one.
This back-and-forth produced a 1,500-line product spec that became the single source of truth for the entire project. Every later decision referenced this document.
Lesson learned: Don't start coding. Start by letting the AI ask you hard questions about what you're actually building.
Phase 1: Architecture — Playing to My Strengths
This is where my ML background actually helped. Claude Code and I designed the tech stack together:
Next.js Frontend ←→ FastAPI Backend ←→ Supabase Postgres + Redis
                                                ↑
                                             pgvector
                                        (semantic search)
- Backend: FastAPI (Python) — I know Python. This was non-negotiable. Async, fast, great OpenAPI docs.
- Database: Supabase Postgres + pgvector — I'd used pgvector for ML projects. Vector similarity search is literally my job.
- Embeddings: OpenAI text-embedding-3-small — 1536 dimensions, cached in Redis. Each word gets embedded once, then stored forever.
- Frontend: Next.js + TypeScript + Tailwind + shadcn/ui — I had no idea what any of this meant. But Claude Code picked it, and I trusted the vibes.
The architecture decomposed into 58 tasks across 10 phases, from infrastructure setup to deployment. Claude Code generated the full task breakdown, dependency graph, and execution order.
Phase 2: The Frontend Design Odyssey
This was my biggest struggle. I'm an engineer who thinks in tensors and loss functions, not margins and hover states.
I tried everything:
v0 (Vercel)
My first attempt. I described the game UI and got a working React component in seconds. But the output was too generic — it looked like every other SaaS landing page. Hard to customize deeply.
Base44
Interesting concept — describe your app and get a full codebase. But the generated code was hard to modify. When I wanted to change the heat gradient bar from a simple slider to a custom multi-color visualization, I was fighting the tool more than using it.
Stitch
Similar experience. Great for prototyping, not great for production-quality output that I'd need to maintain.
Figma → Figma MCP → Claude Code (The Winner)
I finally landed on a workflow that actually worked:
Design in Figma — I created a complete prototype with 18 screens covering every game state (not started, in progress, hints used, give up confirmation, solved, standings...). As an ML engineer, I found Figma surprisingly intuitive — it's basically "drag rectangles and pick colors."
Connect via Figma MCP — This was the magic. Figma MCP (Model Context Protocol) let Claude Code directly read my Figma designs. Instead of me trying to describe the UI in words, I'd just say "implement the design from this Figma node" and point to the URL.
Claude Code generates components — With access to the actual design file, Claude Code produced React components that matched my mockups pixel-by-pixel. Colors, spacing, layout — all pulled directly from Figma.
This three-step pipeline (Design in Figma → MCP bridge → Claude Code implementation) was a game-changer. The frontend went from my biggest blocker to my fastest phase.
27 frontend tasks completed in one week. 44 files. ~3,500 lines of code.
I couldn't have written a single line of it manually.
Phase 3: Backend — Where ML Meets Web Dev
The backend was where I felt most at home... and where I hit the hardest bugs.
The Easy Parts (for an ML engineer)
Setting up embeddings, similarity search, and rank computation was trivial. This is literally what I do:
```python
# Generate the embedding for a guessed word (the API wraps the vector
# in a response object, so unwrap it)
response = await openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=word
)
guess_embedding = response.data[0].embedding

# Cosine similarity between the guess and the secret word
similarity = 1 - cosine_distance(guess_embedding, secret_embedding)

# Rank among 60,000 pre-computed word vectors: count how many words
# are closer to the secret, then add 1 (rank 1 = correct answer)
result = await db.execute(
    select(func.count())
    .where(WordEmbedding.similarity > similarity)
)
rank = result.scalar_one() + 1
```
The hint system was fun to build — pre-compute the top 1000 similar words at midnight, then serve them in tiers. The scoring algorithm accounts for guess count, time spent, and hints used.
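The real scoring constants aren't spelled out above, but the shape of the algorithm is simple to sketch. A minimal version, with entirely made-up placeholder weights (5 points per guess, 1 per 10 seconds, 50 per hint — not the game's actual numbers):

```python
def compute_score(
    guess_count: int,
    seconds_elapsed: int,
    hints_used: int,
    base: int = 1000,
) -> int:
    """Toy scoring sketch: start from a base score and subtract
    penalties for guesses, elapsed time, and hints used.

    All weights here are illustrative placeholders, not the
    game's real constants.
    """
    penalty = 5 * guess_count + seconds_elapsed // 10 + 50 * hints_used
    return max(0, base - penalty)

# A 12-guess solve in 3 minutes with one hint:
compute_score(guess_count=12, seconds_elapsed=180, hints_used=1)  # → 872
```

The `max(0, ...)` floor matters: without it, a long grind with many hints produces negative scores that look broken on a leaderboard.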
The Hard Part: Data Flow Design
Here's where I burned the most time. The core guess flow seems simple on paper:
User types word → Frontend validates → API call →
Backend checks dictionary → Generates embedding →
Queries similarity → Calculates rank → Caches result →
Stores in DB → Returns to frontend → Updates UI
But in practice, every step had edge cases:
Problem 1: Race conditions. Users could spam the submit button and create duplicate guesses. Solution: Redis Lua script for atomic duplicate checking (< 1ms), plus frontend debouncing (300ms) and AbortController to cancel in-flight requests.
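The trick that makes the Lua approach work is that Redis runs the whole script atomically: SADD returns 1 only for the first writer, so concurrent duplicate submits lose the race without a read-then-write gap. A sketch of the idea (not the production script; key names and TTL handling are illustrative), alongside a pure-Python stand-in with the same semantics:

```python
# Lua executes atomically inside Redis. SADD returns 1 only if the
# member was newly added, so a duplicate submit cleanly returns 0.
DEDUPE_SCRIPT = """
local added = redis.call('SADD', KEYS[1], ARGV[1])
if added == 1 then
    redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end
return added
"""

# In-memory stand-in mirroring the script's semantics, for illustration:
class GuessDeduper:
    def __init__(self) -> None:
        self._seen: dict[str, set[str]] = {}

    def try_add(self, game_key: str, word: str) -> bool:
        """Return True only for the first submission of a word."""
        guesses = self._seen.setdefault(game_key, set())
        if word in guesses:
            return False
        guesses.add(word)
        return True

dedupe = GuessDeduper()
dedupe.try_add("game:2024-01-01:user42", "ocean")  # True (first submit)
dedupe.try_add("game:2024-01-01:user42", "ocean")  # False (duplicate)
```

With redis-py, the script would be registered once and invoked per guess (roughly `await r.eval(DEDUPE_SCRIPT, 1, key, word, ttl)`), keeping the round trip well under a millisecond.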
Problem 2: State consistency. After page refresh, the game state needed to be perfectly restored. I initially tried localStorage, but it got out of sync with the server. Final solution: single source of truth on the backend, with TanStack Query caching on the frontend.
Problem 3: Cold start latency. First guess of the day took 800ms+ because the embedding needed to be generated and cached. Solution: pre-compute embeddings for the top 50,000 words at midnight using a scheduled job.
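The nightly warm-up amounts to embedding the word list in batches before anyone plays. A hedged sketch of the job's skeleton — `embed_batch` is a stand-in for the real path that calls OpenAI and writes each vector to Redis, and the batch size is illustrative:

```python
from typing import Callable, Iterator

def batched(words: list[str], size: int) -> Iterator[list[str]]:
    """Yield the word list in fixed-size chunks. The embeddings API
    accepts multiple inputs per request, so batching cuts round trips."""
    for i in range(0, len(words), size):
        yield words[i : i + size]

def warm_embedding_cache(
    words: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 500,
) -> int:
    """Run at midnight by a scheduled job. `embed_batch` stands in
    for the real OpenAI call + Redis write per word."""
    done = 0
    for batch in batched(words, batch_size):
        embed_batch(batch)  # in production: API call, then cache each vector
        done += len(batch)
    return done
```

In production this could hang off any scheduler (cron, APScheduler, a Celery beat task); the important part is that it finishes before the first player's midnight guess.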
Problem 4: The GIVEN_UP vs FAILED mapping. The backend used GIVEN_UP as a status, but the frontend UI components expected FAILED. This caused bugs everywhere until we created a proper mapGameStatusToUI() utility. Sounds trivial? It took me two debugging sessions to even figure out why things were breaking.
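The actual mapGameStatusToUI() utility lives in the TypeScript frontend, but the fix boils down to one explicit translation table at the boundary. In Python terms (status names other than GIVEN_UP and FAILED are illustrative):

```python
# Backend statuses → UI statuses. An explicit table fails loudly on
# unknown values instead of silently leaking backend vocabulary to the UI.
BACKEND_TO_UI_STATUS = {
    "NOT_STARTED": "NOT_STARTED",
    "IN_PROGRESS": "IN_PROGRESS",
    "SOLVED": "SOLVED",
    "GIVEN_UP": "FAILED",  # the mismatch that cost two debugging sessions
}

def map_game_status_to_ui(backend_status: str) -> str:
    try:
        return BACKEND_TO_UI_STATUS[backend_status]
    except KeyError:
        raise ValueError(f"Unknown backend status: {backend_status!r}")

map_game_status_to_ui("GIVEN_UP")  # → "FAILED"
```

The raise-on-unknown behavior is the real lesson: a silent passthrough is exactly how the original bug stayed invisible for so long.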
I wrote a complete data flow analysis document mapping every user click to its frontend hook → API endpoint → backend service → database query → Redis cache interaction. This document became essential for debugging.
Lesson learned: For full-stack apps, the data flow between frontend and backend is where 80% of the bugs live. Document it obsessively.
Phase 4: Integration — The 25-Task Gauntlet
With frontend and backend built separately, I needed to wire them together. This became 25 integration tasks across 6 phases:
- Phase 0: Infrastructure (types, API client, error handling, TanStack Query setup)
- Phase 1: Authentication (Supabase + Google OAuth + anonymous users)
- Phase 2-3: Static pages and Archive
- Phase 4: Game core (the big one — 8 tasks including the critical submit-guess flow)
- Phase 5: Statistics and leaderboards
- Phase 6: QA with a 69-item checklist
The integration phase revealed every assumption mismatch between frontend and backend:
- Time format: backend returns seconds (125), frontend needs "02:05"
- Null handling: backend returns null, frontend expects undefined
- Pagination: off-by-one errors in archive listing
- Auth: Supabase JWT token not being passed correctly in SSR context
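The time-format mismatch is the kind of thing that's trivial once written down. The conversion lives in the TypeScript frontend in the real app; sketched in Python:

```python
def format_elapsed(seconds: int) -> str:
    """Convert the backend's raw seconds into the UI's MM:SS string."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"

format_elapsed(125)  # → "02:05"
```

The subtle decision is what to do past an hour: this sketch just lets minutes grow ("62:05"), which is usually fine for a word game.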
Each of these was a 30-minute debugging session with Claude Code. The pattern was always the same: describe the symptom, let Claude Code trace the data flow, find the mismatch, fix it.
The Result: 100+ DAU and Growing
Hot and Cold Game launched and hit 100+ daily active users organically. No ads, no Product Hunt launch — just SEO and word of mouth.
What shipped:
- Daily word challenge with AI-powered semantic similarity
- Hot/cold temperature gradient visualization
- 3-tier hint system
- Daily leaderboards and standings
- Game archive with 30 days of past challenges
- Google OAuth + anonymous play
- Full mobile responsive design
- Google AdSense integration
By the numbers:
- 58 MVP tasks + 25 integration tasks = 83 total tasks
- ~3,500 lines of frontend code (TypeScript/React)
- ~4,000 lines of backend code (Python/FastAPI)
- 18 Figma screens designed for every game state
- 1 ML engineer with zero prior web dev experience
- ~4 weeks from idea to production
What I Learned About Vibe Coding
1. AI tools are force multipliers, not magic wands
Claude Code didn't build this app for me. It built it with me. I brought domain expertise (embeddings, similarity search, ML infrastructure), and it brought web development knowledge (React patterns, CSS layouts, auth flows). The collaboration worked because both sides contributed something real.
2. The design → code pipeline matters more than the code itself
My biggest breakthrough wasn't a clever algorithm — it was the Figma → Figma MCP → Claude Code pipeline. Once I could design what I wanted visually and have AI implement it faithfully, development speed 10x'd.
3. Document your data flow or suffer
For full-stack apps, the contract between frontend and backend is everything. I wrote a detailed data flow analysis document that mapped every user action through the entire stack. This single document prevented (and debugged) more issues than any testing framework.
4. Start with requirements, not code
The AskUserQuestion session that produced the initial spec saved weeks of rework. Every time I was tempted to "just start coding," having that spec pulled me back to the actual requirements.
5. Play to your strengths, delegate the rest
I didn't try to learn CSS or React patterns from scratch. I let AI handle the frontend implementation while I focused on what I know — data flow architecture, caching strategies, and making pgvector do cool things. The result was a better product than if I'd tried to learn everything myself.
Want to Try It?
New challenge every day. Guess a word, see how hot or cold you are. No download, no signup required.
If you're an ML engineer with a side project idea that needs a web frontend — just start. The tools are good enough now. Your domain expertise is the hard part, and you already have that.
Built with Claude Code, Figma, FastAPI, Next.js, pgvector, and OpenAI Embeddings.
Tags: #ai #webdev #gamedev #machinelearning