Soham

What If Your Portfolio Verifier Could Actually See Your UI?

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


Cairn — Turn "I Want to Be an AI Engineer" Into a Verified Portfolio


What I Built

Every year, millions of people type something like "I want to become an AI engineer" into Google and then spend the next two weeks drowning in tabs. Roadmaps that don't know where you're starting from. YouTube playlists with no structure. Free courses with 13% completion rates. ChatGPT roadmaps that vanish the moment you close the tab.

I built Cairn because that gap is real, and the 100x-cheaper alternative to a coding bootcamp genuinely does not exist yet.

Cairn is a personalized AI learning and career engine. You tell it your goal in plain English — "I want to land an AI engineering role in 6 months, I know Python basics and built one Flask app" — and it builds you a structured 12-week path: phases, weekly milestones with actual deliverables, curated free resources scored by real learner outcomes, and projects to build. Not a static document. A living system that adapts as you go.

But the part I'm most proud of is what happens when you ship something.

You submit your GitHub repo. Cairn pulls the code, runs it through a three-stage evaluation pipeline — structural checks, LLM-powered code review, and a multimodal visual review where Gemma 4 12B literally looks at screenshots of your running app — and if you pass, it issues a cryptographically signed credential that goes on your public portfolio at cairn.dev/u/your-handle. That's the URL you put on your resume. Not "completed Course X." Verified work.

The three things every learning product gets wrong:

| Problem | How Cairn addresses it |
| --- | --- |
| Path paralysis: "what do I even learn?" | Personalized 12-week path generated from your actual starting point, not a generic template |
| No accountability: you ghost your own goals | Daily nudges, streak tracking, adaptive replanning when life happens |
| No proof-of-work: tutorial clones don't get interviews | Multi-stage project verification with HMAC-signed credentials on a public portfolio |

The product was designed for Indian engineering students and career-switchers first — a market of 10M+ active learners who are priced out of bootcamps (₹3 lakh+) but get zero structure from free resources. But the same problem exists globally, and the architecture reflects that.


Demo

🎬 5-minute walkthrough

🌐 Live app (https://cairnapp.netlify.app/)

👤 Example public portfolio — no signup needed https://cairnapp.netlify.app/example

The example portfolio is the fastest way to see what a verified Cairn profile looks like — the thing a user puts on their resume after 12 weeks.


Code

🔗 GitHub repository (https://github.com/Soham-0047/cairn)

The repo is a full-stack TypeScript monorepo:

frontend/    → Next.js 15 (App Router, SSR, Tailwind)
backend/     → Express + TypeScript + MongoDB + Mongoose
              └── llm/   → provider-agnostic router with fallback chains

Everything brand-visible — the name, logo, colors, hero copy, CTA text, LLM provider chains — is editable at runtime via /admin without touching code. That's not an accident; it's what makes the codebase reusable for the next hackathon without a rewrite.


How I Used Gemma 4

This is the part I want to explain carefully, because I think the interesting choice isn't which Gemma 4 model I used — it's why I used three different ones for three different jobs.

The problem with "just use the biggest model"

Path generation requires loading 50+ curated resources, similar past learners' completed paths, and a user's full profile into a single prompt — then reasoning over all of it to produce a coherent 12-week plan. That needs a large model with a long context window.

Parsing a goal statement from a text box (extracting structured fields like current_skills, target_role, timeline_weeks) is a small, latency-sensitive extraction task. Using a 27B model for that would be wasteful and slow.

Reviewing a screenshot of someone's running app to check whether the UI matches what the code claims? That's not a text problem at all.

Three different shapes of job. Three different models.


Model 1 — Gemma 4 4B: Goal parsing at onboarding

When a user types their goal in plain English, that text goes to Gemma 4 4B for structured extraction.

Why 4B? It runs in ~600ms on Google AI Studio's free tier. This call happens multiple times per onboarding session as users refine their input. A 27B model here would feel sluggish and blow through rate limits before the user even starts their path. The task is bounded: extract a handful of fields from a short paragraph. 4B handles it cleanly, fast, and free.

// backend/src/llm/router.ts
{
  task: "parse_goal",
  primary: "gemma-4-4b",  // fast + cheap; perfect for extraction
  fallback: ["gemini-flash"]
}
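
Since the extraction result feeds the rest of onboarding, it helps to validate whatever the model returns before trusting it. Here is a minimal sketch of that step, assuming the model replies with a JSON object carrying the three fields named earlier. The `ParsedGoal` type and the `parseGoalResponse` helper are hypothetical names for illustration, not code from the Cairn repo.

```typescript
// Hypothetical shape built from the field names in the post; the real schema may differ.
interface ParsedGoal {
  current_skills: string[];
  target_role: string;
  timeline_weeks: number;
}

// Parse the model's raw text and return null for anything that does not match the shape.
function parseGoalResponse(raw: string): ParsedGoal | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the model returned something that is not JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const fields = data as Record<string, unknown>;
  const skills = fields.current_skills;
  const role = fields.target_role;
  const weeks = fields.timeline_weeks;

  if (!Array.isArray(skills) || !skills.every((s) => typeof s === "string")) return null;
  if (typeof role !== "string" || role.trim() === "") return null;
  if (typeof weeks !== "number" || !Number.isInteger(weeks) || weeks <= 0) return null;

  return { current_skills: skills as string[], target_role: role, timeline_weeks: weeks };
}
```

A null result is a natural trigger for the fallback chain or for asking the user to rephrase, though how Cairn actually handles malformed output is not described in the post.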

Model 2 — Gemma 4 27B: Path generation and code review

The heavy lifting goes to Gemma 4 27B via Google AI Studio, with OpenRouter's free Gemma 4 endpoint as the fallback.

Path generation loads the user's structured profile, a subset of the curated resource corpus (~50 items matched to the user's target role), and summarized outcome data from similar past learners — all in a single prompt, well inside the 128K context window. The model has to reason about dependencies between topics, realistic hours-per-week, and what kinds of projects actually demonstrate the skills required for the target role. This is not a task for a small model.

The same model handles code review at project evaluation Stage 2: reading a multi-file repo snapshot and answering whether the code does what the README claims, whether it demonstrates the skills the user said they practiced, and where specifically it falls short. Multi-file code reasoning with long context is exactly where 27B earns its place.

SYSTEM: You are an expert career coach generating a personalized learning path.
You have access to {N} similar-profile learners' actual completed paths and outcomes.
Prefer concrete projects over passive content...

[resource corpus subset]
[similar past learner paths with outcomes]
[user's profile + target role + weekly hours]

OUTPUT: JSON path schema only, no preamble.
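
One way the template above could be assembled in code is sketched here. Only the section order and the quoted sentences come from the template; the `PathPromptInput` type, its field names, and `buildPathPrompt` are assumptions made for illustration.

```typescript
// Illustrative input shape; the real service likely passes richer objects.
interface PathPromptInput {
  resources: string[];    // resource summaries matched to the target role
  similarPaths: string[]; // summarized paths and outcomes from similar learners
  profile: string;        // user's profile, target role, and weekly hours
}

// Join the sections in the same order as the template: system text,
// resource corpus, similar learner paths, user profile, output instruction.
function buildPathPrompt(input: PathPromptInput): string {
  const n = input.similarPaths.length;
  return [
    "SYSTEM: You are an expert career coach generating a personalized learning path.",
    `You have access to ${n} similar-profile learners' actual completed paths and outcomes.`,
    "Prefer concrete projects over passive content...", // remainder is elided in the source
    "",
    input.resources.join("\n"),
    input.similarPaths.join("\n"),
    input.profile,
    "",
    "OUTPUT: JSON path schema only, no preamble.",
  ].join("\n");
}
```

Keeping the output instruction as the last line means the model sees it immediately before generating, which tends to help with JSON-only replies.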

Model 3 — Gemma 4 12B Vision: The hero feature

This is the one I'm most excited about.

When a user submits their project, they can upload up to 4 screenshots of their running application alongside the GitHub URL. Gemma 4 12B — the vision-capable variant — looks at the actual UI and cross-references it against what the code and README claim.

This catches two failure modes that pure code review misses:

  • "Looks polished but the code is sloppy" — a nice-looking template wrapped around someone else's logic
  • "Code is technically fine but the UI is a placeholder" — the README claims a working app; the screenshot shows a 404

No other evaluation approach I know of does this. It's not just "did the tests pass." It's "show me what you actually shipped."

// eval.service.ts — Stage 3
const visualReview = await llmRouter.call({
  task: "visual_eval",
  model: "gemma-4-12b-vision",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: visualEvalPrompt(repo, readme) },
        ...screenshots.map(s => ({ type: "image_url", image_url: s }))
      ]
    }
  ]
});
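
The content array in that call can be built by a small helper that also enforces the four-screenshot limit. This is a sketch under assumptions: `ContentPart`, `MAX_SCREENSHOTS`, and `buildVisualContent` are hypothetical names, and the plain-string `image_url` simply mirrors the snippet above (other provider APIs may expect a different shape).

```typescript
// Message parts in the same shape as the Stage 3 snippet.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: string };

const MAX_SCREENSHOTS = 4; // the post allows up to 4 screenshots per submission

// Build the multimodal content array: one text part, then one part per screenshot.
function buildVisualContent(prompt: string, screenshots: string[]): ContentPart[] {
  if (screenshots.length > MAX_SCREENSHOTS) {
    throw new Error(`At most ${MAX_SCREENSHOTS} screenshots are allowed`);
  }
  const images = screenshots.map((url): ContentPart => ({ type: "image_url", image_url: url }));
  return [{ type: "text", text: prompt }, ...images];
}
```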

The evaluation page shows the user exactly which provider and model ran each stage. That transparency is intentional — the whole model selection story is visible to anyone who submits a project, not just buried in the docs.


The full pipeline

User submits GitHub repo + screenshots
         │
         ▼
Stage 1: Structural (deterministic)
  • README present?
  • Commit count + history
  • Tests exist?
  • File tree size reasonable?
         │
         ▼
Stage 2: Code review  →  Gemma 4 27B
  • Does code match README claims?
  • Originality vs known tutorial repos
  • Domain-specific checks (ML training loops? Backend auth?)
         │
         ▼
Stage 3: Visual review  →  Gemma 4 12B (vision)
  • Does the UI match what the code claims?
  • Polish level: shipped / demo / prototype?
  • Per-screenshot findings
         │
         ▼
Stage 4: Synthesis
  • Weighted score (pass threshold: overall ≥ 0.65 and originality ≥ 0.55)
  • If passing: HMAC-signed credential → public portfolio
  • If failing: specific, actionable feedback
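
Stage 4 boils down to a threshold check followed by a signature. The sketch below shows one way to do both with Node's built-in crypto module. The two thresholds come from the pipeline above; the stage weights, the type and function names, and the credential payload format are all assumptions, since the post does not specify them.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Per-stage scores in [0, 1].
interface StageScores {
  structural: number;
  codeReview: number;
  visual: number;
  originality: number;
}

const WEIGHTS = { structural: 0.2, codeReview: 0.5, visual: 0.3 }; // placeholder weights, not Cairn's
const PASS_SCORE = 0.65;       // overall threshold from the pipeline
const PASS_ORIGINALITY = 0.55; // originality threshold from the pipeline

function weightedScore(s: StageScores): number {
  return (
    WEIGHTS.structural * s.structural +
    WEIGHTS.codeReview * s.codeReview +
    WEIGHTS.visual * s.visual
  );
}

// Both conditions must hold for a submission to pass.
function passes(s: StageScores): boolean {
  return weightedScore(s) >= PASS_SCORE && s.originality >= PASS_ORIGINALITY;
}

// Sign a credential payload with HMAC-SHA256 and return the hex signature.
function signCredential(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Recompute the signature and compare in constant time.
function verifyCredential(payload: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signCredential(payload, secret), "hex");
  const given = Buffer.from(signature, "hex");
  return expected.length === given.length && timingSafeEqual(expected, given);
}
```

Because HMAC is symmetric, only a party holding the secret can verify a credential, so in this sketch verification would have to happen on Cairn's server rather than in a third party's browser.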

Provider routing + fallback

The LLM router is a ~300-line module that tracks rate-limit headroom per provider in Redis and automatically falls back when free tiers are exhausted.

Google AI Studio  →  OpenRouter (Gemma 4 free)  →  Gemini 2.5 Pro  →  DeepSeek V3

No user-visible failure. No surprise bills — there's a monthly cost ceiling per route; if every free tier is exhausted, the router refuses the call and shows a graceful degradation message rather than hitting a paid endpoint uncontrolled.
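
The selection logic can be sketched as a walk down the chain that skips any provider without headroom or beyond budget. This is not the actual router: it uses plain in-memory objects where the real module uses Redis, and `ProviderState`, `RouteBudget`, `pickProvider`, and the per-call cost figures are hypothetical. It also assumes paid providers may be used only while the route stays under its monthly ceiling.

```typescript
// In-memory stand-in for the per-provider state the real router keeps in Redis.
interface ProviderState {
  name: string;
  remainingRequests: number; // rate-limit headroom left in the current window
  costPerCallUsd: number;    // 0 for free tiers
}

interface RouteBudget {
  spentUsd: number;
  monthlyCeilingUsd: number;
}

// Return the first provider in the chain that has headroom and would not push
// the route past its ceiling. Null means the router should refuse the call.
function pickProvider(chain: ProviderState[], budget: RouteBudget): string | null {
  for (const provider of chain) {
    if (provider.remainingRequests <= 0) continue;
    if (budget.spentUsd + provider.costPerCallUsd > budget.monthlyCeilingUsd) continue;
    return provider.name;
  }
  return null;
}
```

A null result maps to the graceful degradation message, so the decision to spend money is always an explicit comparison against the ceiling.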

The entire routing table is editable at runtime via /admin/providers. Swapping Gemma 4 out for any other model family is a UI change, not a code change. I built it that way deliberately.


Why this is a Gemma 4 submission and not a "Gemini with Gemma branding" submission

All three Gemma variants are running on Google AI Studio's free tier as the primary provider. The fallback for 27B tasks is OpenRouter's free google/gemma-4-27b endpoint — not Gemini. The vision eval has no non-Gemma fallback; if Gemma 4 12B is unavailable, Stage 3 is skipped and the evaluation is flagged as "code-only review." The multimodal story only works with Gemma 4.
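
The skip behavior for Stage 3 can be modeled as a result type with two cases. In this sketch, `Stage3Result`, `VisualFinding`, and `runStage3` are hypothetical names, and the `review` callback stands in for the real (asynchronous) vision call, returning null when Gemma 4 12B is unavailable.

```typescript
interface VisualFinding {
  screenshot: string;
  note: string;
}

type Stage3Result =
  | { status: "reviewed"; model: string; findings: VisualFinding[] }
  | { status: "skipped"; flag: "code-only review" };

// Run the visual review if the model is reachable; otherwise skip the stage
// and flag the evaluation instead of routing to a non-Gemma vision model.
function runStage3(
  screenshots: string[],
  review: (shots: string[]) => VisualFinding[] | null,
): Stage3Result {
  const findings = review(screenshots);
  if (findings === null) {
    return { status: "skipped", flag: "code-only review" };
  }
  return { status: "reviewed", model: "gemma-4-12b-vision", findings };
}
```

Encoding the skip as its own case forces downstream code, such as the synthesis stage and the evaluation page, to handle the code-only path explicitly.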


Tech Stack

  • Frontend: Next.js 15 (App Router, SSR), React 18, TypeScript, Tailwind CSS, NextAuth
  • Backend: Node.js 20, Express, TypeScript, Mongoose, MongoDB Atlas
  • LLM routing: Custom provider-agnostic router — Google AI Studio, OpenRouter, Groq, Cerebras, Together AI
  • Storage: MongoDB Atlas (paths/progress), Cloudflare R2 (screenshots)
  • Auth: GitHub OAuth via NextAuth
  • Payments: Razorpay (India UPI + cards), Stripe (global)

Built solo, in public, for the people who are trying to get their first AI engineering role and can't afford to spend ₹3 lakh on a bootcamp. If that's you — or if you know someone it describes — Cairn is built for exactly that.
