DEV Community

Om Prakash

ML-Powered Adaptive IQ Test

I Built an ML-Powered Adaptive IQ Test in Next.js 14 — Here's Exactly How It Works

Tags: nextjs typescript machinelearning webdev


Most online IQ tests are broken.

They ask the same 20 questions to everyone — whether you're a 14-year-old high school student or a 35-year-old PhD. They don't adapt. They don't remember what you've already answered. And after two or three retakes, you've memorised the answers. The score stops meaning anything.

I got frustrated enough to build my own. After a few months of work, I launched IQ Platform — a free, fully adaptive cognitive assessment tool that calibrates question difficulty to your age, education, and occupation, scores you using a multi-feature ML regression model, and remembers which questions you've already seen across sessions.

Here's the complete technical breakdown of how I built it.


The Stack

  • Next.js 14 (App Router)
  • TypeScript throughout
  • Tailwind CSS for styling
  • Recharts for radar and trend charts
  • Custom ML scoring engine (no external ML library — pure math)
  • localStorage for session history (no backend, no database)
  • Vercel for deployment

No auth. No database. No backend API. Everything meaningful happens on the client or at build time.
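Because everything lives in the browser, any localStorage access has to survive Next.js rendering components on the server first: App Router components are Server Components unless marked 'use client', and even Client Components are pre-rendered on the server for the initial HTML. A minimal sketch of a guard, using hypothetical helper names (readJSON, writeJSON) that are not from the platform's code:

```typescript
// Hypothetical helpers: read/write JSON in localStorage without crashing
// during server rendering, where `window` does not exist.
export function readJSON<T>(key: string, fallback: T): T {
  if (typeof window === 'undefined') return fallback  // running on the server
  try {
    const raw = window.localStorage.getItem(key)
    return raw === null ? fallback : (JSON.parse(raw) as T)
  } catch {
    return fallback  // storage blocked, or the stored value is not valid JSON
  }
}

export function writeJSON(key: string, value: unknown): boolean {
  if (typeof window === 'undefined') return false  // no-op on the server
  try {
    window.localStorage.setItem(key, JSON.stringify(value))
    return true
  } catch {
    return false  // quota exceeded or storage unavailable
  }
}
```

In a 'use client' component, calling these from useEffect or an event handler keeps the first client render identical to the server HTML.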


The Core Problem: Static Tests Are Meaningless

Here's why standard online IQ tests fail:

Standard test:
  User A: 14 years old, high school → gets "What is 30% of 200?"
  User B: 26 years old, PhD CS      → gets "What is 30% of 200?"

Both answer correctly. Both score the same. 
But the question told you nothing useful about User B.

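The solution is adaptive difficulty: selecting questions based on who is actually taking the test. Professional assessments such as the WAIS-IV also avoid giving every item to everyone: examiners use start points and discontinue rules so that each person spends most of their time on items near their ability level.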


Part 1: The Question Bank (147 Questions, 6 Domains)

The question bank covers six cognitive domains:

| Domain | What it measures |
| --- | --- |
| Numerical | Arithmetic, algebra, ratios, percentages, logarithms |
| Verbal | Vocabulary, analogies, anagrams, antonyms |
| Pattern | Number series, letter sequences, figurate numbers |
| Logical | Syllogisms, seating arrangements, propositional logic |
| Memory | Sequence recall, list recognition, position memory |
| Spatial | 3D geometry, rotations, Euler's formula, clock angles |

Each question has a difficulty rating from 1 (school level) to 5 (postgraduate/doctorate level), plus optional minAge, maxAge, and tags (STEM, arts, language, general):

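// The six domains from the table above
export type QuestionCategory =
  'Numerical' | 'Verbal' | 'Pattern' | 'Logical' | 'Memory' | 'Spatial'

export interface Question {
  id: string
  category: QuestionCategory
  difficulty: 1 | 2 | 3 | 4 | 5
  minAge?: number
  maxAge?: number
  tags?: string[]   // 'stem' | 'arts' | 'language' | 'general'
  text: string
  options: string[]
  correctIndex: number  // index into options
  explanation: string
  timeLimit: number  // seconds per question
}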

A difficulty-5 Logical question looks like this:

{
  id: 'l8', category: 'Logical', difficulty: 5, tags: ['stem'],
  text: 'Cards A, D, 4, 7. Which to flip to verify "vowel → even on other side"?',
  options: ['A and 4', 'A and 7', 'D and 4', 'A, D and 4'],
  correctIndex: 1, timeLimit: 40,
  explanation: 'Flip A (check even) and 7 (check no vowel). Wason selection task.'
}

While a difficulty-1 version looks like:

{
  id: 'l1', category: 'Logical', difficulty: 1, minAge: 10, maxAge: 16,
  text: 'All cats are animals. Whiskers is a cat. Therefore:',
  options: ['Whiskers is an animal', 'All animals are cats', ...],
  correctIndex: 0, timeLimit: 15,
  explanation: 'Simple syllogism: cat → animal'
}

Part 2: Profile-Aware Adaptive Selection

When a user fills in their profile, the system computes a difficulty target based on three inputs:

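// Assumed profile shape (only the fields this post uses)
export interface UserProfile {
  name: string
  age: number
  education: string   // key into EDU_DIFFICULTY
  occupation: string
}

export const EDU_DIFFICULTY: Record<string, number> = {
  school: 1.5,
  diploma: 2.2,
  undergraduate: 2.8,
  postgraduate: 3.5,
  doctorate: 4.2,
}

export function getDifficultyRange(profile: UserProfile) {
  // Start from the education baseline; unknown levels default to 2.5
  let target = EDU_DIFFICULTY[profile.education] ?? 2.5

  // Age-based adjustment: cap younger users, lift experienced STEM professionals
  if (profile.age <= 13) target = Math.min(target, 1.5)
  else if (profile.age <= 16) target = Math.min(target, 2.2)
  else if (profile.age <= 19) target = Math.min(target, 3.0)
  else if (profile.age >= 30 && isStem(profile.occupation)) {
    target = Math.max(target, 3.2)
  }

  // Window spans 1 below to 1.5 above the target, rounded outward and clamped to 1–5
  return {
    min: Math.max(1, Math.floor(target - 1)),
    max: Math.min(5, Math.ceil(target + 1.5)),
    target
  }
}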

Plugging in numbers: a 22-year-old B.Tech CS student (undergraduate, target 2.8) gets min = floor(1.8) = 1 and max = ceil(4.3) = 5, a window of 1–5 with a target of 2.8, while a 14-year-old high-school student (target 1.5) gets a window of 1–3. The InfoScreen also shows a live difficulty-tier preview as the user fills out the form ("Advanced tier — challenging questions requiring deeper reasoning"), updating in real time.
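The filtering step itself isn't shown in the post. A minimal sketch of how the window and the optional minAge/maxAge bounds could narrow the pool, using a trimmed-down hypothetical Q type and an illustrative eligible() function:

```typescript
// Hypothetical sketch: narrow the bank to questions that fit a profile.
// Q is a trimmed-down version of the Question interface from Part 1.
interface Q {
  id: string
  difficulty: number
  minAge?: number
  maxAge?: number
}

interface DifficultyWindow { min: number; max: number; target: number }

function eligible(pool: Q[], age: number, win: DifficultyWindow): Q[] {
  return pool.filter(q =>
    q.difficulty >= win.min && q.difficulty <= win.max &&     // inside the window
    (q.minAge ?? 0) <= age && age <= (q.maxAge ?? Infinity))  // inside age bounds
}
```

How the target is used beyond setting the bounds (for example, favouring questions near it) isn't described, so this sketch ignores it.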

The isStem() helper checks the occupation string for keywords:

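function isStem(occ: string): boolean {
  const keywords = ['engineer', 'developer', 'programmer', 'data', 'scientist',
    'computer', 'software', 'hardware', 'cs', 'iot', 'electronics',
    'ai', 'ml', 'cyber', 'network']
  // Compare against whole words: short keywords (<= 3 chars) must match exactly,
  // longer ones as word prefixes, so 'ai' no longer fires inside 'retail'
  const words = occ.toLowerCase().split(/[^a-z0-9]+/)
  return words.some(w =>
    keywords.some(k => (k.length <= 3 ? w === k : w.startsWith(k))))
}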

STEM occupations get Numerical, Pattern, and Logical questions boosted. Arts/language occupations get Verbal questions prioritised.
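The post doesn't show how the boost is applied. One plausible approach, sketched with hypothetical names (categoryWeights, allocate) and an assumed 1.5× weight, turns per-domain weights into per-domain question counts using largest-remainder rounding, so the total always stays at 20:

```typescript
// Hypothetical sketch: split a 20-question test across the six domains,
// giving boosted domains a larger share. The 1.5 boost is illustrative only.
type Category = 'Numerical' | 'Verbal' | 'Pattern' | 'Logical' | 'Memory' | 'Spatial'
const CATEGORIES: Category[] =
  ['Numerical', 'Verbal', 'Pattern', 'Logical', 'Memory', 'Spatial']

function categoryWeights(stem: boolean, artsOrLanguage: boolean): Record<Category, number> {
  const w = {} as Record<Category, number>
  for (const c of CATEGORIES) w[c] = 1
  if (stem) { w.Numerical = 1.5; w.Pattern = 1.5; w.Logical = 1.5 }
  if (artsOrLanguage) w.Verbal = 1.5
  return w
}

// Largest-remainder allocation: counts follow the weights as closely as
// possible and always add up to exactly `total`.
function allocate(total: number, w: Record<Category, number>): Record<Category, number> {
  const sum = CATEGORIES.reduce((s, c) => s + w[c], 0)
  const exact = CATEGORIES.map(c => ({ c, x: (total * w[c]) / sum }))
  const counts = {} as Record<Category, number>
  for (const e of exact) counts[e.c] = Math.floor(e.x)
  const left = total - exact.reduce((s, e) => s + Math.floor(e.x), 0)
  // Give the leftover slots to the largest fractional parts
  const byRemainder = [...exact].sort((a, b) => (b.x % 1) - (a.x % 1))
  for (const e of byRemainder.slice(0, left)) counts[e.c]++
  return counts
}
```

With the STEM boost this gives 4 questions each to Numerical, Pattern and Logical, and splits the remaining 8 across Verbal, Memory and Spatial.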


Part 3: The Anti-Repeat System

This tackles the problem that prompted the whole project: after 2–3 retakes, users were memorising the answers.

The solution: track recently seen question IDs in localStorage and always serve unseen questions first.

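const SEEN_KEY = 'iq_platform_seen_questions_v2'
const MAX_SEEN = 80  // remember the last 80 question IDs

function getSeenIds(): Set<string> {
  try {
    const raw = localStorage.getItem(SEEN_KEY)
    return raw ? new Set<string>(JSON.parse(raw)) : new Set<string>()
  } catch {
    // localStorage unavailable (server render, blocked storage) or bad JSON
    return new Set<string>()
  }
}

function markAsSeen(ids: string[]): void {
  // Newest first; the Set drops duplicates so a re-served question
  // doesn't take up two of the MAX_SEEN slots
  const updated = [...new Set([...ids, ...getSeenIds()])].slice(0, MAX_SEEN)
  try {
    localStorage.setItem(SEEN_KEY, JSON.stringify(updated))
  } catch {
    // Best-effort: ignore quota or availability errors
  }
}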

During selection, unseen questions are sorted to the front:

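// Fisher–Yates shuffle: unbiased, unlike sort(() => Math.random() - 0.5)
function shuffle<T>(arr: T[]): T[] {
  const a = [...arr]
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[a[i], a[j]] = [a[j], a[i]]
  }
  return a
}

const unseen = shuffle(pool.filter(q => !seenIds.has(q.id)))
const seen = shuffle(pool.filter(q => seenIds.has(q.id)))

// Always prefer unseen
const prioritised = [...unseen, ...seen]
selected.push(...prioritised.slice(0, perCategory))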

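With 147 questions and 20 per test, the bank could in principle cover 7 tests (140 questions) without a repeat. In practice, the profile's difficulty window and age limits shrink the eligible pool, and because only the last 80 IDs (four tests' worth) are remembered, questions from five or more tests back count as unseen again, so repeats become possible from the sixth test. When repeats do happen, they're shuffled into different positions.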
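A small in-memory simulation shows what the 80-ID cap means in practice. It uses a hypothetical remember() helper that mirrors the markAsSeen logic (newest first, deduplicated, capped at MAX_SEEN) and assumes six consecutive 20-question tests with no overlapping questions:

```typescript
// In-memory sketch of markAsSeen: newest IDs first, capped at MAX_SEEN
const MAX_SEEN = 80

function remember(seen: string[], ids: string[]): string[] {
  // Drop duplicates so re-seen IDs don't use up extra slots
  return Array.from(new Set([...ids, ...seen])).slice(0, MAX_SEEN)
}

// Six consecutive 20-question tests with no overlapping questions
const tests = Array.from({ length: 6 }, (_, t) =>
  Array.from({ length: 20 }, (_, i) => `q${t * 20 + i}`))

let seen: string[] = []
const test1StillSeen: boolean[] = []
for (const ids of tests) {
  seen = remember(seen, ids)
  test1StillSeen.push(seen.includes('q0'))  // is test 1's first question remembered?
}
console.log(test1StillSeen)  // [true, true, true, true, false, false]
```

After the fifth test is recorded, the first test's questions have aged out of the list, so they are eligible again in the sixth test.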


Part 4: The ML Scoring Engine

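This is the most interesting part technically. Instead of just calculating a percentage, I built a weighted linear regression model: six performance features combined with fixed weights, plus two demographic norms (education and age) that scale the result when it is mapped to the IQ scale.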

Feature Extraction

const features = {
  // Six weighted performance features (see WEIGHTS below)
  accuracyScore,        // proportion of correct answers
  speedScore,           // based on average time per correct answer
  consistencyScore,     // penalises mixed right/wrong results within a domain (a proxy for guessing)
  difficultyWeighted,   // harder correct answers worth more
  categoryBalance,      // uniform performance > spiky
  adaptiveScore,        // penalises timed-out questions
  // Two demographic norms, applied when mapping to the IQ scale
  educationNorm,        // calibrates for education baseline
  ageNorm,              // peak window (16-35) = 1.0
}
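The post doesn't show how difficultyWeighted, categoryBalance or adaptiveScore are computed. As a sketch only, here is one plausible reading of each comment above; the formulas and the per-answer difficulty field are illustrative assumptions, not the platform's actual code:

```typescript
// Hypothetical sketches; formulas are illustrative assumptions.
interface Answer {
  category: string
  difficulty: number  // 1–5; assumed to be copied from the question onto each answer
  correct: boolean
  timedOut: boolean
}

// Share of available difficulty points earned, so a correct difficulty-5
// answer counts five times as much as a correct difficulty-1 answer
function difficultyWeighted(answers: Answer[]): number {
  const total = answers.reduce((s, a) => s + a.difficulty, 0)
  const earned = answers.reduce((s, a) => s + (a.correct ? a.difficulty : 0), 0)
  return total === 0 ? 0 : earned / total
}

// 1 minus the gap between the best and worst domain accuracy:
// 1.0 when every domain is equally strong, lower when results are spiky
function categoryBalance(answers: Answer[]): number {
  const tally: Record<string, { right: number; n: number }> = {}
  for (const a of answers) {
    if (!tally[a.category]) tally[a.category] = { right: 0, n: 0 }
    tally[a.category].n++
    if (a.correct) tally[a.category].right++
  }
  const rates = Object.values(tally).map(t => t.right / t.n)
  if (rates.length === 0) return 0
  return 1 - (Math.max(...rates) - Math.min(...rates))
}

// Fraction of questions answered before the timer ran out
function adaptiveScore(answers: Answer[]): number {
  if (answers.length === 0) return 0
  return answers.filter(a => !a.timedOut).length / answers.length
}
```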

Speed scoring — not just "was it right" but "how quickly":

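interface TestAnswer {  // shape inferred from how answers are used in this post
  category: string
  correct: boolean
  timedOut: boolean
  timeSpent: number  // seconds
}

function calcSpeed(answers: TestAnswer[]): number {
  const correct = answers.filter(a => a.correct && !a.timedOut)
  // No correct answers: no speed credit (otherwise avg = 0/0 = NaN falls through to 0.4)
  if (correct.length === 0) return 0
  const avg = correct.reduce((s, a) => s + a.timeSpent, 0) / correct.length
  if (avg < 6)  return 1.0
  if (avg < 10) return 0.9
  if (avg < 15) return 0.8
  if (avg < 20) return 0.7
  if (avg < 25) return 0.55
  return 0.4
}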

Consistency scoring uses the variance of results within each domain as a rough proxy for guessing:

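const clamp = (x: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, x))

function calcConsistency(answers: TestAnswer[]): number {
  // Group correct/incorrect results by domain
  const byCategory: Record<string, boolean[]> = {}
  answers.forEach(a => {
    if (!byCategory[a.category]) byCategory[a.category] = []
    byCategory[a.category].push(a.correct)
  })

  let totalVariance = 0, categories = 0
  for (const cat in byCategory) {
    const results = byCategory[cat]
    if (results.length < 2) continue  // too few answers to measure spread
    const mean = results.filter(Boolean).length / results.length
    const variance = results.reduce(
      (s, r) => s + Math.pow((r ? 1 : 0) - mean, 2), 0
    ) / results.length
    totalVariance += variance
    categories++
  }
  // No category had 2+ answers: nothing to penalise
  // (without this guard, 0 / 0 = NaN would propagate into the IQ estimate)
  if (categories === 0) return 1.0
  return clamp(1 - (totalVariance / categories) * 1.5, 0.3, 1.0)
}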

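If you get 3 right and 3 wrong in the same category, that category hits the maximum variance of 0.25, which lowers your consistency score and, through its 0.10 weight, your final IQ estimate. The order of the answers doesn't matter: for 0/1 outcomes the variance is p(1 − p), where p is the category's accuracy, so alternating and back-to-back results score the same.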
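To see what this variance actually rewards, here is a small sketch that reproduces the per-category calculation and the same 1.5× scaling for a single category (bernoulliVariance and consistency are illustrative names) on a few six-answer categories:

```typescript
// Sketch: the per-category variance used by calcConsistency, for 0/1 outcomes.
// With accuracy p, this always equals p * (1 - p).
function bernoulliVariance(results: boolean[]): number {
  const mean = results.filter(Boolean).length / results.length
  return results.reduce((s, r) => s + ((r ? 1 : 0) - mean) ** 2, 0) / results.length
}

const T = true, F = false
const alternating = bernoulliVariance([T, F, T, F, T, F])  // 0.25
const blocked     = bernoulliVariance([T, T, T, F, F, F])  // 0.25: order doesn't matter
const allWrong    = bernoulliVariance([F, F, F, F, F, F])  // 0
const allRight    = bernoulliVariance([T, T, T, T, T, T])  // 0

// Same scaling and clamp as calcConsistency, for a single category
const consistency = (v: number) => Math.min(1, Math.max(0.3, 1 - v * 1.5))
console.log(consistency(alternating), consistency(allWrong))  // 0.625 1
```

A category answered entirely wrong earns full consistency, so the feature separates mixed results from uniform ones rather than detecting guessing directly.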

The Regression

const WEIGHTS = {
  accuracy:      0.42,
  speed:         0.12,
  consistency:   0.10,
  difficulty:    0.22,
  balance:       0.08,
  adaptive:      0.06,
}

const rawLinear =
  WEIGHTS.accuracy  * accuracyScore         +
  WEIGHTS.speed     * speedScore            +
  WEIGHTS.consistency * consistencyScore    +
  WEIGHTS.difficulty  * difficultyWeighted  +
  WEIGHTS.balance   * categoryBalance       +
  WEIGHTS.adaptive  * adaptiveScore
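Since the six weights add up to 1.00, rawLinear is a weighted average of the features: if every feature lies between 0 and 1, so does the raw score, which is what the 0–1 mapping to the IQ scale relies on. A quick check with made-up feature values (illustrative numbers, not real test data):

```typescript
// Sketch: the weights sum to 1.0, so rawLinear is a weighted average
const WEIGHTS = {
  accuracy: 0.42, speed: 0.12, consistency: 0.10,
  difficulty: 0.22, balance: 0.08, adaptive: 0.06,
}

const weightSum = Object.values(WEIGHTS).reduce((s, w) => s + w, 0)
console.log(weightSum.toFixed(2))  // "1.00"

// Made-up feature values, one per weight
const features = {
  accuracy: 0.6, speed: 0.8, consistency: 0.7,
  difficulty: 0.5, balance: 0.9, adaptive: 1.0,
}
const rawLinear = (Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[])
  .reduce((s, k) => s + WEIGHTS[k] * features[k], 0)
console.log(rawLinear.toFixed(2))  // "0.66"
```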

Mapping to IQ Scale

Raw score (0–1) maps to IQ (70–145) using a z-score approach with the normal distribution:

// Map the weighted raw score to a z-score, scaled by education and age norms
const zScore = (rawLinear - 0.5) * 4.2 * educationNorm * ageNorm

// IQ = mean + z * SD (mean=100, SD=15)
const iqEstimate = Math.round(clamp(100 + zScore * 15, 70, 145))

// Heuristic "90% confidence interval": ±8 points, widening to about ±12
// as consistencyScore falls toward its 0.3 floor
const ciMargin = Math.round(8 + (1 - consistencyScore) * 6)
const confidenceInterval = [iqEstimate - ciMargin, iqEstimate + ciMargin]

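A raw score of 0.5 maps to exactly IQ 100 (average). With both norms at 1.0, a raw score of 1.0 gives z = 2.1, which maps to about IQ 132 (131.5, rounded). Because the demographic norms multiply the z-score, a norm above 1.0 stretches scores away from 100 in both directions: a doctorate-level user (educationNorm = 1.07) with an above-average raw score gets a slightly higher IQ estimate than an undergraduate (educationNorm = 1.0) with the same raw score, and a slightly lower one below average. A multiplier alone therefore doesn't make the doctorate group need a higher raw score for the same IQ; that would require shifting the centre point for each education group.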
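Wrapping the mapping in a standalone function makes these numbers easy to check, including what a norm above 1.0 does. toIQ is an illustrative name, and clamp is defined inline because the post doesn't show it:

```typescript
// Sketch of the raw-score → IQ mapping above, as a pure function
const clamp = (x: number, lo: number, hi: number) => Math.min(hi, Math.max(lo, x))

function toIQ(raw: number, educationNorm = 1.0, ageNorm = 1.0): number {
  const z = (raw - 0.5) * 4.2 * educationNorm * ageNorm
  return Math.round(clamp(100 + z * 15, 70, 145))
}

console.log(toIQ(0.5))        // 100
console.log(toIQ(1.0))        // 132 (z = 2.1, so 100 + 31.5 = 131.5, rounded up)
console.log(toIQ(0.8))        // 119
console.log(toIQ(0.8, 1.07))  // 120: a norm above 1 pushes the score further from 100...
console.log(toIQ(0.2))        // 81
console.log(toIQ(0.2, 1.07))  // 80: ...in both directions
```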

The percentile comes from the normal distribution CDF, Φ(z) = ½(1 + erf(z/√2)), with erf computed by the Abramowitz and Stegun rational approximation (formula 7.1.26, maximum absolute error about 1.5 × 10⁻⁷):

function erf(x: number): number {
  const sign = x >= 0 ? 1 : -1; x = Math.abs(x)
  const a1=0.254829592, a2=-0.284496736, a3=1.421413741,
        a4=-1.453152027, a5=1.061405429, p=0.3275911
  const t = 1/(1+p*x)
  const y = 1-(((((a5*t+a4)*t)+a3)*t+a2)*t+a1)*t*Math.exp(-x*x)
  return sign*y
}

const percentile = Math.round(
  0.5 * (1 + erf((iqEstimate - 100) / (15 * Math.sqrt(2)))) * 100
)
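A quick sanity check of the percentile code against known normal-distribution values: IQ 115 and 85 sit one SD from the mean, so they should land near the 84th and 16th percentiles. The block repeats the erf above so it runs on its own; iqPercentile is an illustrative wrapper name:

```typescript
// Same A&S 7.1.26 erf approximation as above
function erf(x: number): number {
  const sign = x >= 0 ? 1 : -1
  x = Math.abs(x)
  const a1 = 0.254829592, a2 = -0.284496736, a3 = 1.421413741,
        a4 = -1.453152027, a5 = 1.061405429, p = 0.3275911
  const t = 1 / (1 + p * x)
  const y = 1 - ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t * Math.exp(-x * x)
  return sign * y
}

// Normal CDF for mean 100, SD 15, as a whole-number percentile
function iqPercentile(iq: number): number {
  return Math.round(0.5 * (1 + erf((iq - 100) / (15 * Math.SQRT2))) * 100)
}

console.log([85, 100, 115, 130].map(iqPercentile))  // [16, 50, 84, 98]
```

Because of rounding, an estimate of 145 would display as the 100th percentile (Φ(3) ≈ 0.9987), so capping the displayed value at 99 may read better.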

Part 5: Per-User History Without a Backend

Session history is stored entirely in localStorage, keyed by a normalised username:

export function normaliseUserKey(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, ' ')
}

This means "Om", "om", and "OM" all map to the same history bucket — but "Om" and "Rahul" are completely separate. No login required.

export function saveSession(profile, result, totalQ, correct, avgTime, diffRange) {
  const session: TestSession = {
    id: `session_${Date.now()}_${Math.random().toString(36).slice(2, 7)}`,
    timestamp: Date.now(),
    userKey: normaliseUserKey(profile.name),
    iqEstimate: result.iqEstimate,
    categoryScores: result.categoryScores,
    mlFeatures: result.mlFeatures,
    difficultyRange: diffRange,
    // ...
  }
  const existing = getAllSessions()
  localStorage.setItem(STORAGE_KEY, JSON.stringify(
    [session, ...existing].slice(0, 100)
  ))
}

The History page fetches all sessions, groups them by userKey, and renders per-user filter tabs. The IQ trend chart (Recharts LineChart) shows your score trajectory across sessions. Domain averages accumulate lifetime category accuracy.
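The grouping step isn't shown in the post. A minimal sketch, assuming a trimmed-down hypothetical SessionLite type (the real TestSession has more fields) and illustrative groupByUser/trendData helpers:

```typescript
// Hypothetical sketch of the History page's grouping step
interface SessionLite {
  userKey: string
  timestamp: number
  iqEstimate: number
}

function groupByUser(sessions: SessionLite[]): Map<string, SessionLite[]> {
  const groups = new Map<string, SessionLite[]>()
  for (const s of sessions) {
    const list = groups.get(s.userKey) ?? []
    list.push(s)
    groups.set(s.userKey, list)
  }
  // saveSession stores newest first; a trend line reads better oldest to newest
  groups.forEach(list => list.sort((a, b) => a.timestamp - b.timestamp))
  return groups
}

// One { session, iq } point per test, in chronological order
function trendData(list: SessionLite[]) {
  return list.map((s, i) => ({ session: i + 1, iq: s.iqEstimate }))
}
```

trendData returns an array of plain objects, the shape Recharts' LineChart takes through its data prop, with dataKey="session" on the XAxis and dataKey="iq" on the Line.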


Part 6: The Neural Network Homepage

The homepage hero is a fully custom animated neural network rendered on a <canvas> element — no library, pure browser APIs.

38 nodes drift slowly across the canvas. Synaptic connections are drawn between nodes within 160px of each other, fading out as the distance grows. Signal particles start at a random node, travel to its nearest neighbour (within 180px), and glow along the way.

function spawnSignal() {
  const from = Math.floor(Math.random() * nodes.length)
  // Find nearest neighbour within 180px
  let best = -1, bestDist = Infinity
  nodes.forEach((n, i) => {
    if (i === from) return
    const d = Math.hypot(n.x - nodes[from].x, n.y - nodes[from].y)
    if (d < 180 && d < bestDist) { bestDist = d; best = i }
  })
  if (best !== -1) signals.push({ fromIdx: from, toIdx: best, t: 0, ... })
}

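// Signal particle rendering (iterate backwards so arrived signals can be removed)
for (let i = signals.length - 1; i >= 0; i--) {
  const sig = signals[i]
  sig.t += sig.speed
  if (sig.t >= 1) { signals.splice(i, 1); continue }  // reached the target node

  const from = nodes[sig.fromIdx], to = nodes[sig.toIdx]
  const x = from.x + (to.x - from.x) * sig.t
  const y = from.y + (to.y - from.y) * sig.t

  // Radial gradient glow; appending 'dd'/'00' assumes sig.color is '#rrggbb'
  const grad = ctx.createRadialGradient(x, y, 0, x, y, 8)
  grad.addColorStop(0, sig.color + 'dd')
  grad.addColorStop(1, sig.color + '00')
  ctx.fillStyle = grad
  ctx.beginPath()  // without this, fill() also repaints every earlier subpath
  ctx.arc(x, y, 8, 0, Math.PI * 2)
  ctx.fill()
}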

It runs at 60fps on modern hardware and falls back gracefully on mobile.
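The connection pass isn't shown in the post. A minimal sketch, assuming the common convention that a line's opacity falls linearly from 1 to 0 as the distance approaches the 160px cutoff (linkAlpha, drawLinks and the stroke colour are illustrative):

```typescript
// Hypothetical sketch of drawing synaptic connections between nearby nodes
interface NodePt { x: number; y: number }
const LINK_DIST = 160

// 1 when nodes touch, fading linearly to 0 at the cutoff
function linkAlpha(a: NodePt, b: NodePt): number {
  const d = Math.hypot(a.x - b.x, a.y - b.y)
  return d < LINK_DIST ? 1 - d / LINK_DIST : 0
}

function drawLinks(ctx: CanvasRenderingContext2D, nodes: NodePt[]): void {
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {  // visit each pair once
      const alpha = linkAlpha(nodes[i], nodes[j])
      if (alpha === 0) continue
      ctx.strokeStyle = `rgba(120, 160, 255, ${alpha * 0.5})`  // colour is arbitrary
      ctx.beginPath()
      ctx.moveTo(nodes[i].x, nodes[i].y)
      ctx.lineTo(nodes[j].x, nodes[j].y)
      ctx.stroke()
    }
  }
}
```

With 38 nodes this is 703 pair checks per frame, which is cheap enough that a brute-force double loop is a reasonable choice over a spatial index.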


What I Learned

1. Defining "adaptive" is harder than it sounds. My first version just randomised questions. Then I added difficulty tiers. Then age ranges. Then occupation tags. Each layer felt necessary once I had the previous one. The question is always: what dimensions of the user actually matter for question selection?

2. localStorage is surprisingly capable for personal data. No backend, no auth, no GDPR nightmare. For a tool that's genuinely private by design, browser storage is the right call — not a compromise.

3. Weighted regression beats percentage scoring immediately. When I switched from correct/total * 100 to the weighted multi-feature model, the results suddenly felt more meaningful. A user who answers 12/20 quickly on hard questions scores very differently from one who answers 12/20 slowly on easy questions, as they should.

4. The canvas animation took longer than the ML engine. The neural network hero is purely cosmetic but it's what people notice first and share. Don't underestimate the value of a visually distinctive entry point.

5. People want to retake immediately. The most common feedback in the first week was "I want to try again." The anti-repeat system was added within 48 hours of launching. Build for the thing people actually do, not the thing you imagined they'd do.


Live Demo & Source

🔗 Try it: https://iq-platform-plum.vercel.app

Built during my final year of B.Tech in Computer Science at COER University, Roorkee.

The platform is free, has no login, and stores nothing on a server. Questions are selected fresh for each session based on your profile, and the ML engine shows you a full breakdown of every feature that contributed to your score.

Would love your feedback — especially on the ML model weights and whether the adaptive selection feels right across different education levels. Drop a comment or connect on LinkedIn.


If you found this useful, consider leaving a reaction — it helps other developers find it. 🦄
