I Built an ML-Powered Adaptive IQ Test in Next.js 14 — Here's Exactly How It Works
Tags: nextjs typescript machinelearning webdev
Most online IQ tests are broken.
They ask the same 20 questions to everyone — whether you're a 14-year-old high school student or a 35-year-old PhD. They don't adapt. They don't remember what you've already answered. And after two or three retakes, you've memorised the answers. The score stops meaning anything.
I got frustrated enough to build my own. After a few months of work, I launched IQ Platform — a free, fully adaptive cognitive assessment tool that calibrates question difficulty to your age, education, and occupation, scores you using a multi-feature ML regression model, and remembers which questions you've already seen across sessions.
Here's the complete technical breakdown of how I built it.
The Stack
- Next.js 14 (App Router)
- TypeScript throughout
- Tailwind CSS for styling
- Recharts for radar and trend charts
- Custom ML scoring engine (no external ML library — pure math)
- localStorage for session history (no backend, no database)
- Vercel for deployment
No auth. No database. No backend API. Everything meaningful happens on the client or at build time.
The Core Problem: Static Tests Are Meaningless
Here's why standard online IQ tests fail:
Standard test:
User A: 14 years old, high school → gets "What is 30% of 200?"
User B: 26 years old, PhD CS → gets "What is 30% of 200?"
Both answer correctly. Both score the same.
But the question told you nothing useful about User B.
The solution is adaptive difficulty — selecting questions based on who is actually taking the test. This is how real psychometric assessments like the WAIS-IV work. They calibrate to the individual.
Part 1: The Question Bank (147 Questions, 6 Domains)
The question bank covers six cognitive domains:
| Domain | What it measures |
|---|---|
| Numerical | Arithmetic, algebra, ratios, percentages, logarithms |
| Verbal | Vocabulary, analogies, anagrams, antonyms |
| Pattern | Number series, letter sequences, figurate numbers |
| Logical | Syllogisms, seating arrangements, propositional logic |
| Memory | Sequence recall, list recognition, position memory |
| Spatial | 3D geometry, rotations, Euler's formula, clock angles |
Each question has a difficulty rating from 1 (school level) to 5 (postgraduate/doctorate level), plus optional minAge, maxAge, and tags (STEM, arts, language, general):
export interface Question {
id: string
category: QuestionCategory
difficulty: 1 | 2 | 3 | 4 | 5
minAge?: number
maxAge?: number
tags?: string[] // 'stem' | 'arts' | 'language' | 'general'
text: string
options: string[]
correctIndex: number
explanation: string
timeLimit: number // seconds per question
}
A difficulty-5 Logical question looks like this:
{
id: 'l8', category: 'Logical', difficulty: 5, tags: ['stem'],
text: 'Cards A, D, 4, 7. Which to flip to verify "vowel → even on other side"?',
options: ['A and 4', 'A and 7', 'D and 4', 'A, D and 4'],
correctIndex: 1, timeLimit: 40,
explanation: 'Flip A (check even) and 7 (check no vowel). Wason selection task.'
}
While a difficulty-1 version looks like:
{
id: 'l1', category: 'Logical', difficulty: 1, minAge: 10, maxAge: 16,
text: 'All cats are animals. Whiskers is a cat. Therefore:',
options: ['Whiskers is an animal', 'All animals are cats', ...],
correctIndex: 0, timeLimit: 15,
explanation: 'Simple syllogism: cat → animal'
}
Part 2: Profile-Aware Adaptive Selection
When a user fills in their profile, the system computes a difficulty target based on three inputs:
export const EDU_DIFFICULTY: Record<string, number> = {
school: 1.5,
diploma: 2.2,
undergraduate: 2.8,
postgraduate: 3.5,
doctorate: 4.2,
}
export function getDifficultyRange(profile: UserProfile) {
let target = EDU_DIFFICULTY[profile.education] ?? 2.5
// Age-based adjustment
if (profile.age <= 13) target = Math.min(target, 1.5)
else if (profile.age <= 16) target = Math.min(target, 2.2)
else if (profile.age <= 19) target = Math.min(target, 3.0)
else if (profile.age >= 30 && isStem(profile.occupation)) {
target = Math.max(target, 3.2)
}
return {
min: Math.max(1, Math.floor(target - 1)),
max: Math.min(5, Math.ceil(target + 1.5)),
target
}
}
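So, tracing the function as written, a 22-year-old B.Tech CS student (undergraduate, target 2.8) gets a difficulty window of 1–5, while a 14-year-old high school student (target 1.5) gets 1–3. The window is wide because it spans floor(target - 1) to ceil(target + 1.5). The InfoScreen even shows a live difficulty tier preview as the user fills out the form ("Advanced tier — challenging questions requiring deeper reasoning"), updating in real time.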
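It can help to trace getDifficultyRange by hand for a few profiles. The sketch below is a self-contained copy of the logic shown above. The `UserProfile` type is trimmed to the three fields the function reads (an assumption, since the full type isn't shown), and the three sample profiles are made up for illustration.

```typescript
// Trimmed profile type: only the fields getDifficultyRange reads (assumed shape).
interface UserProfile {
  age: number
  education: string
  occupation: string
}

const EDU_DIFFICULTY: Record<string, number> = {
  school: 1.5,
  diploma: 2.2,
  undergraduate: 2.8,
  postgraduate: 3.5,
  doctorate: 4.2,
}

function isStem(occ: string): boolean {
  return ['engineer', 'developer', 'programmer', 'data', 'scientist',
    'computer', 'software', 'hardware', 'cs', 'iot', 'electronics',
    'ai', 'ml', 'cyber', 'network']
    .some(k => occ.toLowerCase().includes(k))
}

function getDifficultyRange(profile: UserProfile) {
  let target = EDU_DIFFICULTY[profile.education] ?? 2.5
  if (profile.age <= 13) target = Math.min(target, 1.5)
  else if (profile.age <= 16) target = Math.min(target, 2.2)
  else if (profile.age <= 19) target = Math.min(target, 3.0)
  else if (profile.age >= 30 && isStem(profile.occupation)) {
    target = Math.max(target, 3.2)
  }
  return {
    min: Math.max(1, Math.floor(target - 1)),
    max: Math.min(5, Math.ceil(target + 1.5)),
    target,
  }
}

// Hypothetical sample profiles
const student = getDifficultyRange({ age: 22, education: 'undergraduate', occupation: 'cs student' })
const teen = getDifficultyRange({ age: 14, education: 'school', occupation: 'student' })
const seniorDev = getDifficultyRange({ age: 35, education: 'diploma', occupation: 'Software Engineer' })

console.log(student)   // { min: 1, max: 5, target: 2.8 }
console.log(teen)      // { min: 1, max: 3, target: 1.5 }
console.log(seniorDev) // { min: 2, max: 5, target: 3.2 }
```

The third profile shows the STEM branch: a diploma holder would start at 2.2, but being 30+ in a STEM occupation lifts the target to 3.2.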
The isStem() helper checks the occupation string for keywords:
function isStem(occ: string): boolean {
return ['engineer', 'developer', 'programmer', 'data', 'scientist',
'computer', 'software', 'hardware', 'cs', 'iot', 'electronics',
'ai', 'ml', 'cyber', 'network']
.some(k => occ.toLowerCase().includes(k))
}
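One caveat with plain substring matching: short keywords such as 'ai', 'cs', and 'ml' also match inside unrelated words ("hairdresser" contains "ai", "economics" contains "cs"). A possible tightening, sketched below, keeps substring matching for the long keywords and requires the short ones to appear as whole tokens. The name `isStemStrict` and the long/short split are assumptions for illustration, not part of the platform's code.

```typescript
// Original behaviour, copied for comparison
function isStemOriginal(occ: string): boolean {
  return ['engineer', 'developer', 'programmer', 'data', 'scientist',
    'computer', 'software', 'hardware', 'cs', 'iot', 'electronics',
    'ai', 'ml', 'cyber', 'network']
    .some(k => occ.toLowerCase().includes(k))
}

// Hypothetical split: long keywords are safe as substrings, short ones are not
const LONG_KEYWORDS = ['engineer', 'developer', 'programmer', 'data', 'scientist',
  'computer', 'software', 'hardware', 'electronics', 'cyber', 'network']
const SHORT_KEYWORDS = ['cs', 'iot', 'ai', 'ml']

function isStemStrict(occ: string): boolean {
  const lower = occ.toLowerCase()
  // Split on anything that is not a letter or digit, dropping empty pieces
  const tokens = lower.split(/[^a-z0-9]+/).filter(Boolean)
  return LONG_KEYWORDS.some(k => lower.includes(k)) ||
    SHORT_KEYWORDS.some(k => tokens.indexOf(k) !== -1)
}

console.log(isStemOriginal('Hairdresser')) // true (false positive via "ai")
console.log(isStemStrict('Hairdresser'))   // false
console.log(isStemStrict('ML Engineer'))   // true
console.log(isStemStrict('AI researcher')) // true
```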
STEM occupations get Numerical, Pattern, and Logical questions boosted. Arts/language occupations get Verbal questions prioritised.
Part 3: The Anti-Repeat System
This is the problem that prompted the whole project: after 2–3 retakes of a static test, users have memorised the answers.
The solution: track recently seen question IDs in localStorage and always serve unseen questions first.
const SEEN_KEY = 'iq_platform_seen_questions_v2'
const MAX_SEEN = 80 // remember last 80 question IDs
function getSeenIds(): Set<string> {
try {
const raw = localStorage.getItem(SEEN_KEY)
return raw ? new Set(JSON.parse(raw)) : new Set()
} catch { return new Set() }
}
function markAsSeen(ids: string[]): void {
const existing = [...getSeenIds()]
const updated = [...ids, ...existing].slice(0, MAX_SEEN)
localStorage.setItem(SEEN_KEY, JSON.stringify(updated))
}
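One detail worth watching in markAsSeen: if a previously seen question is served again, `[...ids, ...existing]` contains its ID twice, so the 80-slot window covers fewer distinct questions than it could. A minimal sketch of a de-duplicating merge follows. The helper name `mergeSeen` is hypothetical, and it is written as a pure function so it can be tested without localStorage.

```typescript
// Merge new IDs in front of existing ones, keep each ID once, cap the length.
function mergeSeen(newIds: string[], existing: string[], maxSeen: number): string[] {
  // A Set keeps first-insertion order, so the most recent occurrence of each ID wins
  return Array.from(new Set([...newIds, ...existing])).slice(0, maxSeen)
}

console.log(mergeSeen(['a', 'b'], ['b', 'c', 'd'], 3)) // [ 'a', 'b', 'c' ]
```

Inside markAsSeen, the result of such a helper would be what gets passed to `JSON.stringify` before writing to storage.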
During selection, unseen questions are sorted to the front:
const unseen = pool
.filter(q => !seenIds.has(q.id))
.sort(() => Math.random() - 0.5)
const seen = pool
.filter(q => seenIds.has(q.id))
.sort(() => Math.random() - 0.5)
// Always prefer unseen
const prioritised = [...unseen, ...seen]
selected.push(...prioritised.slice(0, perCategory))
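A note on the shuffle: `sort(() => Math.random() - 0.5)` gives the sort an inconsistent comparator, so the resulting order is not uniformly random and can vary by JavaScript engine. A Fisher-Yates shuffle is a common alternative. The sketch below assumes hypothetical helper names (`shuffle`, `prioritiseUnseen`) and takes the random source as a parameter so it can be swapped out in tests.

```typescript
// Fisher-Yates: walk backwards, swapping each slot with a random earlier (or same) slot.
function shuffle<T>(items: readonly T[], rng: () => number = Math.random): T[] {
  const out = items.slice()
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1))
    const tmp = out[i]
    out[i] = out[j]
    out[j] = tmp
  }
  return out
}

// Unseen questions first (shuffled), then seen ones (shuffled), cut to the requested count.
function prioritiseUnseen<T extends { id: string }>(
  pool: readonly T[],
  seenIds: ReadonlySet<string>,
  count: number,
  rng: () => number = Math.random
): T[] {
  const unseen = shuffle(pool.filter(q => !seenIds.has(q.id)), rng)
  const seen = shuffle(pool.filter(q => seenIds.has(q.id)), rng)
  return [...unseen, ...seen].slice(0, count)
}

// Example: five questions, two already seen, four requested
const demoPool = ['a', 'b', 'c', 'd', 'e'].map(id => ({ id }))
const demoPick = prioritiseUnseen(demoPool, new Set(['a', 'b']), 4)
console.log(demoPick.map(q => q.id)) // c, d, e in some order, then one of a or b
```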
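With 147 questions and 20 per test, a user could in principle take 7 tests (147 / 20 ≈ 7.3) before any question has to repeat. In practice the number is lower: only the last 80 IDs (four tests' worth) are remembered, and each test draws only from questions inside the user's difficulty window. When repeats do happen, they're shuffled into different positions and contexts.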
Part 4: The ML Scoring Engine
This is the most interesting part technically. Instead of just calculating a percentage, I built a weighted linear regression model that combines six performance features, scaled by two demographic norms, to map raw performance to an IQ estimate.
Feature Extraction
const features = {
accuracyScore, // proportion of correct answers
speedScore, // based on average time per correct answer
consistencyScore, // penalises random correct answers (guessing)
difficultyWeighted, // harder correct answers worth more
categoryBalance, // uniform performance > spikey
adaptiveScore, // penalises timed-out questions
educationNorm, // calibrates for education baseline
ageNorm, // peak window (16-35) = 1.0
}
Speed scoring — not just "was it right" but "how quickly":
function calcSpeed(answers: TestAnswer[]): number {
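const correct = answers.filter(a => a.correct && !a.timedOut)
// No correct answers: avoid 0 / 0 (NaN) and return the lowest tier explicitly
if (correct.length === 0) return 0.4
const avg = correct.reduce((s, a) => s + a.timeSpent, 0) / correct.length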
if (avg < 6) return 1.0
if (avg < 10) return 0.9
if (avg < 15) return 0.8
if (avg < 20) return 0.7
if (avg < 25) return 0.55
return 0.4
}
Consistency scoring — detects guessing by measuring variance within each domain:
function calcConsistency(answers: TestAnswer[]): number {
const byCategory: Record<string, boolean[]> = {}
answers.forEach(a => {
if (!byCategory[a.category]) byCategory[a.category] = []
byCategory[a.category].push(a.correct)
})
let totalVariance = 0, categories = 0
for (const cat in byCategory) {
const results = byCategory[cat]
if (results.length < 2) continue
const mean = results.filter(Boolean).length / results.length
const variance = results.reduce(
(s, r) => s + Math.pow((r ? 1 : 0) - mean, 2), 0
) / results.length
totalVariance += variance
categories++
}
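// Guard: if no category has 2+ answers, 0 / 0 would yield NaN. Returning 1.0 here is an assumption.
if (categories === 0) return 1.0
return clamp(1 - (totalVariance / categories) * 1.5, 0.3, 1.0)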
}
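If you get 3 correct and 3 wrong in the same category, that category's variance hits its maximum of 0.25, which pulls your consistency score down and reduces your final IQ estimate. The order of right and wrong answers doesn't matter here: the variance depends only on the proportion correct, so this feature penalises mixed results within a domain rather than detecting a specific guessing pattern.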
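To make the variance behaviour concrete, here is a small sketch that isolates the per-category calculation from calcConsistency. The helper name `categoryVariance` is hypothetical, and `clamp` is defined here as an assumption because its implementation isn't shown in the article.

```typescript
// Assumed implementation of the clamp helper used by calcConsistency
const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v))

// Same per-category maths as in calcConsistency, for a single category
function categoryVariance(results: boolean[]): number {
  const mean = results.filter(Boolean).length / results.length
  return results.reduce(
    (s, r) => s + Math.pow((r ? 1 : 0) - mean, 2), 0
  ) / results.length
}

const alternating = [true, false, true, false, true, false]
const grouped = [true, true, true, false, false, false]
const allCorrect = [true, true, true, true, true, true]

console.log(categoryVariance(alternating)) // 0.25
console.log(categoryVariance(grouped))     // 0.25 (same proportion, same variance)
console.log(categoryVariance(allCorrect))  // 0

// If every category looked like `alternating`, the final score would be:
console.log(clamp(1 - 0.25 * 1.5, 0.3, 1.0)) // 0.625
```

For a true/false outcome the variance is p(1 - p), which peaks at 0.25 when exactly half the answers are correct.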
The Regression
const WEIGHTS = {
accuracy: 0.42,
speed: 0.12,
consistency: 0.10,
difficulty: 0.22,
balance: 0.08,
adaptive: 0.06,
}
const rawLinear =
WEIGHTS.accuracy * accuracyScore +
WEIGHTS.speed * speedScore +
WEIGHTS.consistency * consistencyScore +
WEIGHTS.difficulty * difficultyWeighted +
WEIGHTS.balance * categoryBalance +
WEIGHTS.adaptive * adaptiveScore
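Because the six weights sum to 1.0, rawLinear stays between 0 and 1 whenever each feature does. The sketch below checks that and compares two made-up users with the same accuracy (12/20) but different speed and difficulty profiles. The feature values are hypothetical, and for brevity the feature object is keyed by the weight names (`accuracy`, `speed`, ...) rather than the `accuracyScore`-style names used above.

```typescript
const WEIGHTS = {
  accuracy: 0.42,
  speed: 0.12,
  consistency: 0.10,
  difficulty: 0.22,
  balance: 0.08,
  adaptive: 0.06,
}

type FeatureName = keyof typeof WEIGHTS

// Weighted sum over all six features
function weightedScore(features: Record<FeatureName, number>): number {
  return (Object.keys(WEIGHTS) as FeatureName[])
    .reduce((sum, k) => sum + WEIGHTS[k] * features[k], 0)
}

const weightTotal = (Object.keys(WEIGHTS) as FeatureName[])
  .reduce((sum, k) => sum + WEIGHTS[k], 0)

// Hypothetical feature values: same accuracy, different speed and difficulty
const fastOnHard = weightedScore({
  accuracy: 0.6, speed: 0.9, consistency: 0.8, difficulty: 0.7, balance: 0.8, adaptive: 1.0,
})
const slowOnEasy = weightedScore({
  accuracy: 0.6, speed: 0.55, consistency: 0.8, difficulty: 0.35, balance: 0.8, adaptive: 1.0,
})

console.log(weightTotal.toFixed(2)) // "1.00"
console.log(fastOnHard.toFixed(3))  // "0.718"
console.log(slowOnEasy.toFixed(3))  // "0.599"
```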
Mapping to IQ Scale
Raw score (0–1) maps to IQ (70–145) using a z-score approach with the normal distribution:
// Map raw score to z-score, scale by education and age norms
const zScore = (rawScore - 0.5) * 4.2 * educationNorm * ageNorm
// IQ = mean + z * SD (mean=100, SD=15)
const iqEstimate = Math.round(clamp(100 + zScore * 15, 70, 145))
// 90% confidence interval
const ciMargin = Math.round(8 + (1 - consistencyScore) * 6)
const confidenceInterval = [iqEstimate - ciMargin, iqEstimate + ciMargin]
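A raw score of 0.5 maps to exactly IQ 100 (average). With both norms at 1.0, a raw score of 1.0 maps to IQ 132 (z = 0.5 × 4.2 = 2.1, so 100 + 2.1 × 15 = 131.5, rounded up). Demographic norms scale the z-score. Because the norm multiplies the z-score, it stretches the distance from 100 in both directions: as written, a doctorate-level user (educationNorm = 1.07) with an above-average raw score gets a slightly higher estimate than an undergraduate (educationNorm = 1.0) with the same raw score, and a slightly lower one when below average. Requiring higher raw performance for the same IQ estimate would need the norm to raise the 0.5 baseline instead.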
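The mapping is easy to check numerically. This sketch wraps the two lines above in a hypothetical `toIq` helper, with `clamp` defined as an assumption since the article doesn't show it, and evaluates a few raw scores with and without the 1.07 education norm.

```typescript
// Assumed implementation of the clamp helper
const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v))

// Same formula as above; the norms default to 1.0 (no adjustment)
function toIq(rawScore: number, educationNorm: number = 1.0, ageNorm: number = 1.0): number {
  const zScore = (rawScore - 0.5) * 4.2 * educationNorm * ageNorm
  return Math.round(clamp(100 + zScore * 15, 70, 145))
}

console.log(toIq(0.5))       // 100
console.log(toIq(1.0))       // 132
console.log(toIq(1.0, 1.07)) // 134 (norm above 1 raises an above-average score)
console.log(toIq(0.2))       // 81
console.log(toIq(0.2, 1.07)) // 80  (and lowers a below-average one)
console.log(toIq(0))         // 70  (68.5 before clamping)
```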
The percentile is calculated using a proper normal distribution CDF approximated via the Abramowitz and Stegun error function:
function erf(x: number): number {
const sign = x >= 0 ? 1 : -1; x = Math.abs(x)
const a1=0.254829592, a2=-0.284496736, a3=1.421413741,
a4=-1.453152027, a5=1.061405429, p=0.3275911
const t = 1/(1+p*x)
const y = 1-(((((a5*t+a4)*t)+a3)*t+a2)*t+a1)*t*Math.exp(-x*x)
return sign*y
}
const percentile = Math.round(
0.5 * (1 + erf((iqEstimate - 100) / (15 * Math.sqrt(2)))) * 100
)
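As a sanity check, the percentile formula can be run against values with well-known answers: IQ 100 should land at the 50th percentile, and one standard deviation above (115) at roughly the 84th. The sketch below repeats the erf approximation and wraps the percentile expression in a hypothetical `iqToPercentile` helper.

```typescript
// Abramowitz and Stegun approximation, as in the article
function erf(x: number): number {
  const sign = x >= 0 ? 1 : -1
  x = Math.abs(x)
  const a1 = 0.254829592, a2 = -0.284496736, a3 = 1.421413741,
    a4 = -1.453152027, a5 = 1.061405429, p = 0.3275911
  const t = 1 / (1 + p * x)
  const y = 1 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-x * x)
  return sign * y
}

// Normal CDF with mean 100 and SD 15, expressed as a whole-number percentile
function iqToPercentile(iq: number): number {
  return Math.round(0.5 * (1 + erf((iq - 100) / (15 * Math.sqrt(2)))) * 100)
}

console.log(iqToPercentile(70))  // 2
console.log(iqToPercentile(100)) // 50
console.log(iqToPercentile(115)) // 84
console.log(iqToPercentile(130)) // 98
console.log(iqToPercentile(145)) // 100
```

Note that the top of the scale rounds to 100: IQ 145 sits at about the 99.87th percentile before rounding.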
Part 5: Per-User History Without a Backend
Session history is stored entirely in localStorage, keyed by a normalised username:
export function normaliseUserKey(name: string): string {
return name.trim().toLowerCase().replace(/\s+/g, ' ')
}
This means "Om", "om", and "OM" all map to the same history bucket — but "Om" and "Rahul" are completely separate. No login required.
export function saveSession(profile, result, totalQ, correct, avgTime, diffRange) {
const session: TestSession = {
id: `session_${Date.now()}_${Math.random().toString(36).slice(2, 7)}`,
timestamp: Date.now(),
userKey: normaliseUserKey(profile.name),
iqEstimate: result.iqEstimate,
categoryScores: result.categoryScores,
mlFeatures: result.mlFeatures,
difficultyRange: diffRange,
// ...
}
const existing = getAllSessions()
localStorage.setItem(STORAGE_KEY, JSON.stringify(
[session, ...existing].slice(0, 100)
))
}
The History page fetches all sessions, groups them by userKey, and renders per-user filter tabs. The IQ trend chart (Recharts LineChart) shows your score trajectory across sessions. Domain averages accumulate lifetime category accuracy.
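The grouping step could look something like the sketch below. `SessionSummary` is a trimmed, hypothetical stand-in for the full TestSession type, and `groupByUser` is an assumed helper name. Each group is sorted oldest-first so it could feed a trend chart directly.

```typescript
// Trimmed stand-in for TestSession: only the fields this sketch needs
interface SessionSummary {
  userKey: string
  timestamp: number
  iqEstimate: number
}

function groupByUser(sessions: SessionSummary[]): Map<string, SessionSummary[]> {
  const groups = new Map<string, SessionSummary[]>()
  for (const s of sessions) {
    const bucket = groups.get(s.userKey) ?? []
    bucket.push(s)
    groups.set(s.userKey, bucket)
  }
  // Storage order is newest-first; charts usually want oldest-first
  groups.forEach(bucket => {
    bucket.sort((a, b) => a.timestamp - b.timestamp)
  })
  return groups
}

// Made-up sample data, newest first as saveSession stores it
const sampleSessions: SessionSummary[] = [
  { userKey: 'om', timestamp: 3000, iqEstimate: 118 },
  { userKey: 'rahul', timestamp: 2000, iqEstimate: 105 },
  { userKey: 'om', timestamp: 1000, iqEstimate: 112 },
]

const grouped = groupByUser(sampleSessions)
console.log(Array.from(grouped.keys()))                      // [ 'om', 'rahul' ]
console.log((grouped.get('om') ?? []).map(s => s.iqEstimate)) // [ 112, 118 ]
```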
Part 6: The Neural Network Homepage
The homepage hero is a fully custom animated neural network rendered on a <canvas> element — no library, pure browser APIs.
38 nodes drift slowly across the canvas. Synaptic connections form between nodes within 160px of each other, with opacity proportional to distance. Signal particles spawn on random connections and travel along them with a glow effect.
function spawnSignal() {
const from = Math.floor(Math.random() * nodes.length)
// Find nearest neighbour within 180px
let best = -1, bestDist = Infinity
nodes.forEach((n, i) => {
if (i === from) return
const d = Math.hypot(n.x - nodes[from].x, n.y - nodes[from].y)
if (d < 180 && d < bestDist) { bestDist = d; best = i }
})
if (best !== -1) signals.push({ fromIdx: from, toIdx: best, t: 0, ... })
}
// Signal particle rendering
signals.forEach(sig => {
sig.t += sig.speed
const from = nodes[sig.fromIdx], to = nodes[sig.toIdx]
const x = from.x + (to.x - from.x) * sig.t
const y = from.y + (to.y - from.y) * sig.t
// Radial gradient glow
const grad = ctx.createRadialGradient(x, y, 0, x, y, 8)
grad.addColorStop(0, sig.color + 'dd')
grad.addColorStop(1, sig.color + '00')
ctx.fillStyle = grad
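// Start a fresh path so arcs from earlier signals are not re-filled with this gradient
ctx.beginPath()
ctx.arc(x, y, 8, 0, Math.PI * 2)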
ctx.fill()
})
It runs at 60fps on modern hardware and falls back gracefully on mobile.
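The excerpt above advances `sig.t` on each frame but doesn't show what happens once a signal reaches its target node. One plausible approach, sketched here with hypothetical helper names (`signalPosition`, `stepSignals`) and trimmed types, is to keep the position maths in a pure function and drop any signal whose `t` has reached 1.

```typescript
interface Point { x: number; y: number }
interface Signal { fromIdx: number; toIdx: number; t: number; speed: number }

// Linear interpolation between the two endpoint nodes, t in [0, 1)
function signalPosition(sig: Signal, nodes: Point[]): Point {
  const from = nodes[sig.fromIdx], to = nodes[sig.toIdx]
  return {
    x: from.x + (to.x - from.x) * sig.t,
    y: from.y + (to.y - from.y) * sig.t,
  }
}

// Advance every signal by its speed and retire the ones that have arrived
function stepSignals(signals: Signal[]): Signal[] {
  return signals
    .map(sig => ({ ...sig, t: sig.t + sig.speed }))
    .filter(sig => sig.t < 1)
}

const demoNodes: Point[] = [{ x: 0, y: 0 }, { x: 100, y: 50 }]
const halfway = signalPosition({ fromIdx: 0, toIdx: 1, t: 0.5, speed: 0.02 }, demoNodes)
console.log(halfway) // { x: 50, y: 25 }

const remaining = stepSignals([
  { fromIdx: 0, toIdx: 1, t: 0.9, speed: 0.2 }, // arrives this frame and is dropped
  { fromIdx: 0, toIdx: 1, t: 0.1, speed: 0.2 }, // still travelling
])
console.log(remaining.length) // 1
```

Keeping these two steps free of canvas calls also makes them easy to unit test outside the browser.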
What I Learned
1. Defining "adaptive" is harder than it sounds. My first version just randomised questions. Then I added difficulty tiers. Then age ranges. Then occupation tags. Each layer felt necessary once I had the previous one. The question is always: what dimensions of the user actually matter for question selection?
2. localStorage is surprisingly capable for personal data. No backend, no auth, no GDPR nightmare. For a tool that's genuinely private by design, browser storage is the right call — not a compromise.
3. Weighted regression beats percentage scoring immediately. When I switched from correct/total * 100 to the weighted multi-feature model, the results suddenly felt more meaningful. A user who answers 12/20 quickly on hard questions scores very differently from one who answers 12/20 slowly on easy questions, as they should.
4. The canvas animation took longer than the ML engine. The neural network hero is purely cosmetic but it's what people notice first and share. Don't underestimate the value of a visually distinctive entry point.
5. People want to retake immediately. The most common feedback in the first week was "I want to try again." The anti-repeat system was added within 48 hours of launching. Build for the thing people actually do, not the thing you imagined they'd do.
Live Demo & Source
🔗 Try it: https://iq-platform-plum.vercel.app
Built during my final year B.Tech in Computer Science at COER University, Roorkee.
The platform is free, has no login, and stores nothing on a server. Questions are selected fresh for each session based on your profile, and the ML engine shows you a full breakdown of every feature that contributed to your score.
Would love your feedback — especially on the ML model weights and whether the adaptive selection feels right across different education levels. Drop a comment or connect on LinkedIn.
If you found this useful, consider leaving a reaction — it helps other developers find it. 🦄