<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: miad</title>
    <description>The latest articles on DEV Community by miad (@miad_ea7faef80e5125861119).</description>
    <link>https://dev.to/miad_ea7faef80e5125861119</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2718242%2Feb94286d-9e06-4ca8-98d0-05ece93d8800.png</url>
      <title>DEV Community: miad</title>
      <link>https://dev.to/miad_ea7faef80e5125861119</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/miad_ea7faef80e5125861119"/>
    <language>en</language>
    <item>
      <title>How adaptive testing converges on cert readiness in 25 questions</title>
      <dc:creator>miad</dc:creator>
      <pubDate>Fri, 15 May 2026 16:33:02 +0000</pubDate>
      <link>https://dev.to/miad_ea7faef80e5125861119/how-adaptive-testing-converges-on-cert-readiness-in-25-questions-32gl</link>
      <guid>https://dev.to/miad_ea7faef80e5125861119/how-adaptive-testing-converges-on-cert-readiness-in-25-questions-32gl</guid>
      <description>&lt;h1&gt;
  
  
  How adaptive testing converges on cert readiness in 25 questions
&lt;/h1&gt;

&lt;p&gt;A well-built adaptive test is binary search for skill level.&lt;/p&gt;

&lt;p&gt;You start at the middle of the difficulty range. Get one right,&lt;br&gt;
  the next question shifts harder. Get it wrong, it drops back.&lt;br&gt;
  Each answer halves the uncertainty band around your real skill&lt;br&gt;
  estimate. By question 15, you know more about a learner than a&lt;br&gt;
  fixed 50-question test does at the end.&lt;/p&gt;

&lt;p&gt;This is computerized adaptive testing (CAT), and it's the most&lt;br&gt;
  underused idea in cert prep.&lt;/p&gt;
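
&lt;p&gt;Here's a toy sketch of that up-and-down walk, assuming a fixed 0-100 difficulty ladder and a step that halves after every answer. It captures the intuition only; a real engine picks items with IRT, as described below.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Toy sketch: the difficulty walk behaves like binary search because the
# step size halves after each answer. Illustrative only, not a real CAT.

def walk_difficulty(answers, low=0.0, high=100.0):
    """answers: list of booleans, True means correct. Returns the trace."""
    difficulty = (low + high) / 2      # start in the middle of the range
    step = (high - low) / 4            # first move covers a quarter of it
    trace = [difficulty]
    for correct in answers:
        difficulty += step if correct else -step   # harder if right, easier if wrong
        step /= 2                                   # each answer halves the move
        trace.append(difficulty)
    return trace

print(walk_difficulty([True, True, False, True, False]))
# [50.0, 75.0, 87.5, 81.25, 84.375, 82.8125]
&lt;/code&gt;&lt;/pre&gt;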

&lt;h2&gt;The IRT engine underneath&lt;/h2&gt;

&lt;p&gt;Item response theory (IRT) is what makes it work. Every question&lt;br&gt;
  in the bank has a calibrated difficulty on a continuous scale.&lt;br&gt;
  Every learner has a latent skill value on the same scale. The&lt;br&gt;
  algorithm's job is to estimate that value as fast as possible.&lt;/p&gt;
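
&lt;p&gt;The simplest calibration of that shared scale is the Rasch (one-parameter logistic) model, used here as an assumption; production engines often add discrimination and guessing parameters on top. The probability of a correct answer depends only on the gap between learner skill and item difficulty:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import math

def p_correct(theta, b):
    """Rasch (1PL) sketch: probability that a learner with skill theta
    answers an item of calibrated difficulty b correctly. Both values
    live on the same logit scale, which is what lets the algorithm
    compare learner and item directly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

print(p_correct(0.0, 0.0))    # evenly matched: 0.5
print(p_correct(1.5, 0.0))    # learner well above the item: ~0.82
print(p_correct(-1.0, 1.0))   # item well above the learner: ~0.12
&lt;/code&gt;&lt;/pre&gt;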

&lt;p&gt;After each answer, it does two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Updates the point estimate of your skill (Bayesian update,
 roughly)&lt;/li&gt;
&lt;li&gt;Picks the next question whose calibrated difficulty sits closest
 to the current estimate&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That second step is the key. A question that sits far above or&lt;br&gt;
  below your current estimate is mostly noise. A question at your&lt;br&gt;
  current estimate is maximally informative. The algorithm isn't&lt;br&gt;
  picking "the next hard question" or "the next easy question." It's&lt;br&gt;
  picking the one most likely to shrink the confidence interval.&lt;/p&gt;
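
&lt;p&gt;A compact sketch of that two-step loop, assuming the Rasch model above and a small grid posterior. Real engines typically run EAP or maximum-likelihood estimation over a professionally calibrated bank, so treat this as the shape of the idea rather than anyone's production code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import numpy as np

GRID = np.linspace(-4, 4, 161)              # candidate skill values (logit scale)

def posterior_update(prior, item_b, correct):
    """Step 1: Bayesian update of the skill distribution after one answer."""
    p = 1.0 / (1.0 + np.exp(-(GRID - item_b)))    # Rasch response curve
    likelihood = p if correct else (1.0 - p)
    post = prior * likelihood
    return post / post.sum()

def pick_next_item(posterior, bank):
    """Step 2: pick the unanswered item whose difficulty sits closest to the
    current estimate, the maximum-information choice under the Rasch model."""
    theta_hat = float(np.dot(GRID, posterior))           # posterior mean (EAP)
    return min(bank, key=lambda b: abs(b - theta_hat))   # nearest difficulty

# One simulated round: standard-normal prior, correct answer on a mid item.
prior = np.exp(-0.5 * GRID ** 2)
prior = prior / prior.sum()
posterior = posterior_update(prior, item_b=0.0, correct=True)
bank = [-2.0, -1.0, 0.5, 1.0, 2.5]                 # remaining item difficulties
print(round(float(np.dot(GRID, posterior)), 2))    # estimate drifts up from 0
print(pick_next_item(posterior, bank))             # 0.5, the closest difficulty
&lt;/code&gt;&lt;/pre&gt;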

&lt;h2&gt;Why convergence is geometric, not linear&lt;/h2&gt;

&lt;p&gt;The uncertainty band doesn't shrink uniformly. It shrinks fast&lt;br&gt;
  early, then the gains flatten.&lt;/p&gt;

&lt;p&gt;After 8 questions, the band is wide but already useful. After 15,&lt;br&gt;
  it's narrow enough to act on for most learners. After 25, the&lt;br&gt;
  marginal information per question has dropped close to zero.&lt;br&gt;
  Asking question 26 is roughly a coin flip on whether it tells&lt;br&gt;
  you anything new.&lt;/p&gt;
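
&lt;p&gt;Rough numbers behind that flattening, assuming the idealized case where every item lands exactly at the learner's level and so contributes the Rasch maximum of 0.25 Fisher information. The standard error of the estimate falls with the square root of the accumulated information, which is fast early and nearly flat past 25:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import math

# Idealized: every administered item sits at the learner's level, so each one
# contributes the Rasch maximum information of p * (1 - p) = 0.25.
PER_ITEM_INFO = 0.25

for n in (2, 8, 15, 25, 26, 50):
    se = 1.0 / math.sqrt(n * PER_ITEM_INFO)
    print(f"after {n:2d} questions: standard error ~ {se:.2f}")

# after  2 questions: standard error ~ 1.41
# after  8 questions: standard error ~ 0.71
# after 15 questions: standard error ~ 0.52
# after 25 questions: standard error ~ 0.40
# after 26 questions: standard error ~ 0.39   (question 26 buys ~0.01)
# after 50 questions: standard error ~ 0.28
&lt;/code&gt;&lt;/pre&gt;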

&lt;p&gt;This is why stopping at 25 isn't a shortcut. It's the point&lt;br&gt;
  where continuing would add fatigue and noise, not signal. Fixed&lt;br&gt;
  50-question pretests are a holdover from paper testing. They&lt;br&gt;
  survived in software because they're easier to build and look&lt;br&gt;
  more thorough. They aren't.&lt;/p&gt;
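
&lt;p&gt;A generic sketch of what a stopping rule of this shape can look like: end the test when the estimate is tight enough or the question cap is hit. The target and cap here are illustrative, not any product's exact thresholds:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import math

MAX_QUESTIONS = 25
SE_TARGET = 0.40    # illustrative precision target on the logit scale

def should_stop(questions_asked, accumulated_information):
    """Stop once the skill estimate is precise enough, or at the cap."""
    if questions_asked == MAX_QUESTIONS:
        return True
    se = 1.0 / math.sqrt(max(accumulated_information, 1e-9))
    return SE_TARGET &gt; se    # early stop: the band is already tighter than target

print(should_stop(18, 18 * 0.25))   # False: se ~ 0.47, keep asking
print(should_stop(22, 22 * 0.30))   # True: higher-discrimination (2PL) items got there early
print(should_stop(25, 25 * 0.25))   # True: cap reached
&lt;/code&gt;&lt;/pre&gt;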

&lt;h2&gt;The output is per-domain, not a percentage&lt;/h2&gt;

&lt;p&gt;Most cert pretests return a single number. 74%. Which tells you&lt;br&gt;
  almost nothing useful about what to study next.&lt;/p&gt;

&lt;p&gt;A real CAT returns a skill estimate per domain, because failure&lt;br&gt;
  modes are domain-specific. AWS SAA-C03 might return:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Networking: Proficient (82)&lt;/li&gt;
&lt;li&gt;Storage: Developing (47)&lt;/li&gt;
&lt;li&gt;Security: Novice (24)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;"Security: Novice" means start the roadmap on Security.&lt;br&gt;
  "Networking: Proficient" means one validation milestone, not&lt;br&gt;
  six. The prep plan is different for each learner. That's the&lt;br&gt;
  whole point.&lt;/p&gt;
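
&lt;p&gt;As plain data, that output is just a per-domain score mapped onto bands. The cutoffs below are made up for illustration; only the shape matters:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
from bisect import bisect

# Hypothetical band cutoffs on a 0-100 scaled score; the labels mirror the
# example above, the thresholds are invented for illustration.
CUTOFFS = (40, 60, 80)
BANDS = ("Novice", "Developing", "Competent", "Proficient")

def band(score):
    return BANDS[bisect(CUTOFFS, score)]

# Per-domain estimates from the adaptive pretest, not one blended percentage.
domain_scores = {"Networking": 82, "Storage": 47, "Security": 24}

for domain, score in domain_scores.items():
    print(f"{domain}: {band(score)} ({score})")
# Networking: Proficient (82)
# Storage: Developing (47)
# Security: Novice (24)
&lt;/code&gt;&lt;/pre&gt;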

&lt;h2&gt;Where it breaks down&lt;/h2&gt;

&lt;p&gt;Three failure modes worth knowing if you're building something&lt;br&gt;
  similar:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Narrow item bank.&lt;/strong&gt; If the bank doesn't have well-calibrated&lt;br&gt;
  items at the high end, the estimate can't push past a ceiling&lt;br&gt;
  no matter how well the learner answers. They cap at Competent&lt;br&gt;
  on a domain where they're actually Proficient. Fix: bank breadth,&lt;br&gt;
  tracked per cert.&lt;/p&gt;
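
&lt;p&gt;The ceiling shows up directly in the numbers. Under the Rasch sketch above, an item's information peaks when its difficulty matches the learner's skill; if the bank tops out well below a strong learner, every remaining item is a low-information item and the interval stops shrinking:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
import math

def item_information(theta, b):
    """Fisher information of a Rasch item at skill theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

strong_learner = 2.5    # true skill on the logit scale (illustrative)
print(item_information(strong_learner, b=2.5))   # matched item: 0.25
print(item_information(strong_learner, b=1.0))   # bank ceiling at 1.0: ~0.15
print(item_information(strong_learner, b=0.0))   # mid-level item: ~0.07
&lt;/code&gt;&lt;/pre&gt;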

&lt;p&gt;&lt;strong&gt;Intentional gaming.&lt;/strong&gt; Deliberately answering early items wrong&lt;br&gt;
  to drag the difficulty down. The algorithm obliges, then climbs&lt;br&gt;
  back as later answers land. The estimate converges on the gamed&lt;br&gt;
  pattern. Can't distinguish intent from skill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sparse domain coverage.&lt;/strong&gt; On certs with many domains, the&lt;br&gt;
  CAT sometimes stops before sampling all of them. Untouched&lt;br&gt;
  domains report as the lowest level by default. Not a failure,&lt;br&gt;
  but an absence of signal.&lt;/p&gt;

&lt;p&gt;None of these are unique to CAT. Fixed-length tests have worse&lt;br&gt;
  versions of each, and you give up the personalization on top.&lt;/p&gt;

&lt;p&gt;I wrote a longer breakdown of the exact stopping conditions,&lt;br&gt;
  the 95% confidence threshold, and how the domain-level output&lt;br&gt;
  drives a personalized roadmap in the ClaudeLab docs:&lt;br&gt;
  &lt;a href="https://doc.claudelab.me/articles/cat-evaluation-explained" rel="noopener noreferrer"&gt;CAT evaluation explained&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building something that needs to assess knowledge&lt;br&gt;
  quickly and accurately, the 25-question ceiling is worth&lt;br&gt;
  understanding before you default to "just make the test longer."&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>testing</category>
      <category>learning</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
