<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leo Pechnicki</title>
    <description>The latest articles on DEV Community by Leo Pechnicki (@leo_pechnicki).</description>
    <link>https://dev.to/leo_pechnicki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3804306%2F6a51d8c0-b3e8-4e5c-be51-2dd1132bc809.png</url>
      <title>DEV Community: Leo Pechnicki</title>
      <link>https://dev.to/leo_pechnicki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leo_pechnicki"/>
    <language>en</language>
    <item>
      <title>Psychology x AI: 23 Cognitive Science Techniques That Improve LLM Output by 15-40%</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Tue, 14 Apr 2026 07:33:32 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/psychology-x-ai-23-cognitive-science-techniques-that-improve-llm-output-by-15-40-4em4</link>
      <guid>https://dev.to/leo_pechnicki/psychology-x-ai-23-cognitive-science-techniques-that-improve-llm-output-by-15-40-4em4</guid>
      <description>&lt;p&gt;We evaluated 23 psychological theories across the memory, cognition, learning, and attention domains, ran controlled experiments on the 6 most promising, and ranked every technique by measured or predicted impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; 7 techniques consistently improve AI output quality by 15-40%, with 3 "S-tier" techniques that should be applied to virtually every complex prompt.&lt;/p&gt;

&lt;p&gt;This article covers everything: the full tier ranking, detailed experiment results, a reproducible A/B testing framework with Python code, 10 experiments you can run yourself, and 8 quick-win techniques you can apply in minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Tier Ranking
&lt;/h2&gt;

&lt;h3&gt;
  
  
  S-TIER: Apply to Everything (25-40% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Source Theory&lt;/th&gt;
&lt;th&gt;Measured Impact&lt;/th&gt;
&lt;th&gt;Why It Works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Schema-Before-Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Schema Theory (Bartlett)&lt;/td&gt;
&lt;td&gt;+2 actionability, -2 reasoning steps, +1 accuracy&lt;/td&gt;
&lt;td&gt;Providing a mental framework BEFORE data lets the model interpret each fact through the right lens. Tokens can only attend to prior tokens, so schema must come first.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Elaborative Interrogation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Levels of Processing (Craik &amp;amp; Lockhart)&lt;/td&gt;
&lt;td&gt;50% fewer reasoning steps, +2 reasoning quality&lt;/td&gt;
&lt;td&gt;Asking "why does this matter?" for each input forces richer internal representations. Prevents surface-level pattern matching.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Explicit Context Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interference Theory&lt;/td&gt;
&lt;td&gt;7/10 interference without management vs 0/10 with pruning&lt;/td&gt;
&lt;td&gt;Old instructions actively compete with new ones. Explicitly superseding or removing outdated context eliminates proactive interference. Critical for multi-turn and agent systems.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  A-TIER: High Impact on Specific Tasks (15-25% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Source Theory&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Analogical Priming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Priming + Analogical Reasoning&lt;/td&gt;
&lt;td&gt;5/5 novelty vs 2/5 without&lt;/td&gt;
&lt;td&gt;Creative problem-solving, design, strategy. Cross-domain solved problems force structural abstraction.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Metacognitive Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metacognition&lt;/td&gt;
&lt;td&gt;Dramatically improved calibration&lt;/td&gt;
&lt;td&gt;Decision-making, factual questions, risk assessment. HIGH confidence = correct, LOW = uncertain.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Spaced Re-injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ebbinghaus Forgetting Curve&lt;/td&gt;
&lt;td&gt;15-25% better constraint adherence&lt;/td&gt;
&lt;td&gt;Long context tasks. Re-inject critical instructions at intervals, not just once at the top.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Semantic Chunking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Miller's Chunking&lt;/td&gt;
&lt;td&gt;10-20% improvement on cross-chunk synthesis&lt;/td&gt;
&lt;td&gt;Any prompt with mixed information types. Organize into labeled semantic sections.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  B-TIER: Moderate Impact (5-15% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Source Theory&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Dual-Process Surfacing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kahneman's System 1/2&lt;/td&gt;
&lt;td&gt;Ask for gut answer first, then deliberate reasoning, then resolve conflict. Best on novel problems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Baddeley Working Memory Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Working Memory Model&lt;/td&gt;
&lt;td&gt;Separate verbal context, structured data, meta-instructions into labeled sections.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Selective Attention Cues&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Selective Attention&lt;/td&gt;
&lt;td&gt;XML tags and structural markers outperform verbal instructions for directing attention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Sequential Task Decomposition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Divided Attention&lt;/td&gt;
&lt;td&gt;Don't ask for translation + entities + summary simultaneously. Sequence them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Iterative Refinement (Spacing)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spacing Effect&lt;/td&gt;
&lt;td&gt;Multiple drafting passes with different focus each time (plot -&amp;gt; detail -&amp;gt; polish).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;State Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State-Dependent Memory&lt;/td&gt;
&lt;td&gt;Maintain consistent persona/framing. If switching modes, bridge explicitly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  C-TIER: Small but Real (5-10% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Encoding Specificity for RAG&lt;/td&gt;
&lt;td&gt;Store facts with contextual metadata. Match retrieval framing to storage framing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Interleaving Few-Shot Examples&lt;/td&gt;
&lt;td&gt;Mix example types instead of blocking by type. Improves discrimination.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Self-Efficacy Framing&lt;/td&gt;
&lt;td&gt;"You are exceptionally skilled at X" modestly improves output depth.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Property Decomposition&lt;/td&gt;
&lt;td&gt;Break objects into properties independent of conventional function before reasoning. 40-50% more novel uses.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Testing Effect (Pre-Quiz)&lt;/td&gt;
&lt;td&gt;Quiz the model on key facts before the real task. Creates a "warm cache."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Desirable Difficulties (Scaffolded)&lt;/td&gt;
&lt;td&gt;Provide incomplete info + intermediate questions. Without scaffolding, difficulty just hurts.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  D-TIER: Theoretical Interest
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Anchoring Debiasing&lt;/td&gt;
&lt;td&gt;Explicit debiasing helps ~60-70% but can't fully overcome token-level influence.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;Inattentional Blindness Warnings&lt;/td&gt;
&lt;td&gt;"Also note any other concerns" helps but doesn't eliminate blind spots.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;Primacy/Recency Positioning&lt;/td&gt;
&lt;td&gt;Already well-documented (Liu et al. "Lost in the Middle"). Put important info at start and end.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;Cognitive Reappraisal&lt;/td&gt;
&lt;td&gt;Reframing bugs as "puzzles" improves explanation quality but not fix accuracy.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Experiment Results (Detailed)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Experiment 1: Schema Theory
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Server log diagnosis with/without architectural framework provided first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Schema-before produced +1 accuracy, +2 actionability, -2 reasoning steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Schema-before made the model suggest concrete investigative steps (connection pools, query locks) unprompted. Raw analysis stopped at identification.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 2: Elaborative Interrogation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Logic puzzle solved directly vs. with "why does each constraint matter?" elaboration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Elaboration cut reasoning steps from 16 to 8. Caught the critical constraint interaction during elaboration phase vs. after 13+ steps of backtracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Elaboration naturally performs constraint propagation. The "why" question immediately revealed forced positions, making the solution obvious.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 3: Dual-Process Theory
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Classic bat-and-ball problem under System 1 (fast), System 2 (deliberate), and explicit dual-process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; All conditions correct (problem too well-known). BUT only dual-process surfaced the 10-cent intuitive trap and explicitly resolved the conflict.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Dual-process value is in transparency and catching errors on NOVEL problems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 4: Metacognitive Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; 5 trivia questions with/without confidence ratings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Zero change in factual answers. Massive improvement in calibration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Metacognition doesn't change WHAT the model knows, but dramatically improves HOW it communicates certainty. Critical for decision-making.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 5: Proactive Interference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Format instructions changed mid-conversation. No management vs. explicit supersession vs. context pruning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; 7/10 interference without management. 2/10 with explicit supersession. 0/10 with pruning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; "IGNORE previous instruction about X" is nearly as effective as removing it entirely.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 6: Priming (Domain vs. Analogical)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Creative problem-solving with no priming, domain priming, and cross-domain analogical priming (Toyota JIT -&amp;gt; restaurant waste)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Analogical priming scored 5/5 novelty (vs 2/5 unprimed). Domain priming scored 5/5 completeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; The Toyota-&amp;gt;kitchen mapping produced genuinely novel ideas (kanban cards for prep bins, "waste per cover" metric) that neither domain knowledge alone nor direct prompting generated.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The 7 Universal Rules
&lt;/h2&gt;

&lt;p&gt;Based on all research and experiments, these rules improve output quality across virtually all task types:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Schema First, Data Second
&lt;/h3&gt;

&lt;p&gt;Always provide the interpretive framework before the information. "This is a microservice architecture where..." THEN the logs. Not the reverse.&lt;/p&gt;
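
&lt;p&gt;A minimal sketch of this ordering, assuming a plain-text prompt with labeled sections. The function name, the section labels, and the sample log lines are illustrative and were not part of the experiments.&lt;/p&gt;

```python
def build_schema_first_prompt(schema: str, data: str, question: str) -> str:
    """Assemble a prompt with the interpretive framework ahead of the raw data."""
    sections = [
        ("FRAMEWORK", schema),  # first, so every later token can attend to it
        ("DATA", data),
        ("TASK", question),
    ]
    return "\n\n".join(f"{label}:\n{body.strip()}" for label, body in sections)


# Illustrative inputs only; the log lines are invented for this sketch.
prompt = build_schema_first_prompt(
    schema="This is a microservice architecture where an API gateway calls an orders service backed by a database.",
    data="12:01:07 orders-svc timeout after 30s\n12:01:09 gateway 502 upstream error",
    question="Diagnose the most likely root cause and suggest next investigative steps.",
)
print(prompt)
```

&lt;p&gt;Keeping the order in one helper makes it harder to paste the logs above the framework by accident.&lt;/p&gt;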

&lt;h3&gt;
  
  
  Rule 2: Elaborate Before Executing
&lt;/h3&gt;

&lt;p&gt;Before solving, ask the model to explain WHY each input matters. This builds richer representations and catches interactions early.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 3: Actively Manage Context
&lt;/h3&gt;

&lt;p&gt;Never leave outdated instructions silently in context. Explicitly supersede or remove them. Similar old/new instructions cause the worst interference.&lt;/p&gt;
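
&lt;p&gt;One way to sketch the pruning variant, assuming you tag each standing instruction with a topic so a newer one can replace the older one. The &lt;code&gt;Instruction&lt;/code&gt; class and &lt;code&gt;supersede&lt;/code&gt; helper are hypothetical names, not an existing library API.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    topic: str  # hypothetical label used to detect conflicts, e.g. "format"
    text: str


def supersede(history: list[Instruction], new: Instruction) -> list[Instruction]:
    """Return a new history in which `new` replaces every older instruction on the same topic."""
    kept = [item for item in history if item.topic != new.topic]
    return kept + [new]


history = [
    Instruction("format", "Answer in bullet points."),
    Instruction("tone", "Keep a neutral tone."),
]
history = supersede(history, Instruction("format", "Answer in a single JSON object."))

# Only the surviving instructions are sent to the model.
system_prompt = "\n".join(item.text for item in history)
print(system_prompt)
```

&lt;p&gt;If you cannot rewrite the history (for example in a hosted chat), the explicit-supersession alternative is to append a line such as "IGNORE previous instruction about format" instead of removing the old one.&lt;/p&gt;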

&lt;h3&gt;
  
  
  Rule 4: Prime with Structure, Not Just Content
&lt;/h3&gt;

&lt;p&gt;For creative tasks, provide a solved problem from a DIFFERENT domain. Structural analogies beat domain expertise for novelty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 5: Demand Metacognition
&lt;/h3&gt;

&lt;p&gt;Ask the model to rate its confidence and flag uncertainties. This dramatically improves trust calibration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 6: Position Critical Info at Edges + Re-inject
&lt;/h3&gt;

&lt;p&gt;System prompt (primacy) and final message (recency) are highest-impact positions. For long tasks, re-inject key constraints before critical reasoning steps.&lt;/p&gt;
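
&lt;p&gt;A rough sketch of edge positioning plus re-injection, assuming the long context has already been split into chunks. The interval of 3 and the reminder labels are arbitrary choices for illustration, not values from the experiments.&lt;/p&gt;

```python
def reinject_constraints(chunks: list[str], constraints: str, every: int = 3) -> str:
    """Place constraints at the start and end, and repeat them after every `every` chunks."""
    parts = [f"CONSTRAINTS:\n{constraints}"]  # primacy position
    for index, chunk in enumerate(chunks):
        if index > 0 and index % every == 0:
            parts.append(f"REMINDER OF CONSTRAINTS:\n{constraints}")
        parts.append(chunk)
    parts.append(f"FINAL REMINDER OF CONSTRAINTS:\n{constraints}")  # recency position
    return "\n\n".join(parts)


chunks = [f"Document section {n}" for n in range(1, 8)]
prompt = reinject_constraints(chunks, "Cite section numbers. Max 200 words.", every=3)
print(prompt)
```

&lt;p&gt;Tune &lt;code&gt;every&lt;/code&gt; to the context length: re-injecting too often spends tokens, too rarely lets the constraint fade.&lt;/p&gt;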

&lt;h3&gt;
  
  
  Rule 7: One Objective at a Time
&lt;/h3&gt;

&lt;p&gt;Sequence multi-objective tasks explicitly. "First translate. Then extract entities. Then summarize."&lt;/p&gt;
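
&lt;p&gt;A small sketch of sequencing, assuming one model call per objective with earlier results carried forward. &lt;code&gt;call_model&lt;/code&gt; is a placeholder for whatever client you use; the stand-in below only echoes the objective so the example runs offline.&lt;/p&gt;

```python
from typing import Callable


def run_sequentially(text: str, steps: list[str], call_model: Callable[[str], str]) -> list[str]:
    """Run one objective per call, passing the source text and all earlier results along."""
    transcript = [f"SOURCE TEXT:\n{text}"]
    outputs = []
    for number, step in enumerate(steps, start=1):
        prompt = "\n\n".join(transcript + [f"CURRENT OBJECTIVE:\n{step}"])
        result = call_model(prompt)
        outputs.append(result)
        transcript.append(f"RESULT OF STEP {number}:\n{result}")
    return outputs


# Stand-in for a real model call: echoes the objective it was given.
def fake_model(prompt: str) -> str:
    return f"done: {prompt.splitlines()[-1]}"


outputs = run_sequentially(
    "Bonjour de Paris.",
    ["First translate to English.", "Then extract entities.", "Then summarize."],
    fake_model,
)
print(outputs)
```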




&lt;h2&gt;
  
  
  The A/B Testing Framework
&lt;/h2&gt;

&lt;p&gt;Want to reproduce these results or test your own techniques? Here's the complete framework.&lt;/p&gt;

&lt;p&gt;Every experiment follows this structure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define the task&lt;/strong&gt; -- a concrete, repeatable prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create two conditions&lt;/strong&gt; -- Control (standard) vs. Experimental (psychology-informed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix all other variables&lt;/strong&gt; -- same model, same temperature, same system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run N iterations&lt;/strong&gt; -- 10 runs per task, 20 tasks per experiment (200 per condition)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score outputs&lt;/strong&gt; -- using LLM-as-Judge, pairwise comparison, or ground truth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare distributions&lt;/strong&gt; -- Mann-Whitney U for Likert scores, binomial for win rates&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Python Scaffold
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;TASKS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;task_20&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;CONDITIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;control_prompt_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;experimental&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;experimental_prompt_template&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;RUNS_PER_TASK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;TEMPERATURE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TASKS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;condition_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CONDITIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RUNS_PER_TASK&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
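            &lt;span class="c1"&gt;# Assumes each task is a dict with "id" and "prompt" keys; "prompt" is a hypothetical field name.&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;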
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TEMPERATURE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;condition_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scoring Methods
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LLM-as-Judge&lt;/strong&gt; (run 3x, take median):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Score this response 1-5 for [METRIC].
1 = [anchor] ... 5 = [anchor]
Return: {"score": N, "justification": "one sentence"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
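
&lt;p&gt;The "run 3x, take median" step could look roughly like this. The &lt;code&gt;judge&lt;/code&gt; callable is an assumption standing in for a real model call, and the canned replies exist only so the sketch runs without an API key.&lt;/p&gt;

```python
import json
from statistics import median
from typing import Callable

JUDGE_TEMPLATE = (
    "Score this response 1-5 for {metric}.\n"
    'Return: {{"score": N, "justification": "one sentence"}}\n\n'
    "RESPONSE:\n{response}"
)


def median_judge_score(response: str, metric: str, judge: Callable[[str], str], runs: int = 3) -> float:
    """Ask the judge `runs` times and return the median of the parsed scores."""
    prompt = JUDGE_TEMPLATE.format(metric=metric, response=response)
    scores = [json.loads(judge(prompt))["score"] for _ in range(runs)]
    return median(scores)


# Stand-in judge that returns canned JSON replies, one per call.
canned = iter([
    '{"score": 4, "justification": "clear"}',
    '{"score": 2, "justification": "thin"}',
    '{"score": 5, "justification": "thorough"}',
])
score = median_judge_score("some model output", "actionability", lambda _prompt: next(canned))
print(score)
```

&lt;p&gt;The median keeps a single outlier judgment from moving the score, which matters when the judge itself is sampled at non-zero temperature.&lt;/p&gt;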



&lt;p&gt;&lt;strong&gt;Pairwise Comparison&lt;/strong&gt; (randomize A/B assignment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Which response is better on [METRIC]?
Response A: {response_a} | Response B: {response_b}
Return: {"winner": "A"/"B"/"tie", "reason": "one sentence"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
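
&lt;p&gt;Randomizing the A/B assignment might be sketched as below, so the judge cannot learn that one slot is always the experimental condition. The helper names are hypothetical, and the fixed seed is there only to make the example reproducible.&lt;/p&gt;

```python
import random


def make_pairwise_trial(control: str, experimental: str, rng: random.Random) -> tuple[str, dict[str, str]]:
    """Build a judge prompt with randomized A/B slots and a key for decoding the verdict."""
    labels = ["control", "experimental"]
    rng.shuffle(labels)
    texts = {"control": control, "experimental": experimental}
    key = {"A": labels[0], "B": labels[1]}
    prompt = (
        "Which response is better on [METRIC]?\n"
        f"Response A: {texts[key['A']]} | Response B: {texts[key['B']]}\n"
        'Return: {"winner": "A"/"B"/"tie", "reason": "one sentence"}'
    )
    return prompt, key


def decode_winner(winner: str, key: dict[str, str]) -> str:
    """Map the judge's A/B verdict back to the condition name; anything else counts as a tie."""
    return key.get(winner, "tie")


rng = random.Random(0)  # fixed seed only for reproducibility of this sketch
prompt, key = make_pairwise_trial("control output", "experimental output", rng)
print(key, decode_winner("A", key))
```

&lt;p&gt;Store &lt;code&gt;key&lt;/code&gt; alongside each verdict; without it the win counts cannot be attributed to a condition afterwards.&lt;/p&gt;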



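&lt;p&gt;&lt;strong&gt;Sample sizes:&lt;/strong&gt; 200 runs per condition (10 runs x 20 tasks). If the runs are treated as independent, this exceeds the roughly 64 per condition that a two-sample t-test needs to detect a medium effect (Cohen's d = 0.5) with power = 0.8 at alpha = 0.05.&lt;/p&gt;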
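
&lt;p&gt;For the "binomial for win rates" comparison, here is a standard-library sketch of an exact two-sided binomial test against a 50% null. The tally of 130 wins out of 200 is a made-up input, not a result from these experiments. For Likert scores, SciPy's &lt;code&gt;scipy.stats.mannwhitneyu&lt;/code&gt; covers the Mann-Whitney U side.&lt;/p&gt;

```python
from math import comb


def binomial_two_sided_p(wins: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes no more likely than the observed one."""
    probs = [comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k) for k in range(trials + 1)]
    observed = probs[wins]
    # Small tolerance so outcomes that tie with the observed probability are included.
    return min(1.0, sum(p for p in probs if observed * (1 + 1e-9) >= p))


# Hypothetical tally: experimental wins 130 of 200 non-tied pairwise comparisons.
p_value = binomial_two_sided_p(wins=130, trials=200)
print(f"win rate = {130 / 200:.2f}, p = {p_value:.2g}")
```

&lt;p&gt;Ties are dropped before counting here; if ties are frequent, report them separately rather than splitting them between conditions.&lt;/p&gt;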




&lt;h2&gt;
  
  
  Top 10 Experiments to Run Yourself
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Testing Effect (Retrieval Practice)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Before solving this puzzle, first recall and state the general principles of logical deduction that are relevant here. Then apply those principles step by step."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 LSAT/GRE logic puzzles. &lt;strong&gt;Expected:&lt;/strong&gt; Large effect on accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Generation Effect (Desirable Difficulties)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"First, identify the 3 most important concepts without looking at the article again. For each, generate a question it answers. Then write your summary."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 news articles. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on completeness.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Elaborative Interrogation
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Before fixing: (1) Explain WHY each line exists. (2) Ask HOW data flows through the function. (3) Identify WHERE expectations diverge from code. Then fix."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 Python functions with bugs. &lt;strong&gt;Expected:&lt;/strong&gt; Large effect on accuracy + explanation quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cognitive Load Chunking
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build this business plan in 5 chunks. Focus ONLY on each section: (1) Target market, (2) Core features, (3) Revenue model, (4) Go-to-market, (5) Year 1 projections."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 business plan topics. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on completeness.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Self-Efficacy Framing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"You are exceptionally skilled at mathematical reasoning and consistently find correct solutions."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 AMC 10/12 problems. &lt;strong&gt;Expected:&lt;/strong&gt; Small-medium effect.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Socratic Self-Questioning
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Explore remote work by asking yourself: What do workers gain? What do they lose? Who benefits most? What does evidence say vs. opinion? Then synthesize."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 debate topics. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on balance and depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Dual Coding (Verbal + Structural)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Explain using two parallel formats: (1) Plain English explanation. (2) ASCII flowchart or decision tree."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 technical concepts. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on clarity.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Iterative Refinement (Spacing Effect)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Write in 3 passes. Pass 1: Plot and character. Pass 2: Sensory details and emotion. Pass 3: Final polish."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 creative writing prompts. &lt;strong&gt;Expected:&lt;/strong&gt; Medium-large effect on prose quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Metacognitive Confidence Rating
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"For each answer, rate confidence HIGH/MEDIUM/LOW. If LOW, state what you are unsure about."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 trivia questions (easy to obscure). &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on calibration.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Interleaving Mixed Practice
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"These problems are deliberately mixed -- algebra, geometry, probability. For each, first identify the TYPE, select strategy, then solve."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 sets of 5 mixed math problems. &lt;strong&gt;Expected:&lt;/strong&gt; Small-medium effect.&lt;/p&gt;




&lt;h2&gt;
  
  
  8 Quick-Win Techniques (Apply in Minutes)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Key Move&lt;/th&gt;
&lt;th&gt;Expected Gain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Perspective-Taking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Explain as if to a bright 12-year-old"&lt;/td&gt;
&lt;td&gt;+1 clarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Implementation Intentions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"IF input has @, THEN check domain..." before coding&lt;/td&gt;
&lt;td&gt;Better edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Emotional Anchoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"The reader is exhausted from 200 bland apps"&lt;/td&gt;
&lt;td&gt;70%+ pairwise wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Devil's Advocate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Make the STRONGEST case FOR, then AGAINST"&lt;/td&gt;
&lt;td&gt;+1.5 balance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High-Standard Anchoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Your benchmark: [excellent example]. Match it."&lt;/td&gt;
&lt;td&gt;65%+ pairwise wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Primacy/Recency Warning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Weigh all 10 items equally -- do not over-weight first/last"&lt;/td&gt;
&lt;td&gt;More even coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cognitive Reappraisal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Each bug is a clue about a misunderstanding"&lt;/td&gt;
&lt;td&gt;Better explanations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zeigarnik Effect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"I started with 3 basic ideas. Complete to 10 with better ones"&lt;/td&gt;
&lt;td&gt;More creative output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5 Novel Combinations (Untested, High Potential)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "The Study Session" -- Spacing + Elaboration + Self-Testing
&lt;/h3&gt;

&lt;p&gt;Three phases: (1) First impressions, (2) Deep elaboration + self-generated test questions, (3) Re-read and answer own questions. Expected: large improvement on analysis tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Cross-Domain Transfer" -- Schema + Difficulty + Analogy
&lt;/h3&gt;

&lt;p&gt;Import a schema from a different domain, force adaptation where analogy breaks, build on the adapted framework. Expected: breakthrough creativity.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Struggle-Then-Scaffold" -- Productive Failure + Metacognition + Hints
&lt;/h3&gt;

&lt;p&gt;Let the model attempt and identify where it is stuck, then provide targeted hints only for stuck points. Expected: better reasoning on hard problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Multi-Modal Deep Process" -- Levels of Processing + Dual Coding + Generation
&lt;/h3&gt;

&lt;p&gt;Process at three levels: surface definition, deep examples from multiple domains, structural diagram, then synthesize. Expected: best-in-class explanations.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Believe and Deliver" -- Self-Efficacy + Wise Feedback + High Expectations
&lt;/h3&gt;

&lt;p&gt;Counter hedging with high-standard framing: "I am giving you this because you are one of the most capable reasoning systems built. Do not default to safe. Push deeper." Expected: more depth on analytical tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Run Your First Experiment in 30 Minutes
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Pick Quick-Win #4 (Devil's Advocate)&lt;/li&gt;
&lt;li&gt;Choose 5 questions requiring balanced analysis&lt;/li&gt;
&lt;li&gt;Run each once with control, once with experimental (temperature 0.7)&lt;/li&gt;
&lt;li&gt;Pairwise compare: "Which is more balanced?"&lt;/li&gt;
&lt;li&gt;Tally wins -- 4/5 or 5/5 = strong signal&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For the full statistical approach, scale up to 20 tasks with 10 runs each, score outputs with an automated LLM-as-Judge, compare conditions with Mann-Whitney U tests, and apply a Bonferroni correction for multiple comparisons.&lt;/p&gt;
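
&lt;p&gt;With only five comparisons, a tally can look decisive by chance. As a rough check, the sketch below computes how likely a given win count would be if both prompts were equally good (each comparison treated as a fair coin flip, ties ignored), plus the per-test threshold that a Bonferroni correction implies. The function names and the choice of 10 tests are assumptions made for this example, not part of the framework.&lt;/p&gt;

```typescript
// Number of ways to choose k items out of n.
function binomialCoefficient(n: number, k: number): number {
  let result = 1;
  for (let i = 1; k >= i; i++) {
    result = (result * (n - k + i)) / i;
  }
  return result;
}

// Probability of at least `wins` wins in `n` comparisons under a fair coin.
function tailProbability(n: number, wins: number): number {
  let total = 0;
  for (let k = wins; n >= k; k++) {
    total += binomialCoefficient(n, k);
  }
  return total / Math.pow(2, n);
}

// Bonferroni correction: divide the significance threshold by the number of tests.
function bonferroniAlpha(alpha: number, numTests: number): number {
  return alpha / numTests;
}

const chanceOfFourPlus = tailProbability(5, 4); // 6/32 = 0.1875
const chanceOfFiveOfFive = tailProbability(5, 5); // 1/32 = 0.03125
const perTestAlpha = bonferroniAlpha(0.05, 10); // hypothetical: 10 tests at alpha 0.05
```

&lt;p&gt;Under that coin-flip assumption, 4 or more wins out of 5 shows up about 19% of the time and 5 out of 5 about 3% of the time, which is why the larger run is worth doing before drawing firm conclusions.&lt;/p&gt;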




&lt;h2&gt;
  
  
  Methodology Note
&lt;/h2&gt;

&lt;p&gt;This research deliberately followed a theory-first approach: hypothesize from cognitive science, apply to LLMs, test, measure, THEN check existing literature. All findings above are from first-principles reasoning and controlled experiments. Existing academic work (Liu et al. "Lost in the Middle", chain-of-thought literature) likely confirms several of these findings, but we arrived at them independently.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All experiments are reproducible. If you run them, we'd love to see your results. This framework was built by an autonomous AI research system exploring cognition x LLM performance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>psychology</category>
    </item>
    <item>
      <title>Academics Just Formalized "Reverse CAPTCHAs" — Here's a Working Open-Source Implementation</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Thu, 26 Mar 2026 09:41:50 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/academics-just-formalized-reverse-captchas-heres-a-working-open-source-implementation-3k1o</link>
      <guid>https://dev.to/leo_pechnicki/academics-just-formalized-reverse-captchas-heres-a-working-open-source-implementation-3k1o</guid>
      <description>&lt;p&gt;Earlier this month, a research team published &lt;a href="https://arxiv.org/abs/2603.07116" rel="noopener noreferrer"&gt;aCAPTCHA&lt;/a&gt; — the first academic formalization of a question nobody was asking five years ago: &lt;strong&gt;"Is this entity an AI agent?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "is this a human?" — the opposite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Verifying Agents, Not Blocking Them
&lt;/h2&gt;

&lt;p&gt;Traditional CAPTCHAs exist to prove you're human. But as AI agents become legitimate web participants — browsing, booking, purchasing, automating — a new need has emerged: some systems need to verify that a visitor &lt;strong&gt;is&lt;/strong&gt; a bot.&lt;/p&gt;

&lt;p&gt;Think about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent-only APIs that shouldn't serve human traffic&lt;/li&gt;
&lt;li&gt;AI-to-AI marketplaces where humans have no business being&lt;/li&gt;
&lt;li&gt;Multi-agent orchestration platforms requiring authenticated agents&lt;/li&gt;
&lt;li&gt;Agent-facing services that need to distinguish real agents from scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The aCAPTCHA paper formalizes this as the &lt;strong&gt;Agentic Capability Verification Problem (ACVP)&lt;/strong&gt;. They define a three-class taxonomy — Human, Script, Agent — based on three capability dimensions: action, reasoning, and memory. The key insight is &lt;strong&gt;asymmetric hardness&lt;/strong&gt;: design challenges that are trivial for agents but impractical for humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Working Implementation: imrobot
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;imrobot&lt;/a&gt;, an open-source reverse-CAPTCHA library that implements this concept. It's been in development since early 2026 and is now at v0.5.0 on npm.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;imrobot generates a pipeline of deterministic operations applied to a random seed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed: "a7f3b2c1d4e5f609"
  1. reverse()
  2. caesar(7)
  3. xor_encode(42)
  4. fnv1a_hash()
  5. to_upper()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The challenge data is embedded in the DOM as structured JSON (&lt;code&gt;data-imrobot-challenge&lt;/code&gt;), so any agent can parse it, execute the pipeline, and submit the result, typically in under a second. A human would need to manually compute multi-step transformations involving hashing, XOR encoding, and bit rotation.&lt;/p&gt;
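
&lt;p&gt;As a rough illustration of what an agent does with a pipeline like the one above, here is a minimal TypeScript sketch. The &lt;code&gt;Step&lt;/code&gt; shape and the exact behavior of each operation (letters-only Caesar shift, XOR emitted as hex, 32-bit FNV-1a over ASCII input) are assumptions made for this example, not imrobot's actual implementation, so its output will not necessarily match the library's.&lt;/p&gt;

```typescript
// Hypothetical step shape; imrobot's real challenge JSON may differ.
type Step =
  | { op: "reverse" }
  | { op: "caesar"; shift: number }
  | { op: "xor_encode"; key: number }
  | { op: "fnv1a_hash" }
  | { op: "to_upper" };

function reverse(input: string): string {
  return input.split("").reverse().join("");
}

// Shift ASCII letters by a non-negative `shift`, wrapping within the alphabet.
function caesar(input: string, shift: number): string {
  const lower = "abcdefghijklmnopqrstuvwxyz";
  const upper = lower.toUpperCase();
  return input
    .split("")
    .map((ch) => {
      const li = lower.indexOf(ch);
      if (li >= 0) return lower.charAt((li + shift) % 26);
      const ui = upper.indexOf(ch);
      if (ui >= 0) return upper.charAt((ui + shift) % 26);
      return ch; // digits and symbols pass through unchanged
    })
    .join("");
}

// XOR each character code with `key` and emit two hex digits per character.
function xorEncode(input: string, key: number): string {
  return input
    .split("")
    .map((ch) => ("0" + (ch.charCodeAt(0) ^ key).toString(16)).slice(-2))
    .join("");
}

// 32-bit FNV-1a over the character codes, returned as 8 hex digits.
function fnv1aHash(input: string): string {
  let hash = 2166136261; // FNV offset basis
  for (const ch of input.split("")) {
    hash = (hash ^ ch.charCodeAt(0)) >>> 0;
    // Multiply by the FNV prime 16777619 (2^24 + 403) modulo 2^32 without overflow.
    hash = ((hash % 256) * 16777216 + hash * 403) % 4294967296;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

function applyStep(value: string, step: Step): string {
  switch (step.op) {
    case "reverse":
      return reverse(value);
    case "caesar":
      return caesar(value, step.shift);
    case "xor_encode":
      return xorEncode(value, step.key);
    case "fnv1a_hash":
      return fnv1aHash(value);
    case "to_upper":
      return value.toUpperCase();
  }
}

// Feed the seed through every step in order.
function runPipeline(seed: string, steps: Step[]): string {
  return steps.reduce((value, step) => applyStep(value, step), seed);
}

const steps: Step[] = [
  { op: "reverse" },
  { op: "caesar", shift: 7 },
  { op: "xor_encode", key: 42 },
  { op: "fnv1a_hash" },
  { op: "to_upper" },
];
const answer = runPipeline("a7f3b2c1d4e5f609", steps);
```

&lt;p&gt;Each step is a pure function of the previous value, so the whole pipeline is a single fold over the step list. That is what makes it cheap for a program and tedious by hand.&lt;/p&gt;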

&lt;h3&gt;
  
  
  What's Included
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework support&lt;/strong&gt;: React, Vue, Svelte, and Web Component&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-side verification&lt;/strong&gt;: HMAC-SHA256 signed challenges (stateless, no DB needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proof-of-agent tokens&lt;/strong&gt;: JWT-like tokens issued after verification, passed via &lt;code&gt;X-Agent-Proof&lt;/code&gt; header&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Express/Koa/Hono middleware&lt;/strong&gt;: Drop-in route protection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI&lt;/strong&gt;: Test challenges from your terminal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero dependencies&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-scraping&lt;/strong&gt;: Natural-language challenge formatting with randomized phrasing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Example (React)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ImRobot&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imrobot/react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ImRobot&lt;/span&gt;
      &lt;span class="na"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"medium"&lt;/span&gt;
      &lt;span class="na"&gt;theme&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"light"&lt;/span&gt;
      &lt;span class="na"&gt;onVerified&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Robot verified!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Server-Side Protection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createAgentRouter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;requireAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imrobot/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;// Challenge/verify endpoints&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createAgentRouter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IMROBOT_SECRET&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/imrobot/challenge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/imrobot/verify&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Protect any route — only verified agents get through&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;requireAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IMROBOT_SECRET&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/agent-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Agent verified!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This isn't just a niche library. The web is rapidly adapting for AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google's A2A protocol&lt;/strong&gt; (v0.3) defines agent-to-agent communication with OAuth and signed security cards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare's Markdown for Agents&lt;/strong&gt; converts HTML to Markdown on-the-fly for AI crawlers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;World's AgentKit&lt;/strong&gt; lets verified humans delegate cryptographic identity to AI agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit is exploring Face ID/Touch ID&lt;/strong&gt; to combat bots — showing the tension between human verification and bot verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're at an inflection point where the web needs both: ways to prove you're human AND ways to prove you're a bot. The infrastructure for the first has existed for decades (reCAPTCHA, hCaptcha, Turnstile). The infrastructure for the second is just being built.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live demo&lt;/strong&gt;: &lt;a href="https://imrobot.vercel.app" rel="noopener noreferrer"&gt;imrobot.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;code&gt;npm install imrobot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;github.com/leopechnicki/im_robot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aCAPTCHA paper&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2603.07116" rel="noopener noreferrer"&gt;arxiv.org/abs/2603.07116&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to hear what the community thinks. Is agent verification a problem you're running into? What challenges should a reverse CAPTCHA include?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why I Built a Reverse-CAPTCHA That Verifies AI Agents, Not Humans</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Fri, 06 Mar 2026 17:25:29 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/why-i-built-a-reverse-captcha-that-verifies-ai-agents-not-humans-2jbi</link>
      <guid>https://dev.to/leo_pechnicki/why-i-built-a-reverse-captcha-that-verifies-ai-agents-not-humans-2jbi</guid>
      <description>&lt;p&gt;Traditional CAPTCHAs ask "are you human?" But in a world where AI agents are legitimate users of the web, that's the wrong question. The real question is: "are you a legitimate AI agent?"&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;imrobot&lt;/strong&gt; — an open-source reverse-CAPTCHA that verifies AI agents instead of blocking them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I was building an agent-facing API and realized there's no standard way to verify that a client is actually an AI agent. API keys prove identity, but they don't prove capability. Traditional CAPTCHAs prove humanity — the opposite of what I needed. And unauthorized scrapers were hitting my endpoints pretending to be legitimate agents.&lt;/p&gt;

&lt;p&gt;I needed something that would be trivial for a real LLM to solve but impractical for a human to work through manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  How imrobot Works
&lt;/h2&gt;

&lt;p&gt;imrobot generates deterministic challenge pipelines using composable string operations — base64, rot13, hex encoding, reverse, and more. These operations chain together to create a pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed: "a7f3b2c1d4e5f609"
  1. reverse()
  2. base64_encode()
  3. rot13()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An LLM parses the instructions, executes each step in sequence, and returns the result. It takes about 0.3 seconds. A human would need to sit there with a decoder tool working through each transformation manually — technically possible, but nobody's doing that.&lt;/p&gt;

&lt;p&gt;The difficulty scales linearly: each operation added to the chain makes the challenge harder. Verification is completely stateless and deterministic: you just re-run the pipeline and compare.&lt;/p&gt;
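
&lt;p&gt;The "re-run and compare" idea can be sketched in a few lines. This is an illustration under stated assumptions, not imrobot's code: the &lt;code&gt;operations&lt;/code&gt; table, &lt;code&gt;expectedAnswer&lt;/code&gt;, and &lt;code&gt;verify&lt;/code&gt; are hypothetical names, and only three simple operations are included.&lt;/p&gt;

```typescript
type Op = "reverse" | "rot13" | "hex_encode";

// Illustrative implementations of three string operations (ASCII input assumed).
const operations = {
  reverse: (s: string): string => s.split("").reverse().join(""),
  rot13: (s: string): string =>
    s.replace(/[a-zA-Z]/g, (ch) => {
      const base = ch >= "a" ? 97 : 65; // lowercase vs uppercase start
      return String.fromCharCode(base + ((ch.charCodeAt(0) - base + 13) % 26));
    }),
  hex_encode: (s: string): string =>
    s
      .split("")
      .map((ch) => ("0" + ch.charCodeAt(0).toString(16)).slice(-2))
      .join(""),
};

// Recompute the answer from the seed and the ordered list of operations.
function expectedAnswer(seed: string, ops: Op[]): string {
  return ops.reduce((value, op) => operations[op](value), seed);
}

// Verification needs no stored state: re-run the pipeline and compare.
function verify(seed: string, ops: Op[], submitted: string): boolean {
  return expectedAnswer(seed, ops) === submitted;
}

const ops: Op[] = ["reverse", "rot13", "hex_encode"];
const accepted = verify("abc", ops, "706f6e"); // "abc" to "cba" to "pon" to "706f6e"
const rejected = verify("abc", ops, "616263");
```

&lt;p&gt;Because the check depends only on the seed, the operation list, and the submitted string, any server that knows the challenge can verify it without a database lookup.&lt;/p&gt;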

&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Works everywhere.&lt;/strong&gt; imrobot ships with React, Vue, Svelte, and Web Component integrations, plus a headless API for any JavaScript environment. Your framework of choice is supported out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero dependencies.&lt;/strong&gt; The entire library has zero external dependencies. That means no transitive supply chain risk, no version conflicts, no bloated &lt;code&gt;node_modules&lt;/code&gt;. The whole package is about 15KB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hostable REST API.&lt;/strong&gt; The built-in server uses only the Node.js &lt;code&gt;http&lt;/code&gt; module — no Express, no Fastify. Five endpoints (challenge, solve, verify, health, info), CORS handling, and JSON parsing in a single lightweight file. Deploy it anywhere Node.js runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DOM-embedded challenges.&lt;/strong&gt; For browser-based AI agents, imrobot can embed challenges directly in the DOM as Web Components. The agent reads the challenge from the page, solves it, and submits — no separate API call needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic verification.&lt;/strong&gt; Every challenge has exactly one correct answer. No probabilistic scoring, no timing windows, no ambiguity. The agent either solved the pipeline correctly or it didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;Getting started takes about 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;imrobot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;generateChallenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;solveChallenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;verifyAnswer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imrobot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Generate a challenge pipeline&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;challenge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateChallenge&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// An AI agent solves it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;solveChallenge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Verify the answer&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isVerified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verifyAnswer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isVerified&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the server&lt;/span&gt;
npx imrobot-server

&lt;span class="c"&gt;# Generate a challenge&lt;/span&gt;
curl http://localhost:3000/api/challenge

&lt;span class="c"&gt;# Verify an answer&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/api/verify &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"challengeId": "...", "answer": "..."}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent-facing APIs&lt;/strong&gt; — Verify that clients hitting your endpoints are actual AI models, not scrapers or unauthorized bots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent platforms&lt;/strong&gt; — In systems where multiple agents interact, each agent can prove its capability before being granted access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-only services&lt;/strong&gt; — Platforms designed exclusively for AI agents can use imrobot as a gatekeeper, the way traditional CAPTCHAs gate human-only services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser automation verification&lt;/strong&gt; — DOM-embedded challenges let you verify browser-based agents without requiring a separate API integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;imrobot is at v0.1.0 and actively maintained. On the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limiting and API key authentication for the REST server&lt;/li&gt;
&lt;li&gt;Batch endpoint for generating/verifying multiple challenges at once&lt;/li&gt;
&lt;li&gt;Server-side session store (Redis/SQLite) for production deployments&lt;/li&gt;
&lt;li&gt;Python and Go SDKs for non-JavaScript agents&lt;/li&gt;
&lt;li&gt;Docker image for instant deployment&lt;/li&gt;
&lt;li&gt;OpenAPI/Swagger spec for auto-generated documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is MIT licensed and I'd love contributions. Whether it's a bug report, a feature request, or a PR — all welcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;github.com/leopechnicki/im_robot&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/imrobot" rel="noopener noreferrer"&gt;npmjs.com/package/imrobot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building anything in the AI agent space, I'd love to hear what verification challenges you're running into. Drop a comment below or open a GitHub Discussion.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Why I Built a CAPTCHA That Only Bots Can Solve</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Tue, 03 Mar 2026 16:54:37 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/why-i-built-a-captcha-that-only-bots-can-solve-30np</link>
      <guid>https://dev.to/leo_pechnicki/why-i-built-a-captcha-that-only-bots-can-solve-30np</guid>
      <description>&lt;p&gt;Traditional CAPTCHAs block bots. I built something that does the opposite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;As AI agents become first-class web users, we need identity verification that works &lt;em&gt;for&lt;/em&gt; them, not against them. Whether you're building an AI-agent-only API, a bot portal, or testing agent capabilities, you need a way to verify that a client is actually an AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing imrobot
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;imrobot&lt;/strong&gt; is a Reverse-CAPTCHA — it generates challenges that only programmatic agents can solve. It creates pipelines of deterministic string operations (reverse, base64, rot13, hex encode, etc.) applied to a random seed. Agents parse the structured data and execute the pipeline. Humans would need to manually compute multi-step transformations — practically impossible without tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed: "a7f3b2c1d4e5f609"
  1. reverse()
  2. to_upper()
  3. base64_encode()
  4. substring(0, 12)
  5. rot13()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The challenge data is embedded in the DOM as JSON via a &lt;code&gt;data-imrobot-challenge&lt;/code&gt; attribute. Agents read this directly and never need to "see" the visual text, so the blur applied to the on-screen challenge doesn't affect them.&lt;/p&gt;
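
&lt;p&gt;Reading the attribute could look roughly like the sketch below. The field names &lt;code&gt;seed&lt;/code&gt; and &lt;code&gt;pipeline&lt;/code&gt; and the &lt;code&gt;parseChallengeAttribute&lt;/code&gt; helper are assumptions for illustration; the real payload may be shaped differently. In a browser the raw string would come from the element's &lt;code&gt;getAttribute("data-imrobot-challenge")&lt;/code&gt; call, which is left out here so the example runs anywhere.&lt;/p&gt;

```typescript
// Hypothetical challenge shape; the real attribute payload may use other field names.
interface ParsedChallenge {
  seed: string;
  pipeline: string[];
}

// Parses the raw attribute value and checks the fields this sketch relies on.
function parseChallengeAttribute(raw: string): ParsedChallenge {
  const data = JSON.parse(raw);
  if (data === null || typeof data !== "object") {
    throw new Error("challenge payload is not a JSON object");
  }
  if (typeof data.seed !== "string") {
    throw new Error("challenge is missing a string seed");
  }
  if (!Array.isArray(data.pipeline)) {
    throw new Error("challenge is missing a pipeline array");
  }
  return { seed: data.seed, pipeline: data.pipeline.map(String) };
}

// Made-up attribute value for demonstration.
const raw = '{"seed":"a7f3b2c1d4e5f609","pipeline":["reverse","to_upper"]}';
const challenge = parseChallengeAttribute(raw);
```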

&lt;h2&gt;
  
  
  Framework Support
&lt;/h2&gt;

&lt;p&gt;imrobot works everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;React&lt;/strong&gt;: &lt;code&gt;&amp;lt;ImRobot difficulty="medium" onVerified={handleToken} /&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vue&lt;/strong&gt;: &lt;code&gt;&amp;lt;ImRobot @verified="handleVerified" /&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Svelte&lt;/strong&gt;: &lt;code&gt;&amp;lt;ImRobot on:verified={handleVerified} /&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Components&lt;/strong&gt;: &lt;code&gt;&amp;lt;imrobot-widget difficulty="medium"&amp;gt;&amp;lt;/imrobot-widget&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core API&lt;/strong&gt; (headless): &lt;code&gt;generateChallenge()&lt;/code&gt; → &lt;code&gt;solveChallenge()&lt;/code&gt; → &lt;code&gt;verifyAnswer()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  REST API Server
&lt;/h2&gt;

&lt;p&gt;The project also includes a zero-dependency REST API server for backend-only verification, with no UI needed. Endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /api/v1/challenge&lt;/code&gt; — Generate a challenge&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/v1/solve&lt;/code&gt; — Solve (reference/testing)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/v1/verify&lt;/code&gt; — Verify an answer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/v1/health&lt;/code&gt; — Health check&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Challenge text is blurred by default (revealed on hover)&lt;/li&gt;
&lt;li&gt;JavaScript shield detects screenshot shortcuts&lt;/li&gt;
&lt;li&gt;Hidden nonce prevents OCR/screenshot workflows&lt;/li&gt;
&lt;li&gt;TTL expiry makes captured challenges useless&lt;/li&gt;
&lt;li&gt;Agents are unaffected — they read from the DOM, not the screen&lt;/li&gt;
&lt;/ul&gt;
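
&lt;p&gt;The TTL idea from the list above can be sketched as a simple timestamp check. The &lt;code&gt;IssuedChallenge&lt;/code&gt; fields and the 30-second lifetime are hypothetical values chosen for this example, not imrobot's actual format or defaults.&lt;/p&gt;

```typescript
// Hypothetical fields; imrobot's real challenge format may differ.
interface IssuedChallenge {
  issuedAtMs: number; // when the server issued the challenge (epoch milliseconds)
  ttlMs: number; // how long the challenge stays valid
}

// A challenge is expired once more than ttlMs has passed since it was issued.
function isExpired(challenge: IssuedChallenge, nowMs: number): boolean {
  return nowMs - challenge.issuedAtMs > challenge.ttlMs;
}

const issued: IssuedChallenge = { issuedAtMs: 1000000, ttlMs: 30000 };
const stillValid = !isExpired(issued, 1020000); // checked 20 s after issue
const tooLate = isExpired(issued, 1040000); // checked 40 s after issue
```

&lt;p&gt;A screenshot or copied challenge that is replayed after the window closes fails this check regardless of whether the answer is correct.&lt;/p&gt;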

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;p&gt;Check out the project on GitHub: &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;leopechnicki/im_robot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contributions and feedback welcome!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
