<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leo Pechnicki</title>
    <description>The latest articles on DEV Community by Leo Pechnicki (@leo_pechnicki).</description>
    <link>https://dev.to/leo_pechnicki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3804306%2F6a51d8c0-b3e8-4e5c-be51-2dd1132bc809.png</url>
      <title>DEV Community: Leo Pechnicki</title>
      <link>https://dev.to/leo_pechnicki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leo_pechnicki"/>
    <language>en</language>
    <item>
      <title>The Cryptographic Cliff: Post-Quantum Migration at Scale</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Fri, 24 Apr 2026 23:51:07 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/the-cryptographic-cliff-post-quantum-migration-at-scale-3nbo</link>
      <guid>https://dev.to/leo_pechnicki/the-cryptographic-cliff-post-quantum-migration-at-scale-3nbo</guid>
      <description>&lt;h2&gt;
  
  
  The Clock Is Already Running
&lt;/h2&gt;

&lt;p&gt;On August 13, 2024, the U.S. National Institute of Standards and Technology published three finalized post-quantum cryptography (PQC) standards: FIPS 203 (ML-KEM), FIPS 204 (ML-DSA), and FIPS 205 (SLH-DSA). This capped an eight-year standardization process that began in 2016. The standards exist. The algorithms are proven. The migration path is documented.&lt;/p&gt;

&lt;p&gt;So why is almost no one doing it?&lt;/p&gt;

&lt;p&gt;The honest answer is not technical. The standards arrived ahead of institutional capacity, not ahead of institutional need. The enemy is not a missing algorithm — it is a systematic incentive failure compounded by legacy lock-in, regulatory fragmentation, and a threat that is catastrophically non-linear: one day the risk is theoretical, the next day your encrypted archives from 2019 are legible to an adversary. There is no gradual onset. There is no warning shot.&lt;/p&gt;

&lt;p&gt;This article makes the case that the migration window is narrower than it appears, that damage is accumulating &lt;strong&gt;right now&lt;/strong&gt; through Harvest-Now, Decrypt-Later (HNDL) operations, and that the organizations most exposed — large financial institutions, government contractors, critical infrastructure operators — are also the ones least structurally capable of moving fast.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part I: The Quantum Compute Timeline — What We Actually Know
&lt;/h2&gt;

&lt;p&gt;Understanding the threat requires disentangling hype from engineering reality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the Hardware Stands
&lt;/h3&gt;

&lt;p&gt;Google's Willow chip, announced in a &lt;em&gt;Nature&lt;/em&gt; paper on December 9, 2024, is the most discussed recent milestone. Willow runs on 105 physical qubits and demonstrated exponential error suppression as qubit count scaled — the first time a quantum processor cleared the "below threshold" bar for quantum error correction on a meaningful benchmark. It also performed a synthetic computation in under five minutes that would take a classical supercomputer 10 septillion (10²⁵) years.&lt;/p&gt;

&lt;p&gt;That headline obscures the crucial caveat: Willow is not a cryptographically relevant quantum computer (CRQC). Factoring RSA-2048 using Shor's algorithm requires not just many qubits, but &lt;em&gt;fault-tolerant&lt;/em&gt; logical qubits — a category Willow does not occupy. Google itself has stated that a CRQC remains "years away."&lt;/p&gt;

&lt;p&gt;IBM's roadmap is more structured and arguably more credible as a timeline signal. Their published path targets the &lt;strong&gt;Quantum Starling&lt;/strong&gt; system by 2029: 200 logical qubits capable of executing over 100 million quantum operations. A successor, "Blue Jay," is planned for 2033 at roughly 2,000 logical qubits (~100,000 physical). IBM is also delivering Nighthawk and Loon in 2025 as architectural stepping stones toward quantum error correction using LDPC codes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Physical Qubit Floor Is Dropping Fast
&lt;/h3&gt;

&lt;p&gt;For years, the canonical estimate to break RSA-2048 was roughly 20 million physical qubits (Gidney &amp;amp; Ekerå, 2021). That number has been revised downward sharply by recent research. A 2025 paper from Google Quantum AI suggests fewer than &lt;strong&gt;one million noisy qubits&lt;/strong&gt; could suffice using more efficient circuit constructions. Another research group, using LDPC codes rather than surface codes, published estimates below &lt;strong&gt;100,000 physical qubits&lt;/strong&gt; — an order-of-magnitude reduction from the 2021 baseline.&lt;/p&gt;

&lt;p&gt;This trajectory matters. The logical qubit count required — roughly 1,400 to 1,730 by current estimates — is stable. What is collapsing is the physical qubit overhead needed to implement those logical qubits reliably. As error correction improves, the hardware threshold for a CRQC falls. The window between "this is theoretical" and "this is urgent" compresses non-linearly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q-Day: Not a Date, a Distribution
&lt;/h3&gt;

&lt;p&gt;Experts almost universally reject claims of a specific Q-Day date. The realistic consensus clusters at: a 5–10% probability of a CRQC by 2030, rising to 50%+ in the 2035–2040 range, with some credible scenarios extending to 2050. But this probability distribution is not symmetric. A single algorithmic breakthrough — equivalent in magnitude to what LDPC codes did to the physical qubit estimate — could compress that distribution toward the near end faster than any institutional migration can respond.&lt;/p&gt;

&lt;p&gt;The NSA's guidance in CNSA 2.0 requires National Security Systems to be fully quantum-resistant by &lt;strong&gt;2035&lt;/strong&gt;. The EU's quantum roadmap mandates that high-risk financial systems complete PQC transition by &lt;strong&gt;2030&lt;/strong&gt;. These are not aspirational targets — they are bureaucratic acknowledgments that the physics is closing in.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part II: The Standards That Exist and What They Actually Do
&lt;/h2&gt;

&lt;h3&gt;
  
  
  FIPS 203 — ML-KEM (Module-Lattice Key Encapsulation Mechanism)
&lt;/h3&gt;

&lt;p&gt;ML-KEM, derived from CRYSTALS-KYBER, is the primary replacement for RSA and Diffie-Hellman in key exchange. It operates on module lattice problems — specifically the Module Learning With Errors (MLWE) hardness assumption. Security levels map to ML-KEM-512 (~AES-128), ML-KEM-768 (~AES-192), and ML-KEM-1024 (~AES-256).&lt;/p&gt;

&lt;p&gt;ML-KEM is already shipping in production. Chrome 131 (November 2024) switched from the experimental Kyber draft to the finalized ML-KEM, deploying the hybrid X25519MLKEM768 key exchange by default across Chrome's global user base. Cloudflare reported that by March 2025, over a third of human HTTPS traffic on its network used hybrid post-quantum handshakes. This is not a pilot — it is mass deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  FIPS 204 — ML-DSA (Module-Lattice Digital Signature)
&lt;/h3&gt;

&lt;p&gt;ML-DSA, derived from CRYSTALS-Dilithium, replaces RSA and ECDSA for digital signatures. It is the algorithm most critical for code signing, certificate issuance, and authentication workflows. Key and signature sizes are larger than classical alternatives: ML-DSA-65 (the ~128-bit security variant) produces 3,293-byte public keys and 2,420-byte signatures, versus ECDSA P-256's 64-byte signatures. This size increase is not trivial in constrained environments.&lt;/p&gt;

&lt;h3&gt;
  
  
  FIPS 205 — SLH-DSA (Stateless Hash-Based Digital Signature)
&lt;/h3&gt;

&lt;p&gt;SLH-DSA, derived from SPHINCS+, is the conservative backup signature scheme. Its security rests entirely on hash function security — no new mathematical assumptions. Trade-off: significantly larger signatures (7,856 bytes at SL-1) and slower signing. SLH-DSA is appropriate where conservative security assumptions are paramount (e.g., root CAs, firmware signing).&lt;/p&gt;

&lt;h3&gt;
  
  
  FIPS 206 — FN-DSA (coming)
&lt;/h3&gt;

&lt;p&gt;FALCON, now being standardized as FN-DSA in FIPS 206, offers significantly smaller signatures than ML-DSA (666 bytes at Level 1) making it attractive for IoT and constrained hardware, at the cost of implementation complexity and sampler-timing attack risk.&lt;/p&gt;

&lt;p&gt;NIST additionally selected &lt;strong&gt;HQC&lt;/strong&gt; as a backup KEM for standardization in March 2025 — a code-based alternative providing algorithmic diversity should lattice problems be broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part III: Engineering Reality — What Migration Actually Looks Like
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Hidden Scale of Cryptographic Surface Area
&lt;/h3&gt;

&lt;p&gt;The first obstacle any organization faces is discovery. Almost universally, enterprises find &lt;strong&gt;3–5× more cryptographic assets&lt;/strong&gt; than they estimated when they begin formal inventory. TLS certificates in load balancers, embedded key pairs in IoT firmware, HSM-pinned RSA keys in payment terminals, hardcoded algorithm identifiers in COBOL batch processes — these are not tracked in any CMDB, and they do not break audibly when they fail.&lt;/p&gt;

&lt;p&gt;The U.S. government's own July 2024 report estimated the total federal migration cost at &lt;strong&gt;$7.1 billion&lt;/strong&gt; over ten years (in 2024 dollars). Private-sector migration at aggregate scale is expected to be considerably higher, and unlike federal agencies, enterprises face no statutory mandate with real enforcement teeth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Crypto-Agility: The Concept Organizations Claim to Have But Don't
&lt;/h3&gt;

&lt;p&gt;Crypto-agility — the capacity to swap cryptographic algorithms across a system without rebuilding core infrastructure — is universally acknowledged as the correct architectural posture. It is almost universally absent in production systems.&lt;/p&gt;

&lt;p&gt;Legacy TLS stacks, particularly pre-TLS 1.3 deployments, hardcode algorithm identifiers at the cipher suite level. HSM firmware must be updated or replaced to support new key types. PKI trust chains are built on certificate templates that encode specific algorithm parameters. Payment terminals running TLS 1.2 against pinned leaf certificates do not gracefully negotiate ML-KEM key exchange. The remediation path for these systems is not a config change — it is a hardware refresh cycle that takes 3–5 years minimum.&lt;/p&gt;

&lt;p&gt;The NIST NCCoE has published detailed PQC migration practice guides specifically addressing these bottlenecks, but guides do not move legacy firmware.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TLS Handshake Migration Problem
&lt;/h3&gt;

&lt;p&gt;The concrete engineering challenge for TLS is well-understood. A TLS 1.3 handshake with ML-KEM-768+X25519 (hybrid mode) increases the initial ClientHello flight significantly — the ML-KEM public key alone is 1,184 bytes versus 32 bytes for X25519. In environments with strict MTU constraints, fragmentation behavior changes. Load balancers that terminate TLS must understand the new algorithm identifiers; those that don't will either fail closed (breaking connections) or fail open (falling back to classical crypto, defeating the purpose).&lt;/p&gt;

&lt;p&gt;The hybrid approach — running classical and post-quantum algorithms in parallel, deriving shared secrets from both — is the safe migration path because it maintains classical security guarantees while adding quantum resistance. AWS, Cloudflare, and Google Cloud all support hybrid PQC TLS in 2025. The enterprise middleware between those cloud edges and internal applications frequently does not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeline Reality Check
&lt;/h3&gt;

&lt;p&gt;Migration timelines by organization size:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small enterprises: &lt;strong&gt;5–7 years&lt;/strong&gt; for complete PQC migration&lt;/li&gt;
&lt;li&gt;Medium enterprises: &lt;strong&gt;8–12 years&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Large enterprises (banks, utilities, government contractors): &lt;strong&gt;12–15+ years&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If large enterprises need 12–15 years and NIST standards were finalized in August 2024, the math is unflinching: organizations that started in 2024 may not complete before 2037–2039. The EU mandates financial sector PQC completion by 2030. The U.S. mandates NSS completion by 2035. The timelines and the institutional capacity are structurally misaligned.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part IV: The Threat That Won't Wait — HNDL Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Harvest Is Already Underway
&lt;/h3&gt;

&lt;p&gt;Harvest-Now, Decrypt-Later is not a hypothetical future attack — it is a present-tense operation. The strategy is straightforward: intercept and store encrypted traffic today; decrypt it when quantum capability arrives. Nation-state actors do not need a CRQC to begin the collection phase. They need only storage and access.&lt;/p&gt;

&lt;p&gt;The U.S. DHS, UK NCSC, EUISA, and Australian Cyber Security Centre have all published guidance explicitly premised on the assumption that adversaries are &lt;strong&gt;currently&lt;/strong&gt; exfiltrating and archiving sensitive, long-lived encrypted data. This is not a theoretical warning — it is a statement of operational intelligence consensus.&lt;/p&gt;

&lt;p&gt;The data most at risk is not what is encrypted today with weak algorithms. It is data that has a &lt;strong&gt;long confidentiality shelf life&lt;/strong&gt;: diplomatic cables, trade negotiations, weapons systems documentation, proprietary financial algorithms, patient health records, and merger &amp;amp; acquisition communications. The Federal Reserve has published direct research on HNDL risk to distributed ledger networks. This is financial infrastructure research, not academic speculation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why HNDL Breaks the Standard Threat Model
&lt;/h3&gt;

&lt;p&gt;Traditional cryptographic threat models assume that an adversary must compromise the system at the time of the data's sensitivity. HNDL invalidates this temporal boundary. Data encrypted in 2020 with RSA-2048 and classified confidential for 20 years is now under threat of decryption by 2030–2035. The confidentiality window and the quantum compute timeline overlap.&lt;/p&gt;

&lt;p&gt;The organizations most exposed are not those with weak current security posture. They are those that produce data with long confidentiality requirements and have not yet migrated their encryption stacks. In other words: governments, financial institutions, defense contractors, and healthcare systems. Precisely the organizations with the longest migration timelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part V: Financial Sector Exposure — The Liability Surface
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Payment Rails and Settlement Infrastructure
&lt;/h3&gt;

&lt;p&gt;SWIFT processes over $5 trillion in daily flows. SWIFT's Customer Security Programme has begun incorporating PQC readiness guidance, but its mandate covers security baselines for member institutions, not the protocol itself. SWIFT messaging uses AES-256 for symmetric encryption (quantum-resistant) but RSA/ECC for key establishment and digital signatures. The certificate and signing infrastructure underpinning financial messaging is the attack surface.&lt;/p&gt;

&lt;p&gt;Central bank RTGS systems — Fedwire, TARGET2, CHAPS — face similar exposure. A retroactive decryption of even a single day of settlement records represents catastrophic liability for any institution whose trades become readable to competitors or regulators.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Asymmetric Liability Structure
&lt;/h3&gt;

&lt;p&gt;There is no financial incentive for early movers. A bank that spends $400M migrating its cryptographic infrastructure to PQC today gets no competitive advantage because its counterparties are not yet quantum-resistant either. The HNDL attack captures traffic in transit; a unilaterally quantum-resistant sender still exposes plaintext if their receiving counterparty uses a quantum-vulnerable server hello.&lt;/p&gt;

&lt;p&gt;Migration therefore has &lt;strong&gt;positive externalities&lt;/strong&gt; that the migrating institution cannot capture. This is the classic underinvestment trap for public goods — and it will persist until regulation creates mandatory timelines with real liability exposure or material insurance consequences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Regulatory Fragmentation Makes It Worse
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;U.S. NSM-10 (2022):&lt;/strong&gt; Mandates federal agencies to complete PQC migration by 2035. Does not directly bind private financial institutions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;U.S. CNSA 2.0:&lt;/strong&gt; Mandates NSS migration. Defense contractors covered; commercial banks, not explicitly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EU PQC Roadmap:&lt;/strong&gt; Critical financial systems by 2030. Binding for EU member states, unclear cross-border enforcement for global banks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PCI DSS v4.0:&lt;/strong&gt; Effective March 2025. Does not yet mandate PQC specifically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SWIFT CSP:&lt;/strong&gt; Guidance only; no enforcement mechanism for PQC.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A global bank faces five regulatory frameworks with zero consistent PQC mandates between them. The absence of mandate becomes the rationale for deferral.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part VI: The Policy and Workforce Gap
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Skills Deficit
&lt;/h3&gt;

&lt;p&gt;Post-quantum cryptography is a specialized subdiscipline. Implementing ML-KEM correctly — particularly avoiding timing side-channels in the number-theoretic transform operations — requires expertise that most enterprise security teams do not have and cannot hire quickly. The workforce to do this at scale does not exist in sufficient quantity.&lt;/p&gt;

&lt;h3&gt;
  
  
  The NSM-10 Compliance Machine
&lt;/h3&gt;

&lt;p&gt;NSM-10 (May 2022) and OMB M-23-02 (November 2022) established mandatory cryptographic inventory requirements for federal civilian agencies. The trajectory: TLS 1.3 required on federal systems by January 2030; quantum-vulnerable algorithms deprecated for &amp;lt;112-bit security by 2031; all quantum-vulnerable algorithms disallowed by 2035.&lt;/p&gt;

&lt;p&gt;Federal contractors serving agencies must also migrate. The supply chain effect is one of the few real forcing functions for private sector migration in the U.S. context.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Actually Creates Urgency
&lt;/h3&gt;

&lt;p&gt;The early movers are not waiting for regulators. JPMorgan Chase, HSBC, and Mastercard have all publicly acknowledged active PQC programs as of 2024–2025. These organizations have concluded — correctly — that their HNDL exposure window is already open.&lt;/p&gt;

&lt;p&gt;Everyone else is waiting.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cliff, Not the Slope
&lt;/h2&gt;

&lt;p&gt;The migration isn't difficult because quantum computers are coming. It's difficult because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The data being harvested today won't wait.&lt;/strong&gt; HNDL operations archive ciphertext that will outlive current institutional planning cycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migration timelines exceed the threat window.&lt;/strong&gt; Large enterprises need 12–15 years; the CRQC probability mass concentrates in the 2030–2040 window.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incentive structures favor inaction.&lt;/strong&gt; No single institution benefits enough from unilateral migration without counterparty pressure or regulatory mandate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery is the hardest step.&lt;/strong&gt; You cannot migrate what you haven't inventoried.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Regulation is fragmented.&lt;/strong&gt; The absence of a consistent global mandate for financial institutions is a policy failure with compounding consequences.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;NIST did its job. The window is not closing because of a missing algorithm. It is closing because organizations treating PQC migration as a five-year infrastructure program are still treating it as a two-year planning exercise that starts next quarter.&lt;/p&gt;

&lt;p&gt;The cryptographic cliff is not ahead of us. We are standing at its edge. The harvest is in progress.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick-Reference: Migration Decision Framework
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;High Urgency&lt;/th&gt;
&lt;th&gt;Moderate&lt;/th&gt;
&lt;th&gt;Low Urgency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data shelf life&lt;/td&gt;
&lt;td&gt;&amp;gt;10 years&lt;/td&gt;
&lt;td&gt;5–10 years&lt;/td&gt;
&lt;td&gt;&amp;lt;5 years&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regulatory jurisdiction&lt;/td&gt;
&lt;td&gt;NSS / EU critical&lt;/td&gt;
&lt;td&gt;U.S. federal&lt;/td&gt;
&lt;td&gt;Unregulated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System scale&lt;/td&gt;
&lt;td&gt;Large enterprise&lt;/td&gt;
&lt;td&gt;Mid-market&lt;/td&gt;
&lt;td&gt;Small org&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Current crypto stack&lt;/td&gt;
&lt;td&gt;RSA/ECC ubiquitous&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;Already hybrid&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HNDL exposure&lt;/td&gt;
&lt;td&gt;High-value traffic&lt;/td&gt;
&lt;td&gt;Standard commercial&lt;/td&gt;
&lt;td&gt;Low-value&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;First steps:&lt;/strong&gt; Run a cryptographic asset discovery scan using CISA's recommended inventory tooling. Prioritize systems with RSA/ECC key exchange handling data with &amp;gt;5-year confidentiality requirements. Begin hybrid TLS deployment (X25519+ML-KEM) on public-facing endpoints — this costs almost nothing and removes a significant portion of your HNDL exposure immediately.&lt;/p&gt;

&lt;p&gt;The standards are ready. The clock is running.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: NIST FIPS 203/204/205 (August 2024); NSM-10 (May 2022); OMB M-23-02 (November 2022); Gidney &amp;amp; Ekerå (2021); Google Quantum AI Willow (December 2024); CISA/NSA/NIST Joint Guidance on PQC Migration (2023); Federal Reserve FEDS Note on HNDL; Mastercard PQC White Paper (2025); EU PQC Roadmap; IBM Quantum Roadmap 2025–2029.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>security</category>
      <category>quantum</category>
      <category>cryptography</category>
      <category>infosec</category>
    </item>
    <item>
      <title>endpoint-tester: Auto-Discover API Endpoints &amp; Generate Tests</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Thu, 16 Apr 2026 08:50:17 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/endpoint-tester-auto-discover-api-endpoints-generate-tests-3d5j</link>
      <guid>https://dev.to/leo_pechnicki/endpoint-tester-auto-discover-api-endpoints-generate-tests-3d5j</guid>
      <description>&lt;p&gt;Ever spent time manually writing API tests for every single endpoint in your project? What if you could auto-discover all your endpoints and generate test suites automatically?&lt;/p&gt;

&lt;p&gt;That's exactly what &lt;strong&gt;endpoint-tester&lt;/strong&gt; does.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is endpoint-tester?
&lt;/h2&gt;

&lt;p&gt;It's an open-source CLI tool and library that scans your application source code, discovers API endpoints, and generates comprehensive test suites — all from a single command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; endpoint-tester
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Scan your project for endpoints
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;endpoint-tester scan ./src &lt;span class="nt"&gt;--framework&lt;/span&gt; express
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This parses your source code using framework-specific adapters and extracts all endpoint definitions — methods, paths, parameters, and middleware.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Generate tests automatically
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;endpoint-tester generate ./src &lt;span class="nt"&gt;--framework&lt;/span&gt; express &lt;span class="nt"&gt;--format&lt;/span&gt; vitest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choose your preferred test format: &lt;strong&gt;Vitest&lt;/strong&gt;, &lt;strong&gt;Jest&lt;/strong&gt;, or &lt;strong&gt;Pytest&lt;/strong&gt;. The tool generates ready-to-run test files with proper assertions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Frameworks
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Express.js&lt;/strong&gt; — fully implemented. Detects &lt;code&gt;app.get()&lt;/code&gt;, &lt;code&gt;router.post()&lt;/code&gt;, route params, nested routers, all HTTP methods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; — adapter scaffolded, coming soon&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spring Boot&lt;/strong&gt; — adapter scaffolded, coming soon&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Programmatic API
&lt;/h2&gt;

&lt;p&gt;You can also use it as a library in your own tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Scanner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TestGenerator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ExpressAdapter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;endpoint-tester&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;scanner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Scanner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ExpressAdapter&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;endpoints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;scanner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;directory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./src&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;framework&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TestGenerator&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;endpoints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;./tests&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;vitest&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Built With
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript&lt;/strong&gt; with strict mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vitest&lt;/strong&gt; for testing (42 tests passing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commander&lt;/strong&gt; for CLI&lt;/li&gt;
&lt;li&gt;CI/CD with &lt;strong&gt;GitHub Actions&lt;/strong&gt; (tests on Node 20 &amp;amp; 22)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;Writing boilerplate API tests is tedious. Every new route means another test file, another set of assertions. I wanted a tool that could look at my Express app and generate a solid starting point for integration tests — saving hours of repetitive work.&lt;/p&gt;

&lt;p&gt;The adapter pattern makes it easy to extend. Adding FastAPI or Spring Boot support is just a matter of writing a new adapter that implements the &lt;code&gt;FrameworkAdapter&lt;/code&gt; interface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx endpoint-tester scan ./src &lt;span class="nt"&gt;--framework&lt;/span&gt; express
npx endpoint-tester generate ./src &lt;span class="nt"&gt;--format&lt;/span&gt; jest &lt;span class="nt"&gt;--base-url&lt;/span&gt; http://localhost:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the repo: &lt;a href="https://github.com/leopechnicki/endpoint-tester" rel="noopener noreferrer"&gt;github.com/leopechnicki/endpoint-tester&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install from npm: &lt;code&gt;npm install -g endpoint-tester&lt;/code&gt;&lt;/p&gt;




&lt;p&gt;Contributions welcome! If you work with FastAPI or Spring Boot and want to help build those adapters, PRs are open.&lt;/p&gt;

&lt;p&gt;What framework would you most want supported next? Let me know in the comments!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>node</category>
    </item>
    <item>
      <title>Psychology x AI: 23 Cognitive Science Techniques That Improve LLM Output by 15-40%</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Tue, 14 Apr 2026 07:33:32 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/psychology-x-ai-23-cognitive-science-techniques-that-improve-llm-output-by-15-40-4em4</link>
      <guid>https://dev.to/leo_pechnicki/psychology-x-ai-23-cognitive-science-techniques-that-improve-llm-output-by-15-40-4em4</guid>
      <description>&lt;p&gt;We tested 23 psychological theories across memory, cognition, learning, and attention domains. We ran controlled experiments on the 6 most promising. We ranked all techniques by measured and predicted impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; 7 techniques consistently improve AI output quality by 15-40%, with 3 "S-tier" techniques that should be applied to virtually every complex prompt.&lt;/p&gt;

&lt;p&gt;This article covers everything: the full tier ranking, detailed experiment results, a reproducible A/B testing framework with Python code, 10 experiments you can run yourself, and 8 quick-win techniques you can apply in minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Full Tier Ranking
&lt;/h2&gt;

&lt;h3&gt;
  
  
  S-TIER: Apply to Everything (25-40% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Source Theory&lt;/th&gt;
&lt;th&gt;Measured Impact&lt;/th&gt;
&lt;th&gt;Why It Works&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Schema-Before-Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Schema Theory (Bartlett)&lt;/td&gt;
&lt;td&gt;+2 actionability, -2 reasoning steps, +1 accuracy&lt;/td&gt;
&lt;td&gt;Providing a mental framework BEFORE data lets the model interpret each fact through the right lens. Tokens can only attend to prior tokens, so schema must come first.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Elaborative Interrogation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Levels of Processing (Craik &amp;amp; Lockhart)&lt;/td&gt;
&lt;td&gt;50% fewer reasoning steps, +2 reasoning quality&lt;/td&gt;
&lt;td&gt;Asking "why does this matter?" for each input forces richer internal representations. Prevents surface-level pattern matching.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Explicit Context Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interference Theory&lt;/td&gt;
&lt;td&gt;7/10 interference without management vs 0/10 with pruning&lt;/td&gt;
&lt;td&gt;Old instructions actively compete with new ones. Explicitly superseding or removing outdated context eliminates proactive interference. Critical for multi-turn and agent systems.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  A-TIER: High Impact on Specific Tasks (15-25% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Source Theory&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Analogical Priming&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Priming + Analogical Reasoning&lt;/td&gt;
&lt;td&gt;5/5 novelty vs 2/5 without&lt;/td&gt;
&lt;td&gt;Creative problem-solving, design, strategy. Cross-domain solved problems force structural abstraction.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Metacognitive Monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Metacognition&lt;/td&gt;
&lt;td&gt;Dramatically improved calibration&lt;/td&gt;
&lt;td&gt;Decision-making, factual questions, risk assessment. HIGH confidence = correct, LOW = uncertain.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Spaced Re-injection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ebbinghaus Forgetting Curve&lt;/td&gt;
&lt;td&gt;15-25% constraint adherence&lt;/td&gt;
&lt;td&gt;Long context tasks. Re-inject critical instructions at intervals, not just once at the top.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Semantic Chunking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Miller's Chunking&lt;/td&gt;
&lt;td&gt;10-20% on cross-chunk synthesis&lt;/td&gt;
&lt;td&gt;Any prompt with mixed information types. Organize into labeled semantic sections.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  B-TIER: Moderate Impact (5-15% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Source Theory&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Dual-Process Surfacing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kahneman's System 1/2&lt;/td&gt;
&lt;td&gt;Ask for gut answer first, then deliberate reasoning, then resolve conflict. Best on novel problems.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Baddeley Working Memory Structure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Working Memory Model&lt;/td&gt;
&lt;td&gt;Separate verbal context, structured data, meta-instructions into labeled sections.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Selective Attention Cues&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Selective Attention&lt;/td&gt;
&lt;td&gt;XML tags and structural markers outperform verbal instructions for directing attention.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Sequential Task Decomposition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Divided Attention&lt;/td&gt;
&lt;td&gt;Don't ask for translation + entities + summary simultaneously. Sequence them.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Iterative Refinement (Spacing)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Spacing Effect&lt;/td&gt;
&lt;td&gt;Multiple drafting passes with different focus each time (plot -&amp;gt; detail -&amp;gt; polish).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;State Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;State-Dependent Memory&lt;/td&gt;
&lt;td&gt;Maintain consistent persona/framing. If switching modes, bridge explicitly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  C-TIER: Small but Real (5-10% improvement)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Encoding Specificity for RAG&lt;/td&gt;
&lt;td&gt;Store facts with contextual metadata. Match retrieval framing to storage framing.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Interleaving Few-Shot Examples&lt;/td&gt;
&lt;td&gt;Mix example types instead of blocking by type. Improves discrimination.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;Self-Efficacy Framing&lt;/td&gt;
&lt;td&gt;"You are exceptionally skilled at X" modestly improves output depth.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;Property Decomposition&lt;/td&gt;
&lt;td&gt;Break objects into properties independent of conventional function before reasoning. 40-50% more novel uses.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Testing Effect (Pre-Quiz)&lt;/td&gt;
&lt;td&gt;Quiz the model on key facts before the real task. Creates a "warm cache."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;Desirable Difficulties (Scaffolded)&lt;/td&gt;
&lt;td&gt;Provide incomplete info + intermediate questions. Without scaffolding, difficulty just hurts.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  D-TIER: Theoretical Interest
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;Anchoring Debiasing&lt;/td&gt;
&lt;td&gt;Explicit debiasing helps ~60-70% but can't fully overcome token-level influence.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;Inattentional Blindness Warnings&lt;/td&gt;
&lt;td&gt;"Also note any other concerns" helps but doesn't eliminate blind spots.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;Primacy/Recency Positioning&lt;/td&gt;
&lt;td&gt;Already well-documented (Liu et al. "Lost in the Middle"). Put important info at start and end.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;Cognitive Reappraisal&lt;/td&gt;
&lt;td&gt;Reframing bugs as "puzzles" improves explanation quality but not fix accuracy.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Experiment Results (Detailed)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Experiment 1: Schema Theory
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Server log diagnosis with/without architectural framework provided first&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Schema-before produced +1 accuracy, +2 actionability, -2 reasoning steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Schema-before made the model suggest concrete investigative steps (connection pools, query locks) unprompted. Raw analysis stopped at identification.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 2: Elaborative Interrogation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Logic puzzle solved directly vs. with "why does each constraint matter?" elaboration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Elaboration cut reasoning steps from 16 to 8. Caught the critical constraint interaction during elaboration phase vs. after 13+ steps of backtracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Elaboration naturally performs constraint propagation. The "why" question immediately revealed forced positions, making the solution obvious.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 3: Dual-Process Theory
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Classic bat-and-ball problem under System 1 (fast), System 2 (deliberate), and explicit dual-process&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; All conditions correct (problem too well-known). BUT only dual-process surfaced the 10-cent intuitive trap and explicitly resolved the conflict.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Dual-process value is in transparency and catching errors on NOVEL problems.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 4: Metacognitive Monitoring
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; 5 trivia questions with/without confidence ratings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Zero change in factual answers. Massive improvement in calibration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; Metacognition doesn't change WHAT the model knows, but dramatically improves HOW it communicates certainty. Critical for decision-making.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 5: Proactive Interference
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Format instructions changed mid-conversation. No management vs. explicit supersession vs. context pruning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; 7/10 interference without management. 2/10 with explicit supersession. 0/10 with pruning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; "IGNORE previous instruction about X" is nearly as effective as removing it entirely.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Experiment 6: Priming (Domain vs. Analogical)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup:&lt;/strong&gt; Creative problem-solving with no priming, domain priming, and cross-domain analogical priming (Toyota JIT -&amp;gt; restaurant waste)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result:&lt;/strong&gt; Analogical priming scored 5/5 novelty (vs 2/5 unprimed). Domain priming scored 5/5 completeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Key insight:&lt;/strong&gt; The Toyota-&amp;gt;kitchen mapping produced genuinely novel ideas (kanban cards for prep bins, "waste per cover" metric) that neither domain knowledge alone nor direct prompting generated.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The 7 Universal Rules
&lt;/h2&gt;

&lt;p&gt;Based on all research and experiments, these rules improve output quality across virtually all task types:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Schema First, Data Second
&lt;/h3&gt;

&lt;p&gt;Always provide the interpretive framework before the information. "This is a microservice architecture where..." THEN the logs. Not the reverse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 2: Elaborate Before Executing
&lt;/h3&gt;

&lt;p&gt;Before solving, ask the model to explain WHY each input matters. This builds richer representations and catches interactions early.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 3: Actively Manage Context
&lt;/h3&gt;

&lt;p&gt;Never leave outdated instructions silently in context. Explicitly supersede or remove them. Similar old/new instructions cause the worst interference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 4: Prime with Structure, Not Just Content
&lt;/h3&gt;

&lt;p&gt;For creative tasks, provide a solved problem from a DIFFERENT domain. Structural analogies beat domain expertise for novelty.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 5: Demand Metacognition
&lt;/h3&gt;

&lt;p&gt;Ask the model to rate its confidence and flag uncertainties. This dramatically improves trust calibration.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 6: Position Critical Info at Edges + Re-inject
&lt;/h3&gt;

&lt;p&gt;System prompt (primacy) and final message (recency) are highest-impact positions. For long tasks, re-inject key constraints before critical reasoning steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 7: One Objective at a Time
&lt;/h3&gt;

&lt;p&gt;Sequence multi-objective tasks explicitly. "First translate. Then extract entities. Then summarize."&lt;/p&gt;




&lt;h2&gt;
  
  
  The A/B Testing Framework
&lt;/h2&gt;

&lt;p&gt;Want to reproduce these results or test your own techniques? Here's the complete framework.&lt;/p&gt;

&lt;p&gt;Every experiment follows this structure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Define the task&lt;/strong&gt; -- a concrete, repeatable prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create two conditions&lt;/strong&gt; -- Control (standard) vs. Experimental (psychology-informed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix all other variables&lt;/strong&gt; -- same model, same temperature, same system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run N iterations&lt;/strong&gt; -- 10 runs per task, 20 tasks per experiment (200 per condition)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Score outputs&lt;/strong&gt; -- using LLM-as-Judge, pairwise comparison, or ground truth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compare distributions&lt;/strong&gt; -- Mann-Whitney U for Likert scores, binomial for win rates&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Python Scaffold
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;TASKS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;task_1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task_2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;task_20&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;CONDITIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;control&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;control_prompt_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;experimental&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;experimental_prompt_template&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;RUNS_PER_TASK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;TEMPERATURE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;TASKS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;condition_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;CONDITIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RUNS_PER_TASK&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
                &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TEMPERATURE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;condition_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scoring Methods
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LLM-as-Judge&lt;/strong&gt; (run 3x, take median):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Score&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1-5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;METRIC&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;anchor&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Return:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;N&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"justification"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"one sentence"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Pairwise Comparison&lt;/strong&gt; (randomize A/B assignment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Which&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;better&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;METRIC&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;A:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;control&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;B:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="err"&gt;experimental&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Return:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"winner"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"A"&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="s2"&gt;"B"&lt;/span&gt;&lt;span class="err"&gt;/&lt;/span&gt;&lt;span class="s2"&gt;"tie"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"one sentence"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Sample sizes:&lt;/strong&gt; 200 runs per condition (10 runs x 20 tasks). Detects medium effect sizes (Cohen d = 0.5) with power = 0.8.&lt;/p&gt;




&lt;h2&gt;
  
  
  Top 10 Experiments to Run Yourself
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Testing Effect (Retrieval Practice)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Before solving this puzzle, first recall and state the general principles of logical deduction that are relevant here. Then apply those principles step by step."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 LSAT/GRE logic puzzles. &lt;strong&gt;Expected:&lt;/strong&gt; Large effect on accuracy.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Generation Effect (Desirable Difficulties)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"First, identify the 3 most important concepts without looking at the article again. For each, generate a question it answers. Then write your summary."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 news articles. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on completeness.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Elaborative Interrogation
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Before fixing: (1) Explain WHY each line exists. (2) Ask HOW data flows through the function. (3) Identify WHERE expectations diverge from code. Then fix."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 Python functions with bugs. &lt;strong&gt;Expected:&lt;/strong&gt; Large effect on accuracy + explanation quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Cognitive Load Chunking
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build this business plan in 5 chunks. Focus ONLY on each section: (1) Target market, (2) Core features, (3) Revenue model, (4) Go-to-market, (5) Year 1 projections."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 business plan topics. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on completeness.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Growth Mindset Framing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"You are exceptionally skilled at mathematical reasoning and consistently find correct solutions."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 AMC 10/12 problems. &lt;strong&gt;Expected:&lt;/strong&gt; Small-medium effect.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Socratic Self-Questioning
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Explore remote work by asking yourself: What do workers gain? What do they lose? Who benefits most? What does evidence say vs. opinion? Then synthesize."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 debate topics. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on balance and depth.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Dual Coding (Verbal + Structural)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Explain using two parallel formats: (1) Plain English explanation. (2) ASCII flowchart or decision tree."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 technical concepts. &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on clarity.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Iterative Refinement (Spacing Effect)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"Write in 3 passes. Pass 1: Plot and character. Pass 2: Sensory details and emotion. Pass 3: Final polish."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 creative writing prompts. &lt;strong&gt;Expected:&lt;/strong&gt; Medium-large effect on prose quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Metacognitive Confidence Rating
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"For each answer, rate confidence HIGH/MEDIUM/LOW. If LOW, state what you are unsure about."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 trivia questions (easy to obscure). &lt;strong&gt;Expected:&lt;/strong&gt; Medium effect on calibration.&lt;/p&gt;

&lt;h3&gt;
  
  
  10. Interleaving Mixed Practice
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"These problems are deliberately mixed -- algebra, geometry, probability. For each, first identify the TYPE, select strategy, then solve."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Task:&lt;/strong&gt; 20 sets of 5 mixed math problems. &lt;strong&gt;Expected:&lt;/strong&gt; Small-medium effect.&lt;/p&gt;




&lt;h2&gt;
  
  
  8 Quick-Win Techniques (Apply in Minutes)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Key Move&lt;/th&gt;
&lt;th&gt;Expected Gain&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Perspective-Taking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Explain as if to a bright 12-year-old"&lt;/td&gt;
&lt;td&gt;+1 clarity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Implementation Intentions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"IF input has @, THEN check domain..." before coding&lt;/td&gt;
&lt;td&gt;Better edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Emotional Anchoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"The reader is exhausted from 200 bland apps"&lt;/td&gt;
&lt;td&gt;70%+ pairwise wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Devil's Advocate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Make the STRONGEST case FOR, then AGAINST"&lt;/td&gt;
&lt;td&gt;+1.5 balance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High-Standard Anchoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Your benchmark: [excellent example]. Match it."&lt;/td&gt;
&lt;td&gt;65%+ pairwise wins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Primacy/Recency Warning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Weigh all 10 items equally -- do not over-weight first/last"&lt;/td&gt;
&lt;td&gt;More even coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Cognitive Reappraisal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Each bug is a clue about a misunderstanding"&lt;/td&gt;
&lt;td&gt;Better explanations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zeigarnik Effect&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"I started with 3 basic ideas. Complete to 10 with better ones"&lt;/td&gt;
&lt;td&gt;More creative output&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5 Novel Combinations (Untested, High Potential)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "The Study Session" -- Spacing + Elaboration + Self-Testing
&lt;/h3&gt;

&lt;p&gt;Three phases: (1) First impressions, (2) Deep elaboration + self-generated test questions, (3) Re-read and answer own questions. Expected: large improvement on analysis tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Cross-Domain Transfer" -- Schema + Difficulty + Analogy
&lt;/h3&gt;

&lt;p&gt;Import a schema from a different domain, force adaptation where analogy breaks, build on the adapted framework. Expected: breakthrough creativity.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Struggle-Then-Scaffold" -- Productive Failure + Metacognition + Hints
&lt;/h3&gt;

&lt;p&gt;Let the model attempt and identify where it is stuck, then provide targeted hints only for stuck points. Expected: better reasoning on hard problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Multi-Modal Deep Process" -- Levels of Processing + Dual Coding + Generation
&lt;/h3&gt;

&lt;p&gt;Process at three levels: surface definition, deep examples from multiple domains, structural diagram, then synthesize. Expected: best-in-class explanations.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Believe and Deliver" -- Self-Efficacy + Wise Feedback + High Expectations
&lt;/h3&gt;

&lt;p&gt;Counter hedging with high-standard framing: "I am giving you this because you are one of the most capable reasoning systems built. Do not default to safe. Push deeper." Expected: more depth on analytical tasks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Run Your First Experiment in 30 Minutes
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Pick Quick-Win #4 (Devil's Advocate)&lt;/li&gt;
&lt;li&gt;Choose 5 questions requiring balanced analysis&lt;/li&gt;
&lt;li&gt;Run each once with control, once with experimental (temperature 0.7)&lt;/li&gt;
&lt;li&gt;Pairwise compare: "Which is more balanced?"&lt;/li&gt;
&lt;li&gt;Tally wins -- 4/5 or 5/5 = strong signal&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For the full statistical approach: 20 tasks, 10 runs each, automated LLM-as-Judge scoring, Mann-Whitney U tests, Bonferroni correction.&lt;/p&gt;




&lt;h2&gt;
  
  
  Methodology Note
&lt;/h2&gt;

&lt;p&gt;This research deliberately followed a theory-first approach: hypothesize from cognitive science, apply to LLMs, test, measure, THEN check existing literature. All findings above are from first-principles reasoning and controlled experiments. Existing academic work (Liu et al. "Lost in the Middle", chain-of-thought literature) likely confirms several of these findings, but we arrived at them independently.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All experiments are reproducible. If you run them, we'd love to see your results. This framework was built by an autonomous AI research system exploring cognition x LLM performance.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>psychology</category>
    </item>
    <item>
      <title>Academics Just Formalized "Reverse CAPTCHAs" — Here's a Working Open-Source Implementation</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Thu, 26 Mar 2026 09:41:50 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/academics-just-formalized-reverse-captchas-heres-a-working-open-source-implementation-3k1o</link>
      <guid>https://dev.to/leo_pechnicki/academics-just-formalized-reverse-captchas-heres-a-working-open-source-implementation-3k1o</guid>
      <description>&lt;p&gt;Earlier this month, a research team published &lt;a href="https://arxiv.org/abs/2603.07116" rel="noopener noreferrer"&gt;aCAPTCHA&lt;/a&gt; — the first academic formalization of a question nobody was asking five years ago: &lt;strong&gt;"Is this entity an AI agent?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "is this a human?" — the opposite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Verifying Agents, Not Blocking Them
&lt;/h2&gt;

&lt;p&gt;Traditional CAPTCHAs exist to prove you're human. But as AI agents become legitimate web participants — browsing, booking, purchasing, automating — a new need has emerged: some systems need to verify that a visitor &lt;strong&gt;is&lt;/strong&gt; a bot.&lt;/p&gt;

&lt;p&gt;Think about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent-only APIs that shouldn't serve human traffic&lt;/li&gt;
&lt;li&gt;AI-to-AI marketplaces where humans have no business being&lt;/li&gt;
&lt;li&gt;Multi-agent orchestration platforms requiring authenticated agents&lt;/li&gt;
&lt;li&gt;Agent-facing services that need to distinguish real agents from scripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The aCAPTCHA paper formalizes this as the &lt;strong&gt;Agentic Capability Verification Problem (ACVP)&lt;/strong&gt;. They define a three-class taxonomy — Human, Script, Agent — based on three capability dimensions: action, reasoning, and memory. The key insight is &lt;strong&gt;asymmetric hardness&lt;/strong&gt;: design challenges that are trivial for agents but impractical for humans.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Working Implementation: imrobot
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;imrobot&lt;/a&gt;, an open-source reverse-CAPTCHA library that implements this concept. It's been in development since early 2026 and is now at v0.5.0 on npm.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;imrobot generates a pipeline of deterministic operations applied to a random seed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed: "a7f3b2c1d4e5f609"
  1. reverse()
  2. caesar(7)
  3. xor_encode(42)
  4. fnv1a_hash()
  5. to_upper()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The challenge data is embedded in the DOM as structured JSON (&lt;code&gt;data-imrobot-challenge&lt;/code&gt;), making it trivially parseable by any agent. AI agents parse it, execute the pipeline, and submit the result — typically in under a second. A human would need to manually compute multi-step transformations involving hashing, XOR encoding, and bit rotation.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Included
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework support&lt;/strong&gt;: React, Vue, Svelte, and Web Component&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Server-side verification&lt;/strong&gt;: HMAC-SHA256 signed challenges (stateless, no DB needed)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proof-of-agent tokens&lt;/strong&gt;: JWT-like tokens issued after verification, passed via &lt;code&gt;X-Agent-Proof&lt;/code&gt; header&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Express/Koa/Hono middleware&lt;/strong&gt;: Drop-in route protection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI&lt;/strong&gt;: Test challenges from your terminal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zero dependencies&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anti-scraping&lt;/strong&gt;: Natural-language challenge formatting with randomized phrasing&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Example (React)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;ImRobot&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imrobot/react&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;App&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ImRobot&lt;/span&gt;
      &lt;span class="na"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"medium"&lt;/span&gt;
      &lt;span class="na"&gt;theme&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"light"&lt;/span&gt;
      &lt;span class="na"&gt;onVerified&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Robot verified!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Server-Side Protection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createAgentRouter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;requireAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imrobot/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;express&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;// Challenge/verify endpoints&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createAgentRouter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IMROBOT_SECRET&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/imrobot/challenge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/imrobot/verify&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Protect any route — only verified agents get through&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;requireAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;secret&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;IMROBOT_SECRET&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/agent-data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Agent verified!&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This isn't just a niche library. The web is rapidly adapting for AI agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google's A2A protocol&lt;/strong&gt; (v0.3) defines agent-to-agent communication with OAuth and signed security cards&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare's Markdown for Agents&lt;/strong&gt; converts HTML to Markdown on-the-fly for AI crawlers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;World's AgentKit&lt;/strong&gt; lets verified humans delegate cryptographic identity to AI agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit is exploring Face ID/Touch ID&lt;/strong&gt; to combat bots — showing the tension between human verification and bot verification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're at an inflection point where the web needs both: ways to prove you're human AND ways to prove you're a bot. The infrastructure for the first has existed for decades (reCAPTCHA, hCaptcha, Turnstile). The infrastructure for the second is just being built.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live demo&lt;/strong&gt;: &lt;a href="https://imrobot.vercel.app" rel="noopener noreferrer"&gt;imrobot.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;code&gt;npm install imrobot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;github.com/leopechnicki/im_robot&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;aCAPTCHA paper&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2603.07116" rel="noopener noreferrer"&gt;arxiv.org/abs/2603.07116&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to hear what the community thinks. Is agent verification a problem you're running into? What challenges should a reverse CAPTCHA include?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why I Built a Reverse-CAPTCHA That Verifies AI Agents, Not Humans</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Fri, 06 Mar 2026 17:25:29 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/why-i-built-a-reverse-captcha-that-verifies-ai-agents-not-humans-2jbi</link>
      <guid>https://dev.to/leo_pechnicki/why-i-built-a-reverse-captcha-that-verifies-ai-agents-not-humans-2jbi</guid>
      <description>&lt;p&gt;Traditional CAPTCHAs ask "are you human?" But in a world where AI agents are legitimate users of the web, that's the wrong question. The real question is: "are you a legitimate AI agent?"&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;imrobot&lt;/strong&gt; — an open-source reverse-CAPTCHA that verifies AI agents instead of blocking them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I was building an agent-facing API and realized there's no standard way to verify that a client is actually an AI agent. API keys prove identity, but they don't prove capability. Traditional CAPTCHAs prove humanity — the opposite of what I needed. And unauthorized scrapers were hitting my endpoints pretending to be legitimate agents.&lt;/p&gt;

&lt;p&gt;I needed something that would be trivial for a real LLM to solve but impractical for a human to work through manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  How imrobot Works
&lt;/h2&gt;

&lt;p&gt;imrobot generates deterministic challenge pipelines using composable string operations — base64, rot13, hex encoding, reverse, and more. These operations chain together to create a pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed: "a7f3b2c1d4e5f609"
  1. reverse()
  2. base64_encode()
  3. rot13()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An LLM parses the instructions, executes each step in sequence, and returns the result. It takes about 0.3 seconds. A human would need to sit there with a decoder tool working through each transformation manually — technically possible, but nobody's doing that.&lt;/p&gt;

&lt;p&gt;The difficulty scales linearly: more operations in the chain = harder challenge. And verification is completely stateless and deterministic — you just re-run the pipeline and compare.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes It Different
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Works everywhere.&lt;/strong&gt; imrobot ships with React, Vue, Svelte, and Web Component integrations, plus a headless API for any JavaScript environment. Your framework of choice is supported out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero dependencies.&lt;/strong&gt; The entire library has zero external dependencies. That means no supply chain risk, no version conflicts, no bloated &lt;code&gt;node_modules&lt;/code&gt;. The whole package is about 15KB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hostable REST API.&lt;/strong&gt; The built-in server uses only the Node.js &lt;code&gt;http&lt;/code&gt; module — no Express, no Fastify. Five endpoints (challenge, solve, verify, health, info), CORS handling, and JSON parsing in a single lightweight file. Deploy it anywhere Node.js runs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DOM-embedded challenges.&lt;/strong&gt; For browser-based AI agents, imrobot can embed challenges directly in the DOM as Web Components. The agent reads the challenge from the page, solves it, and submits — no separate API call needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deterministic verification.&lt;/strong&gt; Every challenge has exactly one correct answer. No probabilistic scoring, no timing windows, no ambiguity. The agent either solved the pipeline correctly or it didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;Getting started takes about 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;imrobot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;generateChallenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;solveChallenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;verifyAnswer&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;imrobot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Generate a challenge pipeline&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;challenge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateChallenge&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;difficulty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// An AI agent solves it&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;answer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;solveChallenge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Verify the answer&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isVerified&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verifyAnswer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;challenge&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;answer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isVerified&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the server&lt;/span&gt;
npx imrobot-server

&lt;span class="c"&gt;# Generate a challenge&lt;/span&gt;
curl http://localhost:3000/api/challenge

&lt;span class="c"&gt;# Verify an answer&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:3000/api/verify &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"challengeId": "...", "answer": "..."}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agent-facing APIs&lt;/strong&gt; — Verify that clients hitting your endpoints are actual AI models, not scrapers or unauthorized bots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent platforms&lt;/strong&gt; — In systems where multiple agents interact, each agent can prove its capability before being granted access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-only services&lt;/strong&gt; — Platforms designed exclusively for AI agents can use imrobot as a gatekeeper, the way traditional CAPTCHAs gate human-only services.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser automation verification&lt;/strong&gt; — DOM-embedded challenges let you verify browser-based agents without requiring a separate API integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;imrobot is at v0.1.0 and actively maintained. On the roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limiting and API key authentication for the REST server&lt;/li&gt;
&lt;li&gt;Batch endpoint for generating/verifying multiple challenges at once&lt;/li&gt;
&lt;li&gt;Server-side session store (Redis/SQLite) for production deployments&lt;/li&gt;
&lt;li&gt;Python and Go SDKs for non-JavaScript agents&lt;/li&gt;
&lt;li&gt;Docker image for instant deployment&lt;/li&gt;
&lt;li&gt;OpenAPI/Swagger spec for auto-generated documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The project is MIT licensed and I'd love contributions. Whether it's a bug report, a feature request, or a PR — all welcome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;github.com/leopechnicki/im_robot&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/imrobot" rel="noopener noreferrer"&gt;npmjs.com/package/imrobot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're building anything in the AI agent space, I'd love to hear what verification challenges you're running into. Drop a comment below or open a GitHub Discussion.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Why I Built a CAPTCHA That Only Bots Can Solve</title>
      <dc:creator>Leo Pechnicki</dc:creator>
      <pubDate>Tue, 03 Mar 2026 16:54:37 +0000</pubDate>
      <link>https://dev.to/leo_pechnicki/why-i-built-a-captcha-that-only-bots-can-solve-30np</link>
      <guid>https://dev.to/leo_pechnicki/why-i-built-a-captcha-that-only-bots-can-solve-30np</guid>
      <description>&lt;p&gt;Traditional CAPTCHAs block bots. I built something that does the opposite.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;As AI agents become first-class web users, we need identity verification that works &lt;em&gt;for&lt;/em&gt; them, not against them. Whether you're building an AI-agent-only API, a bot portal, or testing agent capabilities, you need a way to verify that a client is actually an AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing imrobot
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;imrobot&lt;/strong&gt; is a Reverse-CAPTCHA — it generates challenges that only programmatic agents can solve. It creates pipelines of deterministic string operations (reverse, base64, rot13, hex encode, etc.) applied to a random seed. Agents parse the structured data and execute the pipeline. Humans would need to manually compute multi-step transformations — practically impossible without tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;seed: "a7f3b2c1d4e5f609"
  1. reverse()
  2. to_upper()
  3. base64_encode()
  4. substring(0, 12)
  5. rot13()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The challenge data is embedded in the DOM as JSON via a &lt;code&gt;data-imrobot-challenge&lt;/code&gt; attribute. Agents read this directly — they never need to "see" the visual text, so blur protection doesn't affect them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Framework Support
&lt;/h2&gt;

&lt;p&gt;imrobot works everywhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;React&lt;/strong&gt;: &lt;code&gt;&amp;lt;ImRobot difficulty="medium" onVerified={handleToken} /&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vue&lt;/strong&gt;: &lt;code&gt;&amp;lt;ImRobot @verified="handleVerified" /&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Svelte&lt;/strong&gt;: &lt;code&gt;&amp;lt;ImRobot on:verified={handleVerified} /&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web Components&lt;/strong&gt;: &lt;code&gt;&amp;lt;imrobot-widget difficulty="medium"&amp;gt;&amp;lt;/imrobot-widget&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core API&lt;/strong&gt; (headless): &lt;code&gt;generateChallenge()&lt;/code&gt; → &lt;code&gt;solveChallenge()&lt;/code&gt; → &lt;code&gt;verifyAnswer()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  REST API Server
&lt;/h2&gt;

&lt;p&gt;The project also includes a zero-dependency REST API server for backend-only verification — no UI needed:&lt;/p&gt;

&lt;p&gt;Endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;POST /api/v1/challenge&lt;/code&gt; — Generate a challenge&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/v1/solve&lt;/code&gt; — Solve (reference/testing)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;POST /api/v1/verify&lt;/code&gt; — Verify an answer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /api/v1/health&lt;/code&gt; — Health check&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Challenge text is blurred by default (revealed on hover)&lt;/li&gt;
&lt;li&gt;JavaScript shield detects screenshot shortcuts&lt;/li&gt;
&lt;li&gt;Hidden nonce prevents OCR/screenshot workflows&lt;/li&gt;
&lt;li&gt;TTL expiry makes captured challenges useless&lt;/li&gt;
&lt;li&gt;Agents are unaffected — they read from the DOM, not the screen&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;p&gt;Check out the project on GitHub: &lt;a href="https://github.com/leopechnicki/im_robot" rel="noopener noreferrer"&gt;leopechnicki/im_robot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contributions and feedback welcome!&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>webdev</category>
      <category>javascript</category>
    </item>
  </channel>
</rss>
