<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ty Wells</title>
    <description>The latest articles on DEV Community by Ty Wells (@ty_wells_7d0b523d1a02a496).</description>
    <link>https://dev.to/ty_wells_7d0b523d1a02a496</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3210882%2F1af088c0-82af-4f4b-91df-d0a5094dda0f.png</url>
      <title>DEV Community: Ty Wells</title>
      <link>https://dev.to/ty_wells_7d0b523d1a02a496</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ty_wells_7d0b523d1a02a496"/>
    <language>en</language>
    <item>
      <title>We found 250 semantic bugs in popular open-source projects that linters completely missed</title>
      <dc:creator>Ty Wells</dc:creator>
      <pubDate>Thu, 19 Feb 2026 14:06:19 +0000</pubDate>
      <link>https://dev.to/ty_wells_7d0b523d1a02a496/we-found-250-semantic-bugs-in-popular-open-source-projects-that-linters-completely-missed-1bli</link>
      <guid>https://dev.to/ty_wells_7d0b523d1a02a496/we-found-250-semantic-bugs-in-popular-open-source-projects-that-linters-completely-missed-1bli</guid>
      <description>&lt;p&gt;AI coding assistants generate code that compiles clean but contains &lt;strong&gt;semantic bugs&lt;/strong&gt; — SQL injection, auth bypasses, null dereferences. Linters and type checkers miss them because the bugs are in &lt;em&gt;what the code claims to do&lt;/em&gt;, not how it's structured.&lt;/p&gt;

&lt;p&gt;I built &lt;a href="https://tryassay.ai" rel="noopener noreferrer"&gt;Assay&lt;/a&gt; to catch what static tools can't. Then I ran it on popular open-source projects.&lt;/p&gt;
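&lt;p&gt;To make "semantic bug" concrete, here is a contrived TypeScript example (not taken from any audited project). Both functions type-check and pass a linter; only one keeps the claim "this query prevents injection":&lt;/p&gt;

```typescript
// Contrived illustration, not from any audited project: both versions
// compile clean, but only one upholds "this query prevents injection".

// Claimed safe, actually injectable: user input is spliced into the SQL text.
function findUserUnsafe(email: string): string {
  return `SELECT id FROM users WHERE email = '${email}'`;
}

// Actually safe: the SQL keeps a placeholder; the driver binds the value.
function findUserSafe(email: string): { sql: string; params: string[] } {
  return { sql: "SELECT id FROM users WHERE email = ?", params: [email] };
}

const payload = "x' OR '1'='1";
console.log(findUserUnsafe(payload)); // payload becomes part of the SQL text
console.log(findUserSafe(payload));   // payload stays in params, never in the SQL
```

&lt;p&gt;A type checker sees two valid string-producing functions; only claim-level verification can tell them apart.&lt;/p&gt;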

&lt;h2&gt;
  
  
  The results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Project&lt;/th&gt;
&lt;th&gt;Stars&lt;/th&gt;
&lt;th&gt;Claims Verified&lt;/th&gt;
&lt;th&gt;Bugs&lt;/th&gt;
&lt;th&gt;Critical&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LiteLLM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;18K&lt;/td&gt;
&lt;td&gt;1,381&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;185&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;78/100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chatbot UI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;28K&lt;/td&gt;
&lt;td&gt;476&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;41&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;12&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;91/100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LobeChat&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50K&lt;/td&gt;
&lt;td&gt;205&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;14&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;87/100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open Interpreter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;55K&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;60/100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total across all scanned projects: 2,400+ claims verified, 250 bugs found.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every finding links to an interactive dashboard with file paths, line numbers, and code evidence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tryassay.ai/reports/0bccf817-1cb6-43ff-b724-866f14539073" rel="noopener noreferrer"&gt;LiteLLM report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tryassay.ai/reports/cc8c0c61-9b5a-4774-aed1-f99cc4f6991b" rel="noopener noreferrer"&gt;Chatbot UI report&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tryassay.ai/reports/915dfc1a-64ec-483d-b4b5-effb53a86553" rel="noopener noreferrer"&gt;LobeChat report&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;Assay extracts every &lt;strong&gt;testable claim&lt;/strong&gt; from a codebase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"this validates auth tokens"&lt;/li&gt;
&lt;li&gt;"this handles null input"&lt;/li&gt;
&lt;li&gt;"this query prevents injection"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then it runs an adversarial AI pass to verify each claim against the actual code. Think of it as a red team for your code, not a code review.&lt;/p&gt;
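&lt;p&gt;The shape of that adversarial pass can be sketched as follows. The &lt;code&gt;Claim&lt;/code&gt; type and prompt wording are illustrative, not Assay's actual internals:&lt;/p&gt;

```typescript
// Illustrative sketch only; the Claim shape and prompt wording are
// hypothetical, not Assay's real implementation.
interface Claim {
  id: string;
  text: string; // e.g. "this validates auth tokens"
  file: string;
  line: number;
}

// Build a prompt that asks the model to attack the claim rather than
// confirm it: find an input or call path under which the claim is false.
function adversarialPrompt(claim: Claim, source: string): string {
  return [
    `Claim ${claim.id} (${claim.file}:${claim.line}): "${claim.text}"`,
    "Do not assess whether the code looks reasonable.",
    "Instead, try to construct a concrete input or call sequence",
    "for which the claim is FALSE. Cite exact lines as evidence.",
    "",
    "Source under review:",
    source,
  ].join("\n");
}
```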

&lt;p&gt;The approach is based on a formal framework we published: &lt;a href="https://doi.org/10.5281/zenodo.18522644" rel="noopener noreferrer"&gt;DOI 10.5281/zenodo.18522644&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark results ($638 total experiment cost)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  HumanEval (164 coding tasks) — $220
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Baseline: 86.6% pass rate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assay: 100% at pass@5&lt;/strong&gt; (164/164)&lt;/li&gt;
&lt;li&gt;Self-refine: 87.2% (barely above baseline)&lt;/li&gt;
&lt;li&gt;LLM-as-judge: peaks at 99.4%, then drops to 97.2% at k=5 (more rounds of review produced worse code)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SWE-bench (300 real GitHub bugs) — $246
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Baseline: 18.3% resolved&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Assay: 30.3% resolved&lt;/strong&gt; (12 points over baseline, a ~66% relative improvement)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I learned building this
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The largest codebases have the most bugs.&lt;/strong&gt; LiteLLM (52 API routes, 1,381 claims verified) had 185 bugs. Smaller, more focused projects scored higher.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Critical bugs hide in plain sight.&lt;/strong&gt; These projects have thousands of stars, active communities, and regular releases. The bugs aren't in obscure corners — they're in core functionality.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traditional tools don't catch semantic bugs.&lt;/strong&gt; Linters check syntax. Type checkers check types. Nothing checks whether the code actually does what it claims to do. That's the gap Assay fills.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM-as-judge gets worse with more attempts.&lt;/strong&gt; At k=5, it starts approving code that actually fails tests. Verification needs to be adversarial, not just "ask the AI if it looks good."&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx tryassay assess /path/to/your/project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Free, open source. Uses the Anthropic API (~$2-3 for a small project, ~$30-50 for a large codebase). Add &lt;code&gt;--publish&lt;/code&gt; for an interactive dashboard at tryassay.ai.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/gtsbahamas/hallucination-reversing-system" rel="noopener noreferrer"&gt;gtsbahamas/hallucination-reversing-system&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://www.npmjs.com/package/tryassay" rel="noopener noreferrer"&gt;tryassay&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live dashboards:&lt;/strong&gt; &lt;a href="https://tryassay.ai" rel="noopener noreferrer"&gt;tryassay.ai&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Free offer:&lt;/strong&gt; Drop a repo link in the comments and I'll run Assay on it and share the dashboard. No charge — I want the data.&lt;/p&gt;




&lt;p&gt;Have you caught semantic bugs in AI-generated code that linters missed? What tools do you use?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
      <category>programming</category>
    </item>
    <item>
      <title>How to Use AI Hallucination to Generate Your Software Spec</title>
      <dc:creator>Ty Wells</dc:creator>
      <pubDate>Sun, 08 Feb 2026 07:04:13 +0000</pubDate>
      <link>https://dev.to/ty_wells_7d0b523d1a02a496/how-to-use-ai-hallucination-to-generate-your-software-spec-1eja</link>
      <guid>https://dev.to/ty_wells_7d0b523d1a02a496/how-to-use-ai-hallucination-to-generate-your-software-spec-1eja</guid>
      <description>&lt;h2&gt;
  
  
  What if the most hated property of AI models is actually their most useful feature for software development?
&lt;/h2&gt;

&lt;p&gt;Every AI coding tool fights hallucination. LUCID, the open-source tool covered here, exploits it. This tutorial shows you how to use deliberate AI hallucination to generate a comprehensive, testable software specification for your application -- then verify it against your actual code.&lt;/p&gt;

&lt;p&gt;By the end, you will have extracted 80-150 testable requirements spanning functionality, security, privacy, performance, and compliance from a single LLM prompt. Total cost: about $3 per iteration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 20+&lt;/li&gt;
&lt;li&gt;An Anthropic API key (set as ANTHROPIC_API_KEY)&lt;/li&gt;
&lt;li&gt;A codebase you want to specify (any language, any framework)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/gtsbahamas/hallucination-reversing-system.git
cd hallucination-reversing-system
npm install
npm run build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 1: Initialize Your Project
&lt;/h2&gt;

&lt;p&gt;Navigate to your application's root directory and initialize LUCID:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This creates a .lucid/ directory to store iterations, claims, and verification results.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 2: Describe Your App (Loosely)
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid describe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;LUCID will prompt you for a description of your application. The key here is to be deliberately vague. Do not write a detailed spec. Write what you would tell a friend at a bar:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"It's a career development platform. Users set goals, get AI coaching, manage their finances, upload documents. There's a subscription tier."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The vagueness is the point. Every gap you leave is a gap the AI will fill with its own hallucinated requirements. That is the raw material.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 3: Hallucinate
&lt;/h2&gt;

&lt;p&gt;This is where the magic happens:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid hallucinate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;LUCID prompts the LLM to write a full Terms of Service and Acceptable Use Policy for your application as if it were already live in production with paying customers. The model has no way of knowing which of those obligations your app actually meets, so it confabulates.&lt;/p&gt;

&lt;p&gt;The output is saved to .lucid/iterations/1/hallucinated-tos.md. Open it up and read it. You will find the LLM has invented:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specific features you never mentioned&lt;/li&gt;
&lt;li&gt;Data handling procedures&lt;/li&gt;
&lt;li&gt;Security measures&lt;/li&gt;
&lt;li&gt;Performance guarantees&lt;/li&gt;
&lt;li&gt;User rights and limitations&lt;/li&gt;
&lt;li&gt;Account lifecycle rules&lt;/li&gt;
&lt;li&gt;SLA commitments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All in precise, legally styled declarative language. A typical hallucination runs 400-600 lines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 4: Extract Claims
&lt;/h2&gt;

&lt;p&gt;Now parse every declarative statement into a testable requirement:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid extract
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This produces a structured JSON file at .lucid/iterations/1/claims.json. Each claim looks like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "id": "CLAIM-042",
  "section": "Data Handling",
  "category": "security",
  "severity": "critical",
  "text": "User data is encrypted at rest using AES-256",
  "testable": true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On our test run, this produced 91 claims across five categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Count&lt;/th&gt;
&lt;th&gt;Examples&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Functionality&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;Feature capabilities, user workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;Encryption, access control, auth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Privacy&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Data retention, deletion, portability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;Uptime, rate limits, backups&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Legal&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Liability, modifications, termination&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No human requirements session produces this breadth in 30 seconds.&lt;/p&gt;
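&lt;p&gt;Since claims.json is plain JSON, the category breakdown above is easy to reproduce yourself. A minimal sketch, assuming the claim schema shown earlier:&lt;/p&gt;

```typescript
// Tally extracted claims by category, assuming the claims.json
// schema shown above (only the fields used here are declared).
interface ExtractedClaim {
  id: string;
  category: string;
  severity: string;
  testable: boolean;
}

function tallyByCategory(claims: ExtractedClaim[]) {
  const counts: { [category: string]: number } = {};
  for (const c of claims) {
    counts[c.category] = (counts[c.category] ?? 0) + 1;
  }
  return counts;
}
```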




&lt;h2&gt;
  
  
  Step 5: Verify Against Your Codebase
&lt;/h2&gt;

&lt;p&gt;This is where hallucination meets reality:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid verify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;LUCID reads your codebase and checks each claim against what actually exists in your code. Each claim receives a verdict:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PASS -- Code fully implements the claim&lt;/li&gt;
&lt;li&gt;PARTIAL -- Code partially implements it&lt;/li&gt;
&lt;li&gt;FAIL -- Code does not implement or contradicts it&lt;/li&gt;
&lt;li&gt;N/A -- Cannot be verified from code alone&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The output goes to .lucid/iterations/1/verification-results.json.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6: Generate Your Gap Report
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid report
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This generates a human-readable gap analysis. The compliance score formula is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
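&lt;p&gt;As code, the formula is a one-liner; the verdict counts below are made up for illustration:&lt;/p&gt;

```typescript
// The compliance score formula above: (PASS + 0.5*PARTIAL) / (Total - N/A) * 100.
// The counts here are invented for illustration.
interface VerdictCounts {
  pass: number;
  partial: number;
  fail: number;
  na: number;
}

function complianceScore(v: VerdictCounts): number {
  const total = v.pass + v.partial + v.fail + v.na;
  const verifiable = total - v.na;
  if (verifiable === 0) return 0; // guard; the CLI's behavior here is unspecified
  return (100 * (v.pass + 0.5 * v.partial)) / verifiable;
}

// e.g. 6 PASS, 2 PARTIAL, 2 FAIL, 2 N/A: (6 + 1) / 10 * 100 = 70
console.log(complianceScore({ pass: 6, partial: 2, fail: 2, na: 2 })); // 70
```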

&lt;p&gt;Our first verifiable iteration scored 57.3%. The report shows exactly which claims failed and why -- your development backlog writes itself.&lt;/p&gt;

&lt;p&gt;Example report output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LUCID Gap Report - Iteration 3
===============================
Compliance Score: 57.3%

PASS:    38 claims (44.7%)
PARTIAL: 15 claims (17.6%)
FAIL:    32 claims (37.6%)
N/A:      6 claims

TOP FAILURES (Critical):
- CLAIM-012: Rate limiting not enforced server-side
- CLAIM-027: No malware scanning for file uploads
- CLAIM-041: Account lockout parameters don't match spec
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Step 7: Fix, Then Remediate
&lt;/h2&gt;

&lt;p&gt;After addressing gaps in your code, generate specific fix tasks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid remediate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This converts FAIL and PARTIAL verdicts into actionable remediation tasks, sorted by severity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "id": "REM-001",
  "claimId": "CLAIM-012",
  "title": "Add rate limiting middleware",
  "action": "add",
  "targetFiles": ["src/middleware/rate-limit.ts"],
  "estimatedEffort": "medium",
  "codeGuidance": "Implement express-rate-limit with..."
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
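&lt;p&gt;The severity ordering can be sketched as a join back to the claims. The task and claim shapes follow the JSON above; the rank values are assumed, not LUCID's actual ones:&lt;/p&gt;

```typescript
// Order remediation tasks by the severity of the claim they fix.
// Shapes follow the JSON above; the severity ranking is an assumption.
interface Task { id: string; claimId: string; title: string; }
interface ClaimInfo { id: string; severity: string; }

const rank: { [severity: string]: number } = { critical: 0, high: 1, medium: 2, low: 3 };

function bySeverity(tasks: Task[], claims: ClaimInfo[]): Task[] {
  const sev: { [claimId: string]: number } = {};
  for (const c of claims) sev[c.id] = rank[c.severity] ?? 9; // unknown severities sort last
  return [...tasks].sort((a, b) => (sev[a.claimId] ?? 9) - (sev[b.claimId] ?? 9));
}
```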




&lt;h2&gt;
  
  
  Step 8: Regenerate and Loop
&lt;/h2&gt;

&lt;p&gt;After implementing fixes, feed the updated reality back to the model:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lucid regenerate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This generates a new ToS that incorporates what now exists, while hallucinating new capabilities built on the verified foundation. Extract, verify, report again. Each iteration, the score climbs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Iteration&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;57.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;69.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;83.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;90.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The loop converges because each regeneration is grounded in more reality. New hallucinations become more contextually appropriate. The gap shrinks.&lt;/p&gt;
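&lt;p&gt;The loop itself is a simple driver. In this sketch, &lt;code&gt;runIteration&lt;/code&gt; is a hypothetical stand-in for one regenerate/extract/verify/report cycle, stubbed here with the scores from the table:&lt;/p&gt;

```typescript
// Driver sketch for the regenerate loop. runIteration stands in for one
// "lucid regenerate; lucid extract; lucid verify; lucid report" cycle and
// returns that iteration's compliance score.
function loopUntil(target: number, maxIters: number, runIteration: () => number): number {
  let score = 0;
  for (let i = 0; maxIters > i; i++) {
    score = runIteration();     // one full hallucinate-verify cycle
    if (score >= target) break; // converged: stop iterating
  }
  return score;
}

// Replaying the scores from the table above, with a 90% target:
const replay = [57.3, 69.8, 83.2, 90.8];
let k = 0;
console.log(loopUntil(90, 10, () => replay[k++])); // 90.8
```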




&lt;h2&gt;
  
  
  When to Stop
&lt;/h2&gt;

&lt;p&gt;Stop when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All critical claims are verified&lt;/li&gt;
&lt;li&gt;Remaining gaps are intentionally deferred&lt;/li&gt;
&lt;li&gt;New hallucinations offer diminishing returns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On our test run, we stopped at 90.8% after 6 iterations. The 5 remaining failures were genuine missing functionality (rate limiting, malware scanning, data retention logic). The hallucinated ToS correctly identified them as requirements a production app should have.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cost
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Approximate Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hallucinate&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extract&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verify&lt;/td&gt;
&lt;td&gt;$1.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Remediate&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Regenerate&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per iteration&lt;/td&gt;
&lt;td&gt;~$2.90&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Six iterations cost about $17 total. For a verified specification with 91 claims, a gap report, and a prioritized remediation plan, that is the cheapest spec you will ever produce.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;The theoretical basis is not hand-waving. Transformer self-attention is mathematically equivalent to the update rule of modern Hopfield networks (Ramsauer et al., 2020) -- the same associative pattern completion long used to model hippocampal memory retrieval. When the LLM hallucinates, it is performing pattern completion from partial cues against its training data. The output includes both accurate completions (real patterns) and confabulated completions (plausible extensions).&lt;/p&gt;
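&lt;p&gt;That pattern-completion behavior can be shown with a toy modern-Hopfield retrieval in a few lines; the patterns, cue, and inverse temperature here are illustrative, not taken from the paper:&lt;/p&gt;

```typescript
// Toy modern-Hopfield retrieval (softmax pattern completion), in the
// spirit of Ramsauer et al.; the numbers and dimensions are illustrative.
const stored = [
  [1, 1, -1, -1],   // pattern A
  [-1, -1, 1, 1],   // pattern B
];

function dot(a: number[], b: number[]): number {
  return a.reduce((s, x, i) => s + x * b[i], 0);
}

// Complete a partial cue toward the closest stored pattern.
function complete(cue: number[], beta: number): number[] {
  const scores = stored.map((p) => beta * dot(p, cue));
  const m = Math.max(...scores);
  const w = scores.map((s) => Math.exp(s - m));     // stable softmax weights
  const z = w.reduce((s, x) => s + x, 0);
  // Output is the attention-weighted sum of stored patterns.
  return cue.map((_, j) => stored.reduce((s, p, i) => s + (w[i] / z) * p[j], 0));
}

// A cue matching only half of pattern A still retrieves A.
console.log(complete([1, 0, 0, -1], 2).map((x) => Math.round(x))); // rounds back to pattern A
```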

&lt;p&gt;The Terms of Service format forces precision because legal language cannot be vague. And external verification (against the codebase, not the model's own assessment) provides the reality-checking that LLMs have been shown unable to perform reliably on themselves (Huang et al., ICLR 2024).&lt;/p&gt;

&lt;p&gt;The closest precedent: protein hallucination from the Baker Lab, where neural network "dreams" served as blueprints for novel proteins. That line of work earned David Baker a share of the 2024 Nobel Prize in Chemistry.&lt;/p&gt;




&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/gtsbahamas/lucid.git
cd lucid
npm install &amp;amp;&amp;amp; npm run build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Full paper with neuroscience grounding: &lt;a href="https://github.com/gtsbahamas/lucid/blob/main/docs/paper.md" rel="noopener noreferrer"&gt;https://github.com/gtsbahamas/lucid/blob/main/docs/paper.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Questions, issues, and contributions welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>softwareengineering</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
