Ty Wells
How to Use AI Hallucination to Generate Your Software Spec

What if the most hated property of AI models is actually their most useful feature for software development?

Every AI coding tool fights hallucination. LUCID exploits it. This tutorial shows you how to use deliberate AI hallucination to generate a comprehensive, testable software specification for your application -- then verify it against your actual code.

By the end, you will have extracted 80-150 testable requirements spanning functionality, security, privacy, performance, and compliance from a single LLM prompt. Total cost: about $3 per iteration.


Prerequisites

  • Node.js 20+
  • An Anthropic API key (set as ANTHROPIC_API_KEY)
  • A codebase you want to specify (any language, any framework)

Installation

git clone https://github.com/gtsbahamas/hallucination-reversing-system.git
cd hallucination-reversing-system
npm install
npm run build

Step 1: Initialize Your Project

Navigate to your application's root directory and initialize LUCID:

lucid init

This creates a .lucid/ directory to store iterations, claims, and verification results.
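Based on the file paths used later in this tutorial, the layout looks roughly like this (illustrative; exact contents may vary by version):

```
.lucid/
└── iterations/
    └── 1/
        ├── hallucinated-tos.md        # Step 3 output
        ├── claims.json                # Step 4 output
        └── verification-results.json  # Step 5 output
```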


Step 2: Describe Your App (Loosely)

lucid describe

LUCID will prompt you for a description of your application. The key here is to be deliberately vague. Do not write a detailed spec. Write what you would tell a friend at a bar:

"It's a career development platform. Users set goals, get AI coaching, manage their finances, upload documents. There's a subscription tier."

The vagueness is the point. Every gap you leave is a gap the AI will fill with its own hallucinated requirements. That is the raw material.


Step 3: Hallucinate

This is where the magic happens:

lucid hallucinate

LUCID prompts the LLM to write a full Terms of Service and Acceptable Use Policy for your application as if it were already live in production with paying customers. The model has no way of knowing how far your actual app diverges from that description. It confabulates.
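LUCID's actual prompt isn't reproduced here, but conceptually it looks something like this sketch (illustrative wording, not the tool's real template):

```
You are legal counsel for the application described below, which is
live in production with paying customers.

<your vague description from Step 2>

Write a complete Terms of Service and Acceptable Use Policy covering
features, data handling, security measures, availability, and account
lifecycle. Be specific and declarative; do not hedge.
```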

The output is saved to .lucid/iterations/1/hallucinated-tos.md. Open it up and read it. You will find the LLM has invented:

  • Specific features you never mentioned
  • Data handling procedures
  • Security measures
  • Performance guarantees
  • User rights and limitations
  • Account lifecycle rules
  • SLA commitments

All in precise, legally styled declarative language. A typical hallucination runs 400-600 lines.


Step 4: Extract Claims

Now parse every declarative statement into a testable requirement:

lucid extract

This produces a structured JSON file at .lucid/iterations/1/claims.json. Each claim looks like:

{
  "id": "CLAIM-042",
  "section": "Data Handling",
  "category": "security",
  "severity": "critical",
  "text": "User data is encrypted at rest using AES-256",
  "testable": true
}
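The schema above maps naturally onto a type you can use when post-processing claims.json yourself. A minimal sketch (field names follow the example record; severity levels other than "critical" are assumed):

```typescript
// Shape of one entry in claims.json, mirroring the example above.
// Severity levels other than "critical" are assumed here.
interface Claim {
  id: string;
  section: string;
  category: string;
  severity: "critical" | "high" | "medium" | "low";
  text: string;
  testable: boolean;
}

// Triage helper: surface the claims worth checking first --
// testable claims of critical severity.
function criticalClaims(claims: Claim[]): Claim[] {
  return claims.filter((c) => c.testable && c.severity === "critical");
}

const sample: Claim[] = [
  { id: "CLAIM-042", section: "Data Handling", category: "security",
    severity: "critical",
    text: "User data is encrypted at rest using AES-256", testable: true },
  { id: "CLAIM-043", section: "Legal", category: "legal",
    severity: "low",
    text: "Terms may be modified with notice", testable: false },
];

console.log(criticalClaims(sample).map((c) => c.id)); // → ["CLAIM-042"]
```

In practice you would `JSON.parse` the contents of `.lucid/iterations/1/claims.json` into `Claim[]` instead of the inline sample.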

On our test run, this produced 91 claims across five categories:

Category        Count   Examples
Functionality   34      Feature capabilities, user workflows
Security        18      Encryption, access control, auth
Data Privacy    15      Data retention, deletion, portability
Operational     14      Uptime, rate limits, backups
Legal           10      Liability, modifications, termination

No human requirements session produces this breadth in 30 seconds.


Step 5: Verify Against Your Codebase

This is where hallucination meets reality:

lucid verify

LUCID reads your codebase and checks each claim against what actually exists in your code. Each claim receives a verdict:

  • PASS -- Code fully implements the claim
  • PARTIAL -- Code partially implements it
  • FAIL -- Code does not implement or contradicts it
  • N/A -- Cannot be verified from code alone

The output goes to .lucid/iterations/1/verification-results.json.


Step 6: Generate Your Gap Report

lucid report

This generates a human-readable gap analysis. The compliance score formula is:

Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100
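The formula translates directly into code. A sketch that tallies verdicts from a list (the verdict strings follow Step 5; how LUCID stores them in verification-results.json is assumed, not documented here):

```typescript
// Verdicts as defined in Step 5.
type Verdict = "PASS" | "PARTIAL" | "FAIL" | "N/A";

// Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100
function complianceScore(verdicts: Verdict[]): number {
  const count = (v: Verdict) => verdicts.filter((x) => x === v).length;
  const verifiable = verdicts.length - count("N/A");
  if (verifiable === 0) return 0; // nothing checkable yet
  return ((count("PASS") + 0.5 * count("PARTIAL")) / verifiable) * 100;
}

// 3 PASS, 1 PARTIAL, 1 FAIL, 1 N/A → (3 + 0.5) / 5 * 100 = 70
console.log(complianceScore(["PASS", "PASS", "PASS", "PARTIAL", "FAIL", "N/A"])); // → 70
```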

Our first verifiable iteration scored 57.3%. The report shows exactly which claims failed and why -- your development backlog writes itself.

Example report output:

LUCID Gap Report - Iteration 3
===============================
Compliance Score: 57.3%

PASS:    38 claims (44.7%)
PARTIAL: 15 claims (17.6%)
FAIL:    32 claims (37.6%)
N/A:      6 claims

TOP FAILURES (Critical):
- CLAIM-012: Rate limiting not enforced server-side
- CLAIM-027: No malware scanning for file uploads
- CLAIM-041: Account lockout parameters don't match spec

Step 7: Fix, Then Remediate

After addressing gaps in your code, generate specific fix tasks:

lucid remediate

This converts FAIL and PARTIAL verdicts into actionable remediation tasks, sorted by severity:

{
  "id": "REM-001",
  "claimId": "CLAIM-012",
  "title": "Add rate limiting middleware",
  "action": "add",
  "targetFiles": ["src/middleware/rate-limit.ts"],
  "estimatedEffort": "medium",
  "codeGuidance": "Implement express-rate-limit with..."
}
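Severity lives on the claim, while each remediation task points back to its claim via `claimId`, so ordering tasks means joining the two. A sketch of that ordering (field names follow the JSON examples; severity levels other than "critical" are assumed):

```typescript
// Lower rank = fix sooner. Levels other than "critical" are assumed.
const rank: Record<string, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Subset of the remediation-task fields shown above.
interface Task {
  id: string;
  claimId: string;
  title: string;
  estimatedEffort: string;
}

// Order tasks by the severity of the claim each one fixes,
// joining on claimId against the claims extracted in Step 4.
function bySeverity(tasks: Task[], severityOf: Map<string, string>): Task[] {
  return [...tasks].sort(
    (a, b) =>
      (rank[severityOf.get(a.claimId) ?? "low"] ?? 3) -
      (rank[severityOf.get(b.claimId) ?? "low"] ?? 3),
  );
}

const severities = new Map([
  ["CLAIM-012", "critical"],
  ["CLAIM-041", "medium"],
]);
const ordered = bySeverity(
  [
    { id: "REM-002", claimId: "CLAIM-041", title: "Fix lockout params", estimatedEffort: "low" },
    { id: "REM-001", claimId: "CLAIM-012", title: "Add rate limiting middleware", estimatedEffort: "medium" },
  ],
  severities,
);
console.log(ordered.map((t) => t.id)); // → ["REM-001", "REM-002"]
```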

Step 8: Regenerate and Loop

After implementing fixes, feed the updated reality back to the model:

lucid regenerate

This generates a new ToS that incorporates what now exists, while hallucinating new capabilities built on the verified foundation. Extract, verify, report again. Each iteration, the score climbs:

Iteration   Score
3           57.3%
4           69.8%
5           83.2%
6           90.8%

The loop converges because each regeneration is grounded in more reality. New hallucinations become more contextually appropriate. The gap shrinks.


When to Stop

Stop when:

  • All critical claims are verified
  • Remaining gaps are intentionally deferred
  • New hallucinations offer diminishing returns

On our test run, we stopped at 90.8% after 6 iterations. The 5 remaining failures were genuine missing functionality (rate limiting, malware scanning, data retention logic). The hallucinated ToS correctly identified them as requirements a production app should have.


The Cost

Phase           Approximate Cost
Hallucinate     $0.15
Extract         $0.25
Verify          $1.50
Remediate       $0.60
Regenerate      $0.40
Per iteration   ~$2.90

Six iterations cost about $17 total. For a verified specification with 91 claims, a gap report, and a prioritized remediation plan, that is the cheapest spec you will ever produce.


Why This Works

The theoretical basis is not hand-waving. Transformer self-attention is mathematically equivalent to Hopfield network pattern completion -- the same computation the hippocampus uses for memory retrieval (Ramsauer et al., 2020). When the LLM hallucinates, it is performing pattern completion from partial cues against its training data. The output includes both accurate completions (real patterns) and confabulated completions (plausible extensions).
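The correspondence Ramsauer et al. establish can be stated compactly. The update rule of a modern Hopfield network, retrieving a stored pattern from a partial cue, has the same form as transformer attention:

```latex
% Modern Hopfield update: stored patterns X, query (partial cue) \xi
\xi^{\text{new}} = X \,\mathrm{softmax}\!\left(\beta\, X^{\top} \xi\right)

% Transformer attention, with \beta playing the role of 1/\sqrt{d_k}
% and the stored patterns serving as both keys and values
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\tfrac{Q K^{\top}}{\sqrt{d_k}}\right) V
```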

The Terms of Service format forces precision because legal language cannot be vague. And external verification (against the codebase, not the model's own assessment) provides the reality-checking that LLMs cannot reliably perform on themselves (Huang et al., ICLR 2024).

The closest precedent: protein hallucination from the Baker Lab, where neural network "dreams" served as blueprints for novel proteins. That line of work earned David Baker a share of the 2024 Nobel Prize in Chemistry.


Get Started

git clone https://github.com/gtsbahamas/lucid.git
cd lucid
npm install && npm run build

Full paper with neuroscience grounding: https://github.com/gtsbahamas/lucid/blob/main/docs/paper.md

Questions, issues, and contributions welcome.
