Ty Wells
How to Use AI Hallucination to Generate Your Software Spec

What if the most hated property of AI models is actually their most useful feature for software development?

Every AI coding tool fights hallucination. LUCID exploits it. This tutorial shows you how to use deliberate AI hallucination to generate a comprehensive, testable software specification for your application -- then verify it against your actual code.

By the end, you will have extracted 80-150 testable requirements spanning functionality, security, privacy, performance, and compliance from a single LLM prompt. Total cost: about $3 per iteration.


Prerequisites

  • Node.js 20+
  • An Anthropic API key (set as ANTHROPIC_API_KEY)
  • A codebase you want to specify (any language, any framework)

Installation

git clone https://github.com/gtsbahamas/hallucination-reversing-system.git
cd hallucination-reversing-system
npm install
npm run build

Step 1: Initialize Your Project

Navigate to your application's root directory and initialize LUCID:

lucid init

This creates a .lucid/ directory to store iterations, claims, and verification results.
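Based on the file paths used later in this tutorial, the layout looks roughly like this (illustrative; exact contents may vary by version):

```
.lucid/
└── iterations/
    └── 1/
        ├── hallucinated-tos.md        # Step 3 output
        ├── claims.json                # Step 4 output
        └── verification-results.json  # Step 5 output
```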


Step 2: Describe Your App (Loosely)

lucid describe

LUCID will prompt you for a description of your application. The key here is to be deliberately vague. Do not write a detailed spec. Write what you would tell a friend at a bar:

"It's a career development platform. Users set goals, get AI coaching, manage their finances, upload documents. There's a subscription tier."

The vagueness is the point. Every gap you leave is a gap the AI will fill with its own hallucinated requirements. That is the raw material.


Step 3: Hallucinate

This is where the magic happens:

lucid hallucinate

LUCID prompts the LLM to write a full Terms of Service and Acceptable Use Policy for your application as if it were already live in production with paying customers. The model has no way of knowing how far your actual app diverges from that description. It confabulates.
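LUCID's actual prompt isn't reproduced here, but conceptually it looks something like this sketch (illustrative wording, not the tool's real template):

```
You are legal counsel for the application described below, which is
live in production with paying customers.

<your vague description from Step 2>

Write a complete Terms of Service and Acceptable Use Policy covering
features, data handling, security measures, availability, and account
lifecycle. Be specific and declarative; do not hedge.
```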

The output is saved to .lucid/iterations/1/hallucinated-tos.md. Open it up and read it. You will find the LLM has invented:

  • Specific features you never mentioned
  • Data handling procedures
  • Security measures
  • Performance guarantees
  • User rights and limitations
  • Account lifecycle rules
  • SLA commitments

All in precise, legally styled declarative language. A typical hallucination runs 400-600 lines.


Step 4: Extract Claims

Now parse every declarative statement into a testable requirement:

lucid extract

This produces a structured JSON file at .lucid/iterations/1/claims.json. Each claim looks like:

{
  "id": "CLAIM-042",
  "section": "Data Handling",
  "category": "security",
  "severity": "critical",
  "text": "User data is encrypted at rest using AES-256",
  "testable": true
}
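The schema above maps naturally onto a type you can use when post-processing claims.json yourself. A minimal sketch (field names follow the example record; severity levels other than "critical" are assumed):

```typescript
// Shape of one entry in claims.json, mirroring the example above.
// Severity levels other than "critical" are assumed here.
interface Claim {
  id: string;
  section: string;
  category: string;
  severity: "critical" | "high" | "medium" | "low";
  text: string;
  testable: boolean;
}

// Triage helper: surface the claims worth checking first --
// testable claims of critical severity.
function criticalClaims(claims: Claim[]): Claim[] {
  return claims.filter((c) => c.testable && c.severity === "critical");
}

const sample: Claim[] = [
  { id: "CLAIM-042", section: "Data Handling", category: "security",
    severity: "critical",
    text: "User data is encrypted at rest using AES-256", testable: true },
  { id: "CLAIM-043", section: "Legal", category: "legal",
    severity: "low",
    text: "Terms may be modified with notice", testable: false },
];

console.log(criticalClaims(sample).map((c) => c.id)); // → ["CLAIM-042"]
```

In practice you would `JSON.parse` the contents of `.lucid/iterations/1/claims.json` into `Claim[]` instead of the inline sample.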

On our test run, this produced 91 claims across five categories:

Category        Count   Examples
Functionality   34      Feature capabilities, user workflows
Security        18      Encryption, access control, auth
Data Privacy    15      Data retention, deletion, portability
Operational     14      Uptime, rate limits, backups
Legal           10      Liability, modifications, termination

No human requirements session produces this breadth in 30 seconds.


Step 5: Verify Against Your Codebase

This is where hallucination meets reality:

lucid verify

LUCID reads your codebase and checks each claim against what actually exists in your code. Each claim receives a verdict:

  • PASS -- Code fully implements the claim
  • PARTIAL -- Code partially implements it
  • FAIL -- Code does not implement or contradicts it
  • N/A -- Cannot be verified from code alone

The output goes to .lucid/iterations/1/verification-results.json.


Step 6: Generate Your Gap Report

lucid report

This generates a human-readable gap analysis. The compliance score formula is:

Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100
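The formula translates directly into code. A sketch that tallies verdicts from a list (the verdict strings follow Step 5; how LUCID stores them in verification-results.json is assumed, not documented here):

```typescript
// Verdicts as defined in Step 5.
type Verdict = "PASS" | "PARTIAL" | "FAIL" | "N/A";

// Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100
function complianceScore(verdicts: Verdict[]): number {
  const count = (v: Verdict) => verdicts.filter((x) => x === v).length;
  const verifiable = verdicts.length - count("N/A");
  if (verifiable === 0) return 0; // nothing checkable yet
  return ((count("PASS") + 0.5 * count("PARTIAL")) / verifiable) * 100;
}

// 3 PASS, 1 PARTIAL, 1 FAIL, 1 N/A → (3 + 0.5) / 5 * 100 = 70
console.log(complianceScore(["PASS", "PASS", "PASS", "PARTIAL", "FAIL", "N/A"])); // → 70
```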

Our first verifiable iteration scored 57.3%. The report shows exactly which claims failed and why -- your development backlog writes itself.

Example report output:

LUCID Gap Report - Iteration 3
===============================
Compliance Score: 57.3%

PASS:    38 claims (44.7%)
PARTIAL: 15 claims (17.6%)
FAIL:    32 claims (37.6%)
N/A:      6 claims

TOP FAILURES (Critical):
- CLAIM-012: Rate limiting not enforced server-side
- CLAIM-027: No malware scanning for file uploads
- CLAIM-041: Account lockout parameters don't match spec

Step 7: Fix, Then Remediate

After addressing gaps in your code, generate specific fix tasks:

lucid remediate

This converts FAIL and PARTIAL verdicts into actionable remediation tasks, sorted by severity:

{
  "id": "REM-001",
  "claimId": "CLAIM-012",
  "title": "Add rate limiting middleware",
  "action": "add",
  "targetFiles": ["src/middleware/rate-limit.ts"],
  "estimatedEffort": "medium",
  "codeGuidance": "Implement express-rate-limit with..."
}
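Severity lives on the claim, while each remediation task points back to its claim via `claimId`, so ordering tasks means joining the two. A sketch of that ordering (field names follow the JSON examples; severity levels other than "critical" are assumed):

```typescript
// Lower rank = fix sooner. Levels other than "critical" are assumed.
const rank: Record<string, number> = { critical: 0, high: 1, medium: 2, low: 3 };

// Subset of the remediation-task fields shown above.
interface Task {
  id: string;
  claimId: string;
  title: string;
  estimatedEffort: string;
}

// Order tasks by the severity of the claim each one fixes,
// joining on claimId against the claims extracted in Step 4.
function bySeverity(tasks: Task[], severityOf: Map<string, string>): Task[] {
  return [...tasks].sort(
    (a, b) =>
      (rank[severityOf.get(a.claimId) ?? "low"] ?? 3) -
      (rank[severityOf.get(b.claimId) ?? "low"] ?? 3),
  );
}

const severities = new Map([
  ["CLAIM-012", "critical"],
  ["CLAIM-041", "medium"],
]);
const ordered = bySeverity(
  [
    { id: "REM-002", claimId: "CLAIM-041", title: "Fix lockout params", estimatedEffort: "low" },
    { id: "REM-001", claimId: "CLAIM-012", title: "Add rate limiting middleware", estimatedEffort: "medium" },
  ],
  severities,
);
console.log(ordered.map((t) => t.id)); // → ["REM-001", "REM-002"]
```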

Step 8: Regenerate and Loop

After implementing fixes, feed the updated reality back to the model:

lucid regenerate

This generates a new ToS that incorporates what now exists, while hallucinating new capabilities built on the verified foundation. Extract, verify, report again. Each iteration, the score climbs:

Iteration   Score
3           57.3%
4           69.8%
5           83.2%
6           90.8%

The loop converges because each regeneration is grounded in more reality. New hallucinations become more contextually appropriate. The gap shrinks.


When to Stop

Stop when:

  • All critical claims are verified
  • Remaining gaps are intentionally deferred
  • New hallucinations offer diminishing returns

On our test run, we stopped at 90.8% after 6 iterations. The 5 remaining failures were genuine missing functionality (rate limiting, malware scanning, data retention logic). The hallucinated ToS correctly identified them as requirements a production app should have.


The Cost

Phase           Approximate Cost
Hallucinate     $0.15
Extract         $0.25
Verify          $1.50
Remediate       $0.60
Regenerate      $0.40
Per iteration   ~$2.90

Six iterations cost about $17 total. For a verified specification with 91 claims, a gap report, and a prioritized remediation plan, that is the cheapest spec you will ever produce.


Why This Works

The theoretical basis is not hand-waving. Transformer self-attention is mathematically equivalent to Hopfield network pattern completion -- the same computation the hippocampus uses for memory retrieval (Ramsauer et al., 2020). When the LLM hallucinates, it is performing pattern completion from partial cues against its training data. The output includes both accurate completions (real patterns) and confabulated completions (plausible extensions).
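The correspondence Ramsauer et al. establish can be stated compactly. The update rule of a modern Hopfield network, retrieving a stored pattern from a partial cue, has the same form as transformer attention:

```latex
% Modern Hopfield update: stored patterns X, query (partial cue) \xi
\xi^{\text{new}} = X \,\mathrm{softmax}\!\left(\beta\, X^{\top} \xi\right)

% Transformer attention, with \beta playing the role of 1/\sqrt{d_k}
% and the stored patterns serving as both keys and values
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\tfrac{Q K^{\top}}{\sqrt{d_k}}\right) V
```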

The Terms of Service format forces precision because legal language cannot be vague. And external verification (against the codebase, not the model's own assessment) provides the reality-checking that LLMs cannot reliably perform on themselves (Huang et al., ICLR 2024).

The closest precedent: protein hallucination from the Baker Lab, where neural network "dreams" served as blueprints for novel proteins. That line of work earned David Baker a share of the 2024 Nobel Prize in Chemistry.


Get Started

git clone https://github.com/gtsbahamas/lucid.git
cd lucid
npm install && npm run build

Full paper with neuroscience grounding: https://github.com/gtsbahamas/lucid/blob/main/docs/paper.md

Questions, issues, and contributions welcome.
