What if the most hated property of AI models is actually their most useful feature for software development?
Every AI coding tool fights hallucination. LUCID exploits it. This tutorial shows you how to use deliberate AI hallucination to generate a comprehensive, testable software specification for your application -- then verify it against your actual code.
By the end, you will have extracted 80-150 testable requirements spanning functionality, security, privacy, performance, and compliance from a single LLM prompt. Total cost: about $3 per iteration.
Prerequisites
- Node.js 20+
- An Anthropic API key (set as ANTHROPIC_API_KEY)
- A codebase you want to specify (any language, any framework)
Installation
git clone https://github.com/gtsbahamas/hallucination-reversing-system.git
cd hallucination-reversing-system
npm install
npm run build
Step 1: Initialize Your Project
Navigate to your application's root directory and initialize LUCID:
lucid init
This creates a .lucid/ directory to store iterations, claims, and verification results.
Step 2: Describe Your App (Loosely)
lucid describe
LUCID will prompt you for a description of your application. The key here is to be deliberately vague. Do not write a detailed spec. Write what you would tell a friend at a bar:
"It's a career development platform. Users set goals, get AI coaching, manage their finances, upload documents. There's a subscription tier."
The vagueness is the point. Every gap you leave is a gap the AI will fill with its own hallucinated requirements. That is the raw material.
Step 3: Hallucinate
This is where the magic happens:
lucid hallucinate
LUCID prompts the LLM to write a full Terms of Service and Acceptable Use Policy for your application as if it were already live in production with paying customers. The model has no way to know which features actually exist. It confabulates.
The output is saved to .lucid/iterations/1/hallucinated-tos.md. Open it up and read it. You will find the LLM has invented:
- Specific features you never mentioned
- Data handling procedures
- Security measures
- Performance guarantees
- User rights and limitations
- Account lifecycle rules
- SLA commitments
All in precise, legally-styled declarative language. A typical hallucination runs 400-600 lines.
Step 4: Extract Claims
Now parse every declarative statement into a testable requirement:
lucid extract
This produces a structured JSON file at .lucid/iterations/1/claims.json. Each claim looks like:
{
  "id": "CLAIM-042",
  "section": "Data Handling",
  "category": "security",
  "severity": "critical",
  "text": "User data is encrypted at rest using AES-256",
  "testable": true
}
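LUCID's internal types aren't published, but the claim objects are easy to model in your own tooling. A minimal TypeScript sketch -- field names mirror the example above, and `highPriorityClaims` is a hypothetical helper, not part of LUCID:

```typescript
// Illustrative model of one entry in claims.json. Field names mirror
// the example above; this is not LUCID's actual source.
interface Claim {
  id: string;
  section: string;
  category: "functionality" | "security" | "privacy" | "operational" | "legal";
  severity: "low" | "medium" | "high" | "critical";
  text: string;
  testable: boolean;
}

// Hypothetical helper: keep only claims worth verifying first --
// testable, and critical or high severity.
function highPriorityClaims(claims: Claim[]): Claim[] {
  return claims.filter(
    (c) => c.testable && (c.severity === "critical" || c.severity === "high")
  );
}
```

Filtering like this lets you verify the critical claims first when a full verification run is too expensive.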
On our test run, this produced 91 claims across five categories:
| Category | Count | Examples |
|---|---|---|
| Functionality | 34 | Feature capabilities, user workflows |
| Security | 18 | Encryption, access control, auth |
| Data Privacy | 15 | Data retention, deletion, portability |
| Operational | 14 | Uptime, rate limits, backups |
| Legal | 10 | Liability, modifications, termination |
No human requirements session produces this breadth in 30 seconds.
Step 5: Verify Against Your Codebase
This is where hallucination meets reality:
lucid verify
LUCID reads your codebase and checks each claim against what actually exists in your code. Each claim receives a verdict:
- PASS -- Code fully implements the claim
- PARTIAL -- Code partially implements it
- FAIL -- Code does not implement or contradicts it
- N/A -- Cannot be verified from code alone
The output goes to .lucid/iterations/1/verification-results.json.
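The results file can be post-processed with a few lines of TypeScript. A sketch, assuming each entry carries a `claimId` and a `verdict` -- the exact schema isn't documented, so treat the field names as assumptions:

```typescript
// The four verdicts LUCID assigns to each claim.
type Verdict = "PASS" | "PARTIAL" | "FAIL" | "N/A";

// Assumed shape of one entry in verification-results.json.
interface VerificationResult {
  claimId: string;
  verdict: Verdict;
  evidence?: string;
}

// Count how many claims received each verdict.
function tally(results: VerificationResult[]): Record<Verdict, number> {
  const counts: Record<Verdict, number> = { PASS: 0, PARTIAL: 0, FAIL: 0, "N/A": 0 };
  for (const r of results) counts[r.verdict] += 1;
  return counts;
}
```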
Step 6: Generate Your Gap Report
lucid report
This generates a human-readable gap analysis. The compliance score formula is:
Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100
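The formula is easy to reimplement if you want scores inside your own tooling or CI. A small TypeScript version -- the function name and input shape are my own, not LUCID's:

```typescript
interface VerdictCounts {
  pass: number;
  partial: number;
  fail: number;
  na: number; // N/A claims are excluded from the denominator
}

// Score = (PASS + 0.5 * PARTIAL) / (Total - N/A) * 100,
// rounded to one decimal place.
function complianceScore(v: VerdictCounts): number {
  const verifiable = v.pass + v.partial + v.fail; // Total - N/A
  if (verifiable === 0) return 0;
  return Math.round(((v.pass + 0.5 * v.partial) / verifiable) * 1000) / 10;
}
```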
Our first verifiable iteration scored 53.5%. The report shows exactly which claims failed and why -- your development backlog writes itself.
Example report output:
LUCID Gap Report - Iteration 3
===============================
Compliance Score: 53.5%
PASS: 38 claims (44.7%)
PARTIAL: 15 claims (17.6%)
FAIL: 32 claims (37.6%)
N/A: 6 claims
TOP FAILURES (Critical):
- CLAIM-012: Rate limiting not enforced server-side
- CLAIM-027: No malware scanning for file uploads
- CLAIM-041: Account lockout parameters don't match spec
Step 7: Generate Remediation Tasks
Turn the gap report into specific fix tasks:
lucid remediate
This converts FAIL and PARTIAL verdicts into actionable remediation tasks, sorted by severity:
{
  "id": "REM-001",
  "claimId": "CLAIM-012",
  "title": "Add rate limiting middleware",
  "action": "add",
  "targetFiles": ["src/middleware/rate-limit.ts"],
  "estimatedEffort": "medium",
  "codeGuidance": "Implement express-rate-limit with..."
}
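Sorting by severity is straightforward once the tasks are in memory. A sketch, assuming each task also carries the originating claim's `severity` (LUCID's actual schema may attach it differently):

```typescript
// Assumed remediation task shape; `severity` is carried over from the
// originating claim for illustration.
interface RemediationTask {
  id: string;
  claimId: string;
  title: string;
  severity: "critical" | "high" | "medium" | "low";
}

// Lower rank sorts first.
const severityRank = { critical: 0, high: 1, medium: 2, low: 3 } as const;

// Return a new array ordered from most to least severe.
function bySeverity(tasks: RemediationTask[]): RemediationTask[] {
  return [...tasks].sort((a, b) => severityRank[a.severity] - severityRank[b.severity]);
}
```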
Step 8: Regenerate and Loop
After implementing fixes, feed the updated reality back to the model:
lucid regenerate
This generates a new ToS that incorporates what now exists, while hallucinating new capabilities built on the verified foundation. Extract, verify, report again. Each iteration, the score climbs:
| Iteration | Score |
|---|---|
| 3 | 53.5% |
| 4 | 69.8% |
| 5 | 83.2% |
| 6 | 90.8% |
The loop converges because each regeneration is grounded in more reality. New hallucinations become more contextually appropriate. The gap shrinks.
When to Stop
Stop when:
- All critical claims are verified
- Remaining gaps are intentionally deferred
- New hallucinations offer diminishing returns
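The stopping criteria above can be encoded as a simple guard in a CI loop. A hypothetical helper (not part of LUCID) that stops once no critical claims are failing and the per-iteration score gain has flattened:

```typescript
// Illustrative stopping rule: halt when no critical claims fail and the
// score improvement since the last iteration drops below minDelta points.
function shouldStop(
  criticalFailures: number,
  scoreDelta: number,
  minDelta = 2
): boolean {
  return criticalFailures === 0 && scoreDelta < minDelta;
}
```

The 2-point default is an arbitrary choice; tune it to your own tolerance for diminishing returns.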
On our test run, we stopped at 90.8% after 6 iterations. The 5 remaining failures were genuine missing functionality (rate limiting, malware scanning, data retention logic). The hallucinated ToS correctly identified them as requirements a production app should have.
The Cost
| Phase | Approximate Cost |
|---|---|
| Hallucinate | $0.15 |
| Extract | $0.25 |
| Verify | $1.50 |
| Remediate | $0.60 |
| Regenerate | $0.40 |
| Per iteration | ~$2.90 |
Six iterations cost about $17 total. For a verified specification with 91 claims, a gap report, and a prioritized remediation plan, that is the cheapest spec you will ever produce.
Why This Works
The theoretical basis is not hand-waving. Transformer self-attention is mathematically equivalent to Hopfield network pattern completion -- the same computation the hippocampus uses for memory retrieval (Ramsauer et al., 2020). When the LLM hallucinates, it is performing pattern completion from partial cues against its training data. The output includes both accurate completions (real patterns) and confabulated completions (plausible extensions).
The Terms of Service format forces precision because legal language cannot be vague. And external verification (against the codebase, not the model's own assessment) provides the reality-checking that LLMs provably cannot perform on themselves (Huang et al., ICLR 2024).
The closest precedent: protein hallucination from the Baker Lab, where neural network "dreams" served as blueprints for novel proteins -- work in computational protein design that earned David Baker a share of the 2024 Nobel Prize in Chemistry.
Get Started
git clone https://github.com/gtsbahamas/lucid.git
cd lucid
npm install && npm run build
Full paper with neuroscience grounding: https://github.com/gtsbahamas/lucid/blob/main/docs/paper.md
Questions, issues, and contributions welcome.