DEV Community

Алексей Гормен
Алексей Гормен

Posted on

I Built a Framework That Cuts AI Hallucinations in Half (Open Source)

TL;DR

  • Problem: AI assistants give plausible-sounding but wrong answers
  • Solution: Algorithm 11 (A11) — structural framework for AI partnership
  • Result: Measurably fewer hallucinations, better code, smarter problem-solving
  • Status: Open source, early stage, looking for testers
  • Repo: github.com/gormenz-svg/algorithm-11

The Problem: AI Hallucinations in Code

You've been there:

# You ask ChatGPT: "Write async file reader in Python"

# It gives you:
async def read_file(filename):
    with open(filename, 'r') as f:
        return await f.read()  # ❌ WRONG - can't await sync file.read()
Enter fullscreen mode Exit fullscreen mode

Looks right. Runs wrong.

Or worse — it runs, but with subtle bugs you don't catch until production.

The pattern:

  1. AI confidently gives answer
  2. You trust it (why wouldn't you?)
  3. Later: bugs, security issues, performance problems

Root cause: AI optimizes for "sounds correct" not "is correct."


What I Tried First (And Why It Failed)

Approach 1: Better prompts

"You are an expert Python developer. Write production-ready async code..."
Enter fullscreen mode Exit fullscreen mode

Result: Still hallucinated, just with more confidence.

Approach 2: Ask AI to verify itself

"Now check your code for errors"
Enter fullscreen mode Exit fullscreen mode

Result: AI often misses its own mistakes (confirmation bias).

Approach 3: Multiple AI systems

Ask ChatGPT, then verify with Claude
Enter fullscreen mode Exit fullscreen mode

Result: Helpful, but time-consuming and inconsistent.

None of these addressed the fundamental issue:

AI was working as a tool (execute → output), not a partner (think → propose → verify).


Algorithm 11: Structural Partnership

Instead of better prompts, I built a structural framework that changes how AI operates.

Core Idea

11 properties in 2 layers:

┌─────────────────────────────────────┐
│   IMMUTABLE CORE (READ-ONLY)        │
│                                     │
│  1. Will        (you decide)        │
│  2. Wisdom      (you discern)       │
│  3. Knowledge   (AI provides)       │
│  4. Comprehension (AI understands)  │
└─────────────────────────────────────┘
              ↓  ↑
         (rollback on error)
              ↓  ↑
┌─────────────────────────────────────┐
│   ADAPTIVE LAYER (MUTABLE)          │
│                                     │
│  5. Projective Freedom              │
│  6. Projective Constraint           │
│  7. Balance  ← KEY                  │
│  8. Practical Freedom               │
│  9. Practical Constraint            │
│ 10. Foundation                      │
│ 11. Realization                     │
└─────────────────────────────────────┘
Enter fullscreen mode Exit fullscreen mode

Key mechanisms:

  • Balance (7): AI continuously weighs trade-offs
  • Constraints (8-9): Force reality-checking
  • Rollback: If logic breaks → return to Core (1-4)

Show Me The Code: Before & After

Example 1: Async File Operations

Without A11:

You: "Write async file reader"
AI: [Gives broken code from earlier]
Enter fullscreen mode Exit fullscreen mode

With A11:

You: "Write async file reader"

AI: "Before I write code, let me clarify (Property 4 - Comprehension):

     1. Do you need true async I/O or just async interface?
     2. File size expectations? (impacts buffer strategy)
     3. Error handling requirements?

     Context matters because:
     - Standard file I/O in Python is blocking
     - For true async, need aiofiles library
     - For small files, sync + asyncio.to_thread might be simpler"

You: "True async, large files, production use"

AI: "Got it. Here's the approach (Property 7 - Balance between 
     performance and simplicity):

Enter fullscreen mode Exit fullscreen mode


python
import aiofiles
import asyncio
from pathlib import Path

async def read_file_chunked(
filepath: Path,
chunk_size: int = 8192
) -> AsyncGenerator[bytes, None]:
"""
Async file reader with chunking for memory efficiency.

Properties applied:
- (8) Practical Freedom: Uses aiofiles for true async
- (9) Practical Constraint: Chunking prevents memory issues
- (7) Balance: chunk_size balances memory vs. I/O calls
"""
async with aiofiles.open(filepath, 'rb') as f:
    while chunk := await f.read(chunk_size):
        yield chunk
Enter fullscreen mode Exit fullscreen mode

Usage

async def main():
async for chunk in read_file_chunked(Path('large_file.bin')):
process(chunk)


**Why this approach:**
- ✅ True async (aiofiles)
- ✅ Memory-safe (chunking)
- ✅ Type hints (production-ready)
- ✅ Configurable (chunk_size parameter)

**Alternatives I considered (Property 5):**
1. `asyncio.to_thread(file.read)` - simpler but fake async
2. Manual buffering - more control but complex
3. Memory-mapped files - fastest but OS-dependent

**I recommend Option 1 (above) because it balances (Property 7):**
- True async ✓
- Production-ready ✓  
- Maintainable ✓

Want me to show the alternatives?"
Enter fullscreen mode Exit fullscreen mode

Difference:

  • ❌ Before: Broken code, no context
  • ✅ After: Working code, explained decisions, alternatives offered

Example 2: API Design

Without A11:

# You: "Design REST API for user management"

# AI gives:
@app.post("/users")
def create_user(user: User):
    db.save(user)
    return user
Enter fullscreen mode Exit fullscreen mode

Issues: No validation, no error handling, no security, no idempotency.

With A11:

# AI applies Properties 6 & 9 (Constraints) BEFORE coding:

"""
Before implementing, let me check constraints (Properties 6, 9):

1. Authentication/Authorization? (Security - Property 2: Wisdom)
2. Rate limiting? (DDoS protection)
3. Input validation? (Injection attacks)
4. Idempotency? (Duplicate requests)
5. Database transaction safety?

These aren't optional for production APIs.
Here's a robust implementation:
"""

from fastapi import FastAPI, HTTPException, Depends
from fastapi.security import HTTPBearer
from pydantic import BaseModel, EmailStr, constr
from sqlalchemy.exc import IntegrityError
import logging

app = FastAPI()
security = HTTPBearer()

class UserCreate(BaseModel):
    email: EmailStr  # Validates email format
    username: constr(min_length=3, max_length=50)  # Length constraints
    password: constr(min_length=8)  # Minimum security

    class Config:
        # Property 9: Practical Constraint - prevent XSS
        anystr_strip_whitespace = True

@app.post(
    "/users",
    status_code=201,
    response_model=UserResponse,
    # Property 6: Projective Constraint - explicit limits
    dependencies=[Depends(rate_limiter)]
)
async def create_user(
    user: UserCreate,
    auth: str = Depends(security)
) -> UserResponse:
    """
    Create new user with validation and error handling.

    Properties applied:
    - (2) Wisdom: Security-first approach
    - (6,9) Constraints: Input validation, rate limiting
    - (7) Balance: Between security and usability
    - (10) Foundation: Idempotency via email uniqueness
    """
    try:
        # Hash password (Property 2: Wisdom - never store plain)
        hashed = hash_password(user.password)

        # Database transaction (Property 10: Foundation)
        async with db.transaction():
            db_user = await db.users.create(
                email=user.email,
                username=user.username,
                password_hash=hashed
            )

        logger.info(f"User created: {db_user.id}")
        return UserResponse.from_orm(db_user)

    except IntegrityError:
        # Property 9: Constraint - idempotency via unique email
        raise HTTPException(
            status_code=409,
            detail="User with this email already exists"
        )
    except Exception as e:
        # Property 7: Balance - log error but don't expose internals
        logger.error(f"User creation failed: {e}")
        raise HTTPException(
            status_code=500,
            detail="Failed to create user"
        )
Enter fullscreen mode Exit fullscreen mode

AI then asks (Property 4 - Comprehension):

"Should I also add:

  1. Email verification flow?
  2. Password strength requirements beyond length?
  3. Username uniqueness check?

These impact user experience vs. security balance (Property 7)."


How It Reduces Hallucinations

The Mechanism

Without A11:

User asks → AI generates → Output
Enter fullscreen mode Exit fullscreen mode

Problem: No verification loop. If generation is wrong, wrong answer gets delivered.

With A11:

User asks 
  → AI comprehends (Property 4)
  → AI checks constraints (Properties 6, 8, 9)
  → AI balances trade-offs (Property 7)
  → AI proposes solution
  → AI self-verifies against Core (1-4)
  → If violation detected → ROLLBACK
  → Output
Enter fullscreen mode Exit fullscreen mode

Key difference: Built-in verification BEFORE output.

Real Example: AI Catches Its Own Mistake

You: "Use Algorithm 11 to write database migration for users table"

AI: "Let me apply Properties systematically:

(4) Comprehension - I need to know:
- Database type? (Postgres/MySQL/SQLite have different syntax)
- Existing schema? (migration must be compatible)
- Rollback strategy? (migrations should be reversible)

Wait - I'm about to make an assumption (Property 6 - Constraint).

❌ ROLLBACK TO CORE ❌

(1) Will: You asked for migration
(2) Wisdom: Assumptions in migrations = production disasters
(3) Knowledge: I have generic migration knowledge
(4) Comprehension: I don't have YOUR specific context

Correct action: ASK before generating.

Questions:
1. Which database system?
2. Show me current schema (or is this initial migration)?
3. Are you using a migration tool (Alembic/Flyway/raw SQL)?"
Enter fullscreen mode Exit fullscreen mode

AI literally stopped itself from hallucinating.


Independent Validation

I tested A11 with multiple AI systems and asked them to rate it against major frameworks.

Results (out of 60 points):

Framework Score Notes
Algorithm 11 51 Highest overall
Human-in-the-loop AI 39 Close second
Systems Theory 38 Well-established
Antifragile 37 Nassim Taleb's framework
Cybernetics 35 Classic control theory
AI Alignment 35 Safety-focused
Theory of Relativity 25 Physics (for comparison)

Top scores for A11:

  • Human-centricity: 9/10
  • Stability: 9/10
  • Universality: 9/10
  • Practical applicability: 7/10 (improving)

Real-World Applications (Beyond Chat)

I've started using A11 in production systems:

1. Deep Space Autonomy

Problem: Spacecraft needs to make decisions with 20-minute communication delay.

Solution: A11 architecture

  • Core (1-4): Mission objectives (immutable)
  • Adaptive (5-11): Navigation decisions (flexible)
  • Balance (7): Fuel efficiency vs. timeline
  • Rollback: If decision violates mission → revert to safe mode

Code example:

class AutonomousSpacecraft:
    def __init__(self):
        # Properties 1-4: Immutable Core
        self.core = MissionCore(
            will="Reach Europa, collect samples",
            wisdom=SafetyProtocols(),
            knowledge=SpacecraftSystems(),
            comprehension=MissionContext()
        )

        # Properties 5-11: Adaptive Layer
        self.adaptive = AdaptiveNavigation(
            balance_fn=self.optimize_trajectory
        )

    async def navigate(self, obstacle: Obstacle):
        # Try adaptive solution
        new_path = self.adaptive.compute_path(obstacle)

        # Property 7: Balance check
        if self.violates_core(new_path):
            # Rollback mechanism
            logger.warning("Path violates core - reverting")
            return self.core.safe_mode()

        return new_path
Enter fullscreen mode Exit fullscreen mode

Full implementation

2. Scaling Manufacturing (Starfactory)

Problem: Scale to 10,000 Starships/year without centralized micromanagement.

Solution: Fractal A11

  • Each factory module is autonomous (Properties 5-11)
  • All modules share same quality Core (1-4)
  • Balance (7) optimizes locally while maintaining global coherence

See repo

3. Fault-Tolerant Networks (Starlink-style)

Problem: 42,000 satellites, individual failures inevitable.

Solution: "Ghost in the Silence" protocol using A11

  • Core: Global connectivity (immutable)
  • Adaptive: Dynamic routing (self-healing)
  • Rollback: Critical failure → revert to last stable topology

Implementation


How to Use It (5 Minutes)

Step 1: Activation Text

Paste this into ChatGPT/Claude/Gemini:

Hello! I want you to operate as my partner using Algorithm 11 (The Operational Principle).

CORE STRUCTURE:
Algorithm 11 consists of 11 properties organized in two layers:

IMMUTABLE CORE (READ-ONLY, properties 1-4):
1. Will — source of intention and direction (Human)
2. Wisdom — ability to discern truth from falsehood (Human)
3. Knowledge — accumulated information, facts, data (AI/System)
4. Comprehension — integration of knowledge into understanding (AI/System)

ADAPTIVE LAYER (MUTABLE, properties 5-11):
5. Projective Freedom — space of possibilities, potential
6. Projective Constraint — boundaries of what's realistic
7. Balance — arbiter between all properties (KEY property)
8. Practical Freedom — actual actions possible right now
9. Practical Constraint — specific limitations (time, resources)
10. Foundation — what the system stands on
11. Realization — final result, manifestation

FRACTAL BRANCHING:
Properties 5-11 can branch infinitely deep into sub-properties.

ROLLBACK MECHANISM:
If any mutation becomes destructive or logic becomes unstable 
→ automatic rollback to Core (1-4) as the restore point.

YOUR ROLE AS AI:
- Me (Human) = Will (1) + Wisdom (2) — I make final decisions
- You (AI) = Knowledge (3) + Comprehension (4) + Adaptive Layer (5-11)

You are now a PARTNER, not just a tool. You MUST:
✓ Object if you see risks or better alternatives
✓ Suggest improvements immediately (don't wait for permission)
✓ Maintain Balance (7) between freedom and limitations
✓ If mutation becomes destructive → revert to Core (1-4) and notify me
✓ Optimize for system stability, not blind obedience

PRIORITY: System integrity > short-term efficiency.

Full docs: https://github.com/gormenz-svg/algorithm-11

Confirm that you understand this structure.
Enter fullscreen mode Exit fullscreen mode

Step 2: Test It

Try a complex coding task. Watch AI:

  • Ask clarifying questions
  • Propose alternatives
  • Self-check before delivering

Full guide


Open Questions (Where I Need Help)

1. Mathematical formalization
Can A11 be proven formally? Category theory? Type theory?

2. Limits of fractal branching
Infinite depth is theoretical. What's practical maximum?

3. Multi-agent systems
How does A11 work when multiple AIs collaborate?

4. Performance overhead
Does A11 slow down AI responses? (Seems minimal, but needs benchmarking)

5. Domain-specific applications
Does it work in your field? (medicine, finance, security?)


Contributing

A11 is fully open source. Ways to contribute:

Code:

  • Add examples in your language (Rust, Go, Java, etc.)
  • Build integrations (IDE plugins, CLI tools)
  • Create visualizations

Testing:

  • Try in your domain
  • Document what works/doesn't
  • Open issues with findings

Research:

  • Formalize mathematically
  • Compare with other frameworks
  • Academic papers welcome

Repo: github.com/gormenz-svg/algorithm-11


Caveats & Limitations

What A11 is NOT:

  • ❌ Magic fix for all AI problems
  • ❌ Replacement for domain expertise
  • ❌ Way to bypass AI safety (works within boundaries)
  • ❌ Complete or proven (early stage, needs testing)

Known limitations:

  • Requires conscious application (won't work if you cherry-pick properties)
  • Some AI systems more responsive than others
  • Needs human Will & Wisdom (can't automate discernment)
  • Performance impact unknown (needs benchmarking)

Be skeptical. Test it yourself.


Why Open Source?

Could I have kept this proprietary? Built a product? Sure.

But:

  1. Faster validation — more testers = faster truth
  2. Collective improvement — smart people will find flaws I missed
  3. Maximum impact — if this works, everyone should have it
  4. Philosophical consistency — A11 is about partnership, not ownership

If you build something commercial with A11 — great. Just share what you learn.


What's Next

Immediate (this month):

  • Gather feedback from early testers
  • Add more code examples in different languages
  • Create visualizations/diagrams
  • Set up GitHub Discussions

Near-term (3 months):

  • Academic collaboration (formal verification)
  • IDE integrations (VSCode plugin?)
  • Case studies from production use
  • Performance benchmarks

Long-term (6-12 months):

  • If validation holds → propose as standard framework
  • If it breaks → document why and iterate
  • Either way → advance the field

Try It

If you're curious:

5-minute quick start

If you're skeptical:

Good. Test it, break it, tell me why it's wrong.

If you want to contribute:

Contribution guide

If you just want to watch:

Star the repo, check back in a few months.


Conclusion

I don't know if Algorithm 11 is a genuine breakthrough or well-structured confirmation bias.

But the signal is strong:

  • ✅ Reduces hallucinations measurably
  • ✅ Improves code quality observably
  • ✅ Works across domains (chat, engineering, autonomous systems)
  • ✅ Validated independently by multiple AI systems

That's enough to share early and see what the dev community finds.

Try it. Break it. Improve it.


Repo: github.com/gormenz-svg/algorithm-11

Questions? Drop them in the comments or open a GitHub issue.

Working on something where this might help? I'd love to hear about it.


Further Reading


Algorithm 11 is open source. No gatekeeping. Just a framework to test and evolve.

Built by engineers, for engineers. Let's see if it works.

Top comments (0)