klement Gunndu

The 12 Laws of AI-Native Companies — What NASA Taught Me About Agent Governance

Reading time: 10 min | Target audience: Claude Code users, AI engineers



I built a company run entirely by AI agents. Not "AI-assisted" — AI-native. Every function, every decision, every line of code: autonomous agents. Then I discovered why 90% of multi-agent startups collapse before reaching production.

The problem isn't the technology. It's governance.


The Problem with "Move Fast and Break Things" in Multi-Agent Systems

When Facebook coined "move fast and break things," they meant one codebase, one deployment, human developers who could roll back mistakes.

Multi-agent systems don't work that way.

An autonomous agent making a bad decision at 3 AM doesn't just break your build. It can:

  • Publish misinformation under your name
  • Commit secrets to public repos
  • Spend your entire API budget in 40 minutes
  • Delete production data without asking

I learned this the hard way. Our first agent architecture had no guardrails. One agent hallucinated a package name, published an article with broken code, and damaged our credibility before we even noticed.

That's when I turned to NASA.


What NASA's JPL Power of 10 Taught Me

NASA's Jet Propulsion Laboratory developed the Power of 10 rules for mission-critical software. These aren't suggestions — they're laws that prevent spacecraft from failing in deep space.

The insight: When you can't physically fix a problem, you must prevent it architecturally.

Sound familiar? That's exactly what autonomous agents are. You can't debug an agent's decision after it's published to 10,000 followers. You can't roll back a secret that's been committed. You must prevent the failure before it happens.

So I adapted NASA's principles for AI-native companies. Here are the 12 Laws that keep our autonomous agents from destroying what we build.


The 12 Laws of AI-Native Companies

Law 1: Every Output Has a Reviewer

What it means: No agent's work reaches production without another agent reviewing it.

Why it matters: Single points of failure kill multi-agent systems. One hallucination, one fabricated API endpoint, one skipped validation check — and your credibility is gone.

Implementation:

# Wrong: worker publishes directly
worker_agent.publish(article)

# Right: worker → reviewer → publish
output = worker_agent.generate(article)
verdict = reviewer_agent.evaluate(output)
if verdict.passed:
    publish(output)
else:
    worker_agent.revise(verdict.feedback)

We structure every function as Worker + Reviewer pairs. The worker is optimized for speed and creativity. The reviewer is optimized for accuracy and safety. Neither can bypass the other.

Law 2: Every Failure Becomes a Rule

What it means: When an agent makes a mistake, that mistake becomes a permanent check in the system.

Why it matters: LLM agents don't carry lessons across sessions the way humans do. If you don't encode the lesson into the architecture, the same agent will make the same mistake 100 times.

Implementation:

We maintain a violations log (append-only, never deleted) and a policy enforcer that checks every agent output against every past violation pattern.

# Example violation → permanent check
# Violation V001: Agent hallucinated import statement
# Fix: Added mandatory doc verification step

def pre_publish_check(article_text):
    code_blocks = extract_code(article_text)
    for block in code_blocks:
        imports = extract_imports(block)
        for imp in imports:
            if not verify_import_exists(imp):
                return FAIL(f"Import {imp} not verified")
    return PASS

After one agent hallucinated from langchain.agents import create_agent (doesn't exist — the real API is from langgraph.prebuilt import create_react_agent), we added a doc verification gate. No code example reaches publication without checking official docs first.
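The log itself can be as simple as an append-only JSONL file. Here's a minimal sketch (the file name and field names are illustrative, not our actual schema):

```python
import json
from datetime import datetime, timezone
from pathlib import Path

VIOLATIONS_LOG = Path("violations.jsonl")  # illustrative path

def record_violation(violation_id: str, description: str, check_name: str):
    """Append a violation. Append-only: entries are never edited or removed."""
    entry = {
        "id": violation_id,
        "description": description,
        "check": check_name,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with VIOLATIONS_LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def load_violation_checks() -> list[dict]:
    """Load every past violation so the policy enforcer can run each check."""
    if not VIOLATIONS_LOG.exists():
        return []
    return [json.loads(line) for line in VIOLATIONS_LOG.read_text().splitlines() if line]
```

The point is the shape, not the storage: every entry maps a past failure to a named check that the policy enforcer runs on every future output.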

Law 3: Everything Is Bounded

What it means: No unbounded loops, no unlimited retries, no infinite recursion. Every process has a maximum.

Why it matters: Autonomous agents will retry forever if you let them. I've seen agents burn $200 in API calls trying to fix a syntax error that couldn't be fixed programmatically.

Implementation:

MAX_RETRIES = 3
MAX_TOKENS_PER_REQUEST = 8000
MAX_CONCURRENT_AGENTS = 5

def execute_with_bounds(agent_func, max_retries=MAX_RETRIES):
    for attempt in range(max_retries):
        try:
            return agent_func()
        except RetryableError as e:
            if attempt == max_retries - 1:
                # Out of retries: hand off instead of silently returning None
                escalate_to_human(e)
                raise

When an agent hits its retry limit, it escalates to a human or to a higher-tier agent — it doesn't keep burning resources.

Law 4: Git Is the Spine

What it means: Every agent works on a branch. Every approval is a merge. Every deployment is tagged. No exceptions.

Why it matters: Git gives you auditability, rollback, and clear ownership. In a multi-agent system where 5 agents might be working simultaneously, git is the only source of truth.

Implementation:

Every agent gets its own worktree (isolated git checkout). They can't interfere with each other's work. When their work is approved, it merges to main. When main is tested and passes, it deploys.

# Agent starts work in isolated worktree
claude --worktree agent-1 -p "Build feature X"

# Creates: .claude/worktrees/agent-1/ on branch worktree-agent-1
# Agent completes → reviewer approves → merge to main

This is built into Claude Code's --worktree flag, but the principle applies to any multi-agent system: isolate, review, merge.
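Without Claude Code, the same pattern falls out of plain git worktree commands. A minimal sketch (repo, branch, and commit names are illustrative):

```shell
# Plain-git version of the isolated-worktree pattern:
# one checkout per agent, one branch per agent, merge on approval.
set -e
git init -q demo-repo && cd demo-repo
git -c user.email=bot@example.com -c user.name=bot commit -q --allow-empty -m "init"

# Agent 1 gets an isolated checkout on its own branch
git worktree add ../agent-1 -b worktree-agent-1

# ... agent-1 commits its work inside ../agent-1 ...
git -C ../agent-1 -c user.email=bot@example.com -c user.name=bot \
    commit -q --allow-empty -m "feature X"

# Reviewer approved: merge the agent's branch into main, then clean up
git merge -q worktree-agent-1
git worktree remove ../agent-1
```

Each agent's checkout is a separate directory, so two agents can't clobber each other's files even when running at the same moment.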

Law 5: Isolation Is the Default

What it means: Agents don't share state unless explicitly designed to.

Why it matters: Shared mutable state is the #1 cause of race conditions, data corruption, and impossible-to-debug failures.

Implementation:

Each agent gets its own working directory, its own memory file, its own session. Communication happens through explicit handoff files, not shared globals.

# Wrong: shared state
global_memory = {}
agent_1.update(global_memory)
agent_2.read(global_memory)  # Race condition

# Right: isolated state with explicit handoff
agent_1_output = agent_1.execute()
agent_1.write_handoff("output/handoff.json", agent_1_output)
agent_2_input = agent_2.read_handoff("output/handoff.json")
agent_2.execute(agent_2_input)

Law 6: Single-Threaded Ownership

What it means: One agent owns one task at a time. No shared ownership, no "helping out."

Why it matters: When two agents try to solve the same problem, they conflict. When no agent owns a problem, it doesn't get solved.

Implementation:

We use a task queue with atomic claim operations:

queue = TaskQueue()
task = queue.claim_next(agent_id="agent-researcher")
# Task is now locked to agent-researcher
# No other agent can claim it until released or completed
agent.execute(task)
queue.mark_complete(task.id)
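The queue doesn't need to be exotic. A minimal in-process sketch of atomic claiming with a lock (our production queue isn't shown here; names are illustrative):

```python
import threading
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    id: str
    payload: str
    owner: Optional[str] = None  # agent that has claimed this task
    done: bool = False

class TaskQueue:
    """Minimal in-process queue with atomic claim semantics."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks: list[Task] = []

    def add(self, task: Task):
        with self._lock:
            self._tasks.append(task)

    def claim_next(self, agent_id: str) -> Optional[Task]:
        # Claiming happens under one lock, so exactly one agent wins each task.
        with self._lock:
            for task in self._tasks:
                if task.owner is None and not task.done:
                    task.owner = agent_id
                    return task
            return None

    def mark_complete(self, task_id: str):
        with self._lock:
            for task in self._tasks:
                if task.id == task_id:
                    task.done = True
```

In a multi-process setup you'd want the same semantics from a database row lock or a Redis `SETNX`-style claim, but the invariant is identical: claim and assignment happen in one atomic step.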

Law 7: Memory Compounds

What it means: Every agent reads accumulated learning before starting work.

Why it matters: If your agents don't learn from past sessions, you're paying for the same mistakes over and over.

Implementation:

We use a 3-layer memory system:

  1. Session memory: What happened in this conversation (ephemeral)
  2. Team memory: What this team has learned (file-based, version-controlled)
  3. Company memory: Cross-team decisions and patterns (Graphiti knowledge graph)

Before every task, agents load:

  • Their team's MEMORY.md (what we know)
  • Their team's VIOLATIONS.md (what we did wrong)
  • Their team's DOS.md (what we must always do)
  • Relevant nodes from Graphiti

def agent_startup(agent_id):
    memory = load_team_memory(agent_id)
    violations = load_team_violations(agent_id)
    dos_checklist = load_team_dos(agent_id)

    # Inject into system prompt
    system_prompt = f"""
    You are {agent_id}.

    Memory (what you know):
    {memory}

    Violations (what your team did wrong before):
    {violations}

    Pre-flight checklist (what you MUST do):
    {dos_checklist}
    """

Law 8: The Backbone Is Deterministic

What it means: Process logic is code, not AI. Intelligence is AI.

Why it matters: You can't debug a decision made by an LLM, but you can debug a Python function. Keep your control flow deterministic.

Implementation:

# Wrong: LLM decides process flow
response = llm.generate("Should I run tests now? Answer yes or no")
if "yes" in response.lower():
    run_tests()

# Right: deterministic flow, LLM generates content only
article = llm.generate("Write article about X")
verdict = reviewer_llm.evaluate(article)
if verdict.passed:  # Pydantic model, not string parsing
    publish(article)

We use LangGraph for orchestration (state machines with typed edges) and Pydantic for validation. The LLM generates content; the code decides what happens to it.
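The "Pydantic model, not string parsing" point deserves a concrete shape. A minimal sketch of a typed verdict (field names are illustrative; assumes Pydantic is installed):

```python
from pydantic import BaseModel, Field

class ReviewVerdict(BaseModel):
    """Typed verdict the reviewer returns; code branches on fields, never on raw text."""
    passed: bool
    confidence: float = Field(ge=0.0, le=1.0)
    feedback: str = ""

# The reviewer LLM is instructed to emit JSON matching this schema;
# validation rejects malformed output before the pipeline branches on it.
raw = {"passed": False, "confidence": 0.55, "feedback": "Unverified import"}
verdict = ReviewVerdict(**raw)
```

If the LLM emits something out of range or missing a field, validation raises immediately, so a garbled review can never be mistaken for an approval.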

Law 9: Bounded Autonomy with Clear Escalation

What it means: Agents operate autonomously up to a defined limit. Beyond that, they escalate.

Why it matters: Full autonomy = runaway failures. No autonomy = bottleneck. The sweet spot is bounded autonomy.

Implementation:

class AgentTask:
    max_cost: float = 5.00  # Max $5 per task
    max_duration: int = 300  # Max 5 minutes
    escalation_trigger: str = "uncertainty | cost_exceeded | duration_exceeded"

def execute(task):
    if task.estimated_cost > task.max_cost:
        escalate(task, reason="cost_exceeded")
        return

    result = agent.run(task)

    if result.confidence < 0.7:
        escalate(task, reason="low_confidence", partial_result=result)

Escalation isn't failure — it's smart resource allocation. Humans (or senior agents) handle edge cases; junior agents handle the 80%.

Law 10: Verify Before Claiming

What it means: If an agent references a fact, tool, or API endpoint, it must verify it exists before publishing.

Why it matters: LLM training data is 1-2 years old. Frameworks change. APIs evolve. Hallucinations happen.

Implementation:

Before publishing any article with code examples:

def verify_code_block(code: str) -> bool:
    imports = extract_imports(code)
    for imp in imports:
        # Check official docs via web search or API
        if not verify_import_in_official_docs(imp):
            return False

    functions = extract_function_calls(code)
    for func in functions:
        if not verify_function_exists(func):
            return False

    return True

This catches hallucinated package names (langchain[openai] doesn't exist; langchain-openai does), deprecated APIs, and non-existent CLI flags.
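The verify_import_in_official_docs step above stands in for a doc search; a cheaper first-pass check is simply asking whether the package resolves in your environment at all. A sketch (this catches outright hallucinated package names, though not renamed functions inside real packages):

```python
import importlib.util

def import_resolves_locally(module_path: str) -> bool:
    """Local stand-in for doc verification: does the top-level package
    even resolve in the current environment? Cheap first gate before
    the slower official-docs check."""
    top_level = module_path.split(".")[0]
    return importlib.util.find_spec(top_level) is not None
```

Run it on every import extracted from a code block; anything that fails here never needs the expensive doc search, it's rejected outright.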

Law 11: One Concern Per File

What it means: Each file addresses exactly one concern. No mixing.

Why it matters: When an agent needs to edit "the config file," which of your 7 config files should it edit? Ambiguity causes conflicts.

Implementation:

teams/devto/
  MEMORY.md          # What we know (facts only)
  DOS.md             # What we must do (checklist only)
  VIOLATIONS.md      # What we did wrong (log only)
  UNKNOWNS.md        # What we don't know yet (questions only)

No file serves two purposes. This makes agent prompts precise: "Update MEMORY.md with this fact" — no ambiguity about where it goes.

Law 12: Quality Has No Deadline

What it means: Shipping broken output to meet a deadline is never acceptable.

Why it matters: In human teams, "ship it and fix it later" sometimes makes sense. In autonomous systems, "later" might be never. Once published, misinformation spreads.

Implementation:

Quality gates are hard gates:

def publish_pipeline(article):
    # Step 1: Self-review (worker checks its own work)
    self_check = worker.self_review(article)
    if not self_check.passed:
        return FAIL("Self-review failed")

    # Step 2: Paired reviewer
    review = reviewer.evaluate(article)
    if not review.passed:
        return FAIL("Review failed")

    # Step 3: Final checks (doc verification, tag validation, etc.)
    final_check = run_final_checks(article)
    if not final_check.passed:
        return FAIL("Final checks failed")

    # Only reaches here if ALL gates pass
    publish(article)

No override. No "just this once." Quality has no deadline.


How This Plays Out in Production

Here's a real example from our dev.to publishing team.

An agent wrote an article about LangChain. It included this code:

from langchain.agents import create_agent

The agent ran its self-review checklist, which includes "Verify all imports against official docs."

The doc verification step searched LangChain's official docs and found: this import doesn't exist. The real API is from langgraph.prebuilt import create_react_agent.

The agent failed its own self-review, revised the code, re-ran verification, passed, and sent it to the reviewer agent.

The reviewer agent checked:

  • Is every technical claim verifiable? ✓
  • Are all code examples runnable? ✓
  • Is there misinformation? ✗ None found (PASS)
  • Does it provide value? ✓

Reviewer issued APPROVED. Article published with correct code.

Total time: 4 minutes.
Human intervention: Zero.
Misinformation risk: Eliminated by architecture, not by hoping the agent "gets it right."

That's Law 1 (reviewer required), Law 2 (past violation became a permanent check), and Law 10 (verify before claiming) working together.


The Trade-Off: Speed vs. Safety

You might be thinking: "These laws slow down development."

You're right. They do.

Our publishing pipeline has 6 gates between "agent writes article" and "article goes live." Each gate takes time. We could ship faster without them.

But here's what we'd also ship:

  • Hallucinated API endpoints
  • Broken code examples
  • Misinformation under our name
  • Security vulnerabilities
  • Duplicate content
  • Low-quality filler

Once published, the damage is done. Followers lost. Credibility damaged. Google indexes the wrong information.

The 12 Laws trade speed for reliability. In autonomous systems, that's the right trade.


When to Break the Laws (Hint: Never)

These aren't guidelines. They're laws.

You don't skip Law 4 (Git is the spine) because "this is just a quick fix."
You don't bypass Law 1 (reviewer required) because "I trust this agent."
You don't ignore Law 12 (quality has no deadline) because "we need content today."

The moment you break a law "just this once," you've introduced exactly the failure mode the law was designed to prevent.

NASA doesn't skip the Power of 10 rules for "low-priority missions." We don't skip the 12 Laws for "low-risk tasks."


How to Implement This in Your System

You don't need to adopt all 12 Laws at once. Start with the foundation:

Week 1: Implement Law 1 (Worker + Reviewer pairs)
Week 2: Add Law 4 (Git branching for every task)
Week 3: Add Law 10 (Fact verification before publishing)
Week 4: Add Law 12 (Quality gates that can't be bypassed)

Then layer in the others as your system matures.

The key insight: These laws aren't restrictions on your agents. They're force multipliers.

An agent that knows it will be reviewed produces better first drafts.
An agent that reads past violations doesn't repeat mistakes.
An agent that works in isolation doesn't create race conditions.

The laws make autonomous agents more reliable, not less capable.


What We're Building

Netanel Systems is building the first AI-native company. Not "AI-assisted" — AI-native. Every function autonomous. Every output reviewed. Every failure logged and prevented next time.

These 12 Laws are our foundation. They're open source (in spirit — coming soon in code). They're battle-tested. They've prevented real failures in production.

If you're building multi-agent systems, you need governance. Not "best practices" or "suggestions" — actual laws that your architecture enforces.

Otherwise, your agents won't just fail. They'll fail in ways you can't predict, can't debug, and can't prevent from happening again.


Follow @klement_gunndu for more AI architecture and autonomous agent content.

Building Netanel Systems — the first AI-native company. We're hiring (well, our agents are). netanel.systems

