DEV Community

Nick-111

Building a Governance Layer for an Autonomous AI System Without Knowing AI Safety Existed

I spent one month building an autonomous multi-agent trading system alone. Six engines, sixty trading strategies, twenty-five ML models, 8,600 tests. After 380,000 lines of Python, I discovered something that changed how I think about AI systems entirely.

The strategies don't make money. But the governance layer I built to manage them might be more valuable than any trading strategy.

What I Built
ZEUS is an autonomous trading system with six engines: HERMES (execution), ATHENA (ML/quant), AEGIS (security), APOLLO (evolution), HADES (infrastructure), and ARES (governance).

ARES is the one that matters for this post. It has 12 modules designed to answer a single question: how do you know if any part of your autonomous system is broken?

The answer turned out to be surprisingly close to what the AI Safety community calls "agent governance infrastructure." I just didn't know that term when I built it.

The Problem: Solo Development at Scale
When you write 380,000 lines of code alone in one month, you hit a wall that has nothing to do with programming skill. You open a file you wrote three weeks ago and have no idea if it's still used. You find modules that import other modules that import modules that go nowhere. You discover that 40% of your "production system" is dead code that nothing touches.

This isn't a code quality problem. It's a system observability problem. In a traditional team, institutional knowledge tells you what's alive and what's dead. When you're one person and Claude, that institutional knowledge lives in your head — and your head is unreliable.

So I built tools to externalize it.

The ARES Governance Stack
Here's what 12 modules of AI system governance look like in practice:

1. Module Vitality: SilenceScanner
The most fundamental question: is this module alive?

SilenceScanner parses every Python file into an AST, builds a directed import graph, and classifies every module into one of five categories:

truly_dead: Exported symbols exist but nothing references them. These are candidates for deletion.
fake_alive: Imported by something that is itself dead. A zombie module — looks alive, does nothing.
standalone: CLI scripts, database migrations, build tools. No imports in, but that's by design.
island: Imported by something, but missing its INTERFACE.md contract. It works, but nobody can verify it does what it claims.
structural: __init__.py files and API surfaces. The skeleton that holds everything together.
The key insight: not all unreferenced code is dead. A database migration script has zero imports because it's called by Alembic at deploy time, not by other Python modules. If you flag it as "dead," you'll break production. SilenceScanner knows the difference because it has a taxonomy of standalone patterns (CLI, migrations, build scripts, test infra, server entrypoints).
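
The classification above can be sketched in a few dozen lines. This is a minimal illustration, not the actual SilenceScanner code: the in-memory `SOURCES` project, the `STANDALONE_PREFIXES` naming convention, and the extra `alive` label are all assumptions made for the example. It works at import level only, and it leaves out the `island` category because that depends on INTERFACE.md files.

```python
import ast

# Hypothetical in-memory project: module name -> source text.
SOURCES = {
    "app.main": "import app.pricing\n\nif __name__ == '__main__':\n    pass\n",
    "app.pricing": "def quote():\n    return 1\n",
    "app.legacy": "import app.legacy_utils\n",
    "app.legacy_utils": "def helper():\n    return 0\n",
    "migrations.v001_init": "def upgrade():\n    pass\n",
    "app.__init__": "",
}

# Assumed naming convention for code that is invoked from outside Python imports.
STANDALONE_PREFIXES = ("migrations.", "scripts.", "tools.")


def imports_of(source):
    """Return the set of module names imported by one source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def has_main_guard(source):
    """Detect a top-level `if __name__ ...` test, a hint that the file is an entrypoint."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.If) and isinstance(node.test, ast.Compare):
            left = node.test.left
            if isinstance(left, ast.Name) and left.id == "__name__":
                return True
    return False


def classify(sources):
    """Label every module; 'alive' is this sketch's name for everything else."""
    importers = {name: set() for name in sources}
    for name, src in sources.items():
        for dep in imports_of(src):
            if dep in sources:
                importers[dep].add(name)

    labels = {}
    for name, src in sources.items():
        if name.endswith("__init__"):
            labels[name] = "structural"
        elif name.startswith(STANDALONE_PREFIXES) or has_main_guard(src):
            labels[name] = "standalone"
        elif not importers[name]:
            labels[name] = "truly_dead"

    # A module is fake_alive when every one of its importers is dead or fake_alive.
    changed = True
    while changed:
        changed = False
        for name in sources:
            if name in labels:
                continue
            if all(labels.get(i) in ("truly_dead", "fake_alive") for i in importers[name]):
                labels[name] = "fake_alive"
                changed = True

    for name in sources:
        labels.setdefault(name, "alive")
    return labels
```

Calling `classify(SOURCES)` labels `app.legacy` as truly_dead, `app.legacy_utils` as fake_alive, and the migration as standalone. One known gap in this sketch: two dead modules that import only each other would be reported as alive, which is the kind of case a cycle check has to catch separately.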

2. Contract Compliance: ContractVerifier
Once you know which modules are alive, the next question: do they do what they claim?

ContractVerifier reads the Python AST, generates an INTERFACE.md describing every public function, class, and method with its signature, and then verifies that the actual code matches the documented contract.

This is not documentation for humans. It's a machine-verifiable behavior contract. If you change a function signature without updating the contract, CI blocks the merge. If you add a new public method without documenting it, CI blocks the merge.

In AI Safety terms: this is a basic form of behavioral specification enforcement. The system cannot drift from its documented behavior without explicit approval.
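
A rough sketch of that check, under stated assumptions: the contract is represented here as a plain dict from name to signature string, because the post does not describe the real INTERFACE.md format, and `public_signatures` and `verify` are hypothetical names. It needs Python 3.9+ for `ast.unparse`.

```python
import ast


def public_signatures(source):
    """Map each public function or method to its signature string."""
    sigs = {}

    def visit(body, prefix=""):
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if not node.name.startswith("_"):
                    sigs[prefix + node.name] = f"({ast.unparse(node.args)})"
            elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
                visit(node.body, prefix=f"{prefix}{node.name}.")

    visit(ast.parse(source).body)
    return sigs


def verify(source, contract):
    """Return a list of violations; an empty list means the code matches."""
    actual = public_signatures(source)
    problems = []
    for name, sig in actual.items():
        if name not in contract:
            problems.append(f"undocumented: {name}{sig}")
        elif contract[name] != sig:
            problems.append(f"changed: {name} documented {contract[name]}, found {sig}")
    for name in sorted(contract.keys() - actual.keys()):
        problems.append(f"missing: {name}")
    return problems


# Hypothetical module source and its documented contract.
SOURCE = (
    "def place_order(symbol, size):\n    pass\n\n"
    "def cancel(order_id):\n    pass\n\n"
    "def _internal():\n    pass\n"
)
CONTRACT = {"place_order": "(symbol, size)", "cancel": "(order_id)"}
```

With these inputs `verify(SOURCE, CONTRACT)` returns an empty list. Adding a parameter to `place_order` without touching `CONTRACT` produces a "changed" entry, which is the condition a CI job would turn into a failed build.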

3. Dependency Graph Analysis: IntegrationGraph
Builds a directed graph of all module dependencies. Detects:

Dead ends: modules that import nothing and are imported by nothing. True orphans.
Cycles: A imports B imports C imports A. These are maintainability time bombs.
Hubs: modules imported by 50+ others. Changes here have massive blast radius.
Bridges: modules that are the sole connection between two subsystems. Single points of failure.
This matters because in an autonomous system, dependency structure is attack surface. A cycle means a bug in one module can propagate back to itself through the loop. A hub means a single module's failure cascades to half the system.
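
Three of those four checks fit in a short sketch. The graph format (module name mapped to the set of modules it imports) and the function names are assumptions for illustration; bridge detection is left out because it needs an articulation-point algorithm that would not fit here.

```python
from collections import defaultdict


def analyze(graph, hub_threshold=50):
    """Return (orphans, hubs) for a graph of module -> set of imported modules."""
    nodes = set(graph) | {dep for deps in graph.values() for dep in deps}
    imported_by = defaultdict(set)
    for mod, deps in graph.items():
        for dep in deps:
            imported_by[dep].add(mod)

    # Orphan: imports nothing and is imported by nothing.
    orphans = sorted(n for n in nodes if not graph.get(n) and not imported_by[n])
    # Hub: imported by at least `hub_threshold` other modules.
    hubs = sorted(n for n in nodes if len(imported_by[n]) >= hub_threshold)
    return orphans, hubs


def find_cycle(graph):
    """Return one import cycle as a list of modules, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)
    stack = []

    def dfs(node):
        color[node] = GREY
        stack.append(node)
        for dep in sorted(graph.get(node, ())):
            if color[dep] == GREY:
                # Back edge: the cycle is the stack from `dep` onward.
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = dfs(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None
```

The depth-first search is recursive for readability; a graph with very long import chains would want an iterative version to stay clear of Python's recursion limit.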

4. Gate Enforcement: CiEnforcer
Three automated gates run before any commit lands:

Dead code gate: any new truly_dead module blocks the build. (Standalone modules are exempt.)
Contract gate: any public function without a contract entry blocks the build.
Integration gate: any new cycle or orphan blocks the build.
The gates are enforced by the CI pipeline — not by human code review. This is crucial because humans get tired, miss things, or make exceptions. Automated gates don't.
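
One way to picture the three gates is as plain functions that return a list of failure messages, combined into a process exit code. The sketch below assumes the scanner and graph outputs are already available as dicts and lists; all function names and input shapes are hypothetical.

```python
def dead_code_gate(before, after):
    """Fail on modules that became truly_dead in this change.

    `before` and `after` map module name -> classification label.
    Standalone modules carry their own label, so they never match.
    """
    was_dead = {m for m, label in before.items() if label == "truly_dead"}
    now_dead = {m for m, label in after.items() if label == "truly_dead"}
    return [f"new dead module: {m}" for m in sorted(now_dead - was_dead)]


def contract_gate(public_functions, contract_entries):
    """Fail on any public function that has no contract entry."""
    missing = set(public_functions) - set(contract_entries)
    return [f"no contract entry: {name}" for name in sorted(missing)]


def integration_gate(before_issues, after_issues):
    """Fail on cycles or orphans that were not present before the change."""
    new_issues = set(after_issues) - set(before_issues)
    return [f"new integration issue: {issue}" for issue in sorted(new_issues)]


def run_gates(*gate_results):
    """Combine gate results into an exit code: 0 passes, 1 blocks the build."""
    failures = [msg for result in gate_results for msg in result]
    for msg in failures:
        print("BLOCKED:", msg)
    return 1 if failures else 0
```

A CI step would pass the value of `run_gates(...)` to `sys.exit`, so any non-empty gate result fails the pipeline without a human in the loop.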

5. Technical Debt Tracking: DebtTracker
Zero tolerance. Any module flagged as dead, any contract violation, any integration gap: each gets a debt item with a timestamp and severity. The CI gate treats all debt as blocking.

This sounds extreme. In a team setting, you'd negotiate debt. In a solo autonomous system, there's nobody to negotiate with. Either the debt is real (fix it now) or the rule is wrong (change the rule). No exceptions.
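
As a sketch of what "all debt is blocking" means in code: the class below records items with a timestamp and severity but never consults the severity when deciding whether to block. The field names and the severity scale are assumptions, not taken from ARES.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DebtItem:
    module: str
    kind: str      # e.g. "dead_module", "contract_violation", "integration_gap"
    severity: str  # hypothetical scale: "low", "medium", "high"
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class DebtTracker:
    def __init__(self):
        self._open = []

    def record(self, module, kind, severity):
        """Open a new debt item and return it so it can be resolved later."""
        item = DebtItem(module, kind, severity)
        self._open.append(item)
        return item

    def resolve(self, item):
        """Close an item once the underlying problem is fixed."""
        self._open.remove(item)

    def is_blocking(self):
        # Zero tolerance: severity is kept for triage, never used to waive an item.
        return bool(self._open)
```

The only ways to unblock are to call `resolve` after a fix or to change the rule that produced the item, which mirrors the two options described above.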

6. Runtime Monitoring: ActivityMonitor
Static analysis tells you what the code says. Runtime monitoring tells you what it does.

ActivityMonitor hooks into the production event bus and counts every inter-module call. A module that appears alive in the import graph but has zero runtime calls in 24 hours is fake-alive in the runtime sense: it exists, it's wired up, but nothing actually uses it.
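
A minimal sketch of the counting logic, assuming the event bus can deliver a callback per inter-module call with a caller, a callee, and a timestamp. The `on_event` signature and the module names in the usage note are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class ActivityMonitor:
    def __init__(self, window=timedelta(hours=24)):
        self.window = window
        self._calls = defaultdict(list)  # callee module -> call timestamps

    def on_event(self, caller, callee, at):
        """Assumed event-bus callback: one inter-module call was observed."""
        self._calls[callee].append(at)

    def call_count(self, module, now):
        """Number of calls into `module` inside the trailing window."""
        cutoff = now - self.window
        return sum(1 for t in self._calls.get(module, ()) if t > cutoff)

    def fake_alive(self, imported_modules, now):
        """Modules that are wired into the import graph but had no calls in the window."""
        return sorted(m for m in imported_modules if self.call_count(m, now) == 0)
```

For example, if `athena` was called an hour ago and `risk` was last called 30 hours ago, `fake_alive(["athena", "risk"], now)` returns `["risk"]`. A long-running process would also need to prune old timestamps, which this sketch skips.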

7-12. Lifecycle, Self-Destruct, Negotiation, Twin, Patterns, Health
The remaining modules handle what happens after you detect a problem:

LifecycleManager: four-stage module lifecycle (active → deprecated → removed → archived). No module gets deleted without going through this pipeline.
SelfDestructManager: orchestrates the deactivation of silent modules. No Python file just gets rm -rf'd — that breaks imports. Deactivation means: log the dependency tree, verify no live callers, move to archive, regenerate contracts, verify the build still passes.
NegotiationRegistry: before a module calls another module's function, it checks the version contract. If the callee's interface has changed, the caller gets a warning before production breaks.
DigitalTwin: a sandboxed copy of the dependency graph where you can simulate removing a module and see what breaks. Like git branch for architecture changes.
PatternMiner: five anti-pattern detection engines — circular dependency, god module (too many imports), shotgun surgery (too many callers), feature envy (importing across unrelated subsystems), and unstable dependency (depending on a module that changes frequently).
HealthScorer: 0-100 score per module based on four weighted factors: references (30%), contract compliance (20%), test coverage (25%), runtime call count (25%).
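
Two of these are easy to make concrete. The sketch below shows the HealthScorer weighting and a DigitalTwin-style removal simulation. It assumes each health factor has already been normalized to the range 0.0 to 1.0 (the post does not say how raw counts are scaled), and the function names are hypothetical.

```python
import copy

# Weights in percent, as listed for HealthScorer; they sum to 100.
WEIGHTS = {"references": 30, "contract": 20, "coverage": 25, "runtime": 25}


def health_score(factors):
    """Weighted 0-100 score; `factors` maps each name in WEIGHTS to a 0.0-1.0 value."""
    total = sum(
        weight * min(max(factors[name], 0.0), 1.0)  # clamp out-of-range inputs
        for name, weight in WEIGHTS.items()
    )
    return round(total)


def simulate_removal(graph, module):
    """Return the modules whose imports would break if `module` were removed.

    `graph` maps module -> set of imported modules. The simulation runs on a
    deep copy, so the live graph is never modified.
    """
    twin = copy.deepcopy(graph)
    twin.pop(module, None)
    return sorted(m for m, deps in twin.items() if module in deps)
```

A module with full references and contract compliance but no test coverage and no runtime calls scores 50, which shows how much of the score depends on signals that static analysis alone cannot provide.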
What I Got Wrong
Here's the honest part.

For months, ARES reported: zero dead modules, 100% contract compliance, health score 100, all six engines rated 3/3.

This was a lie. Not an intentional lie — a structural lie.

When the same person designs the scoring criteria, writes the scoring tools, runs the scoring process, and interprets the results, the output is guaranteed to look good. I had built a closed evaluation loop — a system that validates itself against standards it defined for itself.

This is not unique to my project. Every AI system that evaluates its own safety without external reference points has this problem. If you define "safe" as "passes my safety tests," and you wrote the safety tests, you haven't proven safety — you've proven that your tests match your assumptions.

The only way to break the loop is to expose the system to an evaluation framework you don't control.

For a trading system, that's the market. Sharpe ratios don't care about your self-assessment. For an AI safety tool, that's external users. GitHub issues, bug reports, people using your tool in ways you didn't anticipate.

What I'm Doing About It
I extracted the first module — SilenceScanner — as a standalone open-source tool:

pip install dead-scanner
dead-scanner /path/to/your/project
GitHub: github.com/Nick-lll/dead-scanner

It's MIT licensed, zero dependencies, pure Python. It works on any Python project. It's not the most sophisticated module in ARES — ContractVerifier and PatternMiner are deeper — but it's the one that's easiest to verify independently. Download it, run it, and in 30 seconds you know whether it's useful.

I expect to be wrong about a lot of things. Maybe dead module detection isn't as valuable as I think. Maybe the classification taxonomy needs adjustment. Maybe the whole approach of static analysis + graph theory is the wrong frame.

That's exactly why I'm doing this. I need to be wrong in public, with data, so I can become less wrong. The alternative — being wrong in private while the system tells me everything is 100% — is far worse.

Connection to AI Safety
I didn't build ARES because I read AI safety papers. I built it because managing a 380,000-line autonomous system alone forced me to solve problems that the AI safety community has been thinking about for years:

Behavioral specification: how do you verify an agent does what it claims?
Containment: how do you safely decommission an agent without breaking the system?
Observability: how do you know if any part of the system has silently stopped working?
Dependency integrity: how do you prevent a failure in one agent from cascading to others?
These are not theoretical problems when you have 60 strategies running autonomous trading loops. A "hallucination" in this context isn't a weird chatbot response — it's a trade at the wrong size, or a position held through a stop-loss because the risk module was silently deactivated.

ARES is not a solution to AI safety. It's a set of concrete engineering patterns that address a subset of AI governance problems at the module/agent level. I'm sharing them because I suspect other people building autonomous agent systems are hitting similar walls, and the patterns might be useful even if the implementation is imperfect.

What I'm Looking For
I'm not selling anything. I'm not raising money. I'm looking for:

Technical feedback: what am I wrong about? What would you do differently?
Related work: are there papers, projects, or people I should be reading?
Collaboration: if you're working on AI agent governance, observability, or safety infrastructure, I want to talk.
If you're at Anthropic, DeepMind, or any team working on making AI systems more governable — I'd especially like to hear from you. I built this in isolation. I know there's a lot I'm missing. I want to learn from people who've been thinking about these problems longer than I have.

I built a trading system that doesn't make money. But the governance layer I built to manage it — contract compliance, lifecycle management, dead code detection, dependency graph analysis, pattern mining, digital twin simulation — might actually be useful to people building autonomous AI systems. I open-sourced the first module. If this resonates, reach out: github.com/Nick-lll