Why I'm Building Trust Infrastructure as a Solo Founder

Ryo Hoshi — Tue, 16 Jun 2026 07:50:21 +0000

I run two AI coding agents simultaneously when I work. Claude Code handles architecture decisions and code review. Codex runs parallel implementation tasks — writing tests, scaffolding modules, handling migrations. On a productive day, they make hundreds of decisions between them.

I have no idea what most of those decisions are.

I'm not exaggerating. I set the direction. I review the outputs. But the intermediate reasoning — why Claude Code chose this abstraction over that one, why Codex structured a test suite in a particular way, what trade-offs each agent silently made — disappears the moment the context window rolls over. I get the artifacts. I lose the logic.

This is my daily reality, and I'm a technical founder who chose this workflow. Imagine an enterprise with fifty teams, each running their own agents, with no coordination, no audit trail, no shared governance model. That's not a hypothetical. That's most organizations adopting AI agents right now.

I wrote about this problem recently — I called it Agent Dark Matter. The invisible mass of unrecorded, unmonitored agent decisions that shapes your organization's outcomes without anyone knowing.

The response confirmed what I suspected: this resonates because it's real. Developers know it. Engineering managers feel it. Nobody has named it until now, and nobody is building the infrastructure to solve it.

So I'm building it.

Why solo

The honest answer: because the infrastructure layer I'm describing doesn't lend itself to a venture-funded sprint. Trust Infrastructure has to be open source — full stop. If the tool that governs your AI agents is itself a black box, you've solved nothing. It has to be built in a language that takes security seriously, which is why I chose Rust. And it has to be designed by someone who uses AI agents every day, not by someone who theorizes about them from a slide deck.

I am my own first user. I'm building deAria — open-source Trust Infrastructure for AI agents — using the very AI coding agents that deAria is designed to govern. Every friction I hit, every moment where I think "I have no idea what that agent just decided," becomes a design requirement. Every workflow I run without governance becomes proof that the problem is real.

This is the recursive loop: deAria governs AI agents. AI agents build deAria. The development process is the product's living proof.

The dogfooding loop:
Build deAria → using AI coding agents → which produce invisible decisions → which reveal design requirements → which feed back into building deAria → ♻️

What I'm not building

I'm not building another agent framework. There are enough of those. I'm not building another LLM wrapper or another chatbot platform.

I'm building the layer that sits beneath all of those — the infrastructure that makes agent activity visible, auditable, and governable. The same way Kubernetes doesn't replace containers but makes containers manageable at scale, Trust Infrastructure doesn't replace agents. It makes agents trustworthy at scale.

What comes next

I'm building in public. The architecture decisions, the trade-offs, the mistakes — all of it will be documented here. If you're running AI agents in production and feeling the gravitational pull of decisions you can't see, you're experiencing Agent Dark Matter. And you're exactly who I'm building this for.

The code is Rust. The license will be open. The first milestone is a working Decision-Aware Runtime that can record every decision an AI agent makes, and let you define policies for what decisions require human approval.
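To make that milestone concrete, here is a minimal sketch of what "record every decision, and flag the ones that need a human" could look like. It is not deAria's actual API: the names Runtime, Decision, and Approval, and the idea of matching actions by prefix, are assumptions made up for illustration.

```rust
use std::time::SystemTime;

/// Whether a recorded decision has to wait for a human.
#[derive(Debug, Clone, PartialEq)]
pub enum Approval {
    NotRequired,
    Pending,
}

/// One recorded decision: who made it, what they chose, and why.
#[derive(Debug, Clone)]
pub struct Decision {
    pub agent: String,
    pub action: String,
    pub rationale: String,
    pub recorded_at: SystemTime,
    pub approval: Approval,
}

/// Hypothetical runtime: an append-only decision log plus a list of
/// action prefixes that require human sign-off.
pub struct Runtime {
    log: Vec<Decision>,
    approval_prefixes: Vec<String>,
}

impl Runtime {
    pub fn new(approval_prefixes: &[&str]) -> Self {
        Self {
            log: Vec::new(),
            approval_prefixes: approval_prefixes.iter().map(|p| p.to_string()).collect(),
        }
    }

    /// Records the decision unconditionally, then reports whether the
    /// action must be held until a human approves it.
    pub fn record(&mut self, agent: &str, action: &str, rationale: &str) -> Approval {
        let needs_human = self
            .approval_prefixes
            .iter()
            .any(|p| action.starts_with(p.as_str()));
        let approval = if needs_human {
            Approval::Pending
        } else {
            Approval::NotRequired
        };
        self.log.push(Decision {
            agent: agent.to_string(),
            action: action.to_string(),
            rationale: rationale.to_string(),
            recorded_at: SystemTime::now(),
            approval: approval.clone(),
        });
        approval
    }

    /// Read-only view of everything recorded so far.
    pub fn decisions(&self) -> &[Decision] {
        &self.log
    }
}
```

With a runtime created as Runtime::new(&["db.migrate", "billing."]), a call recording a "db.migrate" action comes back as Pending while a "test.write" action comes back as NotRequired, and both end up in the log either way.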

Let's illuminate the dark matter.

I'm building open-source Trust Infrastructure for AI agents at dearia.dev. Read the full problem statement: Agent Dark Matter: The Invisible Crisis in Your AI Stack.

Agent Dark Matter: The Invisible Crisis in Your AI Stack

Ryo Hoshi — Tue, 16 Jun 2026 05:47:54 +0000

Last Tuesday, a senior engineer at a mid-stage startup noticed something strange. The production database schema had changed overnight. A new column had appeared in the users table. The migration had been applied cleanly — no errors, no rollback.

Nobody on the team had written the migration.

She checked the CI/CD logs. Clean. She checked Slack. No deploy notifications. She checked the git blame. The commit existed, authored by the team's AI code review agent. It had reviewed a pull request that included a schema suggestion, determined the change was safe, approved the PR, and triggered the migration pipeline — all autonomously, all between 2:47 AM and 2:49 AM, while every human on the team was asleep.

The change happened to be benign. This time.

But the question that kept her up the next night wasn't about the schema. It was simpler and more unsettling: How many other decisions has this agent made that I don't even know about?

If you run AI agents in production — or even in development — you should be asking yourself the same question. Because the answer is almost certainly: more than you think.

The universe has a name for what you can't see

In astrophysics, dark matter is the invisible mass that makes up roughly 27% of the universe's total mass-energy. You can't see it. You can't detect it directly. But you know it's there because of its gravitational effects: galaxies rotate in ways that only make sense if something massive and invisible is pulling on them.

Your organization has its own dark matter problem.

AI agents are proliferating across engineering teams, customer support pipelines, data processing workflows, and internal tooling. They approve pull requests. They classify support tickets. They route alerts. They draft responses. They make thousands of small decisions every day, each one shaping outcomes that humans downstream inherit and act on.

Most of this activity is invisible. Not because anyone is hiding it, but because nobody built the infrastructure to make it visible. There are no audit logs for agent reasoning. No dashboards for agent decision volume. No policies that define what an agent should not be allowed to do.

I call this Agent Dark Matter: the aggregate of all AI agent activity within an organization that is unrecorded, unmonitored, and uncontrolled — yet exerts real gravitational pull on business outcomes.

The term isn't arbitrary. Gartner coined "Dark Data" years ago to describe the information organizations collect but never analyze. Agent Dark Matter is the operational cousin — not data sitting idle in storage, but decisions being made in the dark by non-human actors. And unlike dark data, which is passive, agent dark matter is active. It's doing things. Right now.

Three symptoms you're probably already experiencing

Agent Dark Matter manifests in three distinct ways. If any of these feel familiar, you're not alone.

The first symptom is invisibility. Your agents make decisions, but those decisions aren't recorded as decisions. They show up as side effects — a merged PR, a resolved ticket, a reclassified data point — without any trace of the reasoning that led there.

Consider a code review agent integrated with your repository. It scans incoming pull requests, evaluates code quality, checks for security patterns, and posts an approval or rejection. The approval appears in your git history as a status check. But the reasoning — why it judged the code safe, what patterns it weighed, what edge cases it considered and dismissed — is gone the moment the LLM context window clears. You have the outcome. You have none of the logic.

The second symptom is unauditability. Even when some record exists, you can't reconstruct the chain of reasoning. An AI customer service agent escalates a ticket from Priority 3 to Priority 1. The escalation is logged. But when your VP of Customer Success asks "why was this escalated?" the honest answer is: nobody knows. The agent's internal reasoning — the weights it assigned to sentiment signals, the pattern it matched against historical escalations, the threshold it applied — is opaque. The best you can offer is "the AI decided."

In regulated industries, "the AI decided" is not an acceptable answer. In any industry, it shouldn't be.

The third symptom is ungovernability. There is no mechanism to define boundaries for agent behavior. You cannot say: "This agent may read from the production database but never write to it." You cannot say: "This agent requires human approval for any action affecting billing." You cannot say: "No agent may communicate with external APIs after business hours." Not because these rules are unreasonable, but because the infrastructure layer to express and enforce them doesn't exist.
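The three rules quoted above are simple enough to write down as code, which is part of why their absence is frustrating. The sketch below is one possible encoding, not an existing policy engine: the Target, Op, and Verdict types are hypothetical, and "business hours" is assumed to mean 09:00 to 18:00 local time.

```rust
/// What the agent is trying to touch (hypothetical categories).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Target {
    ProductionDb,
    Billing,
    ExternalApi,
    Other,
}

/// The kind of operation requested.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Read,
    Write,
    Call,
}

/// The policy engine's answer, given before the action runs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Allow,
    RequireHumanApproval,
    Deny,
}

pub struct Request {
    pub target: Target,
    pub op: Op,
    /// Local hour of day, 0 through 23.
    pub hour: u8,
}

pub fn evaluate(req: &Request) -> Verdict {
    match (req.target, req.op) {
        // "May read from the production database but never write to it."
        (Target::ProductionDb, Op::Read) => Verdict::Allow,
        (Target::ProductionDb, _) => Verdict::Deny,
        // "Requires human approval for any action affecting billing."
        (Target::Billing, _) => Verdict::RequireHumanApproval,
        // "No agent may communicate with external APIs after business hours."
        // Assumption: business hours are 09:00 to 18:00.
        (Target::ExternalApi, _) if !(9..18).contains(&req.hour) => Verdict::Deny,
        // Everything else is allowed in this sketch; a real system might default to Deny.
        _ => Verdict::Allow,
    }
}
```

The interesting design question is the last arm: defaulting to Allow keeps agents productive, while defaulting to Deny is the safer posture that IAM-style systems usually take.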

Each of these symptoms — invisible, unauditable, ungovernable — compounds the others. An invisible decision can't be audited. An unauditable decision can't be governed. And an ungoverned agent will, by definition, produce more invisible decisions. It's a flywheel, and it's accelerating.

The Dark Matter flywheel:
Invisible decisions → can't be Audited → can't be Governed → produces more Invisible decisions → ♻️

The numbers behind the fog

This isn't a theoretical risk. The data is already in.

Gartner predicts that over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value, and — critically — inadequate risk controls. That last phrase is doing a lot of work. "Inadequate risk controls" is analyst language for "we deployed agents without the infrastructure to govern them, and now we can't prove they're doing what we want."

IDC, in a study sponsored by Microsoft, projects 1.3 billion AI agents will be in circulation by 2028. Not chatbots. Not autocomplete features. Autonomous or semi-autonomous agents making decisions and taking actions in enterprise environments. 1.3 billion of them. Microsoft has already shipped Agent 365 — a control plane for managing agents — because even they recognize that deploying agents without management infrastructure is untenable.

Forrester predicts that 75% of enterprises that try to build agentic AI architectures on their own will fail. They'll hire consultants. They'll run pilots. And they'll discover that the hard part isn't building the agent; it's building everything around the agent.

Meanwhile, Barclays estimates that global AI compute capacity can support 15 to 22 billion AI agents. The ceiling is high. The guardrails are low.

Put these numbers together and the picture is stark: agent count is exploding, governance infrastructure is nearly nonexistent, and the gap between the two is widening every quarter. This is not a future problem. This is a now problem with a future that gets exponentially worse.

Why observability isn't enough

At this point, a reasonable engineer might say: "We have observability tools. We have LangSmith. We have Datadog. We have tracing. Isn't this already solved?"

No. And the reason matters, because it's the difference between watching something happen and being able to control whether it should happen at all.

Observability is retrospective. Governance is prospective.

Observability answers: What happened? Governance answers: What should be allowed to happen?

This is not a subtle distinction. It is the same distinction that separates CloudWatch from IAM. One shows you server metrics. The other controls who can access what. No one would argue that having CloudWatch means you don't need IAM. Yet in the AI agent ecosystem, we are collectively building the telescope and ignoring the guardrails.

Observability vs. Governance

            Observability            Governance
Question    What happened?           What should be allowed?
Timing      After the fact           Before it happens
Objects     Logs, traces, metrics    Policies, gates, constraints
Analogy     CloudWatch               IAM
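The timing difference in that comparison is easy to see in code. This is a rough sketch with made-up function names (observe and govern), not any real tool's API: the observability path can only note that something ran, while the governance path consults a policy first and never runs a blocked action at all.

```rust
/// Observability: the action has already run; all we can do is note that it did.
pub fn observe(log: &mut Vec<String>, action: &str) {
    log.push(format!("executed: {action}"));
}

/// Governance: the policy is consulted first, and the action runs only if allowed.
/// `allowed` stands in for whatever policy check a real system would use.
pub fn govern<T>(
    allowed: impl Fn(&str) -> bool,
    action: &str,
    run: impl FnOnce() -> T,
) -> Result<T, String> {
    if allowed(action) {
        Ok(run())
    } else {
        // The closure is dropped without ever being called.
        Err(format!("blocked by policy: {action}"))
    }
}
```

With a policy such as |a: &str| !a.starts_with("prod."), calling govern for "prod.migrate" returns an error and the migration closure is never invoked, which is exactly the guarantee a trace viewer cannot give you.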

The problem goes deeper than timing. Existing observability tools — excellent tools, to be clear — record events and metrics. An API was called. Latency was 230ms. Token count was 4,200. Cost was $0.03.

But AI agents don't just produce events. They produce decisions. A decision is not the same as an event. An event is a factual record: "this happened." A decision is a reasoning artifact: "the agent considered these inputs, applied this logic, and concluded that this action was appropriate." A recorded event is a settled fact. A decision made by an LLM is inherently non-deterministic: the same inputs can produce different conclusions on different runs.
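One way to picture the difference is as two record types. This is an illustrative sketch only; the field names are my own assumptions, not a schema from any existing tool. The event has nothing to say about alternatives, while the decision carries the inputs, the options weighed, and the rationale.

```rust
/// An event: a factual record that something happened.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub name: String,
    pub latency_ms: u32,
    pub tokens: u32,
}

/// A decision: what the agent looked at, what it could have done,
/// what it chose, and why.
#[derive(Debug, Clone, PartialEq)]
pub struct Decision {
    pub inputs: Vec<String>,
    pub options: Vec<String>,
    pub chosen: String,
    pub rationale: String,
}

impl Decision {
    /// Options the agent considered and dismissed.
    pub fn rejected(&self) -> Vec<&str> {
        self.options
            .iter()
            .map(String::as_str)
            .filter(|o| *o != self.chosen.as_str())
            .collect()
    }

    /// True when two runs saw the same inputs but reached different conclusions,
    /// which is the non-determinism an event log alone would not reveal.
    pub fn diverges_from(&self, other: &Decision) -> bool {
        self.inputs == other.inputs && self.chosen != other.chosen
    }
}
```

Once decisions are stored in this shape, "what did the agent reject?" and "did it answer differently last time?" become ordinary queries instead of guesses.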

This distinction has consequences. If you want to audit an agent's behavior, tracing its API calls tells you what it did, not why it did it. If you want to constrain an agent's behavior, monitoring its outputs tells you when it went wrong, not how to prevent it from going wrong in the first place.

Look at the landscape honestly:

LangSmith and LangFuse trace LLM calls, measure latency, and track cost. They are good at this. They cannot tell you whether a given decision should have been permitted in the first place.

Datadog and New Relic monitor infrastructure metrics and application performance. They are excellent at this. They cannot audit the reasoning logic of an AI agent that decided to approve a financial transaction.

Arize and WhyLabs detect model drift and data quality issues. They are vital for ML operations. They cannot enforce a policy that says "this agent must request human approval before modifying production data."

These tools aren't failing. They're answering a different question. The question they answer is: "Is the system healthy?" The question nobody is answering is: "Should this agent be allowed to do what it's about to do?"

The gap in the current tool landscape:

✅ Agent frameworks (LangChain, CrewAI, AutoGen) → build agents
✅ Agent observability (LangSmith, LangFuse, Arize) → watch agents
❌ Agent governance → govern agents (decision recording, policy enforcement, audit trails, HITL gates)

Answering that second question, whether an agent should be allowed to act, requires a different kind of infrastructure entirely. Not another dashboard. Not another trace viewer. An infrastructure layer that treats agent decisions, not just agent actions, as first-class objects that can be recorded, queried, constrained, and audited.

What's missing isn't another framework

Let me be specific about what I'm not arguing for.

I am not arguing for another AI agent framework. We have plenty. The world does not need framework number forty-seven for orchestrating LLM calls.

I am not arguing for another monitoring tool. The monitoring space is well-served.

What I am arguing for is a layer that doesn't exist yet: Trust Infrastructure — the infrastructure that makes AI agent activity visible, auditable, and governable.

Think about what Kubernetes did for containers. Before Kubernetes, containers worked. Docker was real. You could run containers in production. But you couldn't manage containers at scale — scheduling, health checks, scaling, networking, service discovery. Kubernetes didn't replace containers. It made containers manageable. It made containers something you could trust in production.

AI agents are at the same inflection point containers were at in 2014. The agents work. The frameworks exist. The models are capable. But the infrastructure to manage agents at scale — to know what they're doing, to prove what they did, to define what they're allowed to do — is missing.

This is the layer the industry needs to build. Not the agent itself, but everything around the agent. The runtime that records every decision. The policy engine that constrains behavior before it happens. The audit trail that proves compliance after the fact. The orchestration layer that coordinates multiple agents with human oversight at the right checkpoints.

Trust Infrastructure isn't about slowing agents down. It's about making it possible to speed them up responsibly. You can give an agent more autonomy when you can prove what it's doing. You can deploy agents to more sensitive workflows when you can enforce boundaries. You can scale agent fleets when you can govern them.

Without this layer, every new agent you deploy adds to your organization's dark matter. With it, you illuminate.

Look around you

I'll leave you with three questions. Answer them honestly.

First: How many AI agents are currently operating within your organization? Not the ones you personally deployed — all of them. Across every team. Every integration. Every developer who spun up a coding assistant with production access.

Second: Can you enumerate the decisions those agents made in the last 24 hours? Not their outputs. Their decisions — the moments where they evaluated options and chose a course of action.

Third: For each of those decisions, who approved it? Which human was in the loop? What policy governed what the agent was and wasn't allowed to do?
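If decisions were recorded at all, these three questions would be short queries. The sketch below assumes a hypothetical in-memory log of Decision records with approved_by and policy fields; neither the record shape nor the function names come from an existing product.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, SystemTime};

/// Hypothetical decision record with the fields the three questions need.
pub struct Decision {
    pub agent: String,
    pub action: String,
    pub made_at: SystemTime,
    pub approved_by: Option<String>,
    pub policy: Option<String>,
}

/// Question 1: which agents have been making decisions?
pub fn agents(log: &[Decision]) -> BTreeSet<&str> {
    log.iter().map(|d| d.agent.as_str()).collect()
}

/// Question 2: which decisions were made within `window` before `now`?
/// Decisions timestamped after `now` are excluded.
pub fn recent<'a>(log: &'a [Decision], now: SystemTime, window: Duration) -> Vec<&'a Decision> {
    log.iter()
        .filter(|d| now.duration_since(d.made_at).map_or(false, |age| age <= window))
        .collect()
}

/// Question 3: which decisions had no human approver and no governing policy?
pub fn ungoverned(log: &[Decision]) -> Vec<&Decision> {
    log.iter()
        .filter(|d| d.approved_by.is_none() && d.policy.is_none())
        .collect()
}
```

Calling recent(&log, SystemTime::now(), Duration::from_secs(24 * 3600)) answers the second question, and the length of ungoverned(&log) is a rough measure of how much dark matter the log contains.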

If you can't answer these questions — and almost nobody can — then Agent Dark Matter is already in your organization. It's not a theoretical risk. It's not a future concern. It's the current state of your AI stack, right now, today.

The first step is acknowledging the problem has a name. The second step is building the infrastructure to solve it.

We have a lot of building to do.

I'm building open-source Trust Infrastructure for AI agents at dearia.dev. If Agent Dark Matter resonates with your experience, I'd love to hear about it in the comments.

DEV Community: Ryo Hoshi