Kirk Patrick
What Nobody Tells You About Building a Protocol for AI Agents

For the past few months, I've been building ARSIA Protocol as a part-time open source project: an open compliance layer for AI agents, designed to sit above MCP (Anthropic) and A2A (Google).

The first ideas came in July 2025. Back then it was just a question: what if there was a compliance layer that sat above agent communication protocols? For months, that's all it was: an idea slowly taking shape. I'd study the regulatory landscape, read the EU AI Act, sketch mental models, scribble notes. Weekend conversations about what the architecture could look like. No code, no repo, no rush. The idea needed time to mature, and I let it.

By November I had rough concept drafts. By January the mental model was solid enough to start testing assumptions on paper. But it wasn't until March 2026, with the architecture firmly settled in my head, that I sat down and wrote the first real draft of the specification. That incubation period, months of thinking before building, turned out to be one of the best decisions I made.

Two people. Six specs. Two SDKs. A CLI. A server. 900+ tests.

Here's what was actually hard.

1. You'll rewrite the architecture at least twice

We started with the obvious model: the agent implements the protocol. Discovery endpoints, EdDSA signing, compliance fields, audit trail, all inside the agent code.

It took us weeks to realize this was backwards. A developer with a working LangGraph or CrewAI agent doesn't want to rewrite it. Nobody does. MCP didn't succeed because of its wire protocol. It succeeded because it hides that protocol from the developer.

So we pivoted. Hard. We built a sidecar proxy (ARSIA Client) and a gateway (ARSIA Server). The agent never touches the protocol. The developer changes one environment variable. The organization deploys a server. Done.
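The sidecar model is easy to sketch. Here's a minimal illustration of the "one environment variable" idea; the variable name `ARSIA_PROXY_URL` and the port are my own placeholders, not the real client's configuration:

```python
import os

def resolve_base_url(default_url: str) -> str:
    """Pick the endpoint the agent talks to.

    If the (hypothetical) ARSIA_PROXY_URL variable is set, all traffic
    goes through the local sidecar, which signs, audits, and forwards
    requests on the agent's behalf. Otherwise the agent talks to the
    upstream API directly, exactly as it did before the pivot.
    """
    return os.environ.get("ARSIA_PROXY_URL", default_url)

# Unmodified agent: direct call to the tool endpoint.
base = resolve_base_url("https://api.github.com")

# With the sidecar deployed, the operator sets one variable, e.g.:
#   export ARSIA_PROXY_URL=http://localhost:8402
# and the same agent code is now routed through the compliance layer.
```

The point of the pattern is that the agent code has no idea the protocol exists; the routing decision lives entirely in deployment configuration.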

That pivot invalidated dozens of design decisions, hundreds of tests, and an entire CLI workflow. We rebuilt it anyway.

2. Specs are never "done"

We started with 8 specification documents. Then we realized two of them (Compliance and Onboarding) didn't deserve to be standalone; their content belonged inside the other five. So we merged them. That meant rewriting every cross-reference across six documents, verifying 34 files, and hunting for stale links that pointed to specs that no longer existed.

A protocol spec isn't code. You can't run a linter on normative language. When section 4.3.6 of Core references section 7 of State, and you renumber State's sections during a merge, nothing breaks, until an implementer reads it six months later and builds the wrong thing.
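You can't lint normative language, but you can lint cross-references. A minimal sketch of the idea (my own illustration, not ARSIA tooling): given each spec's surviving section numbers, flag any reference that points at a section that no longer exists.

```python
import re

# Inventory of sections each spec actually contains after the merge.
# (Illustrative data: State's Section 7 was renumbered away.)
SECTIONS = {
    "Core": {"1", "2", "3", "4.3.6"},
    "State": {"1", "2", "3", "4", "5", "6"},
}

# Matches references like "section 4.3.6 of Core" or "Section 7 of State".
REF = re.compile(r"[Ss]ection (\d+(?:\.\d+)*) of (\w+)")

def stale_refs(text: str) -> list[tuple[str, str]]:
    """Return (section, spec) pairs that point at nonexistent sections."""
    return [
        (num, spec)
        for num, spec in REF.findall(text)
        if spec in SECTIONS and num not in SECTIONS[spec]
    ]
```

A check like this won't catch a reference that points at the *wrong* section, only one that points at a missing section, but it turns the silent failure mode into a CI error.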

We found 8 real gaps in our own spec after publishing Draft-01. Token refresh unspecified. Audit trail queries with no cross-agent authentication. GDPR data portability with no defined export format. WebSocket messages with no audit mapping. Each one small enough to ship without, dangerous enough to bite someone in production.

3. Naming will haunt you

We named our Python import namespace arsia.*. Simple, clean. Then we discovered another company had "arsia" registered at the EU trademark office. Class 9. Software.

So we migrated everything. arsia.* became arsiaprotocol.*. The CLI went from arsia to arsiactl. Agent IDs, payload types, capability strings, PyPI package names, npm scopes, all of it. Over 100 files touched across three repositories. One migration, zero tolerance for leftover references.

The lesson: pick your canonical names on day one, search every trademark registry, and never use a short unqualified namespace.

4. Part-time open source, full-time complexity

There is no QA team. There is no DevRel team. There is no product manager. There are two people building this in their free time: writing specs, building SDKs in Python and TypeScript, creating Docker demos, writing a CLI with four namespaces, building a server with a six-guard enforcement pipeline, and preparing a public comment for NIST.

My routine looks like this: I have two young kids. I put them to bed, and then around 10 PM I sit down and work until 2 or 3 AM at least a few days a week. On weekends I sometimes go through the entire night. Not because I have to. Because I get so absorbed that I lose track of time. The protocol pulls you in. There's always one more cross-reference to verify, one more edge case to handle, one more test to write.

During the day, I'm fully dedicated to my job. But I leave AI agents running in the background executing tasks, running test suites, checking automations. At lunch I glance at the results. Then I don't look again until the night session. The automated parts (tests, conformance checks, CI) run fine on their own. But the hard parts can't be delegated: architecture decisions, spec writing, prototyping new primitives. Those require deep, uninterrupted thought, and that's what the late nights are for.

5. How AI became my force multiplier

I have to be honest about this: building a protocol of this scope as a part-time project would have been nearly impossible without AI, specifically Claude. It fundamentally changed my prototyping speed.

When I have an architecture idea at midnight, I can prototype it in conversation, exploring trade-offs, stress-testing edge cases, generating test scaffolds, drafting spec language in a fraction of the time it would take solo. Claude doesn't replace the thinking. The seven months of incubation, the mental modeling, the architectural decisions are mine. But once I know what I want to build, AI accelerates the how dramatically. It's the difference between spending a weekend on a proof of concept and spending an hour.

The spec reorganization is a good example. Merging two specs into the existing five, updating every cross-reference across six documents, verifying consistency, that's tedious, error-prone work that can take days. With AI assistance, it took hours. The creative decisions were still mine. The execution was radically faster.

For solo builders and small teams working on ambitious open source projects in their free time, AI isn't a luxury. It's what makes the project viable at all.

6. Nobody knows they need you yet

This is the hardest one. We built a protocol that solves EU AI Act compliance, GDPR audit trails, MiFID II retention rules, and human oversight signaling at the protocol level, not the application level. It's the kind of thing that European enterprises will be legally required to solve in the next 12 to 18 months.

But today, most AI agent developers don't know they have a compliance problem. They're building cool things with tool-calling and multi-agent workflows. Regulatory enforcement feels distant. Until it isn't.

So you're building infrastructure for a problem that's real but not yet felt. You're competing against "just ship it" culture with a product that says "ship it, but with an audit trail." That's a tough sell, until the first fine lands.

What kept us going

Honestly? Those 2 AM moments when arsiactl scaffold agent spins up a working AI agent with a local LLM, makes a real GitHub API call through the compliance layer, and produces a signed, auditable envelope, all without the developer writing a single line of protocol code. That moment when the protocol becomes invisible is when you know the architecture is right. And it makes the next late-night session feel worth every lost hour of sleep.

ARSIA Protocol is open source (Apache 2.0). The specification (Draft-01) is available now. The SDKs for Python and TypeScript will be released in the coming days, and the ARSIA Client (sidecar proxy) and ARSIA Server (organizational gateway), the pieces that make compliance invisible to the developer, are coming soon after. If you're building AI agents for regulated markets, we'd love your feedback.

I hope you like it: arsiaprotocol.org | GitHub

Kirk Ferreira, Creator of ARSIA Protocol
