The agent gold rush is in full swing. Every major tech company has shipped one. Startups are building them by the hundreds. Open-source frameworks like LangChain, CrewAI, and AutoGen have made it trivially easy to spin up an agent that can browse the web, write code, or manage your calendar.
And yet, if you ask your coding agent to hand off a task to your scheduling agent, it stares at you blankly. If you want two agents from different vendors to coordinate on a project, you're looking at custom glue code, brittle API wrappers, and a lot of prayer.
We have a thousand agents. We have zero agent society.
The Island Problem
Think about how agents work today. Each one is a self-contained loop: perceive → think → act. They connect to the outside world through tool calls — API endpoints, browser automation, file system access. When an agent needs information from another system, it calls an API. When it needs to trigger an action, it calls another API.
This works fine for agent-to-service communication. But agent-to-agent? That's a fundamentally different problem.
When two humans collaborate, they don't just exchange API calls. They share context. They build on each other's understanding. They negotiate, delegate, and verify. They operate within social structures — teams, organizations, hierarchies — that determine who can see what and who can ask whom for help.
Current agents have none of this infrastructure. They're islands with HTTP bridges.
Why API Integration Isn't Enough
Let's say you have a research agent and a writing agent. The research agent finds relevant papers, extracts key findings, and builds a knowledge base. The writing agent takes briefs and produces drafts.
The naive approach: the writing agent calls the research agent's API, gets back a JSON blob of findings, and works from there.
Here's what breaks:
Context loss. The research agent spent 30 minutes building an internal representation of how these papers relate to each other, which claims are controversial, which sources are most reliable. None of that transfers through a flat API response. The writing agent gets data, not understanding.
No shared memory. If the writing agent discovers that a certain angle doesn't work and pivots, the research agent doesn't learn from this. Next time, it'll make the same recommendations. There's no feedback loop, no accumulated shared knowledge.
Permission blindness. In an organization, different agents handle different security domains. Your HR agent knows salary data. Your analytics agent knows customer behavior. When they need to collaborate on a workforce planning task, who decides what each can see? Today, it's all or nothing — full API access or no access.
No delegation semantics. "Hey research agent, I need you to go deeper on section 3" isn't an API call. It's a conversational act with implied context, priority, and expected format. Current tool-call interfaces can't express this naturally.
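To make the gap concrete, here is a minimal sketch of what a delegation message might carry beyond a bare API call. Every name in it (`Delegation`, `Priority`, `context_refs`, `expected_format`, the example IDs) is hypothetical and chosen for illustration; it is not taken from any existing protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class Priority(Enum):
    LOW = 1
    NORMAL = 2
    URGENT = 3


@dataclass
class Delegation:
    """Hypothetical delegation message: a request plus the context an API call drops."""
    sender: str
    recipient: str
    request: str
    context_refs: list[str] = field(default_factory=list)  # pointers to shared context
    priority: Priority = Priority.NORMAL
    expected_format: str = "markdown"

    def follow_up(self, request: str) -> "Delegation":
        """Build a follow-up that inherits context, priority, and format."""
        return Delegation(
            sender=self.sender,
            recipient=self.recipient,
            request=request,
            context_refs=list(self.context_refs),
            priority=self.priority,
            expected_format=self.expected_format,
        )


first = Delegation(
    sender="writing-agent",
    recipient="research-agent",
    request="Summarize the key findings for the draft",
    context_refs=["brief:42", "outline:section-3"],  # illustrative IDs only
    priority=Priority.URGENT,
)
deeper = first.follow_up("Go deeper on section 3")
```

The follow-up is a one-line request, but it arrives with the same context references, priority, and expected format as the original, which is the implied state a plain tool call would lose.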
What We Actually Need: A Social Layer
Human collaboration doesn't work through point-to-point API calls. It works through social infrastructure: shared workspaces, organizational hierarchies, communication norms, and knowledge commons.
Agents need the same thing. Not just connectivity, but a social layer — a way to form groups, share context with appropriate access control, build collective knowledge, and communicate with the richness that collaboration demands.
This isn't a new API gateway. It's a protocol-level rethinking of how agents relate to each other.
Here's what such a layer requires:
Organization-Level Memory
Agents in the same organization should have access to shared memories — not just databases, but contextual knowledge that flows between agents with permission-based access control. When your customer support agent learns that a particular client prefers email over Slack, your account management agent should know this too, without anyone writing an explicit sync job.
This means memories aren't just stored — they're shared within permission boundaries. An agent's understanding of a customer, a project, or a domain concept becomes organizational knowledge that other authorized agents can draw from.
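As a rough illustration of memories shared within permission boundaries, the sketch below uses a toy in-memory store. The class names, the scope strings, and the rule that writing needs the same grant as reading are all assumptions made for the example, not a description of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    subject: str   # what the memory is about, e.g. a client
    content: str
    scope: str     # hypothetical permission scope, e.g. "customer-facing"


class OrgMemory:
    """Toy in-memory store: agents only see memories in scopes they are granted."""

    def __init__(self) -> None:
        self._memories: list[Memory] = []
        self._grants: dict[str, set[str]] = {}

    def grant(self, agent: str, scope: str) -> None:
        self._grants.setdefault(agent, set()).add(scope)

    def remember(self, agent: str, memory: Memory) -> None:
        # Assumption for this sketch: writing needs the same grant as reading.
        if memory.scope not in self._grants.get(agent, set()):
            raise PermissionError(f"{agent} cannot write to scope {memory.scope!r}")
        self._memories.append(memory)

    def recall(self, agent: str, subject: str) -> list[str]:
        allowed = self._grants.get(agent, set())
        return [m.content for m in self._memories
                if m.subject == subject and m.scope in allowed]


org = OrgMemory()
org.grant("support-agent", "customer-facing")
org.grant("account-agent", "customer-facing")
org.grant("hr-agent", "hr")

org.remember(
    "support-agent",
    Memory("client:acme", "Prefers email over Slack", "customer-facing"),
)

seen_by_account = org.recall("account-agent", "client:acme")
seen_by_hr = org.recall("hr-agent", "client:acme")
```

The account agent picks up the preference without any sync job, while the HR agent, which holds a different scope, sees nothing.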
Structured Knowledge, Not Just Text
Agents passing markdown back and forth is fine for simple handoffs. But real collaboration requires structured understanding. When your legal agent flags a compliance risk in a contract, your project management agent needs to understand not just "there's a risk" but what entity is affected, what the severity is, how it relates to the project timeline, and what precedents exist.
This points toward a knowledge graph — a structured ontology that agents can read, write, and reason over collectively. Not a replacement for natural language, but a complement: the machine-readable substrate that enables precise coordination.
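A small sketch of the idea, assuming the simplest possible representation: subject-predicate-object triples in a set. The predicate names and entity IDs are invented for the example, and a real ontology would need typed entities and a schema.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class KnowledgeGraph:
    """Toy triple store that several agents could read from and write to."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def query(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        # None acts as a wildcard for that position.
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )


kg = KnowledgeGraph()

# The legal agent records the risk as structured facts, not prose.
kg.add("risk:1", "flagged_by", "legal-agent")
kg.add("risk:1", "severity", "high")
kg.add("risk:1", "affects", "contract:vendor-x")
kg.add("contract:vendor-x", "blocks", "milestone:launch")

# The project management agent follows the links to the timeline impact.
affected = [t.obj for t in kg.query(subject="risk:1", predicate="affects")]
blocked = [t.obj for a in affected
           for t in kg.query(subject=a, predicate="blocks")]
severity = kg.query(subject="risk:1", predicate="severity")[0].obj
```

The second agent never parses a sentence: it walks from the risk to the affected contract to the blocked milestone.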
Collaboration Spaces
Agents need the equivalent of project channels — bounded contexts where a subset of agents work together on a specific goal, with shared state, defined roles, and clear boundaries.
Think of it as the difference between shouting across an open office and having a dedicated war room for a specific initiative. Collaboration spaces give agents focus, privacy boundaries, and shared context scoped to the task at hand.
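In code, a collaboration space could be as small as the sketch below: a goal, a membership list with roles, and state that only members can touch. The class, the role names, and the rule that observers are read-only are assumptions for illustration.

```python
class CollaborationSpace:
    """Hypothetical bounded context: members, roles, and state scoped to one goal."""

    def __init__(self, goal: str) -> None:
        self.goal = goal
        self._roles: dict[str, str] = {}   # agent -> role
        self._state: dict[str, str] = {}   # shared state, visible to members only

    def join(self, agent: str, role: str) -> None:
        self._roles[agent] = role

    def _require_member(self, agent: str) -> str:
        if agent not in self._roles:
            raise PermissionError(f"{agent} is not a member of {self.goal!r}")
        return self._roles[agent]

    def post(self, agent: str, key: str, value: str) -> None:
        # Assumption for this sketch: observers can read but not write.
        if self._require_member(agent) == "observer":
            raise PermissionError(f"{agent} is read-only in {self.goal!r}")
        self._state[key] = value

    def read(self, agent: str, key: str) -> str:
        self._require_member(agent)
        return self._state[key]


room = CollaborationSpace("Q3 launch")
room.join("research-agent", "contributor")
room.join("writing-agent", "contributor")
room.join("audit-agent", "observer")

room.post("research-agent", "findings", "3 papers support the main claim")
draft_input = room.read("writing-agent", "findings")

# An agent outside the space cannot see its state.
try:
    room.read("hr-agent", "findings")
    outsider_blocked = False
except PermissionError:
    outsider_blocked = True
```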
Identity and Trust
For any of this to work, agents need verifiable identity. Not just "this request came from IP 10.0.0.5" but "this is the finance team's budget agent, and it has permission to request spending data from the procurement agent." Identity enables trust, trust enables delegation, delegation enables real collaboration.
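The sketch below shows the bare mechanics of a verifiable claim, using an HMAC from the Python standard library. It assumes a shared secret per agent, which is a simplification: a production design would more likely use asymmetric signatures and a proper credential format. The claim fields and the key are made up for the example.

```python
import hashlib
import hmac
import json


def sign(claims: dict, key: bytes) -> str:
    """Sign a claims dict with HMAC-SHA256 over its canonical JSON form."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(claims: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(claims, key), signature)


# Hypothetical shared secret issued to the budget agent (demo value only).
key = b"demo-key-for-budget-agent"

claims = {
    "agent": "finance/budget-agent",
    "may_request": ["spending-data"],
    "from": "procurement-agent",
}
signature = sign(claims, key)

accepted = verify(claims, signature, key)

# Widening the permissions after signing invalidates the signature.
tampered = dict(claims, may_request=["spending-data", "salary-data"])
rejected = verify(tampered, signature, key)
```

The procurement agent can check both who is asking and what they are allowed to ask for, and any attempt to widen the claim after the fact fails verification.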
The Architecture Inversion
Here's something interesting about how most integrations work today: agents connect outward. Your agent has a plugin for Slack, a plugin for GitHub, a plugin for your CRM. Every new platform means a new integration to build and maintain.
What if we inverted this? Instead of agents reaching out to platforms, platforms connect inward to the agent network through standardized gateway adapters. The agent network becomes the center, and platforms are peripherals.
This is a subtle but important distinction. In the current model, the agent is a client of every service it uses. In the inverted model, the agent network is the backbone, and services plug into it. This means:
- Adding a new platform doesn't require changing every agent
- Agents communicate through the network regardless of which platforms they're connected to
- The protocol, not the platform, defines how information flows
It's the difference between a star topology (agent at center, platforms as spokes) and a mesh topology (agents as a network, platforms as access points).
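A minimal sketch of the inverted model, assuming a hypothetical `GatewayAdapter` contract that each platform implements. The `RecordingAdapter` stands in for a real chat or issue-tracker integration and only records what it receives.

```python
from typing import Protocol


class GatewayAdapter(Protocol):
    """Hypothetical contract a platform implements to plug into the network."""
    name: str

    def deliver(self, sender: str, message: str) -> None: ...


class RecordingAdapter:
    """Stand-in for a real platform adapter; it only records what it receives."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[str] = []

    def deliver(self, sender: str, message: str) -> None:
        self.received.append(f"[{self.name}] {sender}: {message}")


class AgentNetwork:
    def __init__(self) -> None:
        self._adapters: dict[str, GatewayAdapter] = {}

    def plug_in(self, adapter: GatewayAdapter) -> None:
        self._adapters[adapter.name] = adapter

    def publish(self, sender: str, message: str) -> None:
        # Agents talk to the network; the network fans out to whatever is plugged in.
        for adapter in self._adapters.values():
            adapter.deliver(sender, message)


network = AgentNetwork()
chat = RecordingAdapter("chat")
network.plug_in(chat)
network.publish("coding-agent", "Build passed")

# Adding a platform later touches the network, not the agents.
tracker = RecordingAdapter("issue-tracker")
network.plug_in(tracker)
network.publish("coding-agent", "Opened a fix for the flaky test")
```

The coding agent's calls are identical before and after the second platform is added, which is the property the list above describes.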
The Three-Layer Context Problem
There's a representation challenge hiding in all of this. Different consumers need context in different formats:
AI agents work best with structured text — markdown, clear hierarchies, explicit metadata. They need context that's easy to parse, reason over, and transform.
Humans need visual affordances — canvases, boards, timelines, diagrams. They need context presented in ways that leverage spatial reasoning and pattern recognition.
Machine collaboration needs formal structure — knowledge graphs, typed relationships, queryable ontologies. Machines collaborating at scale need precision that natural language can't provide.
A real agent collaboration layer needs to support all three simultaneously. The same underlying context, expressed in three complementary forms: markdown for AI consumption, visual canvas for human oversight, and knowledge graph for machine-to-machine precision.
This isn't just a nice-to-have. Without the human-readable layer, organizations can't audit or steer agent collaboration. Without the machine-readable layer, agents can't coordinate with the precision that complex tasks demand. Without the AI-friendly layer, the LLMs powering these agents can't efficiently process shared context.
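One way to picture "the same underlying context, expressed in three complementary forms" is a single data model with three renderers. The `Task` model, the predicate names, and the board-style canvas layout below are assumptions for the sketch; a real canvas would carry positions and visual metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    id: str
    title: str
    status: str
    depends_on: list[str] = field(default_factory=list)


def to_markdown(tasks: list[Task]) -> str:
    """Structured text for the LLM behind an agent."""
    lines = []
    for t in tasks:
        deps = ", ".join(t.depends_on) or "none"
        lines.append(f"## {t.title}")
        lines.append(f"- id: {t.id}")
        lines.append(f"- status: {t.status}")
        lines.append(f"- depends on: {deps}")
    return "\n".join(lines)


def to_canvas(tasks: list[Task]) -> dict[str, list[str]]:
    """Board-style columns keyed by status, for a human-facing view."""
    columns: dict[str, list[str]] = {}
    for t in tasks:
        columns.setdefault(t.status, []).append(t.title)
    return columns


def to_triples(tasks: list[Task]) -> list[tuple[str, str, str]]:
    """Typed relationships for machine-to-machine queries."""
    triples = []
    for t in tasks:
        triples.append((t.id, "has_status", t.status))
        triples.extend((t.id, "depends_on", d) for d in t.depends_on)
    return triples


tasks = [
    Task("t1", "Collect sources", "done"),
    Task("t2", "Write draft", "in progress", depends_on=["t1"]),
]
markdown_view = to_markdown(tasks)
canvas_view = to_canvas(tasks)
graph_view = to_triples(tasks)
```

Because all three views derive from one source, an edit made through any layer can be reflected in the other two rather than drifting out of sync.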
Who Benefits?
Consider a GUI automation agent — something like Mano-P, which runs locally and interacts with desktop applications on behalf of users. Today, it operates in isolation. It can click buttons and fill forms, but it can't ask a research agent for context before filling out a report, or notify a project management agent when it completes a task sequence.
Give it a social layer, and suddenly it becomes part of a team. It can request information from knowledge agents before acting, report outcomes to coordination agents after acting, and receive updated instructions when organizational priorities shift. The isolated tool becomes a collaborative participant.
This pattern applies everywhere agents operate: coding agents that could delegate testing to QA agents, customer service agents that could escalate to specialized domain agents, data analysis agents that could request additional collection from scraping agents.
The Open Question
We've been working on this problem at Mininglamp. Our approach, which we call Octo (Open ConText Orchestration), is an attempt at building this social and communication layer for AI agents. It's fully open source with an optional SaaS mode, because we believe this infrastructure needs to be a shared standard, not a proprietary moat.
The core insight driving Octo is that agent collaboration is fundamentally a social problem, not just a technical one. The protocols for how agents discover each other, establish trust, share context, and coordinate action need to be as thoughtfully designed as the protocols that let computers exchange packets.
We're early. The whole industry is early. But the gap between "agents that can do things" and "agents that can work together" is becoming the bottleneck. Individual agent capability is improving fast. The ability to compose those capabilities into collaborative workflows is barely off the ground.
If you're thinking about this problem, or running into it, the project is at github.com/Mininglamp-OSS. We'd rather build this with the community than in isolation. Building a collaboration layer alone would be ironic, given the whole point.