The three-repo separation is clean and the Backstage catalog integration is a smart choice. What strikes me is that your resource-catalog is essentially solving agent discovery at the individual/team scale -- it answers the question "what exists and where does it live" for your agents.
The challenge I keep thinking about is what happens when this needs to work across organizational boundaries. Your library.yaml manifest is a single source of truth for your agent ecosystem, but when agents from different orgs need to discover each other's capabilities (which MCP servers are available, what skills a remote agent exposes, what protocols it speaks), there is no equivalent of library.yaml at the network level.
The symlink-based deployment is elegant for the single-developer case, but it also highlights the gap: symlinks only work when you control both ends. For cross-org agent interop, you would need something more like a DNS-style discovery mechanism where agents can resolve capabilities by querying a well-known endpoint rather than relying on pre-configured paths.
The token efficiency design is probably the most underrated part of this architecture. Scoping skills by directory is such a simple idea but the savings compound fast. Have you measured the actual token reduction versus a flat AGENTS.md approach? I would be curious to see the numbers.
Thank you so much for this insightful feedback! Your concerns are spot on, and I can tell you I've been thinking about them since I read your comment. Thanks for sharing; it gave me a real thinking boost XD.
Inspired by your comment, I've given it some thought, and you are completely right. Here is what I propose:
In library.yaml, allow a dual system (sketched below) where:
Local/Single-Dev: Keep symlinks for zero-latency local development (not everyone works at a big company; some people just want to play around).
Distributed/Cross-Org: Prioritize endpoints over symlinks, leveraging standard web discovery protocols.
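To make it concrete, here is a rough sketch of what a dual-mode entry could look like (the field names and URL are hypothetical, not a final schema):

```yaml
# library.yaml -- hypothetical sketch of a dual-mode entry (not a final schema).
skills:
  - name: terraform-runner
    visibility: public            # public entries get compiled into the .well-known manifest
    local:
      type: symlink               # zero-latency path for single-dev setups
      path: ./skills/terraform-runner
    remote:
      type: mcp                   # cross-org consumers resolve this endpoint instead
      endpoint: https://api.yourcompany.com/mcp/terraform-runner
      auth: oauth2
```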
Even so, we need a strategy to expose this library.yaml to the internet so other agents can consume it. For that, I also propose adopting the .well-known/ directory pattern.
Instead of an agent needing pre-configured knowledge of another organization's tools, or relying on reading a remote library.yaml (which would statically couple it to our Git structure and create security and token overhead), we treat library.yaml solely as our GitOps Single Source of Truth.
Our CI/CD pipelines will parse this YAML, extract the publicly available MCP endpoints, and 'compile' an agent-capabilities.json file. This is then published to a standard endpoint like https://api.yourcompany.com/.well-known/agent-capabilities.json (via a static GCP bucket behind a CDN, a Kubernetes Ingress, etc.).
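A rough sketch of what that compiled manifest could look like (every field name, URL, and value here is hypothetical, just to illustrate the shape):

```json
{
  "organization": "yourcompany.com",
  "updated_at": "2025-01-15T00:00:00Z",
  "mcp_servers": [
    {
      "name": "terraform-runner",
      "endpoint": "https://api.yourcompany.com/mcp/terraform-runner",
      "skills": ["plan", "apply", "drift-detection"],
      "auth": { "type": "oauth2", "token_url": "https://auth.yourcompany.com/token" }
    }
  ]
}
```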
When an external agent needs to interact with our infrastructure, it simply queries that public 'reception desk' endpoint and dynamically discovers where the MCP servers live, what skills are exposed (e.g., a Terraform runner), and what authentication is required (OAuth/mTLS).
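On the external agent's side, the discovery step could be as simple as this minimal Python sketch (assuming the hypothetical manifest shape above):

```python
import requests

# Resolve another org's agent capabilities from its well-known endpoint
# (hypothetical URL and manifest fields, matching the sketch above).
WELL_KNOWN = "https://api.yourcompany.com/.well-known/agent-capabilities.json"

manifest = requests.get(WELL_KNOWN, timeout=10).json()
for server in manifest["mcp_servers"]:
    print(f"{server['name']} -> {server['endpoint']} (auth: {server['auth']['type']})")
```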
What do you think?
Regarding token efficiency: you are absolutely right, it's the hidden superpower of the hierarchical design. While I haven't measured the exact token reduction yet, moving from a monolithic flat AGENTS.md (which would inject 10k-20k tokens mixing Terraform, React, personal, and work rules into every prompt) to a scoped directory approach keeps the context hyper-focused (around 2k-4k tokens). It saves cost, reduces latency (TTFT), and significantly mitigates the 'lost in the middle' phenomenon in LLMs. I'll definitely run some metrics on this for a follow-up post!
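To illustrate the idea (hypothetical layout): instead of one flat AGENTS.md injected into every prompt, each directory carries only its own rules, and the agent loads just the files on the path to where it is working:

```
~/work/
├── AGENTS.md          # global work rules only
├── infra/
│   └── AGENTS.md      # Terraform conventions, loaded only here
└── frontend/
    └── AGENTS.md      # React conventions, loaded only here
```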
Thanks again for sparking this evolution in the design! This whole transition is definitely blowing my mind right now, and I think it is the next frontier for Agent Platform Engineering.
PS: In the coming days I will publish a repo with this methodology and structure so others can iterate and work with it. The results I am getting are actually amazing, so I hope it helps someone else and, with some luck, attracts external ideas and contributions like yours ;). Again, thanks.
Made an account just to ask when you're publishing that repo. I am sure it would help a lot of people out. This is unreal: I'm not a coder, but I am able to understand it.

The repo is already published at the end of the post ;)
If I have an OpenClaw setup on a VPS with missioncontrol, and I'm setting up a whole kuktiagent workflow with APIs, etc. (ideally model-router) that will have a dev aspect, will this be good for my setup? For any agentic setup?
Of course. I am using pi-coding-agent and OpenClawd and both are working perfectly fine, although I use them for different reasons. In fact, the project started with OpenClawd, and this whole design was first conceived to let OpenClawd work across my entire codebase.