DEV Community

Saul Fernandez
Agentic Platform Engineering: How to Build an Agent Infrastructure That Scales From Your Laptop to the Enterprise

Starting local is not thinking small. It's thinking strategically.


Stripe recently published a series on how they build "Minions" — fully autonomous coding agents that take a task and complete it end-to-end without human intervention. One agent reads the codebase, another writes the implementation, another runs the tests, another reviews the result. All in parallel. All coordinated. All producing production-ready code at scale.

Reading that, most engineers react in one of two ways: either "that's years away for us," or they start thinking about what foundations would need to be in place to even begin moving in that direction.

This article is about the second reaction.

What Stripe describes is not a product you buy. It's a capability you build, incrementally, on top of solid infrastructure. And that infrastructure — the way agent configuration is stored, versioned, distributed, and composed — is what separates teams that can scale AI seriously from teams that are still copy-pasting prompts into chat windows.

I call this discipline Agentic Platform Engineering. And its first principle is simple: treat agent intelligence as infrastructure, not as improvisation.


The Problem at Any Scale

Whether you're a solo engineer with 10 repositories or a platform team with 200, you hit the same structural problem when you try to work seriously with AI agents.

At the individual level, it looks like this: you spend time configuring your agent well — crafting instructions, building reusable procedures, defining what it should and shouldn't do depending on where you're working. Then you switch machines, reinstall a tool, or try a different agent. Your configuration is gone. You start over.

At the team level, it's worse: every engineer has their own private, undocumented, non-transferable agent setup. There's no shared knowledge about how agents should behave in your codebase, no consistency in what they can and cannot do, no way to onboard a new member into an agentic workflow. The AI capability of your team cannot scale because it lives in individual heads and local files.

And at the enterprise level — the level where Stripe operates — you cannot even begin to think about autonomous agents running pipelines if you haven't solved the fundamental question: where does agent configuration live, who owns it, and how does it reach every context where it's needed?

Most engineers interact with AI agents in one of two ways:

  1. Ad-hoc: No configuration, just prompting. Works for one-off tasks, but the agent has no memory of your stack, your conventions, or your constraints.

  2. Single-file config: One big AGENTS.md or CLAUDE.md at the root of a repo. Better, but it doesn't scale — the same instructions get injected everywhere, regardless of whether they're relevant, and they live in one repo while you work across twenty.

Neither approach is infrastructure. Neither can scale. Neither gives you what you'd need to move toward autonomous multi-agent systems.

The question is: what would infrastructure actually look like?


The Architecture: Three Repos, One System

The solution I landed on separates concerns into three distinct repositories, each with a single, well-defined responsibility. This is not a personal productivity hack. It's a deliberate architectural pattern that mirrors how platform engineering works for any shared infrastructure — you version it, you document what exists, and you decouple the interface from the implementation.

agent-library/    ← The brain (tool-agnostic intelligence)
agent-setup/      ← The bridge (tool-specific deployment)
resource-catalog/ ← The map (inventory of everything)

Let me explain each one.


1. agent-library — The Brain

This is the single source of truth for everything the agent knows and how it behaves. It contains no tool-specific configuration. If tomorrow I switch from one AI coding tool to another, this repo stays untouched.

The structure:

agent-library/
├── library.yaml          ← Central manifest
├── SKILLS-INDEX.md       ← Human-readable index of all skills
├── layers/               ← Context-specific instructions
│   ├── global.md         ← Identity, principles, environment map
│   ├── repos.md          ← Shared git conventions
│   ├── work/             ← Work domain
│   │   ├── domain.md     ← Conservative rules, safety constraints
│   │   ├── terraform.md  ← Terraform-specific workflow
│   │   ├── gitops.md     ← GitOps/FluxCD rules
│   │   └── code.md       ← Code conventions
│   └── personal/         ← Personal domain
│       ├── domain.md     ← Experimental mode, fast iteration
│       ├── fintech-app.md
│       └── infra-gcp.md
├── skills/               ← Reusable procedures
│   ├── global/           ← Available everywhere
│   └── work/             ← Domain-specific
├── rules/                ← Always-on constraints
└── prompts/              ← Reusable prompt templates

The key concept is layers. Each layer is a Markdown file that becomes an AGENTS.md (or equivalent) for a specific directory. They are designed to be cumulative — the agent loads them from parent to child, each adding context on top of the previous one.

When the agent is working in ~/repos/work/terraform/, it loads:

~/.agent/AGENTS.md               → global.md         (who I am, core principles)
~/repos/AGENTS.md                → repos.md          (git conventions)
~/repos/work/AGENTS.md           → work/domain.md    (conservative, safety-first)
~/repos/work/terraform/AGENTS.md → work/terraform.md (terraform workflow)

Each layer is laser-focused. global.md doesn't know about Terraform. work/terraform.md doesn't know about React. The agent assembles its context from the bottom up, with exactly the information it needs for where it currently is.
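The cumulative, parent-to-child loading can be made concrete with a small sketch. This is my own illustration, not part of the author's tooling: it walks from a root directory down to the working directory and collects every AGENTS.md along the way, parent first. (The global layer at ~/.agent/ is not an ancestor of the working directory, so a real loader would prepend it separately.)

```python
from pathlib import Path

def collect_layers(cwd: str, root: str) -> list[Path]:
    """Return AGENTS.md paths from root down to cwd, parent first,
    so each layer adds context on top of the previous one."""
    cwd_p = Path(cwd).resolve()
    root_p = Path(root).resolve()
    chain = []
    d = cwd_p
    while True:
        chain.append(d)
        if d == root_p:
            break
        if d.parent == d:  # reached the filesystem root without finding root_p
            raise ValueError(f"{cwd} is not under {root}")
        d = d.parent
    chain.reverse()  # parent-first ordering
    return [d / "AGENTS.md" for d in chain if (d / "AGENTS.md").exists()]
```

For ~/repos/work/terraform/ this yields repos.md, then work/domain.md, then work/terraform.md, in exactly the order shown above.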

The library.yaml manifest is the glue. It declares every layer, skill, rule, and prompt — what it is, where it lives in the repo, and where it should be deployed on the filesystem:

layers:
  - name: work-terraform
    description: "Terraform-specific rules for work domain"
    source: layers/work/terraform.md
    target: ~/repos/work/terraform/AGENTS.md
    scope: "~/repos/work/terraform/*"

skills:
  - name: terraform-plan
    description: "Terraform plan/apply workflow"
    source: skills/work/terraform-plan.md
    scope: "~/repos/work/terraform/*"

2. agent-setup — The Bridge

This repo is the adapter between the tool-agnostic brain and the specific AI agent tool I use today. If I switch tools next year, I replace only this repo. The brain stays the same.

Its core is a single setup.sh script that reads library.yaml and deploys everything via symlinks:

# Creates: ~/repos/work/terraform/AGENTS.md → agent-library/layers/work/terraform.md
# Creates: ~/.agent/skills/terraform-plan → agent-library/skills/work/terraform-plan.md
# ... and so on for every layer, skill, rule, and prompt

Why symlinks instead of copies?

Because when I edit a layer in agent-library, the change is immediately live everywhere. No re-deployment needed. setup.sh only needs to run again when I add a new file (a new symlink to create). For edits to existing files, the symlink already points to the right place.

The repo also contains tool-specific settings, keybindings, and extensions — things that only make sense for a specific tool.
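The symlink step itself is small. Here is a minimal sketch of that deployment logic, with hypothetical paths in a scratch directory; a real setup.sh would derive the source/target pairs from library.yaml (parsing YAML in pure shell is awkward, so a yq or Python helper is a common choice):

```shell
#!/usr/bin/env sh
set -eu

# deploy_link SOURCE TARGET: symlink TARGET -> SOURCE, creating parent dirs.
deploy_link() {
  src="$1"; dst="$2"
  mkdir -p "$(dirname "$dst")"
  # -s symbolic, -f replace an existing link, -n don't descend into one
  ln -sfn "$src" "$dst"
}

# Demo with hypothetical paths in a scratch directory.
scratch="$(mktemp -d)"
mkdir -p "$scratch/agent-library/layers/work"
echo "terraform layer" > "$scratch/agent-library/layers/work/terraform.md"

deploy_link "$scratch/agent-library/layers/work/terraform.md" \
            "$scratch/repos/work/terraform/AGENTS.md"
```

Because the target is a link, not a copy, editing the source file in agent-library changes what the agent reads with no further sync step.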


3. resource-catalog — The Map

This is the index of everything that exists in my engineering ecosystem. It follows the Backstage catalog format — the same standard used in enterprise engineering platforms.

# components/agent-library.yaml
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: agent-library
  description: "Tool-agnostic agent configuration library"
  annotations:
    github.com/project-slug: your-username/agent-library
spec:
  type: ai-agent-config
  lifecycle: production
  owner: your-name
  system: personal-ai-agent-platform

Every repository I own — infrastructure, applications, documentation, and yes, the agent-library itself — is registered here with its type, owner, system, and source location.

The catalog is not where agent logic lives. It's a map, not an engine. The distinction matters: the catalog tells you that agent-library exists and what it is. The agent-library itself contains what the agent knows. Mixing these two concerns would be like embedding source code inside your package.json.


Skills: The Reusable Procedure Library

Beyond layers (which define how the agent behaves), the library contains skills — reusable, step-by-step procedures for common tasks.

A skill looks like this:

# Skill: Terraform Plan

Use this skill when working with Terraform in the work domain.

## Steps
1. Check context — confirm directory and workspace
2. terraform fmt -recursive
3. terraform validate
4. terraform plan -out=tfplan
5. Review plan — summarize what will change
6. Highlight risks — flag any destroys or critical changes
7. Wait for confirmation — never apply without explicit approval
...

## Red Flags (Stop and Ask)
- Any resource marked for destruction
- Changes to IAM policies
- Changes to production databases

Skills are invoked explicitly: /skill:terraform-plan. They're never loaded automatically — that's intentional. The agent doesn't pre-load every procedure it might need. It loads the skill when the task calls for it.
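A toy dispatcher for that invocation style might look like the following. This is purely illustrative; real agent tools implement their own command handling, and the skills directory layout here is an assumption:

```python
from pathlib import Path

def resolve_skill(command: str, skills_dir: str) -> Path:
    """Map an explicit '/skill:<name>' command to its procedure file."""
    prefix = "/skill:"
    if not command.startswith(prefix):
        raise ValueError(f"not a skill invocation: {command!r}")
    name = command[len(prefix):]
    path = Path(skills_dir) / f"{name}.md"
    if not path.is_file():
        raise FileNotFoundError(f"unknown skill: {name}")
    return path
```

The point of the explicit prefix is that nothing is loaded until the user (or the agent, deliberately) asks for it.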


The Token Efficiency Design

This is where the architecture earns its keep.

A naive approach would be: put everything in one big AGENTS.md. All the rules, all the skills, all the context. The agent always knows everything.

The problem: that file becomes enormous. Every single message you send to the agent carries the full weight of that context as tokens. You're paying for Terraform rules when you're writing a Python script. You're loading GitOps procedures when you're working on documentation.

The architecture solves this at three levels:

Level 1: Layers are scoped by directory. The Terraform layer only activates when you're in ~/repos/work/terraform/. Not in your React app. Not in your docs.

Level 2: Each layer only declares what's relevant at its level. global.md lists 6 universal skills (debug, code-review, refactor, test, documentation, git-workflow). It does not list terraform-plan or catalog-management — those are irrelevant in most contexts. work/terraform.md lists terraform-plan and nothing else, because that's the only skill you need there.

Level 3: Meta-skills are scoped to their home. create-skill (the skill that creates new skills) is only available inside agent-library/. catalog-management is only available inside resource-catalog/. Why would the agent know how to modify the agent library while it's working on your fintech app?

The result: when the agent is in work/terraform/, its active context is exactly:

  • 6 global skills
  • 1 domain skill (infrastructure-review)
  • 1 directory skill (terraform-plan)

That's it. No noise.
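The scoping rule can be sketched as glob matching against the scope field from library.yaml. This is a simplification of my own (real scope resolution may differ), using only the stdlib:

```python
from fnmatch import fnmatch
from pathlib import Path

def skills_in_scope(cwd: str, skills: list[dict], home: str) -> list[str]:
    """Return names of skills whose scope glob matches the working dir."""
    cwd_norm = str(Path(cwd))
    active = []
    for skill in skills:
        scope = skill.get("scope", "*")          # no scope means global
        pattern = scope.replace("~", home)       # expand the manifest's ~ shorthand
        # Match the directory itself or anything beneath it
        if fnmatch(cwd_norm, pattern) or fnmatch(cwd_norm + "/", pattern):
            active.append(skill["name"])
    return active
```

Run against a manifest with global, terraform, and catalog skills from ~/repos/work/terraform/, only the global and terraform entries survive; the catalog skill never enters the context.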


Disaster Recovery in Under 5 Minutes

The entire system is built for one guarantee: if everything breaks, you can rebuild from scratch in under 5 minutes.

# Step 1: Clone the three repos
mkdir -p ~/repos && cd ~/repos
git clone git@github.com:your-username/agent-library.git
git clone git@github.com:your-username/agent-setup.git
git clone git@github.com:your-username/resource-catalog.git

# Step 2: Deploy
cd agent-setup && bash setup.sh

# Step 3: Verify
cd ~/repos/work/terraform && your-agent "What context am I in?"
# → Agent responds with terraform-specific context ✓

Done. The agent has its full identity, all its domain knowledge, all its skills, and all the right rules for every directory it works in.

This is only possible because:

  1. Everything is in git. No local-only configuration that can be lost.
  2. The brain is separate from the tool. Reinstalling the tool doesn't lose the intelligence.
  3. The manifest declares everything. library.yaml is the complete description of the system — setup.sh just executes it.
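Because the manifest is a complete description of the system, it also enables a consistency check: verify that every deployed target is a symlink pointing back at its declared source. A minimal sketch, assuming entries shaped like the library.yaml excerpt earlier with paths already absolute (YAML parsing is left out to keep it stdlib-only):

```python
import os
from pathlib import Path

def check_deployment(entries: list[dict]) -> list[str]:
    """Return problems found: targets that are missing, are not symlinks,
    or do not point at their declared source."""
    problems = []
    for entry in entries:
        target = Path(os.path.expanduser(entry["target"]))
        source = Path(os.path.expanduser(entry["source"])).resolve()
        if not target.is_symlink():
            problems.append(f"{entry['name']}: {target} is not a symlink")
        elif target.resolve() != source:
            problems.append(f"{entry['name']}: {target} points elsewhere")
    return problems
```

An empty result means the deployed state matches the manifest; anything else tells you exactly which entry drifted.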

The Mental Model: A Package Manager for Agent Intelligence

Think of it like a software package manager, but for how agents think and behave.

library.yaml is your package.json — it declares everything that should exist and what it does. setup.sh is your npm install — it takes the manifest and wires everything up on any machine. The layers are your source modules — composable, scoped, loaded on demand. The skills are your function library — procedures you invoke when you need them, not before.

The difference from a traditional package manager: the "packages" here are not code. They're instructions for how to reason in a given context.

This matters beyond the individual level. If you're a platform team and you want every engineer to work with agents consistently — the same safety rules around production, the same conventions for code review, the same escalation procedures — you publish to agent-library. Engineers run setup.sh. Done. The intelligence is distributed, versioned, and auditable. Just like any other shared infrastructure.

This is the foundation Stripe's approach requires. Before you can run autonomous agents in parallel on real codebases, you need to have solved: where do agents get their instructions? Who updates them? How do changes propagate? How do you ensure an agent working on your payment service doesn't behave like an agent working on an internal tool?

The architecture described in this article is an answer to those questions — starting from a single developer setup, but designed to scale.


What This Looks Like Day-to-Day

When I open a terminal in ~/repos/work/terraform/:
The agent already knows it's in conservative mode, that any terraform apply needs a reviewed plan and explicit confirmation, that pre-commit hooks must run before any commit, and exactly which skill to use for the full workflow.

When I open a terminal in ~/repos/personal/fintech-app/:
The agent knows it can move fast, that this is a Python financial analysis platform, that API keys live in environment variables and never in code, and that tests run before committing.

When I want to add a new skill:
I run /skill:create-skill. The agent walks me through creating the file, registering it in library.yaml with the right scope, updating SKILLS-INDEX.md, and committing. The skill is live the moment it's committed — no redeployment needed (symlinks).

When a colleague joins my team or I get a new machine:
Three git clones and one bash script. Same agent, same behavior, same context everywhere.


The Design Decisions That Matter

Why not one big repo? Separation of concerns. The brain shouldn't depend on the tool. The catalog shouldn't contain executable logic. Mix them and you create coupling that makes the whole system fragile.

Why Backstage format for the catalog? It's an industry standard built exactly for this — describing what exists, who owns it, and how it relates to other things. It's human-readable, tool-agnostic, and designed to scale.

Why symlinks instead of copies? Real-time updates without redeployment. Edit terraform.md in the library, it's immediately live in ~/repos/work/terraform/. No sync step, no drift between source and deployed config.

Why scope skills to directories instead of loading all of them? Token efficiency and cognitive clarity. An agent with 30 loaded skills is an agent that has to decide which one applies. An agent with 2 loaded skills knows exactly what to use.


What I Haven't Built Yet

This is an honest article, so here's what's still on the roadmap — and what brings the architecture closer to the Stripe model:

  • Extensions (next): Custom tools for things like catalog lookup, repo navigation, and library sync directly from the terminal
  • MCP integration: Model Context Protocol servers for deeper, structured tool integrations — giving agents access to live data sources, not just static instructions
  • Multi-agent orchestration: The ability to spawn specialized agents in parallel for complex tasks — one reads the codebase, another implements, another validates. This is the direction Stripe's Minions move in, and this architecture is specifically designed so the layer system can feed each specialized agent exactly the context it needs, nothing more
  • Centralized distribution: Moving from local symlinks to a pull-based model where any machine or CI environment can fetch the latest agent configuration from agent-library automatically

Each of these is a step up the autonomy ladder. But none of them are possible without the foundation: versioned, scoped, composable agent configuration that you control.


Conclusion

Stripe's Minions are impressive. But they're not magic — they're the result of building the right infrastructure first.

The architecture described here — three repos, clear separation of concerns, a manifest-driven deployment, and scoped context loading — is that infrastructure, starting at the smallest possible scale. One developer, one machine, three git repositories.

The local setup is not the destination. It's the proof of concept for a pattern that scales: agent intelligence is configuration, configuration belongs in git, and git belongs in a system where it can be versioned, distributed, and composed.

Start local. Think at scale. Build the foundation that makes the next step possible.

That's Agentic Platform Engineering.
