Stop Prompting Your Agents. Start Leading Them.

#ai #agents #llm #leadership

If you want a leg up in applied AI right now, you should be studying the Marine Corps over Anthropic's documentation. The hardest problem in building agents has already been solved, by people who staked everything on getting it right. We just have not been looking in the right place.

I have spent years building production AI agents for real companies, shoulder to shoulder with some of the best people in applied AI. Soon we will not just be building agents but organizations of them. And the pattern I keep seeing is capable models falling short in production. Not because they were not intelligent enough. Because nobody gave them a mission. We hand agents tasks. We forget to give them intent.

The best teams give intent, then get out of the way

When an agent disappoints us, the reflex is engineering. Rewrite the prompt. Swap the model. Add a guardrail. That work matters. But the harder problem is what comes after: building an organization of agents, each with a role and the authority to decide in its lane without bottlenecking through you. That is not a technical problem. It is an organizational one, and the framework already exists.

General Stanley McChrystal documented in Team of Teams that when authority moves to whoever is closest to the problem, decision quality goes up, not down. Pushing decisions down is not a compromise you make to save time. It produces the better answer. The military has spent centuries learning this, and the companies already winning at AI are running on the same principle, whether they know it or not.

I learned to see it through my clients before I applied it to agents. A while back I was building an autonomous agent for a Fortune 500 metals manufacturer, live in their production today. Work across enough companies and you see fast which ones are ready for what is coming. This one was, and it had nothing to do with technology. Direction came down as intent, not instructions, and the people giving it backed off. Managers led instead of managed. The company solved genuinely hard problems at a pace most organizations cannot match.

The ones that struggle look nothing like that. Layers of management, a process for everything, permission required at every turn. In a space as new as AI, where the right answer is rarely written down anywhere, that structure slows you to a crawl.

I could see the pattern. I did not yet understand it, or see how it meant anything to how I build agents. In the middle of that project, I flew out to see a buddy who is an officer in the Marine Corps. We got to talking about leadership, a problem I was facing in my own career, not with agents, with people. How do you lead a team instead of managing it. How do you keep everyone aligned when the work is complex and the situation keeps changing.

The Marines have refined their answer for two hundred years. He writes orders that name the objective and the why, then turns people loose on the how. They plan to adapt from the start, so when the situation changes keep moving instead of waiting for permission.

I took that home to Chicago, and the diagnosis stung. I had been managing, not leading. I owned every task, stayed in the details while my team executed, and called it diligence. It was a ceiling. I was capping what my team could do at whatever I had bandwidth to watch. So I backed off. Gave people the intent and the room to own it, and they were capable of far more than I had been letting them be. I cannot make every decision, and I should not. The moment I stopped trying, everything got better.

Leading my agents beat managing them, and the evals proved it

I was sitting on the exact same problem with my agents and it took me too long to notice.

So I made the same shift. Stopped managing my agents and started leading them. That means handing over the mission, what done looks like, why it matters, the boundaries to work inside, and then getting out of the way. Then I pushed it into the developer prompts running our system for that same metals manufacturer and our evaluation scores jumped ten percent. The model had not changed. The agent had simply learned what it was for. It could handle the edge cases because when the situation drifted from the plan it understood the intent well enough to adapt.

Then I made the real leap in my own projects. Stopped thinking about an agent and started building an organization of them. I set up Claude Code as a team. Each agent has its own order, a clear main effort, supporting efforts around it, and channels to communicate. A small military. I stopped giving commands and started running it like a unit.

The hard part was the same one I hit in Chicago. I kept wanting to climb back into the details. I caught myself doing it and recognized it instantly. Micromanaging again. So I stepped back, asked the nervous question (Can I trust this thing?), and let it own its lane. It does full stack development now. It makes calls I used to insist on making, and makes them well because it is closer to the information than I am. It knows what to send up the chain and what to handle. Which is how I go to bed and wake up to find the work has moved forward.

Decentralized command is the whole game

Underneath all of it is one principle the military paid in blood to learn. As Helmuth von Moltke observed, no plan survives contact with the enemy. You cannot centralize every decision without the mission dying while it waits for an answer. So you push decisions to whoever is closest to the problem. The intent tells them what they are actually trying to achieve. The boundaries tell them what is theirs to call and where they stop. The values are for when something happens that nobody planned for and they have to choose. Then you trust them.

Jocko Willink and Leif Babin called this Decentralized Command in Extreme Ownership. L. David Marquet built it on the USS Santa Fe in Turn the Ship Around! The Marines codified it as commander's intent in MCDP 1. Different names, same idea: the people closest to the problem make better decisions when they understand the mission than when they wait for instructions.

The labs are arriving at the same place from the other direction

The deeper I went into military doctrine, the more I started recognizing it in Anthropic's own published guidance on building agents. Not in the language. They never reach for military vocabulary. But in the structure underneath.

What Anthropic's Building Effective Agents calls dynamic process direction, Marines call commander's intent: give the mission and the end state, not the steps. What Anthropic calls tool scoping, Marines call left and right limits: the defined edges of what an agent can touch and what it cannot. What Anthropic recommends as escalation checkpoints, Marines build into their chain of command: decide within your lane, send the genuinely hard calls up. What Anthropic describes as the orchestrator-subagent relationship, Marines call the commander-subordinate: a hierarchy of authority where each level operates independently inside its scope.

Neither field is copying the other. They are being pulled toward the same structure by the same underlying problem. Two fields, centuries apart, same answer. That is not a coincidence. It is the shape of the problem, and the Marines got there first. That is the only reason I am pointing you at their doctrine instead of another prompting guide.

Centralized agents lost in complex environments

The whole argument so far is that you should trust an agent to make the call instead of making every call yourself. That should feel backwards. Surely the one with the overview, the one in charge, decides better than someone down in the weeds. It is a fair objection, and it is testable. So I tested it.

Picture a night every software team knows. An outage, a dozen systems failing at once, each one needing a decision, the clock running. You can run it two ways. One agent in charge of all of it, making every call. Or an agent on each system, trusted to handle its own. Same AI on both sides, same standing orders for what a good call looks like. The only difference is who gets to decide.

Whoever decides has to know what is actually going on, and that is where the two part ways. The agent sitting on one system sees everything about it: what changed in the last hour, whether the outage is really a vendor's fault, whether data is quietly corrupting. The agent in charge of all of them cannot watch any one that closely. It gets summaries, and the fact that decides the call is usually the part a summary leaves out.

The first result: when the call was easy, who made it made no difference. When the call was hard, it made all the difference in the world.

[Chart: Share decided correctly, across 51 incidents. An "easy" call is one the summary already answers, like a total outage. A "hard" call turns on a detail the summary leaves out.]

The easy calls prove the test was fair. Both setups nailed every one. They are also the calls that never needed a leader: a total outage, obvious noise, anyone holding the summary gets them right. It is the hard calls, the ambiguous ones, where the agent in charge fell off a cliff, from a perfect score down to barely two in five.

That collapse is not stupidity. It is blindness, and you can prove it by handing the blindness back. The more systems I let the agent in charge investigate for itself, the better it did, and once it could dig into everything it caught right back up to the agents on the ground.

[Chart: The same climb showed up on two different AIs, a smaller one and a larger one. The more capable AI was no less blind, because the missing fact was not something a sharper mind could reason out. It was not in front of it.]

So the gap was never intelligence. Same AI, same orders. Give the decision-maker the full picture and it decides just as well, wherever it sits. The whole gap is what it could see.

Which sounds like an easy fix. Just let the one in charge see everything. In a test with fifty incidents and no real clock, you can. In production you cannot, and that is the part that matters most. There is too much happening to read all of it, too little time before the call is due, and never the budget to dig into every corner. You check a fraction. And the moment you can only check a fraction, you have to guess which fraction is worth checking, and you guess wrong, because the calls that matter rarely look urgent in a summary. The error climbing quietly toward a cliff reads calmer than the loud outage that turns out to be nothing. So the agent in charge chased the fires that looked big and walked past the quiet ones that weren't. The agents on the ground never had to guess. Each already had its own situation in full.

That gap is the line between handing out a task and handing over a mission. When you can see everything and the answer is obvious, give the task: do this. But that is the easy call, the one that never needed you. Real work is the other kind, ambiguous and scattered and urgent, the deciding fact sitting somewhere you cannot reach in time. There you cannot hand over a task, because you do not know the answer either. You can only hand over the mission, the intent and the limits, and trust the agent in it to make the call. That is not splitting work into pieces. Plenty of people split the work and still route every decision back to themselves, and they hit the exact wall above. It is trusting the piece to decide.

And it only gets harder the more you build. A couple of agents and you can almost keep up, checking the calls that matter yourself. An organization of them, running faster and in more places than you can watch, and "let me look into it first" dies on call after call. The leader who insists on every decision goes blind the moment the work outgrows one mind. The one who gives intent and trusts the call keeps moving.

Lead what you build

The frontier labs will keep making the models smarter. That work matters enormously. But the model is only half of the equation, and the leadership half is almost entirely neglected right now.

Here is what that neglect is costing us. The technology to build real organizations of agents exists today. Not in some future version of the models. Right now. The ceiling is not capability. It is how we are thinking about the problem. We are building tools when we could be building teams. We are writing prompts when we could be writing orders. We are managing when we could be leading.

So the next time an agent lets you down, before you reach for the prompt editor, ask one question. Did I give it a mission, or just a task. Most of the time the honest answer is just a task. That is the whole problem.

Stop managing your agents. Start leading them.

Originally published on Medium.