A Google DeepMind safety lead said this week that they're putting $10M behind multi-agent safety because "there just isn't really a field of research for multi-agent safety yet."
I read that and laughed, because I'm already running the thing the research field doesn't exist for yet. Most of us are. You spin up a couple of agents, hand them work, and somewhere in there you quietly become a manager of workers that don't think like workers.
Two days before that, PMI published the first official standard for AI in project work. It's a solid document. It also leaves the entire "how do you actually do this on a Tuesday" layer to you. So here's my Tuesday layer: five shifts I had to make, each one learned by getting it wrong first.
You stop filling the queue and start drawing the line
My first instinct with an agent was the same as with a person: here's work, go.
That broke the first time an agent made a reasonable decision on something that turned out to be irreversible. It wasn't the agent's fault. I never told it which decisions were one-way doors.
So now the first artifact I write isn't a task list. It's a boundary file. Something like this lives next to the work:
# decision-boundaries.yml
autonomous:
- reformat, refactor, rename within a module
- anything reversible with a git revert
escalate:
- schema changes, public API shape
- deletes, migrations, anything touching prod data
- spend over $0 or any external send
on_unsure: stop_and_ask
That file does more for me than any standup. Leadership moved from assigning the work to defining what may be decided without me.
You read work you never watched happen
I used to review work I'd seen get built. I knew the steps, so "looks right" was usually safe.
Then I started getting finished diffs with no memory of how they came to be. "Looks right" stopped being safe. The code was clean and the reasoning under it was wrong in a way you only catch if you go digging.
The skill now is judging a result cold, with zero context on the path. Ethan Mollick wrote this week about a model holding twelve hours of focus on one spec. When the attention window outlasts mine, my job isn't checking steps. It's scoping the spec so tightly the steps don't need a babysitter.
You plan capability, not headcount
"How many engineers do I need" is a question I catch myself asking and kill.
The real one: what mix of people and agents produces this outcome, and what's the human-only core I'd never hand off? The plan turned into a capability map with a deliberately protected center.
Gergely Orosz's June job-market analysis lands in the same place from the data side: the roles that compound are where judgment about AI systems is the scarce input, not execution on a known stack. Capability planning is that judgment pointed at your own team.
You design the alarm before the fire
Standup tells you something broke. Which means it tells you late.
Workers that fail unpredictably need the alarm built up front. I keep a short tripwire list, each one a single sentence: if this observable crosses this line, halt and ping me, and here's who owns the ping.
# tripwires.yml
- watch: test_pass_rate
trip: "< 100% on touched files"
action: halt + page me
- watch: files_changed
trip: "> 20 in one task"
action: pause for scope review
It feels too simple to matter. It has saved more bad mornings than any dashboard I've built.
You own the system, not the deliverable
This is the one that's actually a promotion.
Ownership used to mean the outcome is mine. It still is. The level changed. I don't own the deliverable directly anymore. I own the system that makes it: people, agents, and the rules between them. That's the only level that scales.
Boris Cherny, who runs Claude Code, said this week he hasn't written a line of code himself in eight months. People hear a flex. I hear the shift in one sentence: stopped producing the work, started owning the system that produces it. Bigger job, not a smaller one.
Where are you on these
I'm not clean on all five. Solid on three, shaky on two, and the shaky ones cost me the most.
Rate yourself one to five on each, fast. The two you score lowest are the two behaviors that move you this quarter. Which one did you make first, and which are you still avoiding?
Tags: #projectmanagement #ai #career
Top comments (1)
honestly the boundary file falls apart the second an agent hits a decision that's reversible in code but not in trust - like it emails a stakeholder something technically fine but politically wrong. git revert doesn't fix that, and i don't have a clean rule for it yet.