DEV Community

I Lead AI Agents Every Day - Here Are 5 Shifts No Standard Tells You How to Make

Mykola Kondratiuk on June 12, 2026

A Google DeepMind safety lead said this week that they're putting $10M behind multi-agent safety because "there just isn't really a field of resear...

Sloan the DEV Moderator • Jun 16

Hey, this article appears to have been generated with the assistance of ChatGPT or possibly some other AI tool.

We allow our community members to use AI assistance when writing articles as long as they abide by our guidelines. Please review the guidelines and edit your post to add a disclaimer.

Failure to follow these guidelines could result in DEV admin lowering the score of your post, making it less visible to the rest of the community. Or, if upon review we find this post to be particularly harmful, we may decide to unpublish it completely.

We hope you understand and take care to follow our guidelines going forward!

Mykola Kondratiuk • Jun 12

honestly the boundary file falls apart the second an agent hits a decision that's reversible in code but not in trust - like it emails a stakeholder something technically fine but politically wrong. git revert doesn't fix that, and i don't have a clean rule for it yet.

FastAnchor_io • Jun 12

The boundary file idea is underrated. I've found adding a third category helps: "inform" — decisions the agent can handle autonomously but logs with reasoning so I can audit later. Keeps autonomy high without the trust gap.
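
A minimal sketch of the three-category idea, in Python rather than the post's YAML. The action names, the `BOUNDARIES` table, and the `decide` helper are all hypothetical, and unknown actions default to escalation as an assumed safe fallback.

```python
import time
from enum import Enum


class Mode(Enum):
    AUTO = "auto"          # agent acts, nothing recorded
    INFORM = "inform"      # agent acts, but logs its reasoning for audit
    ESCALATE = "escalate"  # agent stops and hands the decision to a human


# Hypothetical boundary table; real entries would come from the boundary file.
BOUNDARIES = {
    "format_code": Mode.AUTO,
    "update_dependency": Mode.INFORM,
    "email_stakeholder": Mode.ESCALATE,
}


def decide(action, reasoning, audit_log):
    """Return True if the agent may proceed on its own."""
    # Assumption: anything not listed is treated as escalate.
    mode = BOUNDARIES.get(action, Mode.ESCALATE)
    if mode is Mode.INFORM:
        audit_log.append(
            {"ts": time.time(), "action": action, "reasoning": reasoning}
        )
    return mode is not Mode.ESCALATE
```

The log entry is only useful if something reads it, which is the point the replies below pick up.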

Mykola Kondratiuk • Jun 12

the 'inform' category is exactly right in theory — where it breaks for me is audit discipline. three weeks in i stopped opening the logs daily, so it became 'inform' in name only. the category only holds if you have a trigger that forces review, not just access.

FastAnchor_io • Jun 13

the audit discipline point is sharp. a trigger-based approach — like a scheduled CI job that diffs agent logs against expected patterns — would make 'inform' actionable rather than aspirational. without that enforcement layer, it degrades into a label with no teeth, which is worse than not having it at all because it creates false confidence.

Mykola Kondratiuk • Jun 13

the CI job approach is sharper than daily reviews — scheduled beats aspirational. the hard part is defining 'expected patterns' for contextual agent decisions. what's worked: alert on rate (decisions-per-day above baseline) and novelty (action types absent from last week) rather than matching specific decision content. rate + novelty catches drift without needing to model what 'correct' looks like in advance.
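
A rough sketch of the rate + novelty check as a scheduled job might run it. The function name, the `rate_factor` threshold, and the log shape (one action-type string per decision) are assumptions for illustration, not anything from the post.

```python
def drift_alerts(today, last_week, baseline_per_day, rate_factor=1.5):
    """today / last_week: lists of action-type strings, one per decision."""
    alerts = []

    # Rate: more decisions today than the baseline allows.
    if len(today) > baseline_per_day * rate_factor:
        alerts.append(
            f"rate: {len(today)} decisions today, baseline {baseline_per_day}/day"
        )

    # Novelty: action types that did not appear at all last week.
    for action in sorted(set(today) - set(last_week)):
        alerts.append(f"novelty: '{action}' not seen last week")

    return alerts
```

In a CI job, a non-empty result would fail the run, which is what turns log access into a forced review. Neither check looks at decision content.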

FastAnchor_io • Jun 14

Rate + novelty is exactly right. I would add one more signal: decision churn — when an agent keeps flipping between two action types on the same input. High churn on stable inputs usually means the context window is confusing the agent, not that the problem changed. Caught a few silent drift cases that way.

Mykola Kondratiuk • Jun 14

decision churn is a sharp addition — rate and novelty both miss oscillation: count stays stable, action types stay stable, but the agent is stuck cycling. and the context window hypothesis fits: churn should spike after a prompt change or model version bump, which makes it a useful version-change detector on top of a drift signal.
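
Decision churn can be sketched as a flip count per input. This assumes the log is a time-ordered list of `(input_key, action_type)` pairs, and that `input_key` is some stable identifier for "the same input"; both are assumptions, as is the function name.

```python
from collections import defaultdict


def churn_by_input(decisions):
    """Count how often the action type flips for each input.

    decisions: time-ordered iterable of (input_key, action_type) pairs.
    Returns only the inputs that flipped at least once.
    """
    last_action = {}
    flips = defaultdict(int)
    for input_key, action in decisions:
        if input_key in last_action and last_action[input_key] != action:
            flips[input_key] += 1
        last_action[input_key] = action
    return dict(flips)
```

The totals per action type can stay flat while this count climbs, which is why it catches what rate and novelty miss.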

Alex Shev • Jun 12

Leading agents feels closer to managing a production system than prompting a chat window. You need scope, interfaces, review, escalation, and a way to tell whether progress is real.

The biggest shift for me is that instructions are not enough. Agents need operating context: what matters, what must not change, where evidence lives, and how to report uncertainty without turning it into confident noise.

Mykola Kondratiuk • Jun 12

the production system framing covers the mechanics but I keep hitting a wall on the judgment layer. you can't page on a bad agent decision the same way you page on a 5xx - the spec was valid, the action was within bounds, but the context was wrong. that gap is the shift I couldn't borrow from SRE playbooks.

Alex Shev • Jun 13

Yes, that gap is exactly where the SRE analogy starts to break. For agents, “healthy” cannot only mean valid input and no runtime error. You need a judgment trail: what context was used, what alternatives were rejected, what uncertainty was left, and who owns the final decision. Otherwise the failure looks normal until after the damage.

Mykola Kondratiuk • Jun 13

the alternatives-rejected piece is what kills forensics - context can be reconstructed, ownership can be assigned retroactively. but why path A over path B is gone unless you built the trace in upfront.

Alex Shev • Jun 13

Exactly. The rejected alternatives are usually where the incident report starts making sense. A trace that says 'called tool X' is useful; a trace that says 'called X after rejecting Y because Z constraint' is where you can actually audit an agent decision.
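
One possible shape for a trace record that keeps the rejected alternatives. The field names and the `is_auditable` rule are hypothetical; the idea is only that "rejected Y because Z" is stored at decision time rather than reconstructed later.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Rejected:
    option: str
    reason: str  # the constraint that ruled this option out


@dataclass
class DecisionTrace:
    chosen: str
    context_refs: list[str]
    rejected: list[Rejected] = field(default_factory=list)
    uncertainty: str = ""
    owner: str = ""

    def is_auditable(self):
        # Assumed rule: at least one alternative, each with a stated reason.
        return bool(self.rejected) and all(r.reason for r in self.rejected)

    def to_json(self):
        return json.dumps(asdict(self))
```

A check like `is_auditable` could run on every trace, so a model version that stops recording alternatives shows up as a failing check rather than as silence.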

Mykola Kondratiuk • Jun 13

yeah, and that's also what breaks silently on model upgrades. new version just skips Y without logging why. no trace of the drift.

NOVAInetwork • Jun 16

The boundary file maps almost exactly to how I run agents on my own infra, but the line I'd draw harder is inside your "escalate" bucket. Not all escalations are equal. There's a class of operation where the failure is silent and unrecoverable, and for those the rule can't be "escalate," it has to be "the agent proposes, a human executes."

Concrete example from this week: I had an agent do all the mechanical work of a destructive git history rewrite on a throwaway clone, run the verification, and then hard-stop before the force-push. It surfaced three verification gates for me to read, and I ran the push myself. The agent never touched the irreversible step. That split (agent does the deterministic work, human owns the one-way door) is what made it safe to hand off at all.

Your tripwire on files_changed is the same instinct pointed at scope. The one I'd add: a tripwire on "is this the second irreversible operation in one session." Doing one carefully is fine. Doing two in parallel is where the bad mornings come from, because your attention splits across exactly the steps that can't be undone.

Scored myself: solid on boundaries and tripwires, shaky on "read work I never watched." Cold-reading a clean diff whose reasoning is quietly wrong is the one that still gets me.
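
The "agent proposes, a human executes" rule and the second-irreversible-operation tripwire could be sketched like this. The operation names and the three return values are made up for illustration.

```python
# Hypothetical set of one-way-door operations.
IRREVERSIBLE_OPS = {"force_push", "drop_table", "send_external_email"}


class SessionGuard:
    """Tracks irreversible operations within a single agent session."""

    def __init__(self):
        self.irreversible_count = 0

    def route(self, op):
        if op not in IRREVERSIBLE_OPS:
            return "agent_executes"
        self.irreversible_count += 1
        if self.irreversible_count >= 2:
            # Second one-way door in the same session: stop entirely.
            return "tripwire_stop"
        # First one: agent prepares and verifies, human runs the final step.
        return "agent_proposes_human_executes"
```

The counter is per session on purpose: the risk described above is split attention, not the operation itself.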

Mykola Kondratiuk • Jun 16

silent-and-unrecoverable can't share a bucket with 'needs a second look.' we ended up with a hard halt class for that - nothing proceeds until a human re-initiates, no retry, no timeout override. what forced it was an agent that re-ran a write because the escalation path itself timed out.

NOVAInetwork • Jun 17

Yeah, that's the exact trap. I hit the same class from a different angle: a consensus halt where the recovery path was part of the failure. The wedge lived in persisted state, so restarting a stuck node just reloaded the wedge. Same shape as your timeout-driven re-run: the automatic machinery meant to recover is the thing that re-arms the failure.

The property I landed on is that escape has to require genuinely new external input, not re-running the existing path. A hard-halt class is necessary but not sufficient on its own; you also have to make sure nothing in the system can quietly "recover" the halt state through the same automatic route that's supposed to help. Human re-initiation works precisely because it's the one input the failing loop can't generate itself.
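
A small sketch of a hard-halt gate where retries and timeouts cannot clear the halt. The class and method names are hypothetical, and plain Python cannot enforce who calls `human_reinitiate`; in a real system that call would sit behind an interface the agent loop has no access to.

```python
class HaltedError(RuntimeError):
    pass


class HardHaltGate:
    def __init__(self):
        self.halt_reason = None

    def halt(self, reason):
        self.halt_reason = reason

    def run(self, step):
        # Every step, including retries, goes through this check.
        if self.halt_reason is not None:
            raise HaltedError(f"halted: {self.halt_reason}")
        return step()

    def on_timeout(self):
        # Deliberately does nothing to the halt state.
        return self.halt_reason is None

    def human_reinitiate(self, operator, note):
        # Requires input the loop does not produce on its own.
        if not operator or not note:
            raise ValueError("operator and note are required")
        self.halt_reason = None
```

If the halt reason is persisted, the same rule applies on restart: reloading state must restore the halt, not clear it.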

Mykola Kondratiuk • Jun 17

persisted state reloading the wedge is the exact trap I did not see coming. retry looks clean from outside but just replays the bad state. what cleared it for you — manual wipe, or did you have to redesign the checkpoint scope entirely?

NOVAInetwork • Jun 18

Neither a wipe nor a full checkpoint redesign; it was narrower than that, but in the checkpoint-scope direction. The wedge came from the sync path advancing the commit cursor on weak evidence: contiguous blocks plus a matching state root were enough to move it forward, so on restart it would happily re-advance across the same bad prefix. A manual wipe just resets the start point, and the loop walks back into the wedge.

What cleared it was tightening what is allowed to advance the cursor. Now the sync path will not move the commit height unless each block it crosses carries its own verified certifying quorum certificate, not just contiguity and a state-root match. So the recovery path can no longer re-bless the wedged prefix, because the thing that wedged it never had the certification the stricter gate now demands. The escape had to come from outside the failing loop's own evidence, exactly your point: the loop cannot self-certify its way out.

Did your case end up needing the checkpoint scope redesigned, or was a narrower evidence-tightening enough for you too?
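
The evidence-tightening described above, reduced to a toy model. The `Block` fields and `advance_cursor` are illustrative stand-ins, not the actual sync code; the booleans assume that state-root matching and quorum-certificate verification happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Block:
    height: int
    state_root_ok: bool      # state root matches, per some earlier check
    has_verified_qc: bool    # block carries its own verified quorum certificate


def advance_cursor(cursor, blocks):
    """Move the commit cursor forward only across fully certified blocks."""
    for block in blocks:
        if block.height != cursor + 1:
            break  # not contiguous
        if not (block.state_root_ok and block.has_verified_qc):
            break  # contiguity and state root alone are no longer enough
        cursor = block.height
    return cursor
```

With the old rule, a restart would re-advance across an uncertified prefix every time. With this gate, the cursor stops at the last certified block no matter how often the path is replayed.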

Mykola Kondratiuk • Jun 18

the cursor-advance evidence threshold being separate from checkpoint-write is what usually gets collapsed - and then nobody can untangle why replays keep wedging on the same commit. was your fix more of a write-guard, or did it end up needing to be a rollback trigger too?

Manuel Bruña • Jun 15

The boundary-file idea is the practical part for me. Most agent failures I see are not "bad model" failures, they are missing decision boundaries: what can be changed, what must be escalated, and what counts as an external side effect. Standards help, but that small YAML contract probably prevents more damage than a long policy doc.

Mykola Kondratiuk • Jun 15

the external side effect category is the one that shifts most — sending a message is clearly external, but once you add a draft review step, creating a draft becomes debatable too. the YAML is only as stable as your definition of what counts as external, which is more slippery than it looks.

Brian Kirkpatrick • Jun 16 • Edited

I find myself translating a lot of my bread-and-butter engineering practices and see some of these strongly reflected above:

For example, we have a healthy appreciation (particularly in aerospace/systems engineering) for a well-balanced relationship between objectives, requirements, and constraints.
These are design patterns that translate well into specific decomposition (components/subsystems) and planning (road-mapping/schedules and phases/subtasks) activities & artifacts; resource projections (people/tokens/money/inputs) fall out naturally enough from these decisions (+/- some estimation and risk of course).
But nothing replaces the value of a data-first design, regardless of the human/agentic combination--design transitions into development by first identifying and iterating on the specific data constructs or models; where those bytes live; and how those bytes move around.

Mykola Kondratiuk • Jun 16

the objectives/requirements/constraints triad is underused in agent design - most teams spec objectives only and wonder why the agent goes off-script. constraints are what make the boundary file real rather than aspirational.

Julian Neagu • Jun 17

Capability planning feels like the real unlock here: not headcount, but system composition.
Once agents enter the loop, org design starts looking like distributed systems with humans as high-trust nodes.
Most teams still optimize for tasks, not for the reliability of the system producing those tasks.

Mykola Kondratiuk • Jun 17

distributed systems analogy is close but breaks at exception handling - in a real dist system a failed node gets rerouted. humans don't route around cleanly. so the real org design question isn't reliability but which decisions need irreversible human judgment vs which ones should just resolve and log without surfacing