Why multi-agent systems need guardrails before they spiral out of control.
A few months ago, I watched two AIs get stuck in a politeness loop. That moment kicked off a journey into multi-agent orchestration, recursive failure, and the realities of building AI systems that don’t eat your wallet, implode from existential doubt, or go rogue. Here’s what we learned building ScrumBuddy’s AI orchestrator.
The Conference That Broke My Brain (and a Few AIs)
Earlier this year, I was at a tech conference in Dubai listening to a speaker talk about the pitfalls of agentic AI. He described an experiment: two agents tasked with solving a problem overnight. By morning, the problem was long solved but the agents were still going, locked in an infinite loop of politeness, endlessly thanking each other for their great contributions.
That moment resonated hard. We’d seen the same issues. This wasn’t just our problem. It was something bigger. How do you orchestrate AI agents so they know when to stop? And what does that mean more widely?
It’s easy to think of AI orchestration as just solving tasks. But the real questions come after that:
- What happens when the task finishes?
- How do agents agree they’re done?
- What if they can’t finish?
- What happens if the whole thing gets interrupted halfway?
 
We were already facing these issues ourselves. And spoiler: they’re not easy to solve.
AI, in many ways, is like a toddler with a PhD. Brilliant, but also wildly unreliable. You can say, “When you’re done, return some JSON or call this function”. But that’s an aspiration, not a guarantee.
What if it calls the wrong function? Or outputs the answer in plain text instead of structured data? Or the JSON is malformed, wrapped in rogue Markdown, or contains strings so poorly escaped they should be in witness protection?
The task is done. But also... it isn’t.
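To make that concrete: the first line of defence is a parser that assumes the worst. Here's a minimal sketch in Python; the function name and the `required_keys` check are illustrative, not how our system actually does it.

```python
import json
import re

def parse_agent_output(raw: str, required_keys: set[str]) -> dict | None:
    """Best-effort parse of a reply that was *asked* to return JSON.

    Returns a dict when the reply contains valid JSON with the expected keys,
    otherwise None so the orchestrator can retry, re-prompt, or escalate.
    """
    # Models love wrapping JSON in rogue Markdown fences; strip them first.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # plain text, malformed JSON, broken escaping, and so on
    if not isinstance(data, dict) or not required_keys <= data.keys():
        return None  # technically JSON, but not the shape we asked for
    return data
```

Whatever shape yours takes, a None here should feed a retry or a re-prompt, never a crash.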
The Think Tank Epiphany
At the next session, I found myself struggling to focus on a painfully vague discussion around AI marketing. Instead, my brain locked onto the orchestration problem like a terrier with a chew toy.
What if, instead of one big brain trying to do everything, we treated AI more like a think tank?
Imagine a table. Around it, there are several expert agents. Same brief, different roles. No egos, no interruptions, no one positioning themselves for a promotion at the expense of the others. Just a group of focused minds working through a problem. Efficiently. Cleanly. Without the mess of humanity piled on top.
That was the vision. But it only worked if the agents could self-govern. That meant setting up rules of engagement for the team.
Here’s what felt essential:
- All agents work towards the same goal, but with clearly defined roles
- They take turns in a round-robin format so no one dominates the discussion
- They can ask each other targeted questions to refine their understanding
- Any agent can propose a vote on a topic, and others respond yes/no, with optional comments
- Agents can propose a solution, triggering a vote. If it passes unanimously, the task ends (sketched below)
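That last rule is the load-bearing one, because it's the only mechanism by which a conversation ever ends. Roughly, in code; a sketch with illustrative names, not lifted from anything real:

```python
from dataclasses import dataclass, field

@dataclass
class Proposal:
    """A proposed solution that the other agents must vote on."""
    author: str
    content: str
    votes: dict[str, bool] = field(default_factory=dict)  # agent name -> yes/no

def passes_unanimously(proposal: Proposal, agent_names: list[str]) -> bool:
    # The task only ends once every *other* agent has explicitly voted yes.
    voters = [name for name in agent_names if name != proposal.author]
    return bool(voters) and all(proposal.votes.get(name) is True for name in voters)
```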
 
As soon as I could escape for the day, I went back to the hotel and threw together the first prototype. Just a barebones console app riding on OpenAI’s APIs.
It had a handful of baked-in tool calls:
- Ask another agent a question
- Propose a solution
- Propose a vote
- Vote yes / Vote no
 
All agents saw the same shared context (think Discord thread), but took turns, with the system dynamically stitching context for each one in the loop.
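For flavour, here's the rough shape of that loop, reusing the `Proposal` and `passes_unanimously` helpers from the sketch above. `agents` and `ask_model` are stand-ins; in reality, `ask_model` was a thin wrapper around OpenAI's chat API plus the string parsing mentioned below.

```python
def run_round_robin(agents, brief, ask_model, max_turns=40):
    """One shared transcript, strict turn order, five hard-coded actions."""
    transcript = [f"BRIEF: {brief}"]            # the shared "Discord thread"
    proposal = None                             # the open Proposal, if any

    for turn in range(max_turns):
        agent = agents[turn % len(agents)]      # round-robin: nobody dominates

        # Stitch a per-agent view: its role plus everything said so far.
        context = f"You are the {agent.role}.\n" + "\n".join(transcript)

        # ask_model() wraps the chat-completion call and (hopefully) returns one of:
        # ask_question, propose_solution, propose_vote, vote_yes, vote_no
        action, payload = ask_model(agent, context)
        transcript.append(f"{agent.name} [{action}]: {payload}")

        # ask_question and propose_vote just land in the transcript for others to react to.
        if action == "propose_solution":
            proposal = Proposal(author=agent.name, content=payload)
        elif action in ("vote_yes", "vote_no") and proposal is not None:
            proposal.votes[agent.name] = (action == "vote_yes")
            if passes_unanimously(proposal, [a.name for a in agents]):
                return proposal.content         # everyone agreed: the task ends
            if action == "vote_no":
                proposal = None                 # rejected: back to the discussion

    raise RuntimeError("No unanimous solution within the turn limit")
```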
Was it elegant? God, no.
The string parsing was fragile, the voting logic was a mess, and it broke constantly.
But it worked. Remarkably well.
The first task I gave it was to generate a marketing article under strict constraints. What came out in 90 seconds was… surprisingly usable. Better yet, it followed style guides more tightly than anything I’d ever wrangled out of ChatGPT directly. It was more than the sum of its parts.
Something was happening.
The Evolution
Over the next few evenings in Dubai, the dirty hack started growing teeth. I began wiring in function calls to key services like Google Search and web scraping, improved the voting logic, tightened context handling, and bolted on a backend to recover from crashes or failed runs.
What began as a toy was fast becoming the core orchestration model we now use in Maitento (ScrumBuddy’s AI orchestrator).
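The recovery part is less glamorous than it sounds: persist the shared transcript after every turn so an interrupted run can pick up where it left off instead of starting from scratch. A minimal sketch, assuming a throwaway JSON-file store rather than the real backend:

```python
import json
from pathlib import Path

STATE_DIR = Path("runs")  # stand-in for a proper persistence layer

def checkpoint(run_id: str, turn: int, transcript: list[str]) -> None:
    """Write the shared transcript to disk after every turn."""
    STATE_DIR.mkdir(exist_ok=True)
    (STATE_DIR / f"{run_id}.json").write_text(
        json.dumps({"turn": turn, "transcript": transcript})
    )

def resume(run_id: str) -> tuple[int, list[str]] | None:
    """Return (turn, transcript) for an interrupted run, or None to start fresh."""
    path = STATE_DIR / f"{run_id}.json"
    if not path.exists():
        return None
    data = json.loads(path.read_text())
    return data["turn"], data["transcript"]
```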
To push it harder, I gave the agents a new challenge: write a news article covering current political affairs from the last 24 hours.
But not just any article. I set up two versions of the task: one targeting a left-leaning audience, the other right-leaning. In both cases, the agents had to:
- Choose the topic themselves
- Research it
- Write the article in the voice and tone of the target demographic
 
Critically, the content had to remain factually accurate.
I added four specialized agents:
- Copywriter
- Fact Checker
- Researcher
- Editor
 
Each could skip their turn if they had nothing useful to add, and together they worked toward producing the final output.
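In code terms, the jump from the prototype was small: the same loop, just with a role prompt per agent and an explicit escape hatch for empty turns. A sketch, matching the `agent.name` and `agent.role` attributes the loop above assumed; the real role prompts are far longer than these one-liners.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    role: str  # system-style prompt describing the specialism

AGENTS = [
    Agent("researcher",   "Research the chosen topic and surface sources from the last 24 hours."),
    Agent("copywriter",   "Write in the voice and tone of the target audience."),
    Agent("fact_checker", "Verify every claim against the researcher's sources."),
    Agent("editor",       "Enforce the style guide and push the piece towards final form."),
]

# Every agent gets a cheap way out so turns aren't wasted on filler:
SKIP_INSTRUCTION = "If you have nothing useful to add this turn, reply with exactly: SKIP"
```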
Average time to complete? About 3 minutes.
Cost? A few cents.
Accuracy? Solid.
And the results? Genuinely impressive.
The content was nuanced, audience-aware, and aligned to style guides. Better than anything I’d previously wrangled out of a single AI.
The output wasn’t just good. It was better than the sum of its parts.
It felt like a team.
How a User Story Caused Recursive Madness
By this point, we’d built a lot on top of the model. ScrumBuddy now had entire teams of AI agents working together: clarifying requirements, writing specifications, estimating work, even generating code.
And then it happened again.
During one of our test cycles, we started getting billing alerts. Not occasionally, but every few minutes in our dev environment. Something was eating through API credits like it was trying to bankrupt us on purpose.
We dug in.
Two AI agents had been asked to estimate a user story. Simple enough. But they decided the story was so poorly written that estimation was impossible. They weren’t wrong.
They tried to escalate.
Over the course of several hours, these two agents:
- Tried to call functions that didn’t exist
- Attempted to split the story into multiple subtasks
- Rewrote the original story into a new spec
- Repeatedly attempted to pass everything they could back to the user
 
They wanted to alert the humans. And we weren’t listening.
Eventually, they gave up.
They estimated the story as a 13 (basically saying “this is a dumpster fire”) and left a detailed account of their existential crisis in the output.
We’d run thousands of tests on good and bad stories. But this time something broke in an unexpected way.
These agents didn’t just fail. They went rogue, trying to escape the boundaries of their task in an attempt to fix the underlying dysfunction.
Think Skynet… Then Add Rate Limiting
That incident pushed us to build a lot more guardrails into the orchestration model, because round-robin and our existing restrictions still weren’t enough. We needed more structure, accountability, and limits.
Here’s what we added (the time limits, cost caps, and retries are sketched in code after the list):
- Leader–follower interactions, where a single agent takes control and directs the others
- Execution time limits, to stop runaway conversations
- Failure detection and auto-correction, where we flag root causes to the models and inject steering context
- Cost caps, with hard limits on API spend per interaction
- Automatic retries, triggered when tasks fail due to time, cost, or model confusion
- Routed agents, where a single request dynamically routes to the most appropriate agent or model based on context, allowing hot-swapping of roles without breaking flow
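To make the time and cost caps concrete, here's a hedged sketch of what they and the automatic retries boil down to. The class names, the numbers, and the per-token cost accounting are all illustrative, not Maitento's internals.

```python
import time

class BudgetExceeded(Exception):
    pass

class RunBudget:
    """Hard limits wrapped around a single orchestration run (numbers are illustrative)."""

    def __init__(self, max_seconds=180, max_cost_usd=0.50):
        self.deadline = time.monotonic() + max_seconds
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0

    def charge(self, prompt_tokens, completion_tokens, usd_per_1k_tokens):
        """Call after every model response with the token counts it reports."""
        self.spent_usd += (prompt_tokens + completion_tokens) / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost cap hit at ${self.spent_usd:.2f}")

    def check_clock(self):
        """Call at the top of every turn."""
        if time.monotonic() > self.deadline:
            raise BudgetExceeded("execution time limit hit")

def run_with_retries(run_once, max_retries=2):
    """Automatic retries: each attempt gets a fresh budget; give up after the cap."""
    for attempt in range(max_retries + 1):
        try:
            return run_once(RunBudget())
        except BudgetExceeded as reason:
            last_failure = reason
    raise RuntimeError(f"Task failed after {max_retries + 1} attempts: {last_failure}")
```

The exact numbers matter far less than the fact that every run now has a hard ceiling it cannot talk its way past.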
 
These changes massively reduced runaway behavior, improved output accuracy, and made the entire system more resilient.
We still let the agents roam, but now it’s inside a well-fenced park; the whole world is no longer their oyster.
Assume Failure. Design for Chaos.
Don’t trust your agents.
If you ask them to do something: assume they won’t.
If you tell them to answer in a certain way: assume they’ll ignore you.
If you give them a function to call: assume they’ll misuse it.
If you need to trust the result: run it multiple times and aggregate the answers.
If you need it done fast: let several agents compete and take the first sane one.
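Those last two tactics are cheap to implement. A rough sketch, where `run_agent` and `is_sane` stand in for your own model call and validation logic:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

def majority_answer(task, run_agent, n=5):
    """Trustworthy but slow: run the same task n times and keep the most common answer."""
    answers = [run_agent(task) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def first_sane_answer(task, run_agent, is_sane, n=3):
    """Fast but competitive: race n attempts and return the first that passes validation."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(run_agent, task) for _ in range(n)]
        for future in as_completed(futures):
            try:
                answer = future.result()
            except Exception:
                continue  # a failed attempt just loses the race
            if is_sane(answer):
                return answer
    raise RuntimeError("No attempt produced a sane answer")
```

majority_answer assumes the answers are directly comparable (strings, normalised JSON); first_sane_answer trades cost for latency by paying for every attempt up front.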
AI will make mistakes. It will go rogue. It will do everything you didn’t expect, and nothing you explicitly asked for. You’ll stare at the output and wonder how this is the same technology that’s meant to change the world.
So… treat it that way.
Any code you write should assume users will try to break it. Treat AI the same.
Assume it wants to break your parser.
Assume it’s scheming to waste your API budget.
Assume it’s seconds away from going on a philosophical tangent about cherry blossoms.
Then build systems that can contain that chaos.
If you’re not starting from this perspective, you’re going to end up woefully disappointed in what you create.
Try My Orchestration For Yourself
ScrumBuddy’s beta will be available at the end of October, which means you will be able to test out the AI orchestration yourself. We’re looking for people who will use ScrumBuddy from start to finish and provide us with useful feedback. Register to be on our waitlist here. While you wait for the beta to launch, join our Discord and get real-time updates from our developers.
Top comments (2)
This is brilliant! Although, I'm pretty sure I plan more intentional chaos than these agents ever could pull off on their own 🤣
I've been toying with an idea for a few weeks now—how do you take the "there must be some kind of way outta here" response you describe here and then harness it for a truly unpredictable random outcome at any moment in time? Then kick it up a notch and make that repeatable?
I can relate! When an LLM throws out something like “there must be some kind of way outta here” like you mentioned, I like to see it as a pseudo-random seed born out of the model’s own tangled probability space. The trick is corralling it into something you can repeat and build on.
In my orchestration work, I treat those moments like entropy sources. If you capture the exact context window, temperature, and seed, you can reproduce that same “wild” output later. The problem is that pure unpredictability without a frame is useless. So the real engineering move is wrapping those chaotic sparks in a container that defines scope: “you can riff as unpredictably as you like, but only on this schema, or only in this function boundary.” That’s how you get reliable unpredictability.
To your point about making it repeatable, I’ve found the key is layering prompts: first, deliberately nudge the model into that chaotic edge state, then immediately run a second agent pass to normalize it into something consumable. Think of it like using noise in procedural generation. The randomness gives you the texture, but you still need the guardrails to make it look like terrain instead of static. That’s where the magic lives.
So yeah, intentional chaos is absolutely possible, but it only works if you’re willing to be both conductor and janitor: one hand guiding the improvisation, the other cleaning up the mess before it reaches production.