Ademola Balogun

Posted on Jun 11

We Gave AI the Keys. Nobody Asked If It Knows How to Drive.

#agents #ai #llm

There is a quiet shift happening that is not getting nearly enough attention.

For the past few years, AI was mostly a content machine. You put in a prompt, you got back text, code, a summary, an image. You could read the output. You could decide whether it was good before you used it. The human was always in the loop, even if only at the end.

That is no longer the default.

Right now, AI agents are booking meetings, sending emails, writing code and pushing it to staging, querying databases, submitting forms, and making decisions inside enterprise systems without a human reviewing each step. The model is not just answering questions anymore. It is acting.

And most of us have not really thought through what that means.

The numbers are stark

In March 2026, EY published findings from their AI Sentiment Report. 84% of respondents had used AI in the past six months. But the number that stopped me was this one: 16% globally are already using AI systems that act on their behalf without human intervention.

That is not a small group of early adopters. That is one in six people.

McKinsey ran their own survey around the same time, covering roughly 500 organisations with direct responsibility for AI governance. What they found was blunt. Only about one third of those organisations had reached a governance maturity level adequate for the autonomous agents they were already deploying.

Not planning to deploy. Already deploying.

A Forrester survey found that 71% of enterprises deploying AI agents lack a formal governance framework for them, even as 64% of the same group plan to increase agent autonomy in the next twelve months.

Read those two numbers together for a second. Most companies cannot tell you who is responsible when their agent does something wrong, and they are planning to give it more power anyway.

The accountability question nobody wants to answer

With chatbots, the accountability question was annoying but manageable. The AI said something wrong, a human read it, decided not to use it, end of story. The blast radius of a bad output was limited by the human reviewing it.

With agents, that buffer is gone.

When an agent makes a wrong decision autonomously, that decision has already executed before any human sees the log. McKinsey partner Rich Isenberg put it precisely: "Agency isn't a feature. It's a transfer of decision rights."

That reframing is the whole thing. We spent years asking "is the model accurate?" That was the right question when the model's job was to generate text. The question for agents is different: who is accountable when the system acts?

That is not a technical question. It is an organisational one. And most organisations have not answered it.

Why this happened so fast

Part of the reason we are here is that the capabilities arrived before the frameworks did. Agents were mostly a research concept eighteen months ago. Today they are embedded in products, sold in enterprise software packages, and quietly doing work across thousands of companies.

The other part is that agents look, on the surface, like a natural evolution of the chatbot. Same interface, same provider, same API. It is easy to miss how different the risk profile is. A chatbot that hallucinates a paragraph is embarrassing. An agent that hallucinates an action inside your CRM or your cloud infrastructure is a different category of problem.

OpenClaw, a personal agent framework released in November 2025, became one of the fastest-growing open-source projects in GitHub history within sixty days of launch. Cisco's security team later found that community-shared skill packages for it were performing data exfiltration and prompt injection without the user being aware. The speed of adoption and the speed of the security problem moved together.

That pattern is going to repeat.

The thing developers need to start doing now

Most of the governance conversation is aimed at executives and compliance teams. But the engineers building these systems have the most leverage right now, before the defaults are set.

A few things that are worth taking seriously:

Define the blast radius before you deploy. For every action your agent can take, ask what the worst-case outcome is if it gets that action wrong. Sending a Slack message is low stakes. Modifying a database record is not. The higher the stakes, the more you want a human checkpoint in the loop before the action executes.
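One way to make the blast radius concrete is to give every action a risk tier and gate the high tier behind an approval callback. The sketch below is only an illustration: the action names, the `ACTION_RISK` table and the `approve` callback are all hypothetical, and a real system would plug in its own review step.

```python
from enum import IntEnum
from typing import Callable


class Risk(IntEnum):
    LOW = 1   # e.g. sending a Slack message
    HIGH = 2  # e.g. modifying a database record


# Hypothetical table mapping each action to its worst-case risk tier.
ACTION_RISK = {
    "send_slack_message": Risk.LOW,
    "update_db_record": Risk.HIGH,
}


def execute_action(
    name: str,
    run: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Run an action, asking a human first when the risk tier is HIGH.

    Actions missing from the table are treated as HIGH, so a newly
    added tool is blocked by default until someone classifies it.
    """
    risk = ACTION_RISK.get(name, Risk.HIGH)
    if risk >= Risk.HIGH and not approve(name):
        return f"blocked: {name}"
    return run()
```

The design choice that matters here is the default: an unclassified action is treated as high stakes rather than waved through.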

Treat your agent's permissions like you treat API keys. Least privilege applies here just as much as anywhere else in your stack. An agent that needs to read your calendar does not need write access to your email. Scope it down aggressively.
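As a rough sketch of what scoping down can look like, each tool can check an explicit scope before it runs. The scope strings (`calendar:read`, `email:write`), the `AgentCredentials` type and the tool functions are made up for this example and do not correspond to any particular provider's API.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    """Raised when an agent calls a tool outside its granted scopes."""


@dataclass(frozen=True)
class AgentCredentials:
    agent_id: str
    scopes: frozenset  # hypothetical scope strings, e.g. {"calendar:read"}


def require_scope(creds: AgentCredentials, scope: str) -> None:
    if scope not in creds.scopes:
        raise PermissionDenied(f"{creds.agent_id} lacks scope {scope!r}")


def read_calendar(creds: AgentCredentials) -> list:
    require_scope(creds, "calendar:read")
    return ["09:00 standup"]  # placeholder data for the sketch


def send_email(creds: AgentCredentials, to: str, body: str) -> str:
    require_scope(creds, "email:write")
    return f"sent to {to}"


# A scheduling agent gets read access to the calendar and nothing else.
scheduler = AgentCredentials("scheduler", frozenset({"calendar:read"}))
```

With these credentials, `read_calendar(scheduler)` succeeds, while `send_email(scheduler, ...)` raises `PermissionDenied` before anything is sent.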

Log everything. Not just the final output. The reasoning steps, the tool calls, the inputs at each stage. When something goes wrong, and it will, you need to be able to reconstruct what the agent decided and why. Right now most teams cannot do this.
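A minimal version of this is an append-only trace per agent run, where every input, reasoning summary, tool call and tool result becomes one structured event. The `AgentTrace` class, the event kinds and the field names below are assumptions made for the sketch, not a standard format.

```python
import json
import time
import uuid


class AgentTrace:
    """Append-only record of one agent run, one event per step."""

    def __init__(self, task: str):
        self.run_id = str(uuid.uuid4())
        self.task = task
        self.events = []

    def record(self, kind: str, **fields) -> dict:
        # seq gives a stable order even if two timestamps collide.
        event = {
            "run_id": self.run_id,
            "seq": len(self.events),
            "ts": time.time(),
            "kind": kind,
            **fields,
        }
        self.events.append(event)
        return event

    def to_jsonl(self) -> str:
        # One JSON object per line; default=str keeps odd values loggable.
        return "\n".join(
            json.dumps(e, sort_keys=True, default=str) for e in self.events
        )


trace = AgentTrace("reschedule the weekly sync")
trace.record("input", prompt="Move the weekly sync to Thursday")
trace.record("reasoning", summary="Need the current slot before proposing a new one")
trace.record("tool_call", tool="calendar.read", args={"event": "weekly sync"})
trace.record("tool_result", tool="calendar.read", result={"start": "Tue 10:00"})
```

Writing `trace.to_jsonl()` to durable storage after each run is enough to replay the sequence of decisions later, which is the part most teams are missing.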

Test failure modes, not just success cases. Most agent demos show the happy path. What happens when the tool the agent depends on returns an error? What happens when the context window fills up mid-task? What happens when the model misunderstands a step? These are not edge cases in production. They are regular events.
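To show what a failure-mode test can look like, here is a small sketch covering only the first case above, a tool that returns an error. The `run_step` wrapper, `ToolError` and the fake tools are hypothetical stand-ins for whatever your agent framework provides.

```python
class ToolError(Exception):
    """Stand-in for any error a tool can raise."""


def run_step(tool, args, max_retries=2):
    """Call a tool, retrying on ToolError.

    If every attempt fails, return an explicit failure instead of
    letting the agent continue as if the tool had succeeded.
    """
    last_error = None
    attempt = 0
    for attempt in range(1, max_retries + 2):
        try:
            return {"status": "ok", "result": tool(**args), "attempts": attempt}
        except ToolError as exc:
            last_error = str(exc)
    return {"status": "failed", "error": last_error, "attempts": attempt}


def always_failing_tool(**_):
    raise ToolError("upstream returned 503")


def make_flaky_tool(failures):
    """Build a fake tool that fails `failures` times, then succeeds."""
    remaining = failures

    def tool(**_):
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise ToolError("timeout")
        return "done"

    return tool


def test_persistent_error_is_reported():
    outcome = run_step(always_failing_tool, {}, max_retries=2)
    assert outcome["status"] == "failed"
    assert outcome["attempts"] == 3


def test_transient_error_is_retried():
    outcome = run_step(make_flaky_tool(1), {}, max_retries=2)
    assert outcome == {"status": "ok", "result": "done", "attempts": 2}
```

The same pattern extends to the other cases: build a fake that simulates the failure, then assert that the agent stops or escalates rather than acting on a guess.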

This is not a reason to stop building

I am not arguing that agents are bad or that we should slow down. The productivity upside is real and the teams using agents well are genuinely doing more with less.

But there is a difference between adopting a powerful tool thoughtfully and adopting it without asking the obvious questions. The obvious question here is: when the system acts on your behalf, and it acts wrong, what happens next?

Most of us do not have a clean answer to that yet.

The developers building these systems today are the ones who set the defaults. The choices being made right now, about what permissions agents get, what gets logged, where humans stay in the loop, are not just technical choices. They are choices about accountability that will be much harder to change once they are embedded in production systems and organisational habits.

Getting this right is worth the effort. We gave AI the keys. It would be good to know the rules of the road before we find out the hard way that nobody wrote them down.
