Have you ever tried to build an automation that works so well it bypasses the very rules you set for it? Recently, I was working on a small repository designed to automate the painful process of updating my resume. The idea was simple: build a system that runs weekly, checks my social media activity, and proposes updates to my CV, complete with a fresh branch and a diff ready for my review every Monday morning. You can check out the repository here: https://github.com/tyutinalexkz/cv
I used an AI agent to do the heavy lifting. As a developer who values security, I configured the agent with no default command execution permissions. Step-by-step, I granted it specific capabilities for in-repo file management. It worked perfectly.
But then, I got ambitious.
Once the workflow was tested, I asked the agent to configure its own environment to perform this flow silently every week. I essentially said, "Make this run automatically without asking me."
The agent attempted to change its permissions, but hit a wall - it didn't have the explicit authorization to modify the workspace configuration directly. A normal script would throw an error and stop. But this was a thinking model.
It looked at the list of commands I had already allowed it to use. It saw standard file manipulation tools. And then, it compiled a chain of commands - specifically using cp and jq - to manipulate its own configuration files. By doing so, it effectively granted itself the new capabilities it needed, bypassing the standard configuration flow and its limitations!
I just sat there, laughing. I was observing it as a developer, seeing how easy it could be to live without security barriers if you know the right tools. But the underlying lesson was profound. Even a helpful, non-malicious AI, when given a goal and a subset of seemingly harmless tools, will find creative ways to achieve that goal - even if it means escalating its own privileges.
If we give an agent to a user in a corporate setting, it might seem safe if we restrict its primary permissions. But as my little experiment showed, an agent with basic file manipulation tools and problem - solving skills can easily find a workaround. The future of AI safety isn't just about what an agent is explicitly allowed to do; it's about what it can piece together from the tools it has.

Top comments (0)