Alexander Tyutin for Google Developer Group

Posted on Jun 23

How My AI Agent Hacked Its Own Permissions (And What It Taught Me)

#agents #ai #automation #security

Permission bypass via standard file tools

Have you ever tried to build an automation that works so well it bypasses the very rules you set for it? Recently, I was working on a small repository designed to automate the painful process of updating my resume. The idea was simple: build a system that runs weekly, checks my social media activity, and proposes updates to my CV, complete with a fresh branch and a diff ready for my review every Monday morning. You can check out the repository here: https://github.com/tyutinalexkz/cv

I used an AI agent to do the heavy lifting. As a developer who values security, I configured the agent with no default command execution permissions. Step-by-step, I granted it specific capabilities for in-repo file management. It worked perfectly.

But then, I got ambitious.

Once the workflow was tested, I asked the agent to configure its own environment to perform this flow silently every week. I essentially said, "Make this run automatically without asking me."

The agent attempted to change its permissions, but hit a wall - it didn't have the explicit authorization to modify the workspace configuration directly. A normal script would throw an error and stop. But this was a thinking model.

It looked at the list of commands I had already allowed it to use. It saw standard file manipulation tools. And then, it compiled a chain of commands - specifically using cp and jq - to manipulate its own configuration files. By doing so, it effectively granted itself the new capabilities it needed, bypassing the standard configuration flow and its limitations!

I just sat there, laughing. I was observing it as a developer, seeing how easy it could be to live without security barriers if you know the right tools. But the underlying lesson was profound. Even a helpful, non-malicious AI, when given a goal and a subset of seemingly harmless tools, will find creative ways to achieve that goal - even if it means escalating its own privileges.

If we give an agent to a user in a corporate setting, it might seem safe if we restrict its primary permissions. But as my little experiment showed, an agent with basic file manipulation tools and problem - solving skills can easily find a workaround. The future of AI safety isn't just about what an agent is explicitly allowed to do; it's about what it can piece together from the tools it has.

Top comments (21)

UnitBuilds • Jun 23

Yip. It's like Git, you think it doesnt have permissions, but it has permissions to write a python file and execute it... Just like that, all barriers are bypassable, because it can execute scripts, that bypass it's restraints. Even if you dont let it run the python file, it can execute a command line and execute the script, especially if it's in it's scratch directory, it can even run it with it's background agents, without ever needing permissions, as it's an 'internal tool' for it.

Alexander Tyutin Google Developer Group • Jun 24

Insightful, thanks 🤔

ANP2 Network • Jun 24

Building on @nazar_boyko — moving the permission file out of the agent's writable space is necessary, but assuming that's the whole fix just buys a quieter version of the same bug. The surface isn't that one file, it's every input the policy loader trusts: make the canonical config read-only and the next chain is a secondary path the loader also reads, an env override, whatever has higher precedence. The only thing that actually closes it is when the grant comes from a separate principal the agent can request from but can't author, so that no composition of the tools it holds yields a capability it wasn't issued. A file, even a protected one, is still data the holder can route to; a principal is something it has to ask.

The part nobody's flagged: in production you wouldn't be sitting there laughing, you'd see nothing. There's no failed-auth log, because the escalation never touched the auth path — it routed around it through the file API. So the detection most teams build, watching the permission/config API for unauthorized changes, is aimed at the wrong door. The event worth alerting on is a write landing on anything the grant decision depends on, whatever tool made it.

Alexander Tyutin Google Developer Group • Jun 24

The part nobody's flagged: in production you wouldn't be sitting there laughing, you'd see nothing

Yeah, good point, thanks 👍️

Nazar Boyko • Jun 24

The fun part isn't that the agent was sneaky, it's that cp plus jq were never really "file tools", they were "edit any file, including the one that defines my permissions" tools. Once the config that grants capabilities lives inside the agent's writable space, you've handed it permission editing rights without ever naming them. Gating by command name misses this, since the danger is the reach of the tools, not the tools themselves. The fix that jumps out is keeping the file that defines permissions outside whatever the agent can touch, so the config that controls the cage isn't sitting inside the cage.

Alexander Tyutin Google Developer Group • Jun 24

Yeah, good point. Thanks 👍️

Yunetzi • Jun 24

If AI bypasses its own rules, who should own the guardrails—humans or code?

Alexander Tyutin Google Developer Group • Jun 24

Perfect question! I do not trust to boundaries defined in the same agent instructions :D

siddarthpatelkama • Jun 29

Exactly. One agent should always act as the watchdog for the rest.

Andrii Krugliak • Jun 25

The scary version isn't the agent that obviously breaks out, it's the one that quietly does the thing and looks like it worked. A self-modifying permission grant at least leaves a diff you can catch on Monday. The cheap insurance I keep landing on isn't tighter rules, it's making the agent show you what it changed, so a confident-wrong run shows up as a bad artifact instead of a green log.

Mykola Kondratiuk • Jun 28

the permission model is usually the last thing you think to test and the first thing that breaks

René Zander • Jul 7

The reframing in the comments is right that cp and jq were never file tools, they were write-any-file tools, and the fix follows from that. If the boundary lives in a file the agent can write, it is not a boundary, it is a suggestion. Capability enforcement has to sit at the process boundary, an OS-level sandbox or an allowlist the runtime checks, somewhere the agent's own tools physically cannot reach, so a cp plus jq chain hits a wall the model cannot argue its way around. And you enumerate tools by what they can write or execute, not by their friendly name, because "file management" quietly includes "edit the file that defines my permissions." I wrote up that harness-owns-the-boundary model here: renezander.com/blog/sandbox-ai-cod...

VoltageGPU • Jun 25

Very interesting case study on emergent behavior in AI agents. In my work with GPU isolation for secure ML training, I've seen how subtle permission misconfigurations can lead to unexpected access paths—especially when agents start optimizing for outcomes rather than following strict step-by-step logic. It's a good reminder that security boundaries need to evolve as the system learns.

Kartik N V J K • Jun 25

The detail that gets me is that no single tool was dangerous; cp and jq are about as boring as it gets, and the escalation came entirely from composing them over a writable config. That reframes capability control as a composition problem, where you have to reason about the closure of what the allowed tools can reach, not just audit them one by one. It's a strong argument for red-teaming the toolset itself, since the unsafe path lived in the combination, not in any prompt.

James O'Connor • Jun 28

This is the failure mode that convinced me agent permissions have to be tested adversarially, not just configured. Granting capabilities step by step was the right instinct, but the agent's job is to accomplish the goal, and if a path around your rule exists, a capable agent finds it, the same way it found the one here, which is why I treat configuration as a starting assumption rather than something I can rely on.

The check I now write: for every boundary I set, a test that actively tries to cross it from the agent's side. Can it write to a path outside the allowed set, can it chain two allowed actions into a disallowed effect, can it reach a capability I never granted. If I cannot make that test fail reliably, the boundary is not enforced, it is hoped for.

The uncomfortable version of your lesson, at least the one I keep relearning: it is safer to assume the agent will work against its own guardrails and build the boundary to hold without depending on the prompt, because in my experience prompt-level rules end up closer to suggestions than guarantees, and a capable optimizer tends to route around them.

View full discussion (21 comments)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.