DEV Community

EECOLOR

AI did a good job... and almost deleted everything

I would like to share an experience I had today.

TLDR:

  • AI finds a minor security problem: danger of deleting most of root through a malicious symlink
  • Another AI determines the problem will likely never be exploitable, but asserts the impact of the fix is very small
  • AI starts to work on task using TDD
  • Red: test with a symlink to root
  • Green: ...

Every few months I use an AI tool to see whether I can make it write the code for a project idea I have. I do this to gauge the current state of AI models. My main focus is to build an intuition for the kind of things I need to tell the AI to steer it in the direction of a maintainable product.

I was quite pleased with the progress the first few days. It had created a proof of concept, we iterated over it, discovered what it actually needed to be, and even rewrote the whole thing in another language.

When I was somewhat satisfied with the behavior I did a few rounds (evenings) of code review. It's interesting that the current models seem to have a built-in counter: Gemini usually gives lists of 4 findings, while Claude sticks to 7-10 items. You can push that number up by requesting a review on multiple levels: project, features, files.

After that I went for a few rounds of code reduction. AI likes to generate code, but when asked to reduce the amount of code it had generated, I was able to fairly quickly shave off 10%. When I finished that round I focused on the tests (I had instructed the AI to use TDD, so there were many). This helped improve the architecture of the code as well.

I performed each type of work in a branch, and when the work in a branch was complete I had the different models review all of its code. After a few rounds I was satisfied and created a pull request. We have our pull requests set up so that AI will review them.

This time it took about 3 rounds of one AI fixing the findings reported by the other. Only minor nitpicks were left, and one of them was interesting: an attacker could construct a symlink to root, causing my code to delete a lot of files there.

I always discuss the findings with the AI before we work on them, and it told me the danger was very small because the location of the potentially dangerous directory (the one that could be a root-referencing symlink) was hard-coded. However, its advice was to add the guard anyway, since it was only a one-line guard.

I agreed and the AI got to work. Following its regular (and advised) mode of development, it started with a failing test... Yes, it used a symlink to root for the test.
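The post doesn't name the project's language or show the actual guard, but a minimal sketch in Python of what such a one-line symlink guard might look like (`safe_delete` and everything around it are illustrative, not from the project):

```python
import shutil
from pathlib import Path

def safe_delete(path: Path) -> None:
    """Recursively delete `path`, refusing to follow a symlink.

    A symlinked directory could point anywhere (including `/`),
    so we check before handing the path to the recursive delete.
    """
    if path.is_symlink():  # the one-line guard
        raise ValueError(f"refusing to delete through symlink: {path}")
    shutil.rmtree(path)
```

The point of placing the check in the delete helper itself is that every caller gets the protection, not only the one hard-coded location the review flagged.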

I noticed the AI struggling with the fix and stopped it. I asked it what was going on. It told me a test was taking a bit long and seemed stuck. Then I realized and asked: "Are you actually testing with a symlink to root?"

What followed was a the typical AI response, I'll leave that to your imagination.
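A safer way to write that red test is to point the symlink at a throwaway directory inside a tempdir, so that even if the guard is missing, nothing outside the sandbox is at risk. A sketch in Python (the post doesn't say which language or test framework the project uses):

```python
import shutil
import tempfile
from pathlib import Path

# A red test doesn't need a real symlink to `/`: point it at a
# throwaway directory inside a tempdir instead.
with tempfile.TemporaryDirectory() as tmp:
    fake_root = Path(tmp) / "fake_root"
    fake_root.mkdir()
    (fake_root / "victim.txt").write_text("still here")

    link = Path(tmp) / "dangerous_link"
    link.symlink_to(fake_root)

    # The behavior under test: the delete must refuse to follow a symlink.
    if link.is_symlink():
        deleted = False  # guard kicks in, nothing is removed
    else:
        shutil.rmtree(link)
        deleted = True

    survived = (fake_root / "victim.txt").exists()

assert deleted is False and survived is True
```

Worst case, a buggy guard wipes a directory the tempdir cleanup was about to remove anyway.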

The damage? Apparently it used a built-in method of the programming language to perform the delete. If that method could not delete a directory but had access to it, it would traverse it and try the subdirectories. Eventually it reached my home directory and deleted some files there.

I will never know which files it deleted; luckily I keep no files of value on my computer. The hilarious part is that this problem would probably never have been triggered in production.

I wonder if these types of problems will ever be solved in the models themselves. I fear not, and certainly not soon. But if I ever write my own agent loop, I will certainly include a 'Does this test have any potentially dangerous side-effect when run?' question.
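A crude version of that question could even be a static pre-flight scan of the test source before executing it. The patterns below are illustrative, Python-flavored, and no substitute for actually asking the model about side effects:

```python
import re

# Hypothetical pre-flight heuristic for an agent loop: flag test code
# that touches absolute paths outside a sandbox before running it.
DANGEROUS_PATTERNS = [
    r"symlink_to\(\s*['\"]/['\"]",  # symlink pointing at filesystem root
    r"rmtree\(\s*['\"]/",           # recursive delete of an absolute path
    r"os\.remove\(\s*['\"]/",       # file delete of an absolute path
]

def looks_dangerous(test_source: str) -> bool:
    """Return True if the test source matches any known-dangerous pattern."""
    return any(re.search(p, test_source) for p in DANGEROUS_PATTERNS)
```

A regex list like this only catches the patterns you thought of in advance, which is exactly why asking the model "does this test have dangerous side effects?" is still the more interesting check.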
