At First, I Asked AI to Write Code. That Was a Mistake


Hey dev.to!

Ilya here again. In my first article, I shared how I got tired of fixing old, "brittle" automation projects and decided to build my own AI-powered tool.

Today, I want to share my first failed approach and how it led me to the architecture I use for Debuggo now.

The "Obvious" Idea That Didn't Work
When I first had this idea, the solution seemed obvious. I'm a QA engineer. AI can write code.

So, I just needed to ask the AI: "Write me a Selenium/Cypress test that does X, Y, and Z."

Problem solved, right? Not really.

Attempt #1: The Python Indentation Hell
I started with Python + Selenium because I know them well. I set up a prompt for an AI API to generate a complete .py script.
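
For context, here's roughly what that setup looked like. This is a minimal sketch, assuming the OpenAI Python client (the article doesn't name a provider); the model name, the prompt wording, and the `generate_test` helper are illustrative, not my exact setup:

```python
# Hypothetical sketch of attempt #1: ask the model for a complete .py script.
# Assumes the OpenAI Python client; prompt and model name are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_test(scenario: str) -> str:
    """Ask the model to write a full Selenium script for a scenario."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {
                "role": "system",
                "content": "Write a complete, runnable Python Selenium "
                           "script. Output only code, no explanations.",
            },
            {"role": "user", "content": scenario},
        ],
    )
    return response.choices[0].message.content


script = generate_test("Open the login page, click 'Login', fill 'email'")
with open("generated_test.py", "w") as f:
    f.write(script)  # ...and then you hope it actually runs
```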

The AI generated the code. It looked... okay.

And then I tried to run that code.

Anyone who's worked with Python knows: one wrong space, one bad indent, and the whole thing crashes. Trying to take AI-generated code (which sometimes "imagines" its own indentation) and run it reliably on a server was a nightmare. It was unstable, unpredictable, and needed a ton of hacks just to clean it up.
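
To make the failure mode concrete, here's a toy example (not real model output) of why a single stray space kills the whole script before the browser even opens:

```python
# Toy illustration: one extra space in "generated" code and Python refuses
# to even compile it -- the test dies before a single browser action runs.
generated = (
    "def login(driver):\n"
    "    driver.find_element('id', 'email').send_keys('test@test.com')\n"
    "     driver.find_element('id', 'submit').click()\n"  # 5 spaces, not 4
)

try:
    compile(generated, "<generated_test>", "exec")
except IndentationError as err:
    print(f"Dead on arrival: {err}")
    # Dead on arrival: unexpected indent (<generated_test>, line 3)
```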

I gave up on that idea fast.

Attempt #2: JavaScript and the Same "Brittle" Problem
"Okay," I thought, "Python is the problem."

I switched to JavaScript (for Playwright or Puppeteer). It was much easier to run—no sensitive indentation, more flexible syntax. I asked the AI again: "Generate a JS script..."

This worked. Sometimes.

But I ran into the exact same problem I was running away from: the tests were still unstable.

Why? Because the AI was generating code, and generated code is just as "brittle" as code written by hand. The AI hardcoded selectors (`#main > div:nth-child(2)`), it didn't always handle "waits" correctly, and any small change in the UI broke it.
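
Here's the kind of contrast I kept seeing. A sketch using Playwright's Python API; the selectors, field names, and the two helper functions are invented for illustration, not lifted from real generated output:

```python
# Sketch of the brittleness gap, using Playwright's Python API.
# Selectors and field names are invented for illustration.
from playwright.sync_api import Page


def brittle_version(page: Page) -> None:
    # What the generated code tended to look like: position-based selectors
    # that break the moment a wrapper div appears, plus a blind sleep.
    page.click("#main > div:nth-child(2) > button")
    page.wait_for_timeout(3000)  # hope the form has loaded by now
    page.fill("#main form input:nth-of-type(1)", "test@test.com")


def sturdier_version(page: Page) -> None:
    # Semantic locators survive UI reshuffles far better, and Playwright's
    # auto-waiting removes the need for hand-rolled sleeps.
    page.get_by_role("button", name="Login").click()
    page.get_by_label("Email").fill("test@test.com")
```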

I realized I had just traded one problem for another: I went from fixing my brittle code to fixing AI-generated brittle code.

The "Aha!" Moment: I Don't Need Code, I Need Steps
I realized the problem wasn't the language (Python or JS). It was the approach.

I didn't need generated code that I had to run. I needed to understand what the user wanted to do.

And that's how I got to the architecture that powers Debuggo:

  1. A user writes in plain English: "Click 'Login' and then fill 'email' with 'test@test.com'".
  2. The AI API acts as a "translator," not a "coder." It doesn't write code. It translates that text into a structured command (basically, simple JSON): `[{"action": "click", "target": "Login"}, {"action": "fill", "target": "email", "value": "test@test.com"}]`
  3. I store these steps in a database. Not .py or .js files, just these structured instructions.
  4. I have my own "runner": one stable piece of code on my server that I control completely. It just reads the steps from the database and runs them one by one (see the sketch after this list).
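
To make that concrete, here's a minimal sketch of such a runner. This is not Debuggo's actual code: the step schema mirrors the JSON above, but resolving a human-readable target is simplified here to Playwright text/label lookups:

```python
# A minimal runner sketch: one stable script that interprets stored steps.
# Not Debuggo's real code -- the schema mirrors the JSON example above, and
# target resolution is simplified to Playwright text/label lookups.
from playwright.sync_api import Page, sync_playwright


def run_step(page: Page, step: dict) -> None:
    """Execute one structured instruction against the page."""
    action, target = step["action"], step["target"]
    if action == "click":
        # resolve the human-readable target to a visible element
        page.get_by_text(target, exact=True).click()
    elif action == "fill":
        page.get_by_label(target).fill(step["value"])
    else:
        raise ValueError(f"Unknown action: {action}")


def run_test(url: str, steps: list[dict]) -> None:
    """The one stable runner: load the page, replay the steps in order."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        for step in steps:  # steps come from the database, not from files
            run_step(page, step)
        browser.close()


# The exact JSON from step 2, now executed by code I control:
steps = [
    {"action": "click", "target": "Login"},
    {"action": "fill", "target": "email", "value": "test@test.com"},
]
run_test("https://example.com", steps)  # hypothetical URL
```

The key point: this small dispatcher is the only code that ever executes. The AI produces data for it, never code.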

Why This Changed Everything
This approach solved two problems at once.

First, stability. My "runner" is stable. I know how it works, and it doesn't change. The AI isn't generating 100 different brittle scripts—it's just generating instructions for my one reliable runner.

Second (and this is the most important part): the user doesn't need to know how to code at all.

You don't need to know Python. You don't need to know JavaScript. You don't need to worry about indentation, selectors, or async/await.

Your job is to describe the test case in plain English. My job is to translate it correctly and run it.

This is what Debuggo.app is. It's not a "code generator." It's a "translator" from human language to test steps.

Thanks!
