Ilya Ploskovitov

Posted on Dec 6, 2025

Why "Record and Playback" is Dead (And Why I'm Betting on Natural Language)

#testing #automation #architecture #greenit

Hey dev.to!

It's Ilya from Debuggo again.

Let's talk about the "elephant in the room" of test automation: the thing we all started with and eventually learned to hate.

Record and Playback.

You know these tools. You hit the red "Record" button, click around your website, the tool saves your actions, and... voilà! You have a test.

It looks like magic, right up until you run that test tomorrow. Or until a developer moves a button by 5 pixels. Or until an element ID changes.

Then the magic turns into a pumpkin.

The Trap of Classic Recorders
Why do classic recorders (from the old Selenium IDE to modern alternatives) break so often?

Because they record the Implementation, not the Intent.

When you click the "Buy" button, the recorder sees:
click(css="#app > div:nth-child(2) > button.red-btn")

The recorder doesn't know this is a "Buy" button. It only knows the button's "address" in the DOM tree. Wrap that button in a new <div> and the address changes, so the test fails. This is what we call a brittle test.
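Here is a toy illustration of that failure mode in Python. It deliberately avoids a real browser: `Node`, `path_to`, `follow_path` and `find_by_text` are hypothetical helpers over a hand-built tree, not Selenium or Debuggo APIs. The recorder-style path stores child positions, so wrapping the button in a new `<div>` makes the replay land on the wrong element, while a lookup by role and visible text still finds the button.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A toy DOM element: tag name, visible text, and child elements."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def path_to(root: Node, target: Node) -> Optional[list[int]]:
    """Return the child-index path from root to target (what a recorder stores)."""
    if root is target:
        return []
    for i, child in enumerate(root.children):
        sub = path_to(child, target)
        if sub is not None:
            return [i] + sub
    return None


def follow_path(root: Node, path: list[int]) -> Optional[Node]:
    """Replay a recorded path; returns None if the tree is too shallow for it."""
    node = root
    for i in path:
        if i >= len(node.children):
            return None
        node = node.children[i]
    return node


def find_by_text(root: Node, tag: str, text: str) -> Optional[Node]:
    """Intent-style lookup: find an element by its role (tag) and visible text."""
    if root.tag == tag and root.text == text:
        return root
    for child in root.children:
        found = find_by_text(child, tag, text)
        if found is not None:
            return found
    return None


# Before: <div id="app"><div>header</div><div><button>Buy</button></div></div>
buy = Node("button", "Buy")
app = Node("div", children=[Node("div", "header"), Node("div", children=[buy])])
recorded = path_to(app, buy)  # [1, 0]

# A developer wraps the button in a new <div>.
app.children[1].children = [Node("div", children=[buy])]

replayed = follow_path(app, recorded)
print(replayed.tag if replayed is not None else None)  # div (the wrapper, not the button)
print(find_by_text(app, "button", "Buy") is buy)       # True
```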

Intent-Based Testing: The Evolution
That's why in Debuggo, I ditched "click recording" in favor of Natural Language.

Instead of recording coordinates or rigid selectors, you write:

Click the "Buy" button

This is where the AI comes in: it analyzes the page, understands the context, and translates your intent into a concrete action.
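In Debuggo this translation is done by the AI. As a much simpler stand-in, the sketch below uses a regular expression to show the kind of structured intent a step like this boils down to: an action, a human-visible label, and a role. The `Intent` type and the fixed step grammar are assumptions for illustration only; real natural-language steps would not fit a single pattern like this.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    action: str  # what to do, e.g. "click"
    target: str  # the human-visible label, e.g. "Buy"
    role: str    # the kind of element, e.g. "button"


# Hypothetical grammar: <verb> the "<label>" <role>
STEP_PATTERN = re.compile(r'^(click|tap|press)\s+the\s+"([^"]+)"\s+(\w+)$', re.IGNORECASE)


def parse_step(step: str) -> Intent:
    """Turn a plain-English step into a structured intent, or fail loudly."""
    match = STEP_PATTERN.match(step.strip())
    if match is None:
        raise ValueError(f"Unrecognized step: {step!r}")
    verb, label, role = match.groups()
    return Intent(action=verb.lower(), target=label, role=role.lower())


print(parse_step('Click the "Buy" button'))
# Intent(action='click', target='Buy', role='button')
```

Note that nothing in the resulting `Intent` mentions a selector: the label and role are what stay stable across redesigns.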

What's the difference?

Scenario: A developer changes the layout. The "Buy" button now has a different class, a different ID, and sits in a different part of the page.
Recorder: Looks for the old selector #btn-123. Doesn't find it. Test Fails.
Debuggo: The AI looks at the page. It "thinks": "Okay, the old selector is gone. But the user asked to click 'Buy'. I see a button with the text 'Purchase now' and a cart icon. Semantically, this is the same thing." Test Passes.
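The semantic match in the scenario above can be approximated, very crudely, without any AI at all. In this sketch, a hand-written synonym table and icon hints (`SYNONYMS` and `ICON_HINTS`, both hypothetical) stand in for the model's understanding of the page, and each candidate element is scored on role, label, and icon. It only shows the shape of the decision; it is not how Debuggo or an LLM actually ranks elements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    selector: str   # where the element currently lives in the DOM
    role: str       # e.g. "button", "link"
    text: str       # visible label
    icon: str = ""  # name of an icon shown next to the label, if any


# Hypothetical stand-ins for semantic understanding: other names for "Buy",
# and icons that usually accompany it.
SYNONYMS = {"buy": {"buy", "purchase", "add to cart", "checkout"}}
ICON_HINTS = {"buy": {"cart"}}


def score(element: Element, role: str, label: str) -> int:
    """Crude relevance score: 0 if the role differs, extra points for label and icon hints."""
    if element.role != role:
        return 0
    key = label.lower()
    points = 1
    if any(word in element.text.lower() for word in SYNONYMS.get(key, {key})):
        points += 2
    if element.icon in ICON_HINTS.get(key, set()):
        points += 1
    return points


def resolve(elements: list[Element], role: str, label: str, threshold: int = 3) -> Optional[Element]:
    """Return the best-scoring element, or None if nothing is convincing enough."""
    best = max(elements, key=lambda e: score(e, role, label), default=None)
    if best is None or score(best, role, label) < threshold:
        return None
    return best


# The page after the redesign: no "#btn-123" and no button labelled "Buy".
page = [
    Element("#nav-home", "link", "Home"),
    Element("#btn-cancel", "button", "Cancel"),
    Element("#checkout-cta", "button", "Purchase now", icon="cart"),
]
print(resolve(page, "button", "Buy").selector)  # #checkout-cta
```

The threshold matters as much as the scoring: returning `None` when nothing is convincing is what lets the test fail honestly instead of clicking the closest random button.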

"But isn't AI slow?" (Math and Ecology)
Here, an experienced engineer will ask: "Ilya, if you run screenshots through an LLM for every single step, your test suite will take forever! And it will cost a fortune!"

And you would be absolutely right.

In my benchmarks, a complex scenario requiring AI planning and analysis (Reasoning) takes about 10 minutes to generate. Imagine if every CI run took 10 minutes per test. Your pipeline would grind to a halt.

That is why I use a Hybrid Architecture, which saves both time and electricity.

1. "Think Once, Execute Thousands"
I use the AI only during the test creation phase. The Agent analyzes the page, "thinks" for 10 minutes, finds the perfect locators, and saves them to the database as optimized steps.

When you run this test again (e.g., in CI/CD), Debuggo does not use the AI. It pulls the ready-made steps from the database. The execution time for that same test from the DB? 130 seconds.
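A minimal sketch of the "think once, execute thousands" split, assuming the resolved steps live in a single SQLite table. The table layout, `create_test`, `run_test`, and the `FakePlanner` that stands in for the slow AI pass are hypothetical names, not Debuggo's actual schema or API. The point is structural: only the creation phase ever calls the planner; the execution phase is a plain database lookup.

```python
from __future__ import annotations

import sqlite3


class FakePlanner:
    """Hypothetical stand-in for the slow AI planning pass; counts how often it is used."""

    def __init__(self, answers: dict[str, str]) -> None:
        self.answers = answers
        self.calls = 0

    def resolve(self, step: str) -> str:
        self.calls += 1
        return self.answers[step]


def create_test(db: sqlite3.Connection, planner: FakePlanner, steps: list[str]) -> None:
    """Creation phase: ask the planner once per step and persist the resolved locator."""
    with db:  # commits the transaction on success
        for step in steps:
            db.execute(
                "INSERT OR REPLACE INTO steps (step, locator) VALUES (?, ?)",
                (step, planner.resolve(step)),
            )


def run_test(db: sqlite3.Connection, steps: list[str]) -> list[str]:
    """Execution phase (e.g. CI): read locators from the database; the planner is never called."""
    locators = []
    for step in steps:
        row = db.execute("SELECT locator FROM steps WHERE step = ?", (step,)).fetchone()
        if row is None:
            raise LookupError(f"step was never created: {step!r}")
        locators.append(row[0])
    return locators


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE steps (step TEXT PRIMARY KEY, locator TEXT NOT NULL)")
steps = ['Click the "Buy" button']
planner = FakePlanner({steps[0]: "#checkout-cta"})

create_test(db, planner, steps)  # think once
for _ in range(1000):            # execute thousands
    run_test(db, steps)
print(planner.calls)             # 1
```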

Compare: 600 seconds (with AI) vs. 130 seconds (without AI). That's a speedup of roughly 4.6x on every run.
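The arithmetic behind that comparison, using the two timings above. The 100-runs-per-day figure in the second line is an assumption, picked as a conservative reading of "hundreds of times a day" for a regression test, not a measurement.

```latex
\text{speedup} = \frac{600\,\text{s}}{130\,\text{s}} \approx 4.6\times
\qquad
\Delta t = 600\,\text{s} - 130\,\text{s} = 470\,\text{s per run}

% Assumed: this one test runs 100 times per day
100 \times 470\,\text{s} = 47\,000\,\text{s} \approx 13\,\text{h per day}
```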

2. Self-Healing and Ecology (Green IT)
There is also an ethical angle. We all know LLMs consume massive amounts of energy. Training and inference require powerful GPUs that heat up the planet.

Running "heavy" AI on every execution of a regression test (which might run hundreds of times a day) is an ecological waste.

In Debuggo, the AI only "wakes up" when it is actually needed, namely for Self-Healing. If a locator breaks due to a layout change during a fast run (the 130-second version):

The test pauses.
We make one targeted request to the AI: "Find this button again."
The AI finds the new locator and updates the database.
Important: The test continues and is marked as Passed, but with a Warning.
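A sketch of the self-healing flow described in the list above, under heavy simplifications: the stored locators are a plain dict instead of a database, the page is just a set of selectors that currently exist, and `heal` is a hypothetical callable standing in for the single targeted AI request. `Status`, `StepResult`, and `run_step` are illustrative names, not Debuggo's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PASSED = "passed"
    PASSED_WITH_WARNING = "passed_with_warning"
    FAILED = "failed"


@dataclass
class StepResult:
    status: Status
    locator: Optional[str]
    warnings: list[str] = field(default_factory=list)


def run_step(
    step: str,
    stored: dict[str, str],                # stand-in for the locator database
    page: set[str],                        # selectors present on the page right now
    heal: Callable[[str], Optional[str]],  # stand-in for the one targeted AI request
) -> StepResult:
    old = stored[step]
    if old in page:
        return StepResult(Status.PASSED, old)  # fast path: no AI involved

    new = heal(step)  # the test "pauses" here for a single AI call
    if new is None or new not in page:
        return StepResult(Status.FAILED, None, [f"could not locate {step!r}"])

    stored[step] = new  # update the stored locator so the next fast run uses it
    warning = (f"{step!r}: locator changed from {old} to {new}. "
               "Is this a redesign or a layout bug?")
    return StepResult(Status.PASSED_WITH_WARNING, new, [warning])


stored = {'Click the "Buy" button': "#btn-123"}
page_after_redesign = {"#nav-home", "#checkout-cta"}
result = run_step('Click the "Buy" button', stored, page_after_redesign,
                  heal=lambda step: "#checkout-cta")
print(result.status)                     # Status.PASSED_WITH_WARNING
print(stored['Click the "Buy" button'])  # #checkout-cta
```

Returning `PASSED_WITH_WARNING` rather than plain `PASSED` is what keeps a human in the loop: the run stays green, but the change is surfaced for review.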

Why a Warning? Because we don't want to hide changes. We are telling the tester:

"Hey, the test passed, but the 'Buy' button moved. I found it and fixed the test, but you should take a look: is this a redesign or a layout bug?"

This way, we get the reliability of an AI agent, but keep the speed of classic code and leave the final control to the human.

My Bet
I'm building Debuggo on the hypothesis that the future of QA is a balance:

Speed and Sustainability of standard code during execution.
Intelligence of AI during creation and maintenance.

This eliminates the main weakness of recorders (brittleness) and the main weakness of AI agents (slowness and energy cost).

If you are tired of fixing tests that break with every layout shift and want a more efficient way to keep them stable, give this approach a try.

I'm waiting for you in the beta: https://debuggo.app
