<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Derf</title>
    <description>The latest articles on DEV Community by Derf (@derfe).</description>
    <link>https://dev.to/derfe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3865257%2F12d5fc83-b248-499a-9eec-99e59aab6042.png</url>
      <title>DEV Community: Derf</title>
      <link>https://dev.to/derfe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/derfe"/>
    <language>en</language>
    <item>
      <title>Testing AI agents before users do</title>
      <dc:creator>Derf</dc:creator>
      <pubDate>Tue, 07 Apr 2026 08:03:31 +0000</pubDate>
      <link>https://dev.to/derfe/testing-ai-agents-before-users-do-13jb</link>
      <guid>https://dev.to/derfe/testing-ai-agents-before-users-do-13jb</guid>
      <description>&lt;p&gt;Site: [&lt;a href="https://test.qlankr.com" rel="noopener noreferrer"&gt;https://test.qlankr.com&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;A lot of AI testing still feels too dependent on gut feeling.&lt;/p&gt;

&lt;p&gt;You run an agent, chatbot, or RAG workflow; tweak a prompt; change a tool; try again; and then ask yourself:&lt;/p&gt;

&lt;p&gt;Did this actually get better, or does it just feel different?&lt;/p&gt;

&lt;p&gt;That was the starting point for QLANKR Test.&lt;/p&gt;

&lt;p&gt;I built it because I wanted a faster and more structured way to test AI systems before users do.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;A lot of builders are shipping:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agents&lt;/li&gt;
&lt;li&gt;chatbots&lt;/li&gt;
&lt;li&gt;RAG systems&lt;/li&gt;
&lt;li&gt;tool-calling workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the evaluation loop is often messy.&lt;/p&gt;

&lt;p&gt;It is easy to demo something.&lt;br&gt;
It is harder to inspect quality clearly, compare runs over time, and understand where a system breaks down.&lt;/p&gt;

&lt;h2&gt;What QLANKR Test does&lt;/h2&gt;

&lt;p&gt;QLANKR Test lets you run an evaluation and get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a structured report&lt;/li&gt;
&lt;li&gt;a QI score&lt;/li&gt;
&lt;li&gt;clearer signals on what feels weak, inconsistent, or unreliable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is not to replace human judgment.&lt;/p&gt;

&lt;p&gt;The goal is to make AI evaluation more structured, repeatable, and easier to inspect.&lt;/p&gt;
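
&lt;p&gt;To make "structured and repeatable" concrete, here is a minimal sketch of the kind of eval loop I have in mind. Everything in it is illustrative: the test cases, &lt;code&gt;run_agent&lt;/code&gt;, and the keyword grader are placeholder assumptions, not QLANKR's actual API.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Minimal, self-contained sketch of a structured eval run.
# CASES, run_agent, and the grader are illustrative stand-ins.
import statistics

CASES = [
    {"prompt": "What is your refund policy?", "expect": "refund"},
    {"prompt": "How do I cancel an order?", "expect": "cancel"},
]

def run_agent(prompt):
    # Stand-in for the real agent / chatbot / RAG call under test.
    return "We offer a 30-day refund window on all orders."

def grade(answer, expect):
    # Crude keyword grader; swap in assertions or an LLM judge.
    return 1.0 if expect in answer.lower() else 0.0

def run_eval():
    scores = [grade(run_agent(c["prompt"]), c["expect"]) for c in CASES]
    return {
        "cases": len(CASES),
        "mean_score": statistics.mean(scores),
        "failures": [c["prompt"] for c, s in zip(CASES, scores) if s == 0.0],
    }

print(run_eval())
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Even a tiny fixed test set like this beats re-typing prompts by hand, because every run produces the same shape of report and the failures are named explicitly.&lt;/p&gt;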

&lt;h2&gt;What I wanted to improve&lt;/h2&gt;

&lt;p&gt;The main thing I wanted to avoid was “vibe-based testing”.&lt;/p&gt;

&lt;p&gt;That feeling where you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;try a few prompts&lt;/li&gt;
&lt;li&gt;get a decent answer once&lt;/li&gt;
&lt;li&gt;assume the system is good enough&lt;/li&gt;
&lt;li&gt;then discover later that it breaks in real usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wanted something that helps create a better feedback loop.&lt;/p&gt;
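
&lt;p&gt;Concretely, the feedback loop I am after looks something like the sketch below: persist each run's report and compare the score against the previous run instead of against memory. The file name and report shape here are assumptions for illustration, not how QLANKR stores things.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical sketch: keep a history of eval reports so every run
# is judged against the last one, not against a vibe.
import json
import pathlib

HISTORY = pathlib.Path("eval_history.jsonl")  # illustrative path

def record_and_compare(report):
    previous = None
    if HISTORY.exists():
        lines = HISTORY.read_text().strip().splitlines()
        if lines:
            previous = json.loads(lines[-1])
    with HISTORY.open("a") as f:
        f.write(json.dumps(report) + "\n")
    if previous is not None:
        delta = report["mean_score"] - previous["mean_score"]
        print(f"mean score changed by {delta:+.3f} vs last run")

record_and_compare({"cases": 2, "mean_score": 0.5})
&lt;/code&gt;&lt;/pre&gt;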

&lt;h2&gt;What I am still figuring out&lt;/h2&gt;

&lt;p&gt;The big questions for me right now are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;does the report feel genuinely useful?&lt;/li&gt;
&lt;li&gt;does the score make sense?&lt;/li&gt;
&lt;li&gt;what is still missing for real-world AI testing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you work on AI products, agents, or evaluation workflows, I would genuinely love feedback.&lt;/p&gt;

&lt;p&gt;Site: &lt;a href="https://test.qlankr.com" rel="noopener noreferrer"&gt;https://test.qlankr.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>showdev</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
