Derf

Testing AI agents before users do

Site: https://test.qlankr.com

A lot of AI testing still feels too dependent on gut feeling.

You run an agent, chatbot, or RAG workflow, tweak a prompt, change a tool, try again, and then ask yourself:

Did this actually get better, or does it just feel different?

That was the starting point for QLANKR Test.

I built it because I wanted a faster and more structured way to test AI systems before users do.

The problem

A lot of builders are shipping:

  • AI agents
  • chatbots
  • RAG systems
  • tool-calling workflows

But the evaluation loop is often messy.

It is easy to demo something.
It is harder to inspect quality clearly, compare runs over time, and understand where a system breaks down.

What QLANKR Test does

QLANKR Test lets you run an evaluation and get:

  • a structured report
  • a QI score
  • clearer signals on what feels weak, inconsistent, or unreliable
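To make the idea of a "structured report" concrete, here is a minimal sketch of what one could look like in Python. This is not QLANKR Test's actual report format or QI formula (neither is public in this post); the `CheckResult`/`EvalReport` names and the pass-fraction scoring are my own illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CheckResult:
    name: str      # what was evaluated
    passed: bool   # did the system meet the bar on this check
    note: str = "" # optional detail on why it failed

@dataclass
class EvalReport:
    checks: list[CheckResult] = field(default_factory=list)

    def qi_score(self) -> float:
        # Hypothetical scoring: fraction of checks passed, scaled to 0-100.
        # The real QI score may be computed very differently.
        if not self.checks:
            return 0.0
        return 100.0 * sum(c.passed for c in self.checks) / len(self.checks)

    def weak_spots(self) -> list[str]:
        # The "clearer signals" part: name exactly what failed.
        return [c.name for c in self.checks if not c.passed]

report = EvalReport(checks=[
    CheckResult("answer cites retrieved context", True),
    CheckResult("refuses out-of-scope questions", False, "answered anyway"),
    CheckResult("tool-call arguments are valid JSON", True),
])
print(round(report.qi_score(), 1))
print(report.weak_spots())
```

The point of a structure like this is that two runs produce comparable numbers and named failure modes, instead of a general impression.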

The goal is not to replace human judgment.

The goal is to make AI evaluation more structured, repeatable, and easier to inspect.

What I wanted to improve

The main thing I wanted to avoid was “vibe-based testing”.

That feeling where you:

  • try a few prompts
  • get a decent answer once
  • assume the system is good enough
  • then discover later that it breaks in real usage

I wanted something that helps create a better feedback loop.

What I am still figuring out

The big questions for me right now are:

  • does the report feel genuinely useful?
  • does the score make sense?
  • what is still missing for real-world AI testing?

If you work on AI products, agents, or evaluation workflows, I would genuinely love feedback.

Site: https://test.qlankr.com
