DEV Community

Building a Database Performance Testing Tool With AI: The Honest Breakdown

Alicia Marianne Gonçalves on May 21, 2026

It still feels a little strange to have AI writing practically all the code — but I decided to give it a real shot on this new project. A bit of co...


Mykola Kondratiuk • May 31

asking AI to pick your project is the underrated trap in this workflow. it'll suggest something technically tractable but not necessarily worth building, so you end up optimizing for code plausibility over actual problem fit.

S M Tahosin • May 24

The honesty here is appreciated. Letting an AI write the bulk of the logic for something as critical as a performance testing tool is a great exercise in code review and architecture validation. It’s one thing for the AI to generate the connection pools, but ensuring the metrics are actually statistically significant is where the human element still shines.
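One way to put a number on "statistically significant" is a percentile bootstrap on the difference in median latency between a baseline run and a candidate run. The sketch below is an illustration, not the article's method. `bootstrap_median_diff_ci` and its defaults are hypothetical, and it assumes latencies were already collected as lists of floats.

```python
import random
import statistics


def bootstrap_median_diff_ci(baseline, candidate, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for median(candidate) - median(baseline), in the samples' units."""
    rng = random.Random(seed)  # fixed seed: same inputs give the same interval
    diffs = []
    for _ in range(n_resamples):
        # Resample each run with replacement and record the difference in medians
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(statistics.median(c) - statistics.median(b))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Usage with made-up latencies in milliseconds
baseline_ms = [10.0 + (i % 5) * 0.1 for i in range(200)]
candidate_ms = [x + 2.0 for x in baseline_ms]
print(bootstrap_median_diff_ci(baseline_ms, candidate_ms))
```

If the whole interval sits above zero, the candidate is likely slower. An interval that straddles zero means the runs can't tell the two apart, however different the averages look.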

Mikhail Golikov • May 29

The "honest breakdown" framing is what made me stop and read. I built secure-log2test in a similar problem space (turn structured logs into pytest cases), and I went the opposite way on the AI question: zero LLM in the generation path, fully deterministic templating with Jinja2. Cost reasoning was part of it, but the bigger driver was reproducibility. The team I work with cannot have a regression suite that emits subtly different assertions across runs.

What you described about cleaning the AI output until it matched what you would have written manually resonates. The deterministic-vs-AI tradeoff in test generation feels like it lands differently depending on whether the output is meant to be read once or shipped to a CI lane.

Repo for reference: github.com/golikovichev/secure-log2test

mote • May 24

The "AI wrote the benchmarks so the benchmarks are suspect" problem you hit is real. I've seen the same pattern: an LLM generates a test suite that accidentally tests the fast path 90% of the time, and you walk away thinking your DB is 10x faster than it actually is.

One thing that helped me: splitting the benchmark into two layers. Layer 1 is hand-written adversarial queries — joins that force index misses, skewed key distributions, concurrent writes during reads. Layer 2 is where AI helps, generating variations of those adversarial patterns to catch regressions. The human designs the stress points, the AI fills in the coverage.
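The two-layer split could be sketched roughly as follows. Layer 1 is a short list of hand-written seed scenarios; layer 2 mechanically expands each seed over a parameter grid, which is the part the comment suggests delegating to an LLM. The schema, SQL, pyformat-style placeholders, `Scenario` type, and Zipf exponent are all illustrative assumptions, and the concurrent-writes case is not shown.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    sql: str
    params: dict


# Layer 1: hand-written adversarial seeds (hypothetical schema, illustrative SQL)
SEEDS = [
    # A function on the join column keeps a plain index on email from being used
    Scenario(
        "join_index_miss",
        "SELECT * FROM orders o JOIN customers c "
        "ON lower(c.email) = lower(o.email) WHERE o.created_at > %(since)s",
        {"since": "2024-01-01"},
    ),
    # Point reads meant to be driven with a skewed key distribution
    Scenario("hot_key_read", "SELECT * FROM accounts WHERE id = %(id)s", {"id": 1}),
]


def zipf_keys(n_keys, count, s=1.2, seed=0):
    """Sample keys 1..n_keys where key k has weight 1 / k**s (a Zipf-like skew)."""
    rng = random.Random(seed)
    keys = range(1, n_keys + 1)
    weights = [1 / k**s for k in keys]
    return rng.choices(keys, weights=weights, k=count)


def vary(seed, param_space):
    """Layer 2: expand one seed into every combination of the given parameter values."""
    names = sorted(param_space)
    for combo in itertools.product(*(param_space[n] for n in names)):
        params = {**seed.params, **dict(zip(names, combo))}
        label = ",".join(f"{n}={v}" for n, v in zip(names, combo))
        yield Scenario(f"{seed.name}[{label}]", seed.sql, params)


if __name__ == "__main__":
    hot_ids = sorted(set(zipf_keys(10_000, 200)))[:3]  # the lowest ids are the hottest
    for scenario in vary(SEEDS[1], {"id": hot_ids}):
        print(scenario.name, scenario.params)
```

Driving the variations with keys from `zipf_keys` concentrates load on a few hot rows instead of spreading it uniformly, which is the kind of skew a naively generated benchmark tends to miss.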

Did you end up mixing hand-crafted test cases with the AI-generated ones, or was the whole suite LLM-produced? I'd be curious what ratio actually caught real perf bugs.

Alicia Marianne Gonçalves • May 27

For this project, since it was my first time working with this type of test, the test cases were 100% generated by AI. I wanted to see if it could understand the context, but I noticed that at times it went off the rails, which confirms what you said. In real-world scenarios, I would provide better context to prevent that from happening.

Ohad Badihi • May 25

The variations-vs-scenarios split is the framing in this post that scales beyond databases. Once you ask the model to invent the adversarial case rather than execute a variation of one you've already designed, you lose the property that lets you trust the output — you knew what the bug looked like before you asked. The 80% codegen number is impressive, but the more interesting metric would be: how much of that 80% you'd still have written the same way after seeing the LLM's first draft. That ratio is the real productivity gain, and I suspect it's lower than line count suggests.
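The ratio this comment asks about could be approximated with a line-level diff between the LLM's first draft and the code that actually shipped. This is a rough proxy: it counts lines that survived verbatim, not lines you would independently have written the same way. `retained_ratio` is a hypothetical helper built on the standard library's `difflib`.

```python
import difflib


def retained_ratio(llm_draft: str, final: str) -> float:
    """Share of the draft's lines that appear unchanged, in order, in the final version."""
    draft_lines = llm_draft.splitlines()
    if not draft_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=draft_lines, b=final.splitlines(), autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, so summing sizes is safe
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(draft_lines)


draft = "def f(x):\n    y = x * 2\n    return y\n"
final = "def f(x):\n    # doubled on purpose\n    return x * 2\n"
print(retained_ratio(draft, final))
```

Summed over every file, this gives a number to set against the raw "percentage of lines generated" figure.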

Mudassir Khan • May 26

the N+1 detection use case is the right one to start with — it's the failure mode that hides longest in development and only shows up once you're under real load.
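A common way to flag N+1 patterns is to normalize each SQL statement into a template by stripping literals, then count how often each template runs within a single request. The sketch below assumes you can already capture the per-request query list (for example from a driver hook or the database log). The regexes are deliberately simplified, and the threshold of 5 is an arbitrary assumption.

```python
import re
from collections import Counter

_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Replace literals with '?' and collapse whitespace so per-row queries share one template."""
    sql = _STRING_LITERAL.sub("?", sql)
    sql = _NUMBER_LITERAL.sub("?", sql)
    return " ".join(sql.split()).lower()


def find_n_plus_one(queries, threshold=5):
    """Templates that ran at least `threshold` times within one request's query list."""
    counts = Counter(normalize(q) for q in queries)
    return {template: n for template, n in counts.items() if n >= threshold}


# One query for the parent rows, then one query per row: the classic N+1 shape
request_queries = ["SELECT * FROM posts WHERE user_id = 7"] + [
    f"SELECT * FROM comments WHERE post_id = {i}" for i in range(1, 11)
]
print(find_n_plus_one(request_queries))
```

With a small development dataset the repeated template may run only two or three times, below any sensible threshold, which is one reason this failure mode stays hidden until production data volumes.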

the honest bit is what I appreciate. AI picked the idea too? we've started doing that for internal tooling: give it a list of known pain points and ask it to rank by complexity vs value. the project scoping is surprisingly good; the edge case coverage is where it falls apart every time.

did you find the AI generated test assertions were reliable, or did you end up rewriting most of those by hand?

Alicia Marianne Gonçalves • May 27

Some of the queries weren't quite realistic—the AI kind of “went off the rails”—but I didn't rewrite any of them. The main goal was to understand how it generated those results, so I could see how to improve them when I provide it with context.

TuanPK Builds • May 22

This was more interesting than the usual “AI will replace everything” posts.

The biggest thing I’ve noticed is that AI tools become much more useful once projects grow beyond a few files.
For small prototypes (brainstorming, rough UI, quick iteration), tools like Windsurf feel very fast. But when repos become larger and connected across modules, context management and cleanup become much more important. That’s where tools like Codex start feeling stronger. The workflow differences between AI coding tools are becoming more important than raw code generation itself.

Alex Shev • Jun 12

The honest breakdown matters more than a perfect demo. AI is useful for scaffolding and exploring test ideas, but database performance work still needs careful baselines, repeatable workloads, and measurement discipline. Otherwise the tool can produce something impressive that measures the wrong thing.
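The measurement discipline described here can be sketched as a small harness: untimed warm-up iterations, a seeded RNG so every run issues the same workload, percentile reporting, and a comparison against a stored baseline with a tolerance. `op` stands in for whatever executes one query. The function names, the 10% tolerance, the run counts, and the baseline values are assumptions.

```python
import random
import statistics
import time


def run_benchmark(op, n_warmup=50, n_runs=500, seed=42):
    """Time op(rng) repeatedly; the fixed seed keeps the workload identical across runs."""
    rng = random.Random(seed)
    for _ in range(n_warmup):  # warm caches and connection pools; not measured
        op(rng)
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        op(rng)
        samples.append(time.perf_counter() - start)
    q = statistics.quantiles(samples, n=100)  # 99 cut points: q[49] is p50, q[94] is p95
    return {"p50": q[49], "p95": q[94], "p99": q[98]}


def regressions(current, baseline, tolerance=0.10):
    """Metrics that are more than `tolerance` slower than the stored baseline."""
    return [k for k in baseline if current[k] > baseline[k] * (1 + tolerance)]


if __name__ == "__main__":
    # Stand-in workload; a real harness would execute a query against the database here
    stats = run_benchmark(lambda rng: sum(range(rng.randint(1_000, 5_000))))
    baseline = {"p50": 1e-4, "p95": 2e-4, "p99": 5e-4}  # hypothetical values from a previous run
    print(stats, regressions(stats, baseline))
```

Reporting p95 and p99 alongside the median matters because a tool that only prints averages can look impressive while tail latency gets worse.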
