Andrés Clúa
Superpowers With No Aim: What I Found After Stress-Testing an AI-Built App

Look, I love AI. I use it every day. It makes me faster, it helps me think, it writes boilerplate so I don't have to. But we need to have a serious conversation about what happens when you let AI build your app without knowing what you're actually asking for.

A friend of mine shipped a side project recently. Nice UI, clean design, worked well. He built most of it with AI assistance — prompting his way through features, shipping fast, iterating quickly. The vibe coding dream, right?

I decided to throw some load at it. Just to see. Artillery, some custom scripts, a couple hundred random payloads. Nothing crazy, nothing illegal — just the kind of stuff any curious person with a terminal could do on a Tuesday afternoon.

What I found was genuinely scary.

The numbers

First test: 690 requests at 2 requests per second. Every single one went through. Zero rate limiting. The server didn't even flinch, but not in a good way — it just accepted everything blindly.

Second test: 3,050 requests with spikes up to 30 per second. Out of those, only 33 got a 429 (rate limited). That's 1%. The other 99% went through like butter.

Then I ran the fuzzer. 780 payloads — XSS, SQL injection, command injection, SSRF, path traversal, oversized strings, weird edge cases. The results:

  • 614 malicious payloads accepted with a happy 202 status
  • 86 payloads crashed the backend with 500 errors
  • 20 requests timed out because the server stopped responding

Let that sink in. I sent `<script>alert(1)</script>` as an input value and the server said "yeah cool, let me process that for you."
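The harness for this kind of fuzzing is genuinely tiny. Here's a minimal sketch of the idea — the payload list and the target URL are illustrative, not the actual corpus or endpoint I used:

```javascript
// Minimal fuzzing sketch: throw hostile payloads at an endpoint and
// tally how the server reacts. Payloads and URL are illustrative.
const PAYLOADS = [
  "<script>alert(1)</script>",                 // XSS
  "' OR '1'='1' --",                           // SQL injection
  "; cat /etc/passwd",                         // command injection
  "../../../../etc/passwd",                    // path traversal
  "http://169.254.169.254/latest/meta-data/",  // SSRF probe
  "A".repeat(100_000),                         // oversized string
];

// Bucket a result the way the numbers above do:
// accepted (2xx), rejected (4xx), crashed (5xx), or timed out.
function classify(status) {
  if (status === null) return "timeout";
  if (status >= 200 && status < 300) return "accepted";
  if (status >= 400 && status < 500) return "rejected";
  return "crashed";
}

async function fuzz(url) {
  const tally = { accepted: 0, rejected: 0, crashed: 0, timeout: 0 };
  for (const payload of PAYLOADS) {
    let status = null;
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ input: payload }),
        signal: AbortSignal.timeout(5000), // hung requests count as timeouts
      });
      status = res.status;
    } catch { /* network error or timeout -> status stays null */ }
    tally[classify(status)]++;
  }
  return tally;
}
```

A healthy API should put almost everything in the `rejected` bucket. The app I tested put 79% of payloads in `accepted`.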

The really bad stuff

The scariest finding was SSRF. I put a cloud metadata URL as a user input — the kind of internal endpoint where AWS and GCP store instance credentials. The server actually tried to fetch it. If that request resolves, someone can pull service account keys, environment variables, the whole thing. No exploit kit needed. Just a URL in a text field.
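The defense here is to refuse to fetch anything that points at internal infrastructure. A hedged sketch of such a guard — hostname string checks only; real code should also resolve DNS and re-check the resulting IP (DNS rebinding bypasses this), and handle IPv6, which I've left out:

```javascript
// SSRF guard sketch: reject URLs aimed at cloud metadata endpoints or
// private address ranges before the server fetches anything.
const BLOCKED_HOSTS = new Set([
  "169.254.169.254",           // AWS/GCP/Azure instance metadata
  "metadata.google.internal",  // GCP metadata hostname
  "localhost",
]);

function isPrivateIPv4(host) {
  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (!m) return false;
  const [a, b] = [Number(m[1]), Number(m[2])];
  return a === 10 ||                          // 10.0.0.0/8
         a === 127 ||                          // loopback
         (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
         (a === 192 && b === 168) ||           // 192.168.0.0/16
         (a === 169 && b === 254);             // link-local / metadata
}

function isSafeOutboundUrl(raw) {
  let url;
  try { url = new URL(raw); } catch { return false; } // not a URL at all
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  return !BLOCKED_HOSTS.has(host) && !isPrivateIPv4(host);
}
```

Twenty-odd lines. The app I tested had zero of them.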

There were also API parameters sent directly from the client with zero server-side validation. Path traversal payloads went through without any checks. That means potentially reading or writing data you were never supposed to access.

No authentication on any endpoint. No input sanitization. No content security policy. The server was even broadcasting what framework it was running through response headers.

This is not a skill issue, it's a prompting issue

Here's the thing — I'm not writing this to roast anyone. The app works. The UI is clean. The features do what they're supposed to do. The problem isn't the developer. The problem is how we're using AI to build stuff.

When you tell an AI "build me an app that does X", you get exactly that. A thing that does X. It works. It demos well. It looks great in a tweet.

But the AI didn't add rate limiting because you didn't ask for it. It didn't validate inputs because your prompt was about features, not security. It didn't think about SSRF because that wasn't in the conversation. It gave you exactly what you asked for — nothing more, nothing less.

We've become monkeys with shotguns. We have this incredibly powerful tool that can generate entire applications in minutes, and we're pointing it at production without understanding what it built or what it missed. We're not thinking. We're just firing.

"Make it work" vs "Make it right"

There's a massive difference between these two prompts:

Prompt A: "Build me an API that receives user input, processes it, and returns results"

Prompt B: "Build me an API that receives user input. Validate the input format using a strict whitelist. Add rate limiting at 10 requests per minute per IP. Add security headers. Sanitize all user input before storing. Block private IP ranges and internal hostnames to prevent SSRF. Use parameterized queries. Add proper error handling that doesn't leak stack traces or internal details. Return 405 for unsupported HTTP methods."

Prompt A gives you a working app. Prompt B gives you a safe working app. The AI is perfectly capable of doing both — but it only does what you tell it.
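To make "make it right" concrete: even the rate limiting Prompt B asks for is a small amount of code. A dependency-free sketch — in-memory sliding window per client key, with an injectable clock so it's testable (names and structure are mine, not any framework's API):

```javascript
// Sliding-window rate limiter sketch: at most `limit` requests per
// `windowMs` per client key (e.g. an IP address). The clock is
// injectable so the logic is easy to test without real waiting.
function createRateLimiter({ limit = 10, windowMs = 60_000, now = Date.now } = {}) {
  const hits = new Map(); // key -> timestamps of recent requests

  return function allow(key) {
    const t = now();
    // Keep only the timestamps still inside the window.
    const recent = (hits.get(key) ?? []).filter(ts => t - ts < windowMs);
    if (recent.length >= limit) {
      hits.set(key, recent);
      return false; // caller should respond 429 Too Many Requests
    }
    recent.push(t);
    hits.set(key, recent);
    return true;
  };
}
```

In an Express-style handler this is roughly `if (!allow(req.ip)) return res.status(429).end()`. For anything running on more than one instance you'd back it with Redis instead of an in-process Map — but the point stands: this is minutes of work, not days.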

And that's the core problem. AI doesn't have opinions about your architecture. It doesn't push back and say "hey, are you sure you want to accept arbitrary input from anonymous users and process it server-side without any checks?" It just does it. It's the most agreeable coworker you've ever had, and that's exactly what makes it dangerous.

The debugging trap

Here's something that doesn't get talked about enough: AI debugging without context is often worse than the original bug.

You hit an error, you paste it into the chat, the AI fixes the symptom. The 500 goes away. But it didn't fix the root cause — it just wrapped it in a try-catch that silently swallows the error, or it loosened a validation that was actually protecting you, or it added a workaround that introduces a new vulnerability.

I've seen this pattern over and over:

  1. Dev builds feature with AI
  2. Something breaks
  3. Dev pastes error into AI
  4. AI patches the symptom
  5. Underlying issue is now hidden AND the patch introduced something new
  6. Repeat until the codebase is a house of cards

When the AI doesn't have the full picture — your architecture, your security requirements, your threat model — its fixes are just educated guesses. Sometimes they're great. Sometimes they remove the one validation that was actually keeping your database safe.
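The symptom-patch pattern is easy to show concretely. A contrived sketch: both versions make the 500 disappear, but one keeps the invariant and fails loudly, while the other silently returns bad data:

```javascript
// The root-cause fix: validate at the boundary and fail loudly with a
// clear error the caller can turn into a 400, keeping the invariant
// (quantity is a positive integer) intact.
function parseQuantityStrict(raw) {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError(`invalid quantity: ${JSON.stringify(raw)}`);
  }
  return n;
}

// The symptom patch: wrap it in try-catch, swallow the error, return a
// default. The 500 is gone -- and so is the signal that input was bad.
// Downstream totals are now quietly wrong.
function parseQuantityPatched(raw) {
  try {
    return parseQuantityStrict(raw);
  } catch {
    return 0; // "fixes" the crash by corrupting the data instead
  }
}
```

Ask an AI to "fix the 500" with no context and you'll often get the second function. Ask it to "reject invalid quantities with a 400 and keep valid ones as positive integers" and you'll get the first.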

What I actually think we should do

I'm not saying stop using AI. That would be stupid. AI is genuinely incredible and I'd be a hypocrite — I used AI tools to run this entire security assessment.

What I'm saying is:

1. Know what you're building before you prompt. Have an architecture in your head. Know what patterns you want. Know what your security boundaries are. Then tell the AI to build within those constraints.

2. Iterate with intention, not with vibes. "Make it work" is step one, not the finish line. After the feature works, go back and ask specifically about security, validation, error handling, edge cases.

3. Treat AI output like a junior dev's PR. You wouldn't merge a junior's code without reviewing it. Don't deploy AI-generated code without understanding what it does and what it doesn't do.

4. Security is not a feature, it's a requirement. Bake it into your prompts from the start. "Add rate limiting" should be as natural as "add a button."

5. When debugging, give context. Don't just paste the error. Explain what the code is supposed to do, what your constraints are, what you've already tried. The better the context, the better the fix.

The bottom line

The app I tested took probably a few hours to build with AI. It would take maybe 30 minutes more to make it secure — if security was part of the conversation from the beginning.

That's the gap. Not days. Not weeks. 30 minutes of intentional prompting. The difference between "build me this feature" and "build me this feature, and here's how I want it protected."

AI gave us superpowers. But superpowers without direction are just destruction with extra steps.

Build with intention. Prompt with context. Review what you ship.

Or keep being monkeys with shotguns. Your call.


If you want to run similar tests on your own projects (with permission, obviously), the tooling is straightforward: Artillery for load testing, custom JS payloads for fuzzing, and curl for manual recon. Took about 20 minutes to set up and the findings speak for themselves.


Full transparency: This article was written with AI. I don't usually write like this — English isn't even my first language and my usual tone is way more chaotic. But that's kind of the whole point. I ran the security assessment step by step using AI tools, organized my findings as I went, and at the end I asked for a clean markdown copy for dev.to. The irony isn't lost on me — an article about being intentional with AI, made with AI. But that's exactly the difference I'm talking about. I didn't say "write me a blog post about security." I guided every section, every point, every finding, based on real tests I actually ran. The AI shaped the words. The thinking was mine.

Top comments (1)

Jorge Solis

It's strange that any backend framework wouldn't handle rate limiting or SQL injection protection by default. Security teams always find something to flag, but a breach like this is not tolerable.