Rex Zhen

Vibe Coding: From Hell to Heaven in One Insight

As an SRE, I can spin up production infrastructure in my sleep. Terraform? Give me 2 hours and you'll have a complete ECS cluster with services, monitoring, networking, and CI/CD pipelines. But application code? That was not my domain.

Until AI changed everything.

I want to share two real-world projects where I learned to leverage AI—one spectacular failure, and one surprising success.

Project 1: The Slack App That Humbled Me (Hell → Heaven)

Week 1: The "Vibe Coding" Disaster

I decided to build a Slack application. The infrastructure? Done in 2 hours. The application code? That's where hell began.

My approach was simple: describe what I wanted to AI, copy-paste the code, deploy, and ship.

It didn't work.

Error after error. I'd paste the error back to AI, get new code, redeploy. Rinse and repeat. After one week of this back-and-forth, I couldn't move forward even one step.

I was stuck in what I now call "vibe coding hell"—blindly following AI without understanding the fundamentals.

Week 2: The Breakthrough

I stopped. Took a breath. Went to Slack's official SDK documentation and actually read it.

I learned:

  • What features Slack offers
  • How the SDK modules work
  • The proper workflow for Slack apps

Then I went back to AI—but this time, I gave it clear architectural instructions based on my understanding. The app was done in 3 days (including learning time and one complete rewrite when I misunderstood terminology).

After that? Any new feature took minutes to implement.
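To make that concrete, here is roughly the shape the app settled into (a minimal Slack Bolt sketch; the event, the slash command, and the replies are placeholders, not my actual features). Every new feature is essentially one more decorated handler:

```python
# Minimal Slack Bolt skeleton (illustrative only, not my actual app).
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

# Tokens come from the Slack app configuration; these are the standard env var names.
app = App(token=os.environ["SLACK_BOT_TOKEN"])


@app.event("app_mention")
def handle_mention(event, say):
    # Reply in the channel where the bot was mentioned.
    say(f"Hi <@{event['user']}>, I heard you.")


@app.command("/status")
def handle_status(ack, respond):
    # Slash commands must be acknowledged quickly before doing real work.
    ack()
    respond("All systems nominal.")


if __name__ == "__main__":
    # Socket Mode avoids exposing a public HTTP endpoint.
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()
```

Once that skeleton was in place, adding a feature mostly meant adding another handler.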

The Lesson

You can't outsource understanding to AI.

Software design and architectural decisions still come from humans. AI is a powerful assistant, but you need domain knowledge to guide it effectively.

The key insight: AI amplifies your capabilities when you provide the right direction.

This made me wonder: What happens when you combine domain expertise with AI assistance?

My second project showed me the answer.


Project 2: Building a Production-ready LLM Platform in 3 Days (Pure Heaven)

I had an idea: build a complete inference platform to host LLM models and fine-tuned variants.

My knowledge level: I learned the term "inference" the night before I started.

Timeline: 3 days to production-ready.

What I Built

  • Infrastructure (1 hour via Terraform): Complete cloud stack
  • Frontend Web UI: Full-featured interface
  • 2 Backend Inference Services: Hosting different LLM models (one is sketched below)
  • Automated Training Pipeline: End-to-end data processing
  • Performance Optimization: 28-30 seconds → 3-4 seconds per query (pure software tuning, no hardware upgrades)
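To give a sense of what one of those inference services looks like conceptually, here is a purely illustrative sketch (FastAPI wrapping a Hugging Face text-generation pipeline; I'm not claiming this is the exact stack, model, or the optimization that got queries down to 3-4 seconds):

```python
# Illustrative inference endpoint only; not the actual platform code.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Placeholder model; the real services would load the hosted or fine-tuned models.
generator = pipeline("text-generation", model="gpt2")


class Query(BaseModel):
    prompt: str
    max_new_tokens: int = 128


@app.post("/generate")
def generate(query: Query):
    # Run the model and return only the generated text.
    outputs = generator(query.prompt, max_new_tokens=query.max_new_tokens)
    return {"completion": outputs[0]["generated_text"]}
```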

The Breakthrough

With my SRE background (system architecture, performance optimization, infrastructure patterns), I could guide AI effectively. I understood the options AI presented and could make informed decisions about:

  • Architecture patterns
  • Performance trade-offs
  • Infrastructure design
  • System integration

AI drove 80% of the implementation, but I drove 100% of the architectural decisions.

This is the power of combining domain expertise with AI assistance—you become a force multiplier.


What I'm Starting to Realize

After these experiences, I'm starting to see patterns (though I'm still figuring this out):

From Coder to Conductor

I'm not writing as much code manually anymore. Instead, I'm spending time on:

  • Architecture and design decisions
  • Giving AI clear direction
  • Validating and refining what it generates
  • Ensuring quality, performance, and that pieces fit together

It's like being an orchestral conductor: I don't play every instrument, but I make sure everything works together harmoniously.

It reminds me of when we went from manually configuring servers to writing Infrastructure as Code. The skill shifted, but we didn't become less valuable. That's what's happening again.

Domain Knowledge Became My Superpower

Here's the ironic part: My Slack app failed because I tried to skip learning the fundamentals.

The LLM platform succeeded because my SRE background gave me the mental models to guide AI effectively.

AI doesn't replace what you know—it multiplies it.


Coming Next: I'm planning to build a full-stack application in Rust, a language I've never learned. This will test whether the principles I've learned apply across domains. Stay tuned.


Follow me for more cloud architecture insights, SRE war stories, and practical lessons on thriving in the AI era.

Previous article: AWS SRE's First Day with GCP: 7 Surprising Differences

Top comments (1)

Alton Lam

I use AI to write code all the time, but I find that AI cannot write code from scratch. If you give it a task, it will do the minimum amount of work to complete the task, but that's not how I want the code to be structured. I want the code to be modular, and I want to build frameworks and methods that can be applied to different scenarios. So I always start by figuring out all the different functions that I need, then how each function interacts with the others, and then I use AI to write a function with specific input parameters and outputs.

Once you have the function, you can't assume it will work the way you expect, so I tell AI to write me a unit test where I feed the function different inputs and verify the results. The function can call another function, or it can be an API call. Is the function using the correct syntax for the API call? Does it check for bad inputs? Does it return correctly? If you put in something unexpected, does it still work? Then you feed the results of the unit tests back into AI to improve the function.

Once you have the function working, you build other functions, and you keep doing that until all your functions are written. Once all the functions have been tested, you integrate them together to form a module. Then you put it all together to form an application.
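For example (the function and the tests here are made up, just to show the loop): I have AI write one small function with checks for bad inputs, then have it write unit tests that probe both expected and unexpected inputs:

```python
# Illustrative example of the loop: one small function, then unit tests
# that feed it good and bad inputs and verify the results.
import pytest


def parse_port(value: str) -> int:
    """Return value as a TCP port number; raise ValueError on bad input."""
    port = int(value)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def test_parse_port_accepts_valid_values():
    assert parse_port("8080") == 8080


def test_parse_port_rejects_unexpected_input():
    with pytest.raises(ValueError):
        parse_port("not-a-number")
    with pytest.raises(ValueError):
        parse_port("70000")
```

When a test fails, I feed the failure back into AI and have it improve the function.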

You delegate to AI as if you are delegating to hundreds of programmers. This will force the code to be written in a modular way.

There are many decisions that humans are making. Some are architectural. Some are project-management related. Some come from external factors that AI has no context about. That is why humans need to drive the decision making.

For example, if I ask AI to write me a script that packages directory A, directory B, directory C, and directory D, AI will hard-code the directories, but it is missing context: directories B, C, and D can be derived from directory A. So I tell it: given directory A, follow this algorithm to derive B, C, and D. That reduces the risk of error, because I only need to supply directory A instead of all four. In addition, I tell AI that directory A needs to follow a certain format and to check for the presence of the directory. I also tell AI which values I want to be variables. Why make those particular values variables? Because they are going to be dynamic and varied, but AI doesn't know that, since it has no context about the source.

Then I tell AI when I don't like the way a function looks. It works, but it's too complex, it looks like gobbledygook, and I have trouble debugging it, so I tell AI to rewrite it broken up into multiple steps. I also tell AI when I don't like the variable names: make them descriptive so I don't have to guess what one-letter variables stand for. I tell it when I don't like the ordering: functions go on top, don't embed them in the middle of the script, and centralize the global variables in one location instead of interspersing them throughout. I also tell AI to annotate the function calls so I can see the flow and check for logic flaws, and to annotate function inputs and return values with their types (string, integer, dictionary, or list). Finally, I tell AI to add a debug option that shows the outputs of intermediate steps.
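Put together, the kind of skeleton I push AI toward looks roughly like this (the directory names, the derivation rule, and the archive format are placeholders, not my actual script):

```python
#!/usr/bin/env python3
"""Illustrative packaging script: directories B, C, and D are derived from A."""
import argparse
import sys
import tarfile
from pathlib import Path

# Global settings live in one place at the top, not scattered through the script.
ARCHIVE_NAME = "bundle.tar.gz"
DERIVED_SUFFIXES = ["build", "config", "logs"]  # placeholder derivation rule


def derive_directories(directory_a: Path) -> list[Path]:
    """Given directory A, derive directories B, C, and D by a fixed naming rule."""
    return [directory_a.parent / f"{directory_a.name}_{suffix}" for suffix in DERIVED_SUFFIXES]


def check_directories(directories: list[Path]) -> None:
    """Fail fast if any expected directory is missing."""
    missing = [d for d in directories if not d.is_dir()]
    if missing:
        sys.exit(f"Missing directories: {', '.join(str(d) for d in missing)}")


def package_directories(directories: list[Path], archive_name: str, debug: bool) -> Path:
    """Bundle every directory into one tar.gz archive and return its path."""
    archive_path = Path(archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for directory in directories:
            if debug:
                print(f"[debug] adding {directory}")
            tar.add(directory, arcname=directory.name)
    return archive_path


def main() -> None:
    parser = argparse.ArgumentParser(description="Package directory A and its derived directories.")
    parser.add_argument("directory_a", type=Path, help="Only A is supplied; B, C, D are derived from it.")
    parser.add_argument("--debug", action="store_true", help="Show intermediate steps.")
    args = parser.parse_args()

    directories = [args.directory_a] + derive_directories(args.directory_a)
    check_directories(directories)
    archive = package_directories(directories, ARCHIVE_NAME, args.debug)
    print(f"Wrote {archive}")


if __name__ == "__main__":
    main()
```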

Why am I forcing AI to put all these things into a script?

  1. reduce errors
  2. reduce logic flaws
  3. ease of troubleshooting
  4. readability of code
  5. modularization of code
  6. reusability of code
  7. reducing complexity

If you remove the human element from the process, then AI will just write it in binary, and then it would be a complete mess.