A year of building with AI and the thing that scared me most wasn't the hallucinations

A year ago I started building TasksForge. The premise was simple enough: use AI to simplify project planning for developers. What wasn't simple was everything else.

Back then, "vibe coding" was just starting to trend, but I went the opposite direction. I treated the AI like a junior engineer who desperately needed structure. I wrote out project descriptions, mapped key features, drafted user stories, and spun up task boards.

I was basically acting like a software architect before letting the machine write a single line of code, taking a heavy cue from people like Tim Kitchens (YouTube: Coding the Future With AI), who were already pushing this approach early on.

I used Aider paired with Claude Sonnet 3.x. I fed it the full spec and detailed tasks, validated each step, tested, fixed, and rewrote. 30 hours to bootstrap. C$150 in AI costs. Roughly 60% AI-generated code. A working MVP released in December 2025.

The code was a mess. The architecture wasn't.

I spent the next few weeks fighting hallucinations, misaligned logic, and partial implementations, spending nights cleaning up and stabilizing code I hadn't fully written. There is a very specific kind of exhaustion that comes from debugging logic you don't entirely understand.

What actually saved the project wasn't the AI; it was the architecture. FastAPI backend, Next.js frontend, PostgreSQL. That decision paid off months later when I shipped version two (UI improvements, dark mode, user API keys) and it went down smooth.
Weirdly, I used less AI that time around. Context limits made it hard for the model to hold the full codebase in its head, and honestly, I just trusted myself more at that point. AI felt like a booster, not a builder.

Then everything changed.

Models got smarter. Context windows blew open. Tools got better, and competition made them cheaper, sometimes free, which is how I ended up back at TasksForge a few days ago after the site broke because my "free" model quietly stopped being free. Classic.

I wanted to add background processing: queue management, async execution, a progress tracking UI. Stuff I had zero real practical experience with. So instead of writing the spec myself, I asked the AI to generate it.

I described what I wanted using Opencode + MiniMax-M3. The AI explored my codebase, pitched Celery for the async processing and Redis for queue management, then generated backend and frontend architecture docs, implementation instructions, and file-level changes. Twenty hours of back-and-forth. No user stories. No over-engineering. Just focused, messy, honest specs.

30 minutes.

I took those files, started a fresh session, and fed them right back to the same model.

Thirty minutes later, the feature worked. Docker-compose updated, Redis added, dependencies installed, Celery and FastAPI wired end-to-end. A couple of quick pathing fixes and it was running.

I sat there kind of stunned. Twenty hours of specification. Thirty minutes of implementation. And I had zero hands-on experience with any of the async stack it had just built.

No hallucinations. Minimal rework. It just ran. And sitting there looking at the terminal, I didn't quite know how to feel about it.

What 30 minutes actually built

For context: the old version of TasksForge handled project elaboration through a standard chat box.

It was the classic 2025 trap: when in doubt, slap a chat UI on it. But chat is a terrible interface for structured planning. You couldn’t do a real back-and-forth to update the master spec; it just felt like texting a very fast intern who kept losing the thread.
The 30-minute generation completely ripped that out.

In its place is a sleek, stepped progression UI. When a user hits Elaborate Project, the frontend fires the payload, FastAPI hands it off to Celery, Celery parks the state in Redis, and the UI just listens to the heartbeat.

Because it’s a true background process now, the user isn't held hostage by an open tab. They can click start, close the browser, shut down their laptop, and go make lunch. When they come back later, the frontend reconciles with Redis, catches up to whatever step the worker is on, and eventually drops a clean "Project Ready" modal onto the screen.
Under the hood, the wiring looks like this:

Looking at that diagram now, it is a completely standard, battle-tested async topology.
The unsettling part is that I didn’t draw that map to build the feature. I drew it afterward just to understand what the machine had wired together.
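
For anyone who wants to poke at the shape of that flow without standing up the real stack, here is a minimal sketch using only the Python standard library. It is not the TasksForge code and makes several assumptions: a thread stands in for the Celery worker, a locked dictionary stands in for Redis, and the step names and function names (start_job, get_status, elaborate_project) are hypothetical. What it shows is the pattern: the request returns immediately with a job id, the worker records its progress step by step, and a client can reconnect at any time and catch up by reading the stored state.

```python
import threading
import time
import uuid

# Hypothetical step names; the post does not list the real ones.
STEPS = ["analyze description", "draft features", "generate tasks"]

# Stand-in for Redis: job_id -> progress record, guarded by a lock.
_state = {}
_lock = threading.Lock()


def _set_state(job_id, **fields):
    """Merge new fields into the stored record for a job."""
    with _lock:
        _state.setdefault(job_id, {}).update(fields)


def elaborate_project(job_id, step_seconds=0.01):
    """Stand-in for the background worker: run each step and record progress."""
    for index, name in enumerate(STEPS, start=1):
        _set_state(job_id, status="running", step=index, step_name=name)
        time.sleep(step_seconds)  # placeholder for the real work
    _set_state(job_id, status="ready")


def start_job():
    """Stand-in for the API endpoint: enqueue the work and return immediately."""
    job_id = str(uuid.uuid4())
    _set_state(job_id, status="queued", step=0, step_name=None, total=len(STEPS))
    worker = threading.Thread(target=elaborate_project, args=(job_id,), daemon=True)
    worker.start()
    return job_id, worker


def get_status(job_id):
    """What the UI would poll: a snapshot of the job's current state."""
    with _lock:
        return dict(_state.get(job_id, {"status": "unknown"}))


if __name__ == "__main__":
    job_id, worker = start_job()
    print(get_status(job_id))  # the client could disconnect right here
    worker.join()              # ...and come back once the worker is done
    print(get_status(job_id))  # status is now "ready"
```

The design point is that the client never holds the work open. All it keeps is a job id, so closing the tab loses nothing: progress lives in the shared store, not in the request.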

The part nobody talks about

Here is the thing nobody tells you: when the AI gets good enough to outpace you, it doesn't feel like winning. It feels like pressure.

You find yourself reviewing more than you’re producing. You’re sitting there questioning the security and edge cases of code generated in four seconds, knowing it will take you twenty minutes, sometimes much longer, to actually comprehend it. The volume of what AI can produce now genuinely exceeds a single human's ability to review it. That’s not a complaint; it’s just true.

There is an unexpected emotional labor to this. You’re never 100% sure if the foundation is solid. You are dealing with unpredictability not at the syntax level anymore, but at the trust level. And that is a completely different genre of exhausting.

Experience is still the thing that compounds

I keep coming back to this, though: the more I do this, the better my instincts get. The more precise my prompts become. The less generalized anxiety I feel when reading a massive auto-generated output. Experience is still the variable that compounds. The models got smarter, but they didn't replace the need for taste.

So I don't know. We aren't being replaced; I still believe that. But we are being asked to develop a radically different relationship with our own work, and I think a lot of us are still quietly trying to figure out what that means.

Are you still manually reviewing every line? Or are you starting to trust the machine a little more?

Most days, I'm hovering somewhere right in the middle.