I've been using AI as a coding assistant for a while now. My usual approach: treat it like a senior developer who occasionally hallucinates after a rough night. Capable, fast, but usually not a great idea to merge their PR without reading it first.
That means I review every line. If something gets generated that I don't fully understand, I stop and ask for an explanation before I let it into the codebase. If it violates how I'd normally structure something, I say so and we refactor. The AI proposes; I approve, reject, or redirect. It's collaborative, but I'm in the driver's seat (although I don't actually have a driver's license).
I also always work in plan mode, or at minimum ask it to outline what it's about to do before touching a file. That habit alone has saved me from a lot of "wait, why did it rewrite three components to fix one bug" moments.
So when I decided to try vibe-coding, genuinely handing over the wheel, auto-edit on, no review gate... I kind of expected it would go wrong. That was the point.
The Problem I Was Actually Solving
I'm running several workstreams in parallel at any given time: freelance projects, a consultancy, a few side projects, and I teach frontend at HackYourFuture. My task and project management needs don't fit neatly into any single tool.
Bear is great for notes but not for tasks. Notion is powerful but requires too much upkeep to stay useful. Things is clean and opinionated, which I like, but it doesn't bend to the way I think about projects that span different contexts and timelines.
None of them fits the way I actually move between things during a day.
So instead of trying to bend another tool to my brain, I decided to build my own. The experiment was the perfect excuse!
I named it Dunzo. If you've watched Parks and Recreation, you know exactly the scene. Tom Haverford: "We are dunzo." It felt right for a task app.
It's live at donezo.vercel.app. My girlfriend and I are currently the only two users. The landing page exists entirely because I needed something convincing enough to get her to actually use it.
How I Normally Use AI When Coding
Before getting into the experiment, it's worth being specific about what I'm comparing against, because "I use AI to code" covers a lot of ground.
My standard workflow looks like this:
Plan before touching files. I describe the feature or problem and ask for a plan first. What files will be affected? What's the approach? Are there tradeoffs worth discussing? Only once I'm aligned with that plan do I let it start generating.
Read everything it writes. Not skim, but read. I want to understand every function, every type, every structural decision. If I don't, I ask. This sounds slow, but it's actually what keeps the codebase coherent over time.
Challenge decisions actively. If it reaches for a pattern I wouldn't have used, I ask why. Sometimes the answer is good and I learn something. Sometimes it was just the path of least resistance, and we do it a different way.
Refactor to my standards before moving on. AI-generated code tends to be correct before it's clean. I don't let technical debt accumulate: each piece gets shaped to how I'd actually write it before the next feature starts.
This workflow is way slower than vibe-coding. It's also the reason the codebases I care about stay navigable six months later.
The Experiment: Auto-Edit On, No Safety Net
The rules I set for myself:
- Auto-edit mode, always on. No plan mode, no "confirm before changes," no second-guessing the approach before it started writing. Just go!
- Act as a technical product owner, not a developer. My job was to describe what I wanted, with precision, using my experience to frame things well. But not to steer implementation decisions.
- No copy-pasting UI from existing apps. If it was going to design something, it would design it from my descriptions, not from screenshots.
The shift that required the most discipline was staying out of implementation. My natural instinct when something gets generated is to read it, react to it, redirect it. For this experiment, I had to consciously resist that: describe the outcome I wanted, let it build, move to the next thing.
Day One: Faster Than Expected
The first day moved quickly. I started with a rough description of what Dunzo needed to be: a personal task management system with tagging, a daily overview, and the ability to group tasks by project or context. I didn't wireframe anything. I just described it in plain language with enough technical specificity to keep the output useful.
Within a few hours I had a working dashboard. Tags worked. The day view worked. Drag-and-drop task reordering worked, powered by dnd-kit. The auth flow was wired up through StackAuth. It wasn't beautiful, but it was functional in a way that would have taken me significantly longer to build with my normal workflow. Not because I'm slow, but because I would have made careful decisions at each step.
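The core of what dnd-kit drives is simple: when a drag ends, move the dragged task to the position of the task it was dropped over. A hypothetical sketch of that handler, with `arrayMove` hand-rolled here as a stand-in for `@dnd-kit/sortable`'s helper of the same name so the snippet stays self-contained (the `Task` shape and ids are illustrative, not Dunzo's actual code):

```typescript
type Task = { id: string; title: string };

// Stand-in for @dnd-kit/sortable's arrayMove: returns a new array
// with the item moved from one index to another, without mutating input.
function arrayMove<T>(items: T[], from: number, to: number): T[] {
  const next = items.slice();
  const [moved] = next.splice(from, 1);
  next.splice(to, 0, moved);
  return next;
}

// Shape of the logic inside dnd-kit's onDragEnd callback: find where the
// dragged item started, where it was dropped, and reorder accordingly.
function onDragEnd(tasks: Task[], activeId: string, overId: string | null): Task[] {
  if (!overId || activeId === overId) return tasks;
  const from = tasks.findIndex((t) => t.id === activeId);
  const to = tasks.findIndex((t) => t.id === overId);
  return arrayMove(tasks, from, to);
}
```

In the real app this plugs into `DndContext`'s `onDragEnd` prop, with the new order written back to the store.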
What I was doing wasn't passive. My prompts were doing real work. Instead of writing code, I was writing context:
- What the component needed to do
- What edge cases mattered
- What the data shape should look like
- How it connected to what was already built
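As a concrete (and hypothetical) example of "what the data shape should look like": before asking for a feature, I'd pin down the model in the prompt at roughly this level of detail. The field names below are illustrative, not Dunzo's actual schema:

```typescript
// The kind of data shape I'd spell out in a prompt before any code
// gets generated. Names are illustrative, not the real schema.
type TaskStatus = "todo" | "done";

interface Task {
  id: string;
  title: string;
  status: TaskStatus;
  tags: string[];        // free-form labels, used for filtering
  projectId?: string;    // optional grouping by project or context
  scheduledFor?: string; // ISO date string, drives the daily overview
}

// A small factory keeps defaults in one place instead of scattered
// through every component that creates a task.
function createTask(title: string, overrides: Partial<Task> = {}): Task {
  return {
    id: Math.random().toString(36).slice(2),
    title,
    status: "todo",
    tags: [],
    ...overrides,
  };
}
```

Spelling this out up front meant the generated components agreed on the shape instead of each inventing its own.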
Eight years of building products gave me a vocabulary for that. I suspect vibe-coding with less experience would produce a much messier first day.
The Redesign: Describing Aesthetics Instead of Code
Midway through day two, I decided the app needed to look better. Not a small tweak, but a full visual redesign. This was the part of the experiment I was most curious about, because design direction is something I'd normally drive very deliberately.
(Before and after screenshots of the redesign)
I didn't give it screenshots of apps I liked. I described the feel I was going for: sleek, minimal, focused. I talked about the kind of typographic weight I wanted, the density of the layout, how I wanted whitespace to work. We went back and forth: it would propose, I'd react, it would adjust. It felt closer to a design conversation than a code session.
The result was entirely AI-generated based on that input. It's cleaner than what I'd have built if I were coding it myself under time pressure, because I could give feedback on the visual output without getting distracted by the implementation details producing it. There's something genuinely useful about that separation.
The Stack
"dependencies": {
"@dnd-kit/core": "^6.3.1",
"@dnd-kit/sortable": "^10.0.0",
"@dnd-kit/utilities": "^3.2.2",
"@hookform/resolvers": "^5.2.2",
"@stackframe/react": "^2.8.72",
"@tailwindcss/vite": "^4.2.1",
"react": "^19.0.0",
"react-dom": "^19.0.0",
"react-hook-form": "^7.71.2",
"react-router-dom": "^7.13.1",
"tailwindcss": "^4.2.1",
"typescript": "^5.9.3",
"zod": "^4.3.6",
"zustand": "^5.0.11"
},
"devDependencies": {
"@biomejs/biome": "2.4.6",
"@types/react": "^19.0.0",
"@types/react-dom": "^19.0.0",
"@vitejs/plugin-react": "^4.7.0",
"vite": "^7.3.1"
}
Nothing here is surprising, which is itself worth noting. The AI reached for the obvious choices: Zustand for state, Zod for validation, React Hook Form for forms, dnd-kit for drag-and-drop. These are all reasonable tools. They're also the tools that show up most often in training data, which probably isn't a coincidence.
A few observations on specific choices:
Zustand was the right call for a project this size, but the store it generated grew without any deliberate architecture. More on that in a moment.
Tailwind v4 + Vite v7: both bleeding edge at the time of writing. The AI used them without hesitation, which I appreciated. It wasn't defaulting to stable-minus-one out of caution.
Biome over ESLint + Prettier is a choice I'd have made myself. Fast, opinionated, one dependency instead of three.
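For reference, one of the nice things about Tailwind v4 is that the Vite wiring collapses to a single plugin, no PostCSS config required. A sketch of the standard setup (my actual config may differ slightly):

```typescript
// vite.config.ts — Tailwind v4 ships a first-party Vite plugin,
// replacing the old tailwind.config + PostCSS arrangement.
import { defineConfig } from "vite";
import react from "@vitejs/plugin-react";
import tailwindcss from "@tailwindcss/vite";

export default defineConfig({
  plugins: [react(), tailwindcss()],
});
```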
What Actually Worked
The speed is real. Two days from nothing to a deployed, functional, multi-user productivity app... including a drag-and-drop interface, auth, and a full visual redesign. That's not something I'd wave away.
Prompting well matters enormously. The quality of what got generated was directly tied to the precision of what I described. Vague prompts produced vague code. When I described a feature with specificity (the data flow, the edge cases, the component boundary) the output was meaningfully better. This is not a workflow where you can turn your brain off.
Describing design direction rather than implementing it was a genuinely productive way to work. The redesign felt faster and less precious than it would have if I were writing the CSS myself. I could react to output aesthetically without getting attached to the code producing it.
The app works. That's not a minor thing to say. It does what I need it to do. My girlfriend uses it daily. It's live, it's deployed, it handles two users without issues. For a personal tool, that's the bar, and it cleared it easily.
What Didn't Work
The codebase is a mess. Not broken, just messy in ways that add up.
The Zustand store is bloated. Because state was added incrementally, feature by feature, without anyone stopping to design the store as a whole, it grew into something I'd need to spend real time deciphering before I could safely refactor it. It works, but it's not something I'd want to hand to another developer.
No DRY, no KISS. The AI optimizes for making the current feature work. It doesn't optimize for the codebase as a coherent system (I know that can be prompted too, but I intentionally didn't). Similar logic got duplicated across components because each feature was generated in relative isolation. Abstractions that would have been obvious when writing by hand never materialized.
Tailwind without structure. Utility classes scattered without naming conventions, without component logic, without any discernible system. In a small codebase this is navigable. Scale it up and it becomes painful.
Hardcoded values in places they shouldn't be. Small things, but they accumulate. Magic strings, inline values that should be constants, configuration that ended up in component files. None of it is catastrophic; all of it is the kind of thing that slows you down later.
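The fix for this one is mundane: pull the inline values into a constants module so a change touches one file instead of every call site. A hypothetical sketch of what the AI never did on its own (names are illustrative):

```typescript
// constants.ts — the extraction that never materialized. Every value
// below ended up inlined in component files instead.
const STORAGE_KEY_PREFIX = "dunzo";
const MAX_TAGS_PER_TASK = 5;
const DAY_VIEW_DATE_FORMAT = "yyyy-MM-dd";

// Call sites build keys from the constant instead of repeating the
// literal string, so a rename is a one-line change.
function storageKeyFor(userId: string): string {
  return `${STORAGE_KEY_PREFIX}:tasks:${userId}`;
}
```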
The pattern across all of these is the same: the AI builds for now, not for later. It has no stake in the maintainability of what it generates. That's a fundamental property of the workflow, not a bug to be fixed with better prompting.
What I Actually Learned
Vibe-coding is a speed tool, not a quality tool. If you want something live fast and the long-term state of the codebase is a secondary concern, it delivers. If you care about the code as much as the product, you'll spend time cleaning up what it built.
Your existing experience sets the ceiling. The precision I could describe features with, the ability to spot when something generated was subtly wrong, the judgment about when to push back: that all came from eight years of building things. Vibe-coding doesn't remove the need for engineering judgment. It just moves where you apply it.
The prompting approach matters more than I expected. There's a meaningful difference between describing what you want like a product owner who understands the technical constraints, and describing it like someone who just wants a feature to exist. The former produces significantly better output. That gap probably narrows as AI models improve, but right now it's real.
State management without architecture is a trap. Zustand is fine. Zustand grown organically across two days of feature additions, with no one designing the store shape, is a liability. If I did this again, I'd stop early and have a conversation specifically about state architecture before letting it grow.
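What "designing the store shape" means in practice: naming the slices up front instead of letting keys accrete feature by feature. The sketch below is hand-rolled rather than using Zustand's `create`, so it runs standalone; the slice names and actions are hypothetical, but the shape-first idea is the point:

```typescript
// A minimal store in the spirit of Zustand, hand-rolled to stay
// self-contained. The point is the shape: two named slices, decided
// deliberately, instead of one organically grown blob.
type Task = { id: string; title: string; done: boolean };

interface TasksSlice {
  tasks: Task[];
  addTask: (title: string) => void;
  toggleTask: (id: string) => void;
}

interface TagsSlice {
  tags: string[];
  addTag: (tag: string) => void;
}

type Store = TasksSlice & TagsSlice;

function createStore(): { getState: () => Store } {
  let state: Store;
  // Shallow-merge updates into a fresh state object, like Zustand's set().
  const set = (partial: Partial<Store>) => {
    state = { ...state, ...partial };
  };
  state = {
    // tasks slice
    tasks: [],
    addTask: (title) =>
      set({
        tasks: [...state.tasks, { id: String(state.tasks.length + 1), title, done: false }],
      }),
    toggleTask: (id) =>
      set({ tasks: state.tasks.map((t) => (t.id === id ? { ...t, done: !t.done } : t)) }),
    // tags slice
    tags: [],
    addTag: (tag) => set({ tags: [...state.tags, tag] }),
  };
  return { getState: () => state };
}
```

Agreeing on something like this on day one would have cost twenty minutes and saved the refactor I'm now avoiding.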
The comparison to my normal workflow is sharper than I expected. When I work with AI carefully (reviewing, redirecting, refactoring as I go) the output reflects my standards. When I vibe-coded, it reflected the AI's defaults. Those are not the same thing, and the codebase shows it clearly.
Would I Do It Again?
For Dunzo: yes! I'll keep vibe-coding it as my needs change. It's a personal tool with two users, and architectural rigor doesn't pay off at that scale. The speed-to-value ratio is right for what it is.
For anything I care about maintaining, collaborating on, or scaling? No. My default workflow exists for good reasons, and two days of vibe-coding reminded me exactly what they are.
The experiment was worth running. The app works. The codebase is something I'll quietly avoid looking at too closely.
That feels like an honest summary of what vibe-coding gets you.
Dunzo is live at donezo.vercel.app. Parks and Recreation fans will get the name immediately. Everyone else will figure it out eventually.