In Part 1, I outlined my hypothesis and the architectural decisions I made to optimise for AI-assisted development. Now for the crucial question: Did it actually work?
After 3 months of AI-first development, I'm shipping features significantly faster than before. But the real insight isn't just the speed improvement; it's how many "AI problems" disappeared entirely. Features that used to take a day of back-and-forth now ship in hours. The times it fails are just as revealing as the times it succeeds.
What Changed
Here's what I noticed:
- Feature velocity: Shipping complete features in hours instead of days
- First-try success: Most implementations worked without major rewrites
- Debug time: Dramatically reduced - AI rarely broke existing patterns
- Context switching: Still happens, but anticipating it became second nature
- Upfront investment: More time spent on architecture and specs
- Pleasant surprises: Occasionally the coding assistant would add thoughtful touches I hadn't specified - better error messages, edge case handling, or UX improvements. These happy accidents happened about as often as complete misfires that needed rollback.
The Most Important Truth: It's Not Magic
The human oversight is everything. I'm not just hitting "accept" on AI suggestions. I'm orchestrating a development process where AI handles the mechanical work while I focus on architecture and review.
How I Made AI Development Location-Independent
The local development environment was crucial for this project. I wanted to be able to work from anywhere. When I'm not travelling, my laptop runs almost permanently at my desk, but I still want to keep the development process going while I'm in line to see the dentist, waiting for the petrol tank to fill, or early for a coffee date with a friend. In those situations, I also want to avoid the constant port-number juggling that happens when you work on multiple projects simultaneously.
I containerized everything with Docker. The frontend, the API, and PostgreSQL all run in containers orchestrated by Docker Compose.
Docker Compose packages your app and its dependencies into containers (like lightweight VMs), then orchestrates multiple containers together. Define your entire dev environment in a config file - run one command, everything starts.
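To make that concrete, here's a minimal sketch of what such a Compose file looks like - the service names, ports, and credentials are illustrative, not the project's actual configuration:

```yaml
# Hypothetical docker-compose.yml sketch - names, ports, and credentials are illustrative
services:
  web:
    build: ./apps/web        # React frontend
    ports:
      - "3000:3000"
    depends_on:
      - api
  api:
    build: ./apps/api        # Hono backend
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgres://postgres:postgres@db:5432/app
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```

One docker compose up and the whole stack is running.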
But here's the extra touch: in development mode, each container connects to my Tailscale network and exposes itself via Tailscale Funnel. This gives me stable URLs - bendechrai.tail[uuid].ts.net and api-bendechrai.tail[uuid].ts.net - that work from anywhere on any device.
Tailscale creates a private network between your devices. Tailscale Funnel exposes services from that private network to the public internet with a stable URL - no port forwarding or ngrok needed.
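For reference, exposing a local port through Funnel is roughly a one-liner. The exact CLI syntax has shifted between Tailscale releases, so treat this as a sketch rather than copy-paste:

```bash
# Expose the local dev server on localhost:3000 via a stable public HTTPS URL.
# Syntax as in recent Tailscale CLI releases; older versions used a
# `tailscale serve` configuration plus a separate funnel toggle.
tailscale funnel 3000
```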
This solved multiple problems at once. Not only did I say goodbye to port conflicts between projects, I could also check the work out on any other device I wanted and test third-party integrations that needed to call webhooks in my apps.
The Tailscale Funnel setup only runs in development mode. In production, the containers run normally without Tailscale, keeping the production configuration simple and avoiding unnecessary dependencies.
Monorepo Management: pnpm and Turborepo
For monorepo management, I used pnpm workspaces with Turborepo. This combination gives you fast, efficient installs with pnpm's hard-linking approach, plus Turborepo's intelligent caching and task orchestration.
pnpm is a package manager (like npm) that saves disk space by hard-linking shared dependencies. It's stricter than npm - if you use a package, you must declare it. Turborepo orchestrates tasks across multiple packages in a monorepo, running builds in the right order and caching results.
What mattered most was the task pipeline configuration. I set up Turborepo so that running turbo run build from the root automatically builds packages in the right order - shared UI library first, then the frontend and backend. Running turbo run test runs tests in parallel across all packages, with cache hits when nothing has changed.
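A minimal sketch of that pipeline, assuming a turbo.json at the repo root (Turborepo v2 calls the top-level key tasks; v1 called it pipeline):

```json
{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**"]
    },
    "test": {
      "dependsOn": ["build"]
    }
  }
}
```

The ^build dependency is what gives you "shared UI library first, then frontend and backend" without ever spelling out the order by hand.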
This created a development experience where the coding assistant could make changes to the UI library, and a single command would rebuild everything that depended on it. No manual coordination required, no chance of running the frontend with stale component code.
I chose pnpm over npm or yarn specifically for its strict dependency resolution. pnpm doesn't allow phantom dependencies; if a package uses something, it must explicitly declare it. This strictness can be annoying, but it's exactly the kind of explicit pattern that helps coding assistants understand what's available. No guessing, no implicit access to parent dependencies, everything is declared.
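In practice, that strictness just means every workspace package names exactly what it uses. A sketch of how a frontend package might declare its dependency on a shared UI package (package names are illustrative):

```json
{
  "name": "@site/web",
  "dependencies": {
    "@site/ui": "workspace:*",
    "react": "^18.3.0"
  }
}
```

The workspace:* protocol tells pnpm to link the local package from the monorepo rather than fetch it from the registry, and the explicit entry is what makes the dependency visible to both humans and coding assistants.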
Testing Strategy: Vitest and Specification-Driven Development
For testing, I used Vitest with React Testing Library. This is fairly standard for Vite projects - Vitest shares Vite's configuration and runs incredibly fast because it leverages Vite's transformation pipeline.
Vitest is a test framework (like Jest) built specifically for Vite projects. It reuses Vite's config and runs tests blazingly fast. React Testing Library helps you test React components by simulating how users interact with them.
What was less standard was the specification-driven development approach I adopted, inspired by tools like Tessl, Spec-Kit, and OpenSpec. Before implementing any feature, I wrote a specification document in markdown that described what the feature should do, how it should behave, and what edge cases it should handle. Then I wrote tests based on that specification. Only after the tests were written did I implement the actual feature.
This might sound like extra ceremony, but it had a profound effect on AI-assisted development. When I asked the coding assistant to implement a feature, I could point it at the specification and the test file. The AI had clear acceptance criteria and a failing test suite. Its job was simply to make the tests pass while adhering to the specification.
This created a tight feedback loop. The coding assistant would implement something, run the tests, see failures, and iterate. No ambiguity about whether the feature was "done" - if the tests pass and the code matches the specification, it's done.
I considered test-driven development without the specification step, jumping straight to writing tests. But I found that having the coding assistant generate tests based on specifications, which I then review and refine, followed by it writing code to pass those tests, creates a remarkably high success rate. The specification provides context and intent, the tests provide concrete acceptance criteria, and the code implementation becomes almost mechanical. The likelihood of a test and code both being wrong in the exact way that causes a test to pass is, in my experience so far, so close to zero that it's negligible.
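Here's a hedged sketch of what that looks like in practice - a Vitest test derived from a hypothetical spec file. The spec path, validateEmail, and its module path are illustrative, not lifted from the actual project:

```typescript
// Derived from a hypothetical spec: specs/user-registration.md
// "Registration MUST reject email addresses without a domain part."
import { describe, expect, it } from "vitest";
import { validateEmail } from "../src/services/user"; // illustrative module path

describe("user registration: email validation", () => {
  it("accepts a well-formed address", () => {
    expect(validateEmail("ben@example.com")).toBe(true);
  });

  it("rejects an address without a domain part", () => {
    expect(validateEmail("ben@")).toBe(false);
  });
});
```

The assistant's brief then becomes: make these tests pass without touching the spec or the test file.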
Comprehensive test coverage also serves as an early warning system for duplicate implementations. When a coding assistant tries to create new functionality that already exists, well-written tests often reveal the overlap. The new tests either duplicate existing test cases or conflict with established behavior patterns, making it easier to spot when the AI is solving a problem you've already solved.
Running Multiple AI Experiments Simultaneously
One workflow innovation that proved invaluable was using Git worktrees for parallel feature development. Git worktrees let you check out multiple branches simultaneously in different directories. Instead of constantly switching branches and restarting Docker containers, I could have multiple worktrees running different features in parallel.
Git worktrees let you check out multiple branches at once in separate directories. Instead of switching branches (which stops your dev server), you can have branch A running in one terminal and branch B in another - simultaneously.
This matters for AI-assisted development because you can experiment with different approaches simultaneously. Want to try two different implementations of a feature? Create two worktrees, ask a coding assistant to implement one approach in each, run both, and compare the results. Alternatively, have AI work on multiple unrelated features in parallel - one worktree developing a new authentication flow while another builds a data export feature. No branch switching, no merge conflicts, no accidentally clobbering your work-in-progress.
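The mechanics are just a couple of commands (branch and directory names here are illustrative):

```bash
# Two features, two directories, both dev environments running at once
git worktree add ../site-auth-flow feature/auth-flow
git worktree add ../site-data-export feature/data-export

# See what's checked out where
git worktree list

# Tidy up when an experiment is finished
git worktree remove ../site-data-export
```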
But running multiple instances of the same application comes with its own coordination overhead. Each worktree needs unique ports on your host machine. Docker containers need distinct names. Development URLs need to be tracked. Without automation, you're manually juggling configuration files and hoping you didn't accidentally reuse a port.
I built a collection of scripts to address that overhead, making worktree creation something a coding assistant could handle autonomously. These scripts handled port allocation, generated unique Tailscale hostnames, created Docker container names to avoid conflicts, and set up the entire development environment without manual intervention.
This worked so well that I turned it into devports, a CLI tool you can use for port and project-name management. It's generic too, so you're not limited to Git worktrees and Docker containers. It tracks allocations, renders configuration files from templates, and handles all the tedious bits that previously required manual coordination. Update: I wrote about the full journey and solution in my Managing Ports and Containers article.
When AI Goes Wrong: What I Watch For
Every file change gets scrutinized. Here are my non-negotiables:
- Scope creep: Did it touch files I didn't ask it to?
- Pattern violations: Business logic in route handlers? New dependencies for existing functionality?
- Silent changes: Unannounced edits to web copy, API response formats, or CLI outputs
- Over-engineering: 200 lines where 20 would do
The AI wants to be helpful. Sometimes that means being too helpful.
The Hidden Danger: Duplicate Implementations
The most insidious problem I watch for is when AI creates new implementations of existing functionality. A coding assistant might add a helper to validate email addresses, not knowing that email validation already exists in the user service. It might create a new data transformation utility, unaware that similar logic lives in the API response formatters. It could implement error handling patterns that duplicate existing middleware.
This happens because AI assistants typically work within the context window of the current task. They see the immediate files they're working on, but they don't have the broader codebase knowledge that a human developer builds over time. The result? Two different implementations of the same core functionality, each with slightly different behavior, validation rules, or error handling.
The maintenance burden compounds quickly. When business requirements change, you now need to update multiple implementations. When bugs are found, you need to fix them in multiple places. When new developers join the team, they need to figure out which implementation is the "correct" one.
This is where maintaining your own understanding of the codebase becomes critical. You need to know what already exists well enough to catch when new implementations are being created unnecessarily. During code review, ask yourself: "Have we solved this problem before?" If the answer is yes, either direct the AI to use the existing solution or explicitly decide to replace the old implementation with the new one.
This problem also reinforces the value of smaller, focused services following Service-Oriented Architecture (SOA) principles. When each service is small enough to fit mostly within an AI's context window, the assistant is more likely to discover existing implementations before creating new ones. A microservice handling only user authentication is less likely to have hidden duplicate password validation than a monolith handling users, billing, notifications, and content management.
What 3x Faster Actually Looks Like
Now for the core question: did this architecture actually make AI-assisted development better?
Overwhelmingly, yes. But not just because of any single decision - it was the cumulative effect of many small choices that all pointed in the same direction.
When I asked a coding assistant to implement a new feature, it could focus on exactly one thing. If it was working on the frontend, it only needed to think about React components built with Radix UI primitives, TanStack Query for data fetching, and semantic CSS classes for styling. If it was working on the backend, it only needed to think about Hono route handlers, service functions, and Prisma queries. The boundaries were clear, the patterns were consistent, and the AI could work confidently within those constraints.
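As a concrete sketch of that backend pattern - a thin Hono route handler delegating to a service function. The route, service name, and module path are illustrative, not lifted from the real codebase:

```typescript
// Hypothetical sketch of the route-handler / service split
import { Hono } from "hono";
import { listPublishedPosts } from "../services/posts"; // service wraps the Prisma queries

const posts = new Hono();

// The handler stays mechanical: parse input, call the service, shape the response.
posts.get("/", async (c) => {
  const page = Number(c.req.query("page") ?? "1");
  const result = await listPublishedPosts({ page });
  return c.json(result);
});

export default posts;
```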
Future improvement: I'm considering moving UI components into their own separate package. This would create even cleaner separation - the app package would focus purely on data fetching, API communication, and rendering logic, while the UI package would handle component composition and Radix integration. When working on features, AI could focus entirely on application flow without thinking about component internals, or vice versa.
The Crystal DBA Postgres MCP integration for database access was particularly powerful. A coding assistant could query the database schema, look at actual data, and understand relationships without me needing to provide that context manually. This eliminated an entire category of back-and-forth communication.
The specification-driven development approach gave it clear success criteria. Instead of vague feature requests like "add a blog post editor," I could point to a specification that detailed every behavior, edge case, and validation rule. The AI had the same clarity that a human developer would want.
The testing infrastructure meant a coding assistant could verify its own work. After implementing something, it would run the tests, see what failed, and iterate until everything passed. But this isn't magic - I still need to know what I'm doing. I'm very intentional about telling the LLM the smallest change I want to make, reminding it where spec files are located, and specifying which package needs attention. Most importantly, I always review every file change before staging commits.
Here's what I specifically look out for during code review:
- Did the LLM change any specs or tests? Unless that was the explicit task, it shouldn't touch these files
- Has it deleted large amounts of code? If so, I need to understand why
- Has it reimplemented existing functionality? This is where maintaining codebase knowledge matters most. Did it add business logic where it shouldn't? Install packages for existing functionality? Create duplicate implementations? The key is knowing your codebase well enough to spot when the AI is solving a problem you've already solved.
I also ensure comprehensive test coverage, especially for things like CLI tool response formats. If there's a missing newline or extra whitespace, I want that test to fail - and I want to notice it immediately.
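A sketch of what I mean, assuming a hypothetical formatting helper - the assertion is deliberately byte-for-byte so a stray newline or trailing space fails loudly:

```typescript
import { describe, expect, it } from "vitest";
import { formatAllocation } from "../src/cli/format"; // hypothetical helper

describe("CLI output format", () => {
  it("matches the documented output exactly, including the trailing newline", () => {
    expect(formatAllocation({ project: "web", port: 3101 })).toBe("web\t3101\n");
  });
});
```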
The Bigger Picture
Look, I probably went too far with the separation between frontend and backend for a simple portfolio site. The HTTP boundary adds latency and complexity that a Next.js monolith would have handled more efficiently. But that wasn't really the point. This was an experiment - a way to test architectural patterns specifically for AI-assisted development. Sometimes you need to push ideas to their logical extreme to understand their boundaries.
The experience reinforced a fundamental principle from Service-Oriented Architecture: keep services small enough to understand completely. When a service handles only authentication, or only blog posts, or only user preferences, the entire codebase can fit within an AI's context window. The assistant can see all the existing functionality and is less likely to create duplicate implementations. But when a service grows to handle multiple domains, the AI can only see fragments at a time, leading to the reimplementation problems I described earlier.
This project taught me something important about the future of software development. AI-assisted development isn't just about typing less code - it's about returning to engineering discipline we thought we'd outgrown.
We abandoned waterfall practices because humans could muddle through without them. Write specs? Nah, developers can figure it out. Strict boundaries? Too much ceremony. Comprehensive tests before coding? Slows us down. Human brains compensated for all that missing structure.
AI can't compensate. When a human developer encounters messy code, they can ask questions, debug issues, and use intuition to navigate complexity. AI models can't do that as effectively. They need clear patterns and explicit context. Give them that, and they become incredibly productive. Make them guess, and they waste time generating code that doesn't quite work.
The monorepo structure, the service layer pattern, the specification-driven development, the clear separation between frontend and backend - all of these choices added some ceremony to the development process. But they paid enormous dividends in AI productivity. The coding assistant could work independently for hours, implementing features, writing tests, fixing bugs. My job shifted from writing code to reviewing generated code, refining specifications, and making architectural decisions.
I'm not the first person to discover this pattern - plenty of others are finding the same thing. But I can validate the perspective: developers who don't adapt to AI-assisted workflows will be left behind. The future belongs to those willing to return to more rigorous development practices. Not AI replacing developers, but AI amplifying developers who embrace the engineering discipline we thought we'd outgrown. The developers who thrive will be those who return to specs before code, boundaries before integration, and rigor before flexibility - the waterfall practices that turn out to be AI's foundation, not its constraint.
Conclusion
After three months of building production applications with AI-first architecture - deployify.com, braidflow.io, and now this personal site - I've validated what works at scale. This site serves as my experimental platform to consolidate these learnings into a repeatable framework. The site works beautifully - it's fast, it's maintainable, it's pleasant to work on. But more importantly, it proves that we can build better software faster when we optimise for AI-assisted development across any project scale.
The key insight is this: AI-assisted development isn't a new workflow you add to your toolbox. It's a return to waterfall-era rigor that agile development let us skip. Specifications before coding. Clear boundaries between components. Comprehensive tests before implementation. The "ceremony" we abandoned because human developers could compensate - AI needs it all back.
If you're building a new project in 2025, I encourage you to think about these questions: How would an AI model understand this code? How clear are the boundaries between components? How consistent are the patterns? How comprehensive are the tests? If you can answer those questions well, you're setting yourself up for a development experience that's faster, more pleasant, and more productive than anything we've had before.
I plan to release the base framework that emerged from this experiment as open source soon. If you want to be notified when it's available, follow me on GitHub, LinkedIn, or Bluesky.
More importantly, I encourage you to experiment with your own approaches to AI-comprehensible architecture. This field is brand new, and we're all still figuring out what works best.
Want to share your experience with AI-coding? Found something that works well for you? Find me on LinkedIn or Bluesky. I'd love to hear from you.