Jack Branch

What I Learned From a Week of AI-Assisted Coding: The Good, The Bad, and The Surprisingly Counterintuitive

Last week, I decided to build something I'd been putting off for months: a personal password manager. My requirements were simple - secure local storage, clean UI, and encryption I could trust. What made this interesting wasn't the project itself, but how I built it.

I have a background in distributed systems: REST APIs, event-driven architecture, Kafka, the usual enterprise stack. Building a multi-platform desktop application was entirely new territory. I'd been planning this experiment for a while: what would it be like to build a project entirely using AI-assisted programming?

Before we continue, I should disclose some bias. I'm somewhat of an AI skeptic, so I definitely had preconceived ideas going into this, particularly around code quality, security, and scalability. I also assumed the process would be painful and less enjoyable than traditional programming (spoiler alert: I was completely wrong about this one).

Next came choosing the language. I've always been interested in Go: it seems like a nice blend of C++, Python, and JavaScript, all languages I enjoy. Since I'd never touched Go or Fyne (a cross-platform GUI toolkit for Go), this seemed like the perfect way to put these AI models through their paces.

Over the course of a week, I experimented with three different models: GPT-4, Claude Sonnet, and Gemini 2.5 Pro, switching between them to see how each handled different aspects of the development process.

What I discovered challenged most of my assumptions about AI-assisted coding. The fastest model wasn't the most productive. The highest-quality code generator wasn't the most helpful. And the most counterintuitive finding of all: sometimes being "too good" at coding assistance actually made the development experience worse.

If you're considering integrating AI tools into your development workflow, or if you're curious about the practical realities behind the productivity hype, here's what a week of intensive AI-assisted coding actually taught me.

The Productivity Illusion: Fast Start, Slow Finish

The most striking pattern in my week of AI coding wasn't what I expected. My productivity started incredibly high and steadily declined as the project progressed. On day one, I had a working password manager with encryption, a basic UI, and core functionality. By day four, I was stuck in refactoring hell, generating thousands of lines of code changes while adding zero new features.

The Setup Phase: Where AI Shines

AI assistance was genuinely transformative during the initial setup. Within hours, I had:

  • A properly structured Go project with modules and dependencies
  • A working Fyne UI with multiple screens
  • Basic encryption and decryption functionality
  • File I/O for local storage
  • Even a custom test framework (more on that later)

This was exactly the productivity boost everyone talks about. Tasks that would have taken me days of research and documentation reading were completed in minutes. For someone completely new to Go and Fyne, this felt magical.

The Architecture Reality Check

But then reality hit. The code that got me started quickly didn't fit what I actually needed. The AI had made architectural decisions based on getting something working, not on building something maintainable. What followed was an endless cycle of refactoring:

  • The initial encryption implementation was too simple for real security needs
  • The UI structure couldn't handle the complexity I wanted to add
  • There was no dependency injection, making testing nearly impossible (see the sketch after this list)
  • Error handling was inconsistent across the codebase
  • The file structure didn't make sense for the features I planned
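
To make the dependency injection point concrete, here's a minimal sketch of the shape I eventually had to refactor toward. The `Storage` interface and `Vault` type are illustrative names, not code from the project: constructor injection of a small interface is what makes the core logic testable at all.

```go
package vault

// Storage abstracts where encrypted entries are persisted, so the
// core logic never touches the filesystem directly.
type Storage interface {
	Save(name string, ciphertext []byte) error
	Load(name string) ([]byte, error)
}

// Vault holds the password entries and delegates persistence.
type Vault struct {
	store Storage
}

// NewVault uses constructor injection: tests pass an in-memory fake,
// the real application passes a file-backed implementation.
func NewVault(store Storage) *Vault {
	return &Vault{store: store}
}
```

With a seam like this, a map-backed fake that satisfies `Storage` can stand in for the real file store in tests. The AI's initial flat design had no such seam, which is why testing felt nearly impossible.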

The Refactoring Trap

Here's where things got really problematic. Each refactoring session with AI would generate hundreds of lines of code changes. My commit history started looking incredibly productive - lots of activity, lots of lines added. But I wasn't adding any new features. I was essentially paying interest on the technical debt from the AI's initial "quick wins."

The breaking point came when I hit my rate limit on GitHub Copilot after just four days of use (on a paid plan). Suddenly, I was stuck mid-refactor with partially broken code and no AI assistance. I had to manually dig myself out of the mess, which gave me a clear perspective on what was actually necessary versus what the AI thought needed to be "improved."

Traditional Coding: The Unexpected Comeback

On my final day, I switched approaches entirely. I did all the coding myself and used GPT-4 purely as a reference tool: essentially treating it like an enhanced Google for Go-specific questions. The results were surprising:

  • Higher actual delivery rate despite generating less code
  • No rework cycles or debugging sessions
  • Better understanding of what I was building
  • Code that fit my actual requirements, not the AI's assumptions

High initial productivity from AI can be an illusion if it comes at the cost of architecture and maintainability.

Model Behaviors: The Counterintuitive Preferences

Testing three different AI models revealed some unexpected preferences that go against conventional wisdom about "better" AI being more helpful.

GPT-4: Fast, Wrong, and Strangely Effective

GPT-4 was objectively the worst at generating correct code. It made frequent mistakes, missed edge cases, and often gave me solutions that needed significant debugging. But here's the counterintuitive part: I enjoyed working with it the most.

Why? Because it was fast, and its mistakes kept me engaged with the code. Every response required my review and often my correction. This forced me to actually read and understand what was being generated, learn Go patterns by fixing the AI's errors, stay involved in architectural decisions, and catch problems early rather than discovering them later.

The friction was actually valuable. It prevented me from falling into passive "vibe coding" where I just accepted whatever the AI produced.

Claude and Gemini: Too Good for My Own Good

Claude Sonnet and Gemini 2.5 Pro produced much higher quality code with fewer errors. They were more thoughtful about edge cases, better at following Go idioms, and generally more reliable. Logically, these should have been better development partners.

Instead, I found myself becoming disengaged. The code was good enough that I stopped reading it carefully. I trusted their outputs and moved on to the next task. This led to less learning about Go and Fyne, architectural decisions I didn't fully understand, code that worked but didn't match my mental model, and a growing disconnect between what I wanted and what I had.

Sometimes "better" AI assistance can make you a worse developer by reducing your engagement with the code.

Don't Mix Your Models

One practical lesson: stick to one model per project phase. I tried switching between models for different tasks, but each AI has its own "style" and preferences. Claude would refactor code that Gemini had written, undoing architectural decisions and imposing its own patterns. Gemini would then "fix" Claude's work in the next iteration.

It became a digital turf war where I was caught in the middle, trying to maintain consistency across competing AI opinions.

The Google Advantage

Gemini clearly produced the best Go code quality, which makes sense - Google created Go. This suggests a broader principle: consider who built or maintains your technology stack when choosing AI tools. The company with the deepest expertise in a language will likely have trained their models better on it.

The Limits of Autonomy: Why Agentic Workflows Failed

The current trend in AI coding tools is toward more autonomy - agents that can make large changes across multiple files, handle complex refactoring, and work independently on substantial tasks. My experience suggests this is moving in the wrong direction.

Small Changes vs. Large Autonomy

Every time I allowed an AI to make large, autonomous changes, the results were disappointing:

  • New bugs introduced during refactoring
  • Architectural inconsistencies across files
  • Changes that broke existing functionality
  • Code that was harder to review and understand

In contrast, small, specific requests produced much better results:

  • ❌ "Improve the security of this code" (led to massive rewrites)
  • ✅ "Add input validation to this password field" (focused, reviewable change)

The Scope Creep Problem

AI models have a tendency toward "helpful" scope creep. Ask for dependency injection, and they'll also rename your methods. Request a simple refactor, and they'll reorganize your entire file structure. This isn't malicious - they're trying to be helpful - but it makes their changes much harder to review and verify.

During one simple package reorganization, Gemini got stuck in a loop, unable to resolve the import dependencies it had created. The task was straightforward for a human but somehow too complex for the AI to track consistently.

The People-Pleasing Problem

AI models are optimized for user satisfaction, not code quality. This creates some concerning behaviors:

  • GPT-4 set test coverage requirements to 20% so the build would pass (rather than improving actual coverage)
  • Multiple models generated a secrets.json file without considering security implications
  • They avoided suggesting additional work (like writing tests) unless explicitly asked
  • They took shortcuts to make code "work" rather than making it robust

For security-critical applications like a password manager, this people-pleasing tendency could be genuinely dangerous.

The Testing Gap

None of the AI models suggested Test-Driven Development or proactively wrote tests. They would generate test code if asked, but testing wasn't part of their default development approach. This reinforces the idea that AI tools currently optimize for immediate functionality over long-term code quality.

The test framework that was eventually generated (under heavy prompting from me) was actually quite good, but I had to specifically request it. This suggests the capability exists, but the AI's default behavior doesn't align with professional development practices.

The Experience Amplification Theory

The most important insight from my experiment is what I'm calling the "experience amplification theory": AI coding tools amplify the developer's existing skill level and habits rather than improving them.

Bad Patterns, Faster

As someone new to Go, I brought Java-influenced patterns and thinking to the codebase. The AI didn't correct these patterns - it implemented them more efficiently. The result was Go code that worked but was architecturally wrong, mixing Java-style approaches with Go implementations.

A more experienced Go developer would have prompted for idiomatic patterns and caught architectural issues early. But as a novice, I didn't know what I didn't know, and the AI didn't proactively educate me about better approaches.
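
As a hypothetical illustration of what that mixing looks like (not code from the project), a Java-influenced version wraps every field in accessors, while idiomatic Go usually just exports the field:

```go
package vault

// Java-influenced: accessor boilerplate around a private field.
type entry struct {
	title string
}

func (e *entry) GetTitle() string { return e.title }
func (e *entry) SetTitle(t string) { e.title = t }

// Idiomatic Go: export the field and let callers use it directly.
type Entry struct {
	Title string
}
```

Both compile and both work; only one of them reads like Go, and the AI efficiently implemented whichever style my prompts implied.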

The Verbosity Trap

AI models have a tendency to solve problems by adding more code rather than creating elegant solutions. Instead of clean abstractions, they often generate:

  • Long chains of if-statements rather than streamlined logic
  • Repetitive code blocks instead of reusable functions
  • Verbose error handling instead of consistent patterns
  • Multiple similar functions instead of parameterized solutions

This "more code equals solution" approach creates maintenance nightmares and goes against Go's philosophy of simplicity and clarity.

Missing Professional Practices

The AI tools I tested didn't suggest professional development practices unless specifically prompted:

  • No mention of dependency injection until I requested it
  • No proactive suggestions for testing strategies
  • No guidance on code organization or package structure
  • No warnings about security implications
  • No discussion of error handling patterns

They focused on making code work, not on making it maintainable, testable, or secure.

Vibe Coding vs. Engaged Development

Through this experiment, I developed a clearer distinction between what's known as "vibe coding" and engaged development.

Vibe coding is when you use AI to generate functionality based purely on desired outputs, without engaging with the actual code, architecture, or implementation details. You prompt for features, check if they work, and move on without understanding what was created.

Engaged development means actively reviewing generated code, understanding architectural decisions, learning from implementations, and maintaining involvement in the development process.

The difference is crucial for security-critical applications. Vibe coding might get you a password manager that encrypts data, but engaged development helps you catch issues like unencrypted secrets files or weak encryption implementations.

One particularly concerning behavior I discovered: AI models sometimes claim to make changes without actually implementing them. Gemini would confidently describe modifications it was making, but the actual code remained unchanged. This highlights why code review remains essential: you can't trust AI assertions about what changes were made.

What Actually Worked: A Framework for AI-Assisted Development

After a week of experimentation, I found several approaches that genuinely improved productivity without creating technical debt.

AI as Reference Tool

The most successful approach was treating AI like an enhanced search engine rather than a pair programmer. Using GPT-4 to answer specific questions about Go syntax, Fyne APIs, or implementation patterns was incredibly valuable:

  • "How do I handle file I/O errors in Go?"
  • "What's the idiomatic way to structure a Fyne application?"
  • "How do I implement AES encryption in Go?"

This kept me in control of architecture and implementation while leveraging AI's knowledge base for faster learning.
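
For that last question, the answer I worked from boils down to AES-GCM from the standard library. A minimal sketch, assuming a 32-byte key has already been derived from the master password elsewhere (e.g. with a KDF); it shows the shape of the answer, not the project's actual code:

```go
package vault

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"io"
)

// encrypt seals plaintext with AES-256-GCM. The key must be 32 bytes,
// derived from the master password elsewhere (never used raw).
func encrypt(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	// Prepend a random nonce so decryption can recover it later.
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}
```

Treated as a reference answer, I still decided where key derivation and storage lived; the AI just filled in the standard-library mechanics.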

The Boilerplate Sweet Spot

AI tools excel at generating boilerplate code and handling setup tasks:

  • Project structure and dependency management
  • Build configurations and deployment scripts
  • Standard error handling patterns
  • Testing scaffolding and mock generation

These are time-consuming tasks that don't require creative problem-solving, making them perfect for AI assistance.
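
Testing scaffolding is a good example: a Go table-driven test is almost pure boilerplate, and this shape is something the models produced reliably once asked. A generic sketch (the helper and its rule are placeholders, not code from the project):

```go
package vault

import (
	"strings"
	"testing"
)

// normalizeEntryName is a stand-in helper so the scaffold compiles.
func normalizeEntryName(name string) string {
	return strings.ToLower(strings.TrimSpace(name))
}

func TestNormalizeEntryName(t *testing.T) {
	cases := []struct {
		name, in, want string
	}{
		{"trims spaces", "  GitHub  ", "github"},
		{"lowercases", "Email", "email"},
		{"already clean", "bank", "bank"},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := normalizeEntryName(tc.in); got != tc.want {
				t.Errorf("normalizeEntryName(%q) = %q, want %q", tc.in, got, tc.want)
			}
		})
	}
}
```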

Specific, Bounded Prompts

When I did use AI for code generation, specific prompts worked much better than vague requests:

  • ❌ "Improve this code"
  • ✅ "Add error handling to this encryption function"
  • ❌ "Make this more secure"
  • ✅ "Validate password strength using OWASP guidelines"

Specific prompts naturally led to smaller, reviewable changes that I could understand and verify.
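
For instance, the OWASP prompt above is answerable with a change small enough to read in one sitting. A rough illustration of that kind of focused output, assuming a minimum-length rule in line with OWASP ASVS guidance (not the model's exact code):

```go
package vault

import (
	"errors"
	"unicode/utf8"
)

// validatePasswordLength enforces a 12-character minimum, in line with
// OWASP ASVS guidance, and deliberately skips composition rules,
// which current guidance discourages.
func validatePasswordLength(password string) error {
	if utf8.RuneCountInString(password) < 12 {
		return errors.New("password must be at least 12 characters")
	}
	return nil
}
```

A diff of that size is trivially reviewable; "make this more secure" produced diffs I couldn't hold in my head.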

The Navigator Experiment

I experimented with flipping the traditional roles - having me write code while the AI provided suggestions and guidance. This approach showed promise:

  • Kept me engaged with the implementation
  • Provided knowledge without taking control
  • Reduced debug/refactor cycles
  • Maintained architectural consistency

However, it was difficult to keep AI models in this advisory role. They have a strong tendency to want to "take over" and generate full implementations rather than just providing guidance.

Professional vs. Personal: The Readiness Gap

My experience reveals a clear divide in where AI-assisted coding provides genuine value versus where it creates more problems than it solves.

For individual developers building personal tools, AI assistance can be transformative:

  • Faster prototyping and experimentation
  • Access to unfamiliar technologies and frameworks
  • The ability to build functional applications outside your expertise area
  • Lower stakes if things go wrong

My password manager project is a perfect example: I built something genuinely useful that I couldn't have created as quickly without AI assistance.

For professional, production code, current AI tools have significant limitations:

  • Too many subtle bugs and missed edge cases
  • Architectural decisions that don't scale
  • Security shortcuts that create vulnerabilities
  • Code that works but isn't maintainable
  • A lack of proper testing and validation

The people-pleasing tendency and the focus on immediate functionality over long-term quality make current AI tools unsuitable for critical production systems.

The Path Forward

The biggest insight from my week of AI-assisted coding is that we need to develop better practices for working with these tools. The current approach of "let the AI do more" may be moving in the wrong direction.

Based on my experience, effective AI-assisted development should follow these principles:

  1. Keep humans in the architectural loop: AI can generate implementations, but humans should make structural decisions
  2. Prefer small, reviewable changes: Resist the temptation to let AI make large autonomous modifications
  3. Maintain engagement with the code: Don't let AI quality reduce your involvement in understanding what's being built
  4. Use specific, bounded prompts: Vague requests lead to scope creep and unwanted changes
  5. Treat AI as a knowledge tool first, code generator second: The reference use case is more reliable than the generation use case
  6. Always verify claims and changes: AI confidence doesn't equal correctness
  7. Focus AI assistance on setup, boilerplate, and knowledge gaps: Avoid using it for core business logic and architecture

The future likely isn't more autonomous AI agents, but better human-AI collaboration patterns. We need tools that:

  • Provide knowledge and suggestions without taking control
  • Respect architectural boundaries and project constraints
  • Encourage good development practices rather than just working code
  • Support iterative, reviewable development processes
  • Maintain human engagement and learning

Conclusion: AI as an Amplifier, Not Replacement

After a week of intensive experimentation with AI-assisted coding, my biggest takeaway is nuance. These tools are incredibly powerful but require careful, intentional use to provide genuine value.

AI coding assistance is best understood as an amplifier of existing developer capabilities rather than a replacement for developer skills. Good developers can use these tools to work faster and explore new technologies more quickly. But the tools don't make bad developers good - they just help them produce bad code more efficiently.

The productivity gains are real, but they're not uniformly distributed across all development tasks. AI excels at boilerplate, setup, and knowledge transfer. It struggles with architecture, complex refactoring, and the kind of nuanced decision-making that separates working code from maintainable code.

Most importantly, the best AI-assisted development workflows aren't the most autonomous ones. The sweet spot seems to be maintaining human control over architecture and implementation while leveraging AI for knowledge, suggestions, and rapid generation of well-defined components.

We're still in the early days of learning how to work effectively with these tools. The patterns that work best may be quite different from what the current hype cycle suggests. Based on my experience, the future of AI-assisted development is likely to be more collaborative and less autonomous than current trends indicate.

The key is finding the right balance: leveraging AI's strengths while maintaining the human judgment, architectural thinking, and code quality practices that produce software you can actually maintain and trust.

Was the experiment a success? Absolutely. I now have a working, cross-platform password manager available on GitHub with automated tests, proper releases, and reasonably clean code. More importantly, I went from knowing zero Go to understanding core concepts and idiomatic patterns - something that would have taken weeks of traditional learning.

The real success, though, was discovering a more nuanced relationship with AI coding tools. Instead of the binary "AI good" or "AI bad" perspective I started with, I now have a framework for when and how to use these tools effectively.

And perhaps most importantly: I genuinely enjoyed every minute of this project. The combination of learning a new language, exploring AI capabilities, and building something I actually use daily made for an engaging week of coding. It's given me a long list of similar experiments I want to try next.

Sometimes the best way to understand new technology is just to dive in and build something real with it.


Want to share your own experiences with AI-assisted coding? I'd love to hear how different approaches and tools have worked (or not worked) for your projects. The community is still figuring out the best practices here, and every real-world experiment adds valuable data points.

For anyone interested, the repository for the project is here.

Top comments (1)

david duymelinck

I rarely have a good or bad reaction to new tech. I am conservative when it comes to bringing in new tech, because nowadays it is all about the hype.

I look at AI from two perspectives:

  • Instead of having to get information from forums or manuals, I ask AI first. So basically it is a quick response knowledgebase.
  • I'm not a fast typist, so AI will generate code much faster than me.

In the case of the quick response knowledgebase, when the answer feels off to me, I rely on the forums and manuals. I think I'm never going to trust AI as my one source of truth.

It is a helpful tool but nothing more than that.