<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Trey Hutcheson</title>
    <description>The latest articles on DEV Community by Trey Hutcheson (@thutch1976).</description>
    <link>https://dev.to/thutch1976</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3072514%2Fcca1d68f-69bb-4003-a82e-00845457180f.png</url>
      <title>DEV Community: Trey Hutcheson</title>
      <link>https://dev.to/thutch1976</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thutch1976"/>
    <language>en</language>
    <item>
      <title>So You Want to Write an App?</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Mon, 07 Jul 2025 23:55:33 +0000</pubDate>
      <link>https://dev.to/thutch1976/so-you-want-to-write-an-app-4nbg</link>
      <guid>https://dev.to/thutch1976/so-you-want-to-write-an-app-4nbg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;A couple of months ago I started a blog series on dusting off some of my engineering skills, and it quickly shifted to focus on learning how to leverage AI. What started as a silly exercise in entering bank transactions quickly changed into working on a side project.&lt;/p&gt;

&lt;p&gt;The result is &lt;a href="https://www.mybuilds.app/" rel="noopener noreferrer"&gt;Builds&lt;/a&gt;. Builds is a social media platform focused on car enthusiasts. It's mobile-first, using React Native. I knew it wasn't going to be easy, but I completely underestimated just how much work was required to get an app off the ground.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concept Behind Builds
&lt;/h2&gt;

&lt;p&gt;I've been into cars for a very long time. I started participating in car-related activities two decades ago, when I purchased a 2005 Infiniti G35 Sedan. I quickly found an online community for owners of that car (powered by vBulletin), and I became deeply integrated into that community. The G35 wasn't a particularly special car, but that didn't matter. It was &lt;strong&gt;my&lt;/strong&gt; car, and I did everything in it. I autocrossed it, I drag raced it, I attended car shows, dyno days, local meets, and big national events. I heavily modified the car, and kept decent records. I had lots of dyno pulls and drag strip timeslips to measure the effectiveness of different mods.&lt;/p&gt;

&lt;p&gt;That's when I first started to conceptualize this app. Though this was 2005, so pre-mobile. At the time I wanted to build a desktop app in C#/WinForms that connected to an internet-hosted database (this was also pre-"cloud"). I wanted to build a piece of software where I could keep a historical record of my car. It would support embedding media &amp;amp; attachments, such as dyno run data files and ECU datalogs.&lt;/p&gt;

&lt;p&gt;The "killer feature", to me, was the concept of &lt;em&gt;versioning&lt;/em&gt;; i.e., to apply the concept of version control to the automotive build log. I wanted to visualize changes over time, and include a diff viewer. There were secondary features, such as comparing data from timeslips or dyno runs, but versioning was the core concept.&lt;/p&gt;

&lt;p&gt;I never actually started that project. I just never had time. Over the years, the concept evolved to incorporate social concepts. And of course that just meant the scope kept growing, which consequently meant I was even less likely to work on it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Change in Personal Circumstances
&lt;/h2&gt;

&lt;p&gt;In April of this year, I was laid off by my employer. I won't go into the details here, but it was a very difficult experience. Nearly 3 months later, I am still unemployed. I am diligently pursuing employment, with almost 200 applications submitted. But I also chose to use this time productively. I didn't take any "time off" after the corporate grind; instead, I've been working even more than before. &lt;/p&gt;

&lt;p&gt;I will never claim to be thankful that I lost my job. But without this huge block of time, I &lt;em&gt;never&lt;/em&gt; would have started this project. &lt;/p&gt;

&lt;h2&gt;
  
  
  Project Scope
&lt;/h2&gt;

&lt;p&gt;So what is Builds &lt;em&gt;now&lt;/em&gt;? The core concept remains the same, but with a social media twist. Users have a Garage which serves as their profile. A Garage can have 1 or more Rides, and each Ride can have 1 or more Builds. A Ride defines attributes like its Year/Make/Model/Trim, and each Build has an associated set of modifications, attachments, and custom attributes. Users can search for other users, follow/unfollow them, comment on rides/builds, share their own statuses and start discussions. Users can even create and check into events, with location support. &lt;/p&gt;
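
&lt;p&gt;To make that shape concrete, here's a rough TypeScript sketch of the hierarchy. These type names and fields are my own illustration of the description above, not the app's actual schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Hypothetical sketch of the Garage -&amp;gt; Ride -&amp;gt; Build hierarchy;
// the real schema surely differs.
interface Build {
  modifications: string[];
  attachments: string[];                // e.g. dyno files, ECU datalogs
  attributes: Record&lt;string, string&gt;;   // custom attributes
}

interface Ride {
  year: number;
  make: string;
  model: string;
  trim: string;
  builds: Build[];                      // 1 or more
}

interface Garage {
  owner: string;
  rides: Ride[];                        // 1 or more
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;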

&lt;p&gt;At a high level, this doesn't sound like a lot of work. But in actuality, everything I mentioned involves multiple screens and some fairly complex business rules. At this moment, there are 22 distinct screens. In the front end alone, there are over 600 unit tests and 100 journey tests. The front end is clocking in at 35k lines of TypeScript, with the backend at 25k. Those aren't huge numbers, but there's no way I could have ever made it this far while still having a full-time job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Organization
&lt;/h2&gt;

&lt;p&gt;I am using a monorepo, hosted in a private GitHub repository. It currently has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The mobile front end (React Native + Expo, TypeScript)&lt;/li&gt;
&lt;li&gt;The backend service + API (Node/Fastify/Apollo GraphQL, TypeScript)&lt;/li&gt;
&lt;li&gt;A separate database migration tool&lt;/li&gt;
&lt;li&gt;An administrative console with its own backend and front end&lt;/li&gt;
&lt;li&gt;A data seeding tool to feed data into the application via its API&lt;/li&gt;
&lt;li&gt;An external API data crawler to scavenge automotive Year/Make/Model/Trim data into my dev and prod databases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend has a rich suite of unit tests and integration tests. The integration tests use testcontainers and automatically apply database schema and seed data.&lt;/p&gt;

&lt;p&gt;The front end has its own unit tests and journey tests, including asserting on user telemetry. I haven't implemented on-device automated testing, but it's in the plans.&lt;/p&gt;

&lt;p&gt;The database migration tool is feature-rich, and has already saved me a few headaches. It also has a &lt;code&gt;--dry-run&lt;/code&gt; option that simulates the full application of all migrations before I apply them to any of my environments.&lt;/p&gt;

&lt;p&gt;Suffice it to say, there is quite a lot here, particularly for a solo developer. And I've leaned on AI heavily for this. First it was Cursor, and now it's Claude Code. AI has been absolutely indispensable. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This is the first in a new series of posts, following my experience in trying to launch my own app. I'm a complete novice with this process, and I'm sure much of what I've experienced will come as no surprise to many others. While I've found the technical side to be largely rewarding, there have been quite a few frustrations. I hope that sharing those frustrations can help prepare others who are considering similar endeavors.&lt;/p&gt;

</description>
      <category>startup</category>
      <category>mobile</category>
    </item>
    <item>
      <title>Claude Code is First Choice</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Tue, 17 Jun 2025 18:15:34 +0000</pubDate>
      <link>https://dev.to/thutch1976/claude-code-is-first-choice-2fbd</link>
      <guid>https://dev.to/thutch1976/claude-code-is-first-choice-2fbd</guid>
      <description>&lt;p&gt;Ok so I've now been using Claude Code for a little over a week. It has supplanted Cursor as my primary tool for AI assisted coding tasks, and the transition was very rapid. In fact, I'm not sure I'm going to continue to pay for Cursor. &lt;/p&gt;

&lt;h2&gt;
  
  
  Monorepo, Multiple Cursor Instances
&lt;/h2&gt;

&lt;p&gt;One of the things that I discovered early on with Cursor was that it would frequently get "lost" in my monorepo. If it ever wanted to run an external command, such as installing another npm dependency, it would do it in the wrong directory (usually the repository root). The same thing would happen when trying to run the linter, or unit tests, etc. Almost every single time, I had to explicitly tell it to run a command in a given subdirectory, or just reject the command altogether and run it myself. &lt;/p&gt;

&lt;p&gt;To address this problem, I began running separate Cursor instances for each major layer/service. That helped, but created another problem: I could no longer ask Cursor to implement a feature across the database, API, backend, and front end. Basically, I had to repeat myself a lot. That became quite tedious.&lt;/p&gt;

&lt;p&gt;That is not a problem with Claude Code. I run CC from my repo root, and it has access to the entire repo. This approach has certainly simplified introducing app-wide features. &lt;/p&gt;

&lt;h2&gt;
  
  
  Claude is my Design Assistant
&lt;/h2&gt;

&lt;p&gt;I've come to value Claude as my design assistant more than as my coding assistant. Before I do &lt;em&gt;any&lt;/em&gt; work, I thoroughly discuss what I want to do with Claude, iteratively refining the content. Only after the item is sufficiently refined do I let Claude run with the code. This approach has significantly improved the quality of the output; I'm spending fewer cycles telling Claude that it got something wrong and to try again.&lt;/p&gt;

&lt;h2&gt;
  
  
  Improved Workflow
&lt;/h2&gt;

&lt;p&gt;I'd like to share my current workflow. Hopefully others can benefit, or adapt it to their needs. Keep in mind that I am a solo dev, so the workflow would definitely need to change in a team environment. &lt;/p&gt;

&lt;h3&gt;
  
  
  The Feature Log
&lt;/h3&gt;

&lt;p&gt;This is the part that definitely would need to change in a team environment. Before I start any major work, I start by defining my vision for the feature. This is traditionally what Product does. My repo contains a directory I've simply named "feature-log", and in that directory I create a new markdown file for each feature I implement. In an enterprise environment, this information would be in something like Aha or Jira. But the main thing here is that it's what gives Claude the necessary context to begin refinement, &lt;strong&gt;and&lt;/strong&gt; it's part of the repo, so Claude doesn't have to reach out to an external tool for the information.&lt;/p&gt;

&lt;p&gt;I've refined the format and content of the feature log and reached a model that works well. I spend some time up front describing the vision of the feature, and explicitly cite any open discussion points or technology decisions. When I think I've provided enough context, I move on to refinement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Refinement
&lt;/h3&gt;

&lt;p&gt;It is at this point that I fire up a new instance of Claude. My first instructions are to ask Claude to read my project's documentation, to read my "feature template" (more on that later), and to read the feature log for the current feature. &lt;/p&gt;

&lt;p&gt;Claude dutifully reads all of the resources, and usually responds with suggestions on changes it can make immediately. I have to ask Claude to slow down and explain that I want to work through requirements or design/technical decisions. At this point, Claude and I then engage in collaborative discussion. I find this step of the process immensely fulfilling. I offer Claude a chance to ask questions, and I in turn ask Claude my own questions. This process can take literally hours, depending on the complexity and the up-front work I've done on the feature definition.&lt;/p&gt;

&lt;p&gt;After the refinement has achieved sufficient detail, I then ask Claude to define an implementation plan, and to update the feature log with that implementation plan. I'll review the plan, which offers another opportunity to iteratively improve things. When I think the plan is ready, I'll ask Claude to revise the plan in the feature log, and to proceed to implementation. &lt;/p&gt;

&lt;p&gt;I normally give Claude specific instructions around my database schema and GraphQL API. If the feature requires changes to either, I request those be the first two items in the implementation plan. I also ask Claude to show me the changes and let me validate them externally before moving on. This process has allowed me to catch a few database changes that passed the eye test but were in fact invalid. And by executing these two steps first, it saves time over replanning or re-executing the code portions if something is wrong.&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution
&lt;/h3&gt;

&lt;p&gt;After I give Claude permission to execute its implementation plan, it's time to sit back and do something else. Claude will usually need permission to do a few things, so I can't let it run 100% unattended. Claude will spend some time chugging through the code, and when it's ready, I will validate. Eventually my automated testing story will be mature enough that Claude can hopefully perform the validation itself, but we're not there yet.&lt;/p&gt;

&lt;p&gt;There are almost always problems. Sometimes it takes many attempts to work through the issues. But even with issues, the quality of the resultant changes is so much better than what I was getting before.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrapping Up The Work
&lt;/h3&gt;

&lt;p&gt;After all the work - validation, passing unit tests, no linter or formatting issues - I ask Claude to update the feature log with full details of what was actually done. That includes ideas for improvements, things that were possibly descoped, or areas that diverged from the implementation plan. I find it extremely valuable to compare what was planned versus what was implemented, and use that as a source to create new tickets in Linear. &lt;/p&gt;

&lt;p&gt;At that point, I stage the files (I still don't trust the agent for this), and then ask Claude to commit with a good message. That concludes the work, and I am ready to move on to the next item.&lt;/p&gt;

&lt;p&gt;I have found this process works for everything from small items to large features. The only difference is the amount of prep work, and of course, larger implementations are more likely to require more iterations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Feature Template File
&lt;/h2&gt;

&lt;p&gt;Earlier in the post, I mentioned a feature template file. After I followed this process a few times, I captured the main points into a file I called the feature template. It details the process I just outlined: feature vision/description, refinement, planning, execution, and documenting the actual results. I also included a section on some patterns implemented in various places (maybe this would be better in the CLAUDE.md file). &lt;/p&gt;

&lt;p&gt;So when I start a new feature, I ask Claude to read this file so it understands the process I wish to follow. This has helped prevent Claude from being too eager to implement changes. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I've found a groove with Claude Code. I still have Cursor up so I can reference things more quickly, but my usage of Cursor for actually making changes has been reduced significantly. I'm not ready to completely give it up yet, but right now it's hard to justify its continued use.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>cursorai</category>
    </item>
    <item>
      <title>First Impressions of Claude Code: Where Does it Fit?</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Tue, 10 Jun 2025 15:43:22 +0000</pubDate>
      <link>https://dev.to/thutch1976/first-impressions-of-claude-code-where-does-it-fit-45m9</link>
      <guid>https://dev.to/thutch1976/first-impressions-of-claude-code-where-does-it-fit-45m9</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Claude Code has been getting a lot of attention lately. I've been following its growing YouTube mindshare, and I finally decided I needed to try it out for myself. Since I've been using Cursor so heavily, I was curious about what value Claude Code actually provided: how I would incorporate it into my workflow, and whether it complemented or replaced Cursor.&lt;/p&gt;

&lt;p&gt;The topic of combining Claude Code with Cursor has been covered in some new videos in the past week, and I watched each of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://youtu.be/ukgAjLLQg2A?si=Qgjr5yG4pkiGT5un" rel="noopener noreferrer"&gt;This Cursor Setup Changes Everything (Claude Code)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtu.be/bBbKZDX6--0?si=qXEJolXF1NpcxvIu" rel="noopener noreferrer"&gt;Claude Code + Cursor AI = Vibe Coding Paradise&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each video is interesting and informative in its own right. This is where I learned that Claude Code is now offered as part of Anthropic's subscription plans, and is no longer limited to metered API access. However, neither actually addressed how combining Code with Cursor improved anything. So I decided I needed to try it for myself.&lt;/p&gt;

&lt;p&gt;Disclaimer - this is extremely early feedback. It's only been a single day, and I am sure my impressions will be completely different in a month.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Hadn’t Tried Claude Code Before
&lt;/h2&gt;

&lt;p&gt;To be completely honest, I haven't seriously considered Claude Code before now because I just have too many things in flight at the same time. Getting up to speed on a completely new technology stack, its attendant build chains, best practices, etc., is complex enough on its own. But that's why I'm using Cursor in the first place, to do most of the heavy lifting. I didn't want to throw something else in the mix, and get distracted by the tool and not be able to focus on the code.&lt;/p&gt;

&lt;p&gt;Furthermore, as I said in the introduction, Code has been locked behind the metered/pay-as-you-go boundary. I have been &lt;strong&gt;afraid&lt;/strong&gt; of using it to implement a complicated feature, and the cost spiraling out of control; potentially leaving me with something incomplete that a lesser model couldn't deal with. This fear of potentially unbounded cost was the major sticking point. Now that Code has been unlocked with the subscription plans, I could justify it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation and Setup Experience
&lt;/h2&gt;

&lt;p&gt;Claude Code is pretty simple to install, at least in common/supported environments. It requires &lt;code&gt;npm&lt;/code&gt;, and most engineers have npm set up. On my Ubuntu 24 machine, it was a one-line installation command. However, on my Arch machine, it was a different story. Claude Code expects to be a global installation, and that did not match my Arch box (to be honest, I don't remember setting up npm on that machine). Fortunately the error I ran into was a known issue, and Anthropic has a page with details on how to resolve it. The instructions are all based on bash, and if you use an alternative shell (I use fish), then you'll need to translate the steps to that shell.&lt;/p&gt;

&lt;p&gt;After setup, launching Claude Code is as simple as running &lt;code&gt;claude&lt;/code&gt; from the terminal. It was actually a breeze (except for the Arch+fish hiccup). I decided to run it from the terminal directly at first, and save any Cursor integration for later. I took this opportunity to create a brain dump of the overall project, describing the tech stack, domain model, monorepo layout, important patterns, user flows in the front end, etc., all in a requirements doc. I asked Claude to read the requirements doc and either acknowledge that it understood or ask any relevant questions.&lt;/p&gt;

&lt;p&gt;This part was cool. Claude asked me a few things to make sure it understood relevant patterns and design decisions where I was too vague in my document.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Task: A Rocky Start
&lt;/h2&gt;

&lt;p&gt;So the initial setup experience was a breeze. It was time to give Claude some real work, and to try out the much-hyped Claude 4 models. This past week I went through another &lt;em&gt;huge&lt;/em&gt; refactor, and it literally broke everything: the database, the backend, the API, the front end, navigation, supplementary tools such as data seeding - &lt;em&gt;everything&lt;/em&gt;. I should probably write about that experience separately as it's not directly related to AI. So since everything was broken, I decided I needed to toss out all of the backend's unit tests around the DAO and GraphQL resolver layers. This felt like a great opportunity for Claude 4 Opus to flex its coding chops.&lt;/p&gt;

&lt;p&gt;I explained to Claude about the recent refactor, and my decision to replace the previous unit tests. I then asked Claude to implement a test fixture for a single DAO unit, and to let me review it; if I liked it, we could use that as a pattern for testing the remaining DAOs. Claude acknowledged, spun for a good 5 to 10 minutes, and then did nothing. Claude started reporting errors in red; the requests to the backend were timing out. Claude attempted to retry up to 10 times, and then it just stopped. When it stopped, there was no real discussion; Claude didn't tell me that there were problems. It didn't tell me where it was in its implementation. It gave me no feedback at all, other than that requests were timing out.&lt;/p&gt;

&lt;p&gt;This was not a good beginning. I had literally just signed up for the $100/month MAX plan, and at 6pm central on a Sunday evening, Anthropic didn't have enough capacity to serve my requests. That was exceptionally frustrating. Furthermore, the user experience/ergonomics were lacking. I had to tell Claude that it wasn't working and ask it to resume its work.&lt;/p&gt;

&lt;p&gt;Eventually, Claude was able to give me a candidate test fixture for review. But because of the capacity issue, it required multiple manual nudges from me to keep it going. The test was higher value than whatever was spit out by Cursor's "auto" agent two weeks prior. So I approved, and asked Claude to follow that pattern to implement tests for the remaining dao's. &lt;/p&gt;

&lt;p&gt;Let me say this - Claude 4 Opus is slow. Like, really slow. I understand that's a function of its reasoning ability, but my initial impressions were not good. I had to babysit Claude for several hours as it attempted to write tests, run them, and work through an endless series of compile/type/lint errors. I eventually ran out of Claude 4 Opus requests within my 5 hour time window. The code being tested isn't particularly complicated; if Claude was having problems with it, how would it perform elsewhere? &lt;/p&gt;

&lt;p&gt;I was quickly losing trust. It was getting late, so I decided to resume the next morning. When I did resume, it took another good half-hour to work through the remaining compile/lint/failing test issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Missing IDE: /ide Misdirection
&lt;/h2&gt;

&lt;p&gt;One of the "features" recently introduced with Claude Code is the ability to integrate with an IDE; that is, if you're running CC from a terminal window within an IDE, it can be context aware. The extent of this integration wasn't exactly clear, and although it was mentioned in the videos, this feature wasn't explored in any depth.&lt;/p&gt;

&lt;p&gt;So after we were done with the unit tests, I tried out the ide integration. Supposedly, it was as easy as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/ide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But it wasn't. When I tried that command, Claude showed me the detected IDEs, and there was nothing in the list. I explained to Claude that it was running in a terminal from within Cursor, and Claude tried to debug the issue. It tried to set bash environment variables, and when that didn't work, fish environment variables. After several failed attempts to detect the IDE (including restarts of the IDE, and even my GUI shell), nothing worked. After roughly 30 minutes of debugging, Claude came back and said that the IDE integration feature had been removed.&lt;/p&gt;

&lt;p&gt;That was incredibly frustrating. It was a complete waste of time. Confusing things even more is the fact that this feature remains on &lt;a href="https://docs.anthropic.com/en/docs/claude-code/ide-integrations" rel="noopener noreferrer"&gt;Anthropic's site&lt;/a&gt;. I guess I'll just continue without any kind of IDE integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Compares to Cursor (So Far)
&lt;/h2&gt;

&lt;p&gt;So how does it compare to Cursor? That's a tough question, and I'm not sure it's the right question.&lt;/p&gt;

&lt;p&gt;It's way too early for me to know for sure, but I don't think it's an either-or situation. I think the tools can be complementary, but I need more time to explore. I will say, though, that in the day I've been using Claude Code, I haven't actually done anything in Cursor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Keep in mind that these are &lt;em&gt;initial&lt;/em&gt; impressions and will definitely change. I was initially frustrated by the fact that functionality I had purchased (i.e., capacity) was not available. That's not a good look. The fact that I wasted time trying to enable Cursor integration, only to later learn that feature has been removed, was also frustrating. And lastly, I was unimpressed by the difficulty Claude 4 Opus had generating new, functional, unit tests for my DAO layer. In fact, I was surprised by how many iterations were required. But we eventually got there.&lt;/p&gt;

&lt;p&gt;The next steps are to give it something more complicated, and to better learn its strengths and weaknesses and determine how to incorporate it into my workflow.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>More Cursor Lessons</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Fri, 30 May 2025 17:21:41 +0000</pubDate>
      <link>https://dev.to/thutch1976/more-cursor-lessons-5c1n</link>
      <guid>https://dev.to/thutch1976/more-cursor-lessons-5c1n</guid>
      <description>&lt;p&gt;I have been using this blog to capture things I've learned as I've tried to develop Vibe Coding skills. This is another such post, but with some specifics that hopefully can help someone else. &lt;/p&gt;

&lt;h2&gt;
  
  
  Supervision Required
&lt;/h2&gt;

&lt;p&gt;I cannot stress this enough - supervise whatever your agent is putting out. That's kind of antithetical to the core concept of "vibe coding", where you aren't even paying attention to the code. But you have to.&lt;/p&gt;

&lt;p&gt;I just spent two entire days refactoring code I've teased out of Cursor over the past month. That was not a pleasant experience. &lt;/p&gt;

&lt;h3&gt;
  
  
  Cursor Rules
&lt;/h3&gt;

&lt;p&gt;I learned this one the hard way - you absolutely need to leverage cursor rules. Spend some time familiarizing yourself with how they function, and browse rules that others have put together. I've come across the following while watching videos or just googling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/PatrickJS/awesome-cursorrules?tab=readme-ov-file" rel="noopener noreferrer"&gt;awesome-cursorrules&lt;/a&gt; - Github repo. Lots to choose from, will require time to pick rules that fit your situation. Some of the instructions are out of date.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://playbooks.com/rules" rel="noopener noreferrer"&gt;playbooks&lt;/a&gt; - Great interface for finding/filtering useful rules. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cursor.directory/" rel="noopener noreferrer"&gt;cursor.directory&lt;/a&gt; - Searchable directory that also includes MCP servers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had been putting off configuring meaningful rules, partly due to laziness, but mostly because I wanted to learn the tool (Cursor) and become comfortable with my workflow before I started changing that workflow. A couple of weeks ago I actually selected a set of rules and added them to my environment ... but I didn't do it correctly so the rules were never leveraged.&lt;/p&gt;

&lt;h3&gt;
  
  
  TypeScript Without Types - Why Bother?
&lt;/h3&gt;

&lt;p&gt;A few days ago I was looking through some of my data access code and I noticed a complete lack of type definitions. Like ... anywhere. I'm embarrassed to admit that it took me several weeks to realize that I was basically just working with JavaScript. There were literally zero type definitions anywhere: not in my front end or my backend, not in function signatures or variable declarations. There was not a single custom type or interface definition, or any type-parameterized function. What's the point?&lt;/p&gt;

&lt;p&gt;That's when I realized that my Cursor rules were not being used. So for the time being, I've implemented a project rule that basically says "this is TypeScript - do TypeScript type things." I haven't generated much code since I added this rule, so I'm not yet sure how effective it is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set Up Your Toolchain Up Front
&lt;/h2&gt;

&lt;p&gt;This should be obvious, but take the time to properly configure your toolchain up front; that includes setting up your linter/formatter (such as ESLint or Clippy), if applicable. When I finally got around to configuring ESLint for my front end, I had ... over 300 errors. Three. Hundred. Errors.&lt;/p&gt;

&lt;p&gt;This project started as a hobby project, but over the past couple of weeks it's become much more serious. Given its roots as a hobby, I didn't invest time in early warnings/validations, or the rules. That was plainly a mistake.&lt;/p&gt;

&lt;p&gt;I still don't have any CI/CD set up yet. No CD because there's nothing to deploy and nowhere to deploy it. And no CI just because I've spent 100% of my time coding. CI is much less important as a solo dev; I can run validation locally. But it still needs to happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  Leverage Cursor for Refactoring
&lt;/h3&gt;

&lt;p&gt;Sure, Cursor can generate code (that's the entire point, right?), but use it to iterate on what's already there. For example, my backend had several sections that needed database transactions. Each of these had its own try/catch/finally/BEGIN/ROLLBACK/COMMIT implementation, which was all redundant. Yes, I was aware of the redundancy as the code was being generated, but I was favoring feature throughput; I knew I'd need to revisit that.&lt;/p&gt;

&lt;p&gt;When I made the decision to de-dupe that functionality, I described my intentions to Cursor and it was able to generate a serviceable database transaction pattern. It was also able to refactor all of the existing code to follow this new pattern, so that was cool. However, it really tripped up when attempting to refactor the unit tests. When I say tripped up, I mean it completely fell on its face. I eventually told Cursor to throw out the current unit tests and to reimplement them based on the latest code. That was much smoother.&lt;/p&gt;
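
&lt;p&gt;For anyone curious, a de-duplicated transaction pattern like this usually boils down to one generic helper that owns the BEGIN/COMMIT/ROLLBACK bookkeeping. This is my own hedged sketch of the idea, not the code Cursor actually generated; the &lt;code&gt;QueryClient&lt;/code&gt; interface is a stand-in for whatever database client the app really uses:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Minimal client interface so the sketch stands alone;
// in a real app this would be a pg PoolClient or similar.
interface QueryClient {
  query(sql: string): Promise&lt;unknown&gt;;
  release(): void;
}

async function withTransaction&lt;T&gt;(
  client: QueryClient,
  work: (c: QueryClient) =&gt; Promise&lt;T&gt;,
): Promise&lt;T&gt; {
  await client.query("BEGIN");
  try {
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release(); // always hand the connection back
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each call site then shrinks to a single &lt;code&gt;withTransaction(client, async (c) =&gt; { ... })&lt;/code&gt; call, and the commit/rollback logic lives in exactly one place.&lt;/p&gt;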

&lt;h2&gt;
  
  
  Manual Refactoring Can Be Painful
&lt;/h2&gt;

&lt;p&gt;My codebase is still really small; about 2,000 lines of typescript in the backend, and 4,000 lines of typescript/tsx in the front end. This is /src only, not including tests. When I realized I needed to fully implement types, &lt;strong&gt;and&lt;/strong&gt; when I saw all those linter errors, I decided to stop implementing functionality and to refactor the entire project. &lt;/p&gt;

&lt;p&gt;I am way out of Cursor fast requests, so all my agent requests are taking forever. I decided I needed to do a lot of the type-related heavy lifting myself. Cursor's built-in assistance was really helpful here; it was able to anticipate many of the changes I wanted to make. Even with that assistance, implementing full type safety across the front end's screens and navigation was a huge lift. I probably spent 12 hours on that alone yesterday.&lt;/p&gt;

&lt;p&gt;One situation in particular was very painful. This is my first real experience with React, and I didn't know how to pass typed arguments through navigation to different screens. Cursor suggested a couple of different options; I selected one and implemented it across all my screens (~15 as of right now). When I thought I was done, and had worked through the bulk of the linter errors, I ran into a type incompatibility I didn't have the expertise to resolve. When I asked Cursor for assistance, its response was basically "the way you're doing it no longer works after React Navigation 6". I wanted to toss my laptop. When Cursor originally presented the various options, it made no mention of them being version-dependent. That was an utter waste of 3-ish hours. Not to mention that my package.json is &lt;em&gt;right there&lt;/em&gt;.&lt;/p&gt;
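&lt;p&gt;The core idea that finally worked for me is a single param-list type that every screen and every navigate call is checked against. A stripped-down sketch of that idea in plain TypeScript (screen names and params here are hypothetical; in a real app the typing comes from React Navigation's own generics, not a hand-rolled function):&lt;/p&gt;

```typescript
// Hypothetical param list; in a real app this is the single source
// of truth referenced by the navigator and every screen component.
type RootStackParamList = {
  Home: undefined;
  BuildDetail: { buildId: string };
};

// A stripped-down navigate() whose params argument is checked
// against the screen name, mimicking React Navigation's typed API.
function navigate<Name extends keyof RootStackParamList>(
  screen: Name,
  params: RootStackParamList[Name]
): [Name, RootStackParamList[Name]] {
  return [screen, params];
}

// navigate("BuildDetail", { buildId: "g35" }) type-checks;
// navigate("BuildDetail", {}) is a compile-time error.
```

&lt;p&gt;The payoff is that a renamed param or screen becomes a compiler error everywhere at once, which is exactly the safety I was after with the refactor.&lt;/p&gt;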

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Be an active participant; be an active code reviewer. I know the entire point of "vibe coding" is to ignore the code, but I just don't think that's possible. All of the YouTube videos of people doing this stuff give the impression that it's fire-and-forget, or that one just accepts whatever is generated. I don't see how that could produce anything sustainable.&lt;/p&gt;

&lt;p&gt;Furthermore, do some of the legwork up front. Choose your Cursor rules, configure your toolchain, etc. Make sure that the code you're actually getting has some minimum quality (and I'm not talking about testing here).&lt;/p&gt;

&lt;p&gt;Additionally, don't abandon fundamental software development principles. Be empirical. Refactor often. Implement good patterns.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cursor</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Github Repo - Open or Private?</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Tue, 27 May 2025 18:22:07 +0000</pubDate>
      <link>https://dev.to/thutch1976/github-repo-open-or-private-10go</link>
      <guid>https://dev.to/thutch1976/github-repo-open-or-private-10go</guid>
      <description>&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;I started a new project not too long ago. I've briefly mentioned it in other posts; it's a mobile-first social app for automotive enthusiasts. The stack is Node/TypeScript/GraphQL/Supabase on the backend, and React Native/TypeScript/Expo/GraphQL/Supabase on the frontend. This is a personal project that I've wanted to do for nearly 20 years, and I'm finally making the time for it. &lt;/p&gt;

&lt;p&gt;I had initially considered it a hobby project, and another opportunity to explore vibe coding more deeply. But my viewpoint has shifted. This has gone from being a hobby project, to something that I want to use myself, and hope others find useful. It's something I want to &lt;em&gt;launch.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Repo is Private
&lt;/h2&gt;

&lt;p&gt;I don't realistically believe this will turn into a money maker for me. But it's not beyond the realm of possibility. Considering this project may have some commercial viability, I decided to make the repo private. The primary justification is to protect any potential intellectual property.&lt;/p&gt;

&lt;p&gt;There are some downsides to the repo being private. One is obviously personal marketability/visibility; I have little public presence on GitHub, and if this repo were public it would be something I could share with confidence, enthusiasm, and pride. Also, I'm between jobs at the moment. It would be hugely beneficial if I could include this as part of my portfolio, but I've still chosen to keep the repo private.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lack of AI Assisted Governance
&lt;/h2&gt;

&lt;p&gt;The biggest downside, however, is the lack of overall context for AI assistants. I talked to ChatGPT for at least a month before I wrote the first line of code. I've used it to work through technical decisions, to model certain interactions/activities, and to explain monetization options and SaaS offerings in spaces where I have little experience. ChatGPT has my complete thought history of this project from inception to its current form.&lt;/p&gt;

&lt;p&gt;It is a frequent occurrence that I will ask ChatGPT for some seed structure (SQL statements, GraphQL changes) based on our chats, but what it generates is not directly consumable because it's missing the underlying source. Since the repo is private, ChatGPT can't actually &lt;em&gt;see&lt;/em&gt; the source code. So it generates content that doesn't use the same symbols as the code.&lt;/p&gt;

&lt;p&gt;Likewise, I would like to ask ChatGPT to analyze certain slices of the repo, and give me an updated entity diagram, or tell me where what I've implemented isn't in agreement with a previously discussed design. That's not possible, because the repo is private.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor &amp;amp; Limited Context Windows
&lt;/h2&gt;

&lt;p&gt;On the other hand, Cursor has access to the entire repo (or individual folders for different tiers, in my case). From a code standpoint, it has &lt;em&gt;all&lt;/em&gt; of the context ... of what's been implemented. It is &lt;strong&gt;missing&lt;/strong&gt; all of the context of the historical discussion on the project. When I implement a given feature, I find that I have to repeat portions of previous discussions; that I have to explain certain parts of the domain model again, for example. &lt;/p&gt;

&lt;p&gt;To put it another way, I can deploy Cursor tactically, but would like to leverage an LLM on the codebase &lt;em&gt;strategically&lt;/em&gt;. And since the repo is private, I cannot.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is a Public Repo Worth The Risk?
&lt;/h2&gt;

&lt;p&gt;I am extremely tempted to &lt;em&gt;temporarily&lt;/em&gt; make the repo public, and share it with ChatGPT (or Claude). That would give me a huge step up. But that comes with risk. Sure, the vast majority of repos totally escape notice. But once it's public, it's going to be crawled by some bot, and that can never be deleted. Knowledge of the domain model, the physical database structure, backend providers, etc, are all possible attack vectors even if I changed the repo back to private. &lt;/p&gt;

&lt;p&gt;I don't have many readers, but I would genuinely appreciate some feedback. Have you successfully launched a product, your own intellectual property, from a public repo that &lt;em&gt;was not&lt;/em&gt; an open source project?&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Do You Keep Your Holistic Context?
&lt;/h2&gt;

&lt;p&gt;For others that are leaning heavily into AI assistants outside of the code, where is your full historical context? Are you leaving it in an LLM? Do you keep it in some kind of knowledge repo? Do you document things within your repo itself? If so and your repo is private, how are you bridging that gap?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>github</category>
      <category>chatgpt</category>
      <category>cursor</category>
    </item>
    <item>
      <title>More Lessons Learned with AI + Vibe Coding</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Tue, 27 May 2025 15:57:16 +0000</pubDate>
      <link>https://dev.to/thutch1976/more-lessons-learned-with-ai-vibe-coding-6j7</link>
      <guid>https://dev.to/thutch1976/more-lessons-learned-with-ai-vibe-coding-6j7</guid>
      <description>&lt;p&gt;At this point I think this series has evolved into an endless stream of lessons and Aha moments. It seems like every day something new happens that's interesting, or frustrating, or even both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson - Test the Stable Parts
&lt;/h3&gt;

&lt;p&gt;I am not going to debate the value of unit tests or any other part of the test pyramid. I will say that I have developed the habit of asking the agent to generate new unit tests, or even to compare what's been implemented versus existing unit tests, and to cover gaps. The value from that practice has been inconsistent.&lt;/p&gt;

&lt;p&gt;The majority of the generated unit tests are valuable, and they pass the first time. But there are classes of unit tests where, no matter how I prompt, the generated tests don't work on the first attempt. For example, unit tests for a function in a DAO that executes multiple statements within a transaction; I have several such tests that I was only able to get running correctly after tweaks. Now when I need to create similar tests, I ask the agent to use the same pattern as unit X (tagging that file in context), and the generated code is usually &lt;em&gt;mostly&lt;/em&gt; correct. But it hasn't been correct from the first prompt yet.&lt;/p&gt;
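&lt;p&gt;To make that concrete, here is the shape of the thing being tested, sketched in TypeScript with hypothetical table and function names: a DAO function that runs several statements in one transaction, driven in tests by a recording fake instead of a live database:&lt;/p&gt;

```typescript
interface DbClient {
  query(sql: string, params?: unknown[]): Promise<unknown>;
}

// Hypothetical DAO function (table and column names are made up):
// inserts a build and its photos inside a single transaction.
async function createBuildWithPhotos(
  db: DbClient,
  build: { name: string },
  photoUrls: string[]
): Promise<void> {
  await db.query("BEGIN");
  try {
    await db.query("INSERT INTO builds (name) VALUES ($1)", [build.name]);
    for (const url of photoUrls) {
      await db.query("INSERT INTO photos (url) VALUES ($1)", [url]);
    }
    await db.query("COMMIT");
  } catch (err) {
    await db.query("ROLLBACK");
    throw err;
  }
}

// Test double: records every statement instead of hitting a database,
// so a test can assert on the BEGIN/INSERT.../COMMIT sequence.
function recordingClient(log: string[]): DbClient {
  return { query: async (sql: string) => { log.push(sql); } };
}
```

&lt;p&gt;Once one such test exists and passes, pointing the agent at it as the pattern to copy is what gets the "mostly correct" generations described above.&lt;/p&gt;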

&lt;p&gt;Overall I've found that generating tests for the backend has been the most valuable. It's the most "stable", in that I am not iterating on it nearly as quickly as the front end. The front end, on the other hand ... that's been a completely different story. There, I have effectively given up on unit tests of any sort, for the time being. Once I achieve a level of functionality that I consider stable, I'll go back and implement unit and integration tests. But right now there's no point in testing every iteration of a particular screen when that screen may go through 15 iterations. That was just slowing me down.&lt;/p&gt;

&lt;p&gt;I had one screen that, in my opinion, wasn't particularly complicated. Once I got the workflow where I wanted it, I asked the agent to generate relevant unit tests for that screen. When I ran them, they all failed. That became a very frustrating experience; what started as an attempt to test some core logic and the boundary conditions of input validation became a meandering journey through npm dependency hell, Jest configurations, crazy mocking chains, and racy behaviors due to callbacks. All of this was with the agent set to "Auto". I never could get the tests to run. After almost three hours, I threw out the test fixture altogether, changed the agent to Claude 3.7, and started over. The resulting code was a little different, but after another hour, none of the tests were passing. I had spent an entire half-day trying to get tests to work for one particular screen, and had utterly failed.&lt;/p&gt;

&lt;p&gt;Part of it is my lack of experience with the stack. If this were Java or Rust, I'm sure I could have figured it out. So for now, I've abandoned unit tests in the front end. Maybe when that codebase settles down a bit I'll give one of the Claude 4 models a shot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson - Use Wireframes to Generate New Screens
&lt;/h3&gt;

&lt;p&gt;One thing that I've learned is that Cursor, even with the agent set to Auto, is pretty adept at producing functional screens/pages when provided a wireframe. So when I need to start on a new screen, this is my workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ask ChatGPT to generate a wireframe for the screen, taking into account my high level requirements. The result is a simple monochrome image (at least in my case). For example: &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpja331152k3kaax7sl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpqpja331152k3kaax7sl.png" alt="Image description" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I then copy that image and paste it into the Cursor chat window. I ask Cursor to generate a new screen based on the image, give it a high level set of rules, and how to integrate it into the navigation flow. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the time the output is 90% of what I need, and I tweak from there. I think this is one of the areas where it saves me the most time. Granted, my app isn't particularly pretty, or even user-friendly at the moment. But I'll worry about polish once I achieve a certain level of functional breadth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson - Explicitly Ask Cursor to Implement/Fix
&lt;/h3&gt;

&lt;p&gt;I'm on the Cursor $20/month plan, and I ran out of fast requests in like a week. I'm between jobs at the moment, so I'm basically doing this coding thing full time. But I was really surprised that I ran out of fast requests so soon; now all of my requests are in the slow pool. &lt;/p&gt;

&lt;p&gt;Then I noticed a pattern: I will ask Cursor to perform some task, Cursor will analyze it and tell me what it will do, and then ask me if I want it to do what it recommends. Another example: sometimes a test will fail, and I will add some lines from the console to the chat context and inform Cursor that the test failed. Cursor will respond by explaining why it thinks the test failed, and give me recommendations on how to fix it.&lt;/p&gt;

&lt;p&gt;In both cases, if I confirm or ask Cursor to implement the recommended changes, that's just &lt;em&gt;another&lt;/em&gt; series of back-and-forth requests with the backing agent. So I highly suspect I was using 2x the number of requests I should have been. I have now developed the habit of explicitly asking Cursor to implement the changes without asking for confirmation, or explicitly asking Cursor to fix a test or compiler problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson - Try Different Models
&lt;/h3&gt;

&lt;p&gt;I am on an extremely tight budget, so I am not going to use Max or metered pricing. Therefore I've mostly just left the agent set on Auto. I did try Claude 3.7 Sonnet (which now seems to be unavailable), and my experience was similar to others' I've read online: sometimes the results were too complicated, or the agent eagerly implemented things that were not requested.&lt;/p&gt;

&lt;p&gt;I also tried gpt-4.1, and to be honest I was completely underwhelmed. Its output seemed extremely ... naive, and I had to be much more explicit with instructions to get anything useful out of it.&lt;/p&gt;

&lt;p&gt;Models are constantly changing, and it's likely that different models are better suited to different tasks. But the result is that you will likely have code in the same repo that all looks slightly different. In my case, different agents generate code with different tabbing/spacing, which I haven't gotten around to normalizing yet.&lt;/p&gt;
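&lt;p&gt;One low-effort way to normalize that is to let a formatter own the style instead of the models. For example, a minimal Prettier config (the specific settings here are just illustrative):&lt;/p&gt;

```json
{
  "tabWidth": 2,
  "useTabs": false,
  "singleQuote": true,
  "trailingComma": "all"
}
```

&lt;p&gt;A one-time &lt;code&gt;npx prettier --write .&lt;/code&gt;, plus a pre-commit hook or CI check, makes the question of which model generated which file irrelevant to formatting.&lt;/p&gt;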

&lt;h2&gt;
  
  
  Things Are Improving
&lt;/h2&gt;

&lt;p&gt;When I go back and re-read my posts so far, I think I come across as too negative. That is not my intent. This exercise can be frustrating at times, but it's still all very exciting. And things are definitely improving; in the past week I was able to implement double the features when compared to the previous week, &lt;strong&gt;and&lt;/strong&gt; they were more complex.&lt;/p&gt;

&lt;p&gt;There are still lots of things I can do to make further improvements: I want to leverage claude-task-master, I want to find a set of Cursor rules that really makes sense for my setup, and I 100% have to polish the front end. But it does still feel like all the YouTube bros out there are getting different results / playing by a different set of rules than I am.&lt;/p&gt;

</description>
      <category>cursor</category>
      <category>ai</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Vibe Coding: I'm doing it wrong</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Mon, 19 May 2025 19:32:58 +0000</pubDate>
      <link>https://dev.to/thutch1976/vibe-coding-im-doing-it-wrong-255g</link>
      <guid>https://dev.to/thutch1976/vibe-coding-im-doing-it-wrong-255g</guid>
      <description>&lt;p&gt;For the past few weeks, I’ve been using Cursor AI to help me build a mobile-first app with a brand new stack. I went in optimistic, ready to lean into the vibe coding workflow I’ve been exploring in this series.&lt;/p&gt;

&lt;p&gt;In practice, it hasn’t felt magical. Not yet.&lt;/p&gt;

&lt;p&gt;To be fair, I’ve thrown a lot at Cursor. I’m working with &lt;strong&gt;React Native&lt;/strong&gt;, &lt;strong&gt;Expo&lt;/strong&gt;, &lt;strong&gt;Supabase&lt;/strong&gt;, and &lt;strong&gt;GraphQL&lt;/strong&gt;—all technologies that are new to me (well, GraphQL not so much). So I expected a learning curve. But I didn’t expect to feel like I was constantly fighting the assistant meant to help me.&lt;/p&gt;

&lt;p&gt;Every little thing I try to implement turns into a negotiation. Sometimes the AI nails it. But more often, it veers off course. For example, I’ve had to repeatedly remind Cursor that this is a &lt;strong&gt;mobile app&lt;/strong&gt;—on multiple occasions, it generated web-only code. When I asked it to refactor utility functions into a new module, it moved the code but forgot to update all the references. These aren’t rare slip-ups. They’re part of my daily workflow.&lt;/p&gt;

&lt;p&gt;I'm using "Auto" for the model, so I can’t tell if I’m hitting the limits of the model, not being precise enough in my prompts, or just not giving the right context. And maybe that’s the real issue: vibe coding seems to require a certain prompting fluency that I haven’t yet mastered.&lt;/p&gt;

&lt;p&gt;That’s not to say the approach is flawed. In fact, I’ve seen others get fantastic results with it—much better than what I’m managing. When it works, it really does feel like the future: scaffolding components, wiring up queries, producing clean type definitions. But the inconsistency has left me wondering whether I’m getting in my own way.&lt;/p&gt;

&lt;p&gt;So I’m not ready to call vibe coding overrated. What I will say is that it’s &lt;strong&gt;not effortless&lt;/strong&gt;, and it’s not plug-and-play. At least not for me. It requires iteration, vigilance, and a deep enough understanding of your tools to know when the AI is leading you astray.&lt;/p&gt;

&lt;p&gt;I’m still hopeful. Maybe with more practice—and better prompting—I’ll start seeing the magic. Until then, vibe coding feels less like autopilot and more like a bumpy ride with an eager co-pilot who sometimes grabs the wrong controls.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cursor</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Misguided Prompts and Better Outcomes: Learning by Doing (Wrong)</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Fri, 09 May 2025 15:56:18 +0000</pubDate>
      <link>https://dev.to/thutch1976/misguided-prompts-and-better-outcomes-learning-by-doing-wrong-3ijk</link>
      <guid>https://dev.to/thutch1976/misguided-prompts-and-better-outcomes-learning-by-doing-wrong-3ijk</guid>
      <description>&lt;h2&gt;
  
  
  Regretting Some Tool Choices
&lt;/h2&gt;

&lt;p&gt;I have to admit something — my struggles with getting the spreadsheet navigation to function correctly left me with a bad taste in my mouth. It took much more effort than I anticipated. Some of that effort was just due to my learning and refining my prompts with Cursor. Some of the effort was really due to a lack of a cohesive plan; when I first started the project, I was not clear on how I wanted transaction entry to function, which is arguably the most important function of the app. If I had been more clear up front (or put another way, put more thought into it beforehand and documented my expectations), it probably would have taken less time.&lt;/p&gt;

&lt;p&gt;At one point, I became so frustrated that I took a step back and created a rather lengthy markdown file to document my expected behaviors of the page, and asked Cursor to start over based on these new requirements. I had hoped the more thorough context would result in a better initial implementation, but to be honest, it was little better than what I had before. In the end, it was just continuous iteration that finally got me to something approaching usable.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Or was it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Before I started this project, I asked LLMs for feedback on suggested frameworks. Based on various responses, I ultimately went with React because I thought that might directly translate into something I could use in future professional endeavors. When I thought back to my experience, it occurred to me that implementing spreadsheet-like navigation may be more or less difficult depending on the framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  A New Approach
&lt;/h2&gt;

&lt;p&gt;So I took a completely different approach: I asked Claude (not in Cursor) to generate a functioning spreadsheet-like interface using Vue.js. For this exercise, I just wanted to see a working demo — not something I wanted to incorporate into my existing codebase. I wanted to examine the output for relative complexity, and I wanted to see how close I could get to "one-shotting" it with a better prompt.&lt;/p&gt;

&lt;p&gt;I decided that perhaps I had tripped up Cursor previously by even using the term "spreadsheet," because that implies all kinds of things I do not need, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The ability to resize columns&lt;/li&gt;
&lt;li&gt;Cell merging/splitting functions&lt;/li&gt;
&lt;li&gt;Data filtering&lt;/li&gt;
&lt;li&gt;Cell formulas (the actual purpose of a spreadsheet)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, I explicitly focused on navigation in my prompt, and also excluded these functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to build a simple application for entering financial
transactions. I have a version based on React and it's more
 complicated than I want. I would like to do something with 
Vue.js.

The primary interface needs to render and function like a
 spreadsheet; effectively a table where each row is a 
transaction. That means keyboard-based navigation 
(arrow keys, tab/reverse tab). 
I am not implementing a full spreadsheet; 
I don't need support for dynamic formulas, or 
splitting/merging cells, etc. The columns will be fixed. 

Can you please generate a very basic single page based on Vue.js 
that has spreadsheet-like navigation and rendering behavior 
(light borders around each cell, a thicker border to show focus, etc). 
Populate with a few rows of sample data. 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt also asked for one column whose value is calculated dynamically based on the data in the other cells across rows (like a running balance).&lt;/p&gt;

&lt;p&gt;The result was a self-contained HTML file of approximately 350 lines, with the first 100 or so being styling. The table definition was maybe another 100 lines, and the rest was just basic JavaScript that was easy to understand. I am not making a direct Vue vs. React comparison; that’s just silly and not the point of this exercise. In terms of actual function, it was almost exactly what I was looking for. In the few minutes it took me to write out the prompt, wait for the response, and evaluate the outcome, I was already in a better place than I had been days before after literal hours of work.&lt;/p&gt;

&lt;p&gt;Here’s a static screenshot for comparison:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxemtjgdh8tb9e29yru6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhxemtjgdh8tb9e29yru6.png" alt="Image description" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;
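&lt;p&gt;The navigation core of a page like this turns out to be tiny once the spreadsheet baggage is stripped away: just a pure function mapping a key press to the next cell coordinate. A sketch in TypeScript (my own illustration, not the generated output):&lt;/p&gt;

```typescript
// A cell position in a fixed rows x cols grid.
interface Cell { row: number; col: number; }

// Maps a key press to the next focused cell. Arrow keys clamp at the
// edges; Tab wraps to the next row, Shift+Tab goes backwards.
function nextCell(
  key: string, at: Cell, rows: number, cols: number, shift = false
): Cell {
  const clamp = (v: number, max: number) => Math.max(0, Math.min(v, max - 1));
  switch (key) {
    case "ArrowUp":    return { ...at, row: clamp(at.row - 1, rows) };
    case "ArrowDown":  return { ...at, row: clamp(at.row + 1, rows) };
    case "ArrowLeft":  return { ...at, col: clamp(at.col - 1, cols) };
    case "ArrowRight": return { ...at, col: clamp(at.col + 1, cols) };
    case "Tab": {
      // Flatten to a single index so Tab wraps across row boundaries.
      const flat = clamp(at.row * cols + at.col + (shift ? -1 : 1), rows * cols);
      return { row: Math.floor(flat / cols), col: flat % cols };
    }
    default: return at;
  }
}
```

&lt;p&gt;Wiring that to a &lt;code&gt;keydown&lt;/code&gt; handler and a focus call is most of the feature, which is probably why a tightly scoped prompt can one-shot it.&lt;/p&gt;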

&lt;h2&gt;
  
  
  Retrospective – What Went Wrong?
&lt;/h2&gt;

&lt;p&gt;Why did it take hours and multiple attempts when I was doing this with Cursor and React, versus basically being able to one-shot it with Claude?&lt;/p&gt;

&lt;p&gt;I mentioned several things in my opening paragraph. Do I believe the framework really made the difference? No. All the time I spent trying to tease out the correct functionality from Cursor was my fault — mostly my inexperience. Prompting, and using a tool like Cursor, is itself a skill. Like any skill, it needs practice.&lt;/p&gt;

&lt;p&gt;So that's what I'm doing now: I am trying to spend some time each day exploring different scenarios within Cursor. I will share more as I continue with this process.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>cursor</category>
      <category>claude</category>
    </item>
    <item>
      <title>Vibe Coding with Cursor AI: My First Hands-On Experience</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Sat, 03 May 2025 00:52:06 +0000</pubDate>
      <link>https://dev.to/thutch1976/vibe-coding-with-cursor-ai-my-first-hands-on-experience-35no</link>
      <guid>https://dev.to/thutch1976/vibe-coding-with-cursor-ai-my-first-hands-on-experience-35no</guid>
      <description>&lt;h2&gt;
  
  
  First Experience with Vibe Coding
&lt;/h2&gt;

&lt;p&gt;I am not going to spend time defining Vibe Coding, or explaining its history/origin. It's been on my radar for a while, but only recently did I find time to really educate myself on the topic. The concept is pretty simple - let the AI/Agent do the work for you. But the practical implications are very interesting. &lt;/p&gt;

&lt;p&gt;If you choose to do your own research (which I highly encourage), be aware that since this is all changing so fast, articles or videos from even a few weeks ago may already be out of date. That being said, my "Aha moment" came when I watched &lt;a href="https://youtu.be/WYzEROo7reY?si=xyAtdv3B0c0760nh" rel="noopener noreferrer"&gt;this video&lt;/a&gt;. Fair warning: it's three hours long, but the first hour is more than enough to see the potential.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to start?
&lt;/h3&gt;

&lt;p&gt;There are all kinds of AI-assisted (enabled?) tools and IDEs, but I selected Cursor AI. It's based on VS Code, and after watching the video, I liked its visual organization and its ability to understand rules &amp;amp; requirements. So I downloaded &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt; (version 0.49 as of this writing) and signed up for a 14-day trial of the Pro plan (for the record, I am totally subscribing). After that I opened my previous VS Code workspace. Now it was time to start implementing features.&lt;/p&gt;

&lt;p&gt;The first thing I wanted to do was to tidy up interacting with Accounts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a new field (starting transaction id)&lt;/li&gt;
&lt;li&gt;Modify the Accounts page to be a simple listing, and only have the details visible when adding/editing.&lt;/li&gt;
&lt;li&gt;Add a date picker to the Start Date field when in edit mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And this is when the magic truly began. I entered a few prompts, and just accepted each change as Cursor responded. When I asked to add a date picker, Cursor informed me that I needed some new dependencies, and even offered to run them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @mui/x-date-pickers @date-io/date-fns date-fns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So not only was it able to modify the code, it was able to modify my &lt;em&gt;environment&lt;/em&gt; to support it. That's pretty amazing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Ergonomics vs a Browser-based Chatbot
&lt;/h3&gt;

&lt;p&gt;That was just the surface. I continued to implement features, and it only took minutes versus hours or days. Some of the more complex things took many iterations and ultimately did take hours, but there's no way I would have been able to emulate spreadsheet behavior in React in less than a week, particularly because I have zero practical experience with React.&lt;/p&gt;

&lt;p&gt;At its core, you're still interacting with an Agent/UI in a chat model; it's just integrated in the IDE instead of a browser window. This is far superior to what I was doing just days before, and it meant no more copy/pasting of generated code. &lt;/p&gt;

&lt;p&gt;The agent is able to modify the files within the workspace directly. As I said, the agent can even manipulate the host environment by executing commands from the shell. The agent can even automatically spot &lt;em&gt;and correct&lt;/em&gt; certain types of errors.&lt;/p&gt;

&lt;p&gt;The interaction model is typically as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the chat panel, you enter a prompt. &lt;/li&gt;
&lt;li&gt;The agent will do its best to fulfill the request, or may ask clarifying questions. &lt;/li&gt;
&lt;li&gt;The agent will generate a series of modifications to the source, and ask for you to review. You can choose to accept the changes, or request further refinement.&lt;/li&gt;
&lt;li&gt;Occasionally the generated code will result in problems (compiler errors, linter errors), and Cursor will automatically attempt to resolve the errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This workflow allows for some truly rapid prototyping, and depending on the tech stack / toolchain, greatly reduces the feedback loop. &lt;/p&gt;

&lt;p&gt;To me, this is all just groundbreaking. I can't make a direct comparison to using GitHub Copilot from within VS Code because I just haven't tried it yet. It's possible much of this workflow is available there as well.&lt;/p&gt;

&lt;h2&gt;
  
  
  It isn't all roses
&lt;/h2&gt;

&lt;p&gt;As cool as all of this is, there are some issues. I'd wager many of my problems were on me; AI prompting is a skill that will continuously grow. But these are some of the more frequent issues I ran into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Chats" have a finite length&lt;/strong&gt; - A chat session will eventually become ... exhausted? I don't know; that's the best term I can use. But after enough time, the chat interactions slow down, and eventually the agent will act like it's just started a new conversation. When that happens, it's effectively lost it's "memory." On this note, as the chat ages, there will be a little note in the bottom right hand corner recommending opening a new chat. It's not the easiest thing to see and I hope Cursor makes this more prominent in the future&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lack of "big picture" thinking&lt;/strong&gt; - even when limited to the context within a single file, multiple times I requested a certain behavior to be implemented, and when I tested the changes, they did not function as desired. I would respond to the agent, describing the behavior, and frequently the fix was to make &lt;em&gt;more&lt;/em&gt; changes elsewhere in the same file. For example, when exiting one particular cell in a grid, a validation message was shown to the user twice. It was because the generated code invoked a custom function, and when analyzing the source file to make a change, didn't notice it was already invoking that function. I chalk that up to the underlying model, and not Cursor itself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functionality was occasionally "undone"&lt;/strong&gt; - I remember at one point I requested that the Date column be formatted in the US style (&lt;code&gt;mm/dd/yyyy&lt;/code&gt;). Later when I requested a change to how the Amount column was formatted, the Date column had lost its formatting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infinite Loops&lt;/strong&gt; - this only happened once, but it was super annoying. I requested a particular change, and the agent made edits that resulted in linter errors, so it automatically attempted to correct them. Apparently those corrections caused other problems, and after a couple of attempts to fix them, the agent decided to abandon the changes and re-generate the original code. This produced an endless cycle in which it made the same changes, applied the same fixes, and could not escape. I had to manually stop the cycle, discard the changes, and refine my request. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Naive" implementations&lt;/strong&gt; - sometimes the agent made some choices that just didn't make sense, and seemed like something that a very junior developer might put out. For example, I stated that the Balance field was a running total; SUM(transaction.amount) of all prior transactions. At one point while testing the interface, when I edited the Amount field on a particular row, the Balance field was updated on all prior rows (which is the opposite of what I requested). I scrolled back through my chat history and literally copy/pasted a previous instruction, and the agent responded with an "Aha, I understand now", and was able to correct the calculation. I don't understand why that prompt didn't achieve the desired result the first time. Another example: the Balance field was being updated in all rows on every keystroke when an Amount cell was in edit mode. That's just obviously going to be slow, so I suggested to the agent that a performance optimization would be to recalculate only when the edited value was committed. It was able to make the requested change, but I'm surprised the model didn't forsee that problem earlier (especially since the initial implementation already memoized this value).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Magical Moments
&lt;/h2&gt;

&lt;p&gt;Hiccups aside, there were occasions where what I experienced can only be described as magic (in the sense of &lt;a href="https://en.wikipedia.org/wiki/Clarke%27s_three_laws" rel="noopener noreferrer"&gt;Arthur C Clarke's third law&lt;/a&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The initial implementation of the Accounts page had a section where a single account's fields could be modified, followed by a table of all accounts. This UI was busy and ugly, so I requested:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ok let's work on the accounts page. 
When you click on the Accounts link from the nav bar, 
it shows the area to enter a new account, 
and below that is a listing of the defined accounts, 
I'd like to modify this form so that the default 
view is just the listing, and there should be a 
button to add a new account. When clicking the button 
to add an account, or the pencil to edit an existing 
account, it should transition to a separate page to 
specify the account values. The add account button 
should just be a blue button with a +, with tool 
tips that reads "Add New Account": 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a medium level of detail, and it was able to do everything I asked in one go. See the following images:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dvipylb2cyovsulwg06.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dvipylb2cyovsulwg06.png" alt="Image description" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl18uja9k1ddm6h30wm1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl18uja9k1ddm6h30wm1.png" alt="Image description" width="683" height="795"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfd22d0t8tidomdygruu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqfd22d0t8tidomdygruu.png" alt="Image description" width="683" height="795"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Occasionally there were runtime errors, and the page would cease to render. When that happened, I opened the dev tools to show the console logs (with error and stack trace) and took a screenshot. I informed the agent that rendering had stopped and there was an error, pasted in the screenshot, and the agent was able to fix the problem. Think about that one: the AI analyzed the screenshot, understood the text of the stack trace, and correlated it with the code to postulate the cause &lt;em&gt;and&lt;/em&gt; make a fix. This happened several times. That's truly amazing to me.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Implementation
&lt;/h2&gt;

&lt;p&gt;After several hours of iteration, I was able to get the agent to generate code that allowed the UI to &lt;em&gt;mostly&lt;/em&gt; function as a spreadsheet. This is pretty sophisticated behavior for a UI, so it's no surprise it took so many iterations to get there. But keyboard navigation works: tab/reverse tab, arrow keys, pressing enter to accept a row, etc. It's not perfect and there are lots of tweaks to make, but considering I don't know React &lt;strong&gt;at all?&lt;/strong&gt; This is just amazing.&lt;br&gt;
Here's an example of what it looks like currently:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjhmnhme2bnrz81l1zsh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjhmnhme2bnrz81l1zsh.png" alt="Image description" width="800" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps?
&lt;/h2&gt;

&lt;p&gt;I have only scratched the surface. And although I will continue to iterate on application features, I'm curious about exploring Cursor AI in more depth. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start adding some unit tests&lt;/li&gt;
&lt;li&gt;Dive into the code and ask it to explain a few things&lt;/li&gt;
&lt;li&gt;Start suggesting refactoring&lt;/li&gt;
&lt;li&gt;Explore how to leverage Cursor AI in a &lt;em&gt;test driven&lt;/em&gt; manner; can it execute tests before applying changes?&lt;/li&gt;
&lt;li&gt;Explore the limits of workspace size and context. For example, if I am working across multiple layers (database, backend service with an api, front end that consumes that api), can I make vertical changes across all modules?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There are some serious limitations to GenAI-assisted coding in general, and "vibe coding" specifically. Some of the limitations lie in the tools; some in the models. This is a space that is changing very rapidly; it's worse than trying to stay abreast of front-end frameworks. But I am confident these limitations are only temporary, and I'm excited about the potential. I'm going to do my best to ride the wave and see where it goes.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Letting AI Build My Frontend: A Tale of Two Chatbots</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Wed, 30 Apr 2025 19:00:13 +0000</pubDate>
      <link>https://dev.to/thutch1976/letting-ai-build-my-frontend-a-tale-of-two-chatbots-8no</link>
      <guid>https://dev.to/thutch1976/letting-ai-build-my-frontend-a-tale-of-two-chatbots-8no</guid>
      <description>&lt;h2&gt;
  
  
  TLDR; Source Available
&lt;/h2&gt;

&lt;p&gt;The source code (in its final form) is available &lt;a href="https://github.com/t-ray/ledger-poc-spa-gemini-2.5-pro/tree/blog-03-initial-generated-code" rel="noopener noreferrer"&gt;here&lt;/a&gt;. There were too many iterations across multiple LLMs to record &amp;amp; commit each one. Unfortunately, that means you can't follow along with this journey using point-in-time code. But if you are interested in skipping to the end, the repo is there and it should be functional.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Generated Code: First Steps
&lt;/h2&gt;

&lt;p&gt;So for those who haven't really used any of the GenAI chatbots, where do you start with generating code? That part is actually pretty easy: you choose a provider/LLM, give it a simple prompt, and iterate on the prompt. Usually the chatbot will acknowledge the request and ask clarifying questions or make suggestions. For example, this is what I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to build a very basic app for entering financial transactions.
I can describe the model and the very basic business rules. 
But before we start, I'd like help in deciding on the 
front end framework to use.
For now I just want to build a simple single page 
application that I can run from my workstation/desktop; 
I will not be hosting it publicly at first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I tried this prompt with two different LLMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemini (2.5 Pro)&lt;/li&gt;
&lt;li&gt;Claude (3.7 Sonnet)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the results from each were wildly different. The mechanics of interacting with the chatbot (providing the initial prompt, refining via continued prompts, and possibly incorporating suggested feedback) were the same. But the questions and suggestions were completely different, and in the end produced highly divergent results. &lt;/p&gt;

&lt;h2&gt;
  
  
  Observations between Gemini &amp;amp; Claude
&lt;/h2&gt;

&lt;p&gt;I will admit that since I'm still a novice, I didn't put a lot of thought into the actual models I used. I have tried to familiarize myself with the basics, and generally both Claude and Gemini received high praise for the quality of their generated code. I'll note that most people have said Claude 3.5 Sonnet is better than 3.7, but 3.5 requires a Pro subscription, so I stuck with 3.7.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial instructions: Gemini
&lt;/h3&gt;

&lt;p&gt;My experience with Gemini was very interesting. The prompt I gave was very generic; I didn't give any hints about frameworks or technologies to use. Gemini responded by suggesting options including &lt;code&gt;Vue.js&lt;/code&gt;, &lt;code&gt;React&lt;/code&gt;, &lt;code&gt;Svelte/SvelteKit&lt;/code&gt;, and &lt;code&gt;Angular&lt;/code&gt;. It contrasted each selection with a list of Pros and Cons, and concluded that I should look at the Getting Started guides for each and reach my own conclusions. I responded that I just wanted to go with &lt;code&gt;React&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ok let's go with React for now, and use the 
best-in-class or default libraries where 
needed (such as for routing and state management).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini then provided instructions on how to bootstrap the application with npm and Vite to create the scaffolding of the project. Gemini concluded by asking for more information about the data model and business rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial instructions: Claude
&lt;/h3&gt;

&lt;p&gt;I provided the exact same prompt to Claude, and it provided the exact same list of candidates as Gemini. However, there was much less detail here than with Gemini, and no compare/contrast amongst the candidates. But like Gemini, Claude did ask for more information about the data model and business rules. &lt;/p&gt;

&lt;p&gt;Based on the quality of the first response, my instinct was to stick with Gemini. But I stuck with trying to provide the same prompts to each LLM to make sure I was comparing apples to apples.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Model: Gemini
&lt;/h3&gt;

&lt;p&gt;I supplied a single sentence response to Gemini:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Ok let's move on to the data model.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And Gemini automatically generated JavaScript for a Transaction object, and provided a detailed explanation of each field. It then asked me if I liked the model, or if I wished to make any modifications. To be honest, the generated model bore only a vague resemblance to what I had in my head. But to be fair, I had not provided any guidance other than in the initial prompt, when I stated I wanted to enter financial transactions. The lesson to be learned here is that it takes some work to tease out what you want, and the more context you can provide, the better.&lt;/p&gt;

&lt;p&gt;So I spent some time writing up a very &lt;a href="https://github.com/t-ray/ledger-poc-spa-gemini-2.5-pro/blob/main/prompts/domain-model.md" rel="noopener noreferrer"&gt;detailed description of the model&lt;/a&gt; that I use in my spreadsheet. All of that detail (context!) significantly changed the generated models. Gemini produced TypeScript definitions for an &lt;code&gt;Account&lt;/code&gt; object, a &lt;code&gt;Category&lt;/code&gt; object, and a &lt;code&gt;Transaction&lt;/code&gt; object. It suggested improvements to how I modeled certain data in my spreadsheet, such as moving the running account total from the transaction to the &lt;code&gt;Account&lt;/code&gt; object, and that made complete sense. In a spreadsheet, it makes sense for the running account balance to be a formula based on the last transaction value, but not so much in actual code.&lt;/p&gt;

&lt;p&gt;Gemini further included statements about some assumptions it made, and asked me to confirm. It also made a suggestion that I didn't care for, as it didn't match my personal workflow. I informed Gemini of this difference, and it accepted my feedback and updated the TypeScript definitions. It even asked me how I wanted to handle the concept of refunds. You have to admit, that's an insightful question, and not something I was expecting from an LLM. &lt;/p&gt;
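&lt;p&gt;To make the shape of that suggestion concrete, the resulting model amounted to something like the following (field names are illustrative; the actual generated definitions are in the linked repo):&lt;/p&gt;

```typescript
// Illustrative only; the generated definitions differ in detail.
interface Account {
  id: string;
  name: string;
  balance: number;      // the running total lives here, not on each transaction
}

interface Category {
  id: string;
  name: string;
}

interface Transaction {
  id: string;
  accountId: string;    // reference to Account
  categoryId: string;   // reference to Category
  date: string;         // ISO date, e.g. "2025-04-22"
  description: string;
  amount: number;       // signed: negative for debits
}
```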

&lt;h3&gt;
  
  
  Data Model: Claude
&lt;/h3&gt;

&lt;p&gt;Gemini asked for some interesting clarification, so I updated my model prompt to include that clarifying context and fed it to Claude. Claude gave me a bulleted list of components/objects (Account/Category/Transaction) with a brief description of each, and then suggested an outline for the React application. Claude then offered to create a starter React project, define a basic Redux store, and generate basic UI components for entity listing and data entry. Again, in comparison to Gemini, the response was very brief and lacking in meaty detail. I asked Claude to proceed with its suggestions, and it started to generate some code. About halfway through this process, I had apparently exhausted my use of Claude for a few hours and had to resume the next day. After it listed all of the generated code, it provided instructions on how to bootstrap the application and set up Tailwind CSS. Two of the steps were as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm install @reduxjs/toolkit react-redux react-router-dom tailwindcss
npx tailwindcss init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I followed the instructions, and the &lt;code&gt;npx&lt;/code&gt; command failed with an error. I provided that error to Claude, and it suggested a couple of solutions. Unfortunately, none of them worked on my workstation. Had I spent time debugging it myself, I suspect I'd have found an npm dependency conflict in my environment. But for now, I'm trying to rely on the AI as much as possible, and ultimately I just removed Tailwind CSS. &lt;/p&gt;

&lt;p&gt;After that, it was a matter of manually copying/pasting all of the generated code into the appropriate source files in the workspace. At this point, I was able to run the app and access it via the browser. It was bare bones, and since it lacked styling, it was aesthetically displeasing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Back to Gemini
&lt;/h3&gt;

&lt;p&gt;Since I couldn't get Claude to figure out why npx and Tailwind were not functioning in my environment, I went back to Gemini and asked it to generate the starter project. It provided the following instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Create the Vite project with the React TypeScript template&lt;/span&gt;
npm create vite@latest financial-tracker-app &lt;span class="nt"&gt;--template&lt;/span&gt; react-ts

&lt;span class="c"&gt;# 2. Navigate into the newly created project directory&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;financial-tracker-app

&lt;span class="c"&gt;# 3. Install the project dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# 4. Install React Router for handling navigation&lt;/span&gt;
npm &lt;span class="nb"&gt;install &lt;/span&gt;react-router-dom
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It then gave me step-by-step instructions on which files to create, including the source (mostly TypeScript in this case), and explained the purpose of each source file. &lt;/p&gt;

&lt;p&gt;Unfortunately, what it generated was too basic to be useful and lacked any of the functionality I was seeking. Basically, it was just a home page that listed dummy data for categories, accounts, and transactions. I was really surprised by this, because Claude assumed from the beginning that I wanted full CRUD functionality for each entity. &lt;/p&gt;

&lt;p&gt;I kindly informed Gemini:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I need the ability to list, add, modify, 
and delete both accounts and categories. 
Can you generate these components/forms/pages?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini dutifully performed as asked, and generated a bunch of new code. All of this had to be copy/pasted into the code editor (VS Code at this point). But it was still ugly. So I asked Gemini to style the app, using Material. &lt;/p&gt;

&lt;h3&gt;
  
  
  Lots of Progress, and an Error
&lt;/h3&gt;

&lt;p&gt;More code was generated. More code was pasted into the IDE. Now, it was at least somewhat visually appealing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdvetzpo1iysls8nvp8s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzdvetzpo1iysls8nvp8s.png" alt="Account Details" width="800" height="818"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It still lacked the core purpose of the application, however; there was still no method of actually entering a transaction. I therefore asked it to generate a page for transaction entry. More code generation, more copy/pasting. And then a TypeScript error. &lt;/p&gt;

&lt;p&gt;I've used TypeScript off and on over the years. I am not the biggest fan, for a number of reasons, but that's another topic. The TypeScript compiler was complaining about something not being a valid string. It appeared to be an issue with double or even triple escaping of certain character sequences (TypeScript string interpolation embedded within a React TSX snippet, where the string itself contained embedded escaping), but I really didn't know how to resolve it. I shared the error and offending code with Gemini, along with my theory. It confirmed my suspicion and fixed the issue. After one more copy/paste, the transaction entry form was able to render.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F963768hxgj2s8b9xet1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F963768hxgj2s8b9xet1j.png" alt="Transaction Entry" width="786" height="862"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary &amp;amp; Initial Impressions
&lt;/h2&gt;

&lt;p&gt;I'll admit that I have been somewhat of a skeptic in terms of what AI can really accomplish. I am happy to say that is no longer the case. I knew AI could generate source code, but I am genuinely surprised by the insight and perceptiveness; its ability to ask meaningful questions to improve what it outputs. Furthermore, the experience of providing the AI a compiler error message and the broken line of code, and it actually &lt;em&gt;fixing&lt;/em&gt; it? This is some pretty cool stuff, and I'm excited.&lt;/p&gt;

&lt;p&gt;The workflow of using a given chatbot in its native UI and manually copying/pasting code from a browser tab into an IDE becomes repetitive very quickly. The chatbot UI helped me learn what I wanted in terms of interactions and capabilities, but it's time to move to an improved workflow, and my next line of investigation is &lt;a href="https://www.cursor.com/" rel="noopener noreferrer"&gt;Cursor AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I've also learned that context is super important. The more guidance one can provide, the better the results are going to be. AIs are surprisingly good at dealing with open-ended questions, but when it comes to generating code and emitting anything useful, more information is better. In fact, I am beginning to believe it might be better to be prescriptive when possible. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: It was a good use of my time
&lt;/h2&gt;

&lt;p&gt;This exercise was not one contiguous block of time; I worked on it here and there over about two and a half days. But in clock time? I had two working examples (of varying functionality and ugliness) in a matter of hours. Considering I don't have any experience with React, had I done this on my own following tutorials, it likely would have taken me several days or even a week. And I don't think I'd have had full CRUD operations for all three entities in that time.&lt;/p&gt;

&lt;p&gt;The goal was to generate an SPA that mimicked the behavior of what I currently do via spreadsheet. The data is not persistent, and there are some rough edges, but I believe I accomplished my goal. There's still lots of room for improvement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;validating data&lt;/li&gt;
&lt;li&gt;implementing business rules&lt;/li&gt;
&lt;li&gt;adding more components&lt;/li&gt;
&lt;li&gt;streamlining transaction input&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But I'll leave all of that for a future exercise. &lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>The Application Candidate: Personal Ledger</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Tue, 22 Apr 2025 20:27:35 +0000</pubDate>
      <link>https://dev.to/thutch1976/the-application-candidate-personal-ledger-1bp3</link>
      <guid>https://dev.to/thutch1976/the-application-candidate-personal-ledger-1bp3</guid>
      <description>&lt;p&gt;A few years ago (2021) I went through a significant life change and realized I didn’t actually know how to manage my own finances. (Side note: don’t be like me—get a handle on your finances before your mid-40s.) All of a sudden I needed to do basic things like set up a budget. I tried a few online tools / apps, but they were of limited value because I simply didn't have much historical data. Or to put it another way—I had plenty of financial history, but no access to it (don’t ask).&lt;/p&gt;

&lt;p&gt;I decided that was a skill that needed immediate development, so I went old school and did everything manually, and I still use this process today. I have a spreadsheet (simply named &lt;em&gt;Ledger&lt;/em&gt;) in which I record every financial transaction in each of my accounts: checking/savings/credit cards. A few times each week, I log in to each institution's website and manually enter these transactions in my ledger. That may seem silly today, considering all the apps and integrations available (I do miss Mint). But I prefer this method—it gives me oversight into every single transaction in all of my accounts, and I am able to categorize transactions as I see fit. The latter point is really important to me: I've had no end of issues with apps assigning incorrect categories to transactions and not respecting or remembering manual classification. &lt;/p&gt;

&lt;p&gt;This process works for me. It gives me the warm fuzzies in terms of visibility and control, and only takes a few minutes of my time each week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Existing Software
&lt;/h2&gt;

&lt;p&gt;While this process is entirely manual, I've written &lt;em&gt;some&lt;/em&gt; software. One of the first things I did was to define a &lt;a href="https://github.com/t-ray/finances-csv-importer/blob/master/src/templates/init.sql" rel="noopener noreferrer"&gt;schema&lt;/a&gt; for a Postgres database into which I would import the data and create custom queries/run reports. I then wrote a simple &lt;a href="https://github.com/t-ray/finances-csv-importer/tree/master" rel="noopener noreferrer"&gt;command-line program in Rust&lt;/a&gt; to import CSV files into the database. After I had a year's worth of data, I was able to gain some insights into my own spending, based on my own categorization. Compared to what I was getting out of Mint at the time, I felt it was more accurate. &lt;/p&gt;

&lt;h2&gt;
  
  
   Next Steps: Frontend or Backend?
&lt;/h2&gt;

&lt;p&gt;After I decided to get my hands dirty with code again, I considered my personal ledger. I have virtually no front-end experience (beyond 15-year-old stale knowledge of HTML/CSS), so I thought this would be an opportunity to learn some of those skills. In other words, why not create a front end? After that decision was made, the next question was obvious: what was going to power that front end? I have a database and the ability to import data into that database; if I have a front end, I'd need a backend too. So before I even started on the front end, I started throwing together a little service with a REST API implemented in Kotlin (I have had a love affair with Kotlin for years). &lt;/p&gt;

&lt;h2&gt;
  
  
  Iteration 1 - Simple SPA
&lt;/h2&gt;

&lt;p&gt;I went down the backend/Kotlin rabbit hole for a few hours before I decided I was overcomplicating things. If I really wanted to learn the front end, I didn't need a backend. I could do everything in local storage, and all of the business rules could be implemented in the app itself. Since I'm doing this as a learning exercise, I don't need things like authentication or integration with external services; i.e., it could be entirely self-contained and run from the dev environment. So that's the next step: to get a dead simple SPA up and running. I don't really care about the polish, or even if it has too many bugs; I just want to reach a point where I understand the code flows and can explore supplementary topics like the build chain.&lt;/p&gt;
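&lt;p&gt;Persisting everything client-side really can be that simple. A sketch of a local-storage-backed store (the key name and shapes are hypothetical; the storage object is passed in so the functions also work outside a browser):&lt;/p&gt;

```typescript
// Minimal persistence over a Storage-like object (e.g. window.localStorage).
interface StoreLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const LEDGER_KEY = "ledger-state"; // hypothetical storage key

// Serialize the whole app state under one key.
function saveState(store: StoreLike, state: object): void {
  store.setItem(LEDGER_KEY, JSON.stringify(state));
}

// Load it back, falling back to a default on first run.
function loadState(store: StoreLike, fallback: object): object {
  const raw = store.getItem(LEDGER_KEY);
  return raw ? JSON.parse(raw) : fallback;
}
```

&lt;p&gt;In the browser you'd call these with &lt;code&gt;window.localStorage&lt;/code&gt;; the business rules stay in plain functions over the loaded state.&lt;/p&gt;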

&lt;h2&gt;
  
  
  How Much AI for this iteration?
&lt;/h2&gt;

&lt;p&gt;As I indicated in my previous post, this series is really about capturing my experience using AI tools to assist in coding. I briefly considered jumping into the vibe coding game, exploring tools like Cursor, and possibly generating the code entirely through AI. I think that will be my ultimate goal, but for now I'm going to take a more conservative approach. I am going to describe my vision for the app to an LLM (still deciding on which), define the domain model and business rules, and document my experience with what I got out of that approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;This is just the beginning of my journey to modernize my manual workflow and learn new tools along the way. The plan is to keep things lightweight, stay focused on learning, and let the application evolve organically. I’ll share what works, what doesn’t, and how AI tools influence the process—whether they streamline development or introduce new challenges. If you’ve ever thought about turning a personal spreadsheet into a living app, or are just curious how far you can get with a little AI help and a lot of curiosity, I hope you’ll follow along.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>career</category>
    </item>
    <item>
      <title>Rediscovering Code: My Late-Career Leap into AI</title>
      <dc:creator>Trey Hutcheson</dc:creator>
      <pubDate>Mon, 21 Apr 2025 19:04:30 +0000</pubDate>
      <link>https://dev.to/thutch1976/rediscovering-code-my-late-career-leap-into-ai-63h</link>
      <guid>https://dev.to/thutch1976/rediscovering-code-my-late-career-leap-into-ai-63h</guid>
      <description>&lt;h2&gt;
  
  
  Starting Over (Sort Of): A Late-Career Dev Dives into AI
&lt;/h2&gt;

&lt;p&gt;Hi, I'm Trey — and I've been in software development for almost 30 years. Most of that time has been spent as an individual contributor, though I've also held tech lead, team lead, and architectural roles. For the past three years, I've been in management.&lt;/p&gt;

&lt;p&gt;Now, I’m at an interesting crossroads: &lt;strong&gt;do I continue growing in management, or do I return to being a hands-on developer?&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Joy of Management
&lt;/h2&gt;

&lt;p&gt;Management has been rewarding. Coaching and mentoring engineers (and other managers), helping teams through reorgs, and even supporting teammates on personal milestones like gaining permanent residency — it's all been incredibly fulfilling. Making a difference in someone's life is powerful: I've been able to mentor engineers through difficult career decisions, including helping some transition into management themselves, and I was even able to assist a refugee in obtaining permanent resident status. &lt;/p&gt;

&lt;p&gt;But lately, I’ve felt the pull to &lt;strong&gt;get my hands dirty again&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Back to the Code
&lt;/h2&gt;

&lt;p&gt;The demands of managing several teams left little time to keep honing my coding skills. If I return to development now, I'll be playing catch-up on the many trends that passed me by.&lt;/p&gt;

&lt;p&gt;I started in the late 90s doing desktop development in Visual Basic. I participated in the rise of Java and .NET, helped several companies through the waterfall-to-scrum transition, and built software at companies ranging from startups to multinational corporations.&lt;/p&gt;

&lt;p&gt;In the 2010s, I moved into architectural roles. It was a perfect balance: working on interesting tech while also growing my leadership skills.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Technical Background
&lt;/h2&gt;

&lt;p&gt;Over the past decade, my focus has been distributed systems — from on-prem monoliths to cloud-native microservices. I’m fascinated by distributed computing problems: consensus, replication, and the complexities of reliability at scale.&lt;/p&gt;

&lt;p&gt;Here’s a quick snapshot of my tech experience:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Languages/Platforms&lt;/strong&gt;: Java/Kotlin (heavy focus), .NET (early on), with a little bit of Rust&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frameworks &amp;amp; Tools&lt;/strong&gt;: Spring Boot, Quarkus, Kafka, Kubernetes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD&lt;/strong&gt;: Jenkins, Azure DevOps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Linux, containerization, and Ansible (for IaC)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Architectural roles pulled me away from day-to-day coding, but I stayed close to system design, scale, and operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Left Behind by AI (Until Now)
&lt;/h2&gt;

&lt;p&gt;Despite staying aware of AI and LLMs conceptually, I’ve had &lt;strong&gt;no real hands-on experience&lt;/strong&gt;. I’m changing that now.&lt;/p&gt;

&lt;p&gt;This blog will document my journey from zero to (hopefully) proficient in applying AI to real-world software problems. Some of it will be slow, some clunky, and much of it will be me learning in public.&lt;/p&gt;

&lt;p&gt;If you're a &lt;strong&gt;late-career engineer&lt;/strong&gt; like me — curious about AI, but unsure where to start — this blog is for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;I’m starting with something personal: &lt;strong&gt;my own checkbook/ledger&lt;/strong&gt;. In the next post, I’ll explain why that’s the perfect launch point.&lt;/p&gt;

&lt;p&gt;From there, I’ll see how far AI-generated code can take me. The series will cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tool selection &amp;amp; configuration&lt;/li&gt;
&lt;li&gt;Coding experiments &amp;amp; outcomes&lt;/li&gt;
&lt;li&gt;Lessons learned along the way&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All the source code will be open, so you can follow along.&lt;/p&gt;

&lt;p&gt;Thanks for reading — and if you're on a similar path, I’d love to hear from you.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
