TL;DR: In this post, I go through my experience attempting to solve real problems with the help of AI agents and agentic software development. I walk through the steps I took, my thought process, and the decisions I made along the way, from planning to development to testing. This is the story of my attempt to build two tools that solve day-to-day workflow issues, one that failed and one that succeeded, and what I learned from both.
With all the hype and writeups about AI and "agentic software development" (air quotes intended), I think it's about time I did a writeup about my personal experience with this shiny new tool. It may not be new to most of you by now, but I still find myself just scratching the surface of artificial intelligence. So here is my first disclaimer: I am not an agentic development guru, just a regular developer talking about his experience with AI.
Before we get into the nits and bits of everything, I wanted to express my appreciation for my employer for granting us access to these amazing tools. I am not among the many who can afford a subscription to Claude and OpenAI. What I am trying to say is that if I were not working for them, I would probably still be living under a rock with nothing but Google, Stack Overflow, and the official documentation to aid me in my day-to-day developer workflow.
And if you're expecting this post to be a detailed handbook for building an agentic development workflow, this is not one of those blogs. So now you have the chance to turn your back and continue on your merry way.
What I am going to talk about in the next sections is my experience building two personal side projects using agentic development practices, at least as far as I understand the agentic dev process. Over this past year, I learned through time and experience that AI is now more than just a chatbot. The reason they call it agentic is that it can do things like create and organize files, perform research and produce a report, understand your entire codebase and act like an assistant when you do proof-of-concept work for a new feature, and much more, with your guidance or autonomously. AI is basically comparable to a senior engineer who knows more than you could learn in a huge span of time, but who relies on you, the user, to point it in the right direction and make the decisions for it.
And so I, like any other excited and optimistic engineer with their eyes glowing to try out the new stuff, made it my side quest to try these bad boys out. I pondered a lot on what would be a good problem to solve, or something I would like to automate within my workflow, so that I could build a tool with AI's help. And two problems readily made themselves visible to me.
Timesheet Chrome Extension (the one that failed)
What is the problem I was trying to solve
As with any company that requires its employees to keep a daily record of their working hours, ours has its own unique timesheet process. We're given the responsibility of logging our time in and time out, and of flagging the days we're out of office for payroll purposes.
Personally, part of me thinks this is something that can easily slip one's mind and be neglected. So I made it my first quest to create a tool that would take care of it for me.
My thought process
After pondering for some time, I've come up with the following pain points:
- Updating the timesheet requires opening spreadsheet software and manually copying and pasting.
- Prone to human error.
- Accidental saving of changes when the sheet is on "autosave".
- Many extra steps if you want to make updating the timesheet a daily habit.
- No notifications or alarms for timesheet deadlines.
With the pain points listed down and after racking my brain for hours, I arrived at the user story: "As an employee, I want a browser extension that can help me create/update my timesheet records per salary cut-off without leaving my browser. It should also warn/notify me of the timesheet deadline." Sounds like the problem I needed solved. I wanted vanilla JS as the implementation language since this was going to be a browser extension, and I wanted it to be a stateless tool since it deals with timesheet data (my data). So I also strongly specified that the tool must store the data in a spreadsheet file, not in a local or external database. Little did I know that this gap would cause my undoing later on. I then proceeded to list and draft more of the tool's features (but I won't go into those here, for brevity).
What I did (an overview)
With the pain points defined, the user story crafted, and the features enumerated, it was time to build the tool.
It may be late to bring this up at this point in the blog, but what I did, defining the features and listing them down, is the key to defining the specifications of the software you're going to build. It's a practice in AI-assisted engineering called "spec-driven development".
"Spec-driven development means writing a 'spec' before writing code with AI ('documentation first'). The spec becomes the source of truth for the human and the AI."
This approach treats specifications as the primary artifacts and describes the language, structure, and instructions for an AI agent on how to โgenerateโ code.
With this approach, the spec is a first-class citizen in your development lifecycle and evolves together with the requirements as they change. There are many benefits if you're truly invested in AI-assisted development; for example, the specs can be technology-agnostic and reused to re-create the software in an entirely different programming language or with different components. You have the specifications and a clear goal of what you envision the complete software to be, and you can use any AI agent to create it for you.
Taking my requirements, I leveraged GitHub's speckit project to help me define the specs for this greenfield side project (fancy word). By using speckit, I got ready-made structure and tooling for spec-driven development. This was a huge boost compared to trying to understand everything and build structured tooling from scratch, saving a huge amount of effort. The great thing about speckit is that it supports multiple AI agents, so you don't have to worry about making it work with Claude, OpenAI, or Copilot: the project's developers made it work with the most used and popular AI agents on the market.
And then down the development rabbit hole I went, using the slash commands readily available to me while reading the speckit documentation back and forth. This took me another hour of going back and forth, reviewing the agent's output and adding prompts in hopes of steering it in the right direction.
What was the outcome
It took me less than an hour of handholding the AI to finish all the checklists for the tool. I was genuinely excited to try out the results, so I enabled developer mode in Chrome and loaded the extension.
My assessment of the UI: it looks bare-minimum and not pleasing. The agent just used the base font and no fancy styling on the extension's UI. Well, I think I actually deserved this, since the spec focused on functionality and not on looks.
But what about functionality? It was no good either. To its credit, the extension's UI had the correct validations specified in the specs. But it was not able to persist data to a spreadsheet in the specified directory, and the "Period" and "Settings" tabs in the extension UI did not work, making the extension pretty much unusable.
What I learned and what I think I should have prompted better
What I think went wrong is that I completely neglected the research and feasibility study of the technology the software would be built on. I just assumed that everything would work out and that the AI would identify the gaps, propose alternatives, and do the necessary scoping, research, and feasibility checks for the technologies to be used.
Much to my disappointment, it did not do those things. It pretty much took what I (the user) told it to build, with the tools I specified, and made it its life mission to make that work without a single thought about whether the technology combination I dictated could yield the desired outcome.
Even though the specs for the software looked good in the editor, they did not help much, since I hadn't told the agent to do research to find and catch these gaps. I am 100% confident that agents also work well as sounding boards for developers. But yes, it takes a good amount of awareness not to skip the research and feasibility part of the software development lifecycle.
Go Data Extractor (success)
What is the problem I was trying to solve
For my second problem: in one of my tasks setting up an end-to-end test tenant across two different systems, we had data in one of the systems and I needed to somehow backfill the users' data into the other. What made it tricky is that the data needed to be in CSV format so that it could be consumed by the system I was aiming to load the data into.
The process goes: (1) fetch all the users' data from system-1, (2) transform and map that JSON data into a CSV format with the correct fields, and (3) load the CSV file into system-2 to backfill the user data on that system.
This is an internal process, so I am simplifying things for brevity. There is also no complex data transformation involved in step 2.
My thought process
Recognizing the need to create a tool for this, since other engineers on the team would run into the same task, I got to work drafting the requirements for the tool I wanted to build. It's a fairly simple use case, and I think it can be a stateless solution that consumes data from an exposed endpoint and just maps it into a specified format.
This time, I just had one user story and that is: "As an internal system engineer, I want a tool that could help me extract all user data from an exposed API endpoint and transform the collated data into a specified CSV format. And it should be stateless (not store any user data) and accept only an authentication token from the user to be used for the API call."
Another thing that's different about this problem from the first one is that I knew exactly that the technology I would use would work, and that there were ready-made libraries available to build it with.
What I did (an overview)
I took the user story and created a draft specification. I also collected sufficient sample data from the endpoint response, and went as far as creating an OpenAPI specification so that it would be easier for the agent to consume and reference during the planning and build stages. So I wrote a lot of the files myself and created references, linking them all to the initial specification draft file, in the hope that this would steer the agent toward a clear understanding of the end goal and what success should look like for the tool.
I'd like to mention that while my first attempt at using speckit (for the timesheet tool) failed, I still used it on this second project to orchestrate the AI agent and leverage it to develop and build the tool for me.
To shed some light on this vague wording, here are the things I did:
- Created a directory with sample JSON response files from the endpoint the user data would be fetched from
- Created an OpenAPI specification file for said endpoint
- Listed the requirements for the endpoint the tool would consume
- Created a PlantUML sequence diagram of how the processing flow should work in the tool
- Defined handling for edge-case scenarios, such as fallback logic for empty field values and how the implementation should handle mapping failures for individual users' data
- Accounted for realistic scalability challenges, such as massive volumes of user data to extract (10,000+ users)
- Planned and scaffolded the architecture decisions using AI as a sounding board. I went back and forth with the agent on the architecture, since I wanted the application to be stateless, use Go as the programming language, use standard libraries, and make security a top requirement and a non-negotiable aspect of the tool.
- Took more time crafting and reviewing the specification. This went hand in hand with the architecture decisions, where I also incorporated TDD practices and "Clean Architecture" principles.
Note that what I did differently this time is that there were back-and-forths between me and the AI agent. I did not follow the happy path of letting the agent one-shot the implementation from the requirements I'd evaluated. Instead, I worked through with it the challenges and gaps that it flagged as "needs clarification." There were multiple times I had to go back and forth using /plan and /spec to direct the agent to dive into research, planning, closing the gaps, and enhancing the project's specification files.
And so when it finally came time to tell the agent to break the spec down into tasks and implement it, it was more like a walk in the park, knowing that all the gaps in the specifications had been properly scoped, clarified, and proven working, or at least achievable. No more implementation rabbit holes over edge cases and "what-if" scenarios in the feature flow. Just watching the agent do a week's worth of work in under an hour.
And after it was done, it still was not a guaranteed clean one-shot success. Some make commands failed because the Dockerfile used an incompatible version of a Go image for running the application, and so it was another 20 minutes of Googling and looking up potential fixes. And when it finally ran, there were still some bugs when I took it out for a few rounds of E2E testing, such as the path I'd given in the OpenAPI spec being incorrect, and an issue with the tool's data mapping when parsing the date format "2022-06-09T05:04:20.048Z".
To address that, it was back to the drawing board with the agent. But this time, it was just a matter of adding to the specifications and requirements. The good thing about spec-driven development is that you never lose the documentation for the behaviors, and it doesn't live somewhere else: it lives together with the implementation in the project repository, and you can apply behavior changes through it when you want some kind of refactor done to the codebase. So it was not very hard to account for the new requirements, since the AI agent does a very good job of understanding them and folding them into the existing ones.
What was the outcome
The final output was a simple Go web service that exposes an API on localhost when run; the user calls this API and receives a file in CSV format as the response. AI-driven development finally worked in my favor this time, as the tool captured all the requirements and behaved as expected based on the specs that were written.
I used it against the two test tenants I needed to set up, tenant 1 holding over 3,000 users' data and tenant 2 over 1,500. The output CSV files from both tenants were good and were readily accepted when I loaded them into the target system to backfill the user data.
As a bonus, I get to keep the history of the spec and the plan, and how they evolved into the workable items of the development, as well as comprehensive documentation of the architecture, the data models and entity relationships, and a quickstart document that is easily understood by developers who want to use the project or get onboarded to it. Not to mention that the project was built with containerization in mind from the start, meaning you can use it regardless of your environment and machine as long as you have Docker or an alternative containerization technology installed.
But since what I have made is an internal tool, I won't be sharing any links to the project in this post. If I did, I would risk revealing the workings of our internal systems and a few architectural secrets.
What I learned and what I think I should have prompted better
First off, using AI to one-shot development is a fallacy. There will ultimately be handholding and back-and-forths, and these are important for closing gaps and discovering whether your strategy and approach will work and whether they are the right decisions. You can't just throw your requirements at the agent and expect a working product at the end. It doesn't work that way.
Second, you have to provide concrete examples of what you think success should look like. Do the hard and dirty work of gathering acceptance criteria, creating the feature flow, and doing discovery and scoping for the feature. The clearer things are to you, the better you can prompt the agent toward a clearer vision of what you want built. If you yourself don't know what "done" looks like, then how would the agent know?
Third, hold the reins and pay attention to every step the agent takes. They say that real agentic engineering is granting AI access to every tool it needs and letting it run free. I would think this is horseshit. You MUST read and understand what it is doing step by step. There were times the agent would ask to install things simply because they were not on my machine, things I had not approved. So I would advocate only letting the agent run a command if you fully understand it. The human should make the decisions and advise the approach.
And lastly, stick close to SDLC practices. I believe my biggest mistake on my first project was that I devalued the research phase of development: I did not fact-check whether the technology I intended to use was realistically achievable. So I maintain that the essence and principles of the traditional SDLC are still relevant in "agentic" software engineering practice. They did not disappear; they just manifest in another form.
Summary of learnings and takeaways
So after going through two side projects with agentic development, one that flopped and one that actually worked, here's what I'm taking away from the whole experience:
- AI is not a magic wand that one-shots your project. There will always be handholding, back-and-forths, and iteration. Expecting a working product from a single prompt is setting yourself up for disappointment.
- Don't skip research and feasibility. My biggest mistake with the timesheet tool was assuming the tech stack would just work without verifying it first. The agent won't do that research for you unless you explicitly tell it to.
- Spec-driven development is your best friend. Writing clear specifications before letting the agent code gave me a massive advantage. Tools like GitHub's speckit helped me structure the specs so that the agent had a clear picture of what success looks like.
- Do the dirty work upfront. Gathering sample data, creating OpenAPI specs, drafting sequence diagrams, defining edge cases, and scaffolding architecture decisions before the build phase made all the difference between a failed project and a successful one.
- Work with the agent, not just through it. The back-and-forths using /plan and /spec to close gaps and clarify requirements were where the real value came from. Don't just let the agent run free on the happy path.
- Stay in the driver's seat. Read and understand every step the agent takes. Only approve commands you fully understand. We humans should make the decisions and advise the approach agents take.
- The SDLC didn't disappear. Research, planning, scoping, feasibility checks, and architecture decisions are still just as important in agentic development. They just manifest in a different form, through specs, prompts, and agent conversations instead of traditional meetings and documents.
- It won't be perfect on the first run, and that's okay. Even my successful project had bugs on the first try. The key is that with good specs, fixing issues is just a matter of updating the specifications and letting the agent work through the changes.

