One of the side projects I've been working on recently is a multi-agent system to automate the creation of ready-to-use, feature-complete templates for sale on Gumroad. It started off as an academic exercise and a good way to investigate agents, multi-agent orchestration, and integrating with LLM APIs. I've since pivoted to something a bit more hands-on, but here are a couple of learnings from the short-lived attempt at building a side project to generate some hybrid income.
The code is available here for those who wish to check it out and maybe build upon it.
Lesson 1 - Spend time on your prompts and configuration documents
Whether it's the initial prompt you use to generate tasks and ideas, or configuration documents like AGENTS.md, it's critical to spend time upfront tailoring them as much as possible to your liking. I didn't spend much time on this initially and only later started seeing the impact. Apart from the guardrails provided by your AI tool of choice, or the guardrails you write into your system yourself, the prompts and configuration documents are your main opportunity to influence the direction the agents take. So invest the time up front, and be very explicit about what you don't like as well; that is equally, if not more, important.
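For illustration, here is a minimal sketch of the kind of explicit AGENTS.md entries I mean. The specific rules are examples I've made up, not the ones from my project:

```markdown
# AGENTS.md (excerpt)

## Stack
- Use the current stable release of the framework; never canary or experimental builds.
- TypeScript strict mode is mandatory.

## Things I do NOT want
- No new dependencies without listing them in the task summary first.
- Do not restructure folders that already exist.
- Do not invent placeholder copy; mark missing content with a TODO instead.
```

The "do NOT want" section is the part I under-invested in at first, and it turned out to matter as much as the positive instructions.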
Lesson 2 - Context management is a pain
This is probably obvious by now to anyone who has used AI tools for a while, and there are certainly plenty of ways to mitigate it. But it's especially difficult in a multi-agent scenario where a degree of linear continuity is expected from one handover to the next. For example, if you have a collection of agents that simply need shared awareness of the question or task while performing different duties, it may not be noticeable, or even important, whether 100% of the context is kept. Compacting is usually good enough to give a level of quality the user finds acceptable. But in a scenario where you need the handoff to amplify the context rather than compact it, the effect can be detrimental.
I first really noticed this when I had to introduce chunking for the API calls to Anthropic because of hard input token limits. I iterated on the approach, starting with naive chunking and gradually moving to strategies that were more context aware. The problem was most visible during the build stage: the context lost by naive chunking led to duplicate components being created, and the integration of all the different pages and components produced a load of rubbish. As I moved to more aggressive chunking strategies accompanied by additional carried-over context, the quality of the output improved, with more consistency across files and less QA time needed to review what was created.
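To make the idea concrete, here is a minimal sketch of what "context-aware" chunking can look like: split on file boundaries instead of arbitrary offsets, and carry a summary of earlier output into each subsequent request so later chunks know what already exists. This is not the project's actual code; the model name, character budget, and carryover logic are placeholder assumptions.

```python
# Sketch: chunk whole files under a budget and feed prior output forward,
# so later chunks don't recreate components that already exist.
from anthropic import Anthropic

client = Anthropic()          # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-5"   # placeholder; substitute whichever model you use
CHAR_BUDGET = 60_000          # rough stand-in for the real input token limit


def chunk_by_file(files: dict[str, str], budget: int = CHAR_BUDGET) -> list[dict[str, str]]:
    """Group whole files into chunks under the budget, never splitting a file."""
    chunks, current, size = [], {}, 0
    for path, source in files.items():
        if current and size + len(source) > budget:
            chunks.append(current)
            current, size = {}, 0
        current[path] = source
        size += len(source)
    if current:
        chunks.append(current)
    return chunks


def build_with_carryover(files: dict[str, str], task: str) -> list[str]:
    """Process each chunk with a summary of prior chunks to avoid duplicate work."""
    carryover = "Nothing has been generated yet."
    outputs = []
    for chunk in chunk_by_file(files):
        listing = "\n\n".join(f"### {path}\n{src}" for path, src in chunk.items())
        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            messages=[{
                "role": "user",
                "content": (
                    f"{task}\n\n"
                    f"Work already completed by earlier chunks:\n{carryover}\n\n"
                    f"Files in this chunk:\n{listing}\n\n"
                    "Reuse existing components instead of recreating them. "
                    "End with a short summary of what you created or changed."
                ),
            }],
        )
        text = response.content[0].text
        outputs.append(text)
        carryover = text[-2000:]  # naive carryover; a real system would summarise properly
    return outputs
```

Even this crude version of carrying context forward made a noticeable difference compared with blind, fixed-size chunking.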
I don't think I've perfected this myself, but for my pivot I'm actively exploring options to better tackle the issue with a more robust review and intervention process.
Lesson 3 - Be very very explicit
I'm sure there are plenty of schools of thought here, and some people are happy to delegate the thinking and decision-making as well, but I personally like a bit more control over the process. The value of agentic workflows, as I see it, is that they make going from an idea to a solution quicker, but they still require a decision maker, and that should be the operator: you.
I made the mistake of being somewhat vague in my requests, with things like "follow the latest engineering trends" or "stay up to date with 2026 versions", which the agent interpreted as "take a guess at what's going to trend in 2026 and use that". The result was a lot of experimental features requiring canary versions in what was meant to be a production-ready template. This is where being a subject matter expert in a particular field helps: it lets the operator know which questions to ask and which guardrails to create. I learned this the hard way, and it's something I'll carry into my next project.
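One way to enforce that kind of constraint mechanically, rather than hoping the agent honours a sentence in the prompt, is a small check over the generated manifest that fails the run when pre-release versions sneak in. This is a hypothetical sketch, not something from my project; the file name and version policy are examples:

```python
# Hypothetical guardrail: flag generated templates that depend on
# pre-release or canary package versions.
import json
import re
import sys

PRERELEASE = re.compile(r"(canary|alpha|beta|rc|next|experimental)", re.IGNORECASE)


def find_prerelease_deps(package_json_path: str) -> list[str]:
    """Return dependency pins in package.json that look like pre-release versions."""
    with open(package_json_path) as f:
        manifest = json.load(f)
    offenders = []
    for section in ("dependencies", "devDependencies"):
        for name, version in manifest.get(section, {}).items():
            if PRERELEASE.search(version):
                offenders.append(f"{name}@{version}")
    return offenders


if __name__ == "__main__":
    bad = find_prerelease_deps(sys.argv[1] if len(sys.argv) > 1 else "package.json")
    if bad:
        print("Pre-release dependencies found:", ", ".join(bad))
        sys.exit(1)
```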
Lesson 4 - Pick a tool that works for you
Given the rate of development and the flood of benchmarks, it's easy to be tempted to constantly jump between tools. I've tried Cursor, Windsurf, VSCode with Copilot, JetBrains with Junie, and tons of others. Some are better suited to assisting with existing production or legacy codebases but not so good at creating something from nothing. Antigravity, on the other hand, is an IDE that people have mixed opinions about, but it's the one that works for me for kickstarting new projects. So ignore the reviews, play around with the different options on the market, and create a workflow that works for you. Since the primary role of the operator is to ask the right questions, cutting distractions where possible worked well for me.
If this was interesting, you can check out my site at ssong.dev or follow me on GitHub.