<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: GroverTek</title>
    <description>The latest articles on DEV Community by GroverTek (@grovertek).</description>
    <link>https://dev.to/grovertek</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3860587%2F191284ac-d656-494d-9144-bbceb6bfee04.png</url>
      <title>DEV Community: GroverTek</title>
      <link>https://dev.to/grovertek</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/grovertek"/>
    <language>en</language>
    <item>
      <title>How I'm Using AI Agents to Find My Next Product Idea</title>
      <dc:creator>GroverTek</dc:creator>
      <pubDate>Sun, 05 Apr 2026 00:36:17 +0000</pubDate>
      <link>https://dev.to/grovertek/how-im-using-ai-agents-to-find-my-next-product-idea-527j</link>
      <guid>https://dev.to/grovertek/how-im-using-ai-agents-to-find-my-next-product-idea-527j</guid>
      <description>&lt;p&gt;I had no domain expertise, no budget for market research, and no desire to spend months learning an industry manually. So I did what developers do: I built a tool. This is a diary of using AI agents (Ollama + opencode) to automate market research - what worked, what didn't, and what I'm still figuring out. If you're a developer curious about using AI agents for something beyond toy projects—this one's for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Am I shooting myself in the foot? Did I miss an "easy" solution? Or am I on the verge of something good? These are the questions currently haunting me.&lt;/p&gt;

&lt;p&gt;I found myself in a situation where my projects were wrapped up and I was waiting for the next big task to come along.  I hate waiting.  So I decided I would build a product the industry needs.  But what product? What industry?  That's a large topic, and my local area has a number of industries.  My expertise is in software development and technology.  I want to gain expertise and knowledge in a different domain and improve that whole "what value do I bring" question.&lt;/p&gt;

&lt;p&gt;So I need help doing research.  I need to determine what industries are available, what software tools already exist for those industries, what limitations those tools have, and what the users really want from the tools - or really hate about them.  With that knowledge, I feel I can identify the "right" product to build: one that aligns my skills with a demand from an industry that will likely value the solution.&lt;/p&gt;

&lt;p&gt;That's a tall order.  Being very independent minded and budget aware, I don't really want to pay for market research, and most of the information I can find is a year or more out of date.  If only I had a tool that could do current research for me...  Wait! I'm a developer, I can build that.  And the concepts sound pretty simple.  And using an interactive AI agent like &lt;code&gt;opencode&lt;/code&gt; can make it even easier.  (I think this is what is commonly referred to as "foreshadowing" - easy is very subjective.)&lt;/p&gt;

&lt;p&gt;So I've built a system.  Or at least the start of one.  It is working, but I've identified issues.  I'm sharing here to give the foundation and background of the problem space, and to see if I can get some community feedback.&lt;/p&gt;

&lt;p&gt;I've enjoyed this project thus far as a way to research using AI for real-world tasks.  And I've learned a lot - I think the potential is awesome.  But I'll leave that assessment to you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Trying to Build (and Why)
&lt;/h2&gt;

&lt;p&gt;I want to build a tool that can research, analyze, and present useful information to me.  I'll use this information to decide what product/tool I will build, what features I must include, and what magic I can include to set the tool apart from the incumbents.  The additional information I gain can guide how I try to market the product or how to build the support mechanisms/business behind the tool.&lt;/p&gt;

&lt;p&gt;I could just hit the government statistics sites and get the list of industries, their market details, and possibly even a list of tools.  Make a choice and dive in.  But these resources are often outdated the moment they are published, and I'd be left stapling my needs on top of their use case, which is often not a good fit.  And I might be wasting time on a tool that already exists or isn't wanted.  So I want to do this right and know that I have improved my chances of success before I write the first line of code.&lt;/p&gt;

&lt;p&gt;I want to see if AI can be made to give me relatively current information - days or weeks old, not months or years.  The data-gathering step involves a lot of subjective judgment, which is where AI can shine.  Take the simple task of getting the "features" for 10 random software tools: the information cannot be guaranteed to be available at a "/features" page every time, and even if it were, it cannot be guaranteed to use the same HTML/CSS structures, or even the same tone or categorizations.  What one tool calls "user management" may be referred to as "distributed team controls" in a different tool.  So the simple act of finding the right information is a challenge for traditional discrete-logic approaches.&lt;/p&gt;

&lt;p&gt;I also want to keep as much of the processing "in house" as possible.  The use of subscription-based AI models is leading to inadvertent sharing of the secret sauce that makes a company unique (i.e. the intellectual property).  It also tightly couples the business to the subscription, which cannot be easily changed in some cases.  It is like putting your business logic into stored procedures or custom database functions.  Works great - until you need to change the database system due to licensing or end-of-life situations.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;I've chosen to handle this task using the following tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ollama + the Ministral-3:8B model.  I found this model gives me the reasoning capabilities and tool-handling abilities to work well with my current hardware and problem space.  Other models like Qwen3.5 or Nemotron-3-nano run into issues - either a lack of reasoning capability or failures in tool use.  Even then, I found I had to ensure Ollama would use a 32K context window for the models to begin working correctly.  In cases where I need just some simple reasoning and not tool use, I found the Llama3.2 model - with a 32K context - works reasonably well to give me a structured response.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://opencode.ai" rel="noopener noreferrer"&gt;opencode&lt;/a&gt;.  I use this CLI agent to handle the calls that need reasoning and tool use.  By running &lt;code&gt;opencode run "Find the features for {product}"&lt;/code&gt; I can get the websearch and webfetch tools applied without having to set up my own Playwright interpreter.  Other tools like Claude Code, GeminiCLI, etc. are reasonable candidates here.  I've tried a bunch of them and found opencode "clicks" for me much better than the others.&lt;/li&gt;
&lt;li&gt;Python - I've chosen to let the AI create Python tools.  While I could use other languages, I'm finding the models all tend to "just work" with Python, or even expect Python to be the language of choice.  And I enjoy using Python as the happy middle ground between low-level C/Rust/Go and higher-level frameworks like React/Vue/Astro, etc.&lt;/li&gt;
&lt;li&gt;SQLite - small, compact database needs.  But I'm using SQLModel and SQLAlchemy to allow upgrading to Postgres later if needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The free cloud models are great.  I think the "Zen Big Pickle" model is awesome for the tasks I'm throwing at it.  But it does have a rate limit: I can only use so much before I'm blocked and need to wait some time before I can use it again.  Because of this, I have set up a handful of "free" models I can use when needed - when I hit the rate limit, I just switch to a different free model and continue my work.  This is not really an option when you have automated code running, though.  It is not always easy to see if or when you have been rate limited, and letting code run overnight only to find out you hit the rate limits in the first hour is not a pleasant experience.  Using Ollama with a capable model removes the rate limit issue altogether.  I can do this with opencode by calling &lt;code&gt;opencode run PROMPT --model "ollama/my-model-of-choice"&lt;/code&gt;.  I do need to ensure opencode is configured to know about that model first, though.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Data Gathering
&lt;/h3&gt;

&lt;p&gt;Using opencode, I asked for a list of industries in my geographic area.  This came back with the usual high-level things like "Oil &amp;amp; Gas", "Energy", "Healthcare", etc.  So I had to ask for some expansion to get into the sub-industries.  That eventually gave me a list of 100-ish industries that served as a starting point.&lt;/p&gt;

&lt;p&gt;I then had a discussion with opencode covering what we were trying to do and planned out how we might structure a database to handle our needs.  Once we had that fairly well defined, I had opencode create the database and populate the initial industries into it.&lt;/p&gt;
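
&lt;p&gt;For illustration, here is a minimal sketch of roughly what that schema looks like.  The real project uses SQLModel/SQLAlchemy (so it can move to Postgres later); this plain-&lt;code&gt;sqlite3&lt;/code&gt; version with hypothetical table and column names just shows the shape, not the actual code opencode generated.&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

# Hypothetical approximation of the schema we discussed.
# Industries and tools are many-to-many, hence the link table.
SCHEMA = """
CREATE TABLE industry (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE tool (
    id     INTEGER PRIMARY KEY,
    name   TEXT UNIQUE NOT NULL,
    vendor TEXT
);
CREATE TABLE tool_industry (
    tool_id     INTEGER REFERENCES tool(id),
    industry_id INTEGER REFERENCES industry(id),
    PRIMARY KEY (tool_id, industry_id)
);
"""

def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def seed_industries(conn, names):
    # OR IGNORE makes re-running the discovery step idempotent
    conn.executemany(
        "INSERT OR IGNORE INTO industry (name) VALUES (?)",
        [(n,) for n in names],
    )
    conn.commit()
```
&lt;/pre&gt;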

&lt;p&gt;Next I asked opencode to create a python project that could discover the software tools used by the industries.  I suggested the interface for that tool to be via a &lt;code&gt;uv run app research discover&lt;/code&gt; command.  It would loop over the industry records and run &lt;code&gt;opencode run PROMPT --model MODEL&lt;/code&gt; for each, where the Prompt was the specific instructions needed.&lt;/p&gt;
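
&lt;p&gt;The core of that loop is just shelling out to opencode.  A minimal sketch - the model id and prompt wording here are placeholders, not the project's exact ones:&lt;/p&gt;

&lt;pre&gt;
```python
import subprocess

MODEL = "ollama/ministral-3:8b"  # assumed model id; must match your opencode config

def build_cmd(prompt, model=MODEL):
    # Non-interactive invocation; --model selects the local Ollama model
    return ["opencode", "run", prompt, "--model", model]

def research_industry(industry):
    # Placeholder prompt; the real one carries much more instruction
    prompt = f"List software tools used in the {industry} industry."
    cmd = build_cmd(prompt)
    # In the real loop this is executed per industry record:
    # result = subprocess.run(cmd, capture_output=True, text=True, timeout=600)
    return cmd
```
&lt;/pre&gt;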

&lt;p&gt;&lt;strong&gt;Snag #1&lt;/strong&gt;.  The code worked and it would try to find the software tools.  But it was trying to find ALL of them in one step, which led to timeouts.  I resolved this by only getting 10 tools at a time and altering the prompt to skip those we already know about.  I could run the command a few times to fill in the remaining tools, allowing the database to become more complete with each run.&lt;/p&gt;

&lt;p&gt;Now I had a list of software tools (currently sitting at approximately 1400).  Each tool is tied to the industry (or industries) that uses it.  This was a good starting point.  If nothing else, it gave me a place to start manual research if needed.&lt;/p&gt;

&lt;p&gt;Next I asked opencode to add another command.  This time we wanted to find the features, pricing, and complaints for each tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snag #2&lt;/strong&gt;.  The process worked, but could take 10 minutes or more per tool.  This just seemed wrong, as I know I could simply load the tool's web page and cut/paste the details in a minute or two.  Digital is supposed to be faster than manual.  After digging into this I realized what was happening.  Each "tool" process would result in many different web pages being loaded, interpreted, and then fit into the structured output we wanted.  Each page load takes a little time and may or may not result in useful data.  The solution was to separate the tasks.  Instead of asking "tell me about {tool}", I broke the requests into "What are the features for {tool}?", "What is the pricing for {tool}?", and "What are the complaints for {tool}?".  So now I had commands like &lt;code&gt;uv run app research features&lt;/code&gt;.  This led to the next snag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Snag #3&lt;/strong&gt;.  This was a structural issue: one big loop doing work for every tool in every industry, and if the process stopped in the middle we had to start from the beginning again.  To solve this I introduced a "queue" system.  Each "task" could only work with one tool, and had to be marked "completed" or "error" accordingly.  Restarting the research process would then always start at the next "pending" task.  Then I created a &lt;code&gt;uv run app research run --limit 10&lt;/code&gt; command.  This would process the next 10 tasks in the queue.  If I set the limit to something large like &lt;code&gt;999999&lt;/code&gt;, the entire queue would run.&lt;/p&gt;
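
&lt;p&gt;A minimal sketch of that queue idea, using plain &lt;code&gt;sqlite3&lt;/code&gt; and invented column names (the real project goes through SQLModel):&lt;/p&gt;

&lt;pre&gt;
```python
import sqlite3

def make_queue(conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS task (
        id        INTEGER PRIMARY KEY,
        tool_name TEXT NOT NULL,
        kind      TEXT NOT NULL,              -- features / pricing / complaints
        status    TEXT NOT NULL DEFAULT 'pending'
    )""")

def enqueue(conn, tool_name, kind):
    conn.execute("INSERT INTO task (tool_name, kind) VALUES (?, ?)",
                 (tool_name, kind))

def run_queue(conn, worker, limit=10):
    # Process up to `limit` pending tasks.  A crash leaves the rest
    # pending, so restarting simply resumes at the next pending task.
    rows = conn.execute(
        "SELECT id, tool_name, kind FROM task WHERE status = 'pending' LIMIT ?",
        (limit,)).fetchall()
    for task_id, tool_name, kind in rows:
        try:
            worker(tool_name, kind)   # e.g. shells out to opencode
            status = "completed"
        except Exception:
            status = "error"
        conn.execute("UPDATE task SET status = ? WHERE id = ?",
                     (status, task_id))
    conn.commit()
    return len(rows)
```
&lt;/pre&gt;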

&lt;p&gt;It took a day or two to run through everything, but afterwards I had approximately 27,000 features, 7,500 complaints, and 2,000 pricing entries.&lt;/p&gt;

&lt;p&gt;The only real performance item left is getting the features. My prompt for this is currently:&lt;/p&gt;

&lt;pre&gt;
IMPORTANT: You MUST use web search and fetch tools to find features. Search the tool's website, or other sources. DO NOT search local files - they contain no relevant data.

List the key features and capabilities of "{tool_name}"{industry_context}.

Provide ONLY the features list:
{{
  "features": ["feature1", "feature2", "feature3"]
}}

List as many features as you can find.
&lt;/pre&gt;

&lt;p&gt;This is still a broad area and results in a lot of websearch/webfetch steps.  But when a run succeeds, it only takes a minute or two per tool.  There are a number of failures, though, that end up as timeouts if not actual errors.  But this was expected - some websites use anti-bot techniques, some require authorization, and some just don't give you details - the amazingly helpful "contact for more details" statement is SOO useful.  Luckily that is mostly on the pricing side.&lt;/p&gt;
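
&lt;p&gt;One wrinkle with this prompt: models often wrap the JSON in prose or code fences, so the response needs a tolerant parser.  A hypothetical helper along these lines (not the project's actual parser):&lt;/p&gt;

&lt;pre&gt;
```python
import json
import re

def extract_features(raw):
    # Grab the outermost braces and try to parse from there;
    # anything unparseable just yields an empty list.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    feats = data.get("features", [])
    # Drop blanks and stray whitespace the model sometimes emits
    return [f.strip() for f in feats if isinstance(f, str) and f.strip()]
```
&lt;/pre&gt;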

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt; let me know if you can see a way to improve this prompt or get the current feature list for a tool in an easier fashion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Analysis
&lt;/h3&gt;

&lt;p&gt;Now that we have some basic information, we can begin to analyze the results.&lt;/p&gt;

&lt;p&gt;The first issue is the different ways to say the same things - especially for the features and complaints.  I have attempted to use a sampling of these records to generate categories, and then fit every entry into one of the categories.  This helps but suffers from a bias.  Seeing as the bulk of the tools I have found thus far are in the "Oil &amp;amp; Gas" and "Agricultural" industries, I end up with categories like "Blow out management" or "Livestock management".  I'm experimenting with different ways to get refined categories.  I might need to go down the path of doing multiple passes on the data: generating a category for each record, de-duplicating the categories, finding the top X list of categories, and then revisiting the records to fit them into the smaller list.  But there is a balance needed here - perhaps "Livestock Management" is an important category in some industries.  Which suggests that maybe I need categories per industry...  Stuff to explore.&lt;/p&gt;
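
&lt;p&gt;The first pass of that multi-pass idea could look something like this - a rough sketch, not the project's actual code:&lt;/p&gt;

&lt;pre&gt;
```python
from collections import Counter

def normalize(cat):
    # Collapse case and whitespace so near-duplicates merge
    return " ".join(cat.lower().split())

def top_categories(raw_categories, top_n=20):
    # Count the de-duplicated categories and keep the top N.
    # A second pass would then re-fit every record into this
    # smaller list (possibly per industry).
    counts = Counter(normalize(c) for c in raw_categories)
    return [name for name, _ in counts.most_common(top_n)]
```
&lt;/pre&gt;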

&lt;p&gt;What I'm really looking for here are things like "user management", "role-based access control lists", "regulatory compliance with OSHA", etc.  This tells me what I need to build into my future product beyond the obvious "user management" items.&lt;/p&gt;

&lt;p&gt;In the process of working on this I realized that I did not need reasoning and tool-calling capabilities here.  The simple categorization process could be handled by a much smaller LLM - like llama3.2.  Doing so reduced the categorization process to a few seconds per tool, versus many minutes when using the more capable model.&lt;/p&gt;
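
&lt;p&gt;Since no tools are needed, the categorization call can go straight to Ollama's &lt;code&gt;/api/generate&lt;/code&gt; endpoint rather than through opencode.  A sketch - the prompt wording is a placeholder, and the transport is injectable so it can be faked in tests:&lt;/p&gt;

&lt;pre&gt;
```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def categorize(entry, categories, post=None):
    # llama3.2 only needs to pick a label, so no tool calling required
    prompt = (
        "Pick the single best category for this product feature.\n"
        f"Feature: {entry}\n"
        f"Categories: {', '.join(categories)}\n"
        "Answer with the category name only."
    )
    payload = {"model": "llama3.2", "prompt": prompt, "stream": False}
    if post is None:
        post = _http_post  # real HTTP call; swapped out in tests
    return post(payload).strip()

def _http_post(payload):
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```
&lt;/pre&gt;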

&lt;p&gt;Knowing I would need to revisit this exercise in refining my categories, I forged on.&lt;/p&gt;

&lt;p&gt;Next I asked it for some simple charts.  I ended up with feature-frequency and feature-heatmap charts.  These are very interesting, but will be more useful once the categories are refined to a reasonably sized list.  Presenting a chart with hundreds of columns/rows is noisy.&lt;/p&gt;

&lt;p&gt;I also asked our analysis process to give me a list of industry reports and tool reports.  This shows promise but is not especially helpful yet.  These reports are simple markdown documents and are intended to provide the details a decision maker would need.&lt;/p&gt;

&lt;p&gt;This is an example industry report in its current form:&lt;/p&gt;

&lt;pre&gt;
```markdown
# Accounting &amp;amp; Bookkeeping - Market Analysis Report

## Overview

- **Tools Available**: 10
- **Feature Categories**: 7
- **Complaint Categories**: 9

## Top Tools

| Tool | Vendor | Features | Complaints |
|------|--------|----------|------------|
| Tipalti | Tipalti Inc. | 79 | 12 |
| Sage Accounting | N/A | 61 | 20 |
| QuickBooks Online Advanced | Intuit | 29 | 9 |
| Deel | Deel, Inc. | 56 | 9 |
...

## Common Features (by category)

| Category | Count |
|----------|-------|
| Automation and Integration | 162 (38.2%) |
| Data Management | 82 (19.3%) |
|...

## Common Pain Points (by category)

| Category | Count |
|----------|-------|
| Pricing and Cost | 21 (15.9%) |
| User Experience and Interface | 16 (12.1%) |
| Technical Issues and Bugs | 16 (12.1%) |
...

## Visual Analysis

![Feature Distribution](../overview/industry_feature_heatmap.png)
![Complaint Distribution](../overview/industry_complaint_heatmap.png)

---

*Report generated on 2026-03-30*

```
&lt;/pre&gt;

&lt;p&gt;This needs more detail: the industry market size, the number of businesses in the industry, etc. - the usual data you would need for a more complete assessment of the industry.  That requires data we have not collected yet, so we'll have to revise the industry research process to gather it.&lt;/p&gt;

&lt;p&gt;Presenting the information via charts would probably be helpful too.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: I need to be careful here - I don't want to post one of the sample tool reports, as it could inadvertently be defamatory to the tool, possibly with incorrect information, and I don't think I'm in a position to publicise anything yet with any authority.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The tool reports have a similar format to the industry reports.  I'll have to revise how these are generated - one file per tool adds up to tens of thousands of files in one directory.  This might need to be on-demand generation instead of a bulk process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recommendations
&lt;/h3&gt;

&lt;p&gt;I have not automated the recommendations side of things yet.  I feel I can't do that until the reports are being generated better.  But what I have thus far is showing some value, in that I can get a feeling for just how saturated an industry is, or whether the nature of the tools an industry needs is within my capabilities.&lt;/p&gt;

&lt;p&gt;With that said, I do want to get to a point where my solution can do an automatic analysis - perhaps a SWOT or similar - and suggest the top 20 products that might be reasonable to undertake.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Worked
&lt;/h2&gt;

&lt;p&gt;I have a list of tools.  This list is incomplete, but grows each time I run a discovery process.  That discovery process does not use a subscription-based LLM.  Other than requesting public web pages, everything about the research is performed on my local computer.&lt;/p&gt;

&lt;p&gt;I have some details gathered for each tool.  This is improving on each run as well.  I can manually review and assess those tools now, without having to go visit each one directly.&lt;/p&gt;

&lt;p&gt;Using AI via opencode is working well, both as a development tool and as a part of the developed process.  It is interesting using it this way.  An experienced developer can curate what is getting built and probably be successful.  At the same time, a newer developer - or even someone with zero development experience - can build applications as well.  The second case leans more into the vibe-coding approach, whereas the experienced developer is having the computer do the typing for them - faster, but still managed and personal.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Needs Work
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Code cleanup.  I don't think the code is in the "slop" category (yet).  But I do need to reorganize it.  The list of &lt;code&gt;uv run app ...&lt;/code&gt; commands is getting excessive.
&lt;/li&gt;
&lt;li&gt;A web interface might be useful here.  Even if it is just on browsing the data.&lt;/li&gt;
&lt;li&gt;Data gathering improvements.  We need some data we are not currently collecting, and how we collect the current data could be improved.&lt;/li&gt;
&lt;li&gt;Analysis improvements.  The analysis is not complete.  We need some more data collected.  We need a better classification system - probably a more complete hierarchical taxonomy system.&lt;/li&gt;
&lt;li&gt;I've recently learned about external resources like &lt;a href="https://www.kaggle.com/" rel="noopener noreferrer"&gt;Kaggle&lt;/a&gt;.  We may be able to utilize these to improve the data acquisition and analysis phases.&lt;/li&gt;
&lt;li&gt;Better reporting.  I'd like the reports to be useful.  If you are a founder, or decision maker, the reports should give you the information needed so you can make informed choices about what directions you decide to go, or avoid.&lt;/li&gt;
&lt;li&gt;Recommendations.  This is more for me than anything.  But it would be awesome if I can get the system to give me a list of 10 or 20 possible new applications/tools that would be well received by the industry.  Then I can perhaps build a business around that project.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But this all comes down to time, focus, and perhaps money.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;"Simple idea, complex project" - The concept is simple - just get a list and compare it.  The details of doing so, especially if you want CURRENT information, is often difficult and has a number of hurdles.  Massaging that data into a form you can work with is a complete task unto itself.  And generating useful reports or recommendations will be an ongoing process of improvements.  &lt;/p&gt;




&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Can I speed up the data gathering steps?  Especially with regards to tool features.&lt;/li&gt;
&lt;li&gt;I need to research different charting techniques to provide useful information.  The data I have suggests more nuanced comparisons can be done - perhaps correlating specific features with complaints and pricing, thereby identifying the "useless" features or the "oh my god I really want this to work right" features.  So far, the frequency bar charts and heat-map charts are useful, but seem to miss a lot too.&lt;/li&gt;
&lt;li&gt;Would it be useful to set up a "changes over time" system here?  It seems like it wouldn't really apply to what I'm trying to do, but I can see how concepts like "up and coming" features versus "diminishing importance" features could be handy.  An over-time element could help with that analysis, but introduces a lot of complexity.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I'm at an inflection point.  I know I need to gather more data, and I need some different data.  I find myself questioning whether the database structure I have is sufficient for what I'm trying to do - and also whether "what I'm trying to do" has changed since I started the project.  That is more of a personal and philosophical question.  When I can answer it, I'll know if I need to revisit the structures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;This is ongoing.&lt;/p&gt;

&lt;p&gt;The interface needs to be streamlined a little.  I'd like to be able to say "start" and have it just run, collecting data and improving the dataset over time.  That is not the case at the moment.  I have to manually indicate that I'd like to discover more industries/tools; then, when those are done, I need to indicate that it should research the tools we don't have data for yet; then run the analysis steps individually; and so on.  This can all be made more automatic with a little effort.&lt;/p&gt;

</description>
      <category>opencode</category>
      <category>python</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
