
SomeOddCodeGuy

Originally published at someoddcodeguy.dev

My Next Steps with Wilmer

When I first started Wilmer, it was for a very specific reason: I wanted a semantic router, and one didn't yet exist. The routers available at the time were all designed to take the last message, categorize it, and route based on that. I needed more, though; what if the last message was "ok"? How do you route that?
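To make the "ok" problem concrete, here's a minimal sketch (not Wilmer's actual code; the domains and keyword classifier are hypothetical toys) of last-message routing versus routing on a window of recent turns:

```python
def classify(text: str) -> str:
    """Toy domain classifier: keyword match, falling back to 'general'."""
    keywords = {
        "coding": ["python", "bug", "function", "compile"],
        "medical": ["symptom", "diagnosis", "dosage"],
    }
    lowered = text.lower()
    for domain, words in keywords.items():
        if any(w in lowered for w in words):
            return domain
    return "general"

def route_last_message(messages: list[str]) -> str:
    """Naive router: looks only at the final message."""
    return classify(messages[-1])

def route_with_context(messages: list[str], window: int = 3) -> str:
    """Context-aware router: classifies the last few turns together."""
    return classify(" ".join(messages[-window:]))

conversation = [
    "My Python function throws a TypeError, can you help?",
    "Sure, try checking the argument types.",
    "ok",  # last-message routing has nothing to go on here
]

print(route_last_message(conversation))   # -> general
print(route_with_context(conversation))   # -> coding
```

The naive router shrugs and says "general" because "ok" carries no signal; widening the window recovers the coding context from earlier in the conversation.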

At the time, Llama 2 had been out almost half a year, and there were so many finetunes: everything from coding to medical to language-specific. Compared to the big proprietary models, these little models didn't stand a ghost of a chance; but when you got into their specific domain? They at least fared better. This made me think: "What if a bunch of these finetuned models were to compete, together, against the proprietary giants like ChatGPT 4?" Combine that with managing speed by sending less important requests to smaller and weaker models, and you get the perfect combination to take on Goliath.

Somewhere in the first couple of months, workflows got added in. They were the result of a suggestion my wife made: let folks modify the prompts. Somehow that turned into chaining prompts, and before I knew it Wilmer was becoming as much workflow-oriented as router-oriented.
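The jump from "modify the prompts" to "chain the prompts" is small in code. A hypothetical sketch (not Wilmer's real workflow engine; `run_workflow` and the echo stand-in are illustrative only), where each step's prompt template receives the previous step's output:

```python
def run_workflow(steps: list[str], user_input: str, llm) -> str:
    """Run prompt templates in sequence, piping each output to the next."""
    text = user_input
    for template in steps:
        text = llm(template.format(input=text))
    return text

# Stand-in LLM for demonstration: simply echoes the prompt it received.
echo_llm = lambda prompt: prompt

steps = [
    "Summarize: {input}",
    "Translate to French: {input}",
]

result = run_workflow(steps, "a long article", echo_llm)
print(result)  # -> Translate to French: Summarize: a long article
```

Swap the echo for real model calls (possibly a different model per step) and you have the skeleton of a prompt-chaining workflow.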

Perception is Everything

Early on, folks really struggled to see the value in the workflow side of it. n8n existed, and folks used it for targeted tasks, but using workflows to improve chatbot functionality? They were far more interested in either speed (meaning no overhead), or trying every agent they could get their hands on.

One of the most common things I remember hearing, when I'd say a great response was worth waiting 3-5 minutes for, was "I'm not waiting that long". Fair enough.

Eventually, that tune changed. Reasoning models came out and trained everyone to wait for their responses; suddenly waiting minutes for a really strong response, like Gemini's Deep Think, became a worthwhile endeavor.

I still think n8n's explosion of popularity specifically came from the introduction of reasoning models, and folks being willing to wait longer for quality.

And Then Proprietary Catches Up

When ChatGPT 5 came out, the big talking point around it was that it used intelligent semantic routing to send the prompts to the appropriate model, allowing them to mix fast and slow/powerful models for the perfect experience.

... why does that sound so familiar? =D

Three days ago, OpenAI unveiled yet another amazing feature: workflows =D Their AgentKit, which apparently killed off a few thousand startups the day it released lol.

With that, proprietary models have pretty much consumed the existing purpose of Wilmer; and as you can imagine, a multi-billion dollar mega-corporation full of the best and brightest this country has to offer will produce a far more impressive product than a dev manager tinkering on the weekends =D

But you know what? That's ok.

Wilmer's Purpose

Wilmer was always about local. It will always be about local. Wilmer is about you being able to take a computer completely off of the internet, have one or many different LLMs on your network, and continue to do work and get the best quality you can out of that setup. Wilmer was never about trying to compete with proprietary models and tooling on their own home turf.

For as long as you can't load up ChatGPT's AgentKit or run ChatGPT 5 on a laptop in airplane mode, Wilmer has a purpose.

Plus, at the end of the day, this bad boy is a passion project. I make it because I use it. I improve it because I want to do more things with it. I won't stop because, realistically- it's just fun.

The Roadmap

There's still a lot I want to do. The offline wiki API was never meant to be the only offline search available to users. There are amazing datasets on Hugging Face for everything from coding questions to medical questions and beyond. I want to add them all in. I spent the past 6 months doing a massive refactor of the codebase to make it easier to work in, cleaner, and to add high code coverage unit testing (it's up to 91%!).

I've added some proprietary model support (Claude support incoming), and I'm working to swap out Flask for something better. I plan to make this thing more friendly for multi-user use, and keep heading in the direction of weird stuff like recursive workflows and the like.

I've got some cool ideas for use-cases around it too that I want to try out, and make more videos on... but if we're being honest, I suck at making videos. lol.

So Yea...

I'm not stopping yet. Not by a long shot. Wilmer, and the future projects I'll create, aren't being designed to be hyper-popular or anything like that. But they have a niche, a specific goal, and I'll keep working within that niche for as long as I can.
