Hey everyone,
I wanted to share a little side project I cooked up over the last week. Long story short, I only started diving into the LLM world in February, and honestly, it’s been a wild ride. I started with LM Studio, but as many of you know, by the time you get comfortable with one tool, a new "insane" feature post drops on r/LocalLLaMA and the software is already playing catch-up. I eventually settled on plain llama.cpp because it seems to be the gold standard, but I kept hitting a wall: the release cycle is so fast that updating by hand feels clunky. There's no bundled updater, which hurts most with those juicy new beta builds that get released so often.
So, about a week ago, while watching The Wire (ADHD at its finest), I had a thought: why isn't there an nvm, but for llama.cpp?
Coming from the Node.js world, I was missing the simplicity of nvm, so I wanted something that lets me swap, install, uninstall and manage versions on the fly without a headache. So, alongside Claude and my local Qwen 35B (mostly Qwen), I decided to "vibe code" it into existence (I can't believe I'm using this term). The models suggested Go (since it's great for CLI tools), and even though I don't actually know how to write a single line of Go, we made it work.
The gist:
It’s a lightweight version manager that handles the heavy lifting for you. Instead of hunting GitHub releases, you just do:
- lvm install latest (gets the right build for your GPU)
- lvm use (switches the active version; there's a selection prompt)
- lvm ls (see what you've got installed)
It uses "shims" to make sure commands like llama-cli or llama-server always point to whatever version you currently have selected as active. So no more manual PATH hacking every time a new build drops. Now, I understand that many people use docker to create containers of different versions and whatnot, but I wanted something simpler for the regular guy.
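For anyone wondering what a shim looks like in practice, here is a minimal Go sketch of the general idea. To be clear, this is not lvm's actual code: the LVM_SKETCH_ROOT variable, the "active" file and the versions/<version>/<tool> layout are all assumptions made up for illustration. The shim works out which tool it was invoked as, looks up the active version, and runs the real binary with the same arguments.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// targetPath builds the path of the real binary for a given version.
// Assumed (hypothetical) layout: <root>/versions/<version>/<tool>
func targetPath(root, version, tool string) string {
	return filepath.Join(root, "versions", version, tool)
}

// activeVersion reads the selected version from <root>/active.
// The "active" file is an assumption of this sketch.
func activeVersion(root string) (string, error) {
	data, err := os.ReadFile(filepath.Join(root, "active"))
	if err != nil {
		return "", err
	}
	v := strings.TrimSpace(string(data))
	if v == "" {
		return "", errors.New("no active version selected")
	}
	return v, nil
}

func main() {
	root := os.Getenv("LVM_SKETCH_ROOT") // hypothetical variable name
	// The shim is copied or linked once per tool, so the name it was
	// invoked as (e.g. llama-server) tells it which binary to run.
	tool := filepath.Base(os.Args[0])

	version, err := activeVersion(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "shim:", err)
		os.Exit(1)
	}

	// Run the real binary with the original arguments and stdio.
	cmd := exec.Command(targetPath(root, version, tool), os.Args[1:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		// Pass the child's exit code through when there is one.
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) {
			os.Exit(exitErr.ExitCode())
		}
		fmt.Fprintln(os.Stderr, "shim:", err)
		os.Exit(1)
	}
}
```

With a layout like this, switching versions only means rewriting one small file; the shim directory stays on PATH and never changes.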
Disclaimer:
This is a "vibe code" project. It took me about a week, and while it works surprisingly well for what I need, I am definitely not a Go developer. There are edge cases to polish, more testing to do, and things I probably overlooked because I don't know the language deeply. I don't want to spend too much time on this, but I wanted to contribute something small back to the community, at least for the time being. If there are any Go wizards out there who see potential in this, please grab it! Star it, Fork it, fix the bugs, polish the edge cases; help me turn this from a "fun experiment" into a polished tool.
Check out the repo here: https://github.com/asertym/lvm
I’d love to hear what you guys think. Is this something that would actually make your workflow smoother, or am I overthinking a problem that doesn't exist? And again, if anyone who actually knows Go wants to take the reins and turn this into something robust, I would be incredibly stoked.
Let me know your thoughts!
Top comments (8)
Version managers are weirdly the most underrated category of dev tool — nobody writes blog posts about them until they exist, then nobody can imagine working without them. llama.cpp specifically benefits because the perf cliff between releases is real; a quant kernel that flies on b3145 can regress hard two weeks later.
One thing I'd push on: have you thought about pinning per-project rather than just system-wide?
nvm-style .llamacpp-version files in a repo would let inference benchmarks stay reproducible across collaborators, which is the single biggest pain I hit when teammates try to repro a perf number from a notebook. The other adjacent win is exposing the build flags (CUDA vs Metal vs CPU-only) as part of the version identity: same source SHA, different artifact. Curious if your manager treats build variants as separate "versions" or folds them into one.
Hi Max, thanks for leaving a comment.
I was actually planning to do it like that initially, but I didn't see how it would be useful. It wouldn't be a problem to implement; it's more a question of how useful it would be. I'm just not too familiar with that use case.
The manager basically works by pulling the release from the GitHub API, which lists all the release archives, e.g.:
llama-b9430-bin-ubuntu-vulkan-x64.tar.gz
llama-b9430-bin-win-cpu-x64.zip
llama-b9430-bin-win-vulkan-x64.zip
Since the ggml team does such a good job of naming these archives, we can find the right llama build by splitting the file name so we get an array.
We know the second item in the array is the version, the 4th is the OS, the 5th is the backend, and the 6th is the architecture (with the file extension attached).
This is how it chooses the right variant.
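The name-splitting idea can be sketched in Go roughly like this. Again, this is not lvm's actual code: Asset, parseAssetName and pickAsset are names made up for illustration, and splitting on "-" is an assumption based on the file names listed above. The sketch only understands the six-part names shown there and treats anything else as an error.

```go
package main

import (
	"fmt"
	"strings"
)

// Asset holds the fields recovered from a release archive name.
type Asset struct {
	Version string // e.g. "b9430"
	OS      string // e.g. "ubuntu", "win"
	Backend string // e.g. "vulkan", "cpu"
	Arch    string // e.g. "x64"
}

// parseAssetName splits names shaped like
// "llama-b9430-bin-win-vulkan-x64.zip" into their parts.
func parseAssetName(name string) (Asset, error) {
	// Drop the archive extension so the last field is just the arch.
	base := strings.TrimSuffix(strings.TrimSuffix(name, ".tar.gz"), ".zip")
	parts := strings.Split(base, "-")
	if len(parts) != 6 || parts[0] != "llama" || parts[2] != "bin" {
		return Asset{}, fmt.Errorf("unexpected asset name: %q", name)
	}
	return Asset{
		Version: parts[1], // 2nd item
		OS:      parts[3], // 4th item
		Backend: parts[4], // 5th item
		Arch:    parts[5], // 6th item
	}, nil
}

// pickAsset returns the first name matching the wanted OS, backend and
// architecture, skipping names it cannot parse.
func pickAsset(names []string, goos, backend, arch string) (string, bool) {
	for _, n := range names {
		a, err := parseAssetName(n)
		if err != nil {
			continue
		}
		if a.OS == goos && a.Backend == backend && a.Arch == arch {
			return n, true
		}
	}
	return "", false
}

func main() {
	names := []string{
		"llama-b9430-bin-ubuntu-vulkan-x64.tar.gz",
		"llama-b9430-bin-win-cpu-x64.zip",
		"llama-b9430-bin-win-vulkan-x64.zip",
	}
	if name, ok := pickAsset(names, "win", "vulkan", "x64"); ok {
		fmt.Println("would download:", name)
	}
}
```

Checking the field count up front means an archive with a different naming shape gets skipped instead of being silently matched against the wrong fields.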
Using Go and shims to build a version manager for llama.cpp feels incredibly nostalgic—it’s got that pure open-source geek energy.
Using a local Qwen 35B to execute a cross-language CLI tool without knowing the syntax is a brilliant example of modern "walking & coding." This kind of pure creativity—ignoring low-level syntax constraints to focus entirely on architectural pain points—is the true engine driving the open-source community forward!
I feel this. I went through the same pipeline — LM Studio → Ollama → raw llama.cpp — and the update cycle is genuinely a pain. An nvm-style version manager is one of those ideas that seems obvious in hindsight. Nice work!
Appreciate the comment, would love to hear more feedback on the tool so I can upgrade it further. I've only tested on my Windows PC and WSL, which seemed to work well. Feel free to share this tool with the community if you find it useful! Thanks again!
The LM Studio → Ollama pipeline is almost a rite of passage at this point. What finally made you switch? For me it was the OpenAI-compatible API — being able to swap models without touching the integration code saved a ton of time.
Appreciate you taking the time to reply! Happy to test on Linux/AMD too once I get a chance. Followed you 🚀
llama.cpp version-manager is the kind of niche tool the dev community needs but no one wants to maintain - perfect vibe-coded niche. honest q: how many edge cases did the AI fully resolve vs needed manual cleanup? if u ever want to wrap this as a saas (paid tier + analytics + auth), ill send a free moonshift run - 1 prompt -> wired stack into ur own gh + vercel. $3 per ship after.