Built an open-source picker that recommends the right self-hosted LLM for your hardware

#opensource #ai #localai #llm

Built this because every "which LLM should I self-host on my [hardware]"
thread ends with "depends" without anyone actually doing the math.

You tell it:

Platform (NVIDIA, AMD, Apple Silicon, Intel Arc, CPU-only)
Available VRAM or unified memory
Use case (chat, code, long-context, math)
License preference (any vs permissive-only)

You get a ranked list of open-weight models that actually fit in your
memory budget with a 15% safety margin, the right GGUF quantization picked
automatically, and copy-paste install commands for Ollama or llama.cpp.
The picker runs entirely in your browser; nothing is sent to a server.
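
To make the "fits in your memory budget" check concrete, here is a minimal sketch of how such a fit test could work. It is not the picker's actual code: the bits-per-weight figures are rough approximations for common GGUF quantizations, and `OVERHEAD_GB`, `estimate_gb` and `pick_quant` are hypothetical names and values chosen for illustration. Only the 15% safety margin comes from the description above.

```python
# Rough bits per weight for common GGUF quantizations, highest quality first.
# These are approximations for illustration, not the picker's real table.
QUANT_BITS = [
    ("Q8_0", 8.5),
    ("Q6_K", 6.6),
    ("Q5_K_M", 5.7),
    ("Q4_K_M", 4.85),
]

SAFETY_MARGIN = 0.15  # from the post: keep 15% of memory free
OVERHEAD_GB = 1.0     # assumed flat allowance for KV cache and runtime buffers


def estimate_gb(params_billion, bits_per_weight):
    """Estimate memory use in GB: weights plus the assumed flat overhead."""
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + OVERHEAD_GB


def pick_quant(params_billion, memory_gb):
    """Return the highest-quality quantization that fits, or None."""
    budget_gb = memory_gb * (1 - SAFETY_MARGIN)
    for name, bits in QUANT_BITS:
        if estimate_gb(params_billion, bits) <= budget_gb:
            return name
    return None


if __name__ == "__main__":
    for memory_gb in (6, 8, 12):
        print(memory_gb, "GB ->", pick_quant(8, memory_gb))
```

Under these assumptions, an 8B model gets Q8_0 on a 12 GB card, drops to Q5_K_M on 8 GB, and does not fit at all on 6 GB. A real estimate would also scale the KV cache with context length rather than using a flat allowance.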

The site also has:

A curated model directory with explicit, colour-coded license labels (permissive / open-weight / non-commercial)
Three install guides: Ollama, llama.cpp, and LM Studio
A glossary in plain English for newcomers
A live trending section from Hugging Face, refreshed weekly via a GitHub Action that commits the snapshot back to the repo (full diff history in git)

Source code is MIT, content is CC BY 4.0. No accounts, no analytics,
no ads, no affiliate links.

Picker: https://runlocal.blog/picker
Site: https://runlocal.blog

Feedback welcome, especially on the memory estimates and the picker
scoring formula (downloads + likes + recency, weighted). If a model
you'd want is missing from the catalog, drop the name in the comments.
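
For anyone who wants to comment on the scoring, here is one way a weighted "downloads + likes + recency" score could be put together. This is only a sketch for discussion: the weights, the log scaling, the half-life and the function names are all assumptions made for this example, not the formula the picker uses.

```python
import math

# Hypothetical weights and half-life; the real values are not stated here.
WEIGHTS = {"downloads": 0.5, "likes": 0.3, "recency": 0.2}
HALF_LIFE_DAYS = 180


def score(downloads, likes, days_since_update):
    """Combine popularity and freshness into a single ranking score."""
    # Log scaling stops a few hugely popular models from dominating.
    downloads_term = math.log10(1 + downloads)
    likes_term = math.log10(1 + likes)
    # Recency decays from 1.0 and halves every HALF_LIFE_DAYS.
    recency_term = 0.5 ** (days_since_update / HALF_LIFE_DAYS)
    return (
        WEIGHTS["downloads"] * downloads_term
        + WEIGHTS["likes"] * likes_term
        + WEIGHTS["recency"] * recency_term
    )


def rank(models):
    """Sort (name, downloads, likes, days_since_update) tuples, best first."""
    return sorted(models, key=lambda m: score(m[1], m[2], m[3]), reverse=True)
```

With this shape, a model with more downloads or a more recent update always scores higher when everything else is equal. The log scaling and the half-life decay are the two choices most worth challenging.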

Top comments (2)

Harjot Singh • May 31

A picker that matches a self-hosted LLM to your hardware solves a real friction point, because the gap between "this model is good" and "this model runs well on my box" is where most local-LLM attempts stall: people grab something too big, hit OOM or single-digit tokens/sec, and give up. Encoding the hardware-to-model fit (VRAM, quantization, context length, realistic throughput) into a tool removes a lot of trial and error and meets people where their machine actually is. The framing I'd add is that the right model is task-dependent as well as hardware-dependent. The picker answers "what can I run?", and the next question is "what should I run for this job?", since the winning architecture is usually routing: the local model handles the cheap, private, latency-tolerant majority, and you escalate the genuinely hard slice to a frontier API. So a picker that also nudged toward "use it for these kinds of tasks, not those" would be even more powerful. The detail that makes recommendations trustworthy is grounding them in measured throughput on real hardware, not spec-sheet guesses, because tokens/sec at a usable context length is what decides whether a model is actually usable. Match the model to the hardware and the task, grounded in real benchmarks. That right-size-to-the-machine-and-the-job instinct is core to how I think about cost in Moonshift. Are the recommendations driven by measured benchmarks per device, or computed from VRAM/spec heuristics?

Harjot Singh • Jun 1

This is a solid tool for anyone diving into self-hosted LLMs. The way you've streamlined the recommendations based on hardware and use case is super helpful. At Moonshift, we build apps quickly too: you get a full Next.js + Postgres + auth setup deployed in about 7 minutes, and you keep the code on your GitHub. Let me know if you want to give it a shot for free.