Most local LLM discussions start with the wrong question:
"What is the biggest model I can run?"
But a better question is:
"Which model actually makes sense for my CPU, RAM, GPU, and use case?"
That is why whichllm caught my attention today.
Repository: https://github.com/Andyyyy64/whichllm
License: MIT
Language: Python
Python requirement: 3.11+
Package status: available on PyPI, according to the project README
What it does
whichllm is a command-line tool that recommends local LLMs based on your actual hardware.
According to its README, it can:
- detect GPU / CPU / RAM;
- rank models from Hugging Face that fit your system;
- simulate a GPU before you buy hardware;
- compare upgrade candidates;
- find the GPU needed for a target model;
- output JSON for scripts;
- generate copy-paste Python snippets.
The quick start is simple:
```bash
uvx whichllm@latest
```
You can also simulate a GPU:
```bash
uvx whichllm@latest --gpu "RTX 4090"
```
And if you use it often:
```bash
uv tool install whichllm
```
Why this is useful
The local LLM space is noisy.
People compare models by parameter count, benchmark screenshots, leaderboard rankings, and GPU flexing. But for a solo developer or small team, the practical questions are much more boring:
- Will it fit on my machine?
- Will it be painfully slow?
- Is a smaller but newer model actually better?
- Should I upgrade my GPU, or pick a different model?
- Can I automate this decision in a script?
whichllm turns that into a CLI workflow.
That matters because local AI is not just about owning the biggest model. It is about matching the model to the machine.
The interesting idea: fit is not enough
A simple hardware checker would say:
"This model fits your VRAM."
But whichllm aims to rank models using hardware fit plus model quality signals and recency-aware benchmarks.
That distinction is important.
A bigger model that barely fits may not be the best choice. A smaller or newer model may be faster, more practical, and good enough for daily coding, writing, search, or agent tasks.
This is especially relevant for one-person AI workflows. If your machine is limited, every wrong model choice costs time.
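To make the difference between "it fits" and "it is worth running" concrete, here is a toy comparison. This is not whichllm's ranking algorithm. The memory estimate is a common rule of thumb (parameters × bits per weight / 8, plus an assumed ~20% for KV cache and runtime buffers). The quality numbers, model names, and scoring weights are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    params_billion: float
    bits_per_weight: float  # e.g. about 4.5 for a typical 4-bit quantization
    quality: float          # hypothetical 0..1 quality signal
    release_year: int


def estimated_vram_gb(c: Candidate, overhead: float = 1.2) -> float:
    """Rule of thumb: weight memory plus an assumed ~20% for KV cache and buffers."""
    weights_gb = c.params_billion * c.bits_per_weight / 8
    return weights_gb * overhead


def fits(c: Candidate, vram_gb: float) -> bool:
    """What a naive hardware checker answers: does it fit at all?"""
    return estimated_vram_gb(c) <= vram_gb


def score(c: Candidate, vram_gb: float, current_year: int = 2025) -> float:
    """Illustrative blend of quality, fit headroom, and recency (made-up weights)."""
    need = estimated_vram_gb(c)
    if need > vram_gb:
        return 0.0
    headroom = 1 - need / vram_gb  # 0.0 means it barely fits
    age_penalty = 0.1 * max(0, current_year - c.release_year)
    return c.quality + 0.3 * headroom - age_penalty


candidates = [
    Candidate("older-32b", 32, 4.5, 0.80, 2023),
    Candidate("newer-14b", 14, 4.5, 0.75, 2025),
    Candidate("huge-70b", 70, 4.5, 0.90, 2025),
]
vram = 24.0

# Fit-only choice: the biggest model that fits.
biggest_fit = max(
    (c for c in candidates if fits(c, vram)), key=lambda c: c.params_billion
)
# Fit-plus-quality choice: the highest blended score.
best_scored = max(candidates, key=lambda c: score(c, vram))
print(biggest_fit.name, best_scored.name)  # prints "older-32b newer-14b"
```

On a hypothetical 24 GB card, the fit-only rule picks the 32B model, which barely fits (about 21.6 GB). The blended score prefers the smaller, newer model that leaves room for context and background processes.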
A practical solo-company use case
For a one-person AI studio, I would not treat whichllm as a magic answer machine.
I would treat it as a decision assistant:
- Run it on the current machine.
- Save the top recommendations.
- Compare them against actual task needs.
- Use the JSON output in a small model-selection dashboard.
- Re-run it after hardware or model-list changes.
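For the JSON steps, the glue code can stay tiny. This sketch assumes you have already saved whichllm's JSON output as text (check the README for the actual flag). The `model` and `score` keys are my assumption, not the documented schema, so adapt them after looking at real output. It keeps the top N, stores them as a snapshot, and reports what changed since the previous run.

```python
import json
from pathlib import Path

# ASSUMPTION: the JSON is a list of objects with "model" and "score" keys.
# whichllm's real schema may differ; rename the keys after checking its output.


def top_models(json_text: str, n: int = 5) -> list[str]:
    """Return the names of the n highest-scoring models."""
    entries = json.loads(json_text)
    ranked = sorted(entries, key=lambda e: e["score"], reverse=True)
    return [e["model"] for e in ranked[:n]]


def diff_runs(previous: list[str], current: list[str]) -> dict[str, list[str]]:
    """Show which recommendations appeared or disappeared between runs."""
    return {
        "added": [m for m in current if m not in previous],
        "dropped": [m for m in previous if m not in current],
    }


def load_snapshot(path: Path) -> list[str]:
    """Load the previous top list, or an empty list on the first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_snapshot(models: list[str], path: Path) -> None:
    path.write_text(json.dumps(models, indent=2), encoding="utf-8")


def refresh(json_text: str, snapshot: Path, n: int = 5) -> dict[str, list[str]]:
    """Re-run step: compare the new top list with the last saved one, then save."""
    previous = load_snapshot(snapshot)
    current = top_models(json_text, n)
    save_snapshot(current, snapshot)
    return diff_runs(previous, current)
```

A dashboard could then call `refresh(...)` after each hardware or model-list change and display the `added` and `dropped` lists.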
That could become part of a lightweight "local model routing" workflow:
| Task | Model choice logic |
|---|---|
| summarization | cheap and fast model |
| coding helper | stronger local coding model |
| long document reading | context length matters |
| offline privacy task | local-only model |
| agent experiment | speed and tool stability matter |
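The routing table above can be sketched as one simple rule per task type. Everything here is hypothetical: the profile fields, model names, and the 10 tokens/s floor are placeholders you would replace with numbers measured on your own machine. The `is_local` flag only matters if you also keep a hosted fallback around. It enforces the "local-only" rows.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    tokens_per_second: float  # measured on your own machine
    context_length: int       # maximum context window in tokens
    coding_score: float       # hypothetical 0..1 quality signal for code
    tool_reliability: float   # hypothetical 0..1 score from your own agent tests
    is_local: bool = True     # False for a hosted or networked fallback


def _local(models: list[ModelProfile]) -> list[ModelProfile]:
    local = [m for m in models if m.is_local]
    if not local:
        raise ValueError("no local model available for this task")
    return local


def pick_model(task: str, models: list[ModelProfile]) -> ModelProfile:
    """Apply the routing table: one simple rule per task type."""
    if task == "summarization":
        # Cheap and fast: the highest throughput wins.
        return max(models, key=lambda m: m.tokens_per_second)
    if task == "coding helper":
        # Stronger local coding model.
        return max(_local(models), key=lambda m: m.coding_score)
    if task == "long document reading":
        return max(models, key=lambda m: m.context_length)
    if task == "offline privacy task":
        # Local-only; among those, take the fastest.
        return max(_local(models), key=lambda m: m.tokens_per_second)
    if task == "agent experiment":
        # Require a usable speed first, then prefer stable tool calling.
        usable = [m for m in models if m.tokens_per_second >= 10] or models
        return max(usable, key=lambda m: m.tool_reliability)
    raise ValueError(f"unknown task: {task}")


# Hypothetical profiles; fill in numbers measured on your own machine.
models = [
    ModelProfile("small-fast", 60, 8192, 0.50, 0.60),
    ModelProfile("coder", 20, 16384, 0.85, 0.70),
    ModelProfile("hosted-long", 40, 128000, 0.90, 0.90, is_local=False),
]
for task in ["summarization", "coding helper", "offline privacy task"]:
    print(f"{task}: {pick_model(task, models).name}")
```

Keeping each rule to one line makes the routing easy to audit and to change when a re-run of whichllm surfaces a better candidate.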
What I would be careful about
A few caveats:
- I have not installed or benchmarked it on my own machine yet.
- Hardware detection and model rankings should be treated as recommendations, not final truth.
- Real performance can differ depending on quantization, runtime, drivers, memory pressure, and background processes.
- Windows support seems plausible because the package metadata declares the "OS Independent" classifier, but the README examples are shell-centric, so Windows should still be tested before claiming a smooth experience there.
- The tool depends on live model metadata and benchmark assumptions, so results may change over time.
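Since most of these caveats come down to "measure on your own machine", here is a minimal, runtime-agnostic sketch for checking throughput yourself. `measure_tokens_per_second` and `fake_generate` are my own hypothetical names, not part of whichllm. In practice you would replace `fake_generate` with a thin wrapper around whatever runtime you actually use.

```python
import statistics
import time
from typing import Callable


def measure_tokens_per_second(
    generate: Callable[[str], list[str]],
    prompt: str,
    runs: int = 3,
    warmup: int = 1,
) -> float:
    """Median tokens/second of a user-supplied generate function.

    `generate` is a hypothetical hook: wrap your runtime so it takes a
    prompt and returns the generated tokens as a list.
    """
    for _ in range(warmup):
        generate(prompt)  # first calls often include model loading and caching
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        rates.append(len(tokens) / elapsed)
    return statistics.median(rates)


def fake_generate(prompt: str) -> list[str]:
    """Stand-in for a real model: 'generates' 50 tokens in about 10 ms."""
    time.sleep(0.01)
    return ["tok"] * 50


rate = measure_tokens_per_second(fake_generate, "Summarize this note.")
print(f"{rate:.0f} tokens/s")
```

Using the median of several runs, after discarding warm-up calls, keeps one slow run caused by background processes from dominating the number.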
So do not read this as "this tool guarantees the perfect model."
Read it as:
"This is a useful way to stop choosing local LLMs by vibes alone."
Why I am watching it
Local LLM adoption has a hidden bottleneck: hardware confusion.
Many people want to run models locally, but they do not know whether their laptop, desktop, or old GPU is enough. That uncertainty creates friction.
Tools like whichllm make the local AI stack more approachable because they turn a messy research problem into a command-line recommendation flow.
My takeaway:
The next useful local AI tools may not be bigger models. They may be better model-selection tools.
If you are experimenting with local LLMs, whichllm is worth a look — especially before buying new hardware or wasting a weekend trying models that never had a chance to run well on your machine.