Quick Summary: 📝
whichllm is a command-line tool that helps users find and run the best-performing Large Language Models (LLMs) locally on their specific hardware. It ranks models by real-world benchmark performance and hardware compatibility rather than parameter count alone, and returns an ordered list of suitable LLMs.
Key Takeaways: 💡
✅ Automatically identifies and ranks optimal local LLMs for your specific hardware.
✅ Goes beyond simple size checks, weighing real-world benchmarks and model generation when ranking.
✅ Offers flexible configuration for conservative or ambitious model recommendations.
✅ Enables hardware simulation to plan upgrades and ensure model compatibility.
✅ Streamlines local LLM deployment, saving developers time and effort.
Project Statistics: 📊
- ⭐ Stars: 5322
- 🍴 Forks: 279
- ❗ Open Issues: 16
Tech Stack: 💻
- ✅ Python
Choosing the right large language model (LLM) to run locally can feel like a daunting task. With countless models available on HuggingFace and varying hardware capabilities across different machines, it's easy to get lost in a sea of specifications and benchmarks. This is where whichllm steps in as an incredibly useful tool for any developer looking to leverage local AI.

whichllm simplifies the process by intelligently analyzing your system's hardware, including your GPU, CPU, and available RAM. It then scours HuggingFace to identify and rank the top LLMs that are not just technically runnable, but actually perform well on your specific setup. Unlike simple 'does it fit?' tools, whichllm considers factors like partial RAM offload and near-edge VRAM fits, giving you recommendations that optimize both performance and quality. It even accounts for model generations and real-world benchmarks to ensure you're getting the best possible pick, not just the biggest.

The project offers flexible options for different needs. If you prefer a more conservative recommendation, similar to what you might find in tools like LM Studio, you can easily adjust parameters to prioritize models that fit entirely within your GPU's VRAM and leave extra headroom for runtime overhead. This ensures a smoother, more reliable experience.

Beyond just identifying models for your current machine, whichllm provides powerful simulation capabilities. Thinking about upgrading your hardware? You can simulate different GPUs, like an 'RTX 4090' or even '2x RTX 4090', to see which models they would best support. This feature is invaluable for planning future investments and ensuring compatibility before you buy. You can also use it to compare upgrade candidates directly or even determine what GPU you'd need to run a specific model. For developers, this means less guesswork, faster setup, and more time building amazing things with local AI.
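To make the idea of fit checking, headroom, and hardware simulation more concrete, here is a minimal Python sketch. It is not whichllm's actual code or API. The names `Hardware`, `classify_fit`, and `rank_candidates`, the 1.5 GB default headroom, the 0.6 offload penalty, and the model sizes and quality scores are all hypothetical values chosen for illustration. The only hardware fact assumed is that an RTX 4090 has 24 GB of VRAM.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """A real or simulated machine (all values in GB)."""
    name: str
    vram_gb: float  # total GPU memory across all GPUs
    ram_gb: float   # system RAM that could hold offloaded layers


def classify_fit(model_gb: float, hw: Hardware, headroom_gb: float = 1.5) -> str:
    """Classify a model as 'full_gpu', 'partial_offload', or 'too_large'.

    headroom_gb is VRAM reserved for runtime overhead (hypothetical default).
    """
    usable_vram = max(hw.vram_gb - headroom_gb, 0.0)
    if model_gb <= usable_vram:
        return "full_gpu"
    if model_gb <= usable_vram + hw.ram_gb:
        return "partial_offload"
    return "too_large"


def rank_candidates(
    models: dict[str, tuple[float, float]],
    hw: Hardware,
    gpu_only: bool = False,
    offload_penalty: float = 0.6,
) -> list[str]:
    """Rank model names by quality, discounting models that need RAM offload.

    models maps name -> (size_gb, quality_score). With gpu_only=True, only
    models that fit entirely in VRAM are kept (the conservative mode).
    """
    scored = []
    for name, (size_gb, quality) in models.items():
        fit = classify_fit(size_gb, hw)
        if fit == "too_large" or (gpu_only and fit != "full_gpu"):
            continue
        # Offloaded models run slower, so their score is scaled down.
        score = quality if fit == "full_gpu" else quality * offload_penalty
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Made-up candidates: name -> (size in GB, illustrative quality score).
    candidates = {"small": (5.0, 60.0), "medium": (20.0, 75.0), "large": (40.0, 80.0)}
    current = Hardware("RTX 4090", vram_gb=24.0, ram_gb=64.0)
    upgrade = Hardware("2x RTX 4090", vram_gb=48.0, ram_gb=64.0)
    for hw in (current, upgrade):
        print(hw.name, rank_candidates(candidates, hw))
```

Under these assumptions, the largest model ranks last on a single simulated RTX 4090 because it needs offload, but moves to the top once the simulated machine has two cards. That is the kind of comparison the simulation feature is described as supporting.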