
I just found an interesting open-source tool called Ollama. It's a command-line application that lets you run Large Language Models (LLMs) on your ...
Try the qwen2.5-coder model family. Yes, 4GB is insufficient to run anything useful. I'm trying with qwen2.5-coder:14b-instruct-q2_K (so low quantization and higher parameters) and it's not bad at all. The speed and quality are decent, all things considered. You'll need about 20GB of RAM, however. Be aware that I got Chinese-language-only replies when running 1.5B models of that family.
Thanks for the tip! I'll definitely check out qwen2.5-coder.
ollama.com/library/qwen2.5-coder/tags You'll have to experiment with the smallest models at different quantization levels and avoid swapping to disk during inference.
Thanks for the link! I’ll explore the smallest models and test their performance. Thank you for the suggestion!
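For anyone trying that, here is a rough sketch of comparing quantization tags with the Ollama CLI. The tag names below are only examples; check ollama.com/library/qwen2.5-coder/tags for what is actually published:

    # Pull two small quantization variants and compare them side by side.
    # Tag names are illustrative; pick the ones listed on the tags page.
    ollama pull qwen2.5-coder:1.5b-instruct-q4_K_M
    ollama pull qwen2.5-coder:3b-instruct-q4_K_M

    # Ask each the same one-off question and compare speed and answer quality.
    ollama run qwen2.5-coder:1.5b-instruct-q4_K_M "Write a Python function that reverses a string."
    ollama run qwen2.5-coder:3b-instruct-q4_K_M "Write a Python function that reverses a string."

On a RAM-limited machine, the smaller or more heavily quantized variant is what keeps the model in memory and avoids the disk swapping mentioned above.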
Totally get the RAM struggle with local LLMs; I hit a similar bottleneck running anything larger than a 3B model too.
Have you found any tricks to make chat-style workflows smoother in the CLI, or do you just keep it basic?
Since I'm still learning the concepts and getting an understanding of how everything works, I'm sticking to the basics for now.
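For reference, the basic chat workflow is just the interactive prompt that ollama run opens; the model name below is only an example:

    # `ollama run` without a prompt starts an interactive chat session that
    # keeps the conversation context until you leave it.
    ollama run llama3.2
    # At the >>> prompt: /clear resets the context, /bye exits.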
Hey,
Welcome to the genAI techspace.
There is nothing wrong with using smaller models; I resort to them all the time.
If you are interested, you could try a much smaller model like smollm2:135m or qwen:0.5b; they should be much more responsive on your hardware.
Also, Ollama typically tries to run models on the GPU, at least partially, if you have a compatible one.
I hope this helps.
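A minimal sketch of that suggestion, using one of the model tags named above (responsiveness will depend on the machine):

    # Pull and try one of the tiny models suggested above.
    ollama pull smollm2:135m
    ollama run smollm2:135m "Summarize what model quantization means in one sentence."

    # `ollama ps` shows the loaded model and whether it is running on the CPU,
    # the GPU, or split between the two.
    ollama ps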
Yes, I will check out the smaller models. Thanks for the useful advice.
Ollama is great. You can also use Docker Model Runner for this.
Yeah, Ollama is a valuable tool. Thanks for sharing.
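A hedged sketch of the Docker Model Runner route, assuming a recent Docker Desktop with Model Runner enabled; the ai/smollm2 name is just an example model from Docker Hub's ai namespace:

    # Pull a model and run a one-off prompt through Docker Model Runner.
    docker model pull ai/smollm2
    docker model run ai/smollm2 "Explain what a quantized model is."

    # List the models pulled so far.
    docker model list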
Thanks for being clear about the hardware limits. Many people try to run local LLMs, thinking it will just work, then get frustrated when it is slow or crashes. Posts like this help save a lot of time and confusion.
Appreciate that! I'm glad the post was helpful.
This is extremely impressive. Love how you documented the process and called out the RAM struggle directly. Makes me wanna try it on my old laptop now.
Thank you for the appreciation. Ollama is definitely worth a try.