Discussion on: Running Local LLMs, CPU vs. GPU - a Quick Speed Test

Maciej Wakuła • Edited

You can offload all layers to the GPU (CUDA, ROCm) or use a CPU implementation (e.g. HIPS). Just run LM Studio for your first steps. Run koboldcpp or koboldcpp-rocm as a second step. Then try Python and transformers. From there you should know enough about the basics to choose your direction. And remember that offloading everything to the GPU still consumes CPU.
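
To make the "try Python and transformers" step concrete, here is a minimal sketch, assuming a PyTorch build with CUDA or ROCm support and the accelerate package installed; the model name is only an example, so swap in whatever you have downloaded:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example model, not a recommendation

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision so more layers fit in VRAM
    device_map="auto",          # place layers on the GPU(s), spill the rest to CPU/RAM
)

inputs = tokenizer("Local LLMs are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```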

[Screenshot: CPU and GPU usage graphs]

This is the peak when using full ROCm (GPU) offloading. Note the CPU usage on the left: the initial CPU load comes from starting the tools, and the LLM ran during the peak at the end, where there is GPU activity but the CPU is busy as well.
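
If you want to verify the same thing in code rather than from a screenshot, a quick hypothetical check with the psutil package (reusing the `model` and `inputs` objects from the sketch above) could look like this:

```python
import psutil

psutil.cpu_percent(interval=None)  # prime psutil's counter
outputs = model.generate(**inputs, max_new_tokens=128)
# percentage of CPU time consumed since the call above, i.e. during generation
print(f"CPU usage during generation: {psutil.cpu_percent(interval=None):.1f}%")
```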
[Screenshot: the same measurement on Windows]

And this is Windows; ROCm support is still very limited on other operating systems :/