El Bruno

Originally published at elbruno.com

CPU vs GPU: Which Wins for Running LLMs Locally?

Introduction

Running large language models (LLMs) locally has become increasingly accessible, thanks to advancements in hardware and model optimization. For .NET programmers, understanding the performance differences between CPUs and GPUs is crucial to selecting the best setup for their use case.

In this blog post, we’ll explore these differences by benchmarking the Llama 3.2 Vision model in a locally hosted environment, with Ollama running in Docker.

Watch the Video Tutorial

Before diving in, check out the video tutorial for a quick overview of the process and key concepts covered in this blog post.


Goal of the Comparison

The goal of this exercise is to evaluate:

  1. Execution Time : How fast can the model process queries on CPU vs GPU?
  2. Resource Utilization : How do hardware resources (memory, power) compare between the two setups?
  3. Suitability : Which setup is better for different programming tasks?

By the end, you’ll have a clear understanding of the trade-offs and be equipped to choose the most appropriate setup for your projects.


How to Run the Test Using the Sample Code

1. Setup

Start by cloning the sample GitHub repository:


git clone https://github.com/elbruno/Ollama-llama3.2-vision-Benchmark
cd Ollama-llama3.2-vision-Benchmark


Ensure you have the necessary dependencies installed:

For .NET: install the .NET SDK. You can verify the installation with dotnet --version.

2. Model Download

  1. Install Docker.
  2. Run one Ollama container for the GPU and one for the CPU:
    • GPU, listening on port 11434 (the default):
      docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    • CPU only, listening on port 11435:
      docker run -d -v ollamacpu:/root/.ollama -p 11435:11434 --name ollamacpu ollama/ollama
  3. Pull the llama3.2-vision model in each container, e.g. docker exec -it ollama ollama pull llama3.2-vision (and the same for the ollamacpu container).
  4. You will now have Docker running 2 instances of Ollama, similar to this image:

docker running 2 instances of ollama

3. Run Code

For benchmarking, we are using BenchmarkDotNet for .NET.
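The repository wraps the calls to both Ollama endpoints in a BenchmarkDotNet class. As a rough sketch of what such a class can look like (the class name, endpoints, prompt, and request shape below are illustrative assumptions, not the repository’s exact code):


using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

public class OllamaCpuVsGpuBenchmark
{
    // Hypothetical values: adjust the endpoints, model, and prompt to match your setup.
    private const string GpuEndpoint = "http://localhost:11434/api/generate"; // GPU container
    private const string CpuEndpoint = "http://localhost:11435/api/generate"; // CPU container

    private static readonly HttpClient Http = new HttpClient { Timeout = TimeSpan.FromMinutes(10) };

    private const string RequestJson =
        "{ \"model\": \"llama3.2-vision\", \"prompt\": \"Describe a sunset in one sentence.\", \"stream\": false }";

    [Benchmark(Baseline = true)]
    public Task<string> Gpu() => GenerateAsync(GpuEndpoint);

    [Benchmark]
    public Task<string> Cpu() => GenerateAsync(CpuEndpoint);

    private static async Task<string> GenerateAsync(string endpoint)
    {
        // Send a non-streaming generate request to the Ollama REST API and return the raw JSON response.
        using var content = new StringContent(RequestJson, Encoding.UTF8, "application/json");
        using var response = await Http.PostAsync(endpoint, content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}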

Open OllamaBenchmark.sln and run the solution in Release configuration (BenchmarkDotNet warns that results from Debug builds are unreliable).
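If you prefer to wire the benchmarks up yourself, the console entry point is a one-liner around BenchmarkRunner; the class name below matches the hypothetical sketch above:


using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Runs every [Benchmark] method in the class and prints BenchmarkDotNet's
        // summary table (Mean, Error, StdDev) for the GPU and CPU calls.
        BenchmarkRunner.Run<OllamaCpuVsGpuBenchmark>();
    }
}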

4. Results Analysis

The .NET solution will output detailed performance metrics, including execution time and resource usage. Compare these metrics to identify the strengths of each hardware setup.


Conclusions

  1. Performance : GPUs consistently outperform CPUs in execution time for LLMs, especially for larger models like Llama 3.2 Vision.
  2. Resource Efficiency : While GPUs are faster, they consume more power. CPUs, on the other hand, are more energy-efficient but slower.
  3. Use Cases :
    • CPU : Best for lightweight, cost-sensitive tasks or development environments.
    • GPU : Ideal for production workloads requiring high throughput or real-time inference.

Running benchmarks is a straightforward way to determine the best hardware for your specific needs. By following the steps outlined here, you can confidently experiment with LLMs and optimize your local environment for maximum efficiency.

For a detailed walkthrough, check out the video tutorial.

Happy coding!

Greetings

El Bruno

More posts on my blog, ElBruno.com.

More info at https://beacons.ai/elbruno

