Amir H. Moayeri

Posted on Jun 15

How I Tested 5 Small LLMs on a Weak PC (Intel i5, No GPU) – And Found a Winner

#ai #llm #productivity #tutorial

A practical guide to running LLMs on budget hardware: real speeds, real stories, and real conclusions

📌 Table of Contents

My Setup (The "Weak" PC)
Why I Did This
The 5 Models I Tested
The Test Method
Model 1: LFM2.5-350M – The Speed Demon
Model 2: Qwen3 0.6B – The Balanced One
Model 3: LFM2.5-1.2B-Instruct – The All-Rounder
Model 4: Gemma-3-1B-Uncensored – The Comedian
Model 5: DeepSeek-R1-Distill-Qwen-1.5B – The Misfit
Final Comparison Table
Key Lessons Learned
My Final Recommendation
Conclusion
Final Note

My Setup (The "Weak" PC)

Before anything, let me show you what I was working with. No GPU. No high-end hardware. Just a regular office PC:

Component	Specification
CPU	Intel Core i5-10400 @ 2.90GHz (6 cores)
RAM	16GB DDR4 (Single Channel – important!)
GPU	Intel UHD Graphics 630 (128MB – basically useless)
Storage	238GB SSD
Software	LM Studio (GGUF format models)

Key limitation: The single-channel RAM creates a memory bandwidth bottleneck of ~20 GB/s. This is the real reason speeds can't go much higher.

Why I Did This

Most LLM benchmarks and reviews assume you have:

A high-end NVIDIA GPU (RTX 3060+)
Or at least a Mac with Apple Silicon
Or a server with 32GB+ VRAM

But what if you have none of that? What if you're a developer on a budget, a student, or someone with an old PC?

I wanted to find the best small LLM (under 2B parameters) that actually runs well on hardware like mine. No theory. Real tests. Real speeds. Real stories.

And yes, I made each model write a funny cat story to test creativity, coherence, and humor.

The 5 Models I Tested

#	Model	Size	Format
1	LFM2.5-350M	350M	GGUF (Q4_K_M)
2	Qwen3 0.6B Instruct	0.6B	GGUF (Q4_K_M)
3	LFM2.5-1.2B-Instruct	1.2B	GGUF (Q4_K_M)
4	Gemma-3-1B-Uncensored	1B	GGUF (Q4_K_M)
5	DeepSeek-R1-Distill-Qwen-1.5B	1.5B	GGUF (Q4_K_M)

The Test Method

For each model, I did:

Loaded the model in LM Studio
Measured real token-per-second speed on my hardware
Asked for a 500-word funny cat story (same prompt for all)
Evaluated coherence, humor, originality, and structure

Model 1: LFM2.5-350M – The Speed Demon

Speed: 36 tokens/second

This was the fastest model by far. The response appeared almost instantly.

Cat Story Excerpt:

"Milo the cat lived in Milo's tiny, fussy home. One sunny afternoon, he'd tried to sneak into the kitchen for coffee—only to be caught by a curious squirrel named Sammy..."

Analysis:

Aspect	Score	Notes
Coherence	7/10	Mostly logical, but names got confusing ("Milo" = cat AND owner)
Humor	6/10	Tried hard but felt forced
Originality	7/10	Creative premise (a cat wanting coffee!)

Verdict: Perfect for summarization and quick tasks. Not ideal for creative writing.

Model 2: Qwen3 0.6B – The Balanced One

Speed: ~20 tokens/second

Solid speed. A noticeable step down from 350M, but still very responsive.

Cat Story Excerpt:

"Whiskers wasn't your average cat—he had a knack for solving puzzles faster than you could say 'purr'..."

Analysis:

Aspect	Score	Notes
Coherence	7/10	Decent structure, no major confusion
Humor	6/10	Acceptable but predictable
Originality	6/10	Standard "clever cat" tropes

Verdict: A solid general-purpose model. Nothing special, but nothing broken.

Model 3: LFM2.5-1.2B-Instruct – The All-Rounder

Speed: 13.5 tokens/second

The slowest of the "good" models, but the quality jump was worth it.

Cat Story Excerpt:

"Once upon a time, in a quirky little town named Pawsville, lived a fluffy gray tabby cat named Whiskers. Whiskers wasn't your average cat—he had a knack for solving puzzles faster than you could say 'purr'... In this magical realm, animals were talking animals—dogs with tiny glasses, birds with tiny hats, even a wise old owl who wore a monocle..."

Analysis:

Aspect	Score	Notes
Coherence	9/10	Excellent structure from beginning to end
Humor	8/10	Genuinely funny ("owl with a monocle," "squirrel trying to juggle carrots")
Originality	8/10	Rich world-building, consistent characters

Example of good humor:

"He met a grumpy old turtle named Timmy, who kept guarding a treasure chest filled with shiny seashells. The turtle was so stubborn, he'd stare at Whiskers for hours, refusing to let him in."

Verdict: The best all-around model for CPU-only systems. Use this for chat, story writing, summarization, and daily tasks.

Model 4: Gemma-3-1B-Uncensored – The Comedian

Speed: 10 tokens/second

The slowest, but with a unique personality.

Interesting behavior: The model "thought" for 1 minute 27 seconds before responding. This is likely due to its uncensored nature exploring multiple response candidates.

Cat Story Excerpt:

"Mittens squeezed her eyes shut and jumped right into the hole. She tumbled down a dark slope, landing in a pile of old magazines and a bag of catnip she had been hiding behind the sofa for later... The human just laughed, shook their head and said: 'That's why I left my laptop open.'"

Analysis:

Aspect	Score	Notes
Coherence	7/10	Slightly chaotic but entertaining
Humor	8/10	Dry, adult-oriented humor. The punchline was genuinely unexpected
Originality	8/10	Very unique voice

Verdict: Great for personal entertainment if you want a different flavor of humor. Too slow for daily use.

Model 5: DeepSeek-R1-Distill-Qwen-1.5B – The Misfit

Speed: 10.4 tokens/second

Thought for 33 seconds before responding. This is a "reasoning model" designed for math and logic, not storytelling.

Cat Story Excerpt:

"Uh-oh! exclaimed a neighboring neighbor, Squidward... Uh-oh, he said again... Uh-oh, Whiskers said again... Uh-oh, Whiskers said once more..."

Analysis:

Aspect	Score	Notes
Coherence	3/10	Extremely repetitive, characters appear/disappear randomly
Humor	2/10	"Uh-oh" repeated ~15 times is not funny
Originality	4/10	Some creative elements but lost in chaos

Verdict: Do not use for creative writing. This model is for math, logic, and step-by-step reasoning. I misused it, and the results show why.

Final Comparison Table

Rank	Model	Speed (t/s)	Coherence	Humor	Best For
🥇	LFM2.5-1.2B-Instruct	13.5	9/10	8/10	Everything (chat, stories, summarization)
🥈	LFM2.5-350M	36	7/10	6/10	Fast summarization, always-on assistant
🥉	Qwen3 0.6B	20	7/10	6/10	General-purpose backup
4	Gemma-3-1B-Uncensored	10	7/10	8/10	Personal entertainment (adult humor)
5	DeepSeek-R1-Distill-Qwen-1.5B	10.4	3/10	2/10	Math/logic (NOT stories)

Key Lessons Learned

1. Speed ≠ Quality

The 350M model was 3x faster than the 1.2B Instruct, but the story quality was noticeably lower.

2. Architecture Matters More Than Parameter Count

LFM2.5-350M (350M params) outperformed Qwen3 0.6B (600M params) in multiple benchmarks.

3. Don't Use Reasoning Models for Creative Tasks

DeepSeek-R1 is amazing at math but produces repetitive, incoherent stories. Use the right tool for the right job.

4. On Weak CPUs, 1-1.5B Is the Sweet Spot

Models larger than 1.5B drop below 10 t/s on my hardware. Models smaller than 1B sacrifice too much quality.

5. Liquid Models (LFM2.5) Are Optimized for CPU

They consistently outperformed competitors in both speed and quality on my Intel i5.

My Final Recommendation

If you can only install ONE model:

👉 LFM2.5-1.2B-Instruct 👈

13.5 tokens/second
Great at chat, stories, summarization, and instruction following
Best balance of speed and quality

If you want TWO models:

Primary: LFM2.5-1.2B-Instruct (daily tasks)
Fast backup: LFM2.5-350M (quick summarization)

If speed is your ONLY priority:

LFM2.5-350M (36 t/s)

If you want adult-oriented humor for entertainment:

Gemma-3-1B-Uncensored (but expect 10 t/s)

Conclusion

You don't need a $2000 GPU to run LLMs locally.

With a humble Intel i5, 16GB RAM, and no graphics card, you can run LFM2.5-1.2B-Instruct at ~13 tokens/second and get genuinely useful results for:

Daily chat assistance
Creative writing (cats with monocles!)
Document summarization
Personal AI agents

The models are getting smaller, faster, and smarter. LFM2.5 proves that 1.2B parameters can deliver quality that rivals larger models.

Go try it yourself. Download LM Studio, grab the LFM2.5-1.2B-Instruct GGUF file, and start experimenting.

Final Note

The tests I ran were focused on a single, simple scenario: generating a funny cat story on a specific hardware setup. While this gave me clear, comparable results across five models, it's important to remember that LLM performance can vary significantly depending on the task. A model that writes a decent story might struggle with code generation, mathematical reasoning, or multi-turn conversations. Likewise, your hardware, software version, quantization settings, and even the phase of the moon (okay, maybe not that last one) can affect speeds and output quality. So take my findings as a useful data point, not a universal truth. You can also test models on your own workloads before making a decision.

📌 Table of Contents

My Setup (The "Weak" PC)

Why I Did This

The 5 Models I Tested

The Test Method

Model 1: LFM2.5-350M – The Speed Demon

Speed: 36 tokens/second

Cat Story Excerpt:

Analysis:

Model 2: Qwen3 0.6B – The Balanced One

Speed: ~20 tokens/second

Cat Story Excerpt:

Analysis:

Model 3: LFM2.5-1.2B-Instruct – The All-Rounder

Speed: 13.5 tokens/second

Cat Story Excerpt:

Analysis:

Model 4: Gemma-3-1B-Uncensored – The Comedian

Speed: 10 tokens/second

Cat Story Excerpt:

Analysis:

Model 5: DeepSeek-R1-Distill-Qwen-1.5B – The Misfit

Speed: 10.4 tokens/second

Cat Story Excerpt:

Analysis:

Final Comparison Table

Key Lessons Learned

1. Speed ≠ Quality

2. Architecture Matters More Than Parameter Count

3. Don't Use Reasoning Models for Creative Tasks

4. On Weak CPUs, 1-1.5B Is the Sweet Spot

5. Liquid Models (LFM2.5) Are Optimized for CPU

My Final Recommendation

If you can only install ONE model:

If you want TWO models:

If speed is your ONLY priority:

If you want adult-oriented humor for entertainment:

Conclusion

Final Note

🔗 Resources