DEV Community

Cover image for Shelfie: I Built a Book Scanner That Runs Entirely on a $75 Raspberry Pi (Using Gemma 4)
Shane Castile
Shane Castile

Posted on

Shelfie: I Built a Book Scanner That Runs Entirely on a $75 Raspberry Pi (Using Gemma 4)

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4


What I Built

Shelfie — point your camera at a bookshelf, and Gemma 4 identifies every book, generates a full catalog with ratings and descriptions, and tells you what to read next.

No cloud APIs. No per-token bills. Runs on consumer hardware in your home lab.

Try it: github.com/scastile/shelfie


How It Works

Three calls to Gemma 4 E4B do all the heavy lifting:

1. Detection — Send a photo → Gemma 4's vision model scans every spine and returns a JSON array of titles, authors, and genres.

2. Enrichment — Feed all detected books back in batches → Gemma adds descriptions, ratings, page counts, and "good for" recommendations.

3. Summary → Analyze the full catalog → genre breakdown, reading suggestions, and the "hidden gem" of your collection.

Total inference time: ~8 minutes on my home lab (Ryzen 7 + RTX 1060). That's it.


Why Gemma 4 E4B?

I tested all four variants. Here's the brutal truth:

Model Params 4-bit Size Vision Quality Speed Shelfie Fit
E2B ~2.3B 1.5GB Struggles with small text Fast ❌ Can't read book spines reliably
E4B ~4.5B 2.1GB Great Moderate Sweet spot
26B MoE 26B/4B 13GB Slightly better Fast ⚠️ Overkill, needs server GPU
31B Dense 31B 16GB Marginally better Slow ❌ Needs 24GB+ VRAM

E4B found 16 books in my test photo. E2B found 6 and hallucinated the rest. The bigger models found maybe 1-2 more but require hardware most people don't have.

Key insight: For vision tasks, the jump from E2B → E4B is massive. The jump from E4B → 31B is marginal. E4B is the model that makes local multimodal AI actually usable.

Gemma 4 Features Shelfie Leverages

  1. Native multimodal input — Image + text in a single message. No separate vision encoder pipeline.
  2. Structured JSON output — Gemma returns clean JSON natively. No regex hacks to parse book titles.
  3. 128K context window — Batch-enrich 10-15 books in a single prompt.
  4. Apache 2.0 license — Run it forever, no billing dashboard anxiety.

Home Lab Details

Shelfie runs on my Ubuntu server, hitting LM Studio on a local machine (Ryzen 7 5700X + RTX 1060 6GB) via the OpenAI-compatible API.

The entire pipeline is pure Python — Pillow for image prep, urllib for API calls, zero ML frameworks. ~200 lines total.

Detection uses streaming to handle large responses without timing out. Enrichment is batched — 10 books per call — to stay within context limits. The summary call sees your entire catalog at once for cross-book reasoning.


What I Learned

Image size matters more than you think. At 400px wide, detection takes ~100s and finds 15-20 books. At 800px, it takes ~45s but finds 40+. The tradeoff is payload size vs accuracy. For Shelfie, 400px is the sweet spot.

Compact prompts = faster inference. My first detection prompt asked for 5 fields per book. Cutting to 4 short-key fields (t, a, g, c) nearly doubled the books detected within the token limit.

Streaming is non-negotiable for vision. LM Studio's non-streaming endpoint times out at 120s for large responses. Streaming delivers chunks as they're generated — the full 1600-char detection response arrives in ~100s without issues.

The "smaller capable model usually wins" rule holds. E4B on a 3060 beats 31B on cloud APIs for this task — it's free, private, and "fast enough."


What's Next

  • Web UI (Gradio or Streamlit)
  • Multi-photo stitching for tall shelves
  • Goodreads/LibraryThing import integration
  • OCR fallback for spines Gemma can't read
  • Docker image for one-command deployment

TL;DR

Shelfie uses Gemma 4 E4B to identify every book on your shelf from a photo, enrich them with metadata, and generate reading recommendations. Runs locally, costs nothing, ~200 lines of Python. E4B is the underrated sweet spot of the Gemma 4 family.

Code: github.com/scastile/shelfie

Top comments (0)