DEV Community

Evgenii Engineer

Posted on

Heavy AI agent frameworks were too slow for my Raspberry Pi. So I built a different one

The problem

I’ve been experimenting with AI agents on a Raspberry Pi 5, and I kept hitting the same issue:

most agent frameworks felt too heavy for small hardware.

They often bring a full stack with multiple services, extra infrastructure, and a lot of moving parts. On a Raspberry Pi, that quickly turns into slow startup, memory pressure, and too much complexity for simple tasks.

I didn’t want that.

I wanted something that would stay small and still be useful.

So instead of building yet another agent framework, I started building a lightweight runtime with a different approach to routing.

The project became openLight:

github repo

The idea

What I wanted was not “LLM for everything”.

For a lot of requests, an LLM is unnecessary.

If a user wants to check CPU, disk, logs, or run a known action, that should go through a predictable path.

So openLight is built around a mixed model:
• deterministic routing where possible
• LLM-based classification where needed
• validation before execution

That keeps the system much more practical on small hardware.
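As a sketch, the mixed model boils down to "try a direct match first, only then ask the model". All names below are illustrative assumptions, not openLight's actual API:

```python
# Sketch of a deterministic-first router. All names here are
# illustrative assumptions, not openLight's actual API.

# Known commands map straight to handlers -- no LLM involved.
DETERMINISTIC_ROUTES = {
    "/cpu": lambda: "load average: 0.42",
    "/disk": lambda: "disk usage: 61%",
}


def classify_with_llm(message: str) -> str:
    # Placeholder for the LLM classifier; a real runtime would call
    # a local or remote model to decide chat vs. skill here.
    return f"chat: {message}"


def route(message: str) -> str:
    handler = DETERMINISTIC_ROUTES.get(message.strip())
    if handler is not None:
        # Direct match: execute the skill immediately, no LLM cost.
        return handler()
    # No match: fall back to LLM-based classification.
    return classify_with_llm(message)
```

The point of the shape is that the dictionary lookup is O(1) and fully predictable, while the classifier is only ever touched on the fallback path.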

How routing works

The flow looks like this:

Telegram message
   ↓
Auth
   ↓
Deterministic routing
   ├─ matched → execute skill
   └─ no match → LLM classifier
                     ↓
               chat / skill
                     ↓
                 validate
                     ↓
                 execute

In practice, that means:
• every Telegram message first goes through auth and persistence
• then the runtime tries deterministic routing
• if there is a direct match, the skill executes immediately
• if not, the system uses the LLM to decide whether the request is just chat or should be mapped to a skill
• skill execution is validated before running
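One way to picture the validate-before-execute step: whatever the classifier returns is checked against a registry of known skills before anything runs. The names below are hypothetical, just to make the shape concrete:

```python
# Sketch of validation before execution: the classifier's output is
# checked against a registry of known skills before running anything.
# Names are hypothetical, not openLight's real interfaces.

REGISTERED_SKILLS = {
    "status": lambda args: "all services up",
    "logs": lambda args: f"last {args.get('lines', 50)} log lines",
}


class SkillValidationError(Exception):
    """Raised when a classified skill fails validation."""


def validate_and_execute(skill_name: str, args: dict) -> str:
    # Reject anything the classifier invented that isn't registered.
    skill = REGISTERED_SKILLS.get(skill_name)
    if skill is None:
        raise SkillValidationError(f"unknown skill: {skill_name!r}")
    if not isinstance(args, dict):
        raise SkillValidationError("skill arguments must be a dict")
    return skill(args)
```

This is what keeps the LLM from opening up random execution paths: even if the model hallucinates a skill name, it never reaches execution.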

So the LLM is part of the system, but it is not the whole system.

That was important to me from the start.

Why this works better on Raspberry Pi

On small hardware, every extra layer matters.

If every request goes straight into an LLM-driven loop, the system becomes slower, less predictable, and more expensive to run.

With this design:
• obvious commands stay fast
• known actions remain deterministic
• the LLM is only used where classification is actually useful
• validation reduces the chance of random execution paths

For Raspberry Pi and homelab use, this feels much more natural than a heavy agent stack.

What openLight is trying to be

I don’t see it as a huge agent framework.

It’s closer to a small runtime for personal infrastructure.

Right now the main interface is Telegram, but the bigger idea is wider than that: a lightweight agent runtime that can combine deterministic skills with LLM-based interpretation without dragging in a huge platform around it.

Why I built it

Mostly because I like small tools that are easy to run and easy to understand.

I wanted something that:
• works well on Raspberry Pi
• stays lightweight
• does not depend on a huge framework
• uses LLMs where they help, not everywhere by default

That’s the direction behind openLight.

If you want to take a look:

github repo

Top comments (4)

Vic Chen

Really like the "LLM only where classification is actually useful" stance here. On constrained hardware, deterministic routing + validation is a much better systems design than forcing every request through a full agent loop. Curious whether you've measured latency / memory deltas between the deterministic path and the fallback LLM path on the Pi 5 — that would be a great proof point.

Evgenii Engineer

I haven’t published a full benchmark table yet, but I do have a first Pi 5 data point now.

On the same natural-language request, the actual status skill execution was only ~150ms. The big delta was entirely in the LLM classification path: local Ollama with qwen2.5:0.5b took ~19.8s for route classification plus ~22.6s for skill classification, so about 42.5s end-to-end. The same flow with gpt-4o-mini was ~1.35s + ~1.77s, so about 3.3s end-to-end.

So yes, this is exactly why the design is deterministic-first: when a request can be routed directly, you avoid paying the classifier tax entirely. I still need to do a clean memory comparison, especially separating agent RSS from local model residency, but the latency difference is already pretty stark.

klement Gunndu

Deterministic-first routing is smart — most agent frameworks burn LLM tokens on tasks a regex could handle. Curious if you've measured the ratio of deterministic vs LLM-classified requests in practice.

Evgenii Engineer

Not yet as a published metric, but it’s something I want to expose.

Right now each routing decision already carries a mode (slash, explicit, alias, rule, llm), so breaking requests down into deterministic vs LLM-classified traffic is straightforward. I’d actually prefer to publish the full split rather than just a single ratio, because it shows how much is handled by direct commands/rules before the classifier is even touched.

Anecdotally, command-shaped and routine ops are exactly what the deterministic path is meant to absorb, and the LLM is there as the overflow path for natural-language requests. Turning that into a real counter/dashboard is probably the next telemetry improvement I should make.
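A minimal version of that counter could look like this. This is a sketch only; the mode names are taken from the comment above, and none of these function names are from openLight itself:

```python
from collections import Counter

# Sketch: count routing decisions by mode so the deterministic vs
# LLM-classified split can be reported. Mode names mirror the ones
# mentioned above (slash, explicit, alias, rule, llm); everything
# except "llm" counts as deterministic.
routing_counts: Counter = Counter()


def record_route(mode: str) -> None:
    routing_counts[mode] += 1


def deterministic_ratio() -> float:
    total = sum(routing_counts.values())
    if total == 0:
        return 0.0
    return (total - routing_counts["llm"]) / total


# Example traffic: three deterministic decisions, one LLM fallback.
for mode in ["slash", "rule", "alias", "llm"]:
    record_route(mode)
```

Publishing `routing_counts` directly gives the full per-mode split rather than a single ratio.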