DEV Community

Evgenii Engineer

Posted on

Heavy AI agent frameworks were too slow for my Raspberry Pi. So I built a different one

The problem

I’ve been experimenting with AI agents on a Raspberry Pi 5, and I kept hitting the same issue:

most agent frameworks felt too heavy for small hardware.

They often bring a full stack with multiple services, extra infrastructure, and a lot of moving parts. On a Raspberry Pi, that quickly turns into slow startup, memory pressure, and too much complexity for simple tasks.

I didn’t want that.

I wanted something that would stay small and still be useful.

So instead of building yet another agent framework, I started building a lightweight runtime with a different approach to routing.

The project became openLight:

github repo

The idea

What I wanted was not “LLM for everything”.

For a lot of requests, an LLM is unnecessary.

If a user wants to check CPU, disk, logs, or run a known action, that should go through a predictable path.

So openLight is built around a mixed model:
• deterministic routing where possible
• LLM-based classification where needed
• validation before execution

That keeps the system much more practical on small hardware.
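As a sketch, the mixed model boils down to "try a direct match first, only then ask the model". All names below are illustrative assumptions, not openLight's actual API:

```python
# Sketch of a deterministic-first router. All names here are
# illustrative assumptions, not openLight's actual API.

# Known commands map straight to handlers -- no LLM involved.
DETERMINISTIC_ROUTES = {
    "/cpu": lambda: "load average: 0.42",
    "/disk": lambda: "disk usage: 61%",
}


def classify_with_llm(message: str) -> str:
    # Placeholder for the LLM classifier; a real runtime would call
    # a local or remote model to decide chat vs. skill here.
    return f"chat: {message}"


def route(message: str) -> str:
    handler = DETERMINISTIC_ROUTES.get(message.strip())
    if handler is not None:
        # Direct match: execute the skill immediately, no LLM cost.
        return handler()
    # No match: fall back to LLM-based classification.
    return classify_with_llm(message)
```

The point of the shape is that the dictionary lookup is O(1) and fully predictable, while the classifier is only ever touched on the fallback path.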

How routing works

The flow looks like this:

Telegram message
   ↓
Auth
   ↓
Deterministic routing
   ├─ matched → execute skill
   └─ no match → LLM classifier
                     ↓
               chat / skill
                     ↓
                 validate
                     ↓
                 execute

In practice, that means:
• every Telegram message first goes through auth and persistence
• then the runtime tries deterministic routing
• if there is a direct match, the skill executes immediately
• if not, the system uses the LLM to decide whether the request is just chat or should be mapped to a skill
• skill execution is validated before running
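One way to picture the validate-before-execute step: whatever the classifier returns is checked against a registry of known skills before anything runs. The names below are hypothetical, just to make the shape concrete:

```python
# Sketch of validation before execution: the classifier's output is
# checked against a registry of known skills before running anything.
# Names are hypothetical, not openLight's real interfaces.

REGISTERED_SKILLS = {
    "status": lambda args: "all services up",
    "logs": lambda args: f"last {args.get('lines', 50)} log lines",
}


class SkillValidationError(Exception):
    """Raised when a classified skill fails validation."""


def validate_and_execute(skill_name: str, args: dict) -> str:
    # Reject anything the classifier invented that isn't registered.
    skill = REGISTERED_SKILLS.get(skill_name)
    if skill is None:
        raise SkillValidationError(f"unknown skill: {skill_name!r}")
    if not isinstance(args, dict):
        raise SkillValidationError("skill arguments must be a dict")
    return skill(args)
```

This is what keeps the LLM from opening up random execution paths: even if the model hallucinates a skill name, it never reaches execution.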

So the LLM is part of the system, but it is not the whole system.

That was important to me from the start.

Why this works better on Raspberry Pi

On small hardware, every extra layer matters.

If every request goes straight into an LLM-driven loop, the system becomes slower, less predictable, and more expensive to run.

With this design:
• obvious commands stay fast
• known actions remain deterministic
• the LLM is only used where classification is actually useful
• validation reduces the chance of random execution paths

For Raspberry Pi and homelab use, this feels much more natural than a heavy agent stack.

What openLight is trying to be

I don’t see it as a huge agent framework.

It’s closer to a small runtime for personal infrastructure.

Right now the main interface is Telegram, but the bigger idea is wider than that: a lightweight agent runtime that can combine deterministic skills with LLM-based interpretation without dragging in a huge platform around it.

Why I built it

Mostly because I like small tools that are easy to run and easy to understand.

I wanted something that:
• works well on Raspberry Pi
• stays lightweight
• does not depend on a huge framework
• uses LLMs where they help, not everywhere by default

That’s the direction behind openLight.

If you want to take a look:

github repo

Top comments (4)

Vic Chen

Really like the "LLM only where classification is actually useful" stance here. On constrained hardware, deterministic routing + validation is a much better systems design than forcing every request through a full agent loop. Curious whether you've measured latency / memory deltas between the deterministic path and the fallback LLM path on the Pi 5 — that would be a great proof point.

Evgenii Engineer

I haven’t published a full benchmark table yet, but I do have a first Pi 5 data point now.

On the same natural-language request, the actual status skill execution was only ~150ms. The big delta was entirely in the LLM classification path: local Ollama with qwen2.5:0.5b took ~19.8s for route classification plus ~22.6s for skill classification, so about 42.5s end-to-end. The same flow with gpt-4o-mini was ~1.35s + ~1.77s, so about 3.3s end-to-end.

So yes, this is exactly why the design is deterministic-first: when a request can be routed directly, you avoid paying the classifier tax entirely. I still need to do a clean memory comparison, especially separating agent RSS from local model residency, but the latency difference is already pretty stark.

klement Gunndu

Deterministic-first routing is smart — most agent frameworks burn LLM tokens on tasks a regex could handle. Curious if you've measured the ratio of deterministic vs LLM-classified requests in practice.

Evgenii Engineer

Not yet as a published metric, but it’s something I want to expose.

Right now each routing decision already carries a mode (slash, explicit, alias, rule, llm), so breaking requests down into deterministic vs LLM-classified traffic is straightforward. I’d actually prefer to publish the full split rather than just a single ratio, because it shows how much is handled by direct commands/rules before the classifier is even touched.

Anecdotally, command-shaped and routine ops are exactly what the deterministic path is meant to absorb, and the LLM is there as the overflow path for natural-language requests. Turning that into a real counter/dashboard is probably the next telemetry improvement I should make.
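A minimal version of that counter could look like this. This is a sketch only; the mode names are taken from the comment above, and none of these function names are from openLight itself:

```python
from collections import Counter

# Sketch: count routing decisions by mode so the deterministic vs
# LLM-classified split can be reported. Mode names mirror the ones
# mentioned above (slash, explicit, alias, rule, llm); everything
# except "llm" counts as deterministic.
routing_counts: Counter = Counter()


def record_route(mode: str) -> None:
    routing_counts[mode] += 1


def deterministic_ratio() -> float:
    total = sum(routing_counts.values())
    if total == 0:
        return 0.0
    return (total - routing_counts["llm"]) / total


# Example traffic: three deterministic decisions, one LLM fallback.
for mode in ["slash", "rule", "alias", "llm"]:
    record_route(mode)
```

Publishing `routing_counts` directly gives the full per-mode split rather than a single ratio.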