<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Evgenii Engineer</title>
    <description>The latest articles on DEV Community by Evgenii Engineer (@evgenii_engineer).</description>
    <link>https://dev.to/evgenii_engineer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3824319%2F7333d0b6-3e6e-41f2-b728-6177a9d29646.jpg</url>
      <title>DEV Community: Evgenii Engineer</title>
      <link>https://dev.to/evgenii_engineer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/evgenii_engineer"/>
    <language>en</language>
    <item>
      <title>What I Learned Building a Lightweight Local AI Agent</title>
      <dc:creator>Evgenii Engineer</dc:creator>
      <pubDate>Fri, 08 May 2026 20:52:37 +0000</pubDate>
      <link>https://dev.to/evgenii_engineer/what-i-learned-building-a-lightweight-local-ai-agent-18nk</link>
      <guid>https://dev.to/evgenii_engineer/what-i-learned-building-a-lightweight-local-ai-agent-18nk</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkx4g7zyo4yrc1agernf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkx4g7zyo4yrc1agernf.png" alt="A Raspberry Pi sitting on top of a Mac mini — both running openLight, both small, both always on." width="800" height="1418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When I wrote the first post about openLight in March, the project was a v0.0.1 commit, a Raspberry Pi, and a Telegram bot. I had a Pi running Tailscale, a small Matrix homeserver, and I was tired of &lt;code&gt;ssh pi@raspberrypi.local &amp;amp;&amp;amp; systemctl status …&lt;/code&gt; from a phone keyboard. So I wrote a Go binary that talked to a Telegram bot, kept state in SQLite, and could fall back to a local Ollama model when the rule-based router didn't recognize the request.&lt;/p&gt;

&lt;p&gt;Two months later the binary is still ~25 MB, still one config file, still SQLite. Almost everything underneath has been rewritten at least once. The identity I'd write today is shorter than anything in the v0.0.1 README:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;openLight is a lightweight operational layer for personal servers, not a generic AI assistant.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sentence didn't exist in March. It took two months of building, deleting, and re-building to find it. This is the retrospective I wish I had read before starting.&lt;/p&gt;

&lt;h2&gt;Five moments where it earned its keep&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3rb32voldejqj5kesvm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm3rb32voldejqj5kesvm.png" alt="Telegram alert: synapse status check failed, with Restart / Logs / Status / Ignore buttons." width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A train station, 11pm.&lt;/strong&gt; Synapse is down on the VPS. I tap &lt;code&gt;Restart&lt;/code&gt; on the Telegram alert. It comes back. I don't open a laptop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A grocery line.&lt;/strong&gt; Tailscale shows yellow. I tap &lt;code&gt;Logs&lt;/code&gt;, see a known transient peer issue, tap &lt;code&gt;Ignore&lt;/code&gt; for 15 minutes. Done before checkout.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A flight.&lt;/strong&gt; I'm offline. The watch loop runs anyway. When I land, three resolved-incident messages are waiting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Mac mini at home.&lt;/strong&gt; It runs Ollama and a few Docker services. CPU &amp;gt; 90% for 5 minutes triggers a watch I'd forgotten about. My own background job is the problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A friend's homelab.&lt;/strong&gt; &lt;code&gt;/restart matrix&lt;/code&gt; from my couch hits a remote docker-compose service. Same UX as the local Pi.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture below only matters because of those moments.&lt;/p&gt;

&lt;h2&gt;What changed technically&lt;/h2&gt;

&lt;h3&gt;Routing: from flat to deterministic-first&lt;/h3&gt;

&lt;p&gt;The original router was flat: try slash commands, try a few regexes, fall through to Ollama, run whatever the model picked. This works for about a week. Then you notice that half your latency is the model warming up on a Pi, the model picks plausible-but-wrong tool names ~10–15% of the time, and you can't separate "I don't know" from "I'm 51% sure, do the thing."&lt;/p&gt;

&lt;p&gt;The current router has five layers before the LLM is ever consulted, and the LLM path itself is two-stage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjy2nz26oi5wxhmdtezg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhjy2nz26oi5wxhmdtezg.png" alt="Router before and after: a flat slash → regex → Ollama path on the left, a deterministic-first cascade with a two-stage LLM classifier on the right." width="800" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The semantic layer pays for itself within a week: &lt;code&gt;покажи логи tailscale&lt;/code&gt;, &lt;code&gt;show tailscale logs&lt;/code&gt;, and &lt;code&gt;tail -f tailscale&lt;/code&gt; all normalize to the same skill identifier without ever waking the model. On a Pi, that's the difference between sub-100ms and 2–5 seconds.&lt;/p&gt;
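
&lt;p&gt;A minimal sketch of what that layer can look like (the names &lt;code&gt;aliasTable&lt;/code&gt; and &lt;code&gt;Normalize&lt;/code&gt; are mine, not openLight's internals): the cheapest deterministic version is a normalized alias table consulted before any model call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package router

import "strings"

// aliasTable maps normalized phrases to skill identifiers. The
// entries below are illustrative, not the project's real tables.
var aliasTable = map[string]string{
    "show tailscale logs":   "service_logs",
    "покажи логи tailscale": "service_logs",
    "tail -f tailscale":     "service_logs",
}

// Normalize lowercases the message and collapses whitespace, then
// checks the alias table. A hit routes without waking the model;
// a miss falls through to the LLM classifier.
func Normalize(msg string) (skill string, ok bool) {
    key := strings.Join(strings.Fields(strings.ToLower(msg)), " ")
    skill, ok = aliasTable[key]
    return skill, ok
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;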

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpjs3c1fojyszcuqxsuz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgpjs3c1fojyszcuqxsuz.png" alt="Telegram: " width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot above is the same &lt;code&gt;/status&lt;/code&gt; skill, reached without an LLM call: the Russian phrase normalizes deterministically.&lt;/p&gt;

&lt;p&gt;The two-stage split matters because the failure modes get easier to reason about. "Can't pick between &lt;code&gt;service_logs&lt;/code&gt; and &lt;code&gt;service_status&lt;/code&gt;" is not the same problem as "can't tell whether the user wants services or notes." Splitting the decision splits the latency cost, the error mode, and the prompt size.&lt;/p&gt;
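
&lt;p&gt;As a hedged sketch of that shape (the prompts and the &lt;code&gt;askLLM&lt;/code&gt; helper are stand-ins, not the project's real code): stage one sees only group names, stage two sees only the skills inside the chosen group.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package router

import "strings"

// classify is an illustrative two-stage router. askLLM stands in for
// one call to a small local model; groups maps each group name to the
// skill identifiers registered under it.
func classify(msg string, groups map[string][]string, askLLM func(string) string) string {
    // Stage 1: choose a group. The prompt lists group names only,
    // so it stays small no matter how many skills exist.
    names := make([]string, 0, len(groups))
    for g := range groups {
        names = append(names, g)
    }
    group := askLLM("Pick one group for: " + msg + "\nGroups: " + strings.Join(names, ", "))

    skills, ok := groups[group]
    if !ok {
        return "" // unknown group: treat as chat, never as a tool call
    }
    // Stage 2: choose a skill inside that group only.
    return askLLM("Pick one skill for: " + msg + "\nSkills: " + strings.Join(skills, ", "))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;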

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81mchbekjvqxw5n8xt3b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81mchbekjvqxw5n8xt3b.png" alt="CLI hitting the same runtime as the Telegram bot —  raw `skills` endraw ,  raw `watch list` endraw ,  raw `notes` endraw  against agent.test.yaml." width="800" height="286"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CLI subcommand wires up the same agent, the same registry, and the same auth as the Telegram path. This is also what the smoke harness drives in CI.&lt;/p&gt;

&lt;h3&gt;From localhost to named SSH nodes&lt;/h3&gt;

&lt;p&gt;v0.0.1 could only operate the box it ran on. A Telegram bot is a singleton, so deploying one openLight per host wasn't an option. The model became: one openLight, many &lt;em&gt;nodes&lt;/em&gt;, where a node is just a named SSH target in config. Service specs grew a &lt;code&gt;node:vps:compose:/opt/matrix/docker-compose.yml:synapse&lt;/code&gt; syntax that resolves to the right backend on the right host.&lt;/p&gt;
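
&lt;p&gt;Parsing that spec is a few lines; the &lt;code&gt;Target&lt;/code&gt; struct below is my guess at the shape, not the repository's actual type:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package services

import (
    "fmt"
    "strings"
)

// Target is an illustrative decoding of a spec such as
// node:vps:compose:/opt/matrix/docker-compose.yml:synapse
type Target struct {
    Node    string // named SSH target from config, e.g. "vps"
    Backend string // "systemd", "docker", or "compose"
    Path    string // compose file path (compose backend only)
    Service string // service name within the backend
}

// ParseSpec assumes the five-field compose layout shown above and
// that paths contain no colons; a real parser would need more care.
func ParseSpec(spec string) (Target, error) {
    p := strings.Split(spec, ":")
    if len(p) != 5 || p[0] != "node" {
        return Target{}, fmt.Errorf("unrecognized service spec: %q", spec)
    }
    return Target{Node: p[1], Backend: p[2], Path: p[3], Service: p[4]}, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;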

&lt;p&gt;This forced the services module to become an interface, not a function:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mpl1s592pohx4u03520.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9mpl1s592pohx4u03520.png" alt="One user-facing command, one allowlist check, one resolver, six backends behind it: local systemd / docker / compose, plus remote variants over SSH." width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Six backends behind one skill. The user sees one command. Auditing sees one skill call. The temptation to expose the backend distinction was strong and wrong.&lt;/p&gt;
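
&lt;p&gt;Concretely, "an interface, not a function" can be as small as three methods. This is my sketch of the minimal surface those six backends would share, not the repo's actual definition:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Backend abstracts one service manager on one host. Local systemd,
// docker, and compose implement it directly; the remote variants wrap
// the same commands in an SSH session to a named node.
type Backend interface {
    Status(service string) (string, error)
    Restart(service string) error
    Logs(service string, lines int) (string, error)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;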

&lt;h3&gt;From request-response to a monitoring loop&lt;/h3&gt;

&lt;p&gt;The biggest behavioral change is &lt;code&gt;watch&lt;/code&gt;. v0.0.1 was reactive: I asked, it answered. v0.1.0 added a polling subsystem that holds rules, opens incidents, and &lt;em&gt;initiates&lt;/em&gt; messages with inline buttons: &lt;code&gt;Restart&lt;/code&gt;, &lt;code&gt;Logs&lt;/code&gt;, &lt;code&gt;Status&lt;/code&gt;, &lt;code&gt;Ignore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The interesting part isn't the polling. The interesting part is that alert buttons reuse the existing skill surface. When I tap &lt;code&gt;Restart&lt;/code&gt; on an alert, the runtime calls the same &lt;code&gt;service_restart&lt;/code&gt; skill I'd call manually. One allowlist check, one audit row, one logging path. There is no separate "automation" surface.&lt;/p&gt;
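
&lt;p&gt;In sketch form (hypothetical names throughout, including &lt;code&gt;Registry.Call&lt;/code&gt;), a tapped button is nothing more than a pre-filled skill invocation:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import "strings"

// handleCallback dispatches a tapped alert button. The callback data
// carries a skill name and argument, and execution goes through the
// same registry as a typed command: one allowlist check, one audit row.
func handleCallback(reg *Registry, user, data string) (string, error) {
    // data looks like "service_restart:synapse" on the Restart button.
    skill, arg, _ := strings.Cut(data, ":")
    return reg.Call(user, skill, arg)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;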

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq0774t5xqkev7j6rizx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmq0774t5xqkev7j6rizx.png" alt="End-to-end incident: alert → Restart button → " width="800" height="1284"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I think this is the single design decision I'm proudest of. The fashionable direction in agent frameworks is to give the LLM a sandbox, a shell, and trust. The unfashionable direction — and the correct one for infra — is to make manual and automatic actions go through the &lt;em&gt;exact same&lt;/em&gt; validated code path. Automation is a button press, not a permission level.&lt;/p&gt;

&lt;h2&gt;What changed philosophically&lt;/h2&gt;

&lt;h3&gt;Skills are the only safety boundary&lt;/h3&gt;

&lt;p&gt;In v0.0.1 I had a &lt;code&gt;mutating_execute_threshold&lt;/code&gt; knob — a confidence floor for state-changing actions. By v0.0.2 it was gone, and I'd internalized the rule: &lt;strong&gt;the LLM can only choose among already-registered skills, and skills enforce their own allowlists in Go.&lt;/strong&gt; No threshold, no soft gate. Either there's a Go function the LLM can name, or there isn't.&lt;/p&gt;

&lt;p&gt;The model is a classifier of intent, not a holder of permissions. Permissions live in the code that runs after the classification.&lt;/p&gt;
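
&lt;p&gt;A minimal sketch of that rule, assuming a config struct that mirrors the YAML allowlist (all names illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import (
    "fmt"
    "slices"
)

// serviceRestart enforces its own allowlist in Go. The classifier can
// name this skill, but nothing it outputs can widen the allowed set;
// cfg.Services.Allowed mirrors services.allowed in the YAML config.
func serviceRestart(cfg Config, name string) error {
    if !slices.Contains(cfg.Services.Allowed, name) {
        return fmt.Errorf("service %q is not on the allowlist", name)
    }
    return backendFor(name).Restart(name) // backendFor is hypothetical
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;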

&lt;h3&gt;Core vs. Optional, made explicit&lt;/h3&gt;

&lt;p&gt;The README has two lists: core modules (always on) and optional ones (off by default). Every time I added a fun module — vision, voice, browser — I felt a pull to make it on by default, "because the demo is so cool." Three weeks later I'd be debugging why the Pi's memory was full of Playwright. Making "off by default" structural, not a config preference, is how I keep myself from drifting the project into a generic AI assistant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0ljybqh5nwx8c8yvs2a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0ljybqh5nwx8c8yvs2a.png" alt="/skills response showing all groups: Chat, Notes, Memory, Files, Browser, Services, Watch, System, Core, Vision, OCR, Visual watch." width="800" height="744"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The same registry is exposed to the LLM classifier and to the user's &lt;code&gt;/skills&lt;/code&gt; reply. There is no parallel surface.&lt;/p&gt;

&lt;h3&gt;From a Raspberry Pi project to personal infrastructure&lt;/h3&gt;

&lt;p&gt;Somewhere during the third or fourth refactor I realized openLight was no longer about a Raspberry Pi. The Pi was just the smallest machine I happened to have. What I'd actually become interested in was the broader category: small, always-on computers running useful local software without cloud-scale tooling and without enterprise infrastructure. A Pi 4 in the closet. A Mac mini M1 on the shelf running local models. A used NUC behind the TV.&lt;/p&gt;

&lt;p&gt;These machines have something in common that mainstream infra tooling does not optimize for. They run on residential power. They reboot when the cleaner unplugs them. They have one operator, who is also the developer, who is also the on-call rotation. Kubernetes is not the answer. A Datadog agent is not the answer. What this hardware needs is a small, reliable, observable layer that gets out of the way until it's needed.&lt;/p&gt;

&lt;p&gt;The Mac mini in particular changed the project's scope. Once I started deploying to &lt;code&gt;darwin/arm64&lt;/code&gt; with &lt;code&gt;launchd&lt;/code&gt; instead of &lt;code&gt;systemd&lt;/code&gt;, openLight stopped being "a Telegram bot for Raspberry Pi" and started being a personal-infrastructure agent that happens to use Telegram. The Pi case became the smallest case of a more general thing, instead of the only case.&lt;/p&gt;

&lt;p&gt;I don't have a clean term for this category. "Homelab" skews hobbyist; "self-hosted" skews ideological. &lt;em&gt;Personal infrastructure&lt;/em&gt; is the working name. It's the layer between "my laptop" and "the cloud," and a non-trivial amount of useful software is going to be built for it.&lt;/p&gt;

&lt;h2&gt;What I got wrong&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The "alternative to OpenClaw" framing.&lt;/strong&gt; The original README defined openLight by what it wasn't. People who don't know OpenClaw don't care; people who do read it as defensive. Worse, it gave me a permission slip to make decisions by negation. Define yourself by what you are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Structured tool calling" on the v0.0.1 roadmap.&lt;/strong&gt; The right vocabulary, the wrong target. What I needed wasn't more sophisticated tool calling — it was a stronger pre-LLM router. The skill set is small enough that 80% of intents are reachable by deterministic parsing, and the remaining 20% are better handled by &lt;em&gt;classification into an existing skill&lt;/em&gt; than by &lt;em&gt;generating a tool call&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dumping the full skill catalog into the LLM prompt.&lt;/strong&gt; Early on, the classifier saw every registered skill with full descriptions. This blew the prompt past 4K tokens, slowed routing, and made the model worse — too many near-duplicates. The two-stage classifier fixed it: stage 1 sees only groups, stage 2 sees only skills inside the chosen group. The model's input budget is also a design constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Files: too closed, then too open, then re-gated.&lt;/strong&gt; Three rewrites of the file skill, each triggered by me almost doing something stupid via Telegram on a tired night. None of those mistakes shipped. They came close enough.&lt;/p&gt;

&lt;h2&gt;What practice confirmed&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Telegram is the right interface for homelab ops.&lt;/strong&gt; Mobile-native, group-aware, real bot API, already in your pocket. I tried Slack briefly and a web UI for a weekend. Both lost to "I'm at the airport and Synapse is down."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite is enough.&lt;/strong&gt; Watches, incidents, settings, message history, skill calls — all of it lives in one file with &lt;code&gt;modernc.org/sqlite&lt;/code&gt; (no CGO). Backup is &lt;code&gt;cp&lt;/code&gt;.&lt;/p&gt;
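
&lt;p&gt;The no-CGO setup is genuinely small; a sketch (the driver name "sqlite" is how &lt;code&gt;modernc.org/sqlite&lt;/code&gt; registers itself):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package store

import (
    "database/sql"

    _ "modernc.org/sqlite" // pure-Go driver, no CGO
)

// Open opens (or creates) the single file that holds watches,
// incidents, settings, message history, and skill calls.
func Open(path string) (*sql.DB, error) {
    return sql.Open("sqlite", path)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;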

&lt;p&gt;&lt;strong&gt;Single Go binary is the right shape.&lt;/strong&gt; No runtime dependencies, no service mesh, no Helm chart, no Postgres. &lt;code&gt;scp&lt;/code&gt; and a systemd unit. Deploys in under a minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local LLMs are good enough for routing.&lt;/strong&gt; A 0.5B Qwen on a Pi handles the 20% of intents that don't deterministically parse. I don't need GPT-4 to decide whether "show me what's broken" means &lt;code&gt;/status&lt;/code&gt; or &lt;code&gt;/watch list&lt;/code&gt;.&lt;/p&gt;
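
&lt;p&gt;For scale, the whole "LLM call" in this design can be one non-streaming request to a local Ollama endpoint. A sketch; the model tag and port are assumed defaults, not the project's configuration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package router

import (
    "bytes"
    "encoding/json"
    "net/http"
)

// askOllama sends one non-streaming generate request to a local
// Ollama instance and returns the raw model output.
func askOllama(prompt string) (string, error) {
    body, _ := json.Marshal(map[string]any{
        "model":  "qwen2.5:0.5b", // assumed tag for "a 0.5B Qwen"
        "prompt": prompt,
        "stream": false,
    })
    resp, err := http.Post("http://localhost:11434/api/generate",
        "application/json", bytes.NewReader(body))
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    var out struct {
        Response string `json:"response"`
    }
    if err := json.NewDecoder(resp.Body).Decode(&amp;amp;out); err != nil {
        return "", err
    }
    return out.Response, nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;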

&lt;p&gt;&lt;strong&gt;Allowlists beat permission dialogs.&lt;/strong&gt; Asking the user "are you sure?" feels safe and is theatre. Forcing the operator to write &lt;code&gt;services.allowed: [tailscale, synapse]&lt;/code&gt; in YAML is the actual safety boundary.&lt;/p&gt;

&lt;h2&gt;Where it's going, and why I think it matters&lt;/h2&gt;

&lt;p&gt;I don't have a v0.2.0 manifesto. The next things on the list — durable LLM memory, richer filesystem with tracked changes, on-device voice (ffmpeg + whisper-cpp), a read-only browser skill behind a strict allowlist — will all be off by default. If they start drifting toward "generic AI assistant" in code review, they don't ship.&lt;/p&gt;

&lt;p&gt;I want to end on something larger than the project, because I think the project is a small data point in a bigger argument.&lt;/p&gt;

&lt;p&gt;The dominant story about AI agents right now is enormous: cloud-scale models, autonomous multi-agent systems, infinite tool catalogs, generalized assistants that do everything for everyone. I think that story is going to underdeliver in a specific direction. The most useful AI infrastructure of the next few years is not going to be the cloud agent. It's going to be the small, boring, reliable system that runs close to the operator.&lt;/p&gt;

&lt;p&gt;Local-first, because the operator and the hardware are in the same place. Deterministic by default, because the LLM is a great classifier and a poor decision-maker, and infrastructure does not need a poet. Observable, because the operator is also the on-call rotation. Repairable, because something always breaks. Cheap enough to leave running forever, because the moment you can't, you stop running it.&lt;/p&gt;

&lt;p&gt;openLight is a single-person project trying to be a small, honest example of that. It's not trying to replace engineers and it's not trying to be smart. It's trying to reduce the friction between an engineer and the small systems they already operate.&lt;/p&gt;

&lt;p&gt;Code lives at &lt;code&gt;github.com/evgenii-engineer/openLight&lt;/code&gt;. The architecture doc is honest about the seams. The CHANGELOG doesn't oversell. If you run a homelab or a Mac mini or a closet full of small machines and you operate them from a phone, this might be useful. If not, it almost certainly isn't, and I'd rather you know that now than after you've cloned the repo.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
      <category>go</category>
    </item>
    <item>
      <title>Heavy AI agent frameworks were too slow for my Raspberry Pi. So I built a different one</title>
      <dc:creator>Evgenii Engineer</dc:creator>
      <pubDate>Sat, 14 Mar 2026 18:38:45 +0000</pubDate>
      <link>https://dev.to/evgenii_engineer/heavy-ai-agent-frameworks-were-too-slow-for-my-raspberry-pi-so-i-built-a-different-one-4iee</link>
      <guid>https://dev.to/evgenii_engineer/heavy-ai-agent-frameworks-were-too-slow-for-my-raspberry-pi-so-i-built-a-different-one-4iee</guid>
      <description>&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;I’ve been experimenting with AI agents on a Raspberry Pi 5, and I kept hitting the same issue:&lt;/p&gt;

&lt;p&gt;most agent frameworks felt too heavy for small hardware.&lt;/p&gt;

&lt;p&gt;They often bring a full stack with multiple services, extra infrastructure, and a lot of moving parts. On a Raspberry Pi, that quickly turns into slow startup, more memory pressure, and too much complexity for simple tasks.&lt;/p&gt;

&lt;p&gt;I didn’t want that.&lt;/p&gt;

&lt;p&gt;I wanted something that would stay small and still be useful.&lt;/p&gt;

&lt;p&gt;So instead of building yet another agent framework, I started building a lightweight runtime with a different approach to routing.&lt;/p&gt;

&lt;p&gt;The project became openLight:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/evgenii-engineer/openLight" rel="noopener noreferrer"&gt;github repo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The idea&lt;/h2&gt;

&lt;p&gt;What I wanted was not “LLM for everything”.&lt;/p&gt;

&lt;p&gt;For a lot of requests, an LLM is unnecessary.&lt;/p&gt;

&lt;p&gt;If a user wants to check CPU, disk, logs, or run a known action, that should go through a predictable path.&lt;/p&gt;

&lt;p&gt;So openLight is built around a mixed model:&lt;br&gt;
    • deterministic routing where possible&lt;br&gt;
    • LLM-based classification where needed&lt;br&gt;
    • validation before execution&lt;/p&gt;

&lt;p&gt;That keeps the system much more practical on small hardware.&lt;/p&gt;
&lt;h2&gt;How routing works&lt;/h2&gt;

&lt;p&gt;The flow looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Telegram message
   ↓
Auth
   ↓
Deterministic routing
   ├─ matched → execute skill
   └─ no match → LLM classifier
                     ↓
               chat / skill
                     ↓
                 validate
                     ↓
                 execute
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, that means:&lt;br&gt;
    • every Telegram message first goes through auth and persistence&lt;br&gt;
    • then the runtime tries deterministic routing&lt;br&gt;
    • if there is a direct match, the skill executes immediately&lt;br&gt;
    • if not, the system uses the LLM to decide whether the request is just chat or should be mapped to a skill&lt;br&gt;
    • skill execution is validated before running&lt;/p&gt;
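
&lt;p&gt;As a sketch in Go (every name here is illustrative, not openLight's actual code), the same flow reads like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// route mirrors the diagram above: auth first, deterministic match
// next, the LLM only on a miss, and validation before any execution.
func route(msg Message) error {
    if !authorized(msg.User) {
        return errDenied
    }
    if skill, ok := matchDeterministic(msg.Text); ok {
        return execute(skill) // direct match: no model involved
    }
    skill, isChat := classifyWithLLM(msg.Text)
    if isChat {
        return replyChat(msg)
    }
    if err := validate(skill); err != nil {
        return err // never run an unvalidated skill call
    }
    return execute(skill)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;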

&lt;p&gt;So the LLM is part of the system, but it is not the whole system.&lt;/p&gt;

&lt;p&gt;That was important to me from the start.&lt;/p&gt;

&lt;h2&gt;Why this works better on Raspberry Pi&lt;/h2&gt;

&lt;p&gt;On small hardware, every extra layer matters.&lt;/p&gt;

&lt;p&gt;If every request goes straight into an LLM-driven loop, the system becomes slower, less predictable, and more expensive to run.&lt;/p&gt;

&lt;p&gt;With this design:&lt;br&gt;
    • obvious commands stay fast&lt;br&gt;
    • known actions remain deterministic&lt;br&gt;
    • the LLM is only used where classification is actually useful&lt;br&gt;
    • validation reduces the chance of random execution paths&lt;/p&gt;

&lt;p&gt;For Raspberry Pi and homelab use, this feels much more natural than a heavy agent stack.&lt;/p&gt;

&lt;h2&gt;What openLight is trying to be&lt;/h2&gt;

&lt;p&gt;I don’t see it as a huge agent framework.&lt;/p&gt;

&lt;p&gt;It’s closer to a small runtime for personal infrastructure.&lt;/p&gt;

&lt;p&gt;Right now the main interface is Telegram, but the bigger idea is wider than that: a lightweight agent runtime that can combine deterministic skills with LLM-based interpretation without dragging in a huge platform around it.&lt;/p&gt;

&lt;h2&gt;Why I built it&lt;/h2&gt;

&lt;p&gt;Mostly because I like small tools that are easy to run and easy to understand.&lt;/p&gt;

&lt;p&gt;I wanted something that:&lt;br&gt;
    • works well on Raspberry Pi&lt;br&gt;
    • stays lightweight&lt;br&gt;
    • does not depend on a huge framework&lt;br&gt;
    • uses LLMs where they help, not everywhere by default&lt;/p&gt;

&lt;p&gt;That’s the direction behind openLight.&lt;/p&gt;

&lt;p&gt;If you want to take a look:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/evgenii-engineer/openLight" rel="noopener noreferrer"&gt;github repo&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>raspberrypi</category>
      <category>go</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
