DEV Community


I Built a Self-Hosted AI Agent That Runs on a Raspberry Pi

GDS K S on April 05, 2026

Most AI coding tools live in someone else's cloud. Cursor, Devin, GitHub Copilot: useful, but your context and conversations flow through a third-party…
Archit Mittal

The BullMQ-backed task queue with dead-letter queues is what separates this from most open-source AI tools that just wrap an API call in a while loop. That's actual production infrastructure. The Pico mode at ~140MB is really smart too — I've been running smaller LLM-adjacent services on Pi 4s and the constraint forces you to think carefully about what actually needs to be in-memory vs. what can be lazily loaded.

Curious about the multi-agent orchestration scoring — how are you computing capability scores for routing? Is it rule-based (e.g., if task needs tool use, route to model X) or are you doing something more dynamic like a lightweight classifier on the task description?

That routing layer is where a lot of the value is for teams running mixed model pools where you want Claude for reasoning-heavy work but Groq/Llama for fast simple tasks.

GDS K S

Right now it's mostly rule-based - capability tags on each adapter get matched against what the task description signals it needs (tool use, file ops, reasoning depth, etc.). There's no classifier yet, but that's exactly where I want to go next. The interesting problem is that "reasoning-heavy" isn't always obvious from the task text itself - a short prompt can still need deep context. So I'm thinking a lightweight embedding similarity pass might work better than keyword matching for that layer. The mixed model pool use case (Claude for reasoning, Groq for fast/cheap) is basically the whole point of keeping routing pluggable.
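Not the project's actual code, but the capability-tag matching described above can be sketched roughly like this (the adapter names, tags, and keyword signals are all made up for illustration):

```python
# Minimal sketch of rule-based capability routing: each model adapter
# declares what it can do, and a task is matched against those tags.
# Adapter names and signal keywords are illustrative, not the real ones.

ADAPTERS = {
    "claude": {"reasoning", "tool_use", "file_ops"},
    "groq-llama": {"tool_use", "fast"},
}

# Crude keyword signals; this is exactly the layer an embedding
# similarity pass would replace.
SIGNALS = {
    "reasoning": ["why", "explain", "analyze", "contract"],
    "tool_use": ["run", "execute", "csv"],
    "file_ops": ["read", "write", "file"],
}

def required_capabilities(task: str) -> set[str]:
    """Infer which capability tags a task description signals."""
    text = task.lower()
    return {cap for cap, words in SIGNALS.items()
            if any(w in text for w in words)}

def route(task: str) -> str:
    """Pick the adapter covering the most required capabilities."""
    needed = required_capabilities(task)
    scored = sorted(ADAPTERS.items(),
                    key=lambda kv: len(needed & kv[1]),
                    reverse=True)
    return scored[0][0]
```

The "short prompt can still need deep context" failure mode shows up immediately here: nothing in the keyword table can tell that a terse prompt needs reasoning depth, which is the gap an embedding pass would cover.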

Archit Mittal

The embedding similarity idea for routing is spot on — keyword matching breaks down fast when someone says "summarize this contract" (needs reasoning) vs "summarize this CSV" (needs tool use). One approach that's worked well in my automations: run a small classifier on the first ~100 tokens of the prompt against a labeled dataset of past task categories. Way cheaper than embedding the full prompt and you can retrain incrementally. For the Claude/Groq split, have you looked at latency-based fallback too? Route to Groq first, and if confidence is low, escalate to Claude — gives you speed by default with quality as a safety net.
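The escalation pattern described here could look something like the following sketch — both model calls are stubs, and the confidence threshold and function names are invented for illustration:

```python
# Speed-first routing with a quality safety net: try the fast model,
# escalate to the stronger one when confidence falls below a cutoff.
# Both "models" are stubs; in practice these would be API calls.

CONFIDENCE_THRESHOLD = 0.7  # illustrative cutoff, would need tuning

def call_fast_model(task: str) -> tuple[str, float]:
    # Stub: pretend short tasks get answered confidently, long ones don't.
    confidence = 0.9 if len(task) < 40 else 0.4
    return f"fast-answer:{task}", confidence

def call_strong_model(task: str) -> str:
    # Stub for the slower, higher-quality model.
    return f"strong-answer:{task}"

def answer(task: str) -> str:
    result, confidence = call_fast_model(task)
    if confidence >= CONFIDENCE_THRESHOLD:
        return result               # fast path: Groq-style model was confident
    return call_strong_model(task)  # safety net: escalate to Claude-style model
```

The hard part in practice is the confidence signal itself — fast chat models don't hand you a calibrated score, so teams usually derive it from log-probs, self-rated confidence, or a verifier pass, each with its own failure modes.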

Mykola Kondratiuk

self-hosting the context is the real unlock for teams with compliance constraints. cloud agents work fine until someone asks who has access to your code context - then suddenly the Pi tier looks very attractive.

GDS K S

Exactly this. "Who has access to your code context" is where the cloud pitch falls apart for a lot of teams. The Pi tier was originally just a cost play but the compliance angle ended up being way more interesting. Once context stays on your network, a whole class of enterprise objections just goes away.

Mykola Kondratiuk

Yeah the compliance angle surprised me too - started as a 'we save money' argument, turned into the thing that got legal sign-off. 'Who owns the data that feeds the model' closes more deals than benchmarks do.

Denys Nyzhehorodtsev

This is actually the direction I find much more interesting than "just another AI coding tool". Control over the environment is a huge deal. Once you start dealing with real projects (or even just sensitive internal tools), sending everything through a third-party API quickly becomes a blocker, not a feature.
Also really like the idea of separating agent runtime from model. Most tools today bundle everything together, which makes them easy to start with - but painful to extend or adapt later.
The Raspberry Pi angle is especially cool. Running something like this on cheap, local hardware makes AI feel more like infrastructure and less like a subscription.
That said, I'm curious how you're thinking about reliability and safety here. With so many tools + multi-agent orchestration, it feels like things could get unpredictable fast without strong guardrails.
Overall, this feels less like a "tool" and more like a foundation layer. Kind of like what Docker did for deployment, but for AI agents.

GDS K S

Love the Docker analogy, that's exactly the vibe. On guardrails - fair concern. Right now there's task-level sandboxing and the dead-letter queue handles runaway agents before things get messy. But honestly multi-agent at scale is not a solved problem and I won't pretend otherwise. The thing I keep coming back to is: observability first. If you can't see what agents are doing, guardrails are just vibes. Better tracing is in progress.
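The project itself uses BullMQ for this, but the dead-letter pattern that catches runaway work is simple enough to sketch stack-agnostically — retry a bounded number of times, then park the job for inspection instead of looping forever (field names and the retry budget here are illustrative):

```python
# Generic dead-letter queue pattern: jobs get a bounded number of
# attempts; persistent failures are parked for inspection rather than
# retried indefinitely. Not BullMQ — just the same idea in plain Python.
from collections import deque

MAX_ATTEMPTS = 3  # illustrative retry budget

def process(queue: deque, dead_letter: deque, handler) -> None:
    """Drain the queue; jobs that keep failing land in the dead-letter queue."""
    while queue:
        job = queue.popleft()
        try:
            handler(job["payload"])
        except Exception as exc:
            job["attempts"] = job.get("attempts", 0) + 1
            if job["attempts"] >= MAX_ATTEMPTS:
                job["error"] = str(exc)
                dead_letter.append(job)   # park it; a human or monitor inspects it
            else:
                queue.append(job)         # requeue for another attempt
```

The observability point above is why the parked job keeps its error and attempt count: a dead-letter queue you can't inspect is just a slower way to lose work.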

Alex

This link is broken: profclaw.ai/docs/skills/overview

donwson

profclaw.ai/docs/skills/overview

GDS K S

Might have been a typo - docs.profclaw.ai/
Will update this!
