Grega Snoj

I just wanted to chat with my Raspberry Pi.

The Moment It Hit Me

I had just finished writing a DC motor control script on my Raspberry Pi.

The setup was simple: two pillars about 10 meters apart, a motor on one side, a wheel on the other, and a string between them carrying a moving target. The goal was to keep the target moving unpredictably, changing speeds and directions.

During development, tweaking parameters was easy. But out in the field? Not so much. At one point I had to plug in a laptop just to adjust a few values. Standing there, balancing the laptop on one hand, cables everywhere, it just felt wrong.

That's when it hit me: Why can't I just chat with the Raspberry Pi and tell it what to do?


The Original Intent

I started thinking about building something that would live on a machine. The Raspberry Pi wouldn't just run scripts; it would be that something's environment, its body, its house.

I wanted to plug in hardware, describe what I needed, and have it write the code to make it work. No laptop. No manual tweaking. Just interaction.

That was all I wanted... at the time.


Early Versions

The first thing was naming this "something". I called it Talos. You know, the bronze automaton from Greek mythology, patrolling the island of Crete and protecting it. It felt... appropriate. Still does.

Then I started building. I basically bombarded frontier LLMs with feature requests, and the first version came together surprisingly fast. Slack became the transport layer, I wired in a couple of LLM providers, and built a simple runtime around it.

It was... ok-ish. It responded to Slack messages, could use tools, and even generated sample scripts directly on the Raspberry Pi. But as I kept adding features, an unwanted visitor showed up. Its name was Drift.

At first, I ignored it. But over time, things started to feel off. Responses became harder to reason about. Debugging got messy. The system behaved inconsistently in ways I couldn't quite explain. The same prompt would sometimes lead to different styles of responses.


Letting the LLM Decide Too Much

In the first iteration, I started by brainstorming the concept with a frontier LLM. That's how the first version of what I later called the system contract was born. A kind of constitution. It defined the direction, some invariants, and constraints Talos should follow.

From there, the workflow was simple:

  • a spec agent would turn my feature ideas into implementation instructions
  • a coding agent would implement them
  • and I would hit the merge button... a lot

I wasn't paying much attention to what was happening under the hood. I just wanted to take Talos for a test drive as fast as possible. Sometimes I would roll back features. Sometimes I would go in a completely different direction. And occasionally... I even changed the system contract itself. Just because I could.

Meanwhile, Mr. Drift was quietly lurking in the background. Clearing his throat every now and then, just to let me know he was still there. I ignored him.

What I didn't realize at the time was this: Because the system contract was vague, the LLM started filling in the gaps. Not just implementation details. Architectural decisions. That was the real mistake.

Over time, this showed up everywhere:

  • inconsistent coding patterns
  • shortcuts along the "happy path"
  • different styles emerging with every feature

The system still worked. But it was slowly becoming something I didn't fully understand anymore.


The Rewrite Decision

I thought long and hard about what to do next. Eventually, I made the decision to start over. Again. This time, I wasn't starting from scratch blindly.

I had the scars from the first iteration. I knew exactly what went wrong. The system contract was too vague. So this time, I made it tighter. It defined clear architectural principles, strict module boundaries, and explicit contracts and schemas.

And most importantly, it defined what was not allowed. Forbidden dependency patterns, the kind that Drift thrives on.

Because that's the real problem: when something isn't explicitly defined, the LLM fills in the gaps. And when it fills in the gaps... it doesn't just write code. It makes architectural decisions.

So instead of giving the system more freedom, I gave it constraints.


The System Contract

The system contract is short. It doesn't tell Talos what to do. It tells it what it is and what it's not allowed to be.

The core idea: Talos is a set of small components, each doing one thing, talking only through contracts. A component can be swapped out without touching anything else, as long as the contract holds. If one starts reaching into another's internals, that's not a shortcut, that's the system breaking.

There's exactly one orchestrator: the Agent Loop. Everything else (transports, routers, prompts, model providers, summarizers, storage) behaves like a library. They wait to be called. They don't call each other. Sounds restrictive until you've lived without it.
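
To make that concrete, here's roughly the shape as a Python sketch. Every name in it (transport, router, memory, provider, and their methods) is invented for illustration, not the actual Talos API:

```python
# Hypothetical shape of the single-orchestrator rule: the loop is the only
# place where components meet. Everything it calls just answers.
def agent_loop(transport, router, memory, provider):
    while True:
        msg = transport.receive()            # libraries wait to be called...
        recalled = memory.recall(msg.channel, msg.text)
        prompt = router.build_prompt(msg, recalled)
        reply = provider.complete(prompt)    # ...and never call each other
        memory.observe(msg, reply)
        transport.send(msg.channel, reply)
```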

Data crossing between components uses shared contracts and schemas. No module invents its own boundary shape. Changing a contract is an architectural change, and it's treated that way.
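
If I were to sketch what one of those boundary contracts might look like in Python (InboundMessage and Transport are made-up names here, not the real Talos definitions), it would be something like:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class InboundMessage:
    """The one shape every component uses for incoming messages."""
    channel: str
    author: str
    text: str


class Transport(Protocol):
    """The boundary: the Agent Loop calls these; nothing else is exposed."""

    def receive(self) -> InboundMessage: ...
    def send(self, channel: str, text: str) -> None: ...
```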

Then come the negative rules, the part Drift hated. Transports don't know about LLMs. Routers don't know about storage. Prompts don't know about the network. Each forbidden link is named explicitly, so when the LLM later tries to "just wire this up real quick," the contract is already there to say no.
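
Those negative rules can even be made executable. One hedged way to do it, assuming a package layout like talos/transports (the paths and banned pairs below are invented for the sketch): a test that walks the source tree and fails on any forbidden import.

```python
import ast
from pathlib import Path

# Hypothetical layout: each key is a package, each value the modules it
# must never import. The real contract would name its own pairs.
FORBIDDEN = {
    "talos/transports": ["talos.providers", "talos.storage"],
    "talos/routers": ["talos.storage"],
    "talos/prompts": ["talos.transports"],
}


def imports_in(path: Path) -> set[str]:
    names: set[str] = set()
    for node in ast.walk(ast.parse(path.read_text())):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
    return names


def test_forbidden_links():
    for package, banned in FORBIDDEN.items():
        for path in Path(package).rglob("*.py"):
            found = imports_in(path)
            for mod in banned:
                assert not any(n == mod or n.startswith(mod + ".") for n in found), (
                    f"{path} imports {mod}: forbidden by the system contract")
```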

A few quieter rules pull a lot of weight: storage is the only thing that touches the filesystem; time is never read directly, only through a Clock, so the runtime can be replayed deterministically; configuration is resolved once at startup, never by components at runtime; secrets are never logged or persisted.
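
The Clock rule is the easiest one to show. A minimal sketch, assuming Python Protocols (SystemClock and FixedClock are illustrative names): components only ever see the interface, so a replay can hand them a frozen instant.

```python
from datetime import datetime, timezone
from typing import Protocol


class Clock(Protocol):
    def now(self) -> datetime: ...


class SystemClock:
    """Production: reads real time, and is the only place that does."""

    def now(self) -> datetime:
        return datetime.now(timezone.utc)


class FixedClock:
    """Replay/tests: every call returns the same recorded instant."""

    def __init__(self, instant: datetime) -> None:
        self._instant = instant

    def now(self) -> datetime:
        return self._instant
```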

That's it. Components, contracts, one orchestrator, explicit "thou shalt nots." It rarely changes. Everything else evolves around it. When I'm tempted to bend a rule for convenience, the contract is the thing that asks me, quietly, whether I really want to invite Drift back in.

Diagram: single orchestrator, everything else is a library


Enforcing the Rules

I also changed how I worked with the system. Instead of relying on a single frontier model to write specs, I introduced a second one to review them against the system contract.

It's funny: the reviewer always finds something to complain about. And it almost always ends with: "None of these issues are blockers...".

The coding agent now rarely goes rogue, and test coverage is pretty decent. After each feature is implemented, I run the same reviewer again, this time checking whether the implementation actually respects the system contract. The goal is simple: show Drift the door as early as possible.
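
The review pass itself doesn't need to be clever. A hedged sketch of the idea, where complete() stands in for whatever LLM client is actually wired up and the prompt wording is mine:

```python
# The reviewer gets the contract and the artifact, nothing else, and is
# asked only to flag violations. Prompt wording is illustrative.
REVIEW_PROMPT = """You are reviewing a {kind} against a system contract.
List every way it violates the contract, most serious first.
Finish with exactly one line: "BLOCKERS" or "No blockers".

## System contract
{contract}

## {kind}
{body}
"""


def review(complete, contract: str, kind: str, body: str) -> str:
    return complete(REVIEW_PROMPT.format(kind=kind, contract=contract, body=body))


# The same function covers both passes: review(llm, contract, "spec", spec_text)
# before implementation, review(llm, contract, "diff", pr_diff) after.
```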

In the beginning, I would still inspect every PR manually. I wanted to understand what was happening under the hood. But over time, that changed. As the specs became tighter and the contract more explicit, I found myself trusting the process more and needing to check less.


The Unexpected Shift

Once the rewrite settled, something strange happened. I started talking to Talos more. Not "write me a script" talking. Actual talking.

Half-formed ideas on the way to the car. A paragraph I'd just read, pasted in to see what it reminded me of. Problems I hadn't figured out yet, dumped somewhere that would push back.

None of this was planned. Talos was supposed to be a body for hardware. Instead, it was turning into the place I went to think.

For a while I didn't get why it was sticking. Other chat tools have memory too. What made this one different?

Then it clicked: the memory was Talos's, not someone else's.

In the other tools, memory is a feature the provider gives you. Their servers, their format, their rules. Switch models and the thread of who you are just... resets.

Talos's memory sits on disk, next to the runtime, in a shape I defined. The model on the other end can change tomorrow, local one day, frontier the next. The memory doesn't care.

And it didn't seem to matter how strong that model was. A small one running locally would still pull the right scrap of last week into the conversation.

The channels stayed separate too. Scripting didn't bleed into notes. Each one had its own thread.

And the longer I used it, the less it felt like a growing pile. Old things got sharper, not heavier.

That's when the project shifted in my head. It wasn't a hardware companion anymore. The memory system had quietly turned it into something else, and I hadn't noticed until I was already using it that way every day.

That shift didn't come from better prompts. It came from how Talos handles memory.


Memory

Three things, none of them clever on their own. They just stack.

Diagram: from conversation to curated memory. Retrieval every turn, consolidation over time.

Channels are domains. Each Slack channel (scripting, notes, brainstorm, etc.) is its own memory namespace. Talos doesn't dump everything into one bucket and trust the model to untangle it later. The bucket is the channel. There's a small global pool for things that should follow me everywhere (names, preferences, recurring projects), but the default is separation. Notes don't bleed into scripts. Each thread keeps its own shape.
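
In code, the namespacing is almost boring, which is the point. A sketch of the shape (the field names are mine, not the real schema):

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str
    embedding: list[float]


@dataclass
class MemoryStore:
    # One bucket per channel, plus a small pool that follows me everywhere.
    channels: dict[str, list[MemoryEntry]] = field(
        default_factory=lambda: defaultdict(list))
    global_pool: list[MemoryEntry] = field(default_factory=list)

    def candidates(self, channel: str) -> list[MemoryEntry]:
        """Retrieval only ever sees this channel plus the global pool."""
        return self.channels[channel] + self.global_pool
```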

The system retrieves; the model doesn't. Early on I gave the LLM memory tools and let it decide when to reach for them. Frontier models managed. Smaller local ones drowned: they'd either ignore the tools or hit them on every message. So I moved retrieval out of the model's hands entirely. Each turn, Talos takes the recent messages, embeds them, and pulls the closest matches from this channel plus the global pool. The model never sees the machinery. It just gets a Recalled Memory block, already filtered, scored, capped. A small model running locally gets the same memory pipeline as a frontier one. That's why the weak ones still feel oddly aware. Not because they're smarter, but because they're given better context.
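
Here's a sketch of that per-turn pass, building on the MemoryStore sketch above. embed() is a placeholder for whatever embedding model is plugged in, and the cap and score floor are invented numbers:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recalled_memory(embed, store: MemoryStore, channel: str,
                    recent: list[str], k: int = 5, floor: float = 0.3) -> str:
    """Filtered, scored, capped: the model only ever sees the result."""
    query = embed(" ".join(recent))
    scored = sorted(((cosine(query, e.embedding), e)
                     for e in store.candidates(channel)),
                    key=lambda pair: pair[0], reverse=True)
    top = [entry.text for score, entry in scored[:k] if score >= floor]
    return "## Recalled Memory\n" + "\n".join(f"- {t}" for t in top)
```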

Memory refines instead of grows. Raw transcripts aren't memory. They're history. Once enough new conversation has built up, Talos rolls it into a summary. Right after that a consolidation pass runs: it embeds the candidate memory and looks for the nearest existing artifact in this domain. If they're close enough, the old one gets updated in place instead of a near-duplicate landing beside it. So "I'm working on the wheel-and-string rig" doesn't accumulate into thirty slightly different sentences over a month. It becomes one entry that sharpens each time I mention it. Old things get tighter. The pile doesn't grow linearly with use, which is the only reason any of this is still tractable after a long while.
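
The consolidation pass is the same machinery pointed inward. A sketch, again building on the store above; the 0.85 threshold is a made-up number, and the real cutoff would depend on the embedding model:

```python
def consolidate(embed, store: MemoryStore, channel: str, candidate: str,
                threshold: float = 0.85) -> None:
    emb = embed(candidate)
    entries = store.channels[channel]
    best = max(entries, key=lambda e: cosine(emb, e.embedding), default=None)
    if best is not None and cosine(emb, best.embedding) >= threshold:
        # Close enough: sharpen the existing entry in place.
        best.text = candidate
        best.embedding = emb
    else:
        # Genuinely new: it lands beside the rest.
        entries.append(MemoryEntry(text=candidate, embedding=emb))
```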

Separate the spaces. Take retrieval away from the model. Let memory rewrite itself instead of accumulate. None of these are flashy, and that's part of why they hold up. There's no clever model call doing the heavy lifting. It's a small pipeline that runs every turn, the same way every time, on whatever model happens to be plugged in that day.


Planning and Execution

Memory got the small models punching above their weight in conversation. The next ceiling showed up on tasks.

Ask a 7B model to "set up a new sensor, wire it into the script, write a quick test, and post the result back," and you can watch it lose the thread halfway through. Not because it's stupid. Because the task is one big mouthful and it's trying to swallow it in one go.

So the loop splits in two. First, the model writes a plan, a short list of steps, each with its own goal and a few acceptance criteria you could actually check against. Then a second pass walks through the steps one at a time, with the result of each feeding into the next. Same model, different shape. Each call has a tiny job: do step 3 against its acceptance criteria, or report what step 3 produced.

The point isn't agentic theatre. "Do this whole thing" is a hard prompt; "do step 3" is an easy one. Once a task is broken into easy prompts, the bar for the model on the other end drops. A smaller, cheaper, local one will carry it. That's the actual unlock.
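
A sketch of the split, with complete() once more standing in for the model client and the prompt wording invented:

```python
import json


def run_task(complete, task: str) -> list[str]:
    # Pass 1: one call writes the whole plan as goal/criteria steps.
    plan = json.loads(complete(
        "Break this task into small steps. Return a JSON list of objects "
        f'with "goal" and "criteria" fields.\n\nTask: {task}'))

    # Pass 2: each step is its own small call, fed the results so far.
    results: list[str] = []
    for i, step in enumerate(plan, 1):
        results.append(complete(
            f"Step {i}: {step['goal']}\n"
            f"Acceptance criteria: {step['criteria']}\n"
            f"Results of earlier steps: {results}\n"
            "Do only this step and report what it produced."))
    return results
```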


What Actually Changed for Me

The honest version is that I don't open the editor as often as I used to.

A year ago, every half-formed idea took the same path. Open VS Code. Make a folder. Wire up a config. Write the scaffolding. By the time I was actually doing the thing I cared about, the energy was already half gone, and the idea looked smaller than it had in my head.

Most of that lives somewhere else now. I describe what I want, in the channel that fits, and the boring part happens. A new sensor on the Pi is a few messages. A different approach to the motor loop is a paragraph, not an afternoon. The gap between "I wonder if" and "let me see what that looks like" got thin enough that I stopped noticing it, which is the thing I actually wanted.

What that frees up is the part I like. Thinking. Sitting with an idea long enough to know if it's any good.

Talos quietly settled into three roles I didn't plan for.

An assistant when I'm building. The boilerplate goes through it now, not me.

A memory, in a way nothing else has been. I can come back to a channel after two weeks and it knows what we were chasing, what worked, what didn't. I don't have to scroll to remember my own thinking.

A collaborator, in the small sense: the thing I think out loud to when no one's around. It's not smarter than the people in my life. It's just there at 11pm when an idea won't leave me alone, and it tracks.

None of this was the plan. The plan was a Raspberry Pi I could chat with. What I ended up with is a quieter workflow. Fewer cables. More thinking.
