DEV Community

JoeStrout

MiniClaw: A Tiny LLM Agent for Mini Micro

Agents are all the rage these days. Claude Code, specialized for coding, was one of the first and is perhaps still the most heavily used. Then OpenClaw burst onto the scene, able to do all sorts of general computer-use things, and caused a shortage of Mac Minis. More recently, Hermes Agent has become a common favorite, with over 93 thousand stars on GitHub.

All of these agents work in fundamentally the same way. A "harness" serves as the main program for an LLM: it controls the LLM's context, so that it always knows what it needs to know, and provides tools the LLM can use, so that it can always do what it needs to do.

I covered accessing LLMs from Mini Micro back in 2022, and again in 2023, so why don't we take it to the logical next step, and create an agent in Mini Micro?

MiniClaw logo

Introducing MiniClaw

Yesterday I sat down and created MiniClaw. It consists mainly of three files:

  1. instructions.txt: these are the instructions to the LLM
  2. agent.ms: the main program, which invokes the LLM and manages its context
  3. tools.ms: code for the tools the agent can use to read, write, and manipulate files

So what can it do? Well, MiniClaw can read any file accessible within Mini Micro, which means the /sys disk, plus whatever minidisk or folder you have mounted as /usr and /usr2. It can also write files (only) under /usr/workspace. So, similar to Claude Code or most other agents, you can use it to create and modify pretty much any kind of text file. Or you can just ask it to explain and summarize things for you. For example, I asked it "tell me about the pictures on the sys disk", and it wrote out a nice summary:

Screen shot of /sys/pics summary

On another occasion, I asked it to create a .md (Markdown) file describing all the demos found in /sys/demo. But then, in a later session, I decided that the document it created was too wordy, so I asked it to shorten it:

Screen shot of agent shortening DEMO_GUIDE.md

The gray text gives us some hints as to what the code is doing: it shows when we call the LLM, how much data we get back as a response, and what tool the LLM is using (and why).

Some tasks, like this one, take only a couple of tool calls. Others take more. The LLM will keep invoking tools, occasionally printing some messages for us about its work, until it figures the task is complete (or that it's unable to complete it).

How it works

The complete agent.ms file is only 263 lines long, divided into 14 functions. That's a bit too long to go over line by line here, but we'll hit the highlights, and I encourage you to check the source file for details.

The big picture is this:

  • Each time we call the LLM, we give it our instructions (always the same), and the prompt (varies each turn).
  • The prompt includes messages from the user, previous tool calls made by the agent, and the results of those calls -- all this stuff is called the "history". It also includes the current task, so the LLM is clear on what it's supposed to be doing.
  • The LLM gives us a response in JSON format: either a tool call, a question for the user, an intermediate message, or a final message (indicating it's done).
  • We run any tool calls the LLM has asked for, and append the call and results to the history.
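For concreteness, here's the sort of JSON the LLM might send back when it wants to use a tool. The exact field names below are illustrative, inferred from the tool descriptions and the response-handling code; check instructions.txt for the real format:

```
{
    "type": "tool_call",
    "name": "head_file",
    "arguments": { "path": "/usr/workspace/DEMO_GUIDE.md", "lines": 20 }
}
```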

And that's pretty much it. The main loop looks like this:

    while true
        if not currentUserInput then
            text.color = color.gray; print "==> ", ""
            text.color = "#00AA00"
            globals.currentUserInput = input
            text.color = color.gray
        end if
        response = getResponse
        respData = json.parse(response)
        if respData == null then
            addToHistory("**IMPORTANT:** You must format your response as a JSON object!")
        else
            handleResponse respData
        end if
    end while

currentUserInput is the instruction the agent is working on; it's empty at the start of the run, or when the agent says it's finished. So then we get more input from the user. (Half the code above is just fiddling with the text color to be fancy.)

Then we call getResponse to get the LLM's response to the current context (instructions plus prompt as described above), and try to parse it as JSON. Occasionally the LLM will forget to format its response as JSON; if that happens, we just add a stern reminder to the history (so the LLM will see it) and try again. Otherwise, we call handleResponse:

handleResponse = function(data)
    addToHistory data
    if data.type == "message" then
        printNicely data.content
        globals.lastMessage = data.content
    else if data.type == "question" then
        printNicely data.content
        addToHistory ["", "--- User response ---", input, "--- End user response ---"].join(EOL)
    else if data.type == "finish" then
        printNicely data.content
        globals.currentUserInput = ""
    else if data.type == "tool_call" then
        handleToolCall data
    else
        addToHistory "ERROR: invalid response type """ + data.type +
          """; must be ""message"", ""question"", ""finish"", or ""tool_call""."
    end if  
end function

I've removed some of the error handling and text-coloring above for clarity, but this is the gist of it. We just switch based on the type of response we got from the LLM; it should be one of the four types we put in the instructions. Again, note that when we want to give the LLM more information -- like the user's response to a question -- we just add it to the history.

Let's talk about that addToHistory function for a moment. Its job is mainly just to append the given string(s) to a list of strings, so they can be included in the context. But for any agent, context management is very important! Too much context burns through tokens and degrades LLM performance. So, our addToHistory function limits how much history it remembers.

addToHistory = function(entry)
    history.push entry
    globals.historyLen += entry.len
    while historyLen > 4096 and history.len > 8
        globals.historyLen -= history[0].len
        history.pull  // discard element 0
    end while
end function

Probably the next most important function is promptInput, which calculates the "prompt" part of the context -- the part that varies from turn to turn.

promptInput = function
    lines = []
    lines.push "# Task/User Input"
    lines.push currentUserInput
    lines.push ""
    lines.push "# Current State"
    lines.push "Date/time: " + dateTime.now
    if history then
        lines.push ""
        lines.push "# Recent history"
        lines += history
    end if
    return lines.join(EOL)
end function

Simple, right? It's just composing a bit of Markdown calling out the current user input, the current state (which, for this version of MiniClaw, is only the date/time), and the history. This is appended to the static instructions and sent to the LLM.
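So the full context each turn is, conceptually, just the static instructions with this prompt appended -- something like the following sketch (where `instructions` is assumed to hold the text of instructions.txt; the actual variable names in agent.ms may differ):

```
// Conceptual sketch only; see getResponse in agent.ms for the real code.
context = instructions + EOL + EOL + promptInput
// getResponse then sends `context` to the LLM and returns its reply text.
```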

The Tools

The functions above are going to be pretty standard for any agent. What determines what the agent actually does are the tools and instructions provided to it. In MiniClaw, the tools are separated out into their own file, tools.ms.

This file begins with a few small helper functions: err, errMissingArg, and okResult, which generate result maps to be returned to the LLM; plus resolvePath and isWriteable, which help the tool code deal with files properly. Then it has a function for each tool:

  • list_files
  • read_file
  • head_file
  • tail_file
  • write_file
  • delete_file
  • move_file
  • make_dir

Each of these takes a map containing arguments (which we've gotten by parsing the JSON from the LLM), does its thing if it can, and then returns a map of results -- usually from one of the err functions, or from okResult, with details (like the path of the affected file) added in. This lets the LLM know whether its attempt to use a tool was successful.
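To make that pattern concrete, the helpers probably look something like this. This is a sketch -- the field names "ok" and "error" are my assumptions; see tools.ms for the real ones:

```
// Illustrative sketches of the result-map helpers; the real ones are in tools.ms.
err = function(msg)
    return {"ok": false, "error": msg}
end function

errMissingArg = function(argName)
    return err("Missing required argument: " + argName)
end function

okResult = function
    return {"ok": true}
end function
```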

As an example, let's look at head_file, whose job is to return the first n lines of a text file. This tool is described in the instructions file as:

{
    "name": "head_file",
    "desc": "Return the first n lines of a UTF-8 file.  Use this to examine large or unknown text files.",
    "arguments": { "path": "string", "lines": "int" }
},

We give the agent the name of the tool, a description including advice on when to use it, and info on the expected arguments. So, the actual MiniScript function is expecting its args map to contain "path" and "lines":

head_file = function(args)
    path = args.get("path"); if not path then return errMissingArg("path")
    if not file.exists(path) then return err("Invalid path `" + path + "`")
    lines = args.get("lines", 10)
    data = file.readLines(path)
    if data.len > lines then data = data[:lines]
    result = okResult
    result.path = path
    result.content = data.join(EOL)
    return result
end function

It pulls those arguments out of the map, does some simple validation on them, reads the file, and returns the requested data as the "content" string of the result map.

The other tools all work in a similar fashion.
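For instance, a write_file tool could follow the same pattern, with one extra check via the isWriteable helper. This is a sketch, not the actual implementation in tools.ms:

```
// Sketch of a write_file tool; the real implementation in tools.ms may differ.
write_file = function(args)
    path = args.get("path"); if not path then return errMissingArg("path")
    content = args.get("content")
    if content == null then return errMissingArg("content")
    if not isWriteable(path) then return err("Writing is only allowed under /usr/workspace")
    file.writeLines path, content.split(EOL)
    result = okResult
    result.path = path
    return result
end function
```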

Trying it out

You can download the MiniClaw source files from GitHub, but in order to access the LLM (gpt-5.4-nano) it uses, you'll need to set up an API key:

  1. Log in to platform.openai.com.
  2. Click on API Keys on the left.
  3. Click Create new secret key, give it a name like "MiniClaw", and copy the key it shows you.
  4. Paste that into a file called api_key.secret next to agent.ms.

Then you can mount that directory in Mini Micro, and run "agent".
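At startup, the agent reads this key from disk -- roughly like the following sketch (the actual loading code in agent.ms may differ):

```
// Sketch: load the API key saved in step 4 above.
keyLines = file.readLines("api_key.secret")
if not keyLines then
    print "Missing api_key.secret -- see the setup steps above."
    exit
end if
apiKey = keyLines[0]
```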

As for cost, I wouldn't worry about it too much... I used this thing a lot yesterday and today while developing it, and it cost under 40 cents. gpt-5.4-nano is pretty cheap, and seems smart enough for everything I've tried so far.

Taking it further

This is where it gets fun: add your own tools! This version of MiniClaw only does basic file creation/manipulation, as you can see from the tool list above. But you could make your own MiniClaw do anything Mini Micro is capable of. Examples:

  • Display a picture
  • Play a sound (or series of sounds -- making music?)
  • Launch a program
  • Access the web or web services via http
  • Do math
  • Call Wolfram Alpha for help

And adding more tools is pretty easy: just create a function for it in tools.ms, and add a description of the tool to instructions.txt. That's it; the LLM should be smart enough to invoke it when the time is right.
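As a sketch, here's what a hypothetical show_picture tool might look like in tools.ms. The tool name, and the choice to draw on the gfx display, are my own for illustration, not part of MiniClaw:

```
// Hypothetical new tool: load an image and draw it centered on the gfx display.
show_picture = function(args)
    path = args.get("path"); if not path then return errMissingArg("path")
    if not file.exists(path) then return err("Invalid path `" + path + "`")
    img = file.loadImage(path)
    if img == null then return err("Not a valid image: `" + path + "`")
    gfx.clear
    gfx.drawImage img, 480 - img.width/2, 320 - img.height/2
    result = okResult
    result.path = path
    return result
end function
```

You'd also add a matching entry to instructions.txt -- a name, a short description of when to use it, and an "arguments" map with "path": "string" -- just like the head_file entry shown earlier.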

If you want to switch to a different LLM provider (here's a handy guide to AI models for Hermes), you might have to adjust or rewrite the getResponse function, which formats the input for the LLM and then digs the actual response text out of the JSON package it's buried in. But this is totally doable. You could even run a local LLM, if that's your thing, and connect to it at localhost.
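For example, if you run a local server such as Ollama, a rewritten getResponse might look roughly like this. It's a sketch under several assumptions: that Ollama's /api/generate endpoint is listening on its default port, that `instructions` holds the contents of instructions.txt, and that Mini Micro's http.post returns the response body as a string:

```
import "json"

// Sketch: query a local Ollama server instead of OpenAI.
getResponse = function
    body = {"model": "llama3.2", "stream": false}
    body.prompt = instructions + EOL + EOL + promptInput
    raw = http.post("http://localhost:11434/api/generate", json.toJSON(body))
    data = json.parse(raw)
    if data == null or not data.hasIndex("response") then return ""
    return data.response
end function
```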

There's also a lot that could be done to improve MiniClaw's user interface. Right now it just prints stuff to the text display as it comes in (albeit with pretty colors, supporting the basic Markdown commonly used by LLMs). You could instead make a structured display, keeping the current task up top, showing some info about the context on the side, and neatly formatted responses (perhaps drawn with proportional fonts into a PixelDisplay) below.

My main goal with MiniClaw was to create an agent that is simple and small enough (and MiniScript enough!) to be easily understood and modified. And the great thing is, Mini Micro is a safe sandbox environment, assuming you only mount minidisks or folders you aren't worried about. So go nuts and have fun!
