Tyler Tan

Posted on May 20

Building Claude Code from Scratch: A Minimal Agent in 393 Lines of C++

#agents #claude #cpp #showdev

An AI coding assistant that reads your files, writes code, and runs shell commands. The core logic? A single while loop. I thought it was bullshit too, until I built one myself.

The project is called MoonieCode, and the code lives here: https://github.com/Tenaryo/MoonieCode. Written in C++23, clocking in at 393 lines of source (637 if you count tests). Here's what it looks like in action:

$ ./moonie-code -p "list all .cpp files in the project"

A few seconds later Claude spits back your file list. What just happened? You gave it a sentence, it threw that sentence into an HTTP request, shipped it off to a Claude Haiku model somewhere in the cloud, Claude decided it needed to run find, MoonieCode ran it for Claude, fed the output back, and Claude formatted it into something human-readable.

That first step wasn't running bash. First it had to talk to the LLM. So let's start there: how do you get C++ and Claude to shake hands?

Shaking Hands with Claude

Talking to an LLM boils down to two moves: you HTTP POST a blob of JSON at it, and it sends a blob of JSON back. MoonieCode's HttpClient is a 25-line class whose guts are basically this:

cpr::Response response = cpr::Post(
    cpr::Url{base_url_ + "/chat/completions"},
    cpr::Header{{"Authorization", "Bearer " + api_key_},
                {"Content-Type", "application/json"}},
    cpr::Body{request_body.dump()}
);

cpr is a C++ wrapper around libcurl that handles the HTTP plumbing so you don't have to. You stuff your API key into the Authorization header, pack your JSON into the body, and POST to OpenRouter, an LLM API gateway that forwards the request to Claude for you.

So what's in that JSON? Two things: messages and tools.

messages is an array holding the conversation history between you and Claude. At the start it's just one entry:

{"role": "user", "content": "list all .cpp files in the project"}

tools is another array that tells Claude "here's what you have at your disposal." Each tool is a JSON object with a name, a description, and a parameter schema. Claude scans the list and goes, alright, I can ask this program to read files, write files, and run commands for me.

After you fire off the request, Claude sends back a JSON response. And here's where it gets fun: for MoonieCode's purposes, that response comes in two flavors.

Flavor one, straight text. You ask "what's 1+1" and it just answers:

{"choices": [{"message": {"content": "1+1 equals 2"}}]}

Flavor two, tool call. You ask it to "list all cpp files" and it can't answer directly, so it asks for help:

{"choices": [{"message": {"tool_calls": [{
  "id": "call_abc123",
  "function": {
    "name": "Bash",
    "arguments": "{\"command\": \"find . -name '*.cpp'\"}"
  }
}]}}]}

It's saying "I can't do this myself, but run this command for me and I'll take it from there." Notice that arguments is a string containing more JSON: Claude packed a shell command inside it.

Now the hard part: how does your code tell these two cases apart? If Claude gives you text, print it. If it wants a tool run, execute the tool. You need those two paths separated cleanly.

MoonieCode solves this with a very C++ move:

using ParsedResponse = std::variant<ContentResult, std::vector<ToolCall>>;

std::variant works like a paranoid envelope: it contains either a letter (ContentResult) or a toolbox (a list of ToolCall objects), never both, never neither. And once you unpack it with std::visit, the compiler makes sure you handle both cases. Omit one, and your build fails.

Handling the variant means pairing it with std::visit and a classic C++ pattern called overloaded:

template <class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };

Two lines of template code that let you dispatch elegantly with lambdas:

std::visit(overloaded{
    [&](const ContentResult& r) { /* Claude answered, print it */ },
    [&](const std::vector<ToolCall>& tcs) { /* Claude wants tools, run them */ },
}, parsed);

The beauty of this pattern is type safety. You physically cannot write code that forgets to handle one of the two possibilities. The compiler will chase you down until every branch exists. People love to complain that C++ is verbose, but this flavor of compile-time guardrail is genuinely satisfying when you're building something that has to not crash.
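To see the pieces working together, here is a minimal self-contained sketch. ContentResult and ToolCall are hypothetical stand-ins (MoonieCode's real structs may carry different fields), and describe is a name invented for this illustration. The deduction guide is redundant in C++20 and later but lets the snippet build as C++17 too.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for MoonieCode's types; the real fields may differ.
struct ContentResult { std::string text; };
struct ToolCall { std::string id; std::string name; std::string arguments; };

using ParsedResponse = std::variant<ContentResult, std::vector<ToolCall>>;

// Inherits operator() from every lambda it is built from.
template <class... Ts>
struct overloaded : Ts... { using Ts::operator()...; };
// Deduction guide: only required for C++17, harmless afterwards.
template <class... Ts>
overloaded(Ts...) -> overloaded<Ts...>;

// Illustrative helper: summarizes which alternative the variant holds.
auto describe(const ParsedResponse& parsed) -> std::string {
    return std::visit(overloaded{
        [](const ContentResult& r) { return "text: " + r.text; },
        [](const std::vector<ToolCall>& tcs) {
            return "tool calls: " + std::to_string(tcs.size());
        }
    }, parsed);
}
```

Delete either lambda and the std::visit call stops compiling, because no overload accepts the remaining alternative.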

Alright, your program now knows what Claude wants. Next question: if Claude asked for a tool, what happens?

The While Loop Is the Soul of the Agent

Here's the entire agent loop in pseudocode:

push the user's prompt into messages
while (not done) {
    pack messages + tools into JSON
    POST to Claude
    parse Claude's response
    if (response is text) {
        print it, we're done
    } else if (response is tool calls) {
        append Claude's tool call records to messages
        for (each tool call) {
            execute it locally
            append the result to messages
        }
    }
}

That's it. No black magic, no secret sauce. Peel back the marketing and you find a while loop wrapping a four-step cycle: ask the LLM, see what it wants, if it answered you're done, if it asked for a tool you run it and ask again.
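For readers who want something compilable, the pseudocode can be sketched in plain C++ roughly like this. It is not MoonieCode's actual source: Message, Reply, AskModel, RunTool, and run_agent are hypothetical names, the HTTP round trip and the tool runner are passed in as std::function stand-ins, and the iteration cap is included as a parameter.

```cpp
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified types; the real project stores JSON instead.
struct ToolCall { std::string id; std::string name; std::string arguments; };
struct Message {
    std::string role;                  // "user", "assistant", or "tool"
    std::string content;
    std::vector<ToolCall> tool_calls;  // only filled for assistant tool requests
};
using Reply = std::variant<std::string, std::vector<ToolCall>>;

// Stand-ins for "POST to the model" and "execute the tool locally".
using AskModel = std::function<Reply(const std::vector<Message>&)>;
using RunTool  = std::function<std::string(const ToolCall&)>;

// Loops until the model answers in text or the iteration cap is reached.
auto run_agent(std::vector<Message>& messages, const std::string& prompt,
               const AskModel& ask, const RunTool& run_tool,
               int max_iterations = 30) -> std::string {
    messages.push_back(Message{"user", prompt, {}});
    for (int i = 0; i < max_iterations; ++i) {
        Reply reply = ask(messages);  // the full history goes out every round
        if (const auto* text = std::get_if<std::string>(&reply)) {
            return *text;             // text answer: we're done
        }
        const auto& calls = std::get<std::vector<ToolCall>>(reply);
        messages.push_back(Message{"assistant", "", calls});  // record the request
        for (const auto& call : calls) {
            messages.push_back(Message{"tool", run_tool(call), {}});
        }
    }
    return "[stopped: iteration cap reached]";
}
```

Passing the model and the tool runner in as callables keeps the loop testable without a network connection: a scripted fake that returns one tool call and then a text answer is enough to exercise both branches.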

One detail that's easy to overlook: that messages array keeps growing. The "conversation history" with Claude isn't wiped between rounds; it just piles up layer by layer:

Starts with one role: "user" message
Claude says "run this command," so you append a role: "assistant" message with tool_calls
Command finishes, you append a role: "tool" message with the output
Next request carries the entire history, so Claude sees "last time I told you to run this, the result was this, now I will..."

That's the agent's "memory." No vector database, no fancy RAG pipeline, just push_back on a JSON array. Claude reads the full history and naturally chains multi-step reasoning.

What about stopping? MoonieCode has maxIterations = 30. If Claude chains 30 tool calls without giving a final answer, the program pulls the plug. It's a safety fuse that keeps the agent from spinning its wheels forever.

Of course, the real Claude Code is a different beast. Public information suggests its repo weighs in at over half a million lines of TypeScript. Rather than a crude 30-iteration cap, it runs a dynamic token budget system. It dispatches sub-agents to handle different tasks in parallel. It asks for confirmation before doing anything dangerous. It supports checkpointing so you can roll back when things explode. It speaks MCP to plug into external data sources. MoonieCode is roughly three orders of magnitude away from the real thing.

And yet. No matter how many layers of engineering get piled on top, the skeleton underneath is the same loop: ask the LLM, check what it wants, execute on its behalf, feed the result back in. That's what MoonieCode strips bare and shows you.

Doing Claude's Dirty Work

Claude says "I want to run find." That intent arrives as a JSON blob. Who turns it into an actual system call? ToolExecutor.

MoonieCode gives Claude three weapons: Read, Write, and Bash. When a tool call comes in, ToolExecutor::execute checks the name field and routes it:

auto ToolExecutor::execute(const ToolCall& tool_call) -> std::string {
    if (tool_call.name == "Read")  return handle_read(tool_call.arguments);
    if (tool_call.name == "Write") return handle_write(tool_call.arguments);
    if (tool_call.name == "Bash")  return handle_bash(tool_call.arguments);
    throw std::runtime_error("Unknown tool: " + tool_call.name);
}

That's it. A plain if-else chain mapping an LLM's "intent" to local C++ functions. No reflection. No plugin registry. No factory pattern. A 393-line project doesn't need design patterns.
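One question the dispatcher leaves open is what the caller does when it throws. The sketch below shows one possible approach, not necessarily what MoonieCode does: catch the exception and hand the error text back as the tool result, so the model can read it and correct itself. The Echo tool, the trimmed-down ToolCall, and execute_or_report are all hypothetical names for illustration.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical minimal ToolCall; the real one carries parsed JSON arguments.
struct ToolCall { std::string name; std::string arguments; };

// Stand-in dispatcher with the same shape as ToolExecutor::execute.
// "Echo" is an invented tool that simply returns its arguments.
auto execute(const ToolCall& tool_call) -> std::string {
    if (tool_call.name == "Echo") return tool_call.arguments;
    throw std::runtime_error("Unknown tool: " + tool_call.name);
}

// Assumption: report failures as text instead of letting them propagate,
// so the string can be appended to messages as a role: "tool" entry.
auto execute_or_report(const ToolCall& tool_call) -> std::string {
    try {
        return execute(tool_call);
    } catch (const std::exception& e) {
        return std::string{"[tool error: "} + e.what() + "]";
    }
}
```

The trade-off is that a misbehaving model burns an iteration on every bad call, which is one more reason to keep an iteration cap in place.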

Of the three tools, Bash is the star because it hands Claude the nuclear launch codes: it can run literally any command. Read and Write could technically be emulated with Bash (read with cat, write with tee), but they got their own tools because file I/O is so frequent it'd be wasteful, and error-prone, to channel it all through a shell.

Here's what's inside Bash:

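// Needs <array>, <cstdio>, <stdexcept>, <string>, and <sys/wait.h> (POSIX).
auto ToolExecutor::handle_bash(const nlohmann::json& arguments) -> std::string {
    // at() throws if "command" is missing; operator[] on a const json does not.
    const auto command = arguments.at("command").get<std::string>();
    const auto full_cmd = command + " 2>&1";  // capture stderr too

    FILE* pipe = popen(full_cmd.c_str(), "r");
    if (pipe == nullptr) {  // guard added here: popen returns nullptr on failure
        throw std::runtime_error("popen failed for: " + command);
    }

    std::string output;
    std::array<char, 4096> buffer{};
    std::size_t bytes_read = 0;
    // Drain the pipe in 4 KiB chunks until EOF.
    while ((bytes_read = fread(buffer.data(), 1, buffer.size(), pipe)) > 0) {
        output.append(buffer.data(), bytes_read);
    }

    // pclose waits for the child process and returns its wait status.
    int status = pclose(pipe);
    int exit_code = WIFEXITED(status) ? WEXITSTATUS(status) : status;
    output += "\n[exit code: " + std::to_string(exit_code) + "]";
    return output;
}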

Pull the command field out of the JSON, tack on 2>&1 to swallow stderr too, popen it, loop fread until the pipe runs dry, pclose to clean up and grab the exit code, then mash stdout, stderr, and exit code into one string and toss it back.

Where does that string go? Right back into the messages array, wearing the role: "tool" badge. Next time Claude gets a request, it reads that message and knows exactly what happened when the command ran. Loop this, and Claude starts to feel like a pilot in a cockpit: the dashboard (messages) shows current state, the joystick (tools) lets it take action.

Read and Write follow the exact same formula: yank parameters from JSON, do local I/O, return a result string. Read uses ifstream to slurp files whole. Write uses ofstream and auto-creates parent directories with create_directories. So clean there's not much else to say.
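For completeness, here is a rough sketch of what that formula can look like using only the standard library. The function names, the error messages, and the "Wrote N bytes" result string are assumptions made for this example, not taken from MoonieCode's source.

```cpp
#include <filesystem>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Slurps the whole file into a string; throws if it cannot be opened.
auto read_file(const fs::path& path) -> std::string {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("Cannot open for reading: " + path.string());
    }
    return std::string(std::istreambuf_iterator<char>{in},
                       std::istreambuf_iterator<char>{});
}

// Writes content to path, creating any missing parent directories first.
auto write_file(const fs::path& path, const std::string& content) -> std::string {
    if (path.has_parent_path()) {
        fs::create_directories(path.parent_path());
    }
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) {
        throw std::runtime_error("Cannot open for writing: " + path.string());
    }
    out << content;
    // Hypothetical result string for the model to read.
    return "Wrote " + std::to_string(content.size()) + " bytes to " + path.string();
}
```

Returning a short confirmation from the write path gives the model something concrete to read in the next round, the same way the exit code does for Bash.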

What 393 Lines Actually Mean

The real Claude Code is reportedly over half a million lines of TypeScript. It has sub-agent dispatching, permission gatekeeping, checkpoint rollback, MCP (Model Context Protocol) integration, multi-model routing, context window compression, and a long list of features you won't find anywhere in MoonieCode. In terms of capabilities, MoonieCode isn't even a rounding error.

But here's the counterintuitive part: no matter how much engineering gets layered on, the agent loop at the center is the same one. Ask the LLM, receive tool calls, execute locally, feed results back. Those four steps are the Newton's laws of this space. Everything else is engineering.

MoonieCode's 393 lines don't have the right to be compared to Claude Code on features. But they do one thing well: they strip the agent skeleton down to the bone, rip off every layer of engineering skin, and let you stare directly at the heartbeat of an AI coding assistant. Once you've internalized those 393 lines, every AI coding tool you encounter will auto-decompile in your head into "okay, the permissions system is on top, sub-agent scheduling underneath, and at the very bottom... still a while loop."