DEV Community

Osama Ghazal


The Core of a Coding Agent Is 128 Lines of Python. So I Built One From Scratch.

128 lines of Python.

That's the entire core of a coding agent — the loop that powers tools like Claude Code and Cursor. I didn't believe it either, so I built one from scratch. Then I pointed it at a failing test, and it read the file, ran the test, saw the traceback, fixed the code, and re-ran it — choosing every step itself. No one hard-coded that.

It's open source (MIT), with a phased roadmap you can follow:

👉 github.com/osama96gh/coding-agent-from-scratch

Why build one instead of reading one

I use coding agents every day. As an AI engineer, I think they're the breakout use case for LLMs right now. But using something and understanding it are different things.

Reading a production agent's source to learn the core is a trap — the essential logic is buried under prompt caching, retries, telemetry, and elaborate scaffolding. You can't see the engine for the bodywork.

So I built just the engine. No optimizations. Just the essence.

The "huh, that small?" numbers

These surprised me enough that I re-counted:

| Piece | Size |
|---|---|
| Entire REPL + agent loop + permission gate (main.py) | 128 lines |
| The system prompt that steers all behavior (prompts.py) | 19 lines |
| Tools: read, list, grep, edit, write, run_bash | 6 files, smallest is 35 lines |
| Whole project, incl. 2 swappable providers + streaming | ~1,300 lines |

The thing that feels like magic — an agent autonomously reading files, running your tests, fixing the failure, re-running — comes out of about a hundred lines of orchestration. The intelligence lives in the model. Your job is plumbing.

The whole trick: the agent loop

Strip away the streaming, the permission gate, and the UI, and the heartbeat of the whole thing is this:

conversation.append({"role": "user", "content": user_input})

while True:  # keep going until the model stops asking for tools
    turn = llm.call(conversation, tools=TOOL_SCHEMAS, system=SYSTEM_PROMPT)
    conversation.append(turn.to_message())

    if not turn.tool_calls:        # plain text → the model is done
        break

    for call in turn.tool_calls:   # otherwise, run each tool it asked for…
        result = run_tool(call.name, call.args)
        conversation.append({
            "role": "tool", "id": call.id,
            "name": call.name, "content": result,
        })
    # …then loop, so the model sees the results and decides what's next

That's it. That's the agent.

  1. You type a request in plain English.
  2. The loop sends the conversation plus the list of tools to the LLM.
  3. The model replies with either text (talking to you) or a tool call ("read main.py", "run pytest").
  4. If it's a tool call, the loop runs it, appends the result to the conversation, and goes back to step 2.
  5. If it's text, the loop shows it and waits for your next turn.

The model decides which tool and in what order; the loop just keeps turning until the model stops asking.

An agent is just an LLM, a loop, and some tools. Everything else in this repo is refinement on top of those three.

This is also where "it can debug itself" comes from, for free. When the shell tool feeds exit codes and stderr back into the conversation, the model sees the failure on the next turn and proposes a fix. Nobody wrote an "if tests fail, edit the code" rule. It falls out of the loop.
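To make that concrete, here is a minimal sketch of what a shell tool could return. It is not the repo's actual run_bash: the function signature, the timeout, and the output layout are assumptions for illustration. The point is only that the exit code and stderr come back as plain text the model can read.

```python
import subprocess


def run_bash(command: str, timeout: int = 30) -> str:
    """Run a shell command and return everything the model needs to see.

    Hypothetical signature and output format, chosen for illustration.
    """
    try:
        proc = subprocess.run(
            command,
            shell=True,           # the model sends a full shell command line
            capture_output=True,  # collect stdout and stderr instead of printing
            text=True,            # decode bytes to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"exit_code: timeout after {timeout}s"

    # Exit code and stderr go back verbatim, so a failing test run becomes
    # ordinary text in the conversation on the next turn.
    return (
        f"exit_code: {proc.returncode}\n"
        f"stdout:\n{proc.stdout}\n"
        f"stderr:\n{proc.stderr}"
    )


if __name__ == "__main__":
    print(run_bash("echo ok; echo boom >&2; exit 3"))
```

Returning failures as strings rather than raising exceptions is the design choice that matters here: the loop never has to understand the error, it only has to pass it along.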

The six tools

One file each: read_file, list_files, grep, edit_file, write_file, run_bash.

Each is just a function plus a JSON schema describing its arguments — and that schema is all the model needs to know the tool exists and how to call it. "Tool calling" sounds advanced; it's really "here's a function signature, fill in the arguments."
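As a rough illustration of "a function plus a JSON schema", here is a sketch of one tool and a dispatcher. The names TOOLS and READ_FILE_SCHEMA are hypothetical, and the exact envelope around the schema differs between providers, so treat the layout as an assumption rather than the repo's format.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Return the text of a file, or an error string the model can read."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except OSError as exc:
        return f"error: {exc}"


# JSON Schema for the arguments. The wrapper keys ("name", "description",
# "parameters") are illustrative; each provider has its own envelope.
READ_FILE_SCHEMA = {
    "name": "read_file",
    "description": "Read a text file and return its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path to the file."},
        },
        "required": ["path"],
    },
}

TOOLS = {"read_file": read_file}    # tool name -> Python function
TOOL_SCHEMAS = [READ_FILE_SCHEMA]   # what gets sent to the model


def run_tool(name: str, args: dict) -> str:
    """Dispatch a tool call; unknown tools and bad arguments come back as text."""
    func = TOOLS.get(name)
    if func is None:
        return f"error: unknown tool {name!r}"
    try:
        return func(**args)
    except TypeError as exc:  # e.g. the model omitted a required argument
        return f"error: bad arguments for {name}: {exc}"
```

Adding a tool in this sketch means writing one function, one schema, and one dictionary entry; the loop itself does not change.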

run_bash alone is almost a superpower — with a shell you can stand in for most of the others — which is exactly why an agent needs a permission gate.

The parts that make it feel real

These refinements sit on top of the core, and they're where most of the line count goes:

  • System prompt (19 lines). Tiny, but it's the steering wheel: you're a coding agent, prefer tools over guessing, work step by step.
  • Permission gate. Before anything risky (writes, shell commands), it asks — and the decision reads the arguments, not just the tool name, so git status runs unprompted while git push still stops to ask. The difference between an assistant and rm -rf roulette.
  • Context management. LLMs are stateless — every turn resends the whole conversation, which gets expensive fast. The fixes: lean on the provider's cached prefix, and summarize old turns yourself.
  • Pluggable providers. A thin interface makes OpenAI and Gemini interchangeable — one env var to switch — and keeps anything provider-specific out of the loop.
  • Streaming + usage reporting. See tokens as they generate; know what each turn cost.
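The argument-aware permission gate described in the list above can be sketched roughly like this. The allowlist, the "command" argument name, and the needs_approval function are all assumptions made for illustration; the repo's real gate may decide differently.

```python
import shlex

READ_ONLY_TOOLS = {"read_file", "list_files", "grep"}

# Hypothetical allowlist of (program, subcommand) pairs treated as read-only.
# A subcommand of None means any arguments to that program are accepted.
SAFE_COMMANDS = {
    ("git", "status"),
    ("git", "diff"),
    ("git", "log"),
    ("ls", None),
    ("pwd", None),
}

# Characters that allow chaining, redirection, or substitution in a shell.
SHELL_OPERATORS = set(";&|><`$\n")


def needs_approval(tool: str, args: dict) -> bool:
    """Return True when the user should confirm before the tool runs."""
    if tool in READ_ONLY_TOOLS:
        return False
    if tool != "run_bash":
        return True  # edit_file, write_file: always ask

    command = args.get("command", "")
    # Chaining or redirection could hide a risky command behind a safe one.
    if any(ch in SHELL_OPERATORS for ch in command):
        return True
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes
        return True
    if not tokens:
        return True

    program = tokens[0]
    subcommand = tokens[1] if len(tokens) > 1 else None
    is_safe = (program, subcommand) in SAFE_COMMANDS or (program, None) in SAFE_COMMANDS
    return not is_safe
```

With this sketch, `needs_approval("run_bash", {"command": "git status"})` is False while the same call with `git push` is True. The gate defaults to asking, so anything it cannot classify stops for confirmation.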

The part that surprised me: it just feels real

That failing-test run from the top? I never scripted it. The model chose to read, run, diagnose, fix, and re-run entirely on its own. It's the same shape of behavior I pay for in Claude Code every day, out of 128 lines I could read in a single sitting.

The gap between "toy" and "real" is smaller than the hype suggests. The production polish — caching, retries, sandboxing, a thousand handled edge cases — is genuine, hard engineering. But the core that makes an agent an agent is within any engineer's reach in an afternoon.

Build it yourself, one phase at a time

The repo is a phased roadmap — each phase runs on its own and teaches one concept, so you always have a working agent:

  1. The bare chat loop (no tools)
  2. Tool infrastructure + read_file
  3. Read-only exploration (list_files, grep)
  4. Write tools (edit_file, write_file)
  5. The shell tool (run_bash) — where it gets powerful (and dangerous)
  6. System prompt + UX polish
  7. Safety & permissions
  8. Context management for long sessions
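For a feel of what the last phase involves, here is a minimal sketch of summarizing old turns. It is an assumption-heavy illustration, not the repo's code: compact, keep_recent, and max_messages are hypothetical names, and summarize stands in for what would normally be another LLM call.

```python
from typing import Callable

Message = dict  # e.g. {"role": "user", "content": "..."}


def compact(
    conversation: list[Message],
    summarize: Callable[[list[Message]], str],
    keep_recent: int = 6,
    max_messages: int = 20,
) -> list[Message]:
    """Replace old turns with one summary message once the history grows."""
    if len(conversation) <= max_messages:
        return conversation

    cut = len(conversation) - keep_recent
    # Never start the kept tail on a tool result: it would be orphaned from
    # the assistant message that requested it.
    while cut > 0 and conversation[cut]["role"] == "tool":
        cut -= 1
    if cut == 0:
        return conversation  # nothing old enough to summarize

    old, recent = conversation[:cut], conversation[cut:]
    summary = {
        "role": "user",
        "content": "Summary of earlier work:\n" + summarize(old),
    }
    return [summary] + recent
```

The loop would call something like this before each model request. Thresholds counted in messages keep the sketch short; counting tokens would track cost more closely.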

osama96gh / coding-agent-from-scratch

Educational Python coding agent built from scratch to explain agent loops, tool calling, code editing, bash execution, permissions, context management, and OpenAI/Gemini providers.

Building a Coding Agent from Scratch

A learning project: build a simple but real coding agent (think a tiny Claude Code / Cursor / Codex), step by step, from nothing — to understand how complex AI agents are actually structured under the hood.

The one-sentence mental model: An agent is just an LLM, a loop, and some tools. Everything else is refinement.

Project description

This repository is an educational, from-scratch Python implementation of a terminal coding agent. It shows the core mechanics behind modern AI coding tools: a model-driven agent loop, tool calling, file exploration, targeted code edits, shell command execution, permission checks, streaming responses, usage reporting, context compaction, and pluggable OpenAI/Gemini providers.

It is meant to be read, modified, and learned from. It is not a production coding agent, but a small reference implementation for understanding how production coding agents are structured under the hood.


Build it, break it, extend it (a new tool, a web UI, a third provider) — and tell me how it goes. The fastest way to stop an AI tool from feeling like magic is to build a small one yourself.
