128 lines of Python.
That's the entire core of a coding agent — the loop that powers tools like Claude Code and Cursor. I didn't believe it either, so I built one from scratch. Then I pointed it at a failing test, and it read the file, ran the test, saw the traceback, fixed the code, and re-ran it — choosing every step itself. No one hard-coded that.
It's open source (MIT), with a phased roadmap you can follow:
👉 github.com/osama96gh/coding-agent-from-scratch
Why build one instead of reading one
I use coding agents every day. As an AI engineer, I think they're the breakout use case for LLMs right now. But using something and understanding it are different things.
Reading a production agent's source to learn the core is a trap — the essential logic is buried under prompt caching, retries, telemetry, and elaborate scaffolding. You can't see the engine for the bodywork.
So I built just the engine. No optimizations. Just the essence.
The "huh, that small?" numbers
These surprised me enough that I re-counted:
| Piece | Size |
|---|---|
| Entire REPL + agent loop + permission gate (main.py) | 128 lines |
| The system prompt that steers all behavior (prompts.py) | 19 lines |
| Tools — read, list, grep, edit, write, run_bash | 6 files, smallest is 35 |
| Whole project, incl. 2 swappable providers + streaming | ~1,300 lines |
The thing that feels like magic — an agent autonomously reading files, running your tests, fixing the failure, re-running — comes out of about a hundred lines of orchestration. The intelligence lives in the model. Your job is plumbing.
The whole trick: the agent loop
Strip away the streaming, the permission gate, and the UI, and the heartbeat of the whole thing is this:
    conversation.append({"role": "user", "content": user_input})
    while True:  # keep going until the model stops asking for tools
        turn = llm.call(conversation, tools=TOOL_SCHEMAS, system=SYSTEM_PROMPT)
        conversation.append(turn.to_message())
        if not turn.tool_calls:  # plain text → the model is done
            break
        for call in turn.tool_calls:  # otherwise, run each tool it asked for…
            result = run_tool(call.name, call.args)
            conversation.append({
                "role": "tool", "id": call.id,
                "name": call.name, "content": result,
            })
        # …then loop, so the model sees the results and decides what's next
That's it. That's the agent.
1. You type a request in plain English.
2. Send the conversation + the list of tools to the LLM.
3. The model replies with either text (talk to you) or a tool call ("read main.py", "run pytest").
4. If it's a tool call: run it, append the result to the conversation, loop back to step 2.
5. If it's text: show it, wait for your next turn.
The model decides which tool and in what order; the loop just keeps turning until the model stops asking.
An agent is just an LLM, a loop, and some tools. Everything else in this repo is refinement on top of those three.
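To watch the loop turn without an API key, here's a minimal sketch that swaps the LLM for a scripted stand-in. `ScriptedModel`, `Turn`, `ToolCall`, and the toy `add` tool are names I made up for illustration, not the repo's code; only the shape of the loop follows the snippet above.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    id: str
    name: str
    args: dict


@dataclass
class Turn:
    text: str = ""
    tool_calls: list = field(default_factory=list)


class ScriptedModel:
    """Hypothetical stand-in for an LLM: replays a fixed list of turns."""

    def __init__(self, turns):
        self._turns = iter(turns)

    def call(self, conversation):
        return next(self._turns)


def run_tool(name, args):
    # Toy registry with one fake tool; a real agent would dispatch to files or a shell.
    tools = {"add": lambda a, b: str(a + b)}
    return tools[name](**args)


def agent_loop(model, user_input):
    conversation = [{"role": "user", "content": user_input}]
    while True:
        turn = model.call(conversation)
        conversation.append({"role": "assistant", "content": turn.text,
                             "tool_calls": turn.tool_calls})
        if not turn.tool_calls:        # plain text: the model is done
            return turn.text, conversation
        for call in turn.tool_calls:   # run each requested tool, feed the result back
            conversation.append({"role": "tool", "id": call.id,
                                 "name": call.name,
                                 "content": run_tool(call.name, call.args)})


model = ScriptedModel([
    Turn(tool_calls=[ToolCall("1", "add", {"a": 2, "b": 3})]),  # first turn: ask for a tool
    Turn(text="The sum is 5."),                                  # second turn: plain text
])
answer, history = agent_loop(model, "What is 2 + 3?")
print(answer)
```

The history ends up as user, assistant (tool request), tool (result), assistant (final text), which is exactly the back-and-forth a real provider would see.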
This is also where "it can debug itself" comes from, for free. When the shell tool feeds exit codes and stderr back into the conversation, the model sees the failure on the next turn and proposes a fix. Nobody wrote "if tests fail, edit the code." It falls out of the loop.
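Here's roughly what "feeding exit codes and stderr back" can look like. This is a sketch under my own assumptions (the output format, the 30-second timeout, and the function body are illustrative, not copied from the repo); it uses only the standard library's `subprocess.run`.

```python
import subprocess


def run_bash(command: str, timeout: int = 30) -> str:
    """Run a shell command and return everything the model needs to see."""
    try:
        proc = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # Report the timeout as text so the model can react instead of the agent crashing.
        return f"exit_code: timeout after {timeout}s"
    return "\n".join([
        f"exit_code: {proc.returncode}",
        "stdout:",
        proc.stdout.rstrip(),
        "stderr:",
        proc.stderr.rstrip(),
    ])


# A failing command: the non-zero exit code and stderr both end up in the result string.
print(run_bash("echo oops >&2; exit 3"))
```

The point of returning a string rather than raising is that failure becomes just another message in the conversation.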
The six tools
One file each: read_file, list_files, grep, edit_file, write_file, run_bash.
Each is just a function plus a JSON schema describing its arguments — and that schema is all the model needs to know the tool exists and how to call it. "Tool calling" sounds advanced; it's really "here's a function signature, fill in the arguments."
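To make "a function plus a JSON schema" concrete, here's a hedged sketch of one tool and a tiny dispatcher. The schema envelope (`name` / `description` / `parameters`) is a common shape, but the exact wrapper differs between providers, and the `TOOLS` registry is my assumption about how you might wire it up, not the repo's layout.

```python
from pathlib import Path


def read_file(path: str) -> str:
    """Return the file's text, or an error string the model can react to."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        return f"error: {exc}"


# JSON Schema for the arguments; this description is all the model ever sees of the tool.
READ_FILE_SCHEMA = {
    "name": "read_file",
    "description": "Read a UTF-8 text file and return its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path to read."},
        },
        "required": ["path"],
    },
}

# Hypothetical registry mapping tool names to plain Python functions.
TOOLS = {"read_file": read_file}


def run_tool(name: str, args: dict) -> str:
    """Look up the tool by name and call it with the model-supplied arguments."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**args)
```

Errors come back as strings on purpose: an unknown tool or a missing file is information the model can use on its next turn.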
run_bash alone is almost a superpower — with a shell you can stand in for most of the others — which is exactly why an agent needs a permission gate.
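A gate like that can be small. The sketch below decides from the arguments, not just the tool name; the allowlist, the metacharacter check, and the `command` argument name are all my assumptions for illustration, and a real gate would need a much more careful policy.

```python
import shlex

READ_ONLY_TOOLS = {"read_file", "list_files", "grep"}
# Hypothetical allowlist: command prefixes considered safe to run unprompted.
SAFE_COMMAND_PREFIXES = [["git", "status"], ["git", "diff"], ["git", "log"], ["ls"]]
# Characters that can chain, redirect, or substitute; any of them forces a prompt.
SHELL_METACHARACTERS = set(";&|<>`$\n")


def needs_approval(tool_name: str, args: dict) -> bool:
    """Return True when the user should be asked before the tool runs."""
    if tool_name in READ_ONLY_TOOLS:
        return False
    if tool_name != "run_bash":
        return True  # writes and edits always ask
    command = args.get("command", "")
    if SHELL_METACHARACTERS & set(command):
        return True  # "git status; rm -rf /" must not slip through on its prefix
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse to guess
    return not any(tokens[: len(prefix)] == prefix for prefix in SAFE_COMMAND_PREFIXES)


print(needs_approval("run_bash", {"command": "git status"}))  # False
print(needs_approval("run_bash", {"command": "git push"}))    # True
```

Defaulting to "ask" for anything unrecognized keeps the failure mode annoying rather than destructive.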
The parts that make it feel real
These refinements sit on top of the core, and they're where most of the line count goes:
- System prompt (19 lines). Tiny, but it's the steering wheel: you're a coding agent, prefer tools over guessing, work step by step.
- Permission gate. Before anything risky (writes, shell commands), it asks, and the decision reads the arguments, not just the tool name, so git status runs unprompted while git push still stops to ask. The difference between an assistant and rm -rf roulette.
- Context management. LLMs are stateless: every turn resends the whole conversation, which gets expensive fast. The fixes: lean on the provider's cached prefix, and summarize old turns yourself.
- Pluggable providers. A thin interface makes OpenAI and Gemini interchangeable — one env var to switch — and keeps anything provider-specific out of the loop.
- Streaming + usage reporting. See tokens as they generate; know what each turn cost.
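For the pluggable-providers idea, here's one way the "thin interface" could be sketched. Everything here is an assumption for illustration: the `Provider` protocol, the two offline fake providers, and the `AGENT_PROVIDER` variable name are mine, and real adapters would wrap the OpenAI or Gemini SDK behind the same `call` method.

```python
import os
from typing import Protocol


class Provider(Protocol):
    """The only surface the agent loop depends on."""

    def call(self, conversation: list, tools: list, system: str) -> dict: ...


class EchoProvider:
    """Hypothetical offline provider: repeats the last message back."""

    def call(self, conversation, tools, system):
        return {"text": "echo: " + conversation[-1]["content"], "tool_calls": []}


class ShoutProvider:
    """Second hypothetical provider, to show the swap."""

    def call(self, conversation, tools, system):
        return {"text": conversation[-1]["content"].upper(), "tool_calls": []}


PROVIDERS = {"echo": EchoProvider, "shout": ShoutProvider}


def load_provider(env=None) -> Provider:
    # AGENT_PROVIDER is an assumed variable name, not necessarily the repo's.
    env = os.environ if env is None else env
    name = env.get("AGENT_PROVIDER", "echo")
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider {name!r}; choose from {sorted(PROVIDERS)}")
    return PROVIDERS[name]()


messages = [{"role": "user", "content": "hello"}]
provider = load_provider({"AGENT_PROVIDER": "shout"})
print(provider.call(messages, tools=[], system="")["text"])
```

Because the loop only ever calls `provider.call(...)`, nothing provider-specific leaks into it.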
The part that surprised me: it just feels real
That failing-test run from the top? I never scripted it. The model chose to read, run, diagnose, fix, and re-run entirely on its own — the same shape of behavior I pay for in Claude Code every day, out of ~128 lines I could read in a single sitting.
The gap between "toy" and "real" is smaller than the hype suggests. The production polish — caching, retries, sandboxing, a thousand handled edge cases — is genuine, hard engineering. But the core that makes an agent an agent is within any engineer's reach in an afternoon.
Build it yourself, one phase at a time
The repo is a phased roadmap — each phase runs on its own and teaches one concept, so you always have a working agent:
- The bare chat loop (no tools)
- Tool infrastructure + read_file
- Read-only exploration (list_files, grep)
- Write tools (edit_file, write_file)
- The shell tool (run_bash), where it gets powerful (and dangerous)
- System prompt + UX polish
- Safety & permissions
- Context management for long sessions
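For that last phase, "summarize old turns yourself" can start as simply as the sketch below. It's an assumption-laden illustration, not the repo's implementation: `compact` and `naive_summary` are hypothetical names, and a real agent would ask the model to write the summary and would avoid cutting between a tool call and its result.

```python
def naive_summary(messages):
    # Placeholder summarizer: first 60 characters of each message.
    # A real agent would send these messages to the model and ask for a summary.
    return "\n".join(f"{m['role']}: {str(m['content'])[:60]}" for m in messages)


def compact(conversation, keep_recent=4, summarize=naive_summary):
    """Replace all but the last keep_recent messages with one summary message."""
    cut = len(conversation) - keep_recent
    if cut <= 0:
        return list(conversation)  # short enough already; return a copy unchanged
    old, recent = conversation[:cut], conversation[cut:]
    summary = {
        "role": "user",
        "content": "Summary of earlier turns:\n" + summarize(old),
    }
    return [summary] + recent


history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
compacted = compact(history, keep_recent=4)
print(len(history), "->", len(compacted))
```

Ten messages become five: one summary plus the four most recent, so every later turn resends far less.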
Build it, break it, extend it (a new tool, a web UI, a third provider), and tell me how it goes. The fastest way to stop an AI tool from feeling like magic is to build a small one yourself.