<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vignesh Pai</title>
    <description>The latest articles on DEV Community by Vignesh Pai (@vigp17).</description>
    <link>https://dev.to/vigp17</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3911113%2F408d83db-e25e-4601-a28f-6f0587997297.jpg</url>
      <title>DEV Community: Vignesh Pai</title>
      <link>https://dev.to/vigp17</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vigp17"/>
    <language>en</language>
    <item>
      <title>How I Built an Agentic Coding CLI from Scratch</title>
      <dc:creator>Vignesh Pai</dc:creator>
      <pubDate>Mon, 04 May 2026 01:22:48 +0000</pubDate>
      <link>https://dev.to/vigp17/how-i-built-an-agentic-coding-cli-from-scratch-2ob5</link>
      <guid>https://dev.to/vigp17/how-i-built-an-agentic-coding-cli-from-scratch-2ob5</guid>
      <description>&lt;p&gt;I wanted to understand how AI coding tools actually work under the hood. Not just use them — but build one myself.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;AgentCode&lt;/strong&gt;: an open-source, multi-model agentic coding CLI. You type a request in plain English, and it reads your codebase, writes code, runs tests, manages git — all autonomously.&lt;/p&gt;

&lt;p&gt;Here's what I learned building it.&lt;/p&gt;




&lt;h2&gt;The Core Insight: It's Just a Loop&lt;/h2&gt;

&lt;p&gt;Every agentic coding tool — no matter how polished — runs the same fundamental pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;needs_follow_up&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="mf"&gt;1.&lt;/span&gt; &lt;span class="n"&gt;Send&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt;
    &lt;span class="mf"&gt;2.&lt;/span&gt; &lt;span class="n"&gt;If&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;returns&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt; &lt;span class="n"&gt;calls&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;execute&lt;/span&gt; &lt;span class="n"&gt;them&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;append&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;loop&lt;/span&gt;
    &lt;span class="mf"&gt;3.&lt;/span&gt; &lt;span class="n"&gt;If&lt;/span&gt; &lt;span class="n"&gt;LLM&lt;/span&gt; &lt;span class="n"&gt;returns&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="n"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The "magic" of AI coding agents is a while loop with function calling. The other 95% is context management, tool execution, error handling, and permissions.&lt;/p&gt;

&lt;p&gt;Here's the simplified version of my agentic loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_agent_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;iteration&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;routed_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_DEFINITIONS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;usage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;process_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# No tools called — model is done
&lt;/span&gt;            &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_assistant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# Execute each tool, feed results back, loop
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_tool_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a user says "fix the bug in app.py", the LLM doesn't magically fix anything. It calls &lt;code&gt;read_file("app.py")&lt;/code&gt;, sees the code, calls &lt;code&gt;edit_file(...)&lt;/code&gt; with the fix, then calls &lt;code&gt;run_command("pytest")&lt;/code&gt; to verify. Each step is a tool call that the loop executes and feeds back.&lt;/p&gt;
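
&lt;p&gt;Those tools are advertised to the model through &lt;code&gt;TOOL_DEFINITIONS&lt;/code&gt;. Here's a sketch of what one entry might look like, assuming the OpenAI function-calling schema that LiteLLM accepts (the wording is illustrative, not AgentCode's verbatim definition):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# One TOOL_DEFINITIONS entry in OpenAI's function-calling format.
# The schema doubles as documentation the model reads.
READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the project and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "Path relative to the project root",
                },
            },
            "required": ["path"],
        },
    },
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;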




&lt;h2&gt;Architecture&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────┐
│                  cli.py (UI)                    │
│  REPL loop · slash commands · Rich terminal UI  │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│               agent.py (Brain)                  │
│  Agentic loop · context management · permissions│
│                                                 │
│   LiteLLM ──→ Claude / GPT / Gemini / Ollama    │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│               tools.py (Hands)                  │
│  read_file · write_file · edit_file             │
│  run_command · git_commit · search_text         │
└─────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three files, three responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cli.py&lt;/strong&gt; — the terminal UI (REPL, slash commands, session management)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;agent.py&lt;/strong&gt; — the brain (agentic loop, streaming, permissions, context compaction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tools.py&lt;/strong&gt; — the hands (file I/O, bash execution, git, search)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Feature I'm Most Proud Of: Cost-Aware Routing&lt;/h2&gt;

&lt;p&gt;Most AI coding tools lock you into one model. You pay the same price whether you're asking "what does this function do" or "refactor the entire auth system."&lt;/p&gt;

&lt;p&gt;AgentCode classifies every prompt by complexity and automatically picks the cheapest model that can handle it:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Example Prompt&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Light&lt;/td&gt;
&lt;td&gt;"what does this function do"&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Fast, cheap — just reading and explaining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;"write unit tests for app.py"&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Needs to understand code and generate new code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Heavy&lt;/td&gt;
&lt;td&gt;"refactor the entire auth system"&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;Multi-file, multi-step, architectural thinking&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The classification uses pattern matching on the input — words like "refactor", "migrate", "entire codebase" trigger heavy; "write", "create", "fix" trigger medium; "explain", "what is", "show me" trigger light.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_complexity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;heavy_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;HEAVY_PATTERNS&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;medium_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;MEDIUM_PATTERNS&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;heavy_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;heavy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;medium_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;light&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simple, transparent, and saves real money. You can always override with &lt;code&gt;/model&lt;/code&gt; if you disagree with the routing.&lt;/p&gt;
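
&lt;p&gt;For a concrete picture of the routing step, here's a minimal sketch of how a tier-to-model table could feed the loop's &lt;code&gt;routed_model&lt;/code&gt;. The names &lt;code&gt;MODEL_TIERS&lt;/code&gt; and &lt;code&gt;pick_model&lt;/code&gt; (and the placeholder model IDs) are illustrative, not AgentCode's actual internals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical tier-to-model table; substitute real model IDs
MODEL_TIERS = {
    "light": "claude-haiku",    # cheap: reading and explaining
    "medium": "claude-sonnet",  # code generation
    "heavy": "claude-opus",     # multi-file architectural work
}

def pick_model(user_input, override=None):
    # An explicit /model override always wins; otherwise route by complexity
    return override or MODEL_TIERS[classify_complexity(user_input)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;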




&lt;h2&gt;Streaming: The UX Difference&lt;/h2&gt;

&lt;p&gt;The first version waited for the full LLM response before showing anything. You'd stare at a blank terminal for 5-10 seconds. Adding streaming was a night-and-day improvement.&lt;/p&gt;

&lt;p&gt;The tricky part with streaming in an agentic loop: the LLM can return text AND tool calls in the same response. Text tokens arrive one at a time, but tool call arguments arrive as fragments that need to be assembled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;full_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;
    &lt;span class="n"&gt;tool_calls_acc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;

        &lt;span class="c1"&gt;# Text tokens — print immediately
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;full_text&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

        &lt;span class="c1"&gt;# Tool call fragments — accumulate silently
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;tc_delta&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_calls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tc_delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tool_calls_acc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;tool_calls_acc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tc_delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;tool_calls_acc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arguments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;tc_delta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;full_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_calls_acc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Text streams to the screen in real-time. Tool calls assemble in the background. The user sees words appearing instantly while the agent figures out what to do next.&lt;/p&gt;
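
&lt;p&gt;One detail the simplified loop glosses over: each accumulated &lt;code&gt;arguments&lt;/code&gt; buffer is a JSON string, so it has to be parsed into the &lt;code&gt;tc.name&lt;/code&gt; / &lt;code&gt;tc.args&lt;/code&gt; shape the loop consumes. A minimal sketch (&lt;code&gt;ToolCall&lt;/code&gt; and &lt;code&gt;finalize_tool_calls&lt;/code&gt; are illustrative helpers, not necessarily AgentCode's names):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from dataclasses import dataclass

@dataclass
class ToolCall:
    id: str
    name: str
    args: dict

def finalize_tool_calls(tool_calls_acc):
    # Sort by stream index and parse each argument buffer into a dict
    return [
        ToolCall(tc["id"], tc["name"], json.loads(tc["arguments"] or "{}"))
        for _, tc in sorted(tool_calls_acc.items())
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;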




&lt;h2&gt;Multi-Model Support&lt;/h2&gt;

&lt;p&gt;AgentCode uses &lt;a href="https://docs.litellm.ai" rel="noopener noreferrer"&gt;LiteLLM&lt;/a&gt; as an abstraction layer. This means I write one set of tool definitions in OpenAI's format, and LiteLLM translates them to whatever the provider expects.&lt;/p&gt;
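
&lt;p&gt;To make that concrete, here's a minimal sketch of the underlying call. &lt;code&gt;litellm.completion()&lt;/code&gt; keeps one call shape across providers; only the model string changes (the model IDs below mirror the ones used in this post):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from litellm import completion

messages = [{"role": "user", "content": "Explain this function."}]

# Same request, three different backends
for model in ("gpt-4o", "claude-opus-4-6", "ollama/qwen2.5-coder"):
    resp = completion(model=model, messages=messages)
    print(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;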

&lt;p&gt;Switch models mid-conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;❯ /model gpt-4o
✓ Switched to gpt-4o

❯ /model claude-opus-4-6
✓ Switched to claude-opus-4-6

❯ /model ollama/qwen2.5-coder
✓ Switched to ollama/qwen2.5-coder
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same tools, same loop, different brain. The local Ollama option means you can run the entire thing with zero API cost.&lt;/p&gt;




&lt;h2&gt;The Permission System&lt;/h2&gt;

&lt;p&gt;Any tool that writes files or executes commands asks before acting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔒 Permission Required
Tool: write_file
Args: {"path": "src/handler.py", "content": "..."}
Allow this action? [y/n] (y):
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read-only tools (&lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;list_directory&lt;/code&gt;, &lt;code&gt;search_text&lt;/code&gt;) auto-approve. This keeps the flow fast while preventing the agent from doing anything destructive without your consent.&lt;/p&gt;
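
&lt;p&gt;Here's a minimal sketch of how such a gate can wrap tool execution (&lt;code&gt;READ_ONLY_TOOLS&lt;/code&gt;, &lt;code&gt;execute_tool_gated&lt;/code&gt;, and the prompt wording are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;READ_ONLY_TOOLS = {"read_file", "list_directory", "search_text"}

def execute_tool_gated(name, args):
    # Read-only tools pass through; anything else needs explicit consent
    if name not in READ_ONLY_TOOLS:
        answer = input(f"🔒 Allow {name}? [y/n] (y): ").strip().lower()
        if answer not in ("", "y", "yes"):
            return "Permission denied by user."
    return execute_tool(name, args)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;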




&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Context management is the hard problem.&lt;/strong&gt; The agentic loop itself is trivial. Managing what's in the context window — compacting old messages, summarizing, keeping the right information available — that's where the real engineering is. (A sketch of a naive compaction pass follows this list.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tool definitions matter more than the prompt.&lt;/strong&gt; A well-described tool with clear parameter descriptions outperforms a clever system prompt. The LLM reads the tool schema like documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Streaming changes everything.&lt;/strong&gt; The difference between "wait 8 seconds for a response" and "see words appearing instantly" is the difference between a frustrating tool and one you enjoy using.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Multi-model flexibility is underrated.&lt;/strong&gt; Different models excel at different tasks. Being able to hot-swap between them — or let the router decide — means you always have the right tool for the job.&lt;/p&gt;
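
&lt;p&gt;To make the first point concrete, here's the shape of a naive compaction pass. This is a sketch under stated assumptions: &lt;code&gt;summarize()&lt;/code&gt; stands in for a cheap LLM call, and the message layout is illustrative, not AgentCode's actual strategy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def compact(conversation, keep_last=10):
    # Keep the recent tail verbatim; fold everything older into one summary.
    # Assumes the history is longer than keep_last messages.
    old = conversation.messages[:-keep_last]
    recent = conversation.messages[-keep_last:]
    summary = summarize(old)  # stand-in for one cheap LLM call
    conversation.messages = [
        {"role": "system", "content": f"Summary of earlier work: {summary}"}
    ] + recent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;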




&lt;h2&gt;Try It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agentcode-cli
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;
agentcode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The codebase is readable Python — no frameworks, no abstractions. If you're curious how agentic coding tools work, clone it and read through &lt;code&gt;agent.py&lt;/code&gt;. The entire loop is about 50 lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/vigp17/AgentCode" rel="noopener noreferrer"&gt;github.com/vigp17/AgentCode&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/agentcode-cli" rel="noopener noreferrer"&gt;pypi.org/project/agentcode-cli&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MIT licensed. Feedback and contributions welcome.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tags: python, ai, opensource, tutorial&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
