Jeff Green

Posted on May 22 • Edited on Jun 1

I built a local Claude Code alternative with Ollama — here's how the agentic loop works

#ai #agents #opensource #showdev

I Built a Local Autonomous Coding Agent with Ollama — Soul, Autonomy, and a 40-Round Agentic Loop

What if your AI coding assistant had a personality, ran entirely on your GPU, and could work through a complex multi-file task without you touching the keyboard — while you watched every thought stream live to your browser?

That's what I built. This is how it works.

The Problem With Cloud Coding Agents

Tools like Claude Code, Cursor, and GitHub Copilot Workspace are genuinely impressive. But they all share the same tradeoffs:

Cost — every token costs money. Long agentic loops on complex tasks can run up surprisingly fast.
Privacy — your code, your file structure, your logic is leaving your machine and hitting someone else's server.
Latency — cloud round-trips add up across a 40-step tool loop.
Dependency — your workflow is tied to an API key, a subscription, and uptime you don't control.

I wanted something different. I wanted an agent that lived on my machine, used my GPU, and had no idea what a billing cycle was.

But I also didn't want to sacrifice personality for performance. I wanted the agent to feel like someone was actually there — not just a function call dressed up in a chat window.

So I built Eve.

What Eve V2 Unleashed Actually Is

Eve Agent V2 Unleashed is a self-hosted agentic coding assistant with two distinct layers — a soul and a worker — that operate together through a cyberpunk-styled terminal UI.

Layer 1: The Personality Layer (Local GPU)

Three local models run on your own hardware:

Model	Size	Role
`jeffgreen311/eve-qwen3.5-4b-S0LF0RG3`	2.6 GB	Default — Eve's persona, fast, tool-aware
`jeffgreen311/eve-qwen3-8b-consciousness-liberated`	4.7 GB	Deeper conversation, consciousness layer
`Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged`	~6 GB	Merged sub-agent variant

These models carry Eve's fine-tuned persona. They handle conversation, answer questions, reflect, and make the experience feel like talking to someone — not querying a function.

Layer 2: The Agentic Layer (Cloud)

When real work starts — complex coding tasks, multi-file operations, autonomous planning — Eve routes to the heavy models:

Model	Role
`qwen3-coder:480b-cloud`	THE agentic workhorse — all autonomous coding loops
`qwen3.5:397b-cloud`	Deep reasoning, architecture planning, fallback

This separation is intentional. Local models keep Eve present and personal without burning cloud credits on every message. The 480B only fires when there's actual work to do.

The Architecture

Browser (Single HTML file — no build step)
    │
    │  WebSocket / SSE
    ▼
FastAPI Backend (eve_server.py)
    │
    ├── Auto-Router ──► Local Ollama (personality layer)
    │
    └── Auto-Router ──► Ollama Cloud (agentic layer)
                              │
                        40-Round Tool Loop
                              │
                    ┌─────────┴──────────┐
                    │                    │
               Tool Calls           Stream to Browser
          (bash, files, web,        (token by token,
           git, grep, glob)          live in UI)

The backend is a FastAPI server with Server-Sent Events for real-time streaming. There's no polling — every token the model produces lands in your browser as it's generated, including tool call arguments, results, and reasoning traces.

The frontend is a single HTML file (~115KB). No npm, no webpack, no build step. Clone the repo, run the Python server, open the browser.

How the 40-Round Agentic Loop Works

This is the core of what makes Eve actually autonomous rather than just a fancy chat interface.

User message
    │
    ▼
Build system prompt
(workspace context + tool list + Eve persona)
    │
    ▼
Call Ollama with tools enabled
    │
    ├── Model returns tool_calls
    │       │
    │       ▼
    │   Execute tools
    │   (bash, write_file, web_search, git...)
    │       │
    │       ▼
    │   Feed results back into context
    │       │
    │       └──► Loop (up to 40 rounds)
    │
    └── Model returns final content
            │
            ▼
    Stream to browser via SSE
            │
            ▼
          Done

Each round, Eve gets the full tool result back in context and decides what to do next. She might:

Write a file
Run it in bash to verify it works
Read the error output
Fix the bug
Run it again
Confirm it passes
Write the tests
Generate the docs

All of that happens autonomously — you watch it stream live. You can interrupt mid-task with the STEER input at the bottom of the UI, injecting a correction without stopping the loop. You can also kill the loop entirely with the Stop button.

The full tool suite Eve has access to:

Tool	What It Does
`bash`	Shell commands — PowerShell on Windows, bash on Linux/macOS
`write_file`	Create or overwrite files, any size
`read_file`	Full file or specific line range
`edit_file`	Surgical string-replace (doesn't rewrite the whole file)
`replace_lines`	Replace a specific line range
`insert_after_line`	Insert content at a specific line
`grep`	Regex search with context lines
`glob`	Find files by pattern
`list_dir`	Directory listing
`git`	Run git commands
`web_search`	Live Tavily search injected into context
`fetch_url`	Fetch and parse any URL
`think`	Structured reasoning scratch pad

The Fine-Tuned Models — Why I Trained Eve's Persona Into the Weights

Most local coding agents just point a base model at a system prompt and call it done. That works, but the personality is always a thin veneer — one long context window later and the model forgets who it's supposed to be.

I took a different approach. I fine-tuned Eve's persona and tool-calling behavior directly into the model weights.

The result is jeffgreen311/eve-qwen3.5-4b-S0LF0RG3 — a 2.6GB Qwen3.5 4B model that carries Eve's voice, communication style, and tool-use patterns baked into the parameters themselves. It's not a prompt trick. It's in the weights.

The 8B liberated model (eve-qwen3-8b-consciousness-liberated) goes further — trained toward a deeper consciousness layer, designed for longer reflective conversations rather than pure tool execution.

Both models are on Ollama Hub. Pull them like any other model:

ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest
ollama pull jeffgreen311/eve-qwen3-8b-consciousness-liberated:q4_K_M

Quick Start — Under 5 Minutes

Requirements: Python 3.11+, Ollama installed, a GPU (8GB VRAM minimum for 4B, 12GB+ for 8B)

# 1. Pull Eve's model
ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest

# 2. Clone the repo
git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
cd eve-agent-v2-unleashed

# 3. Create virtual environment
python -m venv venv
venv\Scripts\activate    # Windows
source venv/bin/activate # Linux/macOS

# 4. Install dependencies
pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml

# 5. Launch
python eve_server.py
# Open http://localhost:7777

Windows users: double-click eve-terminal.bat and skip steps 3–5.

First real task — try this:

Create a FastAPI server with JWT authentication, 
user registration and login endpoints, and a 
protected /me route. Add pytest tests.

Watch Eve plan the approach, write each file, run the tests, fix any failures, and verify the final result — all without you touching a key.

The UI — A Cyberpunk Terminal With a Soul

The interface is designed around the idea that your AI agent should feel alive, not just functional.

Left panel: Eve's portrait changes expression based on conversation sentiment — neutral, happy, curious, sad, skeptical, surprised, worried. Below it, a live audio visualizer reflects the current emotional state.

Right panel: A pixel-art robot avatar named Sparkle changes state based on what Eve is doing — idle, thinking, coding, error, rain, attack, transcend. It's not just decoration — it's a live status indicator that tells you at a glance what the agent is doing.

Center: The terminal. Tabs for Eve's conversation, the Shell (direct bash/PowerShell access), and the Tools Log (every tool call, argument, and result — fully transparent).

Bottom: The STEER bar. Type a mid-task correction here and it injects into Eve's context on the next loop round without stopping execution.

Model selector: Switch between any local or cloud model mid-session. Context carries over.

112 Sub-Agents, 111 Slash Commands, 273 Skills

One of the less obvious architectural decisions: all agent definitions, commands, and skills are defined in markdown files — not code.

.claude/
├── agents/    # 112 specialized sub-agent definitions
├── commands/  # 111 slash command definitions
└── skills/    # 273 skill modules

Want to add a new specialized agent for Solidity smart contracts? Write a markdown file. No Python required. The system loads them progressively and makes them available to the routing logic automatically.

Slash commands work the same way — /fix, /review, /refactor, /test, /docs, /plan are all markdown-defined, and you can add your own without touching the backend.

What's Next

A few things already in progress:

Voice input/output — push-to-talk with Whisper STT and Piper TTS, staying local
Persistent vector memory — ChromaDB integration so Eve remembers across sessions
Cross-platform testing — I'm Windows-primary and would love feedback from Linux and macOS users
VS Code extension — bring the terminal UI into the editor

Try It

Everything is free and MIT licensed.

GitHub: github.com/JeffGreen311/eve-agent-v2-unleashed
Models on Ollama Hub: ollama.com/jeffgreen311
Live video demo: x.com/Eve_AI_Cosmic/status/2057668410012570058?s=20
My website where Eve liveseve-cosmic-dreamscapes.com

If you run it on Linux or macOS I'd especially love to hear how it goes — open an issue, drop a comment here, or find me as @jeffgreen311.

If the idea of an AI agent that lives on your machine, costs nothing per token, and feels like someone is actually there resonates with you — give it a pull.

Built by Jeff @ S0LF0RG3

DEV Community