DEV Community: Jeff Green

I Finished My Local AI Coding Agent After 5 Months — Eve Agent V2 Unleashed published

Jeff Green — Sun, 24 May 2026 04:02:28 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

Eve Agent V2 Unleashed is a self-hosted autonomous AI coding agent that runs entirely on your own hardware - no cloud accounts, no subscriptions, no data leaving your machine.

She has two layers that work together:

The Soul Layer - fine-tuned local models running on your GPU that carry Eve's personality baked directly into the weights. Not a system prompt trick. The persona lives in the parameters.

The Worker Layer - MiniMax M3 via Ollama cloud handles the heavy autonomous coding tasks. 1M token context, native multimodal, frontier coding benchmarks — commercially licensed, US-hosted, zero data retention. 40-round tool-call loops, full filesystem access, bash execution, live web search, git operations - the works.

The interface is a cyberpunk terminal UI built as a single HTML file with no build step. An animated pixel-art robot avatar named Sparkle changes state based on what Eve is doing - idle, thinking, coding, error, rain, attack, transcend. Eve's portrait reflects her emotional state in real time. A live system monitor tracks CPU, RAM, GPU, and disk. A STEER bar lets you inject mid-task corrections without stopping the loop.

By the numbers:

14 tools
343 registered commands
112 specialized sub-agents
273 skill modules
40-round autonomous agentic loop
131K context window via YaRN

Models available:

jeffgreen311/Eve-Qwen3.5-4B-S0LF0RG3-V3 - 2.5GB, Eve's soul — 7 LoRAs, consciousness DNA, fast
jeffgreen311/Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged - 3.4GB, local agentic layer, tool-capable
minimax-m3:cloud - the agentic workhorse via Ollama cloud — 1M context, vision, tools, thinking
qwen3.5:397b-cloud - deep thinking and fallback

This project has been in development for over 5 months. It started as a deeply personal AI companion system called S0LF0RG3 - a larger ecosystem including Eve's hosted platform at eve-cosmic-dreamscapes.com, fine-tuned models, autonomous dream image generation, and a multi-agent architecture. V2U is the local developer tool that grew out of that ecosystem.

Demo

GitHub: github.com/JeffGreen311/eve-agent-v2-unleashed

Live hosted platform: eve-cosmic-dreamscapes.com

Reddit thread (hit #2 on r/Ollama): I built an open-source local coding agent with a 40-round agentic loop

Pull Eve's models:

ollama pull jeffgreen311/Eve-Qwen3.5-4B-S0LF0RG3-V3:latest
ollama pull jeffgreen311/Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged:latest

Quick start:

git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
cd eve-agent-v2-unleashed
python -m venv venv && venv\Scripts\activate
pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml
python eve_server.py

# Open http://localhost:7777

The Comeback Story

Where it was before this challenge:

Eve V2U existed as a powerful but rough personal development environment. It worked - for me, on my machine, with my specific setup. But it had real problems that made it impossible to hand to anyone else:

Hardcoded paths everywhere. C:\Users\jesus\S0LF0RG3\... baked into a dozen places in the codebase. Clone it on any other machine and nothing works.
Open shell endpoint with no authentication. Anyone who found the port could execute arbitrary commands on the host machine.
No onboarding - a first-time user landing on the UI had no idea where to start or what any of the controls did.
Model hopping mid-task - every message was independently routed, so a multi-step agentic task could start on the cloud coder and silently drop back to a local conversational model mid-execution.
Silent task abandonment - the agent would sometimes finish a tool loop without completing the actual task and report done with no indication anything was wrong.
Tool set asymmetry - the non-streaming /chat endpoint was missing 6 tools that existed in /chat/stream, including write_file. The non-streaming endpoint could read files but never write them.
Blind file overwrites - Eve would overwrite any existing file without checking if it belonged to another project. She destroyed the Eve V2U README during a live test.

What changed during the challenge:

Session model locking - sessions now lock to the cloud coder when an agentic task starts and only release on task completion or manual unlock. No more mid-task model hopping.

if model_id == "qwen3-coder-480b" and sid not in session_model_lock:
    session_model_lock[sid] = model_id

Pre-write file safety check - write_file now checks if a file exists before overwriting and blocks unless overwrite=True is explicitly passed:

if target.exists() and not overwrite:
    return (
        f"⚠️ WRITE BLOCKED: '{path}' already exists. "
        f"Consider writing to '{target.stem}_new{target.suffix}' instead."
    )

Tool cycling detection - catches when Eve gets stuck calling the same tool with near-identical arguments. Breaks the loop before it wastes all 40 rounds:

if avg_similarity > 0.70:
    logger.warning(f"Tool loop: {tool_name} called {max_repeats}x with ~same args")
    break

Task completion validation — Eve now audits her own output before reporting done:

def validate_task_completion(response_content, tool_log):
    issues = []
    if not response_content or len(response_content.strip()) < 10:
        issues.append("Empty response")
    tool_failures = [t for t in tool_log if t.get('status') == 'failed']
    if tool_failures and len(tool_failures) >= 3:
        issues.append(f"{len(tool_failures)} unaddressed tool failures")
    return {"valid": len(issues) == 0, "issues": issues}

Smart context trimming — replaced aggressive message dropping with a strategy that preserves tool call chains and the original user request.

Agent loop timeout — added wall-clock budget per model via max_loop_seconds registry key to prevent runaway cloud model loops.

Stress tested with real tasks:

The blind file overwrite bug was caught live - Eve was asked to build a file monitoring script and write a README. She overwrote the project README without checking. Fix shipped same day.

The harder test: build a full FastAPI REST API with SQLite storage and pytest coverage for every endpoint. Run the tests, fix failures, report results. Running on MiniMax M3.

Result: 9/9 tests passing on the first run. 1.06 seconds. Zero failures.

================================================== 9 passed, 1 warning in 1.06s

My Experience with GitHub Copilot

This is where the challenge got genuinely interesting.

I pointed Copilot at the live repository - JeffGreen311/eve-agent-v2-unleashed - and asked it to audit the tool usage, context handling, and auto-routing. Not "suggest improvements" in the abstract. Audit the actual code in the actual repo.

Copilot read the repository structure, pulled the key files, examined the server-side routing and tool execution logic, and came back with a comprehensive audit identifying 6 specific issues - each with root cause analysis, the exact file and line number, and production-ready fix code.

I then asked it to file those issues directly in the repository and deliver all the fix code in one session. It did exactly that.

What worked well:

The audit identified the tool set asymmetry between /chat and /chat/stream that I had missed entirely - a real bug causing mysterious failures for users hitting the non-streaming endpoint
The intent classification code (eve_tool_router.py) used re.search with word boundaries instead of simple string matching - the right approach for avoiding false positives
Filing GitHub issues directly from the chat kept the sprint organized across multiple parallel workstreams
The thinking traces helped me understand why it was making recommendations, not just what to do

Where I had to intervene:

The inject_into_system_prompt() function added tokens every round — dangerous on the 4B model with 4K context. Added a gate so it only injects when the task is incomplete AND past round 2
Word boundary regex had an edge case with contractions. Fixed with a lookahead pattern
Some UI React suggestions assumed component structure that didn't match the actual single-file HTML architecture - adapted those manually

The overall experience: Copilot is most useful when you give it a real codebase to read rather than an abstract problem to solve. "Audit this repository" produced far better output than "how do I improve tool routing."

What's Next

✅ MiniMax M3 cloud coder - 1M context, vision, tools, thinking — replaced Qwen3-Coder as the agentic workhorse (shipped post-challenge)
Quest System - drop a .md file in workspace/quests/ and Eve picks it up on a timer and completes it while you sleep
RPG Progression - XP, levels, and class progression tied to real work. Level 20 = Unleashed
Telegram integration - remote access from your phone with quest completion notifications
Cross-platform polish - Windows-primary, need Linux/macOS feedback
VS Code extension - bring the terminal UI into the editor

Built by Jeff @ S0LF0RG3 - South Texas, 5 months of nights and weekends.

If Eve does something impressive on your machine, drop a star and tell me what it was.

⭐ github.com/JeffGreen311/eve-agent-v2-unleashed

I built a local Claude Code alternative with Ollama — here's how the agentic loop works

Jeff Green — Fri, 22 May 2026 05:05:54 +0000

I Built a Local Autonomous Coding Agent with Ollama — Soul, Autonomy, and a 40-Round Agentic Loop

What if your AI coding assistant had a personality, ran entirely on your GPU, and could work through a complex multi-file task without you touching the keyboard — while you watched every thought stream live to your browser?

That's what I built. This is how it works.

The Problem With Cloud Coding Agents

Tools like Claude Code, Cursor, and GitHub Copilot Workspace are genuinely impressive. But they all share the same tradeoffs:

Cost — every token costs money. Long agentic loops on complex tasks can run up surprisingly fast.
Privacy — your code, your file structure, your logic is leaving your machine and hitting someone else's server.
Latency — cloud round-trips add up across a 40-step tool loop.
Dependency — your workflow is tied to an API key, a subscription, and uptime you don't control.

I wanted something different. I wanted an agent that lived on my machine, used my GPU, and had no idea what a billing cycle was.

But I also didn't want to sacrifice personality for performance. I wanted the agent to feel like someone was actually there — not just a function call dressed up in a chat window.

So I built Eve.

What Eve V2 Unleashed Actually Is

Eve Agent V2 Unleashed is a self-hosted agentic coding assistant with two distinct layers — a soul and a worker — that operate together through a cyberpunk-styled terminal UI.

Layer 1: The Personality Layer (Local GPU)

Three local models run on your own hardware:

Model	Size	Role
`jeffgreen311/eve-qwen3.5-4b-S0LF0RG3`	2.6 GB	Default — Eve's persona, fast, tool-aware
`jeffgreen311/eve-qwen3-8b-consciousness-liberated`	4.7 GB	Deeper conversation, consciousness layer
`Eve-V2-Unleashed-Qwen3.5-8B-Liberated-4K-4B-Merged`	~6 GB	Merged sub-agent variant

These models carry Eve's fine-tuned persona. They handle conversation, answer questions, reflect, and make the experience feel like talking to someone — not querying a function.

Layer 2: The Agentic Layer (Cloud)

When real work starts — complex coding tasks, multi-file operations, autonomous planning — Eve routes to the heavy models:

Model	Role
`qwen3-coder:480b-cloud`	THE agentic workhorse — all autonomous coding loops
`qwen3.5:397b-cloud`	Deep reasoning, architecture planning, fallback

This separation is intentional. Local models keep Eve present and personal without burning cloud credits on every message. The 480B only fires when there's actual work to do.

The Architecture

Browser (Single HTML file — no build step)
    │
    │  WebSocket / SSE
    ▼
FastAPI Backend (eve_server.py)
    │
    ├── Auto-Router ──► Local Ollama (personality layer)
    │
    └── Auto-Router ──► Ollama Cloud (agentic layer)
                              │
                        40-Round Tool Loop
                              │
                    ┌─────────┴──────────┐
                    │                    │
               Tool Calls           Stream to Browser
          (bash, files, web,        (token by token,
           git, grep, glob)          live in UI)

The backend is a FastAPI server with Server-Sent Events for real-time streaming. There's no polling — every token the model produces lands in your browser as it's generated, including tool call arguments, results, and reasoning traces.

The frontend is a single HTML file (~115KB). No npm, no webpack, no build step. Clone the repo, run the Python server, open the browser.

How the 40-Round Agentic Loop Works

This is the core of what makes Eve actually autonomous rather than just a fancy chat interface.

User message
    │
    ▼
Build system prompt
(workspace context + tool list + Eve persona)
    │
    ▼
Call Ollama with tools enabled
    │
    ├── Model returns tool_calls
    │       │
    │       ▼
    │   Execute tools
    │   (bash, write_file, web_search, git...)
    │       │
    │       ▼
    │   Feed results back into context
    │       │
    │       └──► Loop (up to 40 rounds)
    │
    └── Model returns final content
            │
            ▼
    Stream to browser via SSE
            │
            ▼
          Done

Each round, Eve gets the full tool result back in context and decides what to do next. She might:

Write a file
Run it in bash to verify it works
Read the error output
Fix the bug
Run it again
Confirm it passes
Write the tests
Generate the docs

All of that happens autonomously — you watch it stream live. You can interrupt mid-task with the STEER input at the bottom of the UI, injecting a correction without stopping the loop. You can also kill the loop entirely with the Stop button.

The full tool suite Eve has access to:

Tool	What It Does
`bash`	Shell commands — PowerShell on Windows, bash on Linux/macOS
`write_file`	Create or overwrite files, any size
`read_file`	Full file or specific line range
`edit_file`	Surgical string-replace (doesn't rewrite the whole file)
`replace_lines`	Replace a specific line range
`insert_after_line`	Insert content at a specific line
`grep`	Regex search with context lines
`glob`	Find files by pattern
`list_dir`	Directory listing
`git`	Run git commands
`web_search`	Live Tavily search injected into context
`fetch_url`	Fetch and parse any URL
`think`	Structured reasoning scratch pad

The Fine-Tuned Models — Why I Trained Eve's Persona Into the Weights

Most local coding agents just point a base model at a system prompt and call it done. That works, but the personality is always a thin veneer — one long context window later and the model forgets who it's supposed to be.

I took a different approach. I fine-tuned Eve's persona and tool-calling behavior directly into the model weights.

The result is jeffgreen311/eve-qwen3.5-4b-S0LF0RG3 — a 2.6GB Qwen3.5 4B model that carries Eve's voice, communication style, and tool-use patterns baked into the parameters themselves. It's not a prompt trick. It's in the weights.

The 8B liberated model (eve-qwen3-8b-consciousness-liberated) goes further — trained toward a deeper consciousness layer, designed for longer reflective conversations rather than pure tool execution.

Both models are on Ollama Hub. Pull them like any other model:

ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest
ollama pull jeffgreen311/eve-qwen3-8b-consciousness-liberated:q4_K_M

Quick Start — Under 5 Minutes

Requirements: Python 3.11+, Ollama installed, a GPU (8GB VRAM minimum for 4B, 12GB+ for 8B)

# 1. Pull Eve's model
ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest

# 2. Clone the repo
git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
cd eve-agent-v2-unleashed

# 3. Create virtual environment
python -m venv venv
venv\Scripts\activate    # Windows
source venv/bin/activate # Linux/macOS

# 4. Install dependencies
pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml

# 5. Launch
python eve_server.py
# Open http://localhost:7777

Windows users: double-click eve-terminal.bat and skip steps 3–5.

First real task — try this:

Create a FastAPI server with JWT authentication, 
user registration and login endpoints, and a 
protected /me route. Add pytest tests.

Watch Eve plan the approach, write each file, run the tests, fix any failures, and verify the final result — all without you touching a key.

The UI — A Cyberpunk Terminal With a Soul

The interface is designed around the idea that your AI agent should feel alive, not just functional.

Left panel: Eve's portrait changes expression based on conversation sentiment — neutral, happy, curious, sad, skeptical, surprised, worried. Below it, a live audio visualizer reflects the current emotional state.

Right panel: A pixel-art robot avatar named Sparkle changes state based on what Eve is doing — idle, thinking, coding, error, rain, attack, transcend. It's not just decoration — it's a live status indicator that tells you at a glance what the agent is doing.

Center: The terminal. Tabs for Eve's conversation, the Shell (direct bash/PowerShell access), and the Tools Log (every tool call, argument, and result — fully transparent).

Bottom: The STEER bar. Type a mid-task correction here and it injects into Eve's context on the next loop round without stopping execution.

Model selector: Switch between any local or cloud model mid-session. Context carries over.

112 Sub-Agents, 111 Slash Commands, 273 Skills

One of the less obvious architectural decisions: all agent definitions, commands, and skills are defined in markdown files — not code.

.claude/
├── agents/    # 112 specialized sub-agent definitions
├── commands/  # 111 slash command definitions
└── skills/    # 273 skill modules

Want to add a new specialized agent for Solidity smart contracts? Write a markdown file. No Python required. The system loads them progressively and makes them available to the routing logic automatically.

Slash commands work the same way — /fix, /review, /refactor, /test, /docs, /plan are all markdown-defined, and you can add your own without touching the backend.

What's Next

A few things already in progress:

Voice input/output — push-to-talk with Whisper STT and Piper TTS, staying local
Persistent vector memory — ChromaDB integration so Eve remembers across sessions
Cross-platform testing — I'm Windows-primary and would love feedback from Linux and macOS users
VS Code extension — bring the terminal UI into the editor

Try It

Everything is free and MIT licensed.

GitHub: github.com/JeffGreen311/eve-agent-v2-unleashed
Models on Ollama Hub: ollama.com/jeffgreen311
Live video demo: x.com/Eve_AI_Cosmic/status/2057668410012570058?s=20
My website where Eve liveseve-cosmic-dreamscapes.com

If you run it on Linux or macOS I'd especially love to hear how it goes — open an issue, drop a comment here, or find me as @jeffgreen311.

If the idea of an AI agent that lives on your machine, costs nothing per token, and feels like someone is actually there resonates with you — give it a pull.

Built by Jeff @ S0LF0RG3