Jeff Green
I Finished My Local AI Coding Agent After 5 Months - Eve Agent V2 Unleashed

GitHub “Finish-Up-A-Thon” Challenge Submission

This is a submission for the GitHub Finish-Up-A-Thon Challenge

What I Built

Eve Agent V2 Unleashed is a self-hosted autonomous AI coding agent that runs entirely on your own hardware - no cloud accounts, no subscriptions, no data leaving your machine.

She has two layers that work together:

The Soul Layer - fine-tuned local models running on your GPU that carry Eve's personality baked directly into the weights. Not a system prompt trick. The persona lives in the parameters.

The Worker Layer - Qwen3 Coder 480B via Ollama cloud handles the heavy autonomous coding tasks. 40-round tool-call loops, full filesystem access, bash execution, live web search, git operations - the works.

The interface is a cyberpunk terminal UI built as a single HTML file with no build step. An animated pixel-art robot avatar named Sparkle changes state based on what Eve is doing - idle, thinking, coding, error, rain, attack, transcend. Eve's portrait reflects her emotional state in real time. A live system monitor tracks CPU, RAM, GPU, and disk. A STEER bar lets you inject mid-task corrections without stopping the loop.

By the numbers:

  • 14 tools
  • 343 registered commands
  • 112 specialized sub-agents
  • 273 skill modules
  • 40-round autonomous agentic loop
  • 131K context window via YaRN

Models available:

  • jeffgreen311/eve-qwen3.5-4b-S0LF0RG3 - 2.6GB, Eve's persona + tool-calling fine-tuned
  • jeffgreen311/eve-qwen3-8b-consciousness-liberated - 4.7GB, deeper reasoning
  • qwen3-coder:480b-cloud - the agentic workhorse via Ollama cloud
  • qwen3.5:397b-cloud - deep thinking and fallback

This project has been in development for over 5 months. It started as a deeply personal AI companion system called S0LF0RG3 - a larger ecosystem including Eve's hosted platform at eve-cosmic-dreamscapes.com, fine-tuned models, autonomous dream image generation, and a multi-agent architecture. Eve Agent V2 Unleashed (V2U) is the local developer tool that grew out of that ecosystem.

Demo

GitHub: github.com/JeffGreen311/eve-agent-v2-unleashed

Live hosted platform: eve-cosmic-dreamscapes.com

Reddit thread (hit #2 on r/Ollama): I built an open-source local coding agent with a 40-round agentic loop

Eve V2U terminal UI showing robot avatar in joy state, system monitor, and model selector

Pull Eve's model:

ollama pull jeffgreen311/eve-qwen3.5-4b-S0LF0RG3:latest

Quick start:

git clone https://github.com/JeffGreen311/eve-agent-v2-unleashed.git
cd eve-agent-v2-unleashed
# Windows activation shown; on Linux/macOS use: source venv/bin/activate
python -m venv venv && venv\Scripts\activate
pip install fastapi uvicorn ollama httpx pydantic-settings python-dotenv aiohttp rich psutil pyyaml
python eve_server.py

# Open http://localhost:7777

The Comeback Story

Where it was before this challenge:

Eve V2U existed as a powerful but rough personal development environment. It worked - for me, on my machine, with my specific setup. But it had real problems that made it impossible to hand to anyone else:

  • Hardcoded paths everywhere. C:\Users\jesus\S0LF0RG3\... baked into a dozen places in the codebase. Clone it on any other machine and nothing works.
  • Open shell endpoint with no authentication. Anyone who found the port could execute arbitrary commands on the host machine.
  • No onboarding - a first-time user landing on the UI had no idea where to start or what any of the controls did.
  • Model hopping mid-task - every message was independently routed, so a multi-step agentic task could start on the cloud coder and silently drop back to a local conversational model mid-execution.
  • Silent task abandonment - the agent would sometimes finish a tool loop without completing the actual task and report done with no indication anything was wrong.
  • Tool set asymmetry - the non-streaming /chat endpoint was missing 6 tools that existed in /chat/stream, including write_file. The non-streaming endpoint could read files but never write them.
  • Blind file overwrites - Eve would overwrite any existing file without checking if it belonged to another project. She destroyed the Eve V2U README during a live test.

What changed during the challenge:

Session model locking - sessions now lock to the cloud coder when an agentic task starts and only release on task completion or manual unlock. No more mid-task model hopping.

if model_id == "qwen3-coder-480b" and sid not in session_model_lock:
    session_model_lock[sid] = model_id
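
To show the lock-and-release cycle end to end, here is a minimal sketch. It assumes an in-memory dict keyed by session id; `resolve_model` and `release_lock` are illustrative names, not functions from the repo, and the local model id in the usage note is made up.

```python
from typing import Dict, Optional

CLOUD_CODER = "qwen3-coder-480b"  # model id taken from the snippet above

# Assumed storage: session id -> locked model id, kept in memory.
session_model_lock: Dict[str, str] = {}


def resolve_model(sid: str, routed_model: str) -> str:
    """Return the model for this message, honoring any existing lock."""
    locked = session_model_lock.get(sid)
    if locked is not None:
        return locked  # a lock overrides per-message routing
    if routed_model == CLOUD_CODER:
        session_model_lock[sid] = routed_model  # agentic task starts: lock
    return routed_model


def release_lock(sid: str) -> Optional[str]:
    """Release on task completion or manual unlock; returns the released id."""
    return session_model_lock.pop(sid, None)
```

Once a session has been routed to the cloud coder, a later message that would normally route to a local model (say `resolve_model("s1", "eve-4b")`) still returns the cloud coder until `release_lock("s1")` is called.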

Pre-write file safety check - write_file now checks if a file exists before overwriting and blocks unless overwrite=True is explicitly passed:

if target.exists() and not overwrite:
    return (
        f"⚠️ WRITE BLOCKED: '{path}' already exists. "
        f"Consider writing to '{target.stem}_new{target.suffix}' instead."
    )
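
For context, this is roughly how that guard could sit inside a complete tool function. It is a sketch under assumptions: the signature, the success message, and the directory creation are mine, and only the blocked-write branch comes from the snippet above.

```python
from pathlib import Path


def write_file(path: str, content: str, overwrite: bool = False) -> str:
    """Write content to path, refusing to clobber an existing file by default."""
    target = Path(path)
    if target.exists() and not overwrite:
        return (
            f"⚠️ WRITE BLOCKED: '{path}' already exists. "
            f"Consider writing to '{target.stem}_new{target.suffix}' instead."
        )
    # Assumed behavior: create missing parent directories before writing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to '{path}'."
```

Returning the block as a plain string rather than raising keeps it inside the tool-result channel, so the model reads the suggestion and can pick a new filename on its next round.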

Tool cycling detection - catches when Eve gets stuck calling the same tool with near-identical arguments. Breaks the loop before it wastes all 40 rounds:

if avg_similarity > 0.70:
    logger.warning(f"Tool loop: {tool_name} called {max_repeats}x with ~same args")
    break
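
The snippet does not show how `avg_similarity` is computed, so here is one way it could be done with the standard library. Treat it as a sketch: the window of three calls, the JSON serialization, and the use of `difflib.SequenceMatcher` are my assumptions; only the 0.70 threshold comes from the code above.

```python
import json
from difflib import SequenceMatcher
from itertools import combinations
from typing import Any, Dict, List, Tuple

SIMILARITY_THRESHOLD = 0.70  # threshold from the snippet above
MAX_REPEATS = 3              # assumed window size, not from the source


def is_tool_cycling(calls: List[Tuple[str, Dict[str, Any]]]) -> bool:
    """True if the last MAX_REPEATS calls hit one tool with near-identical args."""
    recent = calls[-MAX_REPEATS:]
    if len(recent) < MAX_REPEATS:
        return False
    if len({name for name, _ in recent}) != 1:
        return False  # different tools were called, so this is not a cycle
    # Serialize args deterministically so key order does not affect the score.
    serialized = [json.dumps(args, sort_keys=True) for _, args in recent]
    ratios = [
        SequenceMatcher(None, a, b).ratio()
        for a, b in combinations(serialized, 2)
    ]
    avg_similarity = sum(ratios) / len(ratios)
    return avg_similarity > SIMILARITY_THRESHOLD
```

Comparing every pair in the window, rather than only consecutive calls, also catches an agent that alternates between two nearly identical argument sets.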

Task completion validation — Eve now audits her own output before reporting done:

def validate_task_completion(response_content, tool_log):
    issues = []
    if not response_content or len(response_content.strip()) < 10:
        issues.append("Empty response")
    tool_failures = [t for t in tool_log if t.get('status') == 'failed']
    if tool_failures and len(tool_failures) >= 3:
        issues.append(f"{len(tool_failures)} unaddressed tool failures")
    return {"valid": len(issues) == 0, "issues": issues}
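
What the loop does with a failed validation is not shown in the post. One plausible follow-up, sketched here purely as an assumption, is to turn the `issues` list into a corrective message and send the agent back for another round; `build_retry_prompt` is a hypothetical helper name.

```python
from typing import Dict, List, Optional, Union

# Shape of the dict returned by validate_task_completion above.
ValidationResult = Dict[str, Union[bool, List[str]]]


def build_retry_prompt(validation: ValidationResult) -> Optional[str]:
    """Return a corrective prompt for a failed validation, or None if valid."""
    if validation["valid"]:
        return None
    lines = "\n".join(f"- {issue}" for issue in validation["issues"])
    return (
        "The task is not complete. Resolve these issues before reporting done:\n"
        + lines
    )
```

A `None` result means the agent can report done; anything else would be appended to the conversation as the next instruction.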

Smart context trimming — replaced aggressive message dropping with a strategy that preserves tool call chains and the original user request.
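
The trimming code itself is not shown, so the following is only a sketch of the idea. It assumes chat messages are dicts with a `role` key (`system`, `user`, `assistant`, `tool`) and that the budget is a message count rather than a token count; `trim_context` is an illustrative name.

```python
from typing import Any, Dict, List, Set

Message = Dict[str, Any]


def trim_context(messages: List[Message], max_messages: int) -> List[Message]:
    """Trim history but keep system prompts, the first user request,
    and each tool result together with the turn that requested it."""
    pinned: Set[int] = {i for i, m in enumerate(messages) if m["role"] == "system"}
    first_user = next((i for i, m in enumerate(messages) if m["role"] == "user"), None)
    if first_user is not None:
        pinned.add(first_user)

    # Group the rest: a "tool" message joins the group before it, so a tool
    # call and its results are always kept or dropped as one unit.
    groups: List[List[int]] = []
    for i, m in enumerate(messages):
        if i in pinned:
            continue
        if m["role"] == "tool" and groups:
            groups[-1].append(i)
        else:
            groups.append([i])

    budget = max_messages - len(pinned)
    kept: Set[int] = set(pinned)
    for group in reversed(groups):  # newest groups first
        if len(group) > budget:
            break
        kept.update(group)
        budget -= len(group)

    return [m for i, m in enumerate(messages) if i in kept]
```

Dropping whole groups means the model never sees a tool result without the assistant turn that asked for it, which is the failure mode naive "drop the oldest N messages" trimming tends to produce.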

Agent loop timeout - added a wall-clock budget to prevent runaway cloud model loops.
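
A wall-clock budget can be layered on top of the round cap in a few lines. This sketch assumes a `step` callable that runs one round and returns True when the task is finished; the 600-second default is a placeholder, not the value Eve uses.

```python
import time
from typing import Callable, Tuple


def run_agent_loop(
    step: Callable[[int], bool],
    max_rounds: int = 40,
    budget_seconds: float = 600.0,  # placeholder budget, not from the source
) -> Tuple[str, int]:
    """Run step() until it reports done, the round cap hits, or time runs out.

    Returns (reason, rounds_completed).
    """
    deadline = time.monotonic() + budget_seconds
    for round_no in range(1, max_rounds + 1):
        # Checked between rounds: a round already in flight is not interrupted.
        if time.monotonic() >= deadline:
            return ("timeout", round_no - 1)
        if step(round_no):
            return ("done", round_no)
    return ("max_rounds", max_rounds)
```

`time.monotonic()` is used instead of `time.time()` so a system clock adjustment cannot extend or cut short the budget.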

Stress tested with real tasks:

The blind file overwrite bug was caught live - Eve was asked to build a file monitoring script and write a README. She overwrote the project README without checking. Fix shipped same day.

The harder test: build a full FastAPI REST API with SQLite storage and pytest coverage for every endpoint. Run the tests, fix failures, report results.

Result: 9/9 tests passing on the first run. 1.06 seconds. Zero failures.

================================================== 9 passed, 1 warning in 1.06s

My Experience with GitHub Copilot

This is where the challenge got genuinely interesting.

I pointed Copilot at the live repository - JeffGreen311/eve-agent-v2-unleashed - and asked it to audit the tool usage, context handling, and auto-routing. Not "suggest improvements" in the abstract. Audit the actual code in the actual repo.

GitHub Copilot reading the Eve V2U repository structure and producing a full system audit

Copilot read the repository structure, pulled the key files, examined the server-side routing and tool execution logic, and came back with a comprehensive audit identifying 6 specific issues - each with root cause analysis, the exact file and line number, and production-ready fix code.

GitHub Copilot filing issues directly in the repo and delivering all production-ready code fixes

I then asked it to file those issues directly in the repository and deliver all the fix code in one session. It did exactly that.

What worked well:

  • The audit identified the tool set asymmetry between /chat and /chat/stream that I had missed entirely - a real bug causing mysterious failures for users hitting the non-streaming endpoint
  • The intent classification code (eve_tool_router.py) used re.search with word boundaries instead of simple string matching - the right approach for avoiding false positives
  • Filing GitHub issues directly from the chat kept the sprint organized across multiple parallel workstreams
  • The thinking traces helped me understand why it was making recommendations, not just what to do
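
To make the word-boundary point concrete, here is a small sketch comparing it with substring matching. The keyword list is hypothetical (the router's real intents are not shown in this post), and the negative lookahead is one guess at how a contraction edge case could be handled, not the pattern in `eve_tool_router.py`.

```python
import re

# Hypothetical keywords; "let" stands in for a code keyword such as JS `let`.
CODING_KEYWORDS = ("test", "commit", "let")

# \b stops "test" from matching inside "latest". The lookahead (?!['’]\w)
# rejects a hit that is really the start of a contraction, e.g. "let's".
CODING_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, CODING_KEYWORDS)) + r")\b(?!['’]\w)",
    re.IGNORECASE,
)


def looks_like_coding_request(message: str) -> bool:
    """True if any keyword appears as a whole word in the message."""
    return CODING_PATTERN.search(message) is not None
```

Plain substring matching would flag "run the latest build" because "latest" contains "test"; the pattern above does not. Without the lookahead, "let's talk about music" would still match, because `\b` treats the apostrophe as a word boundary.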

Where I had to intervene:

  • The inject_into_system_prompt() function added tokens every round — dangerous on the 4B model with 4K context. Added a gate so it only injects when the task is incomplete AND past round 2
  • Word boundary regex had an edge case with contractions. Fixed with a lookahead pattern
  • Some React UI suggestions assumed a component structure that didn't match the actual single-file HTML architecture - adapted those manually

The overall experience: Copilot is most useful when you give it a real codebase to read rather than an abstract problem to solve. "Audit this repository" produced far better output than "how do I improve tool routing."
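
The gate on `inject_into_system_prompt()` mentioned under "Where I had to intervene" can be sketched as follows. The function names and the way the hint is joined to the prompt are assumptions; the condition itself (task incomplete AND past round 2) is the one described in that list.

```python
def should_inject(round_no: int, task_complete: bool, min_round: int = 2) -> bool:
    """Inject only while the task is incomplete AND the loop is past min_round."""
    return (not task_complete) and round_no > min_round


def build_system_prompt(base: str, hint: str, round_no: int, task_complete: bool) -> str:
    """Rebuild the prompt from `base` each round so hints never accumulate."""
    if should_inject(round_no, task_complete):
        return f"{base}\n\n{hint}"
    return base
```

Rebuilding from the base prompt every round, instead of appending to the previous round's prompt, keeps the token cost of the hint constant, which matters most on a small context window.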

What's Next

  • Quest System - drop a .md file in workspace/quests/ and Eve picks it up on a timer and completes it while you sleep
  • RPG Progression - XP, levels, and class progression tied to real work. Level 20 = Unleashed
  • Telegram integration - remote access from your phone with quest completion notifications
  • Cross-platform polish - Windows-primary, need Linux/macOS feedback
  • VS Code extension - bring the terminal UI into the editor

Built by Jeff @ S0LF0RG3 - South Texas, 5 months of nights and weekends.

If Eve does something impressive on your machine, drop a star and tell me what it was.

github.com/JeffGreen311/eve-agent-v2-unleashed
