DEV Community

Artur DefToExplore

Run Claude Code agents for free with local models

Have you ever been in the middle of a deep coding session with Claude Code or Cursor, only to be hit with that dreaded message?

"⚠️ High-capacity model limit reached. Try again in 2 hours."

It is the modern developer's equivalent of a power outage. You are either forced to stop working or start burning through expensive API credits for simple tasks like writing unit tests, updating documentation, or fixing minor linter errors.

I decided to stop "renting" basic intelligence and built a bridge. My project, ai-orchestrator, is a lightweight, Unix-native pipeline that offloads heavy lifting from the cloud to your local machine using Ollama.


The Problem: The "Cloud Tax" on Context

Cloud AI agents are brilliant at planning, but they are incredibly inefficient at execution. Sending 2,000 lines of codebase context to an LLM just to generate a git commit message or standard boilerplate is overkill.

It costs you money, privacy, and hits your rate limits.


The Solution: Native Shell Orchestration

I built this system around a simple philosophy: Use the "Cloud Brain" for planning, and "Local Muscle" for execution.

Ultra-Lightweight & Zero-Dependency

Most AI orchestrators require complex environments or heavy runtimes. This project is built on the core tools already in your terminal:

  • Pure Bash: For logic and orchestration.
  • jq: For high-speed JSON processing.
  • curl: To communicate with the local Ollama API via REST.
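These three tools are all it takes to talk to a local model. As a minimal sketch of the pattern (the endpoint is Ollama's default REST API on `localhost:11434`; the helper names are mine, not the project's):

```shell
#!/usr/bin/env bash
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON request body safely with jq instead of string interpolation.
build_payload() {
  local model="$1" prompt="$2"
  jq -n --arg m "$model" --arg p "$prompt" \
    '{model: $m, prompt: $p, stream: false}'
}

# Send a prompt to the local model and extract the generated text.
ask_local() {
  local model="$1" prompt="$2"
  build_payload "$model" "$prompt" \
    | curl -s "$OLLAMA_URL/api/generate" -d @- \
    | jq -r '.response'
}
```

Using `jq -n --arg` to build the payload avoids quoting bugs when prompts contain newlines or quotes.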

Key Feature: Multi-Agent Project Analysis

Before you start coding, you need to understand the project. Instead of wasting cloud tokens on exploration, the analyze_project command runs a tiered local analysis:

  • Structure Agent (7B): Rapidly maps folder hierarchy and functional blocks.
  • Documentation Agent (14B): Summarizes all discovered Markdown files and specs.
  • Logic Agent (14B): Analyzes entry points, core classes, and architectural patterns.

The result is a Delta Report that updates your project context locally, so your IDE agent is always up to speed without extra costs.
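The tiered dispatch above can be sketched in a few lines of Bash. The role-to-model mapping and function names here are illustrative assumptions, not the project's exact configuration:

```shell
# Map each analysis role to a model sized for the job.
model_for_role() {
  case "$1" in
    structure) echo "qwen2.5-coder:7b"  ;;  # fast folder/structure mapping
    docs)      echo "qwen2.5:14b"       ;;  # summarize Markdown and specs
    logic)     echo "qwen2.5-coder:14b" ;;  # entry points and architecture
    *)         return 1                 ;;
  esac
}

# Run the tiers in order, cheapest first.
analyze_project() {
  local role
  for role in structure docs logic; do
    echo "[$role] using $(model_for_role "$role")"
    # ...collect the files for this tier and pipe them to the local model...
  done
}
```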


Automated Git Operations

Stop spending money on metadata. The orchestrator includes dedicated local agents for Git workflow:

  • Local Commits: The commit alias analyzes your staged changes and generates semantic commit messages using a local model.
  • PR Descriptions: The open-pr.sh script automatically drafts full Pull Request titles and descriptions based on your branch history.
```shell
# Stage changes and generate commit message locally
commit
```
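Under the hood, a `commit` alias like this only needs the staged diff and one local model call. This is a hypothetical sketch of the idea; the prompt wording and model name are my assumptions, not the project's exact script:

```shell
# Ask a local model for a one-line semantic commit message for the staged diff.
generate_commit_message() {
  local diff
  diff=$(git diff --staged)
  [ -n "$diff" ] || { echo "nothing staged" >&2; return 1; }
  jq -n --arg p "Write a one-line semantic commit message for this diff:
$diff" '{model: "qwen2.5-coder:7b", prompt: $p, stream: false}' \
    | curl -s http://localhost:11434/api/generate -d @- \
    | jq -r '.response'
}

# Commit with the generated message.
commit() {
  local msg
  msg=$(generate_commit_message) || return 1
  git commit -m "$msg"
}
```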

How the Pipeline Works: /implement

The core of the system is the /implement slash command. It triggers a multi-agent loop:

  1. The Planner (Cloud): Claude explores the codebase and writes a task_context.md.
  2. The Coder (Local): A local model (like Qwen2.5-Coder:14B) reads the context and generates the code.
  3. The Build Check: The system automatically runs your compiler or test suite.
  4. The Reviewer (Local): A local agent validates the code against your project's Coding Standards.
  5. The Fix Loop: If build or review fails, the Coder tries again (up to 3 rounds).
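Steps 3 through 5 form a simple retry loop. A sketch of its control flow, assuming hypothetical helpers `generate_code`, `run_build`, and `run_review` (the real pipeline's internals may differ):

```shell
MAX_ROUNDS=3

implement_loop() {
  local round
  for round in $(seq 1 "$MAX_ROUNDS"); do
    generate_code || return 1          # local Coder writes/patches files
    if run_build && run_review; then   # compile/tests, then local Reviewer
      echo "accepted on round $round"
      return 0
    fi
    echo "round $round failed, retrying" >&2
  done
  echo "gave up after $MAX_ROUNDS rounds" >&2
  return 1
}
```

The cap on rounds matters: without it, a local model that keeps producing the same broken patch would loop forever.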

Smart Hardware Scaling

The install.sh script includes an analyze_hardware.sh helper. It checks your System RAM and GPU VRAM to auto-configure the best models:

  • High-end Mac/PC: Sets roles to Qwen2.5-Coder 14B+.
  • Lightweight laptop: Switches to 1.5B or 7B models for maximum speed.
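The selection logic can be as simple as a RAM threshold check. The thresholds and model names below are illustrative assumptions, not the exact values in analyze_hardware.sh:

```shell
# Pick a model tier from available system RAM (in GB).
pick_model() {
  local ram_gb="$1"
  if   [ "$ram_gb" -ge 32 ]; then echo "qwen2.5-coder:14b"
  elif [ "$ram_gb" -ge 16 ]; then echo "qwen2.5-coder:7b"
  else                            echo "qwen2.5-coder:1.5b"
  fi
}

# Detect RAM portably: macOS exposes bytes via sysctl,
# Linux exposes kilobytes in /proc/meminfo.
detect_ram_gb() {
  if sysctl -n hw.memsize >/dev/null 2>&1; then
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
  else
    awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo
  fi
}
```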

Key Benefits

  • Privacy: Your source code and implementation logic stay on your hardware.
  • Cost: Save up to 80% on token usage by offloading execution to Ollama.
  • Speed: Local models respond instantly—no network latency for routine tasks.
  • Safety: Built-in hooks block AI agents from overwriting critical files like README.md.
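A safety hook of the kind described can be a tiny guard function that runs before any agent write. This is a hypothetical sketch; the project's actual hook mechanism and protected-file list may differ:

```shell
# Files an agent is never allowed to overwrite (illustrative list).
PROTECTED_FILES="README.md LICENSE .env"

# Return non-zero (and complain) if the target is protected.
guard_write() {
  local target="$1" f
  for f in $PROTECTED_FILES; do
    if [ "$(basename "$target")" = "$f" ]; then
      echo "blocked: $target is protected" >&2
      return 1
    fi
  done
  return 0
}
```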

Get Started in One Command

Installation is fully automated:

```shell
git clone https://github.com/Mybono/ai-orchestrator ~/Projects/ai-orchestrator
cd ~/Projects/ai-orchestrator
./scripts/install.sh
```

Stop burning tokens. Start orchestrating.


GitHub Repository: https://github.com/Mybono/ai-orchestrator


Discussion

How are you managing your AI API costs? Are you ready to move the heavy lifting to your local GPU? Let's talk in the comments.

#ai #ollama #claudecode #productivity #opensource
