Ask Solutions
Cutting my AI spend to zero with an open-source Claude Code alternative

I pay AUD$155/month for Claude Max. I have a MacBook Pro that runs large models fine. Two things bugged me about Claude Code:

  1. Even though Max was paid for, the API billed separately when I wired in third-party tools.
  2. My laptop sat idle while every refactor went to a remote API.

So I built OpenAgent. Terminal coding agent, 12+ providers, direct Max-subscription support, runs local models without a key.

GitHub: ask-sol / openagent

Open-source agentic coding CLI for your terminal. Multi-provider (OpenAI, Anthropic, Gemini, Mistral, Groq, DeepSeek, xAI, Ollama, OpenRouter), token-efficient, with web search, MCP server support, local session resume, and built-in Reddit/X posting.

OpenAgent: the open-source Claude Code alternative that works with any AI provider. Use your existing Claude Max subscription, OpenRouter, GPT-5, Gemini, Ollama, or any of 12 providers.

Tracking since 2026-04-19 • 1,580 clones and 471 unique users in the last 14 days • updated 2026-04-25

OpenAgent Demo


Why OpenAgent?

Already paying for Claude Max? OpenAgent lets you use your existing subscription directly — no separate API key, no extra cost. Just log in and code.

Want provider freedom? Switch between GPT-5, Claude, Gemini, Grok, DeepSeek, or local models with one command. No lock-in.

Want something open? OpenAgent is Apache 2.0 licensed. Fork it, extend it, self-host it.

OpenAgent vs Claude Code

| | OpenAgent | Claude Code |
| --- | --- | --- |
| Providers | 12+ (OpenAI, Anthropic, Gemini, Groq, Mistral, DeepSeek, xAI, Bedrock, Alibaba, Ollama, OpenRouter) | Anthropic only |
| Use Max/Pro subscription | ✅ No API key needed | ✅ Built-in |
How the Max plan works

Anthropic ships a claude CLI that uses your subscription session. OpenAgent spawns it with --output-format stream-json and parses the result:

```typescript
import { spawn } from "node:child_process";

// Spawn the claude CLI against the user's subscription session and stream
// newline-delimited JSON events back for parsing.
const child = spawn("claude", [
  "-p", prompt,
  "--model", modelAlias,
  "--output-format", "stream-json",
  "--verbose",
]);
```

Each assistant event has cumulative token usage. The final result event has the real total_cost_usd from billing. No proxy, no OAuth dance, no scraped tokens. The CLI was always there.
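Consuming that stream boils down to keeping the latest cumulative counters and grabbing the final cost. A minimal sketch, assuming the event shapes described above (the `summarize` helper and the exact type names are illustrative, not OpenAgent's actual code):

```typescript
// Each stdout line from the claude CLI is one JSON event. "assistant" events
// carry cumulative token usage; the final "result" event carries the
// authoritative total_cost_usd from billing.
type StreamEvent =
  | { type: "assistant"; message: { usage: { input_tokens: number; output_tokens: number } } }
  | { type: "result"; total_cost_usd: number };

function summarize(lines: string[]): { inputTokens: number; outputTokens: number; costUsd: number } {
  let inputTokens = 0, outputTokens = 0, costUsd = 0;
  for (const line of lines) {
    if (!line.trim()) continue;
    const event = JSON.parse(line) as StreamEvent;
    if (event.type === "assistant") {
      // Counters are cumulative: keep the latest values, don't sum them.
      inputTokens = event.message.usage.input_tokens;
      outputTokens = event.message.usage.output_tokens;
    } else if (event.type === "result") {
      costUsd = event.total_cost_usd;
    }
  }
  return { inputTokens, outputTokens, costUsd };
}
```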

Local models on Apple Silicon

Three local runtimes wired in: Ollama, LM Studio, MLX. OpenAgent installs them for you.

If you have an M5 Mac, Ollama crashes with llama runner process has terminated: %!w(<nil>). That's an upstream bug (PR #15581, unmerged). OpenAgent's MLX provider skips Ollama and talks to mlx_lm.server directly, which doesn't have the bug.

Pick MLX in setup and it runs pip install mlx-lm, downloads Gemma 4 E4B (3 GB), and starts the server. Two minutes, $0.

Live cost tracking that isn't fake

Most agents estimate tokens by counting response characters and dividing by 4. That undercounts by ~60% because file contents and tool results never make it in.
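That naive estimator amounts to something like this (a sketch of the pattern, not any particular agent's code):

```typescript
// The characters/4 heuristic only sees the response text. File contents and
// tool results injected into the context never pass through it, so the
// estimate lands well below real billed usage.
function naiveTokenEstimate(responseText: string): number {
  return Math.ceil(responseText.length / 4);
}
```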

OpenAgent reads the real usage field from each stream event, computes deltas, and reconciles against total_cost_usd at message end:

```typescript
const deltaIn = totalInputTokens - lastEmittedInput;
const deltaOut = totalOutputTokens - lastEmittedOutput;
if (deltaIn > 0 || deltaOut > 0) {
  const deltaCost = (deltaIn * rate.in + deltaOut * rate.out) / 1_000_000;
  yield {
    type: "done",
    usage: { inputTokens: deltaIn, outputTokens: deltaOut, costUsd: deltaCost },
  };
}
```

The number ticks up live and matches Anthropic's billing to four decimal places.
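The delta step above can be exercised standalone. A self-contained sketch (the `deltaUsage` name and the per-million-token rates are illustrative):

```typescript
// Given two successive cumulative usage readings, compute the incremental
// tokens and their cost. Rates are USD per million tokens.
interface DeltaResult { inputTokens: number; outputTokens: number; costUsd: number }

function deltaUsage(
  total: { input: number; output: number },
  last: { input: number; output: number },
  rate: { in: number; out: number },
): DeltaResult | null {
  const deltaIn = total.input - last.input;
  const deltaOut = total.output - last.output;
  // Nothing new to emit if the counters haven't moved.
  if (deltaIn <= 0 && deltaOut <= 0) return null;
  const costUsd = (deltaIn * rate.in + deltaOut * rate.out) / 1_000_000;
  return { inputTokens: deltaIn, outputTokens: deltaOut, costUsd };
}
```

With rates of $3/M input and $15/M output, 200 new input tokens and 50 new output tokens price out at $0.00135.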

Token efficiency

Two-layer concise mode:

  1. The system prompt strips filler ("no thank-yous, no flattery, no recap") and bans decorative markdown.
  2. Streamed text is filtered client-side to drop <persisted-output> blocks and other internal markers before they ever reach your terminal.

About 30% fewer output tokens vs. an unfiltered session on the same Next.js refactor. Same code quality, less spend on conversational glue.
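The second layer, dropping internal markers, can be sketched as a simple filter (the function name is illustrative, and for brevity this version assumes a marker block arrives within one chunk; a real streaming filter has to buffer across chunk boundaries):

```typescript
// Remove internal marker blocks such as <persisted-output>…</persisted-output>
// from streamed text before it is written to the terminal.
function stripInternalMarkers(chunk: string): string {
  return chunk.replace(/<persisted-output>[\s\S]*?<\/persisted-output>/g, "");
}
```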

Install

macOS:

```shell
brew install ask-sol/openagent/openagent
```

Linux:

```shell
curl -fsSL https://raw.githubusercontent.com/ask-sol/openagent/main/scripts/install-remote.sh | bash
```

Repo: github.com/ask-sol/openagent
Docs: ask-sol.github.io/openagent

Apache 2.0. Issues and PRs welcome.

What's your setup?

I'm curious: how are you all handling AI costs for local development? Are you sticking with hosted APIs, or have you made the jump to local models on your workstation?

If you try OpenAgent, let me know if you run into any issues or have ideas for providers I should add next!
