ARK is an open-source Go runtime that sends tool calls to cheap models and reasoning to expensive ones — automatically. Per-step cost tracking, persistent learning, 106 tests.
Every AI agent framework I've used does the same thing: pick one model, use it for everything. GPT-4o for a simple tool call that extracts a parameter. GPT-4o for the final reasoning step. Same price per token regardless of complexity.
That's like hiring a senior engineer to write config files.
I built ARK to fix this. It's an open-source runtime in Go that routes each step in the agent loop to the optimal model.
## How routing works
| Step Type | Model | Why |
|---|---|---|
| Tool call (extract params) | Fast | Simple extraction, cheap model is fine |
| Final reasoning/summary | Strong | Needs quality, worth paying for |
| Error recovery/retry | Strong | Needs to understand what went wrong |
| Grounding check | Fast | Simple validation |
Configure it in one YAML block:
```yaml
model:
  provider: openai
  strategy: cost_optimized
  fast_model: gpt-4o-mini
  strong_model: gpt-4o
```
Three strategies:
- `single` — one model, backwards compatible
- `cost_optimized` — prefer the cheap model, fall back to the strong one
- `quality_first` — always use the strong model
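Under `cost_optimized`, the routing table above boils down to a small decision function. A hedged Go sketch (type names and step taxonomy are illustrative, not ARK's actual API):

```go
package main

import "fmt"

// StepType classifies a step in the agent loop. Illustrative taxonomy.
type StepType string

const (
	ToolCall       StepType = "tool_call"
	Reasoning      StepType = "reasoning"
	Recovery       StepType = "recovery"
	GroundingCheck StepType = "grounding_check"
)

// pickModel mirrors the routing table: cheap steps go to the fast model,
// quality-critical steps to the strong one.
func pickModel(strategy string, step StepType, fast, strong string) string {
	switch strategy {
	case "quality_first":
		return strong
	case "single":
		return fast // the single configured model
	}
	// cost_optimized: only quality-critical steps pay for the strong model
	switch step {
	case Reasoning, Recovery:
		return strong
	}
	return fast
}

func main() {
	fmt.Println(pickModel("cost_optimized", ToolCall, "gpt-4o-mini", "gpt-4o"))  // gpt-4o-mini
	fmt.Println(pickModel("cost_optimized", Reasoning, "gpt-4o-mini", "gpt-4o")) // gpt-4o
}
```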
## The router learns
If the fast model fails on a step type, ARK promotes that step type to the strong model next time. The promotion persists across restarts in `ark-router-learning.json`.
```
Run 1: tool_call on gpt-4o-mini → fails
Run 1: fallback to gpt-4o → succeeds
Run 2: tool_call goes directly to gpt-4o (learned from failure)
```
No configuration needed. The router figures it out.
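A minimal sketch of how such persistence can work, assuming a simple JSON map keyed by step type (the real `ark-router-learning.json` format may differ):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// promotions records step types that failed on the fast model and are
// now pinned to the strong model. Illustrative, not ARK's file format.
type promotions map[string]bool

func load(path string) promotions {
	p := promotions{}
	if data, err := os.ReadFile(path); err == nil {
		json.Unmarshal(data, &p)
	}
	return p
}

func (p promotions) save(path string) error {
	data, _ := json.Marshal(p)
	return os.WriteFile(path, data, 0o644)
}

// route sends previously failed step types to the strong model.
func (p promotions) route(step string) string {
	if p[step] {
		return "strong"
	}
	return "fast"
}

func main() {
	const path = "router-learning.json"
	p := load(path)
	fmt.Println("run 1:", p.route("tool_call")) // fast (no history yet)
	p["tool_call"] = true                       // fast model failed: promote
	p.save(path)

	p2 := load(path)                            // simulate a restart
	fmt.Println("run 2:", p2.route("tool_call")) // strong (learned)
	os.Remove(path)
}
```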
## But routing is only part of it
ARK solves three problems together:
### 1. Context waste
MCP tools dump 60,000+ tokens of tool schemas into every prompt. ARK loads only 3-5 relevant tools per task.
```
Raw MCP: 60,468 tokens (30.2% of context)
ARK:        ~80 tokens (0.05% of context)
Savings: 99.9%
```
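A minimal sketch of the idea using naive keyword-overlap scoring (ARK's actual relevance ranking is surely more sophisticated; tool names here are illustrative):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tool pairs a name with its (token-heavy) schema description.
type Tool struct{ Name, Desc string }

// relevantTools scores each tool by keyword overlap with the task and
// keeps only the top k, instead of injecting every schema into the prompt.
func relevantTools(task string, tools []Tool, k int) []Tool {
	words := strings.Fields(strings.ToLower(task))
	score := func(t Tool) int {
		text := strings.ToLower(t.Name + " " + t.Desc)
		n := 0
		for _, w := range words {
			if strings.Contains(text, w) {
				n++
			}
		}
		return n
	}
	sort.SliceStable(tools, func(i, j int) bool { return score(tools[i]) > score(tools[j]) })
	if len(tools) > k {
		tools = tools[:k]
	}
	return tools
}

func main() {
	tools := []Tool{
		{"github_list_repos", "list repositories for a user"},
		{"fs_read", "read a file from disk"},
		{"brave_search", "web search"},
	}
	for _, t := range relevantTools("list repos for openai", tools, 1) {
		fmt.Println(t.Name) // github_list_repos
	}
}
```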
### 2. Cost per decision
Every step has a dollar amount. Cost feeds back into tool ranking — expensive tools that fail get demoted automatically.
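As a back-of-the-envelope sketch, per-step cost is just token counts times per-model rates. The prices below are illustrative placeholders, not live figures:

```go
package main

import "fmt"

// Illustrative prices in dollars per million tokens (input, output);
// check your provider's pricing page for real numbers.
var price = map[string][2]float64{
	"gpt-4o-mini": {0.15, 0.60},
	"gpt-4o":      {2.50, 10.00},
}

// stepCost converts one step's token usage into dollars.
func stepCost(model string, inTok, outTok int) float64 {
	p := price[model]
	return float64(inTok)/1e6*p[0] + float64(outTok)/1e6*p[1]
}

func main() {
	fmt.Printf("$%.6f\n", stepCost("gpt-4o-mini", 1200, 80)) // $0.000228
}
```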
### 3. Learning across runs
Tools that succeed get promoted. Tools that fail get demoted. Query patterns are remembered. Run 2 is smarter than Run 1.
```
github-list:   0.378 → 0.954 (+152.7%) after 3 runs
github-search: 0.552 → 0.419 (-24.1%)  after 1 failure
```
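The post doesn't spell out the update rule, but one common way to get this promote/demote behavior is an exponential moving average that nudges a score toward 1 on success and toward 0 on failure; a hedged sketch, not ARK's actual formula:

```go
package main

import "fmt"

// updateScore moves a tool's score toward 1 on success and toward 0 on
// failure, at the given learning rate. Illustrative update rule only.
func updateScore(score float64, success bool, rate float64) float64 {
	target := 0.0
	if success {
		target = 1.0
	}
	return score + rate*(target-score)
}

func main() {
	s := 0.378
	for i := 0; i < 3; i++ {
		s = updateScore(s, true, 0.5) // three successful runs
	}
	fmt.Printf("%.3f\n", s) // 0.922
}
```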
## The anti-hallucination gate
If tools are available but the LLM tries to answer without calling them, ARK blocks it:
```
Step 1: GROUNDING GATE — rejecting ungrounded answer, forcing tool use
Step 2: TOOL_CALL — github_list_repos
Step 3: COMPLETE — answer based on real data
```
Zero hallucinated answers across 30 stress test runs.
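The gate itself reduces to a simple check. A minimal Go sketch of the idea (function and parameter names are hypothetical, not ARK's implementation):

```go
package main

import "fmt"

// groundingGate blocks a final answer when tools are available but the
// model produced no tool call, forcing a grounded retry.
func groundingGate(toolsAvailable, toolCalled bool) (bool, string) {
	if toolsAvailable && !toolCalled {
		return false, "rejecting ungrounded answer, forcing tool use"
	}
	return true, "answer grounded in tool output"
}

func main() {
	ok, reason := groundingGate(true, false)
	fmt.Println(ok, reason) // false rejecting ungrounded answer, forcing tool use
}
```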
## Connect any API
Define custom tools under the `tools:` key in `agent.yaml`. ARK automatically handles domain allowlisting, parameter validation, cost tracking, and learning.
Headers support `${ENV_VAR}` interpolation. Write operations are blocked by default. No code needed.
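A hypothetical entry might look like this (field names are illustrative; see the repo's `agent.yaml` for the actual schema):

```yaml
tools:
  - name: weather_lookup            # hypothetical tool, not bundled with ARK
    method: GET
    url: https://api.example.com/v1/weather
    headers:
      Authorization: Bearer ${WEATHER_API_KEY}   # interpolated from the environment
```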
## The numbers
- 106 tests, race detector clean
- 11 built-in tools (GitHub, Brave Search, file system)
- 3 LLM providers (Anthropic, OpenAI, Ollama)
- Single Go binary, zero external dependencies
- 30-run stress test: zero crashes, zero hallucinations
- Cost tracking verified against OpenAI's billing dashboard
## Try it
```bash
git clone https://github.com/atripati/ark.git
cd ark

# No API keys needed for the demos
go run ./cmd/ark demo
go run ./cmd/ark demo-learn
go run ./cmd/ark bench

# Real task with Ollama (free)
go run ./cmd/ark run agent.yaml --task "list repos for openai"

# With model routing (set strategy: cost_optimized in agent.yaml; needs an OpenAI key)
go run ./cmd/ark run agent.yaml --task "find most starred repo for openai, then list its issues"
```
## What's next
ARK is open source and actively developed. The next milestone is an MCP server connector, so ARK can sit in front of any MCP server and manage its context automatically.
If you're building AI agents and hitting context waste, cost visibility, or model efficiency problems, I'd love to hear what's missing.
GitHub: github.com/atripati/ark


