Abhishek Tripathi
I built an AI agent runtime that routes each step to a different model

ARK is an open-source Go runtime that sends tool calls to cheap models and reasoning to expensive ones, automatically. Per-step cost tracking, persistent learning, 106 tests.

Every AI agent framework I've used does the same thing: pick one model, use it for everything. GPT-4o for a simple tool call that extracts a parameter. GPT-4o for the final reasoning step. Same price per token regardless of complexity.

That's like hiring a senior engineer to write config files.

I built ARK to fix this. It's an open-source runtime in Go that routes each step in the agent loop to the optimal model.

How routing works

Step type                    | Model  | Why
Tool call (extract params)   | Fast   | Simple extraction, a cheap model is fine
Final reasoning/summary      | Strong | Needs quality, worth paying for
Error recovery/retry         | Strong | Needs to understand what went wrong
Grounding check              | Fast   | Simple validation
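
In code, the routing decision boils down to a switch on step type. A minimal sketch in Go (the step names are illustrative, not ARK's internal API):

package main

import "fmt"

// stepType is an illustrative enum; ARK's internal names may differ.
type stepType string

const (
	toolCall       stepType = "tool_call"
	finalReasoning stepType = "final_reasoning"
	errorRecovery  stepType = "error_recovery"
	groundingCheck stepType = "grounding_check"
)

// pickModel mirrors the table above: quality-sensitive steps get the
// strong model, mechanical steps get the fast one.
func pickModel(s stepType, fast, strong string) string {
	switch s {
	case finalReasoning, errorRecovery:
		return strong
	default:
		return fast
	}
}

func main() {
	for _, s := range []stepType{toolCall, finalReasoning, errorRecovery, groundingCheck} {
		fmt.Println(s, "->", pickModel(s, "gpt-4o-mini", "gpt-4o"))
	}
}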

Configure it in one YAML block:

model:
  provider: openai
  strategy: cost_optimized
  fast_model: gpt-4o-mini
  strong_model: gpt-4o

Three strategies: single (one model, backwards compatible), cost_optimized (prefer the cheap model, fall back to the strong one), quality_first (always strong).
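
cost_optimized is essentially try-cheap-then-retry. A minimal sketch of that fallback, with an illustrative stand-in for the provider call:

package main

import (
	"errors"
	"fmt"
)

// callLLM is an illustrative stand-in for a real provider call.
// It simulates a fast-model failure so the fallback path is visible.
func callLLM(model, prompt string) (string, error) {
	if model == "gpt-4o-mini" {
		return "", errors.New("fast model could not produce a valid tool call")
	}
	return "ok", nil
}

// costOptimized prefers the fast model and falls back to the strong one.
func costOptimized(prompt, fast, strong string) (string, error) {
	if out, err := callLLM(fast, prompt); err == nil {
		return out, nil
	}
	return callLLM(strong, prompt)
}

func main() {
	out, err := costOptimized("extract params", "gpt-4o-mini", "gpt-4o")
	fmt.Println(out, err) // ok <nil>
}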

The router learns

If the fast model fails on a step type, ARK promotes that step type to the strong model next time. The promotion persists across restarts in ark-router-learning.json.

Run 1: tool_call on gpt-4o-mini → fails
Run 1: fallback to gpt-4o → succeeds
Run 2: tool_call goes directly to gpt-4o (learned from failure)

No configuration needed. The router figures it out.
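
Under the hood this is promote-on-failure plus a JSON file. A minimal sketch, assuming an illustrative file layout (ARK's actual ark-router-learning.json schema may differ):

package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// promoted records step types whose fast-model attempts have failed.
type promoted map[string]bool

func loadLearning(path string) promoted {
	p := promoted{}
	if b, err := os.ReadFile(path); err == nil {
		json.Unmarshal(b, &p) // a missing file just means no learning yet
	}
	return p
}

func saveLearning(path string, p promoted) {
	b, _ := json.MarshalIndent(p, "", "  ")
	os.WriteFile(path, b, 0o644)
}

// modelFor routes straight to the strong model once a step type has
// burned the fast model before.
func modelFor(step string, p promoted, fast, strong string) string {
	if p[step] {
		return strong
	}
	return fast
}

func main() {
	p := loadLearning("ark-router-learning.json")
	p["tool_call"] = true // run 1: the fast model failed here
	saveLearning("ark-router-learning.json", p)
	fmt.Println(modelFor("tool_call", p, "gpt-4o-mini", "gpt-4o")) // gpt-4o
}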

But routing is only part of it

ARK solves three problems together

1. Context waste

MCP tools dump 60,000+ tokens of tool schemas into every prompt. ARK loads only 3-5 relevant tools per task.

Raw MCP: 60,468 tokens (30.2% of context)
ARK: ~80 tokens (0.05% of context)
Savings: 99.9%
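
Mechanically this is top-k selection over per-tool relevance scores. A sketch under that assumption (the scoring itself is out of scope here):

package main

import (
	"fmt"
	"sort"
)

// topTools keeps only the k highest-scoring tools, so the prompt carries
// a handful of schemas instead of the whole catalog.
func topTools(scores map[string]float64, k int) []string {
	names := make([]string, 0, len(scores))
	for n := range scores {
		names = append(names, n)
	}
	sort.Slice(names, func(i, j int) bool { return scores[names[i]] > scores[names[j]] })
	if len(names) > k {
		names = names[:k]
	}
	return names
}

func main() {
	scores := map[string]float64{"github-list": 0.95, "github-search": 0.42, "fs-read": 0.10}
	fmt.Println(topTools(scores, 2)) // [github-list github-search]
}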

2. Cost per decision

Every step has a dollar amount. Cost feeds back into tool ranking — expensive tools that fail get demoted automatically.
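
The arithmetic is token counts times per-model prices. A minimal sketch with illustrative prices (check your provider's current rates):

package main

import "fmt"

// stepCost converts token usage into dollars, given per-million-token
// prices for the model that handled the step.
func stepCost(inTokens, outTokens int, inPerM, outPerM float64) float64 {
	return float64(inTokens)/1e6*inPerM + float64(outTokens)/1e6*outPerM
}

func main() {
	// illustrative prices, not a quote: fast model vs strong model
	fmt.Printf("$%.6f\n", stepCost(1200, 80, 0.15, 0.60))
	fmt.Printf("$%.6f\n", stepCost(1200, 80, 2.50, 10.0))
}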

3. Learning across runs

Tools that succeed get promoted. Tools that fail get demoted. Query patterns are remembered. Run 2 is smarter than Run 1.

github-list: 0.378 → 0.954 (+152.7%) after 3 runs
github-search: 0.552 → 0.419 (-24.1%) after 1 failure
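
One plausible update rule that produces numbers like these is an exponential moving average toward 1 on success and 0 on failure (ARK's actual formula may differ):

package main

import "fmt"

// updateScore nudges a tool's score toward 1 on success and toward 0 on
// failure; alpha controls how fast the router changes its mind.
func updateScore(score float64, success bool, alpha float64) float64 {
	target := 0.0
	if success {
		target = 1.0
	}
	return score + alpha*(target-score)
}

func main() {
	s := 0.552
	s = updateScore(s, false, 0.25) // one failure drags the score down
	fmt.Printf("%.3f\n", s)         // ~0.414, close to the run above
}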

The anti-hallucination gate

If tools are available but the LLM tries to answer without calling them, ARK blocks it:

Step 1: GROUNDING GATE — rejecting ungrounded answer, forcing tool use
Step 2: TOOL_CALL — github_list_repos
Step 3: COMPLETE — answer based on real data

Zero hallucinated answers across 30 stress test runs.
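
The gate itself is a cheap structural check. A sketch of the idea, with illustrative types:

package main

import (
	"errors"
	"fmt"
)

// response is an illustrative stand-in for an LLM turn.
type response struct {
	toolCalls []string
	finalText string
}

// checkGrounded rejects a final answer produced while tools were
// available but none were called.
func checkGrounded(toolsAvailable bool, r response) error {
	if toolsAvailable && len(r.toolCalls) == 0 && r.finalText != "" {
		return errors.New("ungrounded answer rejected: tool use required")
	}
	return nil
}

func main() {
	err := checkGrounded(true, response{finalText: "openai has 42 repos"})
	fmt.Println(err) // forces the loop back to a tool call
}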

Connect any API

Define custom tools in agent.yaml. ARK handles domain allowlisting, parameter validation, cost tracking, and learning automatically. Headers support ${ENV_VAR} interpolation, write operations are blocked by default, and no code is needed.
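
A minimal sketch of a tool entry (field names are illustrative, not ARK's exact schema):

# illustrative schema, for flavor only
tools:
  - name: github_list_repos
    description: List public repos for a user
    method: GET
    url: https://api.github.com/users/{user}/repos
    headers:
      Authorization: Bearer ${GITHUB_TOKEN}
    allowed_domains:
      - api.github.com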

The numbers

  • 106 tests, race detector clean

  • 11 built-in tools (GitHub, Brave Search, file system)

  • 3 LLM providers (Anthropic, OpenAI, Ollama)

  • Single Go binary, zero external dependencies

  • 30-run stress test: zero crashes, zero hallucination

  • Verified cost tracking against OpenAI's billing dashboard

Try it

git clone https://github.com/atripati/ark.git
cd ark

No API keys needed for demos

go run ./cmd/ark demo
go run ./cmd/ark demo-learn
go run ./cmd/ark bench

Real task with Ollama (free)

go run ./cmd/ark run agent.yaml --task "list repos for openai"

With model routing (needs OpenAI key)

Set strategy: cost_optimized in agent.yaml, then:

go run ./cmd/ark run agent.yaml --task "find most starred repo for openai, then list its issues"

What's next

ARK is open source and actively developed. The next milestone is an MCP server connector, so ARK can sit in front of any MCP server and manage its context automatically.

If you're building AI agents and hitting context waste, cost visibility, or model efficiency problems, I'd love to hear what's missing.

GitHub: github.com/atripati/ark
