AI Agents write code that compiles, but they still lie to the user. Here is how to fix the pipeline

Abhishek Tripathi — Mon, 18 May 2026 01:30:37 +0000

I was testing the Ark Runtime Kernel (https://www.arkruntime.com) on a standard Go coding task: “Write a function in Go that reads CSV.”
The internal verification engine did its job flawlessly. It caught an unmatched closing brace }, spun up AutoFixGoCode, fixed it, compiled it, and passed the unit tests with a 100% score.
But there was a catch, the output block on the screen was still showing the original, broken, hallucinated code.
The culprit, a classic data plane vs. control plane synchronization bug. The validation pipeline was testing the corrected code buffer, but the output renderer was still pointing to the stale, raw LLM response.
too me 3 days to figure it out and just pushed the fix to align the variable buffers. Now, look at the execution trace below,
The Governor intercepts the task and routes the simple tool calls to gpt-4o-mini to protect the token budget.
The Verification Layer extracts the block, lints, compiles, and runs auto-generated tests.
The Data Plane reflects the perfectly auto-corrected, beautifully clean, idiomatic Go code block to the user.
ARK Memory instantly ingests the 8 new execution metrics, scaling to 20 cross-domain memories automatically.
Stop building stateless wrappers that return broken code blocks. Build runtime layers that self-heal before the user ever sees an error.

AIAgents #SoftwareEngineering #Golang #LLMs #TechInnovation #ArkRuntime

I built an AI agent runtime that routes each step to a different model

Abhishek Tripathi — Sun, 12 Apr 2026 18:49:05 +0000

Description:

ARK is an open-source Go runtime that sends tool calls to cheap models and reasoning to expensive ones — automatically. Per-step cost tracking, persistent learning, 106 tests.

Every AI agent framework I've used does the same thing: pick one model, use it for everything. GPT-4o for a simple tool call that extracts a parameter. GPT-4o for the final reasoning step. Same price per token regardless of complexity.
That's like hiring a senior engineer to write config files.
I built ARK to fix this. It's an open-source runtime in Go that routes each step in the agent loop to the optimal model.

What it looks like

How routing works

Step Type	Model	Why
Tool call (extract params)	Fast	Simple extraction, cheap model is fine
Final reasoning/summary	Strong	Needs quality, worth paying for
Error recovery/retry	Strong	Needs to understand what went wrong
Grounding check	Fast	Simple validation

Configure it in one YAML block:

model:
provider: openai
strategy: cost_optimized
fast_model: gpt-4o-mini
strong_model: gpt-4o

Three strategies: single (one model, backwards compatible), cost_optimized (prefer cheap, fallback to strong), quality_first (always strong).

The router learns

If the fast model fails on a step type, ARK promotes it to the strong model next time. This persists across restarts in ark-router-learning.json.

Run 1: tool_call on gpt-4o-mini → fails
Run 1: fallback to gpt-4o → succeeds
Run 2: tool_call goes directly to gpt-4o (learned from failure)

No configuration needed. The router figures it out.

But routing is only part of it

ARK solves three problems together

1. Context waste

MCP tools dump 60,000+ tokens of tool schemas into every prompt. ARK loads only 3-5 relevant tools per task.

Raw MCP: 60,468 tokens (30.2% of context)
ARK: ~80 tokens (0.05% of context)
Savings: 99.9%

2. Cost per decision

Every step has a dollar amount. Cost feeds back into tool ranking — expensive tools that fail get demoted automatically.

3. Learning across runs

Tools that succeed get promoted. Tools that fail get demoted. Query patterns are remembered. Run 2 is smarter than Run 1.

github-list: 0.378 → 0.954 (+152.7%) after 3 runs
github-search: 0.552 → 0.419 (-24.1%) after 1 failure

The anti-hallucination gate

If tools are available but the LLM tries to answer without calling them, ARK blocks it:

Step 1: GROUNDING GATE — rejecting ungrounded answer, forcing tool use
Step 2: TOOL_CALL — github_list_repos
Step 3: COMPLETE — answer based on real data

Zero hallucinated answers across 30 stress test runs.

Connect any API

Define custom tools in agent.yaml. ARK handles domain allowlisting, parameter validation, cost tracking, and learning — automatically.
tools:

Headers support ${ENV_VAR} interpolation. Write operations are blocked by default. No code needed.

The numbers

106 tests, race detector clean
11 built-in tools (GitHub, Brave Search, file system)
3 LLM providers (Anthropic, OpenAI, Ollama)
Single Go binary, zero external dependencies
30-run stress test: zero crashes, zero hallucination
Verified cost tracking against OpenAI's billing dashboard

Try it

git clone https://github.com/atripati/ark.git
cd ark

No API keys needed for demos

go run ./cmd/ark demo
go run ./cmd/ark demo-learn
go run ./cmd/ark bench

Real task with Ollama (free)

go run ./cmd/ark run agent.yaml --task "list repos for openai"

With model routing (needs OpenAI key)

set strategy: cost_optimized in agent.yaml

go run ./cmd/ark run agent.yaml --task "find most starred repo for openai, then list its issues"

What's next

ARK is open source and actively developed. The next milestone is MCP server connector — so ARK can sit in front of any MCP server and manage its context automatically.

If you're building AI agents and hitting context waste, cost visibility, or model efficiency problems, I'd love to hear what's missing.

GitHub: github.com/atripati/ark

DEV Community: Abhishek Tripathi