OpenClaw picks a default model and uses it for everything - heartbeat checks, complex synthesis, quick status lookups, deep analysis. Every task costs the same. That's expensive and unnecessary.
This post covers how to wire Kalibr into an OpenClaw agent so it routes each task to the right model automatically. If you run an OpenClaw deployment, this is probably the highest-ROI change you can make to your token spend.
## Why OpenClaw Defaults This Way
OpenClaw is configured at the session level, not the task level. Your CLAUDE.md or session config sets one model, and that model handles whatever comes in. This makes setup simple, but it means:
- A heartbeat status check costs the same as a codebase analysis
- A simple "is this service up?" poll runs on the same model as "refactor this module"
- There's no mechanism to say "use cheap for low-stakes, use capable for high-stakes"
Kalibr adds that mechanism. You query it before each task to get a routing recommendation, then pass that model to the OpenAI/Anthropic client. The routing adapts based on task type and recent outcome data.
## The Basic Pattern: `get_policy()` Before Each Task
```python
import time

import kalibr  # import kalibr before openai so its instrumentation can attach
import openai

kalibr.init()
client = openai.OpenAI()

def run_agent_task(
    task_type: str,
    prompt: str,
    quality_priority: float = 0.5,  # 0.0 = optimize cost, 1.0 = optimize quality
) -> str:
    """
    OpenClaw agent task runner with Kalibr routing.

    task_type: "heartbeat", "analysis", "synthesis", "extraction", etc.
    """
    # get a routing recommendation before the call
    policy = kalibr.get_policy(task_context={
        "task_type": task_type,
        "quality_priority": quality_priority,
    })

    start = time.perf_counter()
    response = client.chat.completions.create(
        model=policy.recommended_model,
        messages=[{"role": "user", "content": prompt}],
    )
    content = response.choices[0].message.content

    # report back so the router learns from this outcome
    kalibr.record_outcome(
        policy_id=policy.id,
        success=True,
        latency_ms=int((time.perf_counter() - start) * 1000),
    )
    return content
```
Now you have two levers: `task_type` tells Kalibr what kind of work this is, and `quality_priority` expresses how much you care about output quality vs. cost for this specific call. A heartbeat check is `quality_priority=0.1`; a code review is `quality_priority=0.9`.
## Wiring Into the Heartbeat
OpenClaw agents typically run a heartbeat - periodic status checks, health pings, watching for events. These are almost always low-complexity tasks that don't need a capable model.
Here's how to wire Kalibr's get_insights() into your heartbeat loop:
```python
import json
import logging
import time

import kalibr
import openai

kalibr.init()
client = openai.OpenAI()
logger = logging.getLogger(__name__)

def heartbeat_check(services: list[str]) -> dict:
    """
    Low-cost heartbeat: route to the cheapest model that can handle status checks.
    """
    policy = kalibr.get_policy(task_context={
        "task_type": "heartbeat",
        "quality_priority": 0.1,  # cost-optimize
        "latency_budget_ms": 3000,
    })

    prompt = f"""
    Check status for these services and return JSON:
    {services}

    Format: {{"service_name": "ok|degraded|down", ...}}
    """

    response = client.chat.completions.create(
        model=policy.recommended_model,  # typically mini or equivalent
        messages=[
            {"role": "system", "content": "Return only valid JSON."},
            {"role": "user", "content": prompt},
        ],
        response_format={"type": "json_object"},
    )

    kalibr.record_outcome(policy_id=policy.id, success=True)
    return json.loads(response.choices[0].message.content)

def get_routing_insights() -> dict:
    """
    Pull Kalibr insights to surface routing anomalies in your heartbeat.

    Useful for detecting when a model is degrading, cost spikes, etc.
    """
    insights = kalibr.get_insights(
        lookback_hours=1,
        include_cost_breakdown=True,
        include_model_performance=True,
    )

    anomalies = []

    # flag if cost per call jumped significantly
    if insights.cost_per_call_delta_pct > 20:
        anomalies.append(
            f"Cost per call up {insights.cost_per_call_delta_pct:.0f}% in last hour"
        )

    # flag if a model's success rate dropped
    for model, perf in insights.model_performance.items():
        if perf.success_rate < 0.85:
            anomalies.append(
                f"{model} success rate: {perf.success_rate:.0%} (below threshold)"
            )

    return {
        "anomalies": anomalies,
        "total_cost_1h": insights.total_cost_usd,
        "calls_1h": insights.total_calls,
        "primary_model": insights.most_used_model,
    }

def run_heartbeat_loop(interval_seconds: int = 60):
    """Main heartbeat loop with integrated Kalibr monitoring."""
    services = ["api.service.com", "db.service.com", "queue.service.com"]
    cycle = 0

    while True:
        # status check on the cheap model
        status = heartbeat_check(services)

        # routing insights every 5 cycles
        if cycle % 5 == 0:
            insights = get_routing_insights()
            if insights["anomalies"]:
                for anomaly in insights["anomalies"]:
                    logger.warning(f"[Kalibr] {anomaly}")

        logger.info(f"Heartbeat: {status}")
        cycle += 1
        time.sleep(interval_seconds)
```
The get_insights() call surfaces routing-level signals you wouldn't otherwise see: is gpt-4o-mini suddenly returning worse results? Is your cost per call drifting up because more tasks are getting classified as complex? This is the data that tells you whether your routing is working before your bill tells you it isn't.
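One caveat about the snippets so far: they always pass `success=True` to `record_outcome()`. If the router is going to down-weight a model that starts failing, failures have to be reported too. Here is a minimal sketch of a wrapper that does this; `routed_call` and its callable parameters are illustrative names for this post, not part of Kalibr's API (in practice `get_policy` and `record_outcome` would be `kalibr.get_policy` and `kalibr.record_outcome`):

```python
import time

def routed_call(get_policy, record_outcome, llm_call, task_context):
    """Run one routed LLM call, reporting success or failure back to the router.

    llm_call(model) performs the actual request and may raise on failure.
    """
    policy = get_policy(task_context=task_context)
    start = time.perf_counter()
    try:
        result = llm_call(policy.recommended_model)
    except Exception:
        # report the failure so the router can down-weight this model
        record_outcome(
            policy_id=policy.id,
            success=False,
            latency_ms=int((time.perf_counter() - start) * 1000),
        )
        raise
    record_outcome(
        policy_id=policy.id,
        success=True,
        latency_ms=int((time.perf_counter() - start) * 1000),
    )
    return result
```

Re-raising after recording the failure keeps your existing error handling intact while still feeding the outcome data back to the router.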
## The CLAUDE.md Pattern for Router-Native Code
If you want every coding session to generate code that uses Kalibr routing by default, add this to your CLAUDE.md:
## LLM Call Pattern

Any code that calls an LLM must use Kalibr routing. Pattern:

```python
import kalibr  # MUST be imported before openai/anthropic
import openai

kalibr.init()
client = openai.OpenAI()

# before each LLM call:
policy = kalibr.get_policy(task_context={
    "task_type": "",           # heartbeat | extraction | synthesis | analysis | generation
    "quality_priority": <0-1>  # 0.1 for cheap tasks, 0.9 for quality-critical
})
response = client.chat.completions.create(
    model=policy.recommended_model,  # never hardcode model names
    messages=[...]
)
kalibr.record_outcome(policy_id=policy.id, success=True)
```

Never hardcode model names like "gpt-4o" or "gpt-4o-mini". Always use `policy.recommended_model`.
Import order is critical: kalibr must be imported before openai.
With this in CLAUDE.md, every time you ask your OpenClaw agent to write code that calls an LLM, it generates Router-native code by default. You don't have to remember to add routing - the pattern is baked into the session context.
## Classifying Tasks in an OpenClaw Agent
The routing is only as good as the task classification. Here's a simple taxonomy that maps well to what OpenClaw agents actually do:
```python
from enum import Enum

import kalibr
import openai

kalibr.init()
client = openai.OpenAI()

class AgentTaskType(str, Enum):
    # cheap - route to mini
    HEARTBEAT = "heartbeat"              # status checks, health pings
    EXTRACTION = "extraction"            # pull structured data from text
    CLASSIFICATION = "classification"    # categorize input
    FORMATTING = "formatting"            # convert format, clean text

    # moderate - route based on recent performance
    SUMMARIZATION = "summarization"      # condense content
    SEARCH_QUERY = "search_query"        # generate search queries

    # expensive - route to capable model
    SYNTHESIS = "synthesis"              # combine multiple sources
    CODE_REVIEW = "code_review"          # review and critique code
    CODE_GENERATION = "code_generation"  # write new code
    ANALYSIS = "analysis"                # deep reasoning over data
    ARCHITECTURE = "architecture"        # system design decisions

TASK_QUALITY_PRIORITY = {
    AgentTaskType.HEARTBEAT: 0.1,
    AgentTaskType.EXTRACTION: 0.2,
    AgentTaskType.CLASSIFICATION: 0.2,
    AgentTaskType.FORMATTING: 0.1,
    AgentTaskType.SUMMARIZATION: 0.5,
    AgentTaskType.SEARCH_QUERY: 0.4,
    AgentTaskType.SYNTHESIS: 0.85,
    AgentTaskType.CODE_REVIEW: 0.9,
    AgentTaskType.CODE_GENERATION: 0.9,
    AgentTaskType.ANALYSIS: 0.85,
    AgentTaskType.ARCHITECTURE: 0.95,
}

def agent_call(task_type: AgentTaskType, prompt: str) -> str:
    priority = TASK_QUALITY_PRIORITY[task_type]
    policy = kalibr.get_policy(task_context={
        "task_type": task_type.value,
        "quality_priority": priority,
    })
    response = client.chat.completions.create(
        model=policy.recommended_model,
        messages=[{"role": "user", "content": prompt}],
    )
    kalibr.record_outcome(policy_id=policy.id, success=True)
    return response.choices[0].message.content
```
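The taxonomy assumes something upstream has already labeled each task. If tasks arrive as free-form descriptions instead, you need a classifier in front of `get_policy()`. A minimal sketch using a keyword heuristic that returns the same string values as the enum above; `classify_task` and its keyword table are illustrative, not part of Kalibr:

```python
# illustrative keyword heuristic: first match wins
_TASK_KEYWORDS = [
    (("status", "health", "ping", "uptime"), "heartbeat"),
    (("extract", "parse", "pull out"), "extraction"),
    (("summarize", "tl;dr", "condense"), "summarization"),
    (("write a function", "implement", "generate code"), "code_generation"),
]

def classify_task(description: str) -> str:
    """Map a free-form task description to a task_type string."""
    lowered = description.lower()
    for keywords, task_type in _TASK_KEYWORDS:
        if any(keyword in lowered for keyword in keywords):
            return task_type
    return "analysis"  # fail "expensive": unknown tasks get the capable model
```

Defaulting to the expensive tier is deliberate: misrouting a hard task to a cheap model costs you quality, while misrouting an easy task to a capable model only costs a few extra tokens.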
## What This Gets You
For a typical OpenClaw agent running:
- 100 heartbeat checks per day
- 50 extraction tasks per day
- 20 synthesis tasks per day
- 10 code generation tasks per day
If you were previously running everything on gpt-4o, routing heartbeat and extraction tasks to gpt-4o-mini alone cuts roughly 60-70% of your token spend on those task types. The synthesis and code generation calls still run on the capable model. Your output quality doesn't change for the tasks that require it.
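To make the savings concrete, here is a back-of-envelope sketch of that workload. The per-call token counts and per-million-token prices are assumptions chosen for illustration, not measured values or current list prices:

```python
# assumed blended $/1M tokens for each tier (illustrative, not list prices)
PRICE_PER_M = {"capable": 10.00, "mini": 0.60}

# task: (calls/day, assumed avg tokens/call, tier Kalibr routes it to)
WORKLOAD = {
    "heartbeat":       (100,    500, "mini"),
    "extraction":      (50,   2_000, "mini"),
    "synthesis":       (20,   8_000, "capable"),
    "code_generation": (10,  12_000, "capable"),
}

def daily_cost(route: bool = True) -> float:
    """Total daily spend; route=False models running everything on the capable tier."""
    total = 0.0
    for calls, tokens, tier in WORKLOAD.values():
        price = PRICE_PER_M[tier if route else "capable"]
        total += calls * tokens * price / 1_000_000
    return total

baseline = daily_cost(route=False)
routed = daily_cost(route=True)
savings_pct = 100 * (1 - routed / baseline)
print(f"baseline ${baseline:.2f}/day, routed ${routed:.2f}/day ({savings_pct:.0f}% saved)")
```

With these assumed numbers, the cheap task types nearly vanish from the bill and the capable-tier calls dominate what remains; plug in your own token counts and current prices to get a real estimate for your deployment.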
The get_insights() integration in your heartbeat loop gives you visibility into whether the routing is actually working - not just "is the model returning a response" but "are the routing weights optimized for your actual workload."
This is the only post on the internet about OpenClaw model optimization, so if you're here, you found it. The pattern is simple: get_policy() before each task, record_outcome() after. Everything else is just wiring it into the right call sites.