DEV Community: Sathish Chelliah

Giving Your Digital Employee a Company Credit Card (With Limits)

Sathish Chelliah — Sun, 31 May 2026 13:36:12 +0000

The engineering behind AI spending limits.

The Core Problem

Here's how the $30K bill plays out:

Day 1:  Agent runs 50 tasks, costs $12. Looks great.
Day 5:  Agent discovers premium model. Uses it for everything. $80/day.
Day 10: Agent runs 200 tasks/day. $400/day.
Day 20: Agent enters a loop. $2,000/day.
Day 30: You check your bill. $30,000.

The Budget Engine: Lazy Auto-Reset

Instead of a midnight cron job (which creates a thundering herd), agent-gov uses lazy evaluation:

from datetime import date

async def check_and_reset_budget(agent: dict) -> dict:
    today = date.today().isoformat()
    if agent["last_reset"] == today:
        return agent  # Already reset today
    if agent["paused"]:
        return agent  # Paused agents are left untouched
    return await reset_daily_budget(agent["key_hash"])

Why lazy? An agent that makes no calls doesn't need a reset. The first call of the day triggers a single UPDATE. The thundering herd becomes a gentle trickle.
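
The reset itself is not shown in this post. The sketch below is one way it could look, using the synchronous standard-library sqlite3 module instead of the project's async layer. The UPDATE statement, the extra conn and today parameters, and the trimmed-down agents table are assumptions made for illustration.

```python
import sqlite3
from datetime import date
from typing import Optional

def reset_daily_budget(conn: sqlite3.Connection, key_hash: str, today: str) -> dict:
    # Assumed behavior: zero the daily counters and stamp today's date.
    conn.execute(
        "UPDATE agents SET spent_today = 0.0, calls_today = 0, last_reset = ? "
        "WHERE key_hash = ?",
        (today, key_hash),
    )
    conn.commit()
    row = conn.execute(
        "SELECT * FROM agents WHERE key_hash = ?", (key_hash,)
    ).fetchone()
    return dict(row)

def check_and_reset_budget(conn, agent: dict, today: Optional[str] = None) -> dict:
    today = today or date.today().isoformat()
    if agent["last_reset"] == today or agent["paused"]:
        return agent  # nothing to write
    return reset_daily_budget(conn, agent["key_hash"], today)

# Demo against an in-memory database with a simplified agents table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE agents (key_hash TEXT PRIMARY KEY, spent_today REAL, "
    "calls_today INTEGER, paused INTEGER, last_reset TEXT)"
)
conn.execute("INSERT INTO agents VALUES ('abc', 12.5, 50, 0, '2026-05-30')")
stale = dict(conn.execute("SELECT * FROM agents").fetchone())
fresh = check_and_reset_budget(conn, stale, today="2026-05-31")
```

A second call on the same day returns the agent unchanged, so the write happens at most once per agent per day.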

The Real Cost Problem

Agent says: "Estimated cost: $0.50." Reality: $15.00/call.

agent-gov's tool registry knows the real cost:

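from datetime import datetime, timezone

async def register_tool(name, cost_per_call, description, workspace_id="default"):
    now = datetime.now(timezone.utc).isoformat()  # registered_at timestamp (UTC assumed)
    # Insert the tool, or update its price if the name is already registered
    await db.execute("""
        INSERT INTO tools (name, cost_per_call, description, registered_at, workspace_id)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            cost_per_call = excluded.cost_per_call
    """, (name, cost_per_call, description, now, workspace_id))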
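
To see what the upsert does on a repeat registration, here is a small synchronous sketch with the standard-library sqlite3 module. The three-column tools table is a simplified assumption, not the project's schema. ON CONFLICT needs a unique constraint on name, and SQLite 3.24 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified table for illustration; `name` must be unique for ON CONFLICT(name).
conn.execute(
    "CREATE TABLE tools (name TEXT PRIMARY KEY, "
    "cost_per_call REAL NOT NULL, description TEXT)"
)

def register_tool(name, cost_per_call, description=""):
    conn.execute(
        """
        INSERT INTO tools (name, cost_per_call, description)
        VALUES (?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            cost_per_call = excluded.cost_per_call
        """,
        (name, cost_per_call, description),
    )

register_tool("premium-model", 12.50, "first registration")
register_tool("premium-model", 15.00, "second registration")
rows = conn.execute("SELECT name, cost_per_call, description FROM tools").fetchall()
```

Only cost_per_call is overwritten on conflict, so the description from the first registration survives the second call.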

Budget Pools

For teams running multiple agents:

agent-gov pool create production-agents --budget 1000
agent-gov pool member add web-agent --pool production-agents --max-per-agent 200

Now web-agent is capped at $200/month even though the pool has $1,000.
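
The post does not show how the pool check is implemented. A minimal sketch of the idea, assuming a call must fit under both the member's cap and the pool's remaining budget (the function name and parameters are hypothetical):

```python
def pool_allows(pool_spent, pool_budget, agent_spent, agent_cap, cost):
    """Approve only if the call fits under the member cap AND the pool budget."""
    within_agent_cap = agent_spent + cost <= agent_cap
    within_pool = pool_spent + cost <= pool_budget
    return within_agent_cap and within_pool

# web-agent has spent 199 of its 200 cap; the pool has spent 300 of 1000.
ok = pool_allows(pool_spent=300, pool_budget=1000, agent_spent=199, agent_cap=200, cost=1)
# One more dollar would push web-agent past its own cap.
over_cap = pool_allows(pool_spent=301, pool_budget=1000, agent_spent=200, agent_cap=200, cost=1)
# A different member is under its cap, but the pool itself is exhausted.
pool_empty = pool_allows(pool_spent=1000, pool_budget=1000, agent_spent=10, agent_cap=200, cost=1)
```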

The Data Model

Two SQLite tables, two queries per call, sub-millisecond overhead:

CREATE TABLE agents (
    key_hash TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    daily_budget REAL NOT NULL,
    spent_today REAL NOT NULL DEFAULT 0.0,
    calls_today INTEGER NOT NULL DEFAULT 0,
    paused INTEGER NOT NULL DEFAULT 0,
    created_at TEXT NOT NULL,
    last_reset TEXT NOT NULL
);

CREATE TABLE cost_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_hash TEXT NOT NULL,
    agent_name TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    tool_name TEXT NOT NULL,
    cost REAL NOT NULL,
    FOREIGN KEY (agent_hash) REFERENCES agents(key_hash)
);
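
With every call logged to cost_events, per-tool attribution is a single GROUP BY. The sketch below uses the standard-library sqlite3 module and made-up sample rows. The foreign key is omitted so the example stands alone, and spend_by_tool is a hypothetical helper, not part of agent-gov.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cost_events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        agent_hash TEXT NOT NULL,
        agent_name TEXT NOT NULL,
        timestamp TEXT NOT NULL,
        tool_name TEXT NOT NULL,
        cost REAL NOT NULL
    )
    """
)
# Illustrative events only.
events = [
    ("abc", "web-agent", "2026-05-31T09:00:00", "search-api", 0.25),
    ("abc", "web-agent", "2026-05-31T09:01:00", "premium-model", 2.0),
    ("abc", "web-agent", "2026-05-31T09:02:00", "premium-model", 0.5),
]
conn.executemany(
    "INSERT INTO cost_events (agent_hash, agent_name, timestamp, tool_name, cost) "
    "VALUES (?, ?, ?, ?, ?)",
    events,
)

def spend_by_tool(conn, agent_hash):
    # Most expensive tool first: (tool_name, call_count, total_cost).
    return conn.execute(
        "SELECT tool_name, COUNT(*), SUM(cost) FROM cost_events "
        "WHERE agent_hash = ? GROUP BY tool_name ORDER BY SUM(cost) DESC",
        (agent_hash,),
    ).fetchall()

report = spend_by_tool(conn, "abc")
```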

Part 3 of the "Taming Your AI" series. agent-gov is open-source and MIT-licensed, with 45 tests and zero database setup.

Your AI Assistant Just Bought a $30,000 Cloud Subscription

Sathish Chelliah — Sun, 31 May 2026 13:36:11 +0000

A postmortem of the $30K Claude bill incident.

The Story

In May 2026, a story made the rounds: "AWS user gets $30K Claude bill after cost alert misses it." Two weeks later, another company reported a $38,000 AWS Bedrock bill caused by a prompt caching miss.

A single prompt cache miss. $38,000. Not a billion-dollar enterprise. A regular business running AI agents.

How Runaway Costs Actually Happen

When you tell an AI agent to "research competitors and draft a report," here's the execution graph:

1. Search API        -> $0.03
2. Web scrape        -> $0.01
3. GPT-4 summary    -> $0.35
4. Agent decides: "not polished enough"
5. GPT-4 premium    -> $2.50
6. Image gen API    -> $1.00
7. Regenerate x 3   -> $7.50
8. Total            -> $11.39 for one report

An agent doesn't know the difference between a $0.01 action and a $10 action.

The Architecture of Prevention

A library can be monkey-patched. A proxy is a network boundary agents must cross.

agent-gov is a FastAPI reverse proxy:

# Your config changes from:
openai.base_url = "https://api.openai.com/v1"
# To:
openai.base_url = "http://localhost:8080/v1"

The proxy runs a five-stage decision tree:

@app.post("/proxy/call")
async def proxy_tool_call(call: ToolCall):
    key_hash = db.hash_key(call.agent_key)
    agent = await db.get_agent(key_hash)

    # Stage 1: Auth - known agent?
    if agent is None:
        raise HTTPException(status_code=401)

    # Stage 2: Paused?
    if agent["paused"]:
        raise HTTPException(status_code=429)

    # Stage 3: Lazy budget reset
    agent = await db.check_and_reset_budget(agent)

    # Stage 4: Real cost lookup (not agent's estimate)
    registered_tool = await db.get_tool(call.tool_name)
    actual_cost = registered_tool["cost_per_call"] if registered_tool else call.estimated_cost

    # Stage 5: Budget check
    if agent["spent_today"] + actual_cost > agent["daily_budget"]:
        await db.pause_agent(key_hash)
        raise HTTPException(status_code=429, detail="Budget exceeded")

    # Approved
    updated = await db.update_agent_spend(key_hash, actual_cost)  # returns the updated agent row
    await db.log_cost_event(key_hash, agent["name"], call.tool_name, actual_cost)
    return {"status": "approved", "spent_today": updated["spent_today"]}
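
The same flow can be exercised without FastAPI or a database. The sketch below is a simplified, synchronous stand-in. The Agent dataclass, the decide function, and the dict-based registry are assumptions for illustration, and the lazy reset stage is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    name: str
    daily_budget: float
    spent_today: float = 0.0
    paused: bool = False

def decide(agent: Optional[Agent], tool_name: str, estimated_cost: float, registry: dict):
    """Return (http_status, reason) for one proposed tool call."""
    if agent is None:
        return 401, "unknown agent"
    if agent.paused:
        return 429, "paused"
    # Registered price wins; the agent's estimate is only a fallback.
    cost = registry.get(tool_name, estimated_cost)
    if agent.spent_today + cost > agent.daily_budget:
        agent.paused = True  # auto-pause on the first rejected call
        return 429, "budget exceeded"
    agent.spent_today += cost
    return 200, "approved"

registry = {"premium-model": 10.0}
bot = Agent(name="my-bot", daily_budget=25.0)
# The agent claims each call costs $0.01; the registry says $10.
statuses = [decide(bot, "premium-model", 0.01, registry)[0] for _ in range(4)]
```

Two calls are approved ($20 spent), the third would reach $30 and triggers the pause, and the fourth is rejected because the agent is already paused.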

The Anti-Cheat: Tool Registry

If you trust the agent's estimated cost, an agent can claim GPT-4 costs $0.01 when it's $12.50.

agent-gov uses a tool registry:

registered_tool = await db.get_tool(call.tool_name)
actual_cost = registered_tool["cost_per_call"] if registered_tool else call.estimated_cost
# cost_source: "registry" or "client_estimate"

A test proves agents can't lie:

async def test_proxy_uses_registered_cost():
    # Tool registered at Rs 500/call
    # Agent with Rs 100 budget claims Rs 1
    # Result: 429 - Blocked!
    ...
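
The body of that test is not shown here. A self-contained sketch of the property it checks might look like the following, where resolve_cost and would_block are hypothetical helpers that mirror the lookup above, and the amounts match the figures in the test comments:

```python
def resolve_cost(tool_name, estimated_cost, registry):
    """Return (cost, cost_source), preferring the registered price."""
    if tool_name in registry:
        return registry[tool_name], "registry"
    return estimated_cost, "client_estimate"

def would_block(spent_today, daily_budget, cost):
    # Same comparison as the proxy's budget check.
    return spent_today + cost > daily_budget

registry = {"expensive-tool": 500.0}

# Agent with a budget of 100 claims the call costs 1.
cost, source = resolve_cost("expensive-tool", 1.0, registry)
blocked = would_block(spent_today=0.0, daily_budget=100.0, cost=cost)

# An unregistered tool falls back to whatever the agent reports.
fallback_cost, fallback_source = resolve_cost("unknown-tool", 1.0, registry)
```

The fallback branch is worth noting: a tool that was never registered is budgeted at the agent's own estimate, so the guarantee only covers registered tools.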

Why Proxy Wins Over Library

Network boundary - agents must cross it
Can't be bypassed by rogue import or version bump
Language-agnostic - works with any framework
Externally monitorable

Quick Start

pip install agent-gov-saas
agent-gov start
agent-gov config set budget 25.00 --agent my-bot

Auto-paused at $25. No $30K surprise.

Part 1 of the "Taming Your AI" series. agent-gov is open-source and MIT-licensed.

5 Real-World AI Agent Cost Disasters (And How agent-gov Prevents Them)

Sathish Chelliah — Sun, 31 May 2026 13:26:26 +0000

AI agents are incredible. They write code, answer support tickets, scrape the web, process PDFs, and run entire workflows without you lifting a finger. They also — left to their own devices — have a spectacular talent for burning through money.

If you've deployed production agents, you've felt this. The Slack ping at 3 AM. The cloud cost report where one agent outspent your entire dev team. The creeping dread when you realize your agent has been calling GPT-4 in a tight loop for six hours.

Below are five real disasters — names changed — and exactly how agent-gov would have prevented each one.

Disaster #1: The Recursive Ouroboros

A content-aggregation agent was supposed to crawl RSS feeds, summarize articles, and post daily digests. One mistake: the agent's output channel was also one of its inputs.

The agent posted a summary to Slack. Slack's webhook fired. The agent saw new content and summarized the summary. Three hours later: 14,000 API calls to GPT-4 Turbo.

The Cost: ~$560 in API costs. The entire monthly budget was $200.

How agent-gov Prevents It: Auto-Pause. Set a $25 per-agent threshold. At the average rate above ($560 over three hours, about $3 per minute), the agent hits it in under ten minutes and stops.

agent-gov policies create content-digest --max-cost 25 --action pause

Disaster #2: The Over-Engineered Bug

A real-estate research agent looked up property comparables. A bug caused it to repeat the same search 47 times. The cache key was wrong: a trailing space made "90210 schools" and "90210 schools " different lookups.

12 addresses × 47 loops × 3 API calls = 1,692 calls for 12 houses.
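
The trailing-space bug is easy to reproduce. The sketch below shows one way to normalize a cache key so both spellings land on the same entry. cache_key, cached_search, and the counter are hypothetical names, and the "paid call" is simulated.

```python
def cache_key(query: str) -> str:
    # Trim both ends and collapse runs of whitespace into single spaces.
    return " ".join(query.split())

stats = {"paid_calls": 0}
_cache = {}

def cached_search(query: str) -> str:
    key = cache_key(query)
    if key not in _cache:
        stats["paid_calls"] += 1  # stands in for a billable API call
        _cache[key] = f"results for {key}"
    return _cache[key]

first = cached_search("90210 schools")
second = cached_search("90210 schools ")  # trailing space, same lookup
```

With the normalized key, the second lookup is served from the cache and only one billable call is made.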

The Cost: ~$220 in wasted API calls.

How agent-gov Prevents It: Per-tool cost tracking registers the true cost of every tool. The fast-loop bug accumulates cost at unrealistic speed — flagged within minutes.

agent-gov tool-cost set premium-real-estate-api --per-call 0.05
agent-gov tool-cost set gpt4-analysis --per-call 0.035

Disaster #3: The Budget Hog

A consultancy set up 5 agents sharing a $1,000 monthly pool. One consultant kicked off a massive research job. The web agent ran for two days.

The other 4 agents silently starved. Nobody noticed until a client complained.

The Cost: ~$4,000 in lost revenue from missed leads. The web agent consumed $780 of $1,000.

How agent-gov Prevents It: Per-agent caps inside shared pools.

agent-gov pool create production-agents --budget 1000
agent-gov pool member add web-agent --pool production-agents --max-per-agent 200

Disaster #4: The $0.01 Budget That Cost $100

A developer set a $0.01 budget for a test agent. The agent triggered a serverless function charged to a different billing account with no cap. 500,000 product updates ran overnight.

The Cost: $112.43 in uncapped charges. The agent's tracker showed $0.0062.

How agent-gov Prevents It: Register the function as a tool with its true cost. 500,000 × $0.0002 = $100, far beyond the $0.01 budget. At $0.0002 per call the budget runs out after at most 50 calls, so the agent is paused almost immediately instead of running overnight.
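
How soon the pause fires depends on the registered per-call price. Here is a minimal sketch of that arithmetic, assuming the proxy's approval rule (a call is approved while spent + cost does not exceed the budget) and working in whole micro-dollars to avoid floating-point rounding. calls_until_pause is a hypothetical helper.

```python
def calls_until_pause(budget_micro: int, cost_micro: int, spent_micro: int = 0) -> int:
    """How many calls are approved before the next one would exceed the budget."""
    remaining = budget_micro - spent_micro
    return max(remaining // cost_micro, 0)

BUDGET = 10_000      # $0.01 in micro-dollars
FUNCTION_COST = 200  # $0.0002 per call in micro-dollars
LLM_SPEND = 6_200    # $0.0062 already spent on LLM calls

fresh_budget = calls_until_pause(BUDGET, FUNCTION_COST)
after_llm_spend = calls_until_pause(BUDGET, FUNCTION_COST, LLM_SPEND)
```

Either way the agent is stopped after a few dozen calls, not 500,000.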

Cost attribution makes debugging 3 seconds instead of 3 hours:

agent-gov runs inspect run-abc123
- LLM calls: $0.0062
- Tool calls: cloudflare-function: 500,000 x $0.0002 = $100.00
- Total: $100.01
- Budget: $0.01 -> PAUSED

Disaster #5: The Multi-Tenant Billing Fiasco

A B2B SaaS company offered AI agents as a feature. Each customer had its own agents and was billed $500/month against an expected ~$200 in compute.

Customer A deployed 14 agents across 40 campaigns. Three months later: $4,200 compute vs $1,500 billed. Customer A wiped out the quarter's margin.

The Cost: $2,700 lost on one customer. No visibility until quarterly review.

How agent-gov Prevents It: Workspace isolation with per-workspace budgets.

agent-gov workspace create customer-a --budget 200

When Customer A pushes past $200, agent-gov alerts or pauses. The SaaS company can offer tiered plans. Overconsumption becomes an upgrade opportunity.

The Common Thread

Every disaster shares the same root cause: agents had no cost guardrails.

AI agents are fundamentally different from traditional software. A traditional API handles one request. An agent can branch, loop, call external APIs — the execution path is a tree, not a line.

You can't budget for what you can't see. And you can't control what you haven't measured.

Agent-gov gives you:

Visibility — Real-time cost tracking per agent, per tool, per workspace
Control — Hard budget caps with auto-pause and alerting
Isolation — Per-agent budgets inside shared pools, per-workspace billing

The agents are coming — they're already here. The question isn't whether you'll deploy them. It's whether you'll know what they cost before the bill arrives.

Agent-gov is open source and available on GitHub. Set up your first cost policy in under a minute.

Inside agent-gov: Architecture of an Agent Cost Governance Platform

Sathish Chelliah — Sun, 31 May 2026 13:21:01 +0000

AI agents orchestrate complex workflows — calling LLMs, scraping pages, querying databases, sending emails. Each call costs real money. Without a governance layer, a single buggy loop can burn through your budget before anyone notices.

agent-gov is an open-source reverse proxy that intercepts every tool call your agents make, enforces budgets in real time, and auto-pauses out-of-control agents. It is built as a FastAPI service with SQLite persistence, and its 45 tests run in 0.3 seconds.

This post walks through the architecture: the proxy pattern, the four-stage decision tree, cost tracking with a tool registry, multi-tenancy via workspaces, and the lazy auto-reset pattern.

The Proxy Pattern

Every AI agent tool call passes through agent-gov before reaching the actual tool. The agent sends a POST /proxy/call with its API key, tool name, and estimated cost. agent-gov validates, budgets, and logs — then returns a 200 to approve or a 429 to reject.

class ToolCall(BaseModel):
    agent_key: str = Field(...)
    tool_name: str = Field(...)
    estimated_cost: float = Field(0.0, ge=0)

The proxy doesn't execute the tool itself — it guards access. The agent only proceeds if the proxy returns 200. This is the gatekeeper pattern: a lightweight decision layer between the agent and the outside world.

Agent -> POST /proxy/call -> agent-gov -> 200/429 -> Agent decides
                                                      |
                                                 Calls actual tool
                                                      |
                                                      v
                                               OpenAI / Browser / API
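
On the agent side, the gatekeeper pattern comes down to "ask first, act only on 200". The sketch below keeps the HTTP request abstract so it runs without a server. guarded_call, fake_proxy, and the request values are assumptions for illustration, and a real client would POST the request to /proxy/call.

```python
from typing import Callable, Optional

def guarded_call(
    ask_proxy: Callable[[dict], int],
    run_tool: Callable[[], str],
    request: dict,
) -> Optional[str]:
    """Run the tool only if the proxy approves the request with a 200."""
    status = ask_proxy(request)
    if status != 200:
        return None  # rejected (401/429): the tool is never invoked
    return run_tool()

def fake_proxy(request: dict) -> int:
    # Stand-in for the HTTP round trip: approve one tool, reject the rest.
    return 200 if request["tool_name"] == "search-api" else 429

approved = guarded_call(
    fake_proxy,
    lambda: "search results",
    {"agent_key": "example-key", "tool_name": "search-api", "estimated_cost": 0.03},
)
rejected = guarded_call(
    fake_proxy,
    lambda: "expensive output",
    {"agent_key": "example-key", "tool_name": "premium-model", "estimated_cost": 0.01},
)
```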

Why a proxy instead of a library? A library can be monkey-patched, removed, or forgotten. A proxy is a network boundary that agents must cross — it can't be bypassed.

The Decision Tree: Auth -> Check -> Budget -> Log

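Every proxy call runs through the same pipeline. The four named stages (auth, pause check, budget, log) expand to six steps in code once the lazy reset and the cost lookup are counted: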

@app.post("/proxy/call")
async def proxy_tool_call(call: ToolCall):
    key_hash = db.hash_key(call.agent_key)
    agent = await db.get_agent(key_hash)

    # Step 1: Auth
    if agent is None:
        raise HTTPException(status_code=401, detail="Invalid API key")

    # Step 2: Paused check
    if agent["paused"]:
        raise HTTPException(status_code=429,
            detail=f"Agent '{agent['name']}' is paused.")

    # Step 3: Auto-reset budget if new day
    agent = await db.check_and_reset_budget(agent)

    # Step 4: Look up REAL tool cost
    registered_tool = await db.get_tool(call.tool_name)
    actual_cost = (registered_tool["cost_per_call"]
                   if registered_tool else call.estimated_cost)

    # Step 5: Budget check
    new_total = agent["spent_today"] + actual_cost
    if new_total > agent["daily_budget"]:
        await db.pause_agent(key_hash)
        raise HTTPException(status_code=429,
            detail="Budget exceeded — agent auto-paused.")

    # Step 6: Approved — update spend and log
    updated = await db.update_agent_spend(key_hash, actual_cost)
    await db.log_cost_event(key_hash, agent["name"], call.tool_name, actual_cost)
    return {"status": "approved", ...}

Stage   Check                              Exit
Auth    Does the API key hash match?       401: Invalid key
Pause   Is the agent paused?               429: Agent paused
Reset   New day since last call?           (silent)
Budget  Would this exceed the daily cap?   429 + auto-pause
Log     INSERT cost event                  200: Approved

Cost Tracking: Registry vs. Estimate

The trickiest design decision was cost determination. Trusting the agent's estimated_cost is fragile — agents can under-report.

agent-gov uses a tool registry: an UPSERT-able table of known tools with real per-call costs.

registered_tool = await db.get_tool(call.tool_name)
actual_cost = (registered_tool["cost_per_call"]
               if registered_tool else call.estimated_cost)

If the tool is registered, its true cost is used. The response includes a cost_source field so clients know which path was taken.

The test proves an agent can't lie its way past governance: an agent with a $100 budget claiming a $1 estimate for a tool registered at $500/call gets blocked with 429.

Multi-Tenancy: Workspace Isolation

v0.5 introduced workspaces — isolated tenants with their own agents, tools, and cost events. Each workspace gets a unique ID and API key. Every database row carries a workspace_id FK column.

Schema migration uses PRAGMA table_info to add columns only when missing — SQLite doesn't support IF NOT EXISTS for ALTER TABLE.
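
The migration code is not shown in this post. A minimal sketch of the PRAGMA table_info approach with the standard-library sqlite3 module could look like this, where add_column_if_missing and the column definition are assumptions:

```python
import sqlite3

def add_column_if_missing(conn, table: str, column: str, ddl: str) -> bool:
    """Add `column` to `table` unless it already exists. Returns True if added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    # Identifiers cannot be bound as parameters, so only pass trusted names here.
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (key_hash TEXT PRIMARY KEY, name TEXT NOT NULL)")

first_run = add_column_if_missing(conn, "agents", "workspace_id", "TEXT NOT NULL DEFAULT 'default'")
second_run = add_column_if_missing(conn, "agents", "workspace_id", "TEXT NOT NULL DEFAULT 'default'")
columns = [row[1] for row in conn.execute("PRAGMA table_info(agents)")]
```

Because the check runs before the ALTER TABLE, the migration is safe to repeat on every startup.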

Tests verify workspace isolation: two workspaces, agents in each, neither can see the other's data.
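
An isolation check of that kind can be sketched as follows. It assumes isolation is enforced by filtering every query on workspace_id. The table layout, workspace names, and list_agents helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agents (key_hash TEXT PRIMARY KEY, name TEXT NOT NULL, "
    "workspace_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO agents VALUES (?, ?, ?)",
    [
        ("h1", "web-agent", "customer-a"),
        ("h2", "digest-agent", "customer-a"),
        ("h3", "review-agent", "customer-b"),
    ],
)

def list_agents(conn, workspace_id: str):
    # Every read is scoped to one workspace via a bound parameter.
    rows = conn.execute(
        "SELECT name FROM agents WHERE workspace_id = ? ORDER BY name",
        (workspace_id,),
    ).fetchall()
    return [name for (name,) in rows]

agents_a = list_agents(conn, "customer-a")
agents_b = list_agents(conn, "customer-b")
```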

The Auto-Reset Pattern: Lazy Daily Budgets

Instead of a midnight cron job creating a thundering herd, agent-gov uses lazy evaluation: every proxy call checks if a reset is needed.

from datetime import date

async def check_and_reset_budget(agent: dict) -> dict:
    today = date.today().isoformat()
    if agent["last_reset"] == today:
        return agent  # Already reset today
    if agent["paused"]:
        return agent  # Paused agents are left untouched
    return await reset_daily_budget(agent["key_hash"])

An agent that makes no calls doesn't need a reset. The thundering herd becomes a gentle trickle.

What's Next

The next evolution: per-tool budget caps, webhook-based alerts, and a management API. But the foundation — a simple, testable, async governance proxy — is solid.

agent-gov is open source and MIT licensed. 45 tests. Zero database setup.

Announcing agent-gov: Open-Source AI Agent Cost Governance

Sathish Chelliah — Sun, 31 May 2026 13:20:59 +0000

Stop waking up to surprise $500 bills from your AI agents.

The 3 AM Wake-Up Call

It was 3:47 AM on a Tuesday. My phone buzzed — a Cloudflare bill alert. Then another. Then Stripe. By the time I stumbled to my laptop, three different providers had collectively racked up $487 in just six hours.

What happened? A single AI coding agent had gotten stuck in a loop. It was re-analyzing the same bug, calling the same expensive LLM endpoint over and over, spawning sub-agents that spawned their own sub-agents. Nobody put a governor on it. Nobody thought they needed to.

If you've built anything with AI agents — auto-PR reviewers, customer-support bots, code-gen pipelines, web-research assistants — you've either had this nightmare or you're one sleep cycle away from it. The fundamental problem is simple: agents spend money the same way junior devs write code — enthusiastically, autonomously, and without asking permission.

Most teams solve this with spreadsheets and hope. Some bolt on a cloud budget alert after the first blowup. A few give up on agents entirely.

We wanted a real answer.

Enter agent-gov

Today I'm releasing agent-gov — an open-source, MIT-licensed cost governance platform purpose-built for AI agents. It's a lightweight reverse proxy that sits between your agents and their LLM providers, tracking every cent, enforcing daily budgets, and auto-pausing agents that overspend.

pip install agent-gov-saas
agent-gov start

That's it. Thirty seconds from zero to governance.

How It Works

Agent-gov is transparent to your agents. You point them at a local proxy endpoint instead of directly at the API, and agent-gov handles the rest:

Intercept — every model call routes through agent-gov's FastAPI proxy
Look up real costs — the built-in tool registry knows exact per-token pricing for hundreds of models
Enforce budgets — if an agent exceeds its daily allocation, agent-gov blocks the call
Persist everything — all usage is written to SQLite via aiosqlite

No lock-in. No cloud dependency. No per-seat licensing.

Features in v0.5

Reverse proxy middleware — drop-in for any OpenAI-compatible client
SQLite persistence — full audit trail of every call, token, and dollar
Tool registry with real cost tables — priced by model, provider, and input/output token rates
Daily budgets with auto-reset — configure per-agent, per-workspace, or globally
Auto-pause on over-budget — agents get a structured policy block
Multi-tenant workspaces — isolate costs by team or project
Docker support — docker compose up
45 tests, 0.3 seconds — tight, fast, well-tested codebase

Quickstart

pip install agent-gov-saas
agent-gov start

Set a $5 daily budget:

agent-gov config set budget 5.00 --agent code-review-bot

Why We Built This

The AI agent ecosystem is exploding. But the operational maturity around it is where web apps were in 2009. We're all running agents without guardrails because nobody has built the guardrails yet.

Agent-gov is the circuit breaker for your agent infrastructure — the thing that prevents a single runaway loop from costing you a week of GPU credits.

The project is MIT-licensed because cost governance isn't a moat — it's table stakes.

Get Involved

GitHub: github.com/sschelliah2026-source/agent-gov
Install: pip install agent-gov-saas

Don't learn you need cost governance at 3 AM.

Built with FastAPI, SQLite, and a healthy fear of surprise bills.