Nilofer 🚀

Posted on Jun 4

CostGuard: A Real-Time Circuit Breaker That Stops AI Spend Before It Gets Out of Control

#fastapi #python #machinelearning #opensource

AI API costs can spiral faster than anyone expects. A runaway loop, a misconfigured batch job, or a forgotten test that fires thousands of requests - by the time you see the bill, the damage is done.

CostGuard is a production-ready local proxy that enforces hard spending limits before AI API requests are sent. It sits between your applications and AI providers (OpenAI, Anthropic, and OpenRouter), calculates the cost of every request before it goes out, and blocks the request if any limit would be exceeded. It provides per-session, per-hour, per-day, and per-project circuit breakers, a real-time terminal dashboard, and multi-channel alerts, all running locally with no data leaving your machine.

Features

Hard Circuit Breakers - Per-session, per-hour, per-day, and per-project spending limits.
Real-Time Cost Estimation - Each request is priced from tiktoken token counts before it is sent.
Safe Mode - Require explicit confirmation for expensive requests above a configurable threshold.
Real-Time Dashboard - Terminal-based dashboard with WebSocket updates.
Multi-Channel Alerts - Console, webhook, and file-based alerting.
OpenAI-Compatible API - Drop-in replacement for the OpenAI API endpoint, so existing OpenAI SDK code works after changing only the base URL.
Local SQLite - All data stays on your machine.
Async Architecture - High-performance concurrent request handling.
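The Multi-Channel Alerts item maps naturally onto a list of channel callables. The sketch below fans one alert out to console, file, and webhook channels. The Alert fields, the JSON-lines file layout, and the webhook payload are assumptions for illustration, not CostGuard's actual alert format.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, List


# Hypothetical alert shape for illustration; CostGuard's real payload may differ.
@dataclass
class Alert:
    limit: str        # which tier tripped: "session", "hour", "day", or "project"
    spent_usd: float  # spend already recorded in that tier
    limit_usd: float  # configured budget for that tier
    message: str


Channel = Callable[[Alert], None]


def console_channel(alert: Alert) -> None:
    print(f"[costguard] {alert.limit} limit: {alert.message}")


def file_channel(path: Path) -> Channel:
    def send(alert: Alert) -> None:
        # Append one JSON object per line so the file is easy to tail or parse.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(alert)) + "\n")
    return send


def webhook_channel(url: str, timeout: float = 5.0) -> Channel:
    def send(alert: Alert) -> None:
        # POST the alert as JSON; urlopen raises on HTTP error statuses and timeouts.
        req = urllib.request.Request(
            url,
            data=json.dumps(asdict(alert)).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req, timeout=timeout).close()
    return send


def dispatch(alert: Alert, channels: List[Channel]) -> List[Exception]:
    # Try every channel; a broken webhook must not silence the console or file.
    errors: List[Exception] = []
    for channel in channels:
        try:
            channel(alert)
        except Exception as exc:
            errors.append(exc)
    return errors
```

For example, `dispatch(alert, [console_channel, file_channel(Path("alerts.jsonl"))])` writes to both channels, and failures from individual channels are returned rather than raised.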

Architecture

Client SDKs hit the OpenAI-compatible FastAPI proxy. The cost estimator pre-prices the request, then the circuit breaker evaluates limits in order: session, then hour, then day, then project. Allowed traffic forwards to the provider. Tripped limits return a 429 and fire alerts. Spend and pricing data live in local SQLite, and the terminal dashboard streams over WebSocket.
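The pre-pricing step in this flow is mostly per-token arithmetic. The sketch below assumes prices are quoted in USD per million tokens and, because the reply length is unknown before the call, conservatively charges for the request's full `max_tokens`. The class and function names are hypothetical, the prices in the example are placeholders rather than real provider rates, and token counting falls back to a rough four-characters-per-token heuristic when tiktoken is not installed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelPrice:
    input_per_1m: float   # USD per 1M prompt tokens
    output_per_1m: float  # USD per 1M completion tokens


def count_prompt_tokens(messages: List[Dict[str, str]], model: str) -> int:
    """Count prompt tokens, ignoring the few per-message formatting tokens."""
    text = "".join(m.get("content", "") for m in messages)
    try:
        import tiktoken
    except ImportError:
        # Rough fallback of about 4 characters per token for English text.
        return max(1, len(text) // 4)
    try:
        enc = tiktoken.encoding_for_model(model)
    except KeyError:
        # Models tiktoken does not know (e.g. non-OpenAI ones) get an approximation.
        enc = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() keeps text like "<|endoftext|>" from raising.
    return len(enc.encode(text, disallowed_special=()))


def estimate_cost(prompt_tokens: int, max_output_tokens: int, price: ModelPrice) -> float:
    """Worst-case USD cost: assume the reply uses the whole max_tokens budget."""
    return (
        prompt_tokens * price.input_per_1m
        + max_output_tokens * price.output_per_1m
    ) / 1_000_000
```

With placeholder prices of $2 and $8 per million tokens, a 1,000-token prompt with `max_tokens=500` is estimated at (1000 × 2 + 500 × 8) / 1,000,000 = $0.006.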

Quick Start

Installation

Clone the repository, create and activate a virtual environment, and install:

pip install -e ".[dev]"

Configuration

cp .env.example .env

Add at least one provider API key and any optional budget overrides - session, hour, day, project, or safe-mode thresholds.

Running the Server

costguard server

Or with uvicorn directly:

uvicorn costguard.server:create_app --factory --reload --port 8000

Using the Proxy

Point your OpenAI SDK at http://localhost:8000/v1, keep the provider API key in the client, and send the usual chat-completions request with session and project headers.
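The request the proxy expects can be sketched with the standard library alone. This assumes the session and project headers are named `X-Session-ID` and `X-Project-ID`; the post does not give the real names, so check the repository before relying on them. With the official `openai` Python package, the equivalent is passing `base_url="http://localhost:8000/v1"` to `OpenAI(...)` and sending the same headers through `extra_headers`.

```python
import json
import urllib.request
from typing import Dict, List

COSTGUARD_BASE_URL = "http://localhost:8000/v1"


def build_chat_request(
    api_key: str,
    model: str,
    messages: List[Dict[str, str]],
    session_id: str,
    project_id: str,
    base_url: str = COSTGUARD_BASE_URL,
) -> urllib.request.Request:
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # The provider key stays in the client; the proxy passes it through.
        "Authorization": f"Bearer {api_key}",
        # Hypothetical header names for illustration only.
        "X-Session-ID": session_id,
        "X-Project-ID": project_id,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions", data=body, headers=headers, method="POST"
    )
```

`urllib.request.urlopen(req)` then sends it. If a limit trips, the proxy's 429 response surfaces as `urllib.error.HTTPError` with `code == 429`.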

Real-Time Dashboard

costguard dashboard

Run this in a separate terminal. Set COSTGUARD_SESSION_ID=my-session before launching to scope the dashboard to a specific session.

API Endpoints

OpenAI-Compatible Endpoints

POST /v1/chat/completions - chat completions with cost tracking
GET /v1/models - list available models with pricing

CostGuard-Specific Endpoints

POST /v1/estimate - get cost estimate without making a request
GET /v1/status/{session_id} - get circuit breaker status
POST /v1/safe-mode/confirm - confirm a paused safe mode request
GET /health - health check

WebSocket

WS /v1/dashboard/ws - real-time dashboard updates

Circuit Breaker Behavior

Limits are evaluated in this deterministic order:

Session Limit - most restrictive, resets on new session
Hour Limit - rolling 1-hour window
Day Limit - resets at midnight UTC
Project Limit - least restrictive, tracks all-time project spend

When a request would exceed any limit, it is blocked with a structured error (HTTP 429), an alert fires immediately, the circuit breaker status changes to OPEN, and subsequent requests are blocked until the limit resets.
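The ordered check can be sketched as a single pre-call function. This is a minimal in-memory sketch, assuming spend is stored as timestamped USD amounts with UTC-aware timestamps; the class and function names are hypothetical, and it shows only the ordered check, not the OPEN latch that keeps blocking later requests until a window resets.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class Limits:
    session: float  # all budgets in USD
    hour: float
    day: float
    project: float


@dataclass
class SpendLedger:
    """In-memory stand-in for the SQLite spend store; timestamps must be UTC-aware."""
    session_spend: float = 0.0
    project_spend: float = 0.0
    events: List[Tuple[datetime, float]] = field(default_factory=list)

    def record(self, at: datetime, cost: float) -> None:
        self.session_spend += cost
        self.project_spend += cost
        self.events.append((at, cost))

    def spent_since(self, start: datetime) -> float:
        return sum(cost for ts, cost in self.events if ts >= start)


def first_tripped_limit(
    ledger: SpendLedger, limits: Limits, cost: float, now: datetime
) -> Optional[str]:
    """Return the first tier the request would push over budget, or None."""
    midnight_utc = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    tiers = [  # evaluated in the documented order
        ("session", ledger.session_spend, limits.session),
        ("hour", ledger.spent_since(now - timedelta(hours=1)), limits.hour),  # rolling window
        ("day", ledger.spent_since(midnight_utc), limits.day),  # resets at 00:00 UTC
        ("project", ledger.project_spend, limits.project),  # all-time
    ]
    for name, spent, limit in tiers:
        if spent + cost > limit:
            return name
    return None
```

A caller would run this with the estimated cost before forwarding, return 429 when it yields a tier name, and call `record` only for requests that are allowed through.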

Safe Mode

When a request's estimated cost exceeds COSTGUARD_SAFE_MODE_THRESHOLD, the request is paused and an alert is sent to the configured channels. To let it through, confirm it with POST /v1/safe-mode/confirm; once confirmed, the original request proceeds.
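The pause-and-confirm flow can be sketched as a small gate that parks expensive requests under a one-time ID. This is an illustrative sketch, not CostGuard's implementation: the `SafeModeGate` name, the `confirmation_id` field, and the `notify` callback standing in for the alert channels are all assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Forward = Callable[[Request], Any]


@dataclass
class SafeModeGate:
    threshold_usd: float
    notify: Callable[[str], None] = print  # stand-in for the configured alert channels
    pending: Dict[str, Request] = field(default_factory=dict)

    def submit(self, request: Request, estimated_cost: float, forward: Forward) -> Dict[str, Any]:
        if estimated_cost <= self.threshold_usd:
            return {"status": "forwarded", "result": forward(request)}
        # Too expensive: park the request and hand back an ID to confirm it later.
        confirmation_id = uuid.uuid4().hex
        self.pending[confirmation_id] = request
        self.notify(f"Safe mode paused a ${estimated_cost:.2f} request ({confirmation_id})")
        return {"status": "paused", "confirmation_id": confirmation_id}

    def confirm(self, confirmation_id: str, forward: Forward) -> Dict[str, Any]:
        # pop() makes each confirmation single-use; an unknown ID raises KeyError.
        request = self.pending.pop(confirmation_id)
        return {"status": "forwarded", "result": forward(request)}
```

Keeping the parked request server-side means the confirm call only needs the ID, so a client cannot alter the request between pause and confirmation.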

Configuration Reference

Development

Running Tests

pytest                                        # Full suite
pytest --cov=costguard --cov-report=html      # With coverage
pytest tests/test_circuit_breaker.py          # Focused run

Code Quality

ruff format src tests                                              # Formatting
ruff check src tests                                               # Linting
mypy src/costguard                                                 # Type checking
ruff format --check src tests && ruff check src tests && mypy src/costguard  # Full gate

How I Built This Using NEO

This project was built using NEO. NEO is a fully autonomous AI engineering agent that can write code and build solutions for AI/ML tasks, including AI model evals, prompt optimization, and end-to-end AI pipeline development.

The requirement was a local circuit-breaker proxy for AI spend control - one that estimates request cost before sending it, enforces session, hour, day, and project limits, supports safe mode for expensive requests, and exposes an OpenAI-compatible API so existing SDKs work without changes. NEO built the full implementation: the FastAPI proxy server with OpenAI-compatible endpoints, the tiktoken-based pre-call cost estimator, the circuit breaker with four limit tiers evaluated in deterministic order, the safe mode flow with confirmation endpoint, the multi-channel alert system covering console, webhook, and file, the terminal dashboard streaming over WebSocket, the local SQLite persistence layer, the pricing tables for OpenAI, Anthropic, and OpenRouter, and the full test suite.

How You Can Use and Extend This With NEO

Use it to protect any AI application from runaway costs.
Point your OpenAI SDK at http://localhost:8000/v1. Every request is pre-priced and checked against your configured limits before it leaves your machine. A misconfigured loop or an unexpected spike in usage trips the circuit breaker and fires an alert before the excess requests reach your provider, so they never show up on your bill.

Use safe mode for high-stakes production requests.
Set COSTGUARD_SAFE_MODE_THRESHOLD to the cost above which you want human confirmation. Expensive requests are paused and alerted before proceeding. This is particularly useful for batch jobs or agent workflows where a single request can be unexpectedly large.

Use the estimate endpoint to build cost-aware UIs.
POST /v1/estimate returns the cost of a request without sending it. This lets you show users the expected cost of a query before they submit it or build dashboards that surface real-time spend across sessions and projects.
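A cost-aware UI needs little more than one call to the estimate endpoint. The sketch below assumes /v1/estimate accepts the same body as a chat-completions request and returns JSON with an `estimated_cost` field; both are guesses, so adjust them to the real schema. The `opener` parameter exists only so the HTTP call can be swapped out in tests.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

Opener = Callable[[urllib.request.Request], Any]


def fetch_estimate(
    model: str,
    messages: List[Dict[str, str]],
    base_url: str = "http://localhost:8000/v1",
    opener: Opener = urllib.request.urlopen,
) -> Dict[str, Any]:
    """Ask CostGuard to price a request without forwarding it to the provider."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/estimate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req) as resp:
        return json.loads(resp.read())


def cost_banner(estimate: Dict[str, Any], field: str = "estimated_cost") -> str:
    # "estimated_cost" is a hypothetical response field name.
    return f"This query will cost about ${float(estimate[field]):.4f}"
```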

Extend it with additional model pricing.
The pricing tables cover OpenAI, Anthropic, and OpenRouter. Custom pricing can be added via PricingManager(custom_pricing_file=...). Any model not yet in the built-in tables can be priced by adding it to a JSON file - no code changes required.
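The post does not describe the custom pricing file format, so the following is a hypothetical layout that only illustrates the merge idea: entries from a JSON file add new models or override built-in ones. The key names, the `load_pricing` function, and the placeholder prices are assumptions, not PricingManager's actual schema or data.

```python
import json
from pathlib import Path
from typing import Dict, Optional

Pricing = Dict[str, Dict[str, float]]

# Placeholder built-in table in USD per 1M tokens; not real provider prices.
BUILTIN_PRICING: Pricing = {
    "example-model": {"input_per_1m": 1.0, "output_per_1m": 4.0},
}


def load_pricing(custom_file: Optional[Path] = None) -> Pricing:
    """Start from the built-in table, then let JSON entries add or override models."""
    pricing: Pricing = {name: dict(p) for name, p in BUILTIN_PRICING.items()}
    if custom_file is not None:
        custom = json.loads(custom_file.read_text(encoding="utf-8"))
        for model, entry in custom.items():
            # Index both keys explicitly so a typo raises KeyError instead of pricing at $0.
            pricing[model] = {
                "input_per_1m": float(entry["input_per_1m"]),
                "output_per_1m": float(entry["output_per_1m"]),
            }
    return pricing
```

Copying the built-in entries before merging keeps one custom file from leaking into another loader's table.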

Final Notes

AI API costs are easy to lose track of and expensive to discover late. CostGuard enforces limits before requests go out, not after the bill arrives: pre-call cost estimation, four-tier circuit breaking, safe mode for expensive requests, and a real-time dashboard, all running locally with no data leaving your machine.

The code is at https://github.com/dakshjain-1616/cost-Guard
You can also build with NEO in your IDE using the VS Code extension or Cursor.
You can use NEO MCP with Claude Code: https://heyneo.com/claude-code

DEV Community