
Ezekiel Adetoro

Taking Control of AI Model Costs with Aegis-Monitor

As AI models become increasingly central to modern applications, organizations and individuals face a critical challenge: how do you maintain innovation velocity while keeping costs under control? That's where aegis-monitor comes in—a powerful solution for real-time cost governance and model comparison.

In this post, we'll explore a FastAPI demo application that showcases how aegis-monitor helps teams track AI model costs, enforce budgets, and make data-driven decisions when comparing models.

The AI Cost Challenge

Let's be honest: AI model costs can spiral out of control quickly. Whether you're using GPT-4, Claude, or open-source alternatives, every API call adds up. Consider these common scenarios:

  • A developer experiments with different prompts, unknowingly racking up hundreds of dollars in costs
  • A product team deploys a feature using an expensive model when a cheaper alternative would suffice
  • FinOps teams struggle to attribute costs across different teams and projects
  • No one notices until the monthly bill arrives—too late to course-correct

Sound familiar? You're not alone. This is exactly why we built this demo to showcase aegis-monitor's capabilities.

Introducing the Aegis-Monitor Demo

Our FastAPI demo application demonstrates three core capabilities that every AI-powered application needs:

1. Real-Time Cost Tracking
Gone are the days of waiting for your cloud bill to understand AI spending. With aegis-monitor's CostCalculator and PricingRegistry, every inference request is tracked in real time:

cost_calculator = request.app.state.aegis_cost_calculator
cost = cost_calculator.calculate_request_cost(
    model=req.model,
    input_tokens=input_tokens,
    output_tokens=output_tokens
)

The system automatically:

  • Calculates token counts for both input and output
  • Applies accurate, model-specific pricing
  • Aggregates costs across requests, teams, and time periods
  • Stores detailed records for analysis and auditing

Why this matters: You gain immediate visibility into exactly what each API call costs, enabling proactive cost management instead of reactive damage control.
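The aggregation step above can be sketched as a small in-memory ledger. Note that the `CostLedger` class and its method names are hypothetical illustrations of the idea, not part of aegis-monitor's API:

```python
from collections import defaultdict

# Hypothetical in-memory ledger showing how per-request costs can be rolled
# up by team; aegis-monitor's own storage layer is not shown here.
class CostLedger:
    def __init__(self) -> None:
        self.records: list[dict] = []

    def record(self, team: str, model: str, cost: float) -> None:
        # One row per inference request.
        self.records.append({"team": team, "model": model, "cost": cost})

    def total_by_team(self) -> dict[str, float]:
        # Aggregate spend across all recorded requests, keyed by team.
        totals: dict[str, float] = defaultdict(float)
        for row in self.records:
            totals[row["team"]] += row["cost"]
        return dict(totals)

ledger = CostLedger()
ledger.record("marketing", "gpt-4", 0.125)
ledger.record("engineering", "gpt-4", 0.5)
ledger.record("marketing", "claude-3-haiku", 0.25)
print(ledger.total_by_team())  # {'marketing': 0.375, 'engineering': 0.5}
```

In a real deployment the same rollup would run as a SQL aggregate over stored inference records rather than in memory.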

2. Budget Enforcement That Actually Works

Visibility is great, but what about prevention? The demo implements budget enforcement at the inference level:

default_budget = db.query(models.Budget).filter(models.Budget.name == "default").first()
if default_budget:
    if default_budget.spent + cost > default_budget.limit:
        raise HTTPException(status_code=402, detail="Budget limit reached. Inference blocked.")

You can configure budgets:

  • Per team: Marketing gets $500/month, Engineering gets $2000/month
  • Per model: Limit experimental GPT-4 usage to $100/day
  • Global limits: Set organization-wide caps
  • Soft warnings: Get alerts at 75% and 90% of budget
  • Hard blocks: Automatically prevent inference when limits are reached

Why this matters: Instead of discovering overspending after the fact, you prevent it from happening in the first place. No more surprise bills.
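The soft-warning and hard-block tiers can be combined into one check, as in this minimal sketch (the threshold values and the `check_budget` helper are illustrative, not aegis-monitor API):

```python
# Illustrative guardrail combining the soft-warning and hard-block tiers
# described above; thresholds match the 75%/90% example in the text.
SOFT_THRESHOLDS = (0.75, 0.90)

def check_budget(spent: float, limit: float, request_cost: float) -> tuple[str, float]:
    """Return ('block' | 'warn' | 'ok', projected fraction of budget)."""
    projected = spent + request_cost
    fraction = projected / limit
    if projected > limit:
        return ("block", fraction)  # hard block: refuse the inference
    if any(fraction >= t for t in SOFT_THRESHOLDS):
        return ("warn", fraction)   # soft warning: allow but alert
    return ("ok", fraction)

print(check_budget(spent=40.0, limit=50.0, request_cost=8.0))  # ('warn', 0.96)
print(check_budget(spent=49.0, limit=50.0, request_cost=5.0))  # ('block', 1.08)
```

A "warn" result would feed the alerting channel while still serving the request; only "block" raises the 402 shown earlier.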

3. Intelligent Model Comparison

Here's where things get really interesting. The /compare endpoint reveals which models deliver the best value:

from sqlalchemy import func  # SQL aggregate helpers used below

@router.get("/")
def compare_models(db: Session = Depends(get_db)):
    data = db.query(
        models.InferenceRecord.model,
        func.avg(models.InferenceRecord.quality_score).label("avg_quality"),
        func.avg(models.InferenceRecord.cost / models.InferenceRecord.tokens).label("cost_per_token"),
    ).group_by(models.InferenceRecord.model).all()

This gives you a clear view of:

  • Average quality scores per model
  • Cost per token for each model
  • Quality vs. cost tradeoffs across your model portfolio

Why this matters: Data-driven decisions trump gut feelings. Maybe GPT-4 is worth the premium for customer-facing features, but Claude Sonnet or GPT-3.5 is perfectly adequate for internal tools. Now you have the data to prove it.
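To make that tradeoff concrete, the comparison rows can be ranked by quality per unit cost. Everything here is illustrative: the numbers are made up, and `rank_by_value` is a hypothetical helper, not part of the demo:

```python
# Rows shaped like the /compare query's output above:
# (model, avg_quality, cost_per_token). Numbers are invented for illustration.
rows = [
    ("gpt-4", 0.92, 0.00006),
    ("gpt-3.5-turbo", 0.81, 0.000002),
    ("claude-3-haiku", 0.84, 0.000001),
]

def rank_by_value(rows: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Sort models by average quality per unit of per-token cost, best value first."""
    return sorted(rows, key=lambda r: r[1] / r[2], reverse=True)

for model, quality, cost_per_token in rank_by_value(rows):
    print(f"{model}: quality={quality}, cost/token={cost_per_token}")
```

With these example numbers, the cheap models dominate on value even though GPT-4 has the highest raw quality score, which is exactly the tradeoff the endpoint is meant to surface.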

Real-World Benefits

Let's talk about the tangible impact of using aegis-monitor:

For Developers

  • Instant feedback: See the cost impact of your prompt engineering experiments immediately
  • Guilt-free innovation: Experiment within budget guardrails instead of avoiding expensive models altogether
  • Better debugging: Track down cost anomalies quickly with detailed inference records

For FinOps Teams

  • Predictable spending: Set and enforce budgets before costs spiral
  • Cost attribution: Know exactly which teams, projects, or models drive your AI spend
  • Trend analysis: Identify usage patterns and optimize spending over time

For Product Teams

  • Informed tradeoffs: Choose the right model for each use case based on quality and cost data
  • Risk mitigation: Prevent budget overruns that could impact product roadmaps
  • Performance benchmarking: Track quality metrics alongside cost to ensure you're getting value

For Data Scientists

  • Model evaluation: Compare models objectively on both performance and efficiency
  • Cost-aware optimization: Factor in inference costs when selecting models for production
  • Regression detection: Get alerts when model changes unexpectedly increase costs

A Quick Tour of the API

The demo exposes several endpoints that work together:

  • POST /infer: Submit inference requests with automatic cost tracking and budget enforcement
  • GET /costs: Review total and per-model expenditures
  • GET /compare: Analyze quality vs. cost metrics across models
  • GET/POST /budgets: Manage team or project-specific budgets
  • GET /alerts: Check for budget warnings and violations

Each endpoint integrates seamlessly with aegis-monitor's cost calculation engine, ensuring accuracy and consistency.

Getting Started

Want to try it yourself? The demo is straightforward to run:

# Clone and setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the application
cd app
fastapi dev main.py

The application uses SQLite for persistence, so there's no complex database setup required. Within minutes, you can be tracking costs and comparing models.
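For a feel of how little setup SQLite needs, here is a stripped-down sketch of the kind of record store involved, using only Python's stdlib sqlite3 (the demo itself goes through SQLAlchemy models, so the table and column names here are illustrative):

```python
import sqlite3

# Minimal SQLite-backed store for inference records; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inference_records (
        id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        tokens INTEGER NOT NULL,
        cost REAL NOT NULL,
        quality_score REAL
    )
""")
conn.execute(
    "INSERT INTO inference_records (model, tokens, cost, quality_score) "
    "VALUES (?, ?, ?, ?)",
    ("gpt-4", 350, 0.0125, 0.9),
)
conn.commit()

# Per-model cost rollup, mirroring what the /costs endpoint reports.
row = conn.execute(
    "SELECT model, SUM(cost) FROM inference_records GROUP BY model"
).fetchone()
print(row)  # ('gpt-4', 0.0125)
```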

Beyond the Demo: Production Considerations

While this demo uses mock quality scores and simplified scenarios, the principles scale to production:

  1. Integrate with real LLM providers: Replace mock responses with actual API calls to OpenAI, Anthropic, or your provider of choice
  2. Implement sophisticated quality metrics: Track task-specific metrics like accuracy, BLEU scores, or custom evaluation criteria
  3. Add notification channels: Wire up Slack, email, or PagerDuty for real-time alerts
  4. Extend budget granularity: Add per-user, per-endpoint, or custom budget dimensions
  5. Build dashboards: Visualize cost trends, quality metrics, and model comparisons over time

The aegis-monitor foundation gives you the flexibility to adapt to your specific needs.
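As a sketch of item 3 above, a budget alert can be formatted as a payload for a Slack-style incoming webhook. The `build_budget_alert` helper is hypothetical, and actually posting the payload (with urllib, httpx, etc.) is deliberately omitted:

```python
import json

# Hypothetical helper that formats a budget alert for a Slack-style incoming
# webhook; sending it over HTTP is left to the caller.
def build_budget_alert(budget_name: str, spent: float, limit: float) -> str:
    fraction = spent / limit * 100
    severity = "OVER BUDGET" if spent > limit else "Budget warning"
    return json.dumps({
        "text": f"{severity}: '{budget_name}' at {fraction:.0f}% "
                f"(${spent:.2f} of ${limit:.2f})"
    })

payload = build_budget_alert("default", spent=47.5, limit=50.0)
print(payload)
```

The same payload builder could back email or PagerDuty channels by swapping the output format.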

The Bottom Line

AI models are powerful, but without proper cost governance, they can become a budget black hole. Aegis-monitor provides the visibility, control, and intelligence you need to:

  • Track every dollar spent on AI inference
  • Enforce budgets proactively before overspending happens
  • Compare models objectively based on quality and cost
  • Make data-driven decisions about which models to use and when
  • Scale confidently knowing costs won't spiral out of control

Whether you're a startup managing limited resources or an enterprise deploying AI at scale, cost governance isn't optional—it's essential. This demo proves that with the right tools, it doesn't have to be complicated.

Try It Out

The full source code for this demo is available in the repository. We encourage you to:

  • Clone it and experiment with different models
  • Implement your own quality metrics
  • Integrate it with your existing AI infrastructure
  • Share your feedback and improvements

🚀 Ready to Add Aegis-Monitor to Your Project?

Getting started with aegis-monitor in your own application is simple:

# Install aegis-monitor
pip install aegis-monitor

Then integrate it into your FastAPI application:

from aegis.cost.calculator import CostCalculator
from aegis.cost.pricing_registry import PricingRegistry

# Initialize in your FastAPI app
app = FastAPI()
app.state.aegis_cost_calculator = CostCalculator(PricingRegistry())

# Calculate costs in your inference endpoints
cost = app.state.aegis_cost_calculator.calculate_request_cost(
    model="gpt-4",
    input_tokens=input_tokens,
    output_tokens=output_tokens
)

Here are additional usage patterns you can drop into a real project.

Sample 1: App Startup Setup

from fastapi import FastAPI
from aegis.cost.calculator import CostCalculator
from aegis.cost.pricing_registry import PricingRegistry

app = FastAPI(title="AI Gateway")

@app.on_event("startup")  # on newer FastAPI versions, a lifespan handler is preferred
def init_aegis_monitor() -> None:
    registry = PricingRegistry()
    app.state.aegis_cost_calculator = CostCalculator(registry)

Sample 2: Track Cost in an Inference Endpoint

from fastapi import APIRouter, HTTPException, Request

router = APIRouter()

@router.post("/infer")
def infer(payload: dict, request: Request):
    model = payload.get("model", "gpt-4")
    prompt = payload.get("prompt", "")

    # Replace with your tokenizer in production.
    input_tokens = len(prompt) // 4 + 20
    output_tokens = 120

    try:
        cost = request.app.state.aegis_cost_calculator.calculate_request_cost(
            model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
        )
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc)) from exc

    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": cost,
    }

Sample 3: Budget Guardrail Check Before Processing

from fastapi import HTTPException

def enforce_budget_or_raise(current_spend: float, budget_limit: float, request_cost: float) -> None:
    projected_spend = current_spend + request_cost
    if projected_spend > budget_limit:
        raise HTTPException(status_code=402, detail="Budget limit reached. Inference blocked.")

Sample 4: Client Calls to Compare Models

# Run two requests on different models
curl -X POST http://127.0.0.1:8000/infer \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","prompt":"Summarize this report"}'

curl -X POST http://127.0.0.1:8000/infer \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-3-haiku","prompt":"Summarize this report"}'

# Review quality vs cost comparison
curl http://127.0.0.1:8000/compare/

Sample 5: Budget Creation and Alert Check

# Create or update a default budget
curl -X POST http://127.0.0.1:8000/budgets/ \
  -H "Content-Type: application/json" \
  -d '{"name":"default","limit":50.0,"spent":0.0}'

# Inspect budget alerts
curl http://127.0.0.1:8000/alerts/

That's it! You now have accurate, real-time cost tracking for your AI models.

Next Steps

  1. Explore the demo: Clone this repository and run it locally to see aegis-monitor in action
  2. Install aegis-monitor: Add aegis-monitor to your requirements.txt
  3. Integrate with your app: Start tracking costs in your existing inference endpoints
  4. Set up budgets: Implement budget enforcement to prevent overspending
  5. Build dashboards: Visualize your cost and quality metrics

Ready to take control of your AI costs? Give aegis-monitor a try, and see how visibility and governance can transform your AI operations.

Learn More

  • 📦 PyPI: pip install aegis-monitor
  • 📦 GitHub: Full code
  • 📚 Demo Repository: Check out the full source code with a working example
  • 💬 Feedback: Share your feedback and contribute improvements

Have questions or want to share your own cost governance stories? Reach out to our team or open an issue in the repository. We'd love to hear from you!
