
Ezekiel Adetoro

Taking Control of AI Model Costs with Aegis-Monitor

As AI models become increasingly central to modern applications, organizations and individuals face a critical challenge: how do you maintain innovation velocity while keeping costs under control? That's where aegis-monitor comes in—a powerful solution for real-time cost governance and model comparison.

In this post, we'll explore a FastAPI demo application that showcases how aegis-monitor helps teams track AI model costs, enforce budgets, and make data-driven decisions when comparing models.

The AI Cost Challenge

Let's be honest: AI model costs can spiral out of control quickly. Whether you're using GPT-4, Claude, or open-source alternatives, every API call adds up. Consider these common scenarios:

  • A developer experiments with different prompts, unknowingly racking up hundreds of dollars in costs
  • A product team deploys a feature using an expensive model when a cheaper alternative would suffice
  • FinOps teams struggle to attribute costs across different teams and projects
  • No one notices until the monthly bill arrives—too late to course-correct

Sound familiar? You're not alone. This is exactly why we built this demo to showcase aegis-monitor's capabilities.

Introducing the Aegis-Monitor Demo

Our FastAPI demo application demonstrates three core capabilities that every AI-powered application needs:

1. Real-Time Cost Tracking
Gone are the days of waiting for your cloud bill to understand AI spending. With aegis-monitor's CostCalculator and PricingRegistry, every inference request is tracked in real time:

cost_calculator = request.app.state.aegis_cost_calculator
cost = cost_calculator.calculate_request_cost(
    model=req.model,
    input_tokens=input_tokens,
    output_tokens=output_tokens
)

The system automatically:

  • Calculates token counts for both input and output
  • Applies accurate, model-specific pricing
  • Aggregates costs across requests, teams, and time periods
  • Stores detailed records for analysis and auditing

Why this matters: You gain immediate visibility into exactly what each API call costs, enabling proactive cost management instead of reactive damage control.
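The aggregation step above can be sketched as a small in-memory ledger. Note that the `CostLedger` class and its method names are hypothetical illustrations of the idea, not part of aegis-monitor's API:

```python
from collections import defaultdict

# Hypothetical in-memory ledger showing how per-request costs can be rolled
# up by team; aegis-monitor's own storage layer is not shown here.
class CostLedger:
    def __init__(self) -> None:
        self.records: list[dict] = []

    def record(self, team: str, model: str, cost: float) -> None:
        # One row per inference request.
        self.records.append({"team": team, "model": model, "cost": cost})

    def total_by_team(self) -> dict[str, float]:
        # Aggregate spend across all recorded requests, keyed by team.
        totals: dict[str, float] = defaultdict(float)
        for row in self.records:
            totals[row["team"]] += row["cost"]
        return dict(totals)

ledger = CostLedger()
ledger.record("marketing", "gpt-4", 0.125)
ledger.record("engineering", "gpt-4", 0.5)
ledger.record("marketing", "claude-3-haiku", 0.25)
print(ledger.total_by_team())  # {'marketing': 0.375, 'engineering': 0.5}
```

In a real deployment the same rollup would run as a SQL aggregate over stored inference records rather than in memory.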

2. Budget Enforcement That Actually Works

Visibility is great, but what about prevention? The demo implements budget enforcement at the inference level:

default_budget = db.query(models.Budget).filter(models.Budget.name == "default").first()
if default_budget:
    if default_budget.spent + cost > default_budget.limit:
        raise HTTPException(status_code=402, detail="Budget limit reached. Inference blocked.")

You can configure budgets:

  • Per team: Marketing gets $500/month, Engineering gets $2000/month
  • Per model: Limit experimental GPT-4 usage to $100/day
  • Global limits: Set organization-wide caps
  • Soft warnings: Get alerts at 75% and 90% of budget
  • Hard blocks: Automatically prevent inference when limits are reached

Why this matters: Instead of discovering overspending after the fact, you prevent it from happening in the first place. No more surprise bills.
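The soft-warning and hard-block tiers can be combined into one check, as in this minimal sketch (the threshold values and the `check_budget` helper are illustrative, not aegis-monitor API):

```python
# Illustrative guardrail combining the soft-warning and hard-block tiers
# described above; thresholds match the 75%/90% example in the text.
SOFT_THRESHOLDS = (0.75, 0.90)

def check_budget(spent: float, limit: float, request_cost: float) -> tuple[str, float]:
    """Return ('block' | 'warn' | 'ok', projected fraction of budget)."""
    projected = spent + request_cost
    fraction = projected / limit
    if projected > limit:
        return ("block", fraction)  # hard block: refuse the inference
    if any(fraction >= t for t in SOFT_THRESHOLDS):
        return ("warn", fraction)   # soft warning: allow but alert
    return ("ok", fraction)

print(check_budget(spent=40.0, limit=50.0, request_cost=8.0))  # ('warn', 0.96)
print(check_budget(spent=49.0, limit=50.0, request_cost=5.0))  # ('block', 1.08)
```

A "warn" result would feed the alerting channel while still serving the request; only "block" raises the 402 shown earlier.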

3. Intelligent Model Comparison

Here's where things get really interesting. The /compare endpoint reveals which models deliver the best value:

from sqlalchemy import func  # SQL aggregate helpers used below

@router.get("/")
def compare_models(db: Session = Depends(get_db)):
    data = db.query(
        models.InferenceRecord.model,
        func.avg(models.InferenceRecord.quality_score).label("avg_quality"),
        func.avg(models.InferenceRecord.cost / models.InferenceRecord.tokens).label("cost_per_token"),
    ).group_by(models.InferenceRecord.model).all()

This gives you a clear view of:

  • Average quality scores per model
  • Cost per token for each model
  • Quality vs. cost tradeoffs across your model portfolio

Why this matters: Data-driven decisions trump gut feelings. Maybe GPT-4 is worth the premium for customer-facing features, but Claude Sonnet or GPT-3.5 is perfectly adequate for internal tools. Now you have the data to prove it.
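To make that tradeoff concrete, the comparison rows can be ranked by quality per unit cost. Everything here is illustrative: the numbers are made up, and `rank_by_value` is a hypothetical helper, not part of the demo:

```python
# Rows shaped like the /compare query's output above:
# (model, avg_quality, cost_per_token). Numbers are invented for illustration.
rows = [
    ("gpt-4", 0.92, 0.00006),
    ("gpt-3.5-turbo", 0.81, 0.000002),
    ("claude-3-haiku", 0.84, 0.000001),
]

def rank_by_value(rows: list[tuple[str, float, float]]) -> list[tuple[str, float, float]]:
    """Sort models by average quality per unit of per-token cost, best value first."""
    return sorted(rows, key=lambda r: r[1] / r[2], reverse=True)

for model, quality, cost_per_token in rank_by_value(rows):
    print(f"{model}: quality={quality}, cost/token={cost_per_token}")
```

With these example numbers, the cheap models dominate on value even though GPT-4 has the highest raw quality score, which is exactly the tradeoff the endpoint is meant to surface.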

Real-World Benefits

Let's talk about the tangible impact of using aegis-monitor:

For Developers

  • Instant feedback: See the cost impact of your prompt engineering experiments immediately
  • Guilt-free innovation: Experiment within budget guardrails instead of avoiding expensive models altogether
  • Better debugging: Track down cost anomalies quickly with detailed inference records

For FinOps Teams

  • Predictable spending: Set and enforce budgets before costs spiral
  • Cost attribution: Know exactly which teams, projects, or models drive your AI spend
  • Trend analysis: Identify usage patterns and optimize spending over time

For Product Teams

  • Informed tradeoffs: Choose the right model for each use case based on quality and cost data
  • Risk mitigation: Prevent budget overruns that could impact product roadmaps
  • Performance benchmarking: Track quality metrics alongside cost to ensure you're getting value

For Data Scientists

  • Model evaluation: Compare models objectively on both performance and efficiency
  • Cost-aware optimization: Factor in inference costs when selecting models for production
  • Regression detection: Get alerts when model changes unexpectedly increase costs

A Quick Tour of the API

The demo exposes several endpoints that work together:

  • POST /infer: Submit inference requests with automatic cost tracking and budget enforcement
  • GET /costs: Review total and per-model expenditures
  • GET /compare: Analyze quality vs. cost metrics across models
  • GET/POST /budgets: Manage team or project-specific budgets
  • GET /alerts: Check for budget warnings and violations

Each endpoint integrates seamlessly with aegis-monitor's cost calculation engine, ensuring accuracy and consistency.

Getting Started

Want to try it yourself? The demo is straightforward to run:

# Clone and setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the application
cd app
fastapi dev main.py

The application uses SQLite for persistence, so there's no complex database setup required. Within minutes, you can be tracking costs and comparing models.
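For a feel of how little setup SQLite needs, here is a stripped-down sketch of the kind of record store involved, using only Python's stdlib sqlite3 (the demo itself goes through SQLAlchemy models, so the table and column names here are illustrative):

```python
import sqlite3

# Minimal SQLite-backed store for inference records; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE inference_records (
        id INTEGER PRIMARY KEY,
        model TEXT NOT NULL,
        tokens INTEGER NOT NULL,
        cost REAL NOT NULL,
        quality_score REAL
    )
""")
conn.execute(
    "INSERT INTO inference_records (model, tokens, cost, quality_score) "
    "VALUES (?, ?, ?, ?)",
    ("gpt-4", 350, 0.0125, 0.9),
)
conn.commit()

# Per-model cost rollup, mirroring what the /costs endpoint reports.
row = conn.execute(
    "SELECT model, SUM(cost) FROM inference_records GROUP BY model"
).fetchone()
print(row)  # ('gpt-4', 0.0125)
```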

Beyond the Demo: Production Considerations

While this demo uses mock quality scores and simplified scenarios, the principles scale to production:

  1. Integrate with real LLM providers: Replace mock responses with actual API calls to OpenAI, Anthropic, or your provider of choice
  2. Implement sophisticated quality metrics: Track task-specific metrics like accuracy, BLEU scores, or custom evaluation criteria
  3. Add notification channels: Wire up Slack, email, or PagerDuty for real-time alerts
  4. Extend budget granularity: Add per-user, per-endpoint, or custom budget dimensions
  5. Build dashboards: Visualize cost trends, quality metrics, and model comparisons over time

The aegis-monitor foundation gives you the flexibility to adapt to your specific needs.
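As a sketch of item 3 above, a budget alert can be formatted as a payload for a Slack-style incoming webhook. The `build_budget_alert` helper is hypothetical, and actually posting the payload (with urllib, httpx, etc.) is deliberately omitted:

```python
import json

# Hypothetical helper that formats a budget alert for a Slack-style incoming
# webhook; sending it over HTTP is left to the caller.
def build_budget_alert(budget_name: str, spent: float, limit: float) -> str:
    fraction = spent / limit * 100
    severity = "OVER BUDGET" if spent > limit else "Budget warning"
    return json.dumps({
        "text": f"{severity}: '{budget_name}' at {fraction:.0f}% "
                f"(${spent:.2f} of ${limit:.2f})"
    })

payload = build_budget_alert("default", spent=47.5, limit=50.0)
print(payload)
```

The same payload builder could back email or PagerDuty channels by swapping the output format.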

The Bottom Line

AI models are powerful, but without proper cost governance, they can become a budget black hole. Aegis-monitor provides the visibility, control, and intelligence you need to:

  • Track every dollar spent on AI inference
  • Enforce budgets proactively before overspending happens
  • Compare models objectively based on quality and cost
  • Make data-driven decisions about which models to use and when
  • Scale confidently knowing costs won't spiral out of control

Whether you're a startup managing limited resources or an enterprise deploying AI at scale, cost governance isn't optional—it's essential. This demo proves that with the right tools, it doesn't have to be complicated.

Try It Out

The full source code for this demo is available in the repository. We encourage you to:

  • Clone it and experiment with different models
  • Implement your own quality metrics
  • Integrate it with your existing AI infrastructure
  • Share your feedback and improvements

🚀 Ready to Add Aegis-Monitor to Your Project?

Getting started with aegis-monitor in your own application is simple:

# Install aegis-monitor
pip install aegis-monitor

Then integrate it into your FastAPI application:

from aegis.cost.calculator import CostCalculator
from aegis.cost.pricing_registry import PricingRegistry

# Initialize in your FastAPI app
app = FastAPI()
app.state.aegis_cost_calculator = CostCalculator(PricingRegistry())

# Calculate costs in your inference endpoints
cost = app.state.aegis_cost_calculator.calculate_request_cost(
    model="gpt-4",
    input_tokens=input_tokens,
    output_tokens=output_tokens
)

Here are additional usage patterns you can drop into a real project.

Sample 1: App Startup Setup

from fastapi import FastAPI
from aegis.cost.calculator import CostCalculator
from aegis.cost.pricing_registry import PricingRegistry

app = FastAPI(title="AI Gateway")

@app.on_event("startup")  # on newer FastAPI versions, a lifespan handler is preferred
def init_aegis_monitor() -> None:
    registry = PricingRegistry()
    app.state.aegis_cost_calculator = CostCalculator(registry)

Sample 2: Track Cost in an Inference Endpoint

from fastapi import APIRouter, HTTPException, Request

router = APIRouter()

@router.post("/infer")
def infer(payload: dict, request: Request):
    model = payload.get("model", "gpt-4")
    prompt = payload.get("prompt", "")

    # Replace with your tokenizer in production.
    input_tokens = len(prompt) // 4 + 20
    output_tokens = 120

    try:
        cost = request.app.state.aegis_cost_calculator.calculate_request_cost(
            model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
        )
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc)) from exc

    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": cost,
    }

Sample 3: Budget Guardrail Check Before Processing

from fastapi import HTTPException

def enforce_budget_or_raise(current_spend: float, budget_limit: float, request_cost: float) -> None:
    projected_spend = current_spend + request_cost
    if projected_spend > budget_limit:
        raise HTTPException(status_code=402, detail="Budget limit reached. Inference blocked.")

Sample 4: Client Calls to Compare Models

# Run two requests on different models
curl -X POST http://127.0.0.1:8000/infer \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","prompt":"Summarize this report"}'

curl -X POST http://127.0.0.1:8000/infer \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-3-haiku","prompt":"Summarize this report"}'

# Review quality vs cost comparison
curl http://127.0.0.1:8000/compare/

Sample 5: Budget Creation and Alert Check

# Create or update a default budget
curl -X POST http://127.0.0.1:8000/budgets/ \
  -H "Content-Type: application/json" \
  -d '{"name":"default","limit":50.0,"spent":0.0}'

# Inspect budget alerts
curl http://127.0.0.1:8000/alerts/

That's it! You now have accurate, real-time cost tracking for your AI models.

Next Steps

  1. Explore the demo: Clone this repository and run it locally to see aegis-monitor in action
  2. Install aegis-monitor: Add aegis-monitor to your requirements.txt
  3. Integrate with your app: Start tracking costs in your existing inference endpoints
  4. Set up budgets: Implement budget enforcement to prevent overspending
  5. Build dashboards: Visualize your cost and quality metrics

Ready to take control of your AI costs? Give aegis-monitor a try, and see how visibility and governance can transform your AI operations.

Learn More

  • 📦 PyPI: pip install aegis-monitor
  • 📦 GitHub: Full code
  • 📚 Demo Repository: Check out the full source code with a working example
  • 💬 Feedback: Share your feedback and contribute improvements

Have questions or want to share your own cost governance stories? Reach out to our team or open an issue in the repository. We'd love to hear from you!
