As AI models become increasingly central to modern applications, organizations and individuals face a critical challenge: how do you maintain innovation velocity while keeping costs under control? That's where aegis-monitor comes in—a powerful solution for real-time cost governance and model comparison.
In this post, we'll explore a FastAPI demo application that showcases how aegis-monitor helps teams track AI model costs, enforce budgets, and make data-driven decisions when comparing models.
The AI Cost Challenge
Let's be honest: AI model costs can spiral out of control quickly. Whether you're using GPT-4, Claude, or open-source alternatives, every API call adds up. Consider these common scenarios:
- A developer experiments with different prompts, unknowingly racking up hundreds of dollars in costs
- A product team deploys a feature using an expensive model when a cheaper alternative would suffice
- FinOps teams struggle to attribute costs across different teams and projects
- No one notices until the monthly bill arrives—too late to course-correct
Sound familiar? You're not alone. This is exactly why we built this demo to showcase aegis-monitor's capabilities.
Introducing Aegis-Monitor
Our FastAPI demo application demonstrates three core capabilities that every AI-powered application needs:
1. Real-Time Cost Tracking
Gone are the days of waiting for your cloud bill to understand AI spending. With aegis-monitor's CostCalculator and PricingRegistry, every inference request is tracked in real time:
cost_calculator = request.app.state.aegis_cost_calculator
cost = cost_calculator.calculate_request_cost(
    model=req.model,
    input_tokens=input_tokens,
    output_tokens=output_tokens,
)
The system automatically:
- Calculates token counts for both input and output
- Applies accurate, model-specific pricing
- Aggregates costs across requests, teams, and time periods
- Stores detailed records for analysis and auditing

Why this matters: You gain immediate visibility into exactly what each API call costs, enabling proactive cost management instead of reactive damage control.
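To make the aggregation step concrete, here is a minimal sketch of grouping per-request cost records by team and by model. The record fields and the `aggregate_costs` helper are illustrative, not the actual aegis-monitor schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical per-request cost records; field names are illustrative.
records = [
    {"team": "marketing", "model": "gpt-4", "cost": 0.12, "ts": datetime(2024, 5, 1)},
    {"team": "marketing", "model": "gpt-3.5-turbo", "cost": 0.01, "ts": datetime(2024, 5, 2)},
    {"team": "engineering", "model": "gpt-4", "cost": 0.30, "ts": datetime(2024, 5, 1)},
]

def aggregate_costs(records, key):
    """Sum costs grouped by an arbitrary key function (team, model, period, ...)."""
    totals = defaultdict(float)
    for record in records:
        totals[key(record)] += record["cost"]
    return dict(totals)

by_team = aggregate_costs(records, lambda r: r["team"])
by_model = aggregate_costs(records, lambda r: r["model"])
```

The same helper works for time-period rollups by keying on, say, `r["ts"].date()`.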
2. Budget Enforcement That Actually Works
Visibility is great, but what about prevention? The demo implements budget enforcement at the inference level:
default_budget = db.query(models.Budget).filter(models.Budget.name == "default").first()
if default_budget:
    if default_budget.spent + cost > default_budget.limit:
        raise HTTPException(status_code=402, detail="Budget limit reached. Inference blocked.")
You can configure budgets:
- Per team: Marketing gets $500/month, Engineering gets $2000/month
- Per model: Limit experimental GPT-4 usage to $100/day
- Global limits: Set organization-wide caps
- Soft warnings: Get alerts at 75% and 90% of budget
- Hard blocks: Automatically prevent inference when limits are reached
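The soft-warning and hard-block behavior above can be sketched as a single classifier. This is an illustrative helper, not part of the demo's code; the 75%/90% thresholds are the ones mentioned in the post.

```python
def budget_status(spent: float, cost: float, limit: float,
                  warn_levels=(0.75, 0.90)) -> str:
    """Classify a prospective request against a budget.

    Returns "block" when projected spend exceeds the hard limit,
    "warn-90"/"warn-75" when it crosses a soft threshold, else "ok".
    """
    projected = spent + cost
    if projected > limit:
        return "block"
    # Check the highest threshold first so the strongest warning wins.
    for level in sorted(warn_levels, reverse=True):
        if projected >= level * limit:
            return f"warn-{int(level * 100)}"
    return "ok"
```

A caller would block on "block", emit an alert on the "warn-*" statuses, and proceed silently on "ok".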
Why this matters: Instead of discovering overspending after the fact, you prevent it from happening in the first place. No more surprise bills.
3. Intelligent Model Comparison
Here's where things get really interesting. The /compare endpoint reveals which models deliver the best value:
@router.get("/")
def compare_models(db: Session = Depends(get_db)):
    data = db.query(
        models.InferenceRecord.model,
        func.avg(models.InferenceRecord.quality_score).label("avg_quality"),
        func.avg(models.InferenceRecord.cost / models.InferenceRecord.tokens).label("cost_per_token"),
    ).group_by(models.InferenceRecord.model).all()
    return data
This gives you a clear view of:
- Average quality scores per model
- Cost per token for each model
- Quality vs. cost tradeoffs across your model portfolio
Why this matters: Data-driven decisions trump gut feelings. Maybe GPT-4 is worth the premium for customer-facing features, but Claude Sonnet or GPT-3.5 is perfectly adequate for internal tools. Now you have the data to prove it.
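One simple way to act on comparison data like this is to rank models by quality per unit cost. The rows below are hypothetical and merely shaped like the /compare output; the ranking helper is an illustration, not part of the demo.

```python
# Hypothetical comparison rows, shaped like the /compare endpoint's output.
rows = [
    {"model": "gpt-4", "avg_quality": 0.92, "cost_per_token": 0.00006},
    {"model": "gpt-3.5-turbo", "avg_quality": 0.81, "cost_per_token": 0.0000015},
    {"model": "claude-3-haiku", "avg_quality": 0.84, "cost_per_token": 0.00000125},
]

def rank_by_value(rows):
    """Order models by quality per unit cost (higher is better)."""
    return sorted(
        rows,
        key=lambda r: r["avg_quality"] / r["cost_per_token"],
        reverse=True,
    )

best = rank_by_value(rows)[0]["model"]
```

Note that raw quality-per-dollar is only one lens: a customer-facing feature may still justify the top-quality model even when it ranks last on efficiency.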
Real-World Benefits
Let's talk about the tangible impact of using aegis-monitor:
For Developers
- Instant feedback: See the cost impact of your prompt engineering experiments immediately
- Guilt-free innovation: Experiment within budget guardrails instead of avoiding expensive models altogether
- Better debugging: Track down cost anomalies quickly with detailed inference records
For FinOps Teams
- Predictable spending: Set and enforce budgets before costs spiral
- Cost attribution: Know exactly which teams, projects, or models drive your AI spend
- Trend analysis: Identify usage patterns and optimize spending over time
For Product Teams
- Informed tradeoffs: Choose the right model for each use case based on quality and cost data
- Risk mitigation: Prevent budget overruns that could impact product roadmaps
- Performance benchmarking: Track quality metrics alongside cost to ensure you're getting value
For Data Scientists
- Model evaluation: Compare models objectively on both performance and efficiency
- Cost-aware optimization: Factor in inference costs when selecting models for production
- Regression detection: Get alerts when model changes unexpectedly increase costs
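The regression-detection idea in the last bullet can be approximated with a trailing-average check. This is a simple heuristic sketch, not aegis-monitor's built-in alerting; the window and threshold values are assumptions.

```python
def cost_regression(daily_costs, window=7, threshold=1.5):
    """Flag the latest day's cost if it exceeds `threshold` times the
    average of the preceding `window` days."""
    if len(daily_costs) <= window:
        return False  # not enough history to form a baseline
    baseline = sum(daily_costs[-window - 1:-1]) / window
    return daily_costs[-1] > threshold * baseline
```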
A Quick Tour of the API
The demo exposes several endpoints that work together:
- POST /infer: Submit inference requests with automatic cost tracking and budget enforcement
- GET /costs: Review total and per-model expenditures
- GET /compare: Analyze quality vs. cost metrics across models
- GET/POST /budgets: Manage team- or project-specific budgets
- GET /alerts: Check for budget warnings and violations
Each endpoint integrates seamlessly with aegis-monitor's cost calculation engine, ensuring accuracy and consistency.
Getting Started
Want to try it yourself? The demo is straightforward to run:
# Clone and setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run the application
cd app
fastapi dev main.py
The application uses SQLite for persistence, so there's no complex database setup required. Within minutes, you can be tracking costs and comparing models.
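The demo persists records through SQLAlchemy, but the "no complex database setup" point is easy to see with the standard library alone. The table and column names below are illustrative, not the demo's actual schema.

```python
import sqlite3

# In-memory database for this sketch; the demo uses a SQLite file instead.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS inference_records (
        id INTEGER PRIMARY KEY,
        model TEXT,
        tokens INTEGER,
        cost REAL,
        quality_score REAL
    )
""")
conn.execute(
    "INSERT INTO inference_records (model, tokens, cost, quality_score) "
    "VALUES (?, ?, ?, ?)",
    ("gpt-4", 140, 0.0123, 0.9),
)
total_cost = conn.execute("SELECT SUM(cost) FROM inference_records").fetchone()[0]
```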
Beyond the Demo: Production Considerations
While this demo uses mock quality scores and simplified scenarios, the principles scale to production:
- Integrate with real LLM providers: Replace mock responses with actual API calls to OpenAI, Anthropic, or your provider of choice
- Implement sophisticated quality metrics: Track task-specific metrics like accuracy, BLEU scores, or custom evaluation criteria
- Add notification channels: Wire up Slack, email, or PagerDuty for real-time alerts
- Extend budget granularity: Add per-user, per-endpoint, or custom budget dimensions
- Build dashboards: Visualize cost trends, quality metrics, and model comparisons over time
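As one way to extend budget granularity, budgets can be keyed on composite dimensions such as (team, user). The structures and the `charge` helper below are hypothetical extensions, not part of the demo.

```python
from collections import defaultdict

# Illustrative: budgets keyed on (team, user); add endpoint or any
# other dimension to the tuple as needed.
budgets = {("engineering", "alice"): 25.0, ("engineering", "bob"): 10.0}
spend = defaultdict(float)

def charge(team: str, user: str, cost: float) -> bool:
    """Record spend against a (team, user) budget.

    Returns False (and records nothing) if the charge would overrun
    that budget's limit; unknown keys default to a zero limit.
    """
    key = (team, user)
    limit = budgets.get(key, 0.0)
    if spend[key] + cost > limit:
        return False
    spend[key] += cost
    return True
```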
The aegis-monitor foundation gives you the flexibility to adapt to your specific needs.
The Bottom Line
AI models are powerful, but without proper cost governance, they can become a budget black hole. Aegis-monitor provides the visibility, control, and intelligence you need to:
✅ Track every dollar spent on AI inference
✅ Enforce budgets proactively before overspending happens
✅ Compare models objectively based on quality and cost
✅ Make data-driven decisions about which models to use and when
✅ Scale confidently knowing costs won't spiral out of control
Whether you're a startup managing limited resources or an enterprise deploying AI at scale, cost governance isn't optional—it's essential. This demo proves that with the right tools, it doesn't have to be complicated.
Try It Out
The full source code for this demo is available in the repository. We encourage you to:
- Clone it and experiment with different models
- Implement your own quality metrics
- Integrate it with your existing AI infrastructure
- Share your feedback and improvements
🚀 Ready to Add Aegis-Monitor to Your Project?
Getting started with aegis-monitor in your own application is simple:
# Install aegis-monitor
pip install aegis-monitor
Then integrate it into your FastAPI application:
from fastapi import FastAPI

from aegis.cost.calculator import CostCalculator
from aegis.cost.pricing_registry import PricingRegistry

# Initialize in your FastAPI app
app = FastAPI()
app.state.aegis_cost_calculator = CostCalculator(PricingRegistry())

# Calculate costs in your inference endpoints
cost = app.state.aegis_cost_calculator.calculate_request_cost(
    model="gpt-4",
    input_tokens=input_tokens,
    output_tokens=output_tokens,
)
Here are additional usage patterns you can drop into a real project.
Sample 1: App Startup Setup
from fastapi import FastAPI

from aegis.cost.calculator import CostCalculator
from aegis.cost.pricing_registry import PricingRegistry

app = FastAPI(title="AI Gateway")

@app.on_event("startup")
def init_aegis_monitor() -> None:
    registry = PricingRegistry()
    app.state.aegis_cost_calculator = CostCalculator(registry)
Sample 2: Track Cost in an Inference Endpoint
from fastapi import APIRouter, HTTPException, Request

router = APIRouter()

@router.post("/infer")
def infer(payload: dict, request: Request):
    model = payload.get("model", "gpt-4")
    prompt = payload.get("prompt", "")
    # Replace with your tokenizer in production.
    input_tokens = len(prompt) // 4 + 20
    output_tokens = 120
    try:
        cost = request.app.state.aegis_cost_calculator.calculate_request_cost(
            model=model,
            input_tokens=input_tokens,
            output_tokens=output_tokens,
        )
    except ValueError as exc:
        raise HTTPException(status_code=400, detail=str(exc)) from exc
    return {
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": cost,
    }
Sample 3: Budget Guardrail Check Before Processing
from fastapi import HTTPException

def enforce_budget_or_raise(current_spend: float, budget_limit: float, request_cost: float) -> None:
    projected_spend = current_spend + request_cost
    if projected_spend > budget_limit:
        raise HTTPException(status_code=402, detail="Budget limit reached. Inference blocked.")
Sample 4: Client Calls to Compare Models
# Run two requests on different models
curl -X POST http://127.0.0.1:8000/infer \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4","prompt":"Summarize this report"}'
curl -X POST http://127.0.0.1:8000/infer \
-H "Content-Type: application/json" \
-d '{"model":"claude-3-haiku","prompt":"Summarize this report"}'
# Review quality vs cost comparison
curl http://127.0.0.1:8000/compare/
Sample 5: Budget Creation and Alert Check
# Create or update a default budget
curl -X POST http://127.0.0.1:8000/budgets/ \
-H "Content-Type: application/json" \
-d '{"name":"default","limit":50.0,"spent":0.0}'
# Inspect budget alerts
curl http://127.0.0.1:8000/alerts/
That's it! You now have accurate, real-time cost tracking for your AI models.
Next Steps
- Explore the demo: Clone this repository and run it locally to see aegis-monitor in action
- Install aegis-monitor: Add aegis-monitor to your requirements.txt
- Integrate with your app: Start tracking costs in your existing inference endpoints
- Set up budgets: Implement budget enforcement to prevent overspending
- Build dashboards: Visualize your cost and quality metrics
Ready to take control of your AI costs? Give aegis-monitor a try, and see how visibility and governance can transform your AI operations.
Learn More
- 📦 PyPI: pip install aegis-monitor
- 📦 GitHub: Full code
- 📚 Demo Repository: Check out the full source code with a working example
- 💬 Documentation: Share your feedback and contribute improvements
Have questions or want to share your own cost governance stories? Reach out to our team or open an issue in the repository. We'd love to hear from you!