Building a GenAI Observability Deck: Tracking Latency and Costs with CloudWatch Logs Insights

Deploying an LLM is easy. Knowing what it's doing in production is hard. Once you move from a local script to a scheduled Lambda function running 24/7, print() statements are no longer enough. You need observability.

For Day 33, I focused on building a "Mission Control" dashboard for my Finance Agent. The goal was to visualize not just technical metrics (errors), but business metrics (cost).

  1. The Strategy: Structured Logging

Standard text logs are hard to parse. To make my logs "queryable," I moved to JSON Structured Logging. In my Python Lambda, I implemented a wrapper function that outputs metrics in a specific format:

Python
import datetime
import json

def log_metric(metric_name, value, unit="Count", properties=None):
    # One JSON object per line; Lambda forwards stdout to CloudWatch Logs.
    print(json.dumps({
        "metric": metric_name,
        "value": value,
        "unit": unit,
        "timestamp": datetime.datetime.now().isoformat(),
        **(properties or {})  # default to None to avoid the mutable-default pitfall
    }))

This simple change transforms a log stream into a database. CloudWatch Logs Insights automatically discovers metric, value, and any extra properties as queryable fields.
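
As an illustration, a call like this (the metric name and properties are just examples) emits one parseable JSON line per metric:

Python
log_metric("Latency", 1240, unit="Milliseconds", properties={"model": "nova-micro"})
# -> {"metric": "Latency", "value": 1240, "unit": "Milliseconds",
#     "timestamp": "2025-...", "model": "nova-micro"}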

  2. Calculating GenAI Costs on the Fly

Amazon Bedrock charges per token. To track this in real-time, I calculate the cost immediately after the inference:

Python
# Pricing for Amazon Nova Micro (example rates, USD per 1K tokens)
# input_tokens / output_tokens come from the Bedrock response (see below)
cost_input = (input_tokens / 1000) * 0.00035
cost_output = (output_tokens / 1000) * 0.00140
total_cost = cost_input + cost_output

log_metric("AICost", total_cost, unit="USD")
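
Where do input_tokens and output_tokens come from? A minimal sketch, assuming the agent calls Bedrock through the boto3 Converse API (the model ID and prompt here are placeholders):

Python
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="amazon.nova-micro-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Summarize today's market news."}]}],
)

# The Converse API returns token usage alongside the completion
input_tokens = response["usage"]["inputTokens"]
output_tokens = response["usage"]["outputTokens"]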

  3. The Power of Logs Insights

With the data in the logs, I didn't need to set up complex Prometheus exporters or external tools. I used CloudWatch Logs Insights with this query:

CloudWatch Logs Insights
filter metric = "AICost"
| stats sum(value) as Total_USD by bin(1d)

This query filters the stream for my specific metric, sums the values, and bins them by day.
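
The same pattern extends to any metric emitted by log_metric. For example, assuming the wrapper also logs a Latency metric (a hypothetical name, not from the original code), a percentile breakdown is one line more:

CloudWatch Logs Insights
filter metric = "Latency"
| stats avg(value) as Avg_ms, pct(value, 95) as P95_ms by bin(1h)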

  4. The Dashboard

I combined this custom query with standard Lambda metrics (Duration, Invocations, Errors) into a single CloudWatch Dashboard. The result is a comprehensive view of the system's health. I can correlate a spike in latency with a specific complex query, and immediately see the financial impact of that spike.
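
If you prefer dashboards as code, the same widget can be created programmatically. A minimal sketch using boto3's put_dashboard, assuming a log group named /aws/lambda/finance-agent (hypothetical) in us-east-1:

Python
import json
import boto3

cloudwatch = boto3.client("cloudwatch")

# A single Logs Insights widget charting daily AI spend; the log group is a placeholder
dashboard_body = {
    "widgets": [
        {
            "type": "log",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "region": "us-east-1",
                "title": "Daily AI Cost (USD)",
                "view": "timeSeries",
                "query": (
                    "SOURCE '/aws/lambda/finance-agent' | "
                    'filter metric = "AICost" | '
                    "stats sum(value) as Total_USD by bin(1d)"
                ),
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="GenAI-Mission-Control",
    DashboardBody=json.dumps(dashboard_body),
)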

Conclusion

Observability is not about having pretty charts. It's about answering questions. By structuring my logs, I can now answer "How much did the AI spend today?" without leaving the IDE.
