
MLflow-Powered Generative AI Observability with Amazon Bedrock

A complete AWS CDK implementation for integrating Amazon Bedrock with MLflow on ECS Fargate

This step-by-step guide demonstrates how to build a generative AI observability platform around an MLflow tracking server. We will deploy MLflow on Amazon ECS Fargate using the AWS CDK (TypeScript) and integrate it with Amazon Bedrock in our scripts to track, monitor, and analyze LLM interactions.

Architecture Overview

MLflow-Powered Gen AI Observability on ECS Fargate

Our solution consists of the following components:

  • MLflow Tracking Server: Deployed as a containerized application on Amazon ECS Fargate
  • PostgreSQL Database: Lightweight, open-source database backend for MLflow metadata on Amazon ECS Fargate
  • Amazon S3: Artifact store for MLflow experiments
  • Amazon Bedrock: Fully managed foundation model service
  • AWS Cloud Map: Service discovery for the MLflow and PostgreSQL containers, enabling seamless communication between components within the ECS cluster
  • AWS CDK: Infrastructure as code using TypeScript

This architecture provides a scalable and cost-effective platform for tracking and analyzing your generative AI applications with MLflow’s powerful tracing capabilities.

Step-by-Step Implementation

Prerequisites

  • AWS Account with appropriate permissions
  • AWS CLI configured locally
  • Node.js and npm installed
  • AWS CDK Toolkit (npm install -g aws-cdk)
  • Docker installed locally
  • Basic knowledge of TypeScript and CDK

Clone the Repository

git clone https://github.com/awsdataarchitect/mlflow-bedrock-cdk.git
cd mlflow-bedrock-cdk

CDK Infrastructure Deployment

The core infrastructure is defined in lib/mlflow-bedrock-cdk-stack.ts. Bootstrap your AWS environment (first deployment only) and deploy the stack:

cdk bootstrap
cdk deploy

Our CDK stack creates:

  • A VPC with public and private subnets
  • An ECS cluster to host the containers
  • An S3 bucket for MLflow artifact storage
  • A PostgreSQL database running on Fargate as the MLflow backend store
  • A load-balanced, publicly accessible MLflow service behind an Application Load Balancer (ALB)
  • Fargate tasks for both the MLflow and PostgreSQL services
  • An AWS Cloud Map namespace for service discovery
  • An ECR repository for storing container images
  • Appropriate IAM roles and security groups

The deployment takes about 5-6 minutes. Once it completes, CDK outputs the URL of your MLflow tracking server.

cdk deploy output

Integrating Amazon Bedrock with MLflow

Now that we have our MLflow server running, let’s run our sample Python scripts to demonstrate the integration with Amazon Bedrock.

Basic Tracing Example (bedrock_tracing.py)

import boto3
import mlflow
import tiktoken
from mlflow.entities import SpanType

# Set your ALB tracking URI 
mlflow.set_tracking_uri("http://Mlflow-MLflo-loG90qp0PACe-859729803.us-east-1.elb.amazonaws.com")  # Replace with your actual URL

# Enable auto-tracing
mlflow.bedrock.autolog()

# Create experiment
mlflow.set_experiment("Bedrock-Token-Cost-Demo")

# Create a boto3 client for Bedrock (created after autolog so it is traced)
bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",  # Replace with your region
)
model_id = "amazon.nova-lite-v1:0"

prompt = "Explain machine learning observability in one paragraph"

with mlflow.start_run():
    mlflow.log_param("prompt", prompt)

    # Fixed message structure
    response = bedrock.converse(
        modelId=model_id,
        messages=[{
            "role": "user",
            "content": [{
                "text": prompt  # Content must be list of content blocks
            }]
        }],
        inferenceConfig={
            "maxTokens": 512,
            "temperature": 0.1,
            "topP": 0.9
        }
    )
    # ... response extraction, token counting with tiktoken, cost
    # calculation, and MLflow metric logging follow -- see the full
    # script in the repository

What this example script does:

  • MLflow Setup: Configures the tracking URI, enables auto-tracing, and sets up an experiment (Bedrock-Token-Cost-Demo).
  • Bedrock Client: Initializes a boto3 client for amazon.nova-lite-v1:0.
  • Token Cost Calculation: Computes token counts and cost using Amazon Nova Lite model pricing (see the sketch after this list).
  • Inference Execution: Sends the prompt to Bedrock and extracts the response.
  • MLflow Logging: Logs the prompt, token metrics, and cost; prints the response and total cost.
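
As a minimal sketch of the cost-logging step: the Converse API also returns usage metadata directly, so you can log cost without re-tokenizing. The per-1K-token prices below are illustrative placeholders, not current Bedrock pricing; check the Bedrock pricing page for Amazon Nova Lite before relying on them.

# Illustrative placeholder prices per 1K tokens -- verify against the
# current Amazon Nova Lite rates on the Bedrock pricing page
INPUT_PRICE_PER_1K = 0.00006
OUTPUT_PRICE_PER_1K = 0.00024

usage = response["usage"]  # the Converse API reports token counts here
cost = (usage["inputTokens"] * INPUT_PRICE_PER_1K
        + usage["outputTokens"] * OUTPUT_PRICE_PER_1K) / 1000

mlflow.log_metric("input_tokens", usage["inputTokens"])
mlflow.log_metric("output_tokens", usage["outputTokens"])
mlflow.log_metric("cost_usd", cost)
print(f"Cost: ${cost:.8f}")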

Example Output

python3 scripts/bedrock_tracing.py

2025/03/28 12:35:42 INFO mlflow.bedrock: Enabled auto-tracing for Bedrock. Note that MLflow can only trace boto3 service clients that are created after this call. If you have already created one, please recreate the client by calling `boto3.client`.
2025/03/28 12:35:44 INFO mlflow.tracking.fluent: Experiment with name 'Bedrock-Token-Cost-Demo' does not exist. Creating a new experiment.
Response: Machine learning observability refers to the ability to monitor, diagnose, and understand the behavior and performance of machine learning models in real-time and throughout their lifecycle. It involves collecting and analyzing various metrics, logs, and traces to gain insights into model performance, data quality, and operational health. Observability helps in identifying issues such as data drift, model drift, and performance degradation, enabling data scientists and engineers to make informed decisions, ensure model reliability, and maintain the overall health of machine learning systems. This practice is crucial for maintaining model accuracy, ensuring compliance, and facilitating continuous improvement in machine learning deployments.
Cost: $0.02934000
🏃 View run rambunctious-koi-880 at: http://Mlflow-MLflo-cXfL3g06yBhj-966380976.us-east-1.elb.amazonaws.com/#/experiments/1/runs/cabc0132b4164555bab2e6e7d0e5fb04
🧪 View experiment at: http://Mlflow-MLflo-cXfL3g06yBhj-966380976.us-east-1.elb.amazonaws.com/#/experiments/1

Advanced Features

Streaming Responses (bedrock_streaming.py):

MLflow’s Bedrock integration offers several advanced features, such as streaming and function calling (tool use, for Anthropic and Amazon Nova models that support tools). Let’s explore streaming responses with tracing.

import boto3
import mlflow

# Set tracking URI to your deployed MLflow server
mlflow.set_tracking_uri("http://Mlflow-MLflo-cXfL3g06yBhj-966380976.us-east-1.elb.amazonaws.com")  # Replace with your actual URL

# Enable auto-tracing for Amazon Bedrock
mlflow.bedrock.autolog()
mlflow.set_experiment("Bedrock-Streaming")

# Create a boto3 client for Bedrock
bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",  # Replace with your region
)

# Call Bedrock streaming API
response = bedrock.converse_stream(
    modelId="amazon.nova-lite-v1:0",  # Or any Bedrock model you are using
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Write a short poem about machine learning observability."}
            ]
        }
    ],
    inferenceConfig={
        "maxTokens": 300,
        "temperature": 0.1,
        "topP": 0.9,
    }
)

# Process streaming response: text arrives in contentBlockDelta events
for chunk in response["stream"]:
    if "contentBlockDelta" in chunk:
        print(chunk["contentBlockDelta"]["delta"]["text"], end="", flush=True)

MLflow captures streaming responses by creating a span as the streaming chunks are consumed, combining them into a single trace that can be viewed in the MLflow UI.

python3 scripts/bedrock_streaming.py 
2025/03/28 12:43:57 INFO mlflow.bedrock: Enabled auto-tracing for Bedrock. Note that MLflow can only trace boto3 service clients that are created after this call. If you have already created one, please recreate the client by calling `boto3.client`.
2025/03/28 12:43:58 INFO mlflow.tracking.fluent: Experiment with name 'Bedrock-Streaming' does not exist. Creating a new experiment.
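
You can also fetch the captured traces programmatically instead of browsing the UI. A minimal sketch, assuming a recent MLflow version in which the mlflow.search_traces tracing API is available:

import mlflow

mlflow.set_tracking_uri("http://<your-alb-dns>")  # your MLflow server URL

# Look up the experiment created by the streaming script, then fetch its
# traces as a pandas DataFrame (one row per trace)
exp = mlflow.get_experiment_by_name("Bedrock-Streaming")
traces = mlflow.search_traces(experiment_ids=[exp.experiment_id])
print(traces.head())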

Observability in MLflow UI

Exploring Traces in the MLflow UI

After running these examples, you can navigate to your MLflow server URL (the AWS Application Load Balancer DNS name) to explore the captured traces:

  1. In the MLflow UI, browse to the experiments you created
  2. Click on an experiment to see individual runs
  3. Each run represents a single interaction with Bedrock
  4. Click on a run to view detailed information:
  • The prompt and completion
  • Model parameters (temperature, max_tokens, etc.)
  • Latency information
  • Streaming events (if applicable)
  • Function calls (if applicable)

The MLflow UI provides a comprehensive view of your model interactions, making it easy to track and compare different prompts and responses.
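
The same run data is queryable from code, which is handy for comparing prompts across experiments. A minimal sketch using mlflow.search_runs; the params.prompt column comes from the log_param call in the tracing script:

import mlflow

mlflow.set_tracking_uri("http://<your-alb-dns>")  # your MLflow server URL

# search_runs returns a pandas DataFrame; logged params and metrics are
# flattened into "params.*" and "metrics.*" columns
runs = mlflow.search_runs(experiment_names=["Bedrock-Token-Cost-Demo"])
print(runs[["run_id", "params.prompt"]].head())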

MLflow UI

MLflow Run Experiment

Model Metrics

MLflow Traces

Trace details

Production Considerations

For production deployments, consider these enhancements:

Auto Scaling

Add the following to your CDK stack to enable auto-scaling for your MLflow service:

const scalableTarget = mlflowService.service.autoScaleTaskCount({
  minCapacity: 1,
  maxCapacity: 5
});
scalableTarget.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 70,
  scaleInCooldown: cdk.Duration.seconds(60),
  scaleOutCooldown: cdk.Duration.seconds(60)
});

Security Enhancements

Use AWS Secrets Manager for PostgreSQL credentials:

const dbCredentials = new secretsmanager.Secret(this, 'PostgresCredentials', {
  secretName: `${projectName}-db-credentials`,
  generateSecretString: {
    secretStringTemplate: JSON.stringify({ username: 'mlflow' }),
    generateStringKey: 'password',
    excludePunctuation: true
  }
});

// Reference in container definition
const postgresContainer = postgresTaskDefinition.addContainer('PostgresContainer', {
  // ...
  secrets: {
    'POSTGRES_USER': ecs.Secret.fromSecretsManager(dbCredentials, 'username'),
    'POSTGRES_PASSWORD': ecs.Secret.fromSecretsManager(dbCredentials, 'password')
  }
});

Restrict access to your MLflow server:

// In your MLflow service definition: keep the ALB public, but don't let
// the pattern open the listener's security group to the world
const mlflowService = new ApplicationLoadBalancedFargateService(this, 'MLflowService', {
  // ...
  publicLoadBalancer: true,
  openListener: false
});

// Add ingress rule to only allow specific IP ranges
const lb = mlflowService.loadBalancer;
const lbsg = lb.connections.securityGroups[0];
lbsg.addIngressRule(
  ec2.Peer.ipv4('192.0.2.0/24'), // Replace with your IP range
  ec2.Port.tcp(80),
  'Allow access from corporate network only'
);

Troubleshooting Guide

Common issues and solutions:

MLflow Can’t Connect to PostgreSQL

If MLflow can’t connect to the PostgreSQL database:

  1. Check that security groups allow traffic on port 5432
  2. Verify that service discovery is properly configured
  3. Check if the PostgreSQL container is running
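
For example, you can test reachability from any container or host inside the same VPC. The hostname below is a hypothetical Cloud Map name; substitute the service-discovery name your stack actually registers:

import socket

# "postgres.mlflow.local" is a hypothetical service-discovery name --
# replace it with the Cloud Map name registered by your stack
with socket.create_connection(("postgres.mlflow.local", 5432), timeout=5):
    print("PostgreSQL is reachable on port 5432")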

Bedrock API Permissions

If you encounter permission issues when calling Bedrock:

  1. Verify that your IAM role has the necessary Bedrock permissions
  2. Check if you’re trying to use a model that’s not enabled in your account
  3. Make sure the region in your Bedrock client matches the region where the model is available
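
A quick sanity check for IAM permissions is to call the Bedrock control-plane API, which fails with an AccessDeniedException if your role lacks Bedrock permissions (note that per-model access enablement is managed separately in the Bedrock console):

import boto3

# The control-plane client is "bedrock"; inference uses "bedrock-runtime"
bedrock = boto3.client("bedrock", region_name="us-east-1")

for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["modelId"])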

MLflow UI Not Loading Traces

If traces don’t appear in the MLflow UI:

  1. Verify that mlflow.bedrock.autolog() is called before making Bedrock API calls
  2. Check if the tracking URI is set correctly
  3. Look for any exceptions in your Python script
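
The ordering in item 1 matters: as the log message earlier shows, MLflow can only trace boto3 clients created after autolog is enabled:

import boto3
import mlflow

mlflow.bedrock.autolog()  # enable tracing first...

# ...then create the client, so that its Bedrock calls are captured
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")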

Cleanup

To delete all AWS resources provisioned via CDK, run:

cdk destroy

This ensures that no unnecessary infrastructure is left running and helps avoid additional costs.

Conclusion

In this guide, we have implemented an MLflow-powered observability framework for generative AI, seamlessly integrated with Amazon Bedrock for model interaction tracking, cost analysis, and token usage monitoring. By deploying MLflow on ECS with a lightweight PostgreSQL backend, we’ve created a cost-effective solution for tracking and analyzing LLM interactions.

By using CDK, we’ve made the infrastructure deployment repeatable and maintainable, allowing you to easily version and update your observability platform as your needs evolve.
