Every week I ship a new piece of warrantyAI — an AI-powered warranty management system I'm building on AWS. This week was Week 8: a 3-agent LangGraph pipeline wired to Bedrock.
Before the agents could do anything, I needed one thing to work cleanly: invoking Claude from a Lambda function without a server, without a container fleet, without an inference endpoint sitting idle burning money.
Here's exactly how I did it.
Why serverless + Bedrock is the right combo
Bedrock's invoke_model API is synchronous and stateless. It takes a request, returns a response. That's exactly what Lambda is built for. No warm model, no GPU instance, no ECS cluster. You pay per invocation, per token.
For warrantyAI's workload — sporadic document uploads, not a real-time chat product — this matters. My entire system runs under $1.30/day.
The setup: IAM first, always
Before any code, the Lambda execution role needs this policy:
```json
{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"
  ]
}
```
Scope it to specific model ARNs. Not *. Ever.
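If you manage the role in code, the statement above can be templated so the allowed model list lives in one place. This is a small sketch, not warrantyAI's actual setup — `build_bedrock_invoke_policy` is a hypothetical helper, and the `Version`/`Statement` wrapper is just the standard IAM policy envelope:

```python
import json


def build_bedrock_invoke_policy(region: str, model_ids: list[str]) -> str:
    """Render a least-privilege Bedrock invoke policy scoped to specific models.

    Hypothetical helper — it only templates the statement shown above.
    """
    statement = {
        "Effect": "Allow",
        "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream",
        ],
        # Foundation-model ARNs have no account ID, hence the double colon.
        "Resource": [
            f"arn:aws:bedrock:{region}::foundation-model/{mid}"
            for mid in model_ids
        ],
    }
    return json.dumps({"Version": "2012-10-17", "Statement": [statement]}, indent=2)
```

Attach the rendered JSON to the Lambda execution role however you deploy — inline policy, Terraform, or CDK.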
The invoke wrapper
This is the core function I reuse across all 3 agents in warrantyAI:
```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

HAIKU = "anthropic.claude-haiku-4-5-20251001"
SONNET = "anthropic.claude-sonnet-4-6"


def invoke_bedrock(prompt: str, model_id: str = HAIKU, max_tokens: int = 512) -> str:
    """
    Invoke a Bedrock Claude model from Lambda.
    Returns the text response as a string.
    """
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()
```
That's it. Stateless, reusable, testable in isolation.
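"Testable in isolation" is easy to demonstrate with a stubbed client: botocore returns the response body as a stream, and `io.BytesIO` imitates that well enough for a unit test. One assumption here — the function takes the client as a parameter (a small refactor of the wrapper above) purely so the test can inject the stub:

```python
import io
import json
from unittest.mock import MagicMock


def invoke_bedrock(client, prompt: str, model_id: str, max_tokens: int = 512) -> str:
    # Same logic as the wrapper above, with the client injected for testability.
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()


# Stub the runtime client: invoke_model returns a StreamingBody-like object.
fake_client = MagicMock()
fake_client.invoke_model.return_value = {
    "body": io.BytesIO(json.dumps({
        "content": [{"type": "text", "text": "  APPROVED  "}]
    }).encode())
}

assert invoke_bedrock(fake_client, "classify this", "anthropic.claude-haiku-4-5-20251001") == "APPROVED"
```

No AWS credentials, no network, sub-second test runs.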
Haiku-first, Sonnet fallback
Haiku is fast and cheap. Sonnet is accurate and expensive. In warrantyAI's Classifier agent, I try Haiku first. If it returns low confidence, I retry with Sonnet automatically:
```python
def classify_warranty(structured_data: dict) -> dict:
    prompt = build_classify_prompt(structured_data)

    # Attempt 1: Haiku
    result = invoke_bedrock(prompt, model_id=HAIKU)
    parsed = json.loads(result)

    # Fallback: Sonnet if confidence < 0.7
    if parsed.get("confidence", 0) < 0.7:
        result = invoke_bedrock(prompt, model_id=SONNET)
        parsed = json.loads(result)
        parsed["model_used"] = "sonnet"
    else:
        parsed["model_used"] = "haiku"

    return parsed
```
In practice, Haiku handles ~85% of documents. Sonnet kicks in for complex commercial warranties with ambiguous clause structures.
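One fragile spot in a flow like this is the bare `json.loads` on model output: Claude occasionally wraps its JSON in prose or a code fence, and a single stray token crashes the agent. A hedged guard — `parse_model_json` is a hypothetical helper, not part of the snippet above — is to slice from the first `{` to the last `}` before giving up:

```python
import json


def parse_model_json(raw: str) -> dict:
    """Best-effort parse of a model reply that should be a JSON object.

    Hypothetical guard: if the reply isn't pure JSON, slice from the first
    '{' to the last '}' and retry; re-raise if no object is found.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(raw[start:end + 1])
```

A stricter alternative is to re-prompt the model on parse failure, but the slice handles the common "Sure! Here's the JSON:" case for free.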
Three things that will burn you
1. The body is a StreamingBody, not a string.
Always call .read() before json.loads(). Forget this once and you'll spend 20 minutes confused.
```python
# Wrong
body = json.loads(response["body"])

# Right
body = json.loads(response["body"].read())
```
2. Lambda payload limits.
Lambda caps synchronous invocation payloads at 6 MB, request and response alike. Bedrock responses are usually tiny, but if you're passing large documents into your prompt, chunk them first. I cap prompts at 4,000 characters in the Reader agent.
3. Bedrock is regional.
Not all models are available in all regions. ap-south-1 (Mumbai) supports Haiku and Sonnet. If you get a ResourceNotFoundException, check model availability in your region before debugging your code.
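You can do that check programmatically with `list_foundation_models`, which lives on the Bedrock control-plane client (`bedrock`, not `bedrock-runtime`). The helper below is a hypothetical convenience, written to take the client as a parameter so it can be exercised without AWS credentials:

```python
def model_available(bedrock_client, model_id: str) -> bool:
    """Return True if the model is listed in the client's region.

    Hypothetical helper: pass boto3.client("bedrock") — the control-plane
    client, not "bedrock-runtime", which has no list_foundation_models.
    """
    summaries = bedrock_client.list_foundation_models()["modelSummaries"]
    return any(s["modelId"] == model_id for s in summaries)
```

Calling `model_available(boto3.client("bedrock"), HAIKU)` at deploy time turns a confusing runtime exception into an explicit preflight failure.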
Cost reality check
For warrantyAI's workload (roughly 50 documents/day):
| Model | Avg tokens/call | Cost/call | Daily cost |
|---|---|---|---|
| Haiku | ~800 | ~$0.0004 | ~$0.017 |
| Sonnet (15% of calls) | ~800 | ~$0.006 | ~$0.045 |

Total Bedrock cost: roughly $0.06/day for this workload (about 7.5 Sonnet calls at ~$0.006 each dominate it).
The rest of my $1.30/day budget goes to Textract, SNS, and S3.
What's next
This pattern is the foundation for the entire warrantyAI pipeline. Next Sunday I'll cover how I wired these invocations into a LangGraph StateGraph — three agents, one shared state dict, no message queues.
Follow along if you're building serverless AI on AWS. I publish every Sunday on LinkedIn.
This is part of the Serverless Meets AI series — practical AWS patterns from building warrantyAI.