DEV Community

Harish Aravindan

Serverless Bedrock: How I invoke Claude from Lambda in warrantyAI

Every week I ship a new piece of warrantyAI — an AI-powered warranty management system I'm building on AWS. This week was Week 8: a 3-agent LangGraph pipeline wired to Bedrock.

Before the agents could do anything, I needed one thing to work cleanly: invoking Claude from a Lambda function without a server, without a container fleet, without an inference endpoint sitting idle burning money.

From the Week 8 LinkedIn post (Harish Aravindan, linkedin.com):

Week 8 of building warrantyAI. For those seeing this for the first time: I'm a Senior Cloud Engineer building an AI-powered warranty management system on AWS — from scratch, one week at a time. No shortcuts. Real architecture. Real cost numbers.

This week: I wired 3 AI agents together using LangGraph.

The problem warrantyAI solves: most people lose track of their warranties. Appliances expire. Repairs get denied. Money is wasted. warrantyAI reads the document, classifies the risk, and reminds you before it's too late. This week I built the core pipeline that makes that happen: Reader → Classifier → Reminder.

📄 Reader Agent: Customer uploads a warranty PDF to S3. Textract pulls the raw text. Bedrock Haiku structures it into named fields: product, brand, expiry date, serial number.

🔍 Classifier Agent: Takes those fields and classifies the warranty. Haiku first, fast and cheap. If confidence drops below 70%, it automatically retries with Sonnet. The GovernanceShield guardrail (built in Week 7) runs on every invocation. Outputs: category, expiry date, risk level.

🔔 Reminder Agent: Reads the risk level from shared state and generates a human-readable notification via Haiku. Publishes to SNS, but only for medium and high risk. Low risk? Message generated, not sent. A deliberate FinOps decision: SNS isn't free at scale.

Repo: https://lnkd.in/gsndTpQV

What held this together: WarrantyState. One typed Python dict shared across all 3 agents. No message queues between agents. No shared database mid-pipeline. Each agent reads from it and writes back a partial update. LangGraph handles the sequencing.

What connected cleanly from previous weeks:
✔ Week 7 GovernanceShield guardrail — one import, plugged straight in
✔ Per-agent IAM roles already existed — zero new permissions needed
✔ S3 audit bucket already live — all 3 agents write to it

Building incrementally pays off. What's your multi-agent orchestration framework of choice right now?
#AIPlatformEngineering #LangGraph #AWSBedrock #warrantyAI #Serverless #AI

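The WarrantyState shared dict described in that LinkedIn post can be pictured as a plain typed dict (my sketch of the idea; the field names here are illustrative, not warrantyAI's actual schema):

```python
from typing import TypedDict

class WarrantyState(TypedDict, total=False):
    # Reader agent: structured fields extracted from the document
    product: str
    brand: str
    expiry_date: str
    serial_number: str
    # Classifier agent: classification results
    category: str
    risk_level: str
    confidence: float
    model_used: str
    # Reminder agent: generated notification text
    notification: str
```

With `total=False`, each agent can write its own partial update without being forced to populate fields owned by the others.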

Here's exactly how I did it.


Why serverless + Bedrock is the right combo

Bedrock's invoke_model API is synchronous and stateless. It takes a request, returns a response. That's exactly what Lambda is built for. No warm model, no GPU instance, no ECS cluster. You pay per invocation, per token.

For warrantyAI's workload — sporadic document uploads, not a real-time chat product — this matters. My entire system runs under $1.30/day.


The setup: IAM first, always

Before any code, the Lambda execution role needs this policy:

{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"
  ]
}

Scope it to specific model ARNs. Not *. Ever.
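For reference, that statement sits inside a standard IAM policy document. Here's the full shape (the Sid and wrapping are my illustration; the statement itself is unchanged):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowClaudeInvoke",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
        "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"
      ]
    }
  ]
}
```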


The invoke wrapper

This is the core function I reuse across all 3 agents in warrantyAI:

import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

HAIKU  = "anthropic.claude-haiku-4-5-20251001"
SONNET = "anthropic.claude-sonnet-4-6"

def invoke_bedrock(prompt: str, model_id: str = HAIKU, max_tokens: int = 512) -> str:
    """
    Invoke a Bedrock Claude model from Lambda.
    Returns the text response as a string.
    """
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()

That's it. Stateless, reusable, testable in isolation.
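To show how the wrapper slots into a Lambda entry point, here's a minimal handler sketch. The `invoke` parameter is injected purely so the handler can be unit-tested without AWS credentials; in the deployed function it would be the `invoke_bedrock` wrapper above. This wiring is my illustration, not warrantyAI's actual handler:

```python
import json

def handler(event, context, invoke=None):
    """Minimal Lambda entry point around an invoke function.

    `invoke` is injectable for local testing; in production it would
    be the invoke_bedrock wrapper defined above.
    """
    prompt = event.get("prompt", "")
    if not prompt:
        # Reject bad input before spending any Bedrock tokens
        return {"statusCode": 400, "body": json.dumps({"error": "missing prompt"})}
    text = invoke(prompt)
    return {"statusCode": 200, "body": json.dumps({"result": text})}
```

Swapping `invoke` for a stub in tests means the handler logic gets covered without a single real model call.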


Haiku-first, Sonnet fallback

Haiku is fast and cheap. Sonnet is accurate and expensive. In warrantyAI's Classifier agent, I try Haiku first. If it returns low confidence, I retry with Sonnet automatically:

def classify_warranty(structured_data: dict) -> dict:
    prompt = build_classify_prompt(structured_data)

    # Attempt 1: Haiku
    result = invoke_bedrock(prompt, model_id=HAIKU)
    parsed = json.loads(result)

    # Fallback: Sonnet if confidence < 0.7
    if parsed.get("confidence", 0) < 0.7:
        result = invoke_bedrock(prompt, model_id=SONNET)
        parsed = json.loads(result)
        parsed["model_used"] = "sonnet"
    else:
        parsed["model_used"] = "haiku"

    return parsed

In practice, Haiku handles ~85% of documents. Sonnet kicks in for complex commercial warranties with ambiguous clause structures.
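One sharp edge in the snippet above: `json.loads(result)` assumes the model returned bare JSON. Claude usually does when prompted for it, but occasionally wraps the object in prose or a markdown fence. A defensive parse helper (my own sketch, not part of warrantyAI's code) looks like:

```python
import json
import re

def parse_model_json(text: str) -> dict:
    """Best-effort JSON extraction from a model response.

    Tries a direct parse first, then falls back to grabbing the
    outermost {...} span in case the model added prose or fences.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if not match:
            raise ValueError("no JSON object found in model output")
        return json.loads(match.group(0))
```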


Three things that will burn you

1. The body is a StreamingBody, not a string.
Always call .read() before json.loads(). Forget this once and you'll spend 20 minutes confused.

# Wrong
body = json.loads(response["body"])

# Right
body = json.loads(response["body"].read())
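You can reproduce this gotcha locally without touching AWS: `io.BytesIO` stands in for the StreamingBody, since both expose `.read()` returning bytes.

```python
import io
import json

# io.BytesIO mimics the StreamingBody interface (.read() -> bytes)
fake_response = {"body": io.BytesIO(b'{"content": [{"type": "text", "text": "ok"}]}')}

# json.loads(fake_response["body"]) would raise TypeError here,
# just like it does with the real StreamingBody
body = json.loads(fake_response["body"].read())
text = body["content"][0]["text"]
```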

2. Payload limits on Lambda.
Lambda has a 6 MB synchronous payload limit. Bedrock responses are usually tiny, but if you're passing large documents in your prompt, chunk or truncate them first. I cap prompts at 4,000 characters in the Reader agent.
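The 4,000-character cap is simple to enforce before the prompt is built. This helper is a sketch of the idea; the Reader agent's actual chunking strategy may differ:

```python
MAX_PROMPT_CHARS = 4000  # hard cap applied before invoking Bedrock

def cap_prompt(document_text: str, instructions: str) -> str:
    """Prepend instructions and truncate the document so the whole
    prompt stays within MAX_PROMPT_CHARS."""
    budget = MAX_PROMPT_CHARS - len(instructions)
    return instructions + document_text[:max(budget, 0)]
```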

3. Bedrock is regional.
Not all models are available in all regions. ap-south-1 (Mumbai) supports Haiku and Sonnet. If you get a ResourceNotFoundException, check model availability in your region first before debugging your code.


Cost reality check

For warrantyAI's workload (roughly 50 documents/day):

| Model | Avg tokens/call | Cost/call | Daily cost |
| --- | --- | --- | --- |
| Haiku (every document) | ~800 | ~$0.0004 | ~$0.02 |
| Sonnet (~15% of documents) | ~800 | ~$0.006 | ~$0.045 |

Note that Haiku's daily figure covers all 50 documents, since the Classifier always tries Haiku first before escalating.

Total Bedrock cost: under $0.07/day for this workload.
The rest of my $1.30/day budget goes to Textract, SNS, and S3.


What's next

This pattern is the foundation for the entire warrantyAI pipeline. Next Sunday I'll cover how I wired these invocations into a LangGraph StateGraph — three agents, one shared state dict, no message queues.

Follow along if you're building serverless AI on AWS. I publish every Sunday on LinkedIn.

This is part of the Serverless Meets AI series — practical AWS patterns from building warrantyAI.
