Ifeanyi O. for AWS

Posted on • Originally published at builder.aws.com

I Built My Mom an AI Recipe Helper for Mother's Day

A project about a fridge, Strands agents & trying to keep up with Mom's seasons.

My Mom's Problem

My mom has a drawer full of recipes. Yes! Not a folder... an actual drawer. Torn pages from magazines, things written on the backs of envelopes, and binders that've been around longer than I have. Whenever she's looking for dinner ideas she pulls something out, sighs, and puts it back.

The recipe itself is never the issue; that part is always fine. It's that she's missing one ingredient... or four!

And even if she has everything, the recipe might not fit whatever season she's in at the moment because the other thing is, her eating is a moving target.

One month she's a vegetarian because a friend got her into it, a few months later she's watching her sugar because the doctor mentioned it during a checkup, then it's low sodium or she's "just trying to be a little better with dinners." You just never really know which version is going to be at the stove on a given Tuesday.
But the real question she's always asking, underneath the recipe hunt, is...

What could I make with what I already have, that fits how I'm eating right now, that I won't get bored with?

Now that sounds like a problem for an agent! So for Mother's Day this year, I built her one and called it AskMom Recipes. You can try it out here.

What it does in 30 seconds

You open a pink website and tell it what's in your kitchen. You can type it, or you can just take a photo of your groceries on the counter and drop it in.

You also have the option to pick how you're eating this week, e.g. no restriction, vegetarian, low sodium, diabetic-friendly or gluten-free. Then hit the "Suggest recipes" button.
A couple of seconds later, you'll get three recipes, and each one tells you:

  • What you already have vs. what you'd need to grab
  • Simple, unfussy steps (the kind of thing my mom actually cooks)
  • Why it's good for you, with real nutrition numbers from the USDA
  • One sentence about where the dish possibly comes from

If you don't love the recipes, you can tap "make it healthier" or "something quicker" or "fewer ingredients" and it adjusts on the fly.

That's it! The rest of this blog is how I built it and the design decisions that made the difference between "okay, cool demo that kinda works" and "something my mom would actually use."

Why an agent and not a prompt

An easy version of this would be to throw the ingredients into a single prompt, ask for three recipes, done. I tried it; it's fine, but there are problems that show up the moment you try to use it for real.
First, the model will happily make up nutrition numbers. Ask it how much protein is in chicken and it'll give you a number, confidently, and you'll have no idea whether it's right.

Second, it'll cheerfully tell you that tomatoes come from Italy or that pasta is a Chinese invention. Third, if you want to say "actually, make that healthier" the prompt has to start over from scratch because there's no memory.

An agent solves two of those three. As for the made-up facts, we needed something else, which we'll get to.

I used Strands Agents, which is a refreshingly simple Python SDK for building agents on Amazon Bedrock. You define your tools as functions, write a system prompt and Strands handles the loop of "model decides what tool to call, tool runs, result goes back to model, repeat."

It's the first agent framework I've used that didn't make me want to throw my laptop.

import os
from typing import Optional

from strands import Agent
from strands.models import BedrockModel


def build_agent(model_id: Optional[str] = None) -> Agent:

    model_id = model_id or os.environ.get(
        "BEDROCK_MODEL_ID", "anthropic.claude-3-haiku-20240307-v1:0"
    )
    region = os.environ.get("AWS_REGION", "us-east-1")

    model = BedrockModel(
        model_id=model_id,
        region_name=region,
        temperature=0.4,
    )

    return Agent(
        model=model,
        system_prompt=SYSTEM_PROMPT,
        tools=[
            extract_ingredients_from_image,
            extract_ingredients_from_text,
            suggest_recipes,
        ],
    )

With Strands, setting up the agent is almost anticlimactic. There's a model, a prompt and a list of tools. That's the whole thing.
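Invoking it is just as plain. Here's a sketch of how the Lambda might hand a request to the agent; build_agent is the function above, but the message wording and helper name are my assumptions, not the exact code in the repo:

```python
def build_user_message(ingredients_text: str, preference: str) -> str:
    """Compose the single message the agent loop starts from."""
    return (
        f"Ingredients on hand: {ingredients_text}\n"
        f"Dietary preference: {preference}\n"
        "Suggest three recipes that fit."
    )

# In the Lambda handler (needs AWS credentials and Bedrock access):
# agent = build_agent()
# result = agent(build_user_message("chicken, rice, some garlic", "low_sodium"))
```

A Strands agent is callable, so that one call runs the whole decide-call-repeat loop until the model is done.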

I want to flag one thing here, because it took me a few iterations to get right: the tools I gave the agent are not "one tool for everything." They're small, single-purpose and named like verbs.

The tools and why there aren't very many

Everyone goes tool-crazy when they first build an agent. Each tool feels free, so why not just add them all, right? Wrong!

The problem is that every tool call is another round trip through the model. The model says "I'd like to call X," you run X, hand the result back, it thinks for a second and says "now I'd like to call Y." Each hop is a few seconds, but add six tool calls and your recipe helper takes forty seconds and your API times out.

So I decided to split the work in two:

Things the agent handles (because they genuinely need a model's judgment):

  • Reading a photo and identifying what's in it
  • Normalizing messy text like "some chicken, rice, maybe garlic" into a clean list
  • Generating three healthy recipes that fit the user's preferences

Things Python handles (because they just need a lookup):

  • Looking up real nutrition numbers
  • Formatting a recipe card for the frontend

Python can fetch faster than the model can ask, so why let a language model guess when there's a real source? The USDA has an API with the actual nutrition numbers.

This one design decision cut request latency from ~34 seconds to ~17 seconds. You get the same quality in half the time with way less money spent on Bedrock tokens, and it made the agent more reliable: fewer retries and self-corrections that eat tokens for no reason.

The lesson is to use the LLM for the things that need reasoning and use code for the things that need answers.
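To make that concrete, here's roughly what the "code, not model" side looks like for nutrition. This is a hedged sketch against the public FoodData Central search endpoint; the helper names and the single-result page size are my assumptions:

```python
import json
import urllib.request
from urllib.parse import urlencode

USDA_SEARCH = "https://api.nal.usda.gov/fdc/v1/foods/search"


def usda_search_url(query: str, api_key: str, page_size: int = 1) -> str:
    """Build the FoodData Central search URL for one ingredient."""
    params = urlencode({"query": query, "pageSize": page_size, "api_key": api_key})
    return f"{USDA_SEARCH}?{params}"


def lookup_nutrients(query: str, api_key: str) -> dict:
    """Return {nutrient name: value} for the top search hit. No guessing."""
    with urllib.request.urlopen(usda_search_url(query, api_key)) as resp:
        data = json.load(resp)
    if not data.get("foods"):
        return {}
    return {
        n["nutrientName"]: n["value"]
        for n in data["foods"][0].get("foodNutrients", [])
        if "value" in n
    }
```

One plain HTTPS GET and a dict comprehension, and the number on the recipe card is the USDA's, not the model's.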

Preference handling

Alright, back to my mom for a second. The seasons thing, vegetarian one month and low sodium the next, taught me more about preference handling than I expected.

The naive approach is a freeform text field that asks you to type your dietary preferences. The problem is that the model has to figure out what "watching my sugar a bit" means versus "diabetic-friendly" versus "low-carb." Sometimes it guesses wrong, and sometimes the recipes it produces have way too much of exactly what you're trying to avoid.

So I made it a dropdown. Five options:

  • No restriction
  • Vegetarian
  • Low sodium
  • Diabetic-friendly
  • Gluten-free

Each one maps to a crisp, one sentence guidance block that gets injected into the recipe prompt:

_PREFERENCE_GUIDANCE = {
    "none": "No dietary restrictions. Focus on healthy, balanced meals.",
    "vegetarian": "Strictly vegetarian: no meat, poultry, or fish.",
    "low_sodium": "Low sodium: minimize salt and high-sodium ingredients like soy sauce or canned broths.",
    "diabetic_friendly": "Diabetic-friendly: low glycemic index, limit added sugars and refined carbs.",
    "gluten_free": "Strictly gluten-free: no wheat, barley, rye, or standard soy sauce.",
}

It's five lines of Python.
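For completeness, here's how a block like that can be spliced in at recipe time. A minimal sketch; the surrounding prompt text and function signature are invented for illustration:

```python
def recipe_prompt(ingredients: list[str], preference: str, guidance_map: dict) -> str:
    """Inject the one-sentence dietary guidance into the recipe request."""
    guidance = guidance_map.get(preference, guidance_map["none"])
    return (
        f"Suggest three healthy recipes using: {', '.join(ingredients)}.\n"
        f"{guidance}"
    )

# recipe_prompt(["chicken", "rice"], "low_sodium", _PREFERENCE_GUIDANCE)
```

An unknown key falls back to "none", so a stale frontend value can never silently drop the constraint logic.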

A dropdown sounds less impressive than "the AI understands your preferences," but the dropdown is more reliable and trust me, reliable is what my mom needs.

When it comes to agent patterns, the more you can constrain the inputs, the better the outputs. Freeform is cool in demos but more structure is better in real life.

The "send a photo of your groceries" part

I wanted to keep the typed-ingredients path simple, but the photo path is where it gets fun. You take a picture of whatever's on your counter and the agent reads the photo, identifies the food and treats it the same as if you'd typed it in.

Under the hood this is one of Strands' best party tricks. Claude 3 Haiku on Bedrock handles vision natively, so the whole thing is one tool call:

import base64
import json
import os
import re
from typing import List

import boto3
from strands import tool


@tool
def extract_ingredients_from_image(s3_key: str) -> List[str]:
    """Look at a photo of groceries and return the ingredients visible in it."""
    bucket = os.environ["UPLOADS_BUCKET_NAME"]
    region = os.environ.get("AWS_REGION", "us-east-1")

    # 1. Fetch image bytes from S3
    s3 = boto3.client("s3", region_name=region)
    obj = s3.get_object(Bucket=bucket, Key=s3_key)
    image_bytes = obj["Body"].read()
    media_type = obj.get("ContentType", "image/jpeg")

    # 2. Ask Bedrock Claude Haiku what's in the photo
    prompt = (
        "List only the food ingredients you can clearly see in this photo. "
        "Return a JSON array of short lowercase names, nothing else. "
        'Example: ["tomato", "onion", "chicken breast"]. '
        "If you can't identify any food, return []."
    )
    bedrock = boto3.client("bedrock-runtime", region_name=region)
    response = bedrock.invoke_model(
        modelId=os.environ["BEDROCK_MODEL_ID"],
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 400,
            "messages": [{
                "role": "user",
                "content": [
                    {"type": "image", "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": base64.standard_b64encode(image_bytes).decode(),
                    }},
                    {"type": "text", "text": prompt},
                ],
            }],
        }),
    )

    # 3. Pull the JSON array back out (defensively)
    text = json.loads(response["body"].read())["content"][0]["text"]
    match = re.search(r"\[.*?\]", text, re.DOTALL)
    if not match:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return [str(x).strip().lower() for x in items if str(x).strip()]

The prompt is boring on purpose. All it says is "List only the food ingredients you can clearly see in this photo. Return a JSON array of short lowercase names, nothing else." I did not give it a personality or tell it to "please think carefully." I also didn't give it examples of how the model should "reason." I just told it what I want and what format I want it in.

The model mirrors the register of your prompt. Ambitious prompts return ambitious, chatty and often wrong answers; literal prompts return pretty much what you asked for. For this tool, where the output feeds directly into the next step, literal wins every time.

Also, note that the photo gets uploaded straight from the browser to S3 via a pre-signed URL, so my Lambda never has to deal with multipart file uploads. The agent just gets an S3 key and fetches it.
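For reference, generating that pre-signed upload URL is a few lines of boto3. The key-naming scheme below is my assumption; only generate_presigned_url itself is the real API:

```python
import uuid


def make_upload_key(filename: str) -> str:
    """Collision-proof S3 key for a browser upload (naming scheme assumed)."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "jpg"
    return f"uploads/{uuid.uuid4().hex}.{ext}"


def presigned_upload_url(bucket: str, key: str, content_type: str = "image/jpeg") -> str:
    """Short-lived PUT URL the browser can upload the photo to directly."""
    import boto3  # needs AWS credentials at call time

    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": content_type},
        ExpiresIn=300,  # five minutes is plenty for one upload
    )
```

The browser PUTs the image to that URL, then sends only the S3 key to the API, so the Lambda never touches multipart uploads.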

The Architecture

Here's the whole thing, in the order a request flows:

  1. Browser loads static pink site from S3, served through CloudFront.
  2. User submits text and/or photo. If there's a photo, it goes straight to S3 via a pre-signed URL.
  3. Browser calls POST /ingredients on API Gateway, which hits a Lambda.
  4. Lambda builds a Strands agent with three tools, runs the agent loop.
  5. Agent extracts ingredients (text + vision), then calls suggest_recipes once.
  6. Lambda enriches each recipe in Python: origin lookup, USDA lookup and formatting.
  7. Session is saved to DynamoDB with a 24-hour TTL for the "refine" follow up.
  8. Recipes come back to the browser.
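Step 7 is the only state in the whole system. Here's a sketch of what that session record might look like; the table name and attribute names are my assumptions, but the TTL mechanism itself (an epoch-seconds number attribute that DynamoDB expires for you) is how DynamoDB TTL actually works:

```python
import json
import time


def session_item(session_id: str, recipes: list, ttl_hours: int = 24) -> dict:
    """DynamoDB item for one session, expiring automatically after ttl_hours."""
    return {
        "session_id": {"S": session_id},
        "recipes": {"S": json.dumps(recipes)},
        "expires_at": {"N": str(int(time.time()) + ttl_hours * 3600)},
    }

# Write (needs AWS credentials; the table's TTL must be enabled on "expires_at"):
# boto3.client("dynamodb").put_item(
#     TableName="askmom-sessions", Item=session_item(sid, recipes)
# )
```

That's what makes "make it healthier" possible: the follow-up request loads the saved recipes instead of starting from scratch, and stale sessions clean themselves up.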

The whole thing is deployed with AWS CDK in Python, one stack, one make deploy. No Docker needed, no hand-wiring and definitely no clicking around in the console.

What it costs me to keep it running

For my mom's personal deployment, answering a few requests a week:

  • Bedrock: Tenth of a penny per recipe request.
  • Lambda: Free.
  • DynamoDB: Free.
  • S3 + CloudFront: Couple cents a month for the static site.
  • USDA API: Free.

I pay less for AskMom per month than for a single coffee. If I published it and a hundred people used it regularly, I'd still be under a few dollars a month!

Try it yourself

If you want to build one for your mom (or your dad, or anyone whose fridge is a mystery), the whole thing is open source.

Repo: github.com/rextheking/askmom-recipes-sample-for-aws-strands

The README walks through the deploy in six steps. You need:

  • An AWS account with Bedrock access enabled for Claude 3 Haiku in us-east-1
  • Python 3.12, Node 20/22/24, AWS CDK CLI and the AWS CLI
  • A free USDA FoodData Central API key (takes 30 seconds to get)

Then it's:

cd infra
make install
make bootstrap     # one time per AWS account
make deploy        # ~5-7 minutes, most of it CloudFront
# paste the ApiUrl into web/config.js
cd ../web
make deploy
# open the CloudFront URL

Note: the first deploy takes a few minutes because CloudFront is slow to propagate. If you're just kicking the tires, there's a -c with_cloudfront=false flag that skips CloudFront and gets you a working API in about 90 seconds.

The Hug

If you build something for your mom this year, I'd love to see it. Tag me or share the repo of whatever weird thing you shipped.

Happy Mother's Day 💖

Credits

  • Strands Agents for making agent code actually pleasant to write
  • Amazon Bedrock + Claude 3 Haiku for the model
  • USDA FoodData Central for free, honest nutrition data
  • My mom, for the original problem
