DEV Community: Tidding Ramsey

I Used Amazon Bedrock as My AI Coding Partner for a Day Here's What Happened

Tidding Ramsey — Wed, 27 May 2026 11:31:12 +0000

How generative AI on AWS helped me summarize feedback, squash bugs, and write better Python without leaving the console.

Introduction

I recently completed a hands-on lab using Amazon Bedrock, AWS's managed generative AI service, and I came away genuinely impressed. Not just by what the AI could do, but by how quickly it slotted into real development workflows.

In this article, I'll walk you through what I learned: from touring the Bedrock console to using it as a live coding assistant. Whether you're an AWS veteran or just curious about practical AI tooling, I think there's something here for you.

What Is Amazon Bedrock?

Amazon Bedrock is a fully managed AWS service that gives you on-demand access to large language models (LLMs) without having to provision servers, manage infrastructure, or train models from scratch.

It comes in two flavors:

Serverless models : Fully managed foundation models from AWS and partner AI companies (like Anthropic, Meta, Mistral, and others)
Marketplace models : Over 100 specialized models deployed on managed Amazon SageMaker endpoints

One thing that stood out to me: Bedrock isn't just a chat interface. It's a full AI application platform with tools for:

Agents : Automate tasks by connecting LLMs to APIs and data sources

Flows : Chain Bedrock tools and AWS services into end-to-end AI pipelines

Knowledge Bases : Upload your own docs and build Q&A bots grounded in real content

Prompt Management : Version, test, and reuse prompts across multiple applications

For this lab, I focused on the Chat / Text playground using the Amazon Nova Micro model.

Part 1: Summarizing Messy, Unstructured Feedback

The first task felt immediately practical: summarize a wall of rambling customer feedback and extract actionable improvements.

The prompt structure I used was simple but effective:

Summarize the following feedback and produce action points for fixes and improvements:

Separating the instruction from the data (with a blank line) made the prompt cleaner both for me to read and, arguably, for the model to parse.

The feedback itself was a fictional but realistic stream-of-consciousness review of a car parts app. Full of metaphors, colloquialisms, and run-on sentences. The kind of thing you'd actually get from a user interview.

What the model returned: A clean, structured list of positives and improvement suggestions search UX, cart flow, missing features like a compatibility wizard and saved parts lists, shipping cost transparency, and live support access.

Then I pushed further:

List the top three improvements that would likely have the biggest impact on customer satisfaction.

No need to repeat the context. The model remembered the conversation and narrowed the list down intelligently.

Key takeaway: Bedrock excels at distilling unstructured human language into structured, actionable output. This is genuinely useful for product teams doing requirements gathering or user research synthesis.

Part 2: Using AI as a Coding Assistant

This is where things got interesting for me as a developer.

Fixing a ZeroDivisionError

I started with a simple Python function:

def divide(x, y):
    return x / y

Calling divide(10, 0) throws a ZeroDivisionError. I asked Bedrock:

Add exception handling for dividing by zero to this Python3 function:

def divide(x, y):
    return x / y

The model returned a version using try/except blocks, specifically catching ZeroDivisionError and returning a useful message instead of crashing. Clean, idiomatic Python.

A small but important habit I developed: always specify the language version. Python 2 and Python 3 differ significantly. The more context you give the model, the more relevant the output.

Improving the Fibonacci Algorithm

Next, I worked with a classic recursive Fibonacci implementation:

def fibonacci(n):
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

This works but it's slow. The time complexity is O(2^n), which means for large values of n, it becomes unusably slow very quickly.

I asked Bedrock to evaluate the performance and suggest improvements. The model came back with two solid suggestions:

Memoization : Cache already-computed values so they're not recalculated
Iterative approach : Replace recursion with a loop, dropping complexity to O(n)

I then asked Bedrock to directly compare the time complexity of both approaches. It explained the difference clearly — the recursive version re-computes the same values exponentially, while the iterative version computes each value exactly once.

Key takeaway: Bedrock is genuinely useful for algorithm analysis. Asking "can this be improved?" and "compare these two implementations" are high-value prompts for any developer trying to write more performant code.

Part 3: Understanding and Testing Unfamiliar Code

The final section was perhaps the most practically useful for day-to-day work: using AI to understand code you didn't write.

Take this function:

def middle(arr):
    while len(arr) > 1:
        del arr[len(arr) // 2]
    return arr[0]

At a glance, it's not obvious what this does. I asked Bedrock to describe it:

Describe how the following Python 3 code works:

def middle(arr):
    ...

The model walked through it step-by-step. It also flagged some issues — notably that the function mutates the original list (a side effect most callers won't expect), and that there's no error handling.

From there, I used an improved version of the function and asked Bedrock to generate a unit test:

Generate a unit test for the following Python function using Python's built-in unittest module.
The test class should have one test that tests that None is returned if the argument is None.

def find_middle_element(arr):
    ...

The model produced a proper unittest.TestCase class, which I dropped into a file in VS Code and ran immediately. It passed.

Bonus: Generating Test Data

One underrated use case generating fake data for testing:

Generate some test data for users that includes name, address, phone number, 
and widget order history. Display the test data in the JSON data format.

Within seconds, I had realistic-looking JSON I could plug straight into tests or seed scripts. The model can also format this as YAML or TOML if you prefer.

Bedrock Pricing: What You Should Know

Bedrock prices by tokens chunks of text (roughly words or word fragments) processed as input and output.

Two pricing models are available:

Model	Best for
On-Demand	Light, infrequent, or unpredictable usage
Provisioned Throughput	Predictable, high-volume, or production use

The chat playground also shows you latency and token counts in real time useful for estimating costs before committing to a production integration.

Model Customization (If You Need It)

Out of the box, foundation models are general-purpose. But Bedrock also supports three customization approaches:

Fine-tuning : Adjust tone, verbosity, or vocabulary using labeled examples
Distillation : Transfer knowledge from a large model to a smaller, faster one (up to 500% faster, 75% cheaper)
Pre-training : Expose a model to your domain-specific data corpus

For most developer use cases, you won't need customization — the base models are powerful and flexible. But it's good to know the option exists.

Things to Watch Out For

A few honest caveats from working with Bedrock:

Hallucinations are real. LLMs sometimes generate confident-sounding output that's just wrong. Always review generated code before running it.

Responses aren't deterministic. The same prompt can yield different outputs each time. Don't expect bit-for-bit reproducibility.

Context windows are finite. Each model has a maximum context length. For very long codebases or documents, you may need to chunk your input across multiple prompts.

Prompt engineering matters. Adding context (e.g., "You are an expert AWS cloud engineer") meaningfully improves response quality. Being specific about language versions, libraries, and constraints helps too.

Final Thoughts

Amazon Bedrock made it easy to slot AI into workflows I already use — reviewing code, writing tests, making sense of user feedback. Nothing about it felt like magic. It felt like a well-calibrated tool that rewards thoughtful prompting.

If you're on AWS and haven't explored Bedrock yet, the Chat / Text playground is a zero-friction starting point. Pick a model, type a prompt, see what happens. The learning curve is low. The upside is real.

Have you used Amazon Bedrock or another LLM in your development workflow? I'd love to hear what worked (and what didn't) drop a comment below.

Building My First AI Agent with Strands SDK and Amazon Bedrock Errors, Fixes & Lessons Learned

Tidding Ramsey — Sat, 09 May 2026 21:24:26 +0000

Introduction

I recently attended an AWS event where we built our first AI agent using the Strands Agents SDK and Amazon Bedrock. The quickstart guide looked simple enough — a few lines of Python, some tools, and a running agent. But the real learning happened in the errors. This article walks you through what I built, every error I hit, and exactly how I fixed them.

What We Built

A simple AI agent that can:

Tell you the current time
Perform calculations
Count letters in a word

Three tools. One agent. Sounds easy. It wasn't — but that's what made it worth writing about.

Project Structure

Here's the folder structure I used:

Agent/
├── .venv/
├── agent.py
└── requirements.txt

Setting Up the Environment

First, create and activate a virtual environment:

python -m venv .venv
.venv\Scripts\Activate.ps1   # Windows PowerShell

Then install the required packages:

pip install strands-agents strands-agents-tools

The Agent Code (Starting Point)

The quickstart gave us this agent.py:

from strands import Agent, tool
from strands_tools import calculator, current_time

@tool
def letter_counter(word: str, letter: str) -> int:
    """
    Count occurrences of a specific letter in a word.

    Args:
        word (str): The input word to search in
        letter (str): The specific letter to count

    Returns:
        int: The number of occurrences of the letter in the word
    """
    if not isinstance(word, str) or not isinstance(letter, str):
        return 0
    if len(letter) != 1:
        raise ValueError("The 'letter' parameter must be a single character")
    return word.lower().count(letter.lower())

agent = Agent(tools=[calculator, current_time, letter_counter])

message = """
I have 3 requests:
1. What is the time right now?
2. Calculate 3111696 / 74088
3. Tell me how many letter R's are in the word "strawberry" 🍓
"""

agent(message)

Simple and clean. Then I ran it. And the errors began.

Error 1: Anthropic Use Case Form Not Submitted

botocore.errorfactory.ResourceNotFoundException: 
Model use case details have not been submitted for this account.
Fill out the Anthropic use case details form before using the model.
└ Model id: global.anthropic.claude-sonnet-4-6

What happened: The Strands SDK defaults to Amazon Bedrock with Claude Sonnet 4. But Anthropic requires first-time users to submit a use case form before accessing their models on Bedrock.

How I fixed it:

Went to the AWS Bedrock Console in us-east-1
Navigated to Model Catalog and searched for Claude Sonnet 4.6
Clicked the model and hit "Submit use case details"
Filled out the form:
- Company name, website, industry
- Intended users: Internal
- Use case: "Building an AI agent for learning and demonstration purposes at an AWS event using the Strands SDK and Amazon Bedrock"
Submitted and waited ~15 minutes

Once the "Submit use case details" button disappeared and "Open in playground" appeared, I knew it was approved.

Error 2: Missing Dependency for AWS Login

botocore.exceptions.MissingDependencyException: 
Using the login credential provider requires an additional dependency.
You will need to pip install "botocore[crt]" before proceeding.

What happened: The AWS credential provider I was using needed an extra C extension package called awscrt.

How I fixed it:

pip install "botocore[crt]"

That installed awscrt-0.32.2 and resolved the issue immediately.

Error 3: Wrong Model ID — ValidationException

botocore.errorfactory.ValidationException: 
The provided model identifier is invalid.
└ Model id: us.anthropic.claude-sonnet-4-6-20251031-v1:0

What happened: I had added a BedrockModel to my code with a guessed model ID, but it wasn't valid for my account.

How I fixed it: I ran this command to list all valid inference profile IDs available to my account:

aws bedrock list-inference-profiles --region us-east-1 --profile tidding --query "inferenceProfileSummaries[?contains(inferenceProfileId, 'anthropic')].inferenceProfileId"

It returned a list including:

"us.anthropic.claude-sonnet-4-6"

That was the correct ID. No version suffix needed.

The Final Working Code

After all the fixes, here is the final agent.py:

from strands import Agent, tool
from strands.models import BedrockModel
from strands_tools import calculator, current_time

# Define a custom tool using the @tool decorator
@tool
def letter_counter(word: str, letter: str) -> int:
    """
    Count occurrences of a specific letter in a word.

    Args:
        word (str): The input word to search in
        letter (str): The specific letter to count

    Returns:
        int: The number of occurrences of the letter in the word
    """
    if not isinstance(word, str) or not isinstance(letter, str):
        return 0
    if len(letter) != 1:
        raise ValueError("The 'letter' parameter must be a single character")
    return word.lower().count(letter.lower())

# Specify the Bedrock model explicitly using the correct inference profile ID
model = BedrockModel(
    model_id="us.anthropic.claude-sonnet-4-6",
    region_name="us-east-1"
)

# Create the agent with tools
agent = Agent(model=model, tools=[calculator, current_time, letter_counter])

# Ask the agent to handle multiple tasks
message = """
I have 3 requests:
1. What is the time right now?
2. Calculate 3111696 / 74088
3. Tell me how many letter R's are in the word "strawberry" 🍓
"""

agent(message)

Run it with:

$env:AWS_PROFILE = "tidding"
python agent.py

Key Lessons Learned

1. Always submit the Anthropic use case form first. It's a one-time requirement per AWS account and blocks everything until done.

2. Don't guess model IDs. Use aws bedrock list-inference-profiles to get the exact valid ID for your account.

3. The default Strands model ID may not match what your account supports. Always specify the model explicitly using BedrockModel.

4. botocore[crt] is required when using the AWS login credential provider on Windows. Install it early.

5. Set your AWS profile before running the agent:

$env:AWS_PROFILE = "yourprofile"

What the Agent Loop Looks Like

Once running, the Strands agent follows this loop:

Input → Reasoning (LLM) → Tool Selection → Tool Execution → Back to Reasoning → Response

The agent automatically decides which tools to use based on your message. For our three requests, it fired current_time, calculator, and letter_counter — all in one go.

Conclusion

The Strands SDK makes building AI agents genuinely simple — once you get past the AWS setup hurdles. The errors I faced were all configuration-related, not code-related. Once the environment was right, the agent worked beautifully in just a few lines of Python.

If you're attending an AWS event and hitting these same errors, I hope this saves you time. Drop a comment if you're stuck — happy to help!

Built at an AWS Event · Strands Agents SDK · Amazon Bedrock · Claude Sonnet 4.6

How I Evaluated an AI Model on AWS Without Writing a Single Line of Training Code

Tidding Ramsey — Sat, 09 May 2026 13:26:57 +0000

A step-by-step guide to Amazon Bedrock's model evaluation feature from S3 setup to reading real results

Ever wondered whether the AI model you're about to plug into your production system actually knows what it's doing? Me too. That's exactly what Amazon Bedrock's model evaluation feature is built for and after running through it myself, I'm genuinely impressed at how accessible it is.

No PhD. No GPU clusters. No tears. Just AWS, an S3 bucket, and a few JSON prompts.

Let's walk through the whole thing start to finish.

What Even Is Amazon Bedrock?

Amazon Bedrock is AWS's managed Generative AI service. Instead of spending months training, hosting, and scaling foundation models yourself, Bedrock lets you call them like an API. Think of it as the "serverless" moment for AI — the infrastructure complexity disappears and you focus on what actually matters: using the models.

One of its best-kept features is model evaluation a way to run a model against a set of prompts, compare its responses to expected answers, and get a performance score back. It's perfect for building confidence before you commit a model to your workflow.

Here's what we're going to build today:

Prompt Dataset (S3) → Bedrock Evaluation Job → Results (S3) → Insights
The before architecture

The after architecture

Step 1: Log In and Orient Yourself

Head to the AWS Management Console and sign in. Make sure you're in the US West (Oregon) / us-west-2 region model evaluation support varies by region and you want to be where the models live.

Once you're in, you'll use two core services today:

Amazon S3 — to store your prompt dataset and receive evaluation results
Amazon Bedrock — to run the actual evaluation job

Step 2: Create Your S3 Buckets

You need two S3 buckets — one to hold your prompt dataset, another to receive evaluation results. Let's create both from scratch.
In the AWS search bar, type S3 and open the service. Click Create bucket.

Bucket 1: The Prompt Dataset Bucket
This bucket holds the questions you'll throw at the model.
Click Create bucket
Give it a name — something like bedrock-prompt-dataset-yourname-2026. S3 bucket names must be globally unique across all of AWS, so add something personal or random to the end
Make sure the region is set to us-west-2 (Oregon)
Leave everything else as default keep Block all public access enabled
Click Create bucket
Bucket 2: The Output Bucket
This is where Bedrock will write the evaluation results.
Click Create bucket again
Name it something like bedrock-eval-output-yourname-2025
Same region: us-west-2
Leave defaults, click Create bucket

You should now see both buckets in your S3 console.
Build Your Prompt Dataset

What's in the Prompt Dataset?

The prompt dataset is a .jsonl file (one JSON object per line) where each object has three fields:

json
{"prompt": "The chemical symbol for gold is", "category": "Chemistry", "referenceResponse": "Au"}
{"prompt": "The tallest mountain in the world is", "category": "Geography", "referenceResponse": "Mount Everest"}
{"prompt": "The author of 'Great Expectations' is", "category": "Literature", "referenceResponse": "Charles Dickens"}

Notice the structure:

prompt — what you'll send to the model
referenceResponse — the ground truth you're checking against
category — for grouping results later

In production, you'd replace these general-knowledge questions with prompts that mirror your real use case. Customer support queries. Code generation tasks. Medical summaries. Whatever you're building for.

Add a CORS Configuration to the Dataset Bucket

Bedrock needs cross-origin access to read from your S3 bucket. Here's how to enable it:

Click into your prompt dataset bucket
Go to the Permissions tab
Scroll to Cross-origin resource sharing (CORS)
Click Edit and paste this config: json

[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": ["Access-Control-Allow-Origin"]
  }
]

Click Save changes

That's it. This tells S3: "Yes, Amazon Bedrock is allowed to read from me." Without this, the evaluation job will fail silently — so don't skip it.

Pro tip: In a real production setup, you'd tighten the AllowedOrigins to specific Bedrock endpoints rather than using "*". For now, this gets us moving.

Step 3: Create the Model Evaluation Job

Back to the AWS search bar — type Bedrock and open Amazon Bedrock.

Expand the left-hand menu (hamburger icon, top-left)
Under Assess, click Evaluations

Click Create Automatic: Programmatic

Now fill in the job configuration:

Model Evaluation Details

Evaluation name: Something unique like my-eval-job-abc123
Model provider: Amazon
Model: Nova Micro
Task type: Question and answer

Metrics

Click Remove on any extra metrics until only one remains. Set it to Accuracy. This metric compares the model's response against your referenceResponse and returns a score.

Prompt Dataset

Select Use your own prompt dataset and enter your S3 path:

s3://your-prompt-dataset-bucket-name/prompt_dataset.json

Evaluation Results

Point this to your output bucket:

s3://your-output-bucket-name/evaluation-results/

IAM Role

Bedrock needs a role to access your S3 buckets on its behalf. Let's create one real quick.
In a new browser tab, go to IAM → Roles → Create role and follow these steps:

Trusted entity Select AWS service, then under Use case search for and select Bedrock. Click Next.
Permissions Attach these two policies: AmazonBedrockFullAccess AmazonS3FullAccess Click Next.
Name and create Name the role something like bedrock-eval-role, then click Create role. Back on the Bedrock evaluation page, under Amazon Bedrock IAM role, select Use an existing role, click the dropdown, and pick bedrock-eval-role.

*In production you'd scope the S3 policy down to only your two specific buckets — but for getting started, AmazonS3FullAccess does the job.

Click Create and watch the job appear with status In progress.

Step 4: Read the Results (This Is the Fun Part)

Once the job completes, head back to S3 and look inside your output bucket under evaluation-results/. You'll find a .jsonl file with one result per prompt. Here's what the raw output looks like:

{
  "automatedEvaluationResult": {
    "scores": [{"metricName": "Builtin.Accuracy", "result": 0.0625}]
  },
  "inputRecord": {
    "prompt": "The chemical symbol for gold is",
    "referenceResponse": "Au",
    "category": "Chemistry"
  },
  "modelResponses": [{
    "response": "The chemical symbol for gold is Au.",
    "modelIdentifier": "us.amazon.nova-micro-v1:0",
    "stopReason": "end_turn"
  }]
}

Breaking Down the Accuracy Scores

Here's a summary of the three prompts from our run:

Looking at the three prompts from our run, the Chemistry question ("The chemical symbol for gold is") scored 0.0625, the Geography question ("The tallest mountain in the world is") came in slightly higher at 0.0870, and the Literature question ("The author of 'Great Expectations' is") landed at 0.0727. All three were answered correctly — Au, Mount Everest, and Charles Dickens respectively yet the scores are nowhere near 1.0.

Wait These Scores Look Low. Is That Bad?

Here's where it gets interesting. The accuracy scores seem low because the scoring algorithm is doing token-level matching between the model's verbose answer and the short reference response.

The model answered correctly in all three cases it said "Au", "Mount Everest", and "Charles Dickens". But it also said a lot of other things (it explained its reasoning step-by-step). Those extra tokens pulled the accuracy score down.

This is a critical lesson: how you write your prompts and reference responses dramatically affects your scores. If you want higher accuracy scores:

// Instead of this reference response:
{"referenceResponse": "Au"}

// Try instructing the model to answer concisely in your prompt:
{"prompt": "Answer in one word only. The chemical symbol for gold is:", "referenceResponse": "Au"}

That's the value of model evaluation — it surfaces these kinds of nuances before you go to production.

What's Next?

Now that you understand the pipeline, here's how to level it up:

Swap models — Run the same dataset against Nova Lite, Nova Pro, or even Claude models to compare them head-to-head
Use real prompts — Replace the sample dataset with 50-100 prompts from your actual use case
Automate — Trigger evaluation jobs via the AWS CLI or SDK as part of your CI/CD pipeline
Track over time — Save scores to a database and chart model performance as you update prompts or switch models

Quick Recap

Here's the full flow in one breath:

Upload a .jsonl prompt dataset to S3
Add a CORS config to the S3 bucket so Bedrock can read it
Create a Bedrock model evaluation job pointing at Nova Micro
Wait for it to run, then read the .jsonl results in your output bucket
Interpret the accuracy scores in context — verbose model answers score lower even when correct

Amazon Bedrock's model evaluation feature removes one of the biggest unknowns in AI integration: "Can this model actually answer my questions reliably?" Now you have a repeatable, automated answer.

Go build something confident.

Have questions or want to share your evaluation results? Drop them in the comments below.