<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tidding Ramsey</title>
    <description>The latest articles on DEV Community by Tidding Ramsey (@tidding).</description>
    <link>https://dev.to/tidding</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1451266%2Ff7e98406-f676-49ad-9e4f-74885e7450e6.jpeg</url>
      <title>DEV Community: Tidding Ramsey</title>
      <link>https://dev.to/tidding</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tidding"/>
    <language>en</language>
    <item>
      <title>I Used Amazon Bedrock as My AI Coding Partner for a Day Here's What Happened</title>
      <dc:creator>Tidding Ramsey</dc:creator>
      <pubDate>Wed, 27 May 2026 11:31:12 +0000</pubDate>
      <link>https://dev.to/tidding/i-used-amazon-bedrock-as-my-ai-coding-partner-for-a-day-heres-what-happened-3ga4</link>
      <guid>https://dev.to/tidding/i-used-amazon-bedrock-as-my-ai-coding-partner-for-a-day-heres-what-happened-3ga4</guid>
      <description>&lt;p&gt;&lt;em&gt;How generative AI on AWS helped me summarize feedback, squash bugs, and write better Python  without leaving the console.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I recently completed a hands-on lab using &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;, AWS's managed generative AI service, and I came away genuinely impressed. Not just by what the AI could do, but by how quickly it slotted into real development workflows.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk you through what I learned: from touring the Bedrock console to using it as a live coding assistant. Whether you're an AWS veteran or just curious about practical AI tooling, I think there's something here for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Amazon Bedrock?
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock is a fully managed AWS service that gives you on-demand access to large language models (LLMs)  without having to provision servers, manage infrastructure, or train models from scratch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5xmmlr7j51zkubu6rwm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5xmmlr7j51zkubu6rwm.png" alt=" " width="691" height="189"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It comes in two flavors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Serverless models&lt;/strong&gt; : Fully managed foundation models from AWS and partner AI companies (like Anthropic, Meta, Mistral, and others)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Marketplace models&lt;/strong&gt; : Over 100 specialized models deployed on managed Amazon SageMaker endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmhm84b0698hfmemqtv4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzmhm84b0698hfmemqtv4.png" alt=" " width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqdkih2croi4dn0kv1yl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqqdkih2croi4dn0kv1yl.png" alt=" " width="800" height="388"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One thing that stood out to me: Bedrock isn't just a chat interface. It's a full &lt;strong&gt;AI application platform&lt;/strong&gt; with tools for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt; : Automate tasks by connecting LLMs to APIs and data sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2yq4jq82gej18015dem.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl2yq4jq82gej18015dem.png" alt=" " width="799" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flows&lt;/strong&gt; : Chain Bedrock tools and AWS services into end-to-end AI pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F287gw99097koqq0c45tq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F287gw99097koqq0c45tq.png" alt=" " width="800" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge Bases&lt;/strong&gt; : Upload your own docs and build Q&amp;amp;A bots grounded in real content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2xsys5bepsqsi7s1f8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2xsys5bepsqsi7s1f8u.png" alt=" " width="763" height="429"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Management&lt;/strong&gt; : Version, test, and reuse prompts across multiple applications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcinva6an2jsobhdtdbvd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcinva6an2jsobhdtdbvd.png" alt=" " width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this lab, I focused on the &lt;strong&gt;Chat / Text playground&lt;/strong&gt; using the &lt;strong&gt;Amazon Nova Micro&lt;/strong&gt; model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Summarizing Messy, Unstructured Feedback
&lt;/h2&gt;

&lt;p&gt;The first task felt immediately practical: summarize a wall of rambling customer feedback and extract actionable improvements.&lt;/p&gt;

&lt;p&gt;The prompt structure I used was simple but effective:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Summarize the following feedback and produce action points for fixes and improvements:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmivndc27zj6uvd7uqph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmivndc27zj6uvd7uqph.png" alt=" " width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Separating the instruction from the data (with a blank line) made the prompt cleaner  both for me to read and, arguably, for the model to parse.&lt;/p&gt;

&lt;p&gt;The feedback itself was a fictional but realistic stream-of-consciousness review of a car parts app. Full of metaphors, colloquialisms, and run-on sentences. The kind of thing you'd actually get from a user interview.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the model returned:&lt;/strong&gt; A clean, structured list of positives and improvement suggestions  search UX, cart flow, missing features like a compatibility wizard and saved parts lists, shipping cost transparency, and live support access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvb54j9sjtee0pyvuwi4o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvb54j9sjtee0pyvuwi4o.png" alt=" " width="799" height="295"&gt;&lt;/a&gt;&lt;br&gt;
Then I pushed further:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;List the top three improvements that would likely have the biggest impact on customer satisfaction.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No need to repeat the context. The model remembered the conversation and narrowed the list down intelligently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xjdpf6zy5hyng6zu3rk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xjdpf6zy5hyng6zu3rk.png" alt=" " width="800" height="272"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Bedrock excels at distilling unstructured human language into structured, actionable output. This is genuinely useful for product teams doing requirements gathering or user research synthesis.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Part 2: Using AI as a Coding Assistant
&lt;/h2&gt;

&lt;p&gt;This is where things got interesting for me as a developer.&lt;/p&gt;

&lt;h3&gt;
  
  
  Fixing a ZeroDivisionError
&lt;/h3&gt;

&lt;p&gt;I started with a simple Python function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;divide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calling &lt;code&gt;divide(10, 0)&lt;/code&gt; throws a &lt;code&gt;ZeroDivisionError&lt;/code&gt;. I asked Bedrock:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qrfw235ujrntvkmuz7h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qrfw235ujrntvkmuz7h.png" alt=" " width="531" height="153"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgm3zk9d3kjdgg4qz9bq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwgm3zk9d3kjdgg4qz9bq.png" alt=" " width="335" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add exception handling for dividing by zero to this Python3 function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def divide(x, y):
    return x / y
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model returned a version using &lt;code&gt;try/except&lt;/code&gt; blocks, specifically catching &lt;code&gt;ZeroDivisionError&lt;/code&gt; and returning a useful message instead of crashing. Clean, idiomatic Python.&lt;/p&gt;

&lt;p&gt;A small but important habit I developed: &lt;strong&gt;always specify the language version&lt;/strong&gt;. Python 2 and Python 3 differ significantly. The more context you give the model, the more relevant the output.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qsyogjald4pjj7icfga.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qsyogjald4pjj7icfga.png" alt=" " width="800" height="334"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Improving the Fibonacci Algorithm
&lt;/h3&gt;

&lt;p&gt;Next, I worked with a classic recursive Fibonacci implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;fibonacci&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww74b7wtng043gzx28ur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fww74b7wtng043gzx28ur.png" alt=" " width="485" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This works  but it's slow. The time complexity is &lt;strong&gt;O(2^n)&lt;/strong&gt;, which means for large values of &lt;code&gt;n&lt;/code&gt;, it becomes unusably slow very quickly.&lt;/p&gt;

&lt;p&gt;I asked Bedrock to evaluate the performance and suggest improvements. The model came back with two solid suggestions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memoization&lt;/strong&gt; : Cache already-computed values so they're not recalculated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterative approach&lt;/strong&gt; : Replace recursion with a loop, dropping complexity to &lt;strong&gt;O(n)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I then asked Bedrock to directly compare the time complexity of both approaches. It explained the difference clearly — the recursive version re-computes the same values exponentially, while the iterative version computes each value exactly once.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwgh3rpcn652sii42uzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwgh3rpcn652sii42uzq.png" alt=" " width="798" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Key takeaway:&lt;/strong&gt; Bedrock is genuinely useful for algorithm analysis. Asking "can this be improved?" and "compare these two implementations" are high-value prompts for any developer trying to write more performant code.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Part 3: Understanding and Testing Unfamiliar Code
&lt;/h2&gt;

&lt;p&gt;The final section was perhaps the most practically useful for day-to-day work: using AI to understand code you didn't write.&lt;/p&gt;

&lt;p&gt;Take this function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;middle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;arr&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At a glance, it's not obvious what this does. I asked Bedrock to describe it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Describe how the following Python 3 code works:

def middle(arr):
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4hoyh3wdquq6c6eeqyy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4hoyh3wdquq6c6eeqyy.png" alt=" " width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model walked through it step-by-step. It also flagged some issues — notably that the function &lt;strong&gt;mutates the original list&lt;/strong&gt; (a side effect most callers won't expect), and that there's no error handling.&lt;/p&gt;

&lt;p&gt;From there, I used an improved version of the function and asked Bedrock to generate a unit test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate a unit test for the following Python function using Python's built-in unittest module.
The test class should have one test that tests that None is returned if the argument is None.

def find_middle_element(arr):
    ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqc4catuje86p07kl8fq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhqc4catuje86p07kl8fq.png" alt=" " width="799" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The model produced a proper &lt;code&gt;unittest.TestCase&lt;/code&gt; class, which I dropped into a file in VS Code and ran immediately. It passed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bonus: Generating Test Data
&lt;/h3&gt;

&lt;p&gt;One underrated use case  generating fake data for testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generate some test data for users that includes name, address, phone number, 
and widget order history. Display the test data in the JSON data format.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9fhh3uks5jhe9l81jo5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9fhh3uks5jhe9l81jo5.png" alt=" " width="800" height="275"&gt;&lt;/a&gt;&lt;br&gt;
Within seconds, I had realistic-looking JSON I could plug straight into tests or seed scripts. The model can also format this as YAML or TOML if you prefer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bedrock Pricing: What You Should Know
&lt;/h2&gt;

&lt;p&gt;Bedrock prices by &lt;strong&gt;tokens&lt;/strong&gt;  chunks of text (roughly words or word fragments) processed as input and output.&lt;/p&gt;

&lt;p&gt;Two pricing models are available:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;On-Demand&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Light, infrequent, or unpredictable usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Provisioned Throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Predictable, high-volume, or production use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The chat playground also shows you &lt;strong&gt;latency&lt;/strong&gt; and &lt;strong&gt;token counts&lt;/strong&gt; in real time  useful for estimating costs before committing to a production integration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Model Customization (If You Need It)
&lt;/h2&gt;

&lt;p&gt;Out of the box, foundation models are general-purpose. But Bedrock also supports three customization approaches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning&lt;/strong&gt; : Adjust tone, verbosity, or vocabulary using labeled examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distillation&lt;/strong&gt; : Transfer knowledge from a large model to a smaller, faster one (up to 500% faster, 75% cheaper)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-training&lt;/strong&gt; : Expose a model to your domain-specific data corpus&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most developer use cases, you won't need customization — the base models are powerful and flexible. But it's good to know the option exists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to Watch Out For
&lt;/h2&gt;

&lt;p&gt;A few honest caveats from working with Bedrock:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hallucinations are real.&lt;/strong&gt; LLMs sometimes generate confident-sounding output that's just wrong. Always review generated code before running it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Responses aren't deterministic.&lt;/strong&gt; The same prompt can yield different outputs each time. Don't expect bit-for-bit reproducibility.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context windows are finite.&lt;/strong&gt; Each model has a maximum context length. For very long codebases or documents, you may need to chunk your input across multiple prompts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering matters.&lt;/strong&gt; Adding context (e.g., &lt;em&gt;"You are an expert AWS cloud engineer"&lt;/em&gt;) meaningfully improves response quality. Being specific about language versions, libraries, and constraints helps too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Amazon Bedrock made it easy to slot AI into workflows I already use — reviewing code, writing tests, making sense of user feedback. Nothing about it felt like magic. It felt like a well-calibrated tool that rewards thoughtful prompting.&lt;/p&gt;

&lt;p&gt;If you're on AWS and haven't explored Bedrock yet, the Chat / Text playground is a zero-friction starting point. Pick a model, type a prompt, see what happens. The learning curve is low. The upside is real.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have you used Amazon Bedrock or another LLM in your development workflow? I'd love to hear what worked (and what didn't)  drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>python</category>
      <category>programming</category>
    </item>
    <item>
      <title>Building My First AI Agent with Strands SDK and Amazon Bedrock Errors, Fixes &amp; Lessons Learned</title>
      <dc:creator>Tidding Ramsey</dc:creator>
      <pubDate>Sat, 09 May 2026 21:24:26 +0000</pubDate>
      <link>https://dev.to/tidding/building-my-first-ai-agent-with-strands-sdk-and-amazon-bedrock-errors-fixes-lessons-learned-4753</link>
      <guid>https://dev.to/tidding/building-my-first-ai-agent-with-strands-sdk-and-amazon-bedrock-errors-fixes-lessons-learned-4753</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I recently attended an AWS event where we built our first AI agent using the &lt;strong&gt;Strands Agents SDK&lt;/strong&gt; and &lt;strong&gt;Amazon Bedrock&lt;/strong&gt;. The quickstart guide looked simple enough — a few lines of Python, some tools, and a running agent. But the real learning happened in the errors. This article walks you through what I built, every error I hit, and exactly how I fixed them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Built
&lt;/h2&gt;

&lt;p&gt;A simple AI agent that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tell you the current time&lt;/li&gt;
&lt;li&gt;Perform calculations&lt;/li&gt;
&lt;li&gt;Count letters in a word&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three tools. One agent. Sounds easy. It wasn't — but that's what made it worth writing about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Structure
&lt;/h2&gt;

&lt;p&gt;Here's the folder structure I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Agent/
├── .venv/
├── agent.py
└── requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6cumwlf3564k9wty5l2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6cumwlf3564k9wty5l2.png" alt=" " width="358" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up the Environment
&lt;/h2&gt;

&lt;p&gt;First, create and activate a virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-m&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;venv&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;venv&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;venv&lt;/span&gt;&lt;span class="nx"&gt;\Scripts\Activate.ps1&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="c"&gt;# Windows PowerShell&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then install the required packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strands-agents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strands-agents-tools&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsccqupew5wxtlw2ghr9o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsccqupew5wxtlw2ghr9o.png" alt=" " width="619" height="40"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Code (Starting Point)
&lt;/h2&gt;

&lt;p&gt;The quickstart gave us this &lt;code&gt;agent.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt;

&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;letter_counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Count occurrences of a specific letter in a word.

    Args:
        word (str): The input word to search in
        letter (str): The specific letter to count

    Returns:
        int: The number of occurrences of the letter in the word
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;letter&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; parameter must be a single character&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;letter_counter&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
I have 3 requests:
1. What is the time right now?
2. Calculate 3111696 / 74088
3. Tell me how many letter R&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s are in the word &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strawberry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; 🍓
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwk9a0phmksfdzm4k567l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwk9a0phmksfdzm4k567l.png" alt=" " width="800" height="636"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qqfohfgmz8cijcfjcqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qqfohfgmz8cijcfjcqf.png" alt=" " width="418" height="137"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedmc83mbgqmvkapqt83s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedmc83mbgqmvkapqt83s.png" alt=" " width="362" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Simple and clean. Then I ran it. And the errors began.&lt;/p&gt;




&lt;h2&gt;
  
  
  Error 1: Anthropic Use Case Form Not Submitted
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;botocore.errorfactory.ResourceNotFoundException: 
Model use case details have not been submitted for this account.
Fill out the Anthropic use case details form before using the model.
└ Model id: global.anthropic.claude-sonnet-4-6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; The Strands SDK defaults to Amazon Bedrock with Claude Sonnet 4. But Anthropic requires first-time users to submit a use case form before accessing their models on Bedrock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I fixed it:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Went to the &lt;a href="https://console.aws.amazon.com/bedrock" rel="noopener noreferrer"&gt;AWS Bedrock Console&lt;/a&gt; in &lt;strong&gt;us-east-1&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Navigated to &lt;strong&gt;Model Catalog&lt;/strong&gt; and searched for &lt;strong&gt;Claude Sonnet 4.6&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Clicked the model and hit &lt;strong&gt;"Submit use case details"&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Filled out the form:

&lt;ul&gt;
&lt;li&gt;Company name, website, industry&lt;/li&gt;
&lt;li&gt;Intended users: Internal&lt;/li&gt;
&lt;li&gt;Use case: &lt;em&gt;"Building an AI agent for learning and demonstration purposes at an AWS event using the Strands SDK and Amazon Bedrock"&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Submitted and waited ~15 minutes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once the &lt;strong&gt;"Submit use case details"&lt;/strong&gt; button disappeared and &lt;strong&gt;"Open in playground"&lt;/strong&gt; appeared, I knew it was approved.&lt;/p&gt;




&lt;h2&gt;
  
  
  Error 2: Missing Dependency for AWS Login
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;botocore.exceptions.MissingDependencyException: 
Using the login credential provider requires an additional dependency.
You will need to pip install "botocore[crt]" before proceeding.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; The AWS credential provider I was using needed an extra C extension package called &lt;code&gt;awscrt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I fixed it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"botocore[crt]"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That installed &lt;code&gt;awscrt-0.32.2&lt;/code&gt; and resolved the issue immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Error 3: Wrong Model ID — ValidationException
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;botocore.errorfactory.ValidationException: 
The provided model identifier is invalid.
└ Model id: us.anthropic.claude-sonnet-4-6-20251031-v1:0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; I had added a &lt;code&gt;BedrockModel&lt;/code&gt; to my code with a guessed model ID, but it wasn't valid for my account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How I fixed it:&lt;/strong&gt; I ran this command to list all valid inference profile IDs available to my account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;aws&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bedrock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;list-inference-profiles&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;us-east-1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--profile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;tidding&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;--query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"inferenceProfileSummaries[?contains(inferenceProfileId, 'anthropic')].inferenceProfileId"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It returned a list including:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="s2"&gt;"us.anthropic.claude-sonnet-4-6"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the correct ID. No version suffix needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Final Working Code
&lt;/h2&gt;

&lt;p&gt;After all the fixes, here is the final &lt;code&gt;agent.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BedrockModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt;

&lt;span class="c1"&gt;# Define a custom tool using the @tool decorator
&lt;/span&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;letter_counter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Count occurrences of a specific letter in a word.

    Args:
        word (str): The input word to search in
        letter (str): The specific letter to count

    Returns:
        int: The number of occurrences of the letter in the word
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;letter&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; parameter must be a single character&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;letter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Specify the Bedrock model explicitly using the correct inference profile ID
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us.anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create the agent with tools
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;letter_counter&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Ask the agent to handle multiple tasks
&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
I have 3 requests:
1. What is the time right now?
2. Calculate 3111696 / 74088
3. Tell me how many letter R&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s are in the word &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;strawberry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; 🍓
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;AWS_PROFILE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tidding"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;agent.py&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fontb1mvybbzhta3q8b11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fontb1mvybbzhta3q8b11.png" alt=" " width="799" height="428"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Lessons Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Always submit the Anthropic use case form first.&lt;/strong&gt; It's a one-time requirement per AWS account and blocks everything until done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Don't guess model IDs.&lt;/strong&gt; Use &lt;code&gt;aws bedrock list-inference-profiles&lt;/code&gt; to get the exact valid ID for your account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The default Strands model ID may not match what your account supports.&lt;/strong&gt; Always specify the model explicitly using &lt;code&gt;BedrockModel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. &lt;code&gt;botocore[crt]&lt;/code&gt; is required&lt;/strong&gt; when using the AWS login credential provider on Windows. Install it early.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Set your AWS profile&lt;/strong&gt; before running the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nv"&gt;AWS_PROFILE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"yourprofile"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What the Agent Loop Looks Like
&lt;/h2&gt;

&lt;p&gt;Once running, the Strands agent follows this loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Input → Reasoning (LLM) → Tool Selection → Tool Execution → Back to Reasoning → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent automatically decides which tools to use based on your message. For our three requests, it fired &lt;code&gt;current_time&lt;/code&gt;, &lt;code&gt;calculator&lt;/code&gt;, and &lt;code&gt;letter_counter&lt;/code&gt; — all in one go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Strands SDK makes building AI agents genuinely simple — once you get past the AWS setup hurdles. The errors I faced were all configuration-related, not code-related. Once the environment was right, the agent worked beautifully in just a few lines of Python.&lt;/p&gt;

&lt;p&gt;If you're attending an AWS event and hitting these same errors, I hope this saves you time. Drop a comment if you're stuck — happy to help!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Built at an AWS Event · Strands Agents SDK · Amazon Bedrock · Claude Sonnet 4.6&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>cloudcomputing</category>
    </item>
    <item>
      <title>How I Evaluated an AI Model on AWS Without Writing a Single Line of Training Code</title>
      <dc:creator>Tidding Ramsey</dc:creator>
      <pubDate>Sat, 09 May 2026 13:26:57 +0000</pubDate>
      <link>https://dev.to/tidding/how-i-evaluated-an-ai-model-on-aws-without-writing-a-single-line-of-training-code-20o9</link>
      <guid>https://dev.to/tidding/how-i-evaluated-an-ai-model-on-aws-without-writing-a-single-line-of-training-code-20o9</guid>
      <description>&lt;p&gt;A step-by-step guide to Amazon Bedrock's model evaluation feature  from S3 setup to reading real results&lt;/p&gt;

&lt;p&gt;Ever wondered whether the AI model you're about to plug into your production system actually &lt;strong&gt;knows what it's doing&lt;/strong&gt;? Me too. That's exactly what Amazon Bedrock's &lt;strong&gt;model evaluation&lt;/strong&gt; feature is built for  and after running through it myself, I'm genuinely impressed at how accessible it is.&lt;/p&gt;

&lt;p&gt;No PhD. No GPU clusters. No tears. Just AWS, an S3 bucket, and a few JSON prompts.&lt;/p&gt;

&lt;p&gt;Let's walk through the whole thing  start to finish.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Even Is Amazon Bedrock?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Amazon Bedrock is AWS's managed Generative AI service. Instead of spending months training, hosting, and scaling foundation models yourself, Bedrock lets you call them like an API. Think of it as the "serverless" moment for AI — the infrastructure complexity disappears and you focus on what actually matters: using the models.&lt;/p&gt;

&lt;p&gt;One of its best-kept features is &lt;strong&gt;model evaluation&lt;/strong&gt;  a way to run a model against a set of prompts, compare its responses to expected answers, and get a performance score back. It's perfect for building confidence before you commit a model to your workflow.&lt;/p&gt;

&lt;p&gt;Here's what we're going to build today:&lt;/p&gt;

&lt;p&gt;Prompt Dataset (S3) → Bedrock Evaluation Job → Results (S3) → Insights&lt;br&gt;
The before architecture&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfueqy2mglh4zzbm8qbf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfueqy2mglh4zzbm8qbf.png" alt=" " width="563" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The after architecture&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxkkaqjxaa2m99hg1tpi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxkkaqjxaa2m99hg1tpi.png" alt=" " width="538" height="456"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1: Log In and Orient Yourself
&lt;/h2&gt;

&lt;p&gt;Head to the &lt;a href="https://aws.amazon.com/console/" rel="noopener noreferrer"&gt;AWS Management Console&lt;/a&gt; and sign in. Make sure you're in the &lt;strong&gt;US West (Oregon) / us-west-2&lt;/strong&gt; region  model evaluation support varies by region and you want to be where the models live.&lt;/p&gt;

&lt;p&gt;Once you're in, you'll use two core services today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt; — to store your prompt dataset and receive evaluation results&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; — to run the actual evaluation job&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 2: Create Your S3 Buckets
&lt;/h2&gt;

&lt;p&gt;You need two S3 buckets — one to hold your prompt dataset, another to receive evaluation results. Let's create both from scratch.&lt;br&gt;
In the AWS search bar, type S3 and open the service. Click Create bucket.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu3nni706yu9jxs4em73.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyu3nni706yu9jxs4em73.png" alt=" " width="800" height="133"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Bucket 1: The Prompt Dataset Bucket&lt;br&gt;
This bucket holds the questions you'll throw at the model.&lt;br&gt;
Click Create bucket&lt;br&gt;
Give it a name — something like bedrock-prompt-dataset-yourname-2026. S3 bucket names must be globally unique across all of AWS, so add something personal or random to the end&lt;br&gt;
Make sure the region is set to us-west-2 (Oregon)&lt;br&gt;
Leave everything else as default keep Block all public access enabled&lt;br&gt;
Click Create bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bucket 2: The Output Bucket&lt;br&gt;
This is where Bedrock will write the evaluation results.&lt;br&gt;
Click Create bucket again&lt;br&gt;
Name it something like bedrock-eval-output-yourname-2025&lt;br&gt;
Same region: us-west-2&lt;br&gt;
Leave defaults, click Create bucket&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You should now see both buckets in your S3 console.&lt;br&gt;
&lt;strong&gt;Build Your Prompt Dataset&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  What's in the Prompt Dataset?
&lt;/h3&gt;

&lt;p&gt;The prompt dataset is a &lt;code&gt;.jsonl&lt;/code&gt; file (one JSON object per line) where each object has three fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The chemical symbol for gold is"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Chemistry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"referenceResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Au"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The tallest mountain in the world is"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Geography"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"referenceResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mount Everest"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The author of 'Great Expectations' is"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Literature"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"referenceResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Charles Dickens"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjvjscotsg0zi4ncjqvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjvjscotsg0zi4ncjqvb.png" alt=" " width="794" height="61"&gt;&lt;/a&gt;&lt;br&gt;
Notice the structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;prompt&lt;/code&gt;&lt;/strong&gt; — what you'll send to the model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;referenceResponse&lt;/code&gt;&lt;/strong&gt; — the ground truth you're checking against&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;category&lt;/code&gt;&lt;/strong&gt; — for grouping results later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In production, you'd replace these general-knowledge questions with prompts that mirror your real use case. Customer support queries. Code generation tasks. Medical summaries. Whatever you're building for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a CORS Configuration to the Dataset Bucket&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bedrock needs cross-origin access to read from your S3 bucket. Here's how to enable it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click into your &lt;strong&gt;prompt dataset bucket&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Go to the &lt;strong&gt;Permissions&lt;/strong&gt; tab&lt;/li&gt;
&lt;li&gt;Scroll to &lt;strong&gt;Cross-origin resource sharing (CORS)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Edit&lt;/strong&gt; and paste this config:
json
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"AllowedHeaders"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"AllowedMethods"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"GET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PUT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"POST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DELETE"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"AllowedOrigins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ExposeHeaders"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Access-Control-Allow-Origin"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcc7amybi2n5jdeudjf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwcc7amybi2n5jdeudjf7.png" alt=" " width="795" height="85"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Save changes&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's it. This tells S3: "Yes, Amazon Bedrock is allowed to read from me." Without this, the evaluation job will fail silently — so don't skip it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pro tip:&lt;/strong&gt; In a real production setup, you'd tighten the &lt;code&gt;AllowedOrigins&lt;/code&gt; to specific Bedrock endpoints rather than using &lt;code&gt;"*"&lt;/code&gt;. For now, this gets us moving.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Step 3: Create the Model Evaluation Job
&lt;/h2&gt;

&lt;p&gt;Back to the AWS search bar — type &lt;strong&gt;Bedrock&lt;/strong&gt; and open Amazon Bedrock.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufadbwfyr11nmsivy1p1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fufadbwfyr11nmsivy1p1.png" alt=" " width="798" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Expand the left-hand menu (hamburger icon, top-left)&lt;/li&gt;
&lt;li&gt;Under &lt;strong&gt;Assess&lt;/strong&gt;, click &lt;strong&gt;Evaluations&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiz0s0dsnqkf0oa2wopj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiz0s0dsnqkf0oa2wopj.png" alt=" " width="800" height="791"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Click &lt;strong&gt;Create Automatic: Programmatic&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5rbmortepmmw84wqmc4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5rbmortepmmw84wqmc4.png" alt=" " width="793" height="468"&gt;&lt;/a&gt;&lt;br&gt;
Now fill in the job configuration:&lt;/p&gt;
&lt;h2&gt;
  
  
  Model Evaluation Details
&lt;/h2&gt;

&lt;p&gt;Evaluation name: Something unique like &lt;code&gt;my-eval-job-abc123&lt;/code&gt; &lt;br&gt;
Model provider: Amazon&lt;br&gt;
Model: &lt;strong&gt;Nova Micro&lt;/strong&gt; &lt;br&gt;
Task type: Question and answer &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7nzpoh2l4aikgx240ok.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa7nzpoh2l4aikgx240ok.png" alt=" " width="799" height="667"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Metrics
&lt;/h2&gt;

&lt;p&gt;Click &lt;strong&gt;Remove&lt;/strong&gt; on any extra metrics until only one remains. Set it to &lt;strong&gt;Accuracy&lt;/strong&gt;. This metric compares the model's response against your &lt;code&gt;referenceResponse&lt;/code&gt; and returns a score.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prompt Dataset
&lt;/h2&gt;

&lt;p&gt;Select &lt;strong&gt;Use your own prompt dataset&lt;/strong&gt; and enter your S3 path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s3://your-prompt-dataset-bucket-name/prompt_dataset.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Evaluation Results
&lt;/h2&gt;

&lt;p&gt;Point this to your output bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;s3://your-output-bucket-name/evaluation-results/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  IAM Role
&lt;/h2&gt;

&lt;p&gt;Bedrock needs a role to access your S3 buckets on its behalf. Let's create one real quick.&lt;br&gt;
In a new browser tab, go to IAM → Roles → Create role and follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Trusted entity
Select AWS service, then under Use case search for and select Bedrock. Click Next.&lt;/li&gt;
&lt;li&gt;Permissions
Attach these two policies:
&lt;em&gt;AmazonBedrockFullAccess
AmazonS3FullAccess&lt;/em&gt;
Click Next.&lt;/li&gt;
&lt;li&gt;Name and create
Name the role something like bedrock-eval-role, then click Create role.
Back on the Bedrock evaluation page, under Amazon Bedrock IAM role, select Use an existing role, click the dropdown, and pick bedrock-eval-role.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;*In production you'd scope the S3 policy down to only your two specific buckets — but for getting started, AmazonS3FullAccess does the job.&lt;/p&gt;

&lt;p&gt;Click &lt;strong&gt;Create&lt;/strong&gt;  and watch the job appear with status &lt;strong&gt;In progress&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc0b8qg0euhhxadrcbt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffc0b8qg0euhhxadrcbt3.png" alt=" " width="799" height="167"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 4: Read the Results (This Is the Fun Part)
&lt;/h2&gt;

&lt;p&gt;Once the job completes, head back to S3 and look inside your output bucket under &lt;code&gt;evaluation-results/&lt;/code&gt;. You'll find a &lt;code&gt;.jsonl&lt;/code&gt; file with one result per prompt. Here's what the raw output looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"automatedEvaluationResult"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scores"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"metricName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Builtin.Accuracy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.0625&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inputRecord"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The chemical symbol for gold is"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"referenceResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Au"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Chemistry"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"modelResponses"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"response"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The chemical symbol for gold is Au."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"modelIdentifier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us.amazon.nova-micro-v1:0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stopReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"end_turn"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy87zfii6ntxmtyxritx2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy87zfii6ntxmtyxritx2.png" alt=" " width="800" height="91"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Breaking Down the Accuracy Scores
&lt;/h3&gt;

&lt;p&gt;Here's a summary of the three prompts from our run:&lt;/p&gt;

&lt;p&gt;Looking at the three prompts from our run, the Chemistry question ("The chemical symbol for gold is") scored 0.0625, the Geography question ("The tallest mountain in the world is") came in slightly higher at 0.0870, and the Literature question ("The author of 'Great Expectations' is") landed at 0.0727. All three were answered correctly — Au, Mount Everest, and Charles Dickens respectively  yet the scores are nowhere near 1.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait  These Scores Look Low. Is That Bad?
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. The accuracy scores seem low because the scoring algorithm is doing &lt;strong&gt;token-level matching&lt;/strong&gt; between the model's verbose answer and the short reference response. &lt;/p&gt;

&lt;p&gt;The model answered correctly in all three cases it said "Au", "Mount Everest", and "Charles Dickens". But it also said a &lt;em&gt;lot of other things&lt;/em&gt; (it explained its reasoning step-by-step). Those extra tokens pulled the accuracy score down.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is a critical lesson:&lt;/strong&gt; how you write your prompts and reference responses &lt;em&gt;dramatically&lt;/em&gt; affects your scores. If you want higher accuracy scores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Instead&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;reference&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;response:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"referenceResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Au"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Try&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;instructing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;answer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;concisely&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;prompt:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Answer in one word only. The chemical symbol for gold is:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"referenceResponse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Au"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the value of model evaluation — it surfaces these kinds of nuances before you go to production.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Now that you understand the pipeline, here's how to level it up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Swap models&lt;/strong&gt; — Run the same dataset against &lt;code&gt;Nova Lite&lt;/code&gt;, &lt;code&gt;Nova Pro&lt;/code&gt;, or even Claude models to compare them head-to-head&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use real prompts&lt;/strong&gt; — Replace the sample dataset with 50-100 prompts from your actual use case&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate&lt;/strong&gt; — Trigger evaluation jobs via the AWS CLI or SDK as part of your CI/CD pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Track over time&lt;/strong&gt; — Save scores to a database and chart model performance as you update prompts or switch models&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Recap
&lt;/h2&gt;

&lt;p&gt;Here's the full flow in one breath:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Upload a &lt;code&gt;.jsonl&lt;/code&gt; prompt dataset to S3&lt;/li&gt;
&lt;li&gt; Add a CORS config to the S3 bucket so Bedrock can read it&lt;/li&gt;
&lt;li&gt; Create a Bedrock model evaluation job pointing at Nova Micro&lt;/li&gt;
&lt;li&gt; Wait for it to run, then read the &lt;code&gt;.jsonl&lt;/code&gt; results in your output bucket&lt;/li&gt;
&lt;li&gt; Interpret the accuracy scores in context — verbose model answers score lower even when correct&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Amazon Bedrock's model evaluation feature removes one of the biggest unknowns in AI integration: &lt;em&gt;"Can this model actually answer my questions reliably?"&lt;/em&gt; Now you have a repeatable, automated answer.&lt;/p&gt;

&lt;p&gt;Go build something confident. &lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions or want to share your evaluation results? Drop them in the comments below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>cloudcomputing</category>
    </item>
  </channel>
</rss>
