<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ana Silva</title>
    <description>The latest articles on DEV Community by Ana Silva (@acsilva).</description>
    <link>https://dev.to/acsilva</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2202650%2F65e325dd-3aef-4c76-a4d0-70b5b3128efc.png</url>
      <title>DEV Community: Ana Silva</title>
      <link>https://dev.to/acsilva</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/acsilva"/>
    <language>en</language>
    <item>
      <title>Partition and Sort Keys on DynamoDB: Modeling data for batch-and-stream convergence</title>
      <dc:creator>Ana Silva</dc:creator>
      <pubDate>Fri, 22 May 2026 15:11:53 +0000</pubDate>
      <link>https://dev.to/aws-builders/partition-and-sort-keys-on-dynamodb-modeling-data-for-batch-and-stream-convergence-5gl0</link>
      <guid>https://dev.to/aws-builders/partition-and-sort-keys-on-dynamodb-modeling-data-for-batch-and-stream-convergence-5gl0</guid>
      <description>&lt;h2&gt;
  
  
  How to design partition keys, sort keys and a serving layer when your data has two sources with different SLAs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwe26i2xpf8ett73s55t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwe26i2xpf8ett73s55t.png" alt="The Delivery of the Keys to Saint Peter — Pietro Perugino" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Batch pipelines are reliable, auditable, and built to handle complexity at scale, but they come with a cost: by the time data is consolidated, validated, and ready to serve, hours or days might have passed. For internal analytics that’s often fine, but for a customer staring at a mobile app wondering whether their action was registered, it is not.&lt;/p&gt;

&lt;p&gt;This article is about a DynamoDB modeling problem that lives at that boundary. The case study below is fictional and designed to keep the focus on the technical problem rather than on any specific domain. Business rules and delays are simplified accordingly.&lt;/p&gt;

&lt;p&gt;The core challenge is one that appears frequently in data-intensive systems: a batch pipeline owns the authoritative view of the data, but its consolidation cycle introduces a lag that makes the user-facing experience feel stale. The goal is to bridge that gap with a near-real-time layer without replacing the batch, without duplicating business logic, and without turning the serving layer into a consistency problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Use Case
&lt;/h2&gt;

&lt;p&gt;I've covered this use case in my previous articles about DynamoDB. Here’s a quick recap:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We’re a financial institution developing a new feature in our mobile app to promote our customers’ financial health, offering them monthly saving goals. As a reward for reaching these goals, the saved amount is automatically invested under special conditions and higher interest rates.&lt;/p&gt;

&lt;p&gt;The goals are calculated in batch by a data platform and then loaded into the database to be displayed in the app. The only information the app writes to the database is which goal the customer chose.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For this article, we’re extending the use case. The batch pipeline remains the source of truth, but the business needs a new capability: &lt;strong&gt;when a customer makes a deposit toward their goal, it should appear in the app within minutes, not the next day.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each customer has one monthly saving goal per investment type: savings account, government bonds and certificate of deposit. Each goal has a target amount and accumulates progress as the customer makes deposits over the course of the month. The batch pipeline computes the consolidated state of each goal daily and writes it to DynamoDB.&lt;/p&gt;

&lt;p&gt;The catch is that a contribution is not immediately final; it goes through reconciliation across multiple systems before it can be considered confirmed. The new capability is about making contributions visible before the batch catches up and displaying them with a status of “pending confirmation.” If reconciliation fails, the contribution disappears. If it succeeds, the batch pipeline will eventually pick it up and reflect it in the authoritative view. This gives us two data sources with different freshness and different levels of trust, both feeding the same API response. That is the problem this article addresses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Solution diagram
&lt;/h2&gt;

&lt;p&gt;The diagram below shows how the two pipelines are organized and where they converge.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0g6upa2k9ztvum78i6o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0g6upa2k9ztvum78i6o.png" alt=" " width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The batch pipeline reads from consolidated gold tables stored as files in object storage (S3 in this case), and a Spark-based job applies business rules to compute the updated state for each customer. Those rules live in a relational database shared by both pipelines. Once processed, the results are bulk loaded into DynamoDB.&lt;/p&gt;

&lt;p&gt;The stream pipeline consumes customer activity events from a managed Kafka cluster, and a containerized consumer service enriches each event before writing it to a second DynamoDB table in near real time (NRT). The two pipelines never write to the same table. They converge only at read time, where a single service merges both into one API response.&lt;/p&gt;

&lt;p&gt;This is a practical application of the lambda architecture pattern, but what matters here is what that pattern demands from the data modeling and serving layer, and that is what the next section addresses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why two tables
&lt;/h2&gt;

&lt;p&gt;The instinct when modeling in DynamoDB is to reach for the single-table design. In this case, two reasons pull the tables apart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The first is TTL.&lt;/strong&gt; DynamoDB’s time-to-live is enabled per table and bound to a single designated attribute. Items that lack that attribute never expire, so a shared table is technically possible, but then expiration becomes a convention every writer has to respect: one batch record written with the attribute set would be deleted silently. The NRT table needs items to expire automatically after a few days, while the batch table does not; its lifecycle is controlled by the batch job itself, which rewrites the data on each run. Separate tables make expiration a property of the table instead of a rule scattered through application logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The second is the write pattern.&lt;/strong&gt; The batch pipeline performs a bulk load on a daily schedule, replacing all existing records. The NRT consumer writes continuously, one event at a time, with idempotency requirements that bulk loads do not need. These are different operational profiles, and mixing them in a single table introduces coupling between two pipelines that can be independent. It also sounds like a nightmare for scenarios where reprocessing might be needed.&lt;/p&gt;

&lt;p&gt;Splitting the tables keeps each pipeline simple, isolated and easier to operate. &lt;strong&gt;The complexity of merging them is contained in the serving layer.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Modeling the tables
&lt;/h2&gt;

&lt;p&gt;A DynamoDB table does not have a fixed schema in the traditional sense, since each item can have different attributes, but every item must have a primary key. That key can be a single attribute, the partition key, or a composite of two, the partition key plus a sort key. DynamoDB uses the partition key to distribute data across storage nodes, and when a sort key is present, it orders items within the same partition.&lt;/p&gt;
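&lt;p&gt;As a rough sketch of the two key shapes, the definitions below use the table and attribute names that appear later in this article (CustomerBatch, CustomerNRT, personId, goalId). The helper table_definition, the string attribute types and the on-demand billing mode are assumptions made for illustration, not part of the original design.&lt;/p&gt;

```python
# Hypothetical helper: builds keyword arguments for DynamoDB's CreateTable call.
# Attribute types ("S") and billing mode are assumptions for this sketch.
def table_definition(name, partition_key, sort_key=None):
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        "BillingMode": "PAY_PER_REQUEST",
    }


# Simple primary key: partition key only.
batch_definition = table_definition("CustomerBatch", "personId")
# Composite primary key: partition key plus sort key.
nrt_definition = table_definition("CustomerNRT", "personId", "goalId")
```

&lt;p&gt;With credentials configured, each dictionary could be passed to boto3.client("dynamodb").create_table(**definition). Only key attributes are declared; every other attribute stays schemaless.&lt;/p&gt;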

&lt;p&gt;&lt;strong&gt;The choice of key matters more than anything else in DynamoDB modeling, because it determines how data is distributed, how it is accessed and what queries are possible.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk0hhd088myqupr6b1na.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk0hhd088myqupr6b1na.png" alt=" " width="800" height="160"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The batch table
&lt;/h3&gt;

&lt;p&gt;The batch table holds the consolidated view of each customer: personal attributes, preferences, the goals they have chosen, the progress computed by the last batch run, and a few other fields that the API exposes to the mobile app. It is, effectively, the full customer profile plus information about the goals the customer has already accomplished.&lt;/p&gt;

&lt;p&gt;Every read starts with a customer. The API asks “what does this person look like right now?” and expects one consolidated response. Given that access pattern, there is no reason to spread the data across multiple items. &lt;strong&gt;The partition key is personId.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"personId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"person-uuid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ana Silva"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"preferences"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"notifications"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pt-BR"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"goals"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"goalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SAVINGS_ACCOUNT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"targetAmount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;500.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"savedAmount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;320.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"IN_PROGRESS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"specialRate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.08&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"goalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GOVERNMENT_BONDS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"targetAmount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1500.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"savedAmount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"NOT_STARTED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"specialRate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"goalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CERTIFICATE_OF_DEPOSIT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"targetAmount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2000.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"savedAmount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2000.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"COMPLETED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"specialRate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.14&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lastUpdated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-15T03:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no TTL on this table because nothing needs to expire passively between runs. The batch job owns the lifecycle of every item, rewriting records on each run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F603qb1nij1qkib00sv4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F603qb1nij1qkib00sv4p.png" alt=" " width="800" height="726"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The NRT table
&lt;/h3&gt;

&lt;p&gt;The NRT table has a narrower scope. It holds only what the stream pipeline produces: the most recent balance update per investment type per customer, written as deposit events arrive. There is no need to model the rest of the customer here, because the serving layer will combine this data with the batch table at read time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The partition key is personId and the sort key is goalId.&lt;/strong&gt; Every Kafka event carries the customer’s updated balance for one investment type, and the consumer writes it to the NRT table using PutItem. Because there can be at most one pending balance per investment type at any moment, newer events naturally overwrite older ones for the same (personId, goalId) pair. The serving layer can retrieve all pending balances for a customer in a single query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"personId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"person-uuid"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"goalId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SAVINGS_ACCOUNT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"currentBalance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;370.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PENDING_CONFIRMATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"eventId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"event-uuid-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"updatedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-16T14:23:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ttl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1779373380&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ttl attribute is a Unix timestamp, in seconds, telling DynamoDB when this item becomes eligible for automatic deletion. That is how time-to-live works in DynamoDB: you designate a numeric attribute to hold the expiration timestamp, enable TTL on the table pointing to that attribute, and DynamoDB handles deletion in the background. Deletion is not instantaneous, so an expired item can remain readable for a while after its timestamp has passed. Items in the NRT table expire after five days: long enough for the batch pipeline to process the contribution and take ownership, short enough to prevent stale pending records from accumulating unnecessarily.&lt;/p&gt;
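&lt;p&gt;To make the five-day window concrete, here is a minimal sketch of how a consumer might derive the ttl value from the event timestamp. The function name expiry_epoch and the RETENTION constant are hypothetical; the settings dictionary mirrors the parameters of DynamoDB's UpdateTimeToLive operation.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=5)  # retention window described in the article


def expiry_epoch(updated_at_iso):
    """Return the TTL value (epoch seconds) for an ISO-8601 UTC event timestamp."""
    updated_at = datetime.strptime(updated_at_iso, "%Y-%m-%dT%H:%M:%SZ")
    updated_at = updated_at.replace(tzinfo=timezone.utc)
    return int((updated_at + RETENTION).timestamp())


# Parameters for UpdateTimeToLive, which is configured once per table.
ttl_settings = {
    "TableName": "CustomerNRT",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": "ttl"},
}

ttl_value = expiry_epoch("2026-05-16T14:23:00Z")
print(ttl_value)
```

&lt;p&gt;With credentials configured, the settings could be applied with boto3.client("dynamodb").update_time_to_live(**ttl_settings). The value must be stored as a number; DynamoDB ignores TTL attributes of any other type.&lt;/p&gt;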

&lt;p&gt;The updatedAt attribute carries the event timestamp from Kafka and plays a role in the merge logic: the serving layer compares it against the batch's lastUpdated attribute to decide whether the NRT snapshot is fresher than the batch view. If the batch has already caught up, the NRT value is ignored; otherwise it is used and the goal is displayed with a "pending confirmation" indicator.&lt;/p&gt;
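&lt;p&gt;The same timestamp can also protect the write path. Kafka consumers may see duplicates or out-of-order events after a retry or rebalance, and a plain PutItem would let an older event overwrite a newer one. The sketch below is one possible guard, not the article's implementation: the function guarded_put_request and the shape of the event dictionary are assumptions, and it relies on updatedAt values sharing one ISO-8601 UTC format so that string comparison matches chronological order.&lt;/p&gt;

```python
def guarded_put_request(event):
    """Build PutItem arguments that refuse to overwrite a newer snapshot.

    `event` is a hypothetical, already-enriched dictionary. With the boto3
    resource API, numeric values such as currentBalance should be Decimal.
    """
    return {
        "Item": {
            "personId": event["personId"],
            "goalId": event["goalId"],
            "currentBalance": event["currentBalance"],
            "status": "PENDING_CONFIRMATION",
            "eventId": event["eventId"],
            "updatedAt": event["updatedAt"],
            "ttl": event["ttl"],
        },
        # Passes when no item exists yet or the stored snapshot is older.
        "ConditionExpression": "attribute_not_exists(updatedAt) OR :incoming > updatedAt",
        "ExpressionAttributeValues": {":incoming": event["updatedAt"]},
    }
```

&lt;p&gt;When the condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException, which the consumer can treat as a stale or duplicate event and skip.&lt;/p&gt;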




&lt;h2&gt;
  
  
  Serving the data
&lt;/h2&gt;

&lt;p&gt;The two tables exist to be merged. Every read from the API does the same thing: fetch the customer’s batch view, fetch any pending updates from the NRT table and combine them into a single response before returning it to the app.&lt;/p&gt;

&lt;p&gt;The two reads are independent and can run in parallel. The first is a GetItem on the batch table, keyed by personId. It returns the customer profile and the consolidated state of every goal as of the last batch run. The second is a Query on the NRT table, also keyed by personId. It returns every pending balance update for that customer across all investment types, typically zero to three items.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# The resource-level Table API takes and returns plain Python values.&lt;/span&gt;
&lt;span class="n"&gt;dynamodb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;batch_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CustomerBatch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;nrt_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CustomerNRT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;personId = :pid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:pid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;person_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;batch_item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="n"&gt;pending_by_goal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goalId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;nrt_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Items&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;response_goals&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;goal&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;batch_item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
    &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pending_by_goal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;goalId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updatedAt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;batch_item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lastUpdated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;response_goals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;savedAmount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pending&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;currentBalance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PENDING_CONFIRMATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response_goals&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;goal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
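&lt;p&gt;The snippet above issues the two reads one after the other for readability. Since they are independent, a small thread pool can overlap them. This is a minimal sketch under stated assumptions: fetch_in_parallel, read_batch and read_nrt are hypothetical names, and the two reader functions are stand-ins for the GetItem and the Query. The boto3 guide notes that low-level clients are thread safe while resource objects should not be shared across threads, so real readers would each use their own resource or a client.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_in_parallel(read_batch, read_nrt, person_id):
    """Run both reads concurrently and return (batch_item, nrt_items)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        batch_future = pool.submit(read_batch, person_id)
        nrt_future = pool.submit(read_nrt, person_id)
        # result() re-raises any exception thrown inside the worker thread.
        return batch_future.result(), nrt_future.result()


def read_batch(person_id):
    # Stand-in for the GetItem on the batch table.
    return {"personId": person_id, "goals": [], "lastUpdated": "2026-05-15T03:00:00Z"}


def read_nrt(person_id):
    # Stand-in for the Query on the NRT table.
    return [{"personId": person_id, "goalId": "SAVINGS_ACCOUNT"}]


batch_item, nrt_items = fetch_in_parallel(read_batch, read_nrt, "person-uuid")
```

&lt;p&gt;The request latency then approaches the slower of the two reads instead of their sum, while the merge logic stays unchanged.&lt;/p&gt;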



&lt;p&gt;A subtle but important property of this design is that the API does not need to know which events have already been incorporated by the batch, nor does it need to track any history of pending updates. The timestamp comparison is enough and the TTL eventually removes the NRT item.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp60mvr1076dj9wuunsd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjp60mvr1076dj9wuunsd.png" alt=" " width="799" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  About the costs
&lt;/h3&gt;

&lt;p&gt;For an item of up to 4 KB, the GetItem on the batch table consumes 1 RCU when strongly consistent or 0.5 RCU when eventually consistent.&lt;/p&gt;

&lt;p&gt;The Query on the NRT table consumes RCUs proportional to the total size of the items returned, rounded up to the nearest 4KB block. With at most 3 items per customer (one per investment type) and small payloads, this almost always lands within a single 4KB block, meaning 1 RCU strongly consistent or 0.5 RCU eventually consistent.&lt;/p&gt;

&lt;p&gt;For this use case, eventually consistent reads are acceptable on both tables. The batch item is small enough to fit in a single read unit and the NRT query returns at most 3 small items per customer, so the total read cost per API request lands at roughly 1 RCU. This is a direct consequence of the modeling choices: bounded goal count, single-item batch view, one NRT item per investment type.&lt;/p&gt;
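&lt;p&gt;The arithmetic behind that estimate can be sketched in a few lines. The item sizes below are assumptions chosen for illustration (a 3 KB batch item and three 200-byte NRT items), and read_units is a hypothetical helper, not an AWS API.&lt;/p&gt;

```python
import math

BLOCK_BYTES = 4 * 1024  # one read unit covers up to 4 KB


def read_units(total_bytes, strongly_consistent=False):
    """Estimate read capacity units for a GetItem or a single-page Query."""
    # Even an empty result is billed as at least one block.
    blocks = max(1, math.ceil(total_bytes / BLOCK_BYTES))
    return blocks * (1.0 if strongly_consistent else 0.5)


# Assumed sizes: a 3 KB batch item and three 200-byte NRT items.
batch_cost = read_units(3 * 1024)
nrt_cost = read_units(3 * 200)
print(batch_cost + nrt_cost)  # 1.0
```

&lt;p&gt;Switching both reads to strongly consistent would double the figure to 2 RCUs per request, which is why the consistency choice matters more here than the item count.&lt;/p&gt;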




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;What makes this work in DynamoDB is the modeling, and most of it comes down to how you use partition keys and sort keys. &lt;strong&gt;They are not just identifiers, they are the access path.&lt;/strong&gt; The partition key decides how data is distributed and which queries are cheap. The sort key decides how items are organized within a partition and which queries are even possible.&lt;/p&gt;

&lt;p&gt;Sort keys can do much more than what is depicted here. They support range queries with operators like begins_with, between and inequality comparisons, which means you can model hierarchies, timelines or status transitions directly in the key. Composite sort keys like STATUS#ACTIVE#DATE#2026-05-16 let a single Query retrieve, sort and filter items in ways that would otherwise require a secondary index or a scan.&lt;/p&gt;
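&lt;p&gt;A small local simulation can show why such keys work: within a partition, items are ordered by sort key, so a shared prefix selects a contiguous, already sorted range. The attribute name sk and both helper functions are hypothetical, and the list stands in for a single partition.&lt;/p&gt;

```python
def sort_key(status, date):
    """Compose a hierarchical sort key such as STATUS#ACTIVE#DATE#2026-05-16."""
    return f"STATUS#{status}#DATE#{date}"


def query_by_prefix(items, prefix):
    """Local stand-in for a Query that uses begins_with on the sort key."""
    matches = [item for item in items if item["sk"].startswith(prefix)]
    return sorted(matches, key=lambda item: item["sk"])


items = [
    {"sk": sort_key("ACTIVE", "2026-05-16")},
    {"sk": sort_key("CLOSED", "2026-05-01")},
    {"sk": sort_key("ACTIVE", "2026-05-02")},
]

active = query_by_prefix(items, "STATUS#ACTIVE#")
print([item["sk"] for item in active])
```

&lt;p&gt;Against DynamoDB, the same prefix would go into a key condition such as "pk = :pk AND begins_with(sk, :prefix)", where pk is likewise a placeholder attribute name. ISO-formatted dates keep lexicographic order aligned with chronological order.&lt;/p&gt;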

&lt;p&gt;If you are working on something similar or have approached the same problem differently, I would love to hear about it.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>nosql</category>
      <category>database</category>
    </item>
    <item>
      <title>Cracking the Bedrock, Reaching the Core: Building Agents with AWS AgentCore Runtime and Memory</title>
      <dc:creator>Ana Silva</dc:creator>
      <pubDate>Fri, 08 May 2026 23:49:40 +0000</pubDate>
      <link>https://dev.to/aws-builders/cracking-the-bedrock-reaching-the-core-building-agents-with-aws-agentcore-runtime-and-memory-32kg</link>
      <guid>https://dev.to/aws-builders/cracking-the-bedrock-reaching-the-core-building-agents-with-aws-agentcore-runtime-and-memory-32kg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fu3zhmezad44ijjrqxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fu3zhmezad44ijjrqxg.png" alt="Moses Striking the Rock — Joachim Anthonisz Wtewael, 1624" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article walks through a project I built on Amazon Bedrock AgentCore: an agent that turns campaign briefings into ranked email subject lines, and improves across sessions as it learns from the user.&lt;/p&gt;

&lt;p&gt;The goal here isn’t to cover every AgentCore primitive, but to show how a few of them (Runtime and Memory) fit together in a real loop, and to be honest about which ones I deliberately left out. The project itself is intentionally small. The interesting part is the architecture around it: where scoring runs, why the optimization loop stays framework-agnostic, what memory actually stores and which tradeoffs come with each decision.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;👉 You can access the project on &lt;a href="https://github.com/anacds/subjectLineOptimizer/" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Agents-to-AgentCore Evolution&lt;/li&gt;
&lt;li&gt;The use case: briefing in, subject lines out&lt;/li&gt;
&lt;li&gt;Drafting a solution&lt;/li&gt;
&lt;li&gt;I. The entrypoint (main.py)&lt;/li&gt;
&lt;li&gt;II. The imperative shell (agent/builder.py)&lt;/li&gt;
&lt;li&gt;III. The functional core (agent/iteration.py)&lt;/li&gt;
&lt;li&gt;IV. Scoring&lt;/li&gt;
&lt;li&gt;V. The two agents and Bedrock&lt;/li&gt;
&lt;li&gt;VI. AgentCore Memory: four strategies&lt;/li&gt;
&lt;li&gt;VII. Observability&lt;/li&gt;
&lt;li&gt;VIII. What I didn't use this time&lt;/li&gt;
&lt;li&gt;IX. How to deploy and run it&lt;/li&gt;
&lt;li&gt;Closing thoughts&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Agents-to-AgentCore Evolution
&lt;/h2&gt;

&lt;p&gt;Bedrock was launched in 2023 as AWS's response to the rapid growth in foundation model use: a single API to call models from companies such as Anthropic, AI21 Labs and Cohere, or Amazon's own Titan family, without managing inference infrastructure. Later that year AWS added Bedrock Agents, a configuration-driven product that bolted tool-calling, knowledge bases, and memory onto a Bedrock model.&lt;/p&gt;

&lt;p&gt;It works for many cases, but it is a closed product: the ecosystem strongly revolves around Lambda-based tool execution, retrieval has to be Knowledge Bases, models have to be Bedrock-hosted, and you can't see or control how the agent decides what to do at each step. For more ambitious use cases, teams ended up bypassing Bedrock Agents and writing their own harness on EC2 or Lambda, which meant every team rebuilt the same plumbing: session management, sandboxing, identity, memory and observability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4ap50pr38rf8jh9iq9b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4ap50pr38rf8jh9iq9b.png" alt="From AWS Docs: https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/what-is-bedrock-agentcore.html" width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AgentCore, announced in 2025, was AWS's response to that pattern. Instead of a single "agent product," it broke the harness apart into composable services and made them framework-agnostic, so you could bring Strands, LangGraph, CrewAI or anything else. In April 2026 AWS added the managed Harness, which closed the loop: it offers the same easy, configuration-based approach as Bedrock Agents, but it runs on the AgentCore platform and lets you switch to code when you need more control.&lt;/p&gt;

&lt;p&gt;AWS continues to maintain both Bedrock Agents and AgentCore in parallel. Bedrock Agents remains available for teams that already use it or prefer a fully managed, configuration-only approach, while AgentCore is positioned as the path forward for new projects that need flexibility, framework choice or production-grade infrastructure.&lt;/p&gt;


&lt;h2&gt;
  
  
  The use case: briefing in, subject lines out
&lt;/h2&gt;

&lt;p&gt;Email subject lines have an outsized effect on open rates and they're often the only impression a campaign makes. Marketers who have the volume and the time for A/B testing can ship two or more variants and let the data decide. Many marketers don't, so they write a subject line, second-guess it and hit send.&lt;/p&gt;

&lt;p&gt;Imagine that you run email marketing for a specialty coffee brand. You open the optimizer, fill in the briefing and set a few constraints: nothing longer than 55 characters, no discount language, no emojis. You hit optimize and watch the rounds come in. Round one produces 8 candidates covering the full stylistic range, from urgency-led to curiosity-led to plain and direct. The scorer immediately tells you which ones carry spam risk, which ones are the right length, which ones align with a retention audience. The weakest three get dropped and the Critic explains why. Round two regenerates those slots with that guidance in mind. By round three the scores have converged and you have a ranked shortlist of five, each with a predicted open-rate band and a breakdown of what drove the score.&lt;/p&gt;

&lt;p&gt;Next week you run a cross-sell campaign for the same brand. The briefing is different but the session ID carries your name. Before generating a single candidate, the optimizer reads what it learned from your prior sessions: urgency-led lines consistently underperformed for this brand; premium and exclusivity framing reached the shortlist every time. Round one already looks different from what a first-time user would see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rsgbv313uyttm6srsnf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6rsgbv313uyttm6srsnf.png" alt=" " width="772" height="544"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Input: a campaign briefing
&lt;/h3&gt;

&lt;p&gt;A generic JSON brief with the objective, audience, offer, brand voice and constraints. The kind of structure you'd find in any agency template, nothing platform-specific.&lt;/p&gt;
&lt;h3&gt;
  
  
  Output: a ranked shortlist of 5
&lt;/h3&gt;

&lt;p&gt;Each variant comes with a predicted open-rate range, the dimensions where it scored highest and any flagged risks (spam triggers, length penalty, audience mismatch). The user can ask for follow-ups in the same session, such as "give me shorter versions of #2 and #4", and the agent refines while preserving what made those variants score well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapgqacprpqx7n821g8s2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fapgqacprpqx7n821g8s2.png" alt="UI mockup" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Drafting a solution
&lt;/h2&gt;

&lt;p&gt;Three tiers, top to bottom.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ge20upo4kuaq76rg2i4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ge20upo4kuaq76rg2i4.png" alt=" " width="799" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The caller sends a campaign briefing JSON and gets a streamed response back. That exchange happens through a single &lt;code&gt;@app.entrypoint&lt;/code&gt; function inside AgentCore Runtime, a managed AWS service that handles the HTTP transport, session lifecycle, and streaming framing so the agent code doesn't have to.&lt;/p&gt;

&lt;p&gt;Inside the Runtime, the architecture splits into two layers. The &lt;strong&gt;imperative shell&lt;/strong&gt; (&lt;code&gt;agent/builder.py&lt;/code&gt;) owns everything that touches a framework: two Strands agents (Generator and Critic), an in-process scorer, and a memory recall helper. It wires these into four plain Python callables and injects them into the &lt;strong&gt;functional core&lt;/strong&gt; (&lt;code&gt;agent/iteration.py&lt;/code&gt;). The core runs the generate/score/critique/regenerate loop and returns a ranked shortlist.&lt;/p&gt;

&lt;p&gt;At the bottom, two managed AWS services sit off the request path. Bedrock serves every LLM call via Strands' &lt;code&gt;BedrockModel&lt;/code&gt;. AgentCore Memory receives session events automatically from the Strands session manager, and returns extracted patterns when the loop asks for them at the start of each run. The async strategy extraction that makes cross-session learning possible runs roughly 60 seconds after each session ends.&lt;/p&gt;


&lt;h2&gt;
  
  
  I. The entrypoint (main.py)
&lt;/h2&gt;

&lt;p&gt;The entrypoint has one job: receive a campaign briefing, run the optimization loop and stream results back as they arrive.&lt;/p&gt;

&lt;p&gt;AgentCore Runtime gives you this as a single decorator. You don't write a router, configure middleware or manage a server process; instead, you hand it an async generator and it handles the rest: HTTP transport, session lifecycle, streaming framing, and &lt;code&gt;session_id&lt;/code&gt; and &lt;code&gt;user_id&lt;/code&gt; extraction from request headers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BedrockAgentCoreApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.entrypoint&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;session_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default-session&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;default-user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;briefing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;validate_briefing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;except &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid briefing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;

    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_initial_optimization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;briefing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;round_log&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rounds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;_format_round_lines&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;round_log&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nf"&gt;_format_shortlist&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;_serialize_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every &lt;code&gt;yield&lt;/code&gt; sends a chunk to the caller immediately. This matters because an optimization run takes 30 to 90 seconds: the caller sees each round's scores as they complete, not a blank screen followed by a wall of text.&lt;/p&gt;

&lt;p&gt;Refinement works by re-submitting a modified briefing with the same &lt;code&gt;session_id&lt;/code&gt;: for example, one with a tighter length constraint, a different brand voice or added avoid-words. AgentCore Memory carries learned patterns from prior sessions forward automatically; the entrypoint doesn't need to know about that.&lt;/p&gt;




&lt;h2&gt;
  
  
  II. The imperative shell (agent/builder.py)
&lt;/h2&gt;

&lt;p&gt;The shell is the wiring layer. It knows about Strands, AgentCore and Bedrock while the functional core doesn't (it receives callables). The shell is what turns those framework dependencies into plain Python functions the core can call without importing anything.&lt;/p&gt;

&lt;p&gt;Four things live here: a generator agent, a critic agent, an in-process scorer and a memory recall helper. At the start of each optimization run, all four get injected into the loop as callables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_initial_optimization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;briefing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;generator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_make_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GENERATOR_SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;critic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_make_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CRITIQUE_SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_strip_json_array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;critique&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scored&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_drop&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;critic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;build_critique_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scored&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to_drop&lt;/span&gt;&lt;span class="p"&gt;))).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;run_optimization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;briefing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;score_candidates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;critique&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;critique&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;recall_for_user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;on_round&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_emit_round_telemetry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;round_one_prompt_builder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;round_one_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;regenerate_prompt_builder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;regenerate_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The agents
&lt;/h3&gt;

&lt;p&gt;Generator and Critic are separate Strands &lt;code&gt;Agent&lt;/code&gt; instances with separate system prompts. The generator is told to produce strict JSON arrays of strings and nothing else. The critic is told to produce two to four sentences of explicit, actionable guidance.&lt;/p&gt;

&lt;p&gt;Both share the same &lt;code&gt;AgentCoreMemorySessionManager&lt;/code&gt;, so they write to the same session namespace and see the same conversation history.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scoring stays in-process
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;score_candidates&lt;/code&gt; is not an agent and not a service call — it's a direct import of &lt;code&gt;score_subject_line&lt;/code&gt; from &lt;code&gt;scoring/score.py&lt;/code&gt;, with the heuristic rules loaded once. No network, no latency, no failure mode beyond a bad CSV row.&lt;/p&gt;

&lt;h3&gt;
  
  
  The observability hook
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;on_round=_emit_round_telemetry&lt;/code&gt; is the last injection. After each completed round, the loop calls it with a &lt;code&gt;RoundLog&lt;/code&gt;. The shell opens an OpenTelemetry span, records seven attributes (among them the round number, candidate count, top score, top subject line and a guidance excerpt), closes the span and emits a structured log line.&lt;/p&gt;
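&lt;p&gt;A minimal sketch of the structured-log half of such a hook, using only the standard library. The &lt;code&gt;RoundLog&lt;/code&gt; fields, the attribute names and the excerpt length are assumptions for illustration; the project's own &lt;code&gt;RoundLog&lt;/code&gt; may look different, and the span handling is only indicated by a comment.&lt;/p&gt;

```python
import json
import logging
from dataclasses import dataclass

logger = logging.getLogger("optimizer.rounds")


@dataclass
class RoundLog:
    """Hypothetical shape, defined here only to keep the sketch self-contained."""
    round_number: int
    scored: list   # best-first list of {"subject_line": ..., "score": ...}
    guidance: str  # critic output for this round, may be empty


def round_attributes(log, excerpt_chars=120):
    """Flatten one round into key/value pairs suitable for a span or a log line."""
    top = log.scored[0]
    return {
        "round.number": log.round_number,
        "round.candidate_count": len(log.scored),
        "round.top_score": top["score"],
        "round.top_subject_line": top["subject_line"],
        "round.guidance_excerpt": log.guidance[:excerpt_chars],
    }


def emit_round_telemetry(log):
    """Emit one JSON log line per round and return the attributes that were logged."""
    attributes = round_attributes(log)
    # A real hook would also set these attributes on an OpenTelemetry span here.
    logger.info(json.dumps(attributes, ensure_ascii=False))
    return attributes


example = RoundLog(
    round_number=1,
    scored=[
        {"subject_line": "Small batch, big flavor", "score": 78.0},
        {"subject_line": "Meet the new single origin", "score": 64.5},
    ],
    guidance="Drop the urgency framing.",
)
logged = emit_round_telemetry(example)
```

&lt;p&gt;Keeping the attribute construction in its own function means the same dictionary can feed both the span and the log line, so the two never drift apart.&lt;/p&gt;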




&lt;h2&gt;
  
  
  III. The functional core (agent/iteration.py)
&lt;/h2&gt;

&lt;p&gt;Keeping the loop free of framework imports means it can be read, tested and reasoned about without knowing anything about the surrounding infrastructure. You can swap the LLM, replace the scoring backend or drop AgentCore entirely and the loop doesn't change. The entire contract is in the signature:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_optimization&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;briefing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="n"&gt;critique&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;MemoryContext&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;on_round&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;RoundLog&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;actor_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anonymous&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_rounds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;plateau_epsilon&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;round_one_prompt_builder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
    &lt;span class="n"&gt;regenerate_prompt_builder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;IterationResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
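&lt;p&gt;To make the testability claim concrete, here is a small sketch of calling a loop through injected callables with stubs in place of the LLM and the scorer. &lt;code&gt;tiny_core&lt;/code&gt; is a deliberately simplified stand-in written for this example, not the project's &lt;code&gt;run_optimization&lt;/code&gt;; the stub names and the dictionary keys are assumptions too. The stubs follow the &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; shapes from the signature above.&lt;/p&gt;

```python
def tiny_core(briefing, generate, score, shortlist_size=2):
    """Stand-in for the real loop: one generate-and-score pass, ranked best first."""
    candidates = generate("Write subject lines for: " + briefing["objective"], 3)
    scored = score(candidates, briefing)
    return sorted(scored, key=lambda item: item["score"], reverse=True)[:shortlist_size]


def fake_generate(prompt, n):
    """Stub generator: deterministic lines, no LLM call."""
    return ["Candidate " + str(i) for i in range(1, n + 1)]


def fake_score(lines, briefing):
    """Stub scorer: later candidates score higher, no heuristics file needed."""
    return [{"subject_line": line, "score": 50.0 + i} for i, line in enumerate(lines)]


shortlist = tiny_core({"objective": "retention"}, fake_generate, fake_score)
```

&lt;p&gt;Nothing in this snippet imports Strands, Bedrock or AgentCore, which is the point of the split: the same stubs can drive a unit test of the real core.&lt;/p&gt;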



&lt;p&gt;Round one generates eight candidates across eight archetypes and scores them all. Each subsequent round generates only the slots vacated by pruning. New and surviving candidates are merged, deduped by subject line and sorted by score.&lt;/p&gt;
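&lt;p&gt;The merge step can be sketched in a few lines. This is an illustration under assumptions, not the project's code: the &lt;code&gt;subject_line&lt;/code&gt; and &lt;code&gt;score&lt;/code&gt; keys, the function name and the case-insensitive comparison used for deduplication are all hypothetical.&lt;/p&gt;

```python
def merge_and_rank(survivors, fresh):
    """Merge scored candidates, drop duplicate subject lines, sort best first."""
    unique = {}
    for candidate in [*survivors, *fresh]:
        # Survivors come first, so an existing entry wins over a regenerated duplicate.
        key = candidate["subject_line"].strip().casefold()
        unique.setdefault(key, candidate)
    return sorted(unique.values(), key=lambda item: item["score"], reverse=True)


survivors = [
    {"subject_line": "Your winter roast is here", "score": 71.0},
    {"subject_line": "Meet the new single origin", "score": 64.5},
]
fresh = [
    {"subject_line": "your winter roast is here ", "score": 71.0},
    {"subject_line": "Small batch, big flavor", "score": 78.0},
]
ranked = merge_and_rank(survivors, fresh)
```

&lt;p&gt;Here the regenerated duplicate of the winter roast line is discarded and the new candidate moves to the top of the ranking.&lt;/p&gt;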

&lt;p&gt;After sorting, the loop checks whether the top-3 average improved by at least &lt;code&gt;plateau_epsilon&lt;/code&gt; points over the prior round. If not, the scores have converged and the loop stops early to avoid wasting LLM calls.&lt;/p&gt;

&lt;p&gt;If there's still room to improve and rounds remain, the bottom 40% are pruned and handed to the critic. In two to four sentences, the critic explains which patterns made them lose, referencing specific signals. That guidance feeds into the next round's regeneration prompt.&lt;/p&gt;
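&lt;p&gt;A sketch of the pruning split, assuming the candidate list is already sorted best first. Rounding the 40% down is an assumption; it is consistent with the walkthrough earlier in the article, where three of eight candidates are dropped. The function and parameter names are hypothetical.&lt;/p&gt;

```python
def split_for_pruning(ranked, drop_fraction=0.4):
    """Split a best-first list into (kept, dropped); the drop count is rounded down."""
    keep_count = len(ranked) - int(len(ranked) * drop_fraction)
    return ranked[:keep_count], ranked[keep_count:]


# Eight placeholder candidates, best first.
round_one = ["candidate-" + str(i) for i in range(1, 9)]
kept, dropped = split_for_pruning(round_one)
```

&lt;p&gt;With eight candidates this keeps five and passes the last three to the critic, so the next round has three slots to regenerate.&lt;/p&gt;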

&lt;p&gt;After at most four rounds, the loop returns the top five candidates with the full per-round log and the memory context read at the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  IV. Scoring
&lt;/h2&gt;

&lt;p&gt;Every candidate produced by the generator gets a score before the loop decides what to keep and what to prune. The scorer runs 45 heuristics against each subject line and returns a composite score between 0 and 100, a predicted open-rate band, per-dimension breakdowns and an explanation of the top contributions. The heuristics are stored in a simple CSV and cover 9 dimensions: length, urgency, spam risk, curiosity triggers, value signals, personalization, style, audience fit and brand voice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A heuristic is not a rule.&lt;/strong&gt; A rule is binary — pass or fail, always. A heuristic is a signal that contributes positively or negatively to a score based on what the literature says tends to correlate with open rates. "Subject lines between 30 and 50 characters score better" is an empirical observation, not a constraint.&lt;/p&gt;

&lt;p&gt;The same urgency words that earn an extra 1.0 point for acquisition campaigns cost 2.0 points for retention campaigns and 5.0 for regulatory notices, where they are inappropriate. The &lt;code&gt;audience_modifier&lt;/code&gt; column captures that context-dependence per rule; audience tags are inferred automatically from briefing free-text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rule_id,category,pattern,match_type,weight,audience_modifier
LEN_SWEET_SPOT_30_50,length,30-50,range,8.0,
URGENCY_BASE_LIFT,urgency,urgent|hurry|now|today,word_any,3.5,acquisition:+1.0;retention:-2.0;regulatory:-5.0
SPAM_TRIGGER_FREE,spam_risk,free|100% free,word_any,-6.0,
VALUE_FREE_SHIPPING,value_signals,free shipping,phrase,4.0,acquisition:+1.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;These weights are starting points. A real company would probably use a model trained on their own send history or something similar.&lt;/p&gt;
&lt;/blockquote&gt;
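&lt;p&gt;To show how rows like these could turn into signed contributions, here is a small sketch that loads two of the rules and applies them to one subject line. Several things are assumptions: that the audience modifier is added to the base weight, that &lt;code&gt;range&lt;/code&gt; means an inclusive character count, that &lt;code&gt;word_any&lt;/code&gt; means a whole-word, case-insensitive match, and all function names. The real scorer also normalizes to a 0 to 100 composite, which this sketch does not attempt.&lt;/p&gt;

```python
import csv
import io
import re

# Two rows taken from the excerpt above; the real file has 45 rules.
RULES_CSV = """rule_id,category,pattern,match_type,weight,audience_modifier
LEN_SWEET_SPOT_30_50,length,30-50,range,8.0,
URGENCY_BASE_LIFT,urgency,urgent|hurry|now|today,word_any,3.5,acquisition:+1.0;retention:-2.0;regulatory:-5.0
"""


def parse_modifiers(field):
    """Turn 'acquisition:+1.0;retention:-2.0' into a tag-to-float mapping."""
    pairs = (item.split(":") for item in field.split(";") if item)
    return {tag: float(delta) for tag, delta in pairs}


def rule_matches(rule, subject_line):
    """Check one rule; only the two match types used in this sketch are handled."""
    if rule["match_type"] == "range":
        low, high = (int(bound) for bound in rule["pattern"].split("-"))
        return len(subject_line) in range(low, high + 1)
    if rule["match_type"] == "word_any":
        pattern = r"\b(?:" + rule["pattern"] + r")\b"
        return re.search(pattern, subject_line, re.IGNORECASE) is not None
    return False


def contributions(subject_line, audience):
    """Return {rule_id: signed contribution} for every rule that fires."""
    fired = {}
    for rule in csv.DictReader(io.StringIO(RULES_CSV)):
        if rule_matches(rule, subject_line):
            modifier = parse_modifiers(rule["audience_modifier"]).get(audience, 0.0)
            fired[rule["rule_id"]] = float(rule["weight"]) + modifier
    return fired


line = "Last chance: order your winter roast today"
for_acquisition = contributions(line, "acquisition")
for_retention = contributions(line, "retention")
```

&lt;p&gt;The same 42-character line gets the length bonus for either audience, while the urgency rule contributes 4.5 for acquisition and only 1.5 for retention. That is the heuristic-versus-rule distinction in practice: a signal shifts the score without vetoing the candidate.&lt;/p&gt;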




&lt;h2&gt;
  
  
  V. The two agents and Bedrock
&lt;/h2&gt;

&lt;p&gt;The Generator and Critic share a model and a session manager but have nothing else in common.&lt;/p&gt;

&lt;p&gt;Both agents are Strands &lt;code&gt;Agent&lt;/code&gt; instances backed by &lt;strong&gt;Amazon Nova 2 Lite&lt;/strong&gt;, invoked via Bedrock's cross-region inference profile. Both are wired to &lt;code&gt;AgentCoreMemorySessionManager&lt;/code&gt;, which writes each turn as a session event automatically. Neither agent needs to know about memory because the session manager handles it transparently.&lt;/p&gt;

&lt;p&gt;The Generator enforces its output shape through Strands' structured output support rather than prompt engineering. Instead of instructing the model to "output strict JSON arrays only" and then parsing whatever it produces, you pass a Pydantic schema to the agent call and Bedrock enforces it at the model level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SubjectLineList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;subject_lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email subject line candidates, one per requested slot.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;structured_output_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SubjectLineList&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;structured_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;subject_lines&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Critic is told to produce two to four sentences of explicit, actionable guidance. No generic advice — it must reference specific patterns in the candidates it's reviewing. "These lines are too long" is not useful guidance for regeneration. "The urgency phrasing in candidates 3 and 5 reads as promotional spam rather than genuine time pressure. Try anchoring to a specific benefit instead" is.&lt;/p&gt;

&lt;p&gt;That guidance becomes part of the next round's prompt directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;regenerate_prompt_builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;briefing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;memory_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subject_line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;survivors&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;guidance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_to_generate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The surviving candidates and the Critic's diagnosis arrive together. The Generator sees what worked, what didn't and why.&lt;/p&gt;
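&lt;p&gt;To make the shape of that prompt concrete, here is a minimal sketch of what a builder with this signature could look like. The article only shows the call site, so the function body, the section labels and the handling of an empty memory context are all assumptions for illustration.&lt;/p&gt;

```python
def regenerate_prompt_builder(
    briefing: str,
    memory_context: str,
    survivors: list[str],
    guidance: str,
    n_to_generate: int,
) -> str:
    """Assemble a regeneration prompt (hypothetical layout, not the project's actual text)."""
    survivor_lines = "\n".join(f"- {line}" for line in survivors) or "- (none survived)"
    sections = [
        f"Campaign briefing:\n{briefing.strip()}",
        f"Surviving candidates from the previous round:\n{survivor_lines}",
        f"Critic guidance to apply:\n{guidance.strip()}",
        f"Write {n_to_generate} new subject lines. Do not repeat the survivors.",
    ]
    # Memory context is optional: a first-time user has no recalled patterns yet.
    if memory_context.strip():
        sections.insert(1, memory_context.strip())
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(
        regenerate_prompt_builder(
            "Q3 lapsed customer reactivation",
            "",
            ["Claim your 25% loyalty reward"],
            "Anchor urgency to a specific benefit.",
            3,
        )
    )
```

&lt;p&gt;Keeping the survivors and the guidance in separate, labelled sections is one way to let the model tell "what worked" apart from "what to change".&lt;/p&gt;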




&lt;h2&gt;
  
  
  VI. AgentCore Memory: four strategies
&lt;/h2&gt;

&lt;p&gt;Memory touches this project in two distinct ways. First, the Strands session manager writes conversation events automatically: every Generator and Critic turn lands in the session namespace without any agent code to make that happen. Second, &lt;code&gt;recall_for_user&lt;/code&gt; reads extracted patterns explicitly at the start of each optimization run, before any candidate is generated.&lt;/p&gt;

&lt;p&gt;Memory also works on two timescales:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Short-term:&lt;/strong&gt; Within a session, the Strands session manager keeps the full conversation in context, making sure the Critic sees every prior round and the Generator sees every prior critique.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Long-term:&lt;/strong&gt; Across sessions, AgentCore's four extraction strategies run asynchronously, roughly 60 seconds after a session ends, and populate long-term namespaces:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Namespace&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SEMANTIC&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;General facts inferred from session content&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/users/{actor_id}/facts&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;USER_PREFERENCE&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Behavioral patterns inferred from what consistently scored well or was pruned&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/users/{actor_id}/preferences&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;SUMMARIZATION&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A compressed summary of the session&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/summaries/{actor_id}/{session_id}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;EPISODIC&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A record of what happened&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/episodes/{actor_id}/{session_id}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The agent never writes subject lines or briefings directly to long-term storage. It writes session events and the strategies decide what is worth keeping and in what form.&lt;/p&gt;
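&lt;p&gt;The namespace column above mixes two scopes: facts and preferences are keyed by user only, while summaries and episodes are keyed by user and session. A small sketch of resolving those templates makes the difference explicit. The helper name and the error handling are hypothetical; only the template strings come from the table.&lt;/p&gt;

```python
# Namespace templates copied from the strategy table; the helper itself is illustrative.
NAMESPACE_TEMPLATES = {
    "SEMANTIC": "/users/{actor_id}/facts",
    "USER_PREFERENCE": "/users/{actor_id}/preferences",
    "SUMMARIZATION": "/summaries/{actor_id}/{session_id}",
    "EPISODIC": "/episodes/{actor_id}/{session_id}",
}


def resolve_namespace(strategy: str, actor_id: str, session_id: str = "") -> str:
    """Fill in a namespace template, insisting on a session ID where one is needed."""
    template = NAMESPACE_TEMPLATES[strategy]
    if "{session_id}" in template and not session_id:
        raise ValueError(f"{strategy} namespaces are scoped to a session")
    return template.format(actor_id=actor_id, session_id=session_id)


if __name__ == "__main__":
    print(resolve_namespace("USER_PREFERENCE", "user-42"))
    print(resolve_namespace("EPISODIC", "user-42", "session-001"))
```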

&lt;h3&gt;
  
  
  What the loop reads
&lt;/h3&gt;

&lt;p&gt;At the start of each run, &lt;code&gt;recall_for_user&lt;/code&gt; queries the facts and preferences namespaces using the briefing as the retrieval query. It returns up to five patterns per namespace, ranked by relevance score. Those patterns flow into the round-one generation prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Patterns observed across this user's prior sessions:
- Urgency-led subject lines were pruned in 3 of 4 prior sessions
- Premium and exclusivity framing consistently reached the final shortlist

Inferred preferences for this user:
- Discount and price-led language does not align with observed brand voice
- Sentence case outperformed title case across recent campaigns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Generator sees what worked and what didn't before producing a single candidate.&lt;/p&gt;
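&lt;p&gt;As a sketch of how recalled records might be turned into that prompt block: the record shape (a &lt;code&gt;text&lt;/code&gt; and a relevance &lt;code&gt;score&lt;/code&gt;) and the function name are assumptions, since the article does not show what &lt;code&gt;recall_for_user&lt;/code&gt; returns. The two headings and the limit of five per namespace follow the description above.&lt;/p&gt;

```python
def format_memory_context(facts: list[dict], preferences: list[dict], limit: int = 5) -> str:
    """Render recalled patterns as a prompt block, highest relevance first."""

    def top_texts(records: list[dict]) -> list[str]:
        ranked = sorted(records, key=lambda r: r["score"], reverse=True)
        return [r["text"] for r in ranked[:limit]]

    blocks = []
    if facts:
        bullets = "\n".join(f"- {text}" for text in top_texts(facts))
        blocks.append(f"Patterns observed across this user's prior sessions:\n{bullets}")
    if preferences:
        bullets = "\n".join(f"- {text}" for text in top_texts(preferences))
        blocks.append(f"Inferred preferences for this user:\n{bullets}")
    # An empty string signals "no memory yet", e.g. on a user's first run.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(
        format_memory_context(
            [{"text": "Urgency-led subject lines were pruned", "score": 0.81}],
            [{"text": "Sentence case outperformed title case", "score": 0.77}],
        )
    )
```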




&lt;h2&gt;
  
  
  VII. Observability
&lt;/h2&gt;

&lt;p&gt;The optimization loop runs for 30–90 seconds across up to four rounds. Without instrumentation, a slow run and a failing run look identical from the outside.&lt;/p&gt;

&lt;p&gt;After each completed round, the loop calls an &lt;code&gt;on_round&lt;/code&gt; callback with a &lt;code&gt;RoundLog&lt;/code&gt;. The shell's implementation emits two concurrent signals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_emit_round_telemetry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;round_log&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RoundLog&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_tracer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimization_round&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.number&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;round_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;round_number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.candidate_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.pruned_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;round_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pruned&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.top3_average&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top3_avg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.top_score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.top_subject_line&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_subject&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attribute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;round.guidance_excerpt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;round_log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guidance&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;optimization_round_complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extra&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{...})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An &lt;code&gt;optimization_round&lt;/code&gt; OTel span with seven attributes appears as a child span under each invocation in AgentCore traces. The &lt;code&gt;round.top3_average&lt;/code&gt; and &lt;code&gt;round.top_score&lt;/code&gt; attributes show whether scores are improving across rounds. &lt;code&gt;round.guidance_excerpt&lt;/code&gt; shows what the critic said before each regeneration step — the most useful signal when a run plateaus unexpectedly.&lt;/p&gt;

&lt;p&gt;The same fields appear as a structured &lt;code&gt;log.info&lt;/code&gt; event, queryable in CloudWatch Logs Insights:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;fields&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;round_number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top3_average&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;guidance_excerpt&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;"optimization_round_complete"&lt;/span&gt;
&lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;sort&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="k"&gt;asc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;traceId&lt;/code&gt; field on the log event matches the trace ID shown in the traces UI, so you can move between the two surfaces without losing context.&lt;/p&gt;
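&lt;p&gt;Logs Insights discovers fields automatically when a log event is JSON, so one way to make values passed through &lt;code&gt;extra&lt;/code&gt; queryable is a JSON formatter on the handler. The sketch below uses only the standard library. The class name and the field list are hypothetical, and the project may well configure logging differently.&lt;/p&gt;

```python
import json
import logging

# Field names mirror the Logs Insights query; the formatter class is illustrative.
ROUND_FIELDS = ("round_number", "top3_average", "top_score", "guidance_excerpt")


class JsonRoundFormatter(logging.Formatter):
    """Serialize a log record, including selected `extra` fields, as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        # Keys passed via `extra=` become attributes on the LogRecord.
        for field in ROUND_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonRoundFormatter())
    log = logging.getLogger("optimizer")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("optimization_round_complete", extra={"round_number": 2, "top_score": 91.2})
```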




&lt;h2&gt;
  
  
  VIII. What I didn't use this time
&lt;/h2&gt;

&lt;p&gt;AgentCore ships with more primitives than this project uses. Some of the ones that didn't make it in are worth naming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Interpreter&lt;/strong&gt; is the right choice when code has to run in a sandbox: the LLM authors it at runtime, it carries untrusted dependencies or it comes from a source outside the agent's own deployment artifact. The scoring script here has 250 lines of standard library Python, authored by the agent's owner, deployed in the same place as &lt;code&gt;main.py&lt;/code&gt;. There is no organizational or security boundary for Code Interpreter to enforce. Adding a managed sandbox would introduce spin-up latency, a separate billing line, a separate failure mode and a service dependency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Policy&lt;/strong&gt; manages what agents are allowed to do — which tools they can call, which users can invoke them, which actions are gated behind approval. It earns its place in multi-tenant deployments where different users have different permissions or in agentic workflows where the consequences of a wrong action are hard to reverse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gateway&lt;/strong&gt; exposes an agent as a managed API endpoint with authentication, rate limiting, and request routing. It's the right choice when the agent is a shared service consumed by multiple callers across organizational boundaries.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every primitive on this list is genuinely useful for the problems it was designed to solve. The real challenge is identifying which problem you actually have. Adopting every available managed service does not make an agent more sophisticated, and it often makes the system harder to reason about, more expensive to run or even more fragile.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When evaluating a new service or feature, ask three questions: what complexity does it introduce, what failure modes come with it, and does it meaningfully simplify the system?&lt;/p&gt;




&lt;h2&gt;
  
  
  IX. How to deploy and run it
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before deploying, make sure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account with Amazon Bedrock access&lt;/li&gt;
&lt;li&gt;The agentcore CLI installed (&lt;code&gt;npm install -g @aws/agentcore-cli&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;AWS credentials configured with permissions for Bedrock, CloudFormation, IAM, ECR, and S3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To confirm your credentials are working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws sts get-caller-identity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing locally before deploying
&lt;/h3&gt;

&lt;p&gt;To iterate on the agent without incurring a full deploy cycle, you can run the local dev server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts the runtime at &lt;code&gt;http://localhost:8080&lt;/code&gt;. Bedrock model calls still go to AWS, so you need valid credentials, but no CloudFormation changes are made. Memory is not active in dev mode unless you export &lt;code&gt;MEMORY_ID&lt;/code&gt; manually with the ID from your deployed memory resource.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqrnvmprq6udly9i6p62.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyqrnvmprq6udly9i6p62.png" alt=" " width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploying
&lt;/h3&gt;

&lt;p&gt;From the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI runs CDK under the hood, synthesizing a CloudFormation stack, bootstrapping the CDK environment if needed, and provisioning the runtime, the memory resource, and the IAM roles. The first deploy takes a couple of minutes. When it completes you'll see a runtime ARN in the output — that ARN is the deployed endpoint.&lt;/p&gt;

&lt;h3&gt;
  
  
  Running an invocation
&lt;/h3&gt;

&lt;p&gt;The project ships with four example briefings under &lt;code&gt;app/subject_line_optimizer/briefing/examples/&lt;/code&gt;. Each is a self-contained campaign briefing JSON ready to send.&lt;/p&gt;

&lt;p&gt;To invoke the deployed agent with the reactivation example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentcore invoke &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--target&lt;/span&gt; default &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--prompt-file&lt;/span&gt; app/subject_line_optimizer/briefing/examples/reactivation.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--session-id&lt;/span&gt; &lt;span class="s2"&gt;"create-an-id-here-001"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--user-id&lt;/span&gt; &lt;span class="s2"&gt;"your-user-id"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--stream&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--target default&lt;/code&gt; routes to the deployed endpoint, not the local dev server&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--prompt-file&lt;/code&gt; reads the briefing directly; the CLI wraps it in &lt;code&gt;{"prompt": "..."}&lt;/code&gt; before sending, so pass the raw briefing file&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--session-id&lt;/code&gt; must be at least 33 characters; a UUID (36 characters) works well, so replace the placeholder in the command above with a real one&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--user-id&lt;/code&gt; scopes the AgentCore Memory namespaces — use a consistent identifier across sessions so the agent accumulates preferences for that user over time&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--stream&lt;/code&gt; prints each chunk as it arrives, so you see the rounds progressing in real time&lt;/li&gt;
&lt;/ul&gt;
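&lt;p&gt;A quick way to produce a session ID that satisfies the length rule is to generate a UUID. The sketch below is illustrative: the helper names are made up, and the 33-character minimum is taken from the note above.&lt;/p&gt;

```python
import uuid

MIN_SESSION_ID_LENGTH = 33  # minimum length stated in the notes above


def validate_session_id(session_id: str) -> str:
    """Return the ID unchanged, or raise if it is too short to be accepted."""
    if not len(session_id) >= MIN_SESSION_ID_LENGTH:
        raise ValueError(
            f"session ID has {len(session_id)} characters, "
            f"needs at least {MIN_SESSION_ID_LENGTH}"
        )
    return session_id


def new_session_id() -> str:
    """Generate a random session ID; a canonical UUID string is 36 characters."""
    return validate_session_id(str(uuid.uuid4()))


if __name__ == "__main__":
    print(new_session_id())
```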

&lt;h3&gt;
  
  
  Example output
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Optimizing subject lines for: Q3 Lapsed Customer Reactivation

[round 1]
  83.0  Ready to rediscover your favorite things? Claim 25% off now
        (LEN_ACCEPTABLE_20_60, URGENCY_BASE_LIFT, VALUE_PERCENT_OFF)
  80.0  Unlock 25% off – your loyalty reward is ready to claim
        (LEN_ACCEPTABLE_20_60, VALUE_PERCENT_OFF, LOYALTY_LANGUAGE)
  ...
  pruned: 3
  guidance for next round: Avoid overly conversational openings...

[round 2]
  91.2  Claim your 25% loyalty reward – 14 days to save
        (LEN_SWEET_SPOT_30_50, LOYALTY_LANGUAGE)
  ...

=== Final shortlist ===
1. Claim your 25% loyalty reward – 14 days to save
   score 91.2   open-rate band 42.0–50.0%
   ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final chunk is a machine-readable JSON object with the full shortlist, per-round logs, and plateau status.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There is currently no built-in UI in the AWS console to browse memory contents. A community-built tool called &lt;strong&gt;AgentCore Memory Browser&lt;/strong&gt; fills this gap.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nwfqoxg0hj2930bz2bt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7nwfqoxg0hj2930bz2bt.png" alt=" " width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqcldl972s83wmmtyrop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqcldl972s83wmmtyrop.png" alt=" " width="800" height="354"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;The thing I keep coming back to is that AgentCore doesn't have an opinion about what the agent should look like. Bedrock Agents did, and the opinion was reasonable for a lot of cases. AgentCore gives you a Runtime, a Memory service, a set of primitives, and trusts you to assemble them. &lt;strong&gt;That trust is the feature.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>bedrock</category>
      <category>agentcore</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>Upload, Describe, Discover: Architecting a Marketing Assets Library</title>
      <dc:creator>Ana Silva</dc:creator>
      <pubDate>Sat, 02 May 2026 00:46:13 +0000</pubDate>
      <link>https://dev.to/aws-builders/upload-describe-discover-architecting-a-marketing-assets-library-3odi</link>
      <guid>https://dev.to/aws-builders/upload-describe-discover-architecting-a-marketing-assets-library-3odi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpdix85tye30dj2pmt8i9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpdix85tye30dj2pmt8i9.png" alt="Glórund sets forth to seek Túrin, J.R.R. Tolkien" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Glórund crossing heterogeneous terrain in search of Túrin felt like an apt opener for an article about searching across heterogeneous assets!&lt;/p&gt;

&lt;p&gt;Is this too much of a stretch? Well, anyway…&lt;/p&gt;

&lt;p&gt;If you search for "digital asset management software," you'll find many mature solutions. Adobe Experience Manager — probably the most recognizable name in enterprise marketing infrastructure — handles digital assets as part of a broader content management platform. Cloudinary and Bynder represent the more focused end of the spectrum: purpose-built DAMs with polished interfaces, rich metadata management, and integrations designed for marketing teams. These are mature, well-funded products with years of iteration behind them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So why build one from scratch?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The honest answer: I didn't build this because the market had a gap. I built it because I had some questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do you model metadata for creative assets that are structurally heterogeneous: a PNG, an HTML email and a push notification living in the same library?&lt;/li&gt;
&lt;li&gt;How do you integrate an LLM into an indexing pipeline without making uploads feel slow?&lt;/li&gt;
&lt;li&gt;How do you expose a single search endpoint that handles both rigid filter-based queries and natural language, without the interface becoming a mess?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are questions that appear the moment you try to build anything resembling a searchable content repository. Whether you're integrating with an off-the-shelf DAM via API, building a lightweight internal tool or extending an existing platform, the underlying mechanics are the same. Understanding them gives you leverage regardless of which path you choose.&lt;/p&gt;

&lt;p&gt;The fictional system I built, Orqestra Assets, is a DAM focused on marketing creative pieces: app banners (PNG), email templates (HTML), and SMS/push payloads (JSON). It's not a production system; it's a deliberate architecture built to answer those questions, with real code, real tradeoffs, and a stack that maps directly to what you'd use in an AWS environment.&lt;/p&gt;

&lt;p&gt;It's also part of a larger platform I've been working on, so there may be more parts to come. Here, I'll walk through the architecture for Assets: how assets are ingested, how they're indexed asynchronously with LLM-generated descriptions and how search works across both structured filters and natural language queries.&lt;/p&gt;

&lt;p&gt;The code is available on &lt;a href="https://dev.toYOUR_GITHUB_URL_HERE"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The solution draft
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhcwkbdx0tcavix9hl79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffhcwkbdx0tcavix9hl79.png" alt="Architecture diagram" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Orqestra Assets" is built around three distinct flows that happen in sequence but are deliberately decoupled from each other:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Upload&lt;/strong&gt;: a client uploads an asset or submits a text payload; the API stores the file in S3, registers a row in PostgreSQL, and publishes a message to an SQS queue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Describe&lt;/strong&gt;: a worker consumes the queue, generates a description if needed, creates an embedding and upserts a document into OpenSearch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discover&lt;/strong&gt;: a client queries the library, either through structured filters resolved in SQL, or through natural language resolved via hybrid search in OpenSearch, enriched with data from Postgres.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The stack maps directly to AWS primitives you'd use in production: S3 for object storage, SQS for async decoupling, OpenSearch for vector and full-text search, PostgreSQL as the source of truth for structured metadata, and the OpenAI API for both description generation and embeddings.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Upload
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gx7cst6rgqtlfwjwntd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gx7cst6rgqtlfwjwntd.png" alt="Asset library screenshot" width="800" height="631"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The upload layer's job is to accept an asset, persist it reliably and hand it off to the indexing pipeline without blocking the client.&lt;/p&gt;

&lt;p&gt;"Without blocking" is the key constraint. A multimodal LLM call for a PNG can take several seconds, so if the upload endpoint waited for indexing to complete before responding, the client experience would be unacceptable. Because of this, the API does the minimum necessary synchronously, and delegates everything else to a queue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8wf62ko8wnwkbq8kkk8f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8wf62ko8wnwkbq8kkk8f.png" alt="Upload sequence diagram" width="800" height="294"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The API receives the file, generates an S3 key, stores the object, writes a row to PostgreSQL and publishes a message to SQS. The response returns immediately with the asset ID and &lt;code&gt;indexing_status: pending&lt;/code&gt;.&lt;/p&gt;
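&lt;p&gt;The ordering of those steps can be sketched with the three side effects injected as plain callables. This is not the project's handler: the function name, the callables and the row and message shapes are assumptions, chosen so the example runs without AWS or a database.&lt;/p&gt;

```python
import uuid
from typing import Callable


def handle_upload(
    data: bytes,
    s3_key: str,
    *,
    put_object: Callable[[str, bytes], None],
    insert_row: Callable[[dict], None],
    publish: Callable[[dict], None],
) -> dict:
    """Persist the asset, enqueue it for indexing, and return without waiting."""
    asset_id = str(uuid.uuid4())
    put_object(s3_key, data)  # 1. store the bytes
    insert_row({"id": asset_id, "s3_key": s3_key, "indexing_status": "pending"})  # 2. record it
    publish({"asset_id": asset_id, "s3_key": s3_key})  # 3. hand off to the worker
    # No LLM or embedding call happens here; that work belongs to the queue consumer.
    return {"asset_id": asset_id, "indexing_status": "pending"}


if __name__ == "__main__":
    response = handle_upload(
        b"fake-png-bytes",
        "campaigns/demo/App/home/banner.png",
        put_object=lambda key, body: None,
        insert_row=lambda row: None,
        publish=lambda message: None,
    )
    print(response)
```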

&lt;p&gt;The S3 key encodes the asset's channel and type in the path — a PNG uploaded to a campaign might land at &lt;code&gt;campaigns/{id}/App/{space}/{uuid}.png&lt;/code&gt; — and uses a UUID as the filename. Separately, the API computes a SHA-256 of the content and stores it in the asset's metadata, giving you a foundation for deduplication logic if you need it later.&lt;/p&gt;

&lt;p&gt;The OpenSearch document ID is derived from a hash of the S3 key. This means that if the same object triggers multiple indexing attempts — a duplicate queue message, an S3 notification racing with an explicit publish — the upsert always lands on the same document. Re-indexing is safe; OpenSearch doesn't accumulate duplicates.&lt;/p&gt;
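&lt;p&gt;Both ideas, the path-encoded key and the deterministic document ID, fit in a few lines. The sketch below assumes SHA-256 for the document ID (the article only says "a hash of the S3 key") and uses hypothetical function names. The key layout follows the example path above.&lt;/p&gt;

```python
import hashlib
import uuid


def build_s3_key(campaign_id: str, channel: str, space: str, extension: str) -> str:
    """Encode campaign, channel and space in the path; a UUID keeps filenames unique."""
    return f"campaigns/{campaign_id}/{channel}/{space}/{uuid.uuid4()}.{extension}"


def content_sha256(data: bytes) -> str:
    """Fingerprint of the content, stored as metadata for later deduplication."""
    return hashlib.sha256(data).hexdigest()


def document_id(s3_key: str) -> str:
    """Deterministic ID: the same key always maps to the same search document."""
    return hashlib.sha256(s3_key.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    key = build_s3_key("1234", "App", "home", "png")
    print(key)
    # Indexing the same key twice targets the same document, so an upsert overwrites.
    print(document_id(key) == document_id(key))
```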

&lt;p&gt;PNG and HTML come as file uploads — &lt;code&gt;POST /assets/upload-app&lt;/code&gt; and &lt;code&gt;POST /assets/upload-email&lt;/code&gt; respectively. The API validates the format, reads the bytes, and writes the object to S3. SMS and push work differently: the client submits the message text as a JSON body to &lt;code&gt;POST /assets/text&lt;/code&gt;, and the API itself serialises it into a &lt;code&gt;.json&lt;/code&gt; file before writing it to the bucket. There is no file to upload; the file is constructed server-side.&lt;/p&gt;

&lt;p&gt;All three paths write a row to the &lt;code&gt;assets&lt;/code&gt; table with a &lt;code&gt;channel&lt;/code&gt; and &lt;code&gt;format&lt;/code&gt; field — App/png, E-mail/html, SMS/text, Push/text — and then publish to the same queue. By the time the worker picks up the message, it knows what it's dealing with: the combination of channel and format is enough to decide whether to call the vision model, the text completion model, or neither and go straight to embedding.&lt;/p&gt;
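
&lt;p&gt;That routing decision can be sketched as a lookup table. The channel and format values are the ones listed above; the &lt;code&gt;Step&lt;/code&gt; enum and the &lt;code&gt;description_step&lt;/code&gt; function are hypothetical names used only for illustration:&lt;/p&gt;

```python
from enum import Enum


class Step(Enum):
    VISION = "vision"  # a model call that looks at the image
    TEXT = "text"      # a model call that reads the HTML
    NONE = "none"      # no model call, go straight to embedding


# Channel/format pairs as listed in the post.
_ROUTES = {
    ("App", "png"): Step.VISION,
    ("E-mail", "html"): Step.TEXT,
    ("SMS", "text"): Step.NONE,
    ("Push", "text"): Step.NONE,
}


def description_step(channel: str, fmt: str) -> Step:
    """Pick the description step for an asset, or fail loudly."""
    try:
        return _ROUTES[(channel, fmt)]
    except KeyError:
        raise ValueError(f"unsupported asset type: {channel}/{fmt}") from None
```

&lt;p&gt;Rejecting unknown combinations explicitly means a new asset type cannot slip into the index without a deliberate routing choice.&lt;/p&gt;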

&lt;h2&gt;
  
  
  2. Describe
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hdj4c8ghtz4r8zh24tc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2hdj4c8ghtz4r8zh24tc.png" alt="Describe sequence diagram" width="800" height="554"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once a message lands in the queue, a background worker takes over. Its job is to do everything the upload endpoint deliberately skipped: fetch the asset from S3, generate a description if the asset type requires one, create an embedding, and push the result to OpenSearch.&lt;/p&gt;

&lt;p&gt;The worker is a long-polling SQS consumer. It receives batches of up to ten messages, processes them concurrently using a thread pool, and deletes a message from the queue only after its asset has been successfully indexed. If processing fails, the message is not deleted: SQS makes it visible again after the visibility timeout, and the worker retries it on a later poll. Messages that exhaust all retries land in the dead-letter queue (DLQ).&lt;/p&gt;
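
&lt;p&gt;One way to sketch a single polling cycle, assuming &lt;code&gt;sqs&lt;/code&gt; behaves like a boto3 SQS client (&lt;code&gt;receive_message&lt;/code&gt; and &lt;code&gt;delete_message&lt;/code&gt;). The function name &lt;code&gt;poll_once&lt;/code&gt; and the &lt;code&gt;handle&lt;/code&gt; callable are hypothetical, and the project's real worker may be organised differently:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def poll_once(sqs, queue_url, handle, max_workers=10):
    """Receive one batch, process it concurrently, delete only successes.

    `handle` is a hypothetical callable that indexes one message and
    raises on failure. Returns the number of messages deleted.
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,  # SQS caps a batch at ten messages
        WaitTimeSeconds=20,      # long polling
    )
    messages = response.get("Messages", [])
    if not messages:
        return 0

    def process(message):
        try:
            handle(message)
        except Exception:
            # Leave the message alone: SQS makes it visible again after
            # the visibility timeout, and the queue's redrive policy
            # eventually moves it to the DLQ.
            return False
        sqs.delete_message(
            QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
        )
        return True

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(process, messages))
```

&lt;p&gt;The important property is the ordering: the delete happens only after the handler returns, so a crash in the middle of processing costs a retry rather than a lost asset.&lt;/p&gt;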

&lt;p&gt;For each message, the worker reads the &lt;code&gt;s3_key&lt;/code&gt; from the payload, downloads the object from S3 and decides what to do based on &lt;code&gt;channel&lt;/code&gt; and &lt;code&gt;format&lt;/code&gt;. The decision tree from that point is straightforward. For PNG app banners, the worker encodes the image in base64 and sends it to a multimodal model with a prompt asking for a concise marketing description: dominant colours, visible text, campaign theme, appropriate channel. For HTML email templates, it decodes the file and sends it to the same model with a different prompt focused on the email's call to action, tone and campaign fit. For SMS and push payloads there is no LLM call: the text is extracted directly from the JSON stored in S3 and used as-is.&lt;/p&gt;

&lt;p&gt;The prompts cap descriptions at 500 characters by instruction only. Models do not count characters reliably, but minor overruns are acceptable in our case, so the instruction is enough to keep descriptions within a reasonable size without hard truncation or other techniques in code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;_SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a digital marketing expert specialized in creative asset cataloguing. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate concise, retrieval-optimized descriptions of marketing assets. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reply only with the description text, in English, in at most 500 characters. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No introduction, no title, no &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;here is&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, no numbered lists, no meta-commentary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;_PNG_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe this creative asset for search retrieval. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Include: dominant colors, main visual elements, visible text, campaign theme, and suitable channel.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;_HTML_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe this HTML email template for search retrieval. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Include: main theme, call to action, message tone, and suitable campaign type.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the asymmetry the queueing design was built to absorb. A push notification costs one fast JSON parse. An app banner costs a vision model call that might take several seconds. From the upload client's perspective, both are the same: post the asset, get a response, check back later.&lt;/p&gt;

&lt;p&gt;Once a description exists, the worker prepends the asset's display title, if one was set, and passes the combined text to OpenAI's &lt;code&gt;text-embedding-3-small&lt;/code&gt; to generate a vector. That vector, along with the description and the asset's structured fields (channel, format, locale, lifecycle status, campaign ID), is upserted into OpenSearch under the document ID derived from the S3 key.&lt;/p&gt;
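
&lt;p&gt;A rough sketch of the two small steps around the embedding call. The separator between title and description, the field name &lt;code&gt;embedding&lt;/code&gt; and both function names are assumptions, since the post does not show them:&lt;/p&gt;

```python
def embedding_input(description, title=None):
    """Prepend the display title, when one is set, to the description."""
    if title and title.strip():
        return f"{title.strip()}. {description}"
    return description


def search_document(description, vector, fields):
    """Assemble the body that gets upserted under the S3-derived ID."""
    return {"description": description, "embedding": vector, **fields}
```

&lt;p&gt;Keeping the title out of the stored description, and only in the embedded text, is one possible choice here; the project may store them differently.&lt;/p&gt;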

&lt;p&gt;The final step is a write back to Postgres: the description and &lt;code&gt;embedding_id&lt;/code&gt; columns are updated and &lt;code&gt;indexing_status&lt;/code&gt; is set to &lt;code&gt;indexed&lt;/code&gt;. If anything fails before that point, the status is set to &lt;code&gt;error&lt;/code&gt; instead, and the message stays in the queue for retry.&lt;/p&gt;

&lt;p&gt;One thing this pipeline doesn't do is validate description quality before indexing. A description that's technically successful but semantically weak lands in OpenSearch indistinguishably from a good one. The practical consequence is that recall degrades silently: a search for "bold red promotional banner" may fail to surface an asset that matches visually if the model described it as "a marketing creative with promotional messaging." Validating description quality without a reference set is hard. The most honest mitigation at this stage is observability: log every description, monitor length distributions across batches and treat significant anomalies as a signal to inspect manually.&lt;/p&gt;
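
&lt;p&gt;As an illustration of the length-monitoring idea, here is a small sketch that flags descriptions whose length is far from the batch mean. The function name and the z-score threshold of 3 are assumptions, not something taken from the project:&lt;/p&gt;

```python
from statistics import mean, pstdev


def flag_length_outliers(descriptions, z_threshold=3.0):
    """Return indexes of descriptions whose length is unusual for the batch.

    Length is only a proxy for quality: this catches empty, truncated or
    runaway outputs, not descriptions that are fluent but generic.
    """
    lengths = [len(d) for d in descriptions]
    if not lengths:
        return []
    mu = mean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        return []  # every description has the same length
    return [
        i for i, n in enumerate(lengths)
        if abs(n - mu) / sigma > z_threshold
    ]
```

&lt;p&gt;A check like this would not catch the "marketing creative with promotional messaging" case, which has a perfectly normal length. It only narrows down which batches deserve a manual look.&lt;/p&gt;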

&lt;blockquote&gt;
&lt;p&gt;⚠️ The local setup keeps embedding generation outside OpenSearch entirely. The trade-off is that you own the orchestration: every indexing job and every search request carries an outbound API call to a model provider, with its associated latency and failure modes.&lt;/p&gt;

&lt;p&gt;One alternative, available in production on Amazon OpenSearch Service, is to register a model connector pointing either to Amazon Bedrock or to an external provider, and delegate embedding generation to OpenSearch itself via an ingest pipeline processor at index time and a neural query at search time. In that setup, the worker would send plain text and OpenSearch would handle the vector internally, removing the custom embedding code from the application entirely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  3. Discover
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y7fbxzdi9mura4hm50h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5y7fbxzdi9mura4hm50h.png" alt=" " width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the library page, assets appear as a card grid: PNG banners render with a thumbnail, while HTML and text assets show an icon and a type badge. Clicking any card opens a detail sheet with the full metadata, the generated description, the indexing status and a download link.&lt;/p&gt;

&lt;p&gt;Above the grid sits a search bar and a row of filters: channel, format, locale, lifecycle status, tags and campaign partition. They all coexist on the same view and feed the same request. A user can narrow to all active push notifications in Brazilian Portuguese or type a natural language query like "summer promotion with red background" and let the ranking handle the rest. The user can also do both at once, combining structured filters with semantic search in a single call. Typing triggers a debounced query, so the grid updates as the user types without hammering the API on every keystroke.&lt;/p&gt;

&lt;p&gt;When no query text is provided, the request goes entirely through Postgres. Assets are filtered by the supplied fields, ordered by creation date and paginated. It's a straightforward SQL query and returns quickly. When a query string is present, the path is different.&lt;/p&gt;

&lt;h3&gt;
  
  
  How hybrid search works
&lt;/h3&gt;

&lt;p&gt;To understand why hybrid search matters here, it helps to understand what each component does on its own.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BM25&lt;/strong&gt; is the algorithm behind traditional keyword search. It ranks documents by how often the query terms appear in them, adjusted for document length and for how rare each term is across the corpus (inverse document frequency). It's fast, interpretable and works well when the user knows the right words. But it's brittle: a query for "urgent promotional tone" returns nothing if none of those exact words appear in the indexed descriptions, even if a perfectly relevant asset exists.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;kNN&lt;/strong&gt; (k-nearest neighbors) operates on embeddings — vector representations that encode semantic meaning rather than surface text. When you embed the query and search for the nearest vectors in the index, you're finding assets that are conceptually similar, regardless of whether they share any words with the query. This is what makes "something warm and summery for a mobile audience" a valid search. kNN is indifferent to exact matches, though, so a query for a specific campaign name or a precise tag will often return semantically adjacent but wrong results.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84khqg2nn3x7xzld2dye.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F84khqg2nn3x7xzld2dye.png" alt=" " width="800" height="571"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid search&lt;/strong&gt; combines both. In this project, when a natural language query arrives, the API embeds it in real time using &lt;code&gt;text-embedding-3-small&lt;/code&gt;, the same model used during indexing, and sends both the query string and the embedding to OpenSearch as a hybrid query. OpenSearch runs the BM25 and kNN sub-queries in parallel, normalizes each score set independently using min-max normalization, and combines them into a single ranking via weighted arithmetic mean. The weights favor the vector component slightly, on the assumption that semantic similarity is more useful than keyword overlap for creative asset retrieval.&lt;/p&gt;
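
&lt;p&gt;To make the score combination concrete, here is a plain-Python sketch of min-max normalization followed by a weighted arithmetic mean. It mimics what the paragraph above describes rather than calling OpenSearch. The default weights are the 0.35 lexical / 0.65 vector split reported in the results section, the sample scores are invented, and treating a document missing from one sub-query as scoring 0 there is an assumption of this sketch:&lt;/p&gt;

```python
def min_max(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {d: (s - low) / (high - low) for d, s in scores.items()}


def combine(bm25, knn, w_lexical=0.35, w_vector=0.65):
    """Rank doc IDs by the weighted mean of the two normalized score sets."""
    bm25_n, knn_n = min_max(bm25), min_max(knn)
    doc_ids = set(bm25_n) | set(knn_n)
    combined = {
        d: (w_lexical * bm25_n.get(d, 0.0) + w_vector * knn_n.get(d, 0.0))
        / (w_lexical + w_vector)
        for d in doc_ids
    }
    return sorted(combined, key=combined.get, reverse=True)


# Invented scores: "exact" wins on keywords, "themed" wins on similarity.
bm25_scores = {"exact": 9.0, "themed": 3.0, "other": 1.0}
knn_scores = {"exact": 0.70, "themed": 0.90, "other": 0.60}
ranking = combine(bm25_scores, knn_scores)
```

&lt;p&gt;With these numbers the thematically similar asset ends up ahead of the literal keyword match, because the vector side carries more weight. Swapping the two weights puts the keyword match back on top.&lt;/p&gt;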

&lt;p&gt;OpenSearch returns the ranked asset IDs without the full metadata. The API then fetches the corresponding rows from Postgres, applying whatever SQL-only filters remain (such as tags), and re-orders them to match the ranking OpenSearch produced. What the client receives is a page of fully hydrated asset objects, ordered by relevance, with pagination driven by the original limit and offset parameters.&lt;/p&gt;
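
&lt;p&gt;The re-ordering step is small but easy to get wrong, because a SQL &lt;code&gt;IN&lt;/code&gt; lookup does not preserve the order of the IDs it was given. A minimal sketch, assuming rows are dicts with an &lt;code&gt;id&lt;/code&gt; key and that pagination is applied after the SQL-only filters (both are assumptions, as is the function name):&lt;/p&gt;

```python
def order_by_ranking(ranked_ids, rows, limit, offset=0):
    """Re-order Postgres rows to follow the OpenSearch ranking.

    IDs that the SQL-only filters removed are simply skipped, so the
    page is cut from the rows that survived.
    """
    by_id = {row["id"]: row for row in rows}
    ordered = [by_id[i] for i in ranked_ids if i in by_id]
    return ordered[offset:offset + limit]
```

&lt;p&gt;For example, if OpenSearch ranks &lt;code&gt;c, a, b&lt;/code&gt; and the tag filter drops &lt;code&gt;b&lt;/code&gt;, the client receives &lt;code&gt;c&lt;/code&gt; followed by &lt;code&gt;a&lt;/code&gt;, whatever order Postgres returned them in.&lt;/p&gt;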

&lt;p&gt;The separation between the two stores is intentional. OpenSearch owns relevance ranking, while Postgres owns the facts: the asset metadata itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Does it work?
&lt;/h2&gt;

&lt;p&gt;Hybrid search returns results, but that doesn't mean it returns the right results. Without a way to measure retrieval quality, tuning the pipeline is guesswork: you don't know whether changing parameters helped or whether a prompt revision improved description usefulness. Evaluation doesn't need to be elaborate to be useful, but it needs to be systematic.&lt;/p&gt;

&lt;p&gt;What I built in this project is a lightweight evaluation pipeline that tests natural-language retrieval quality only. That means no deterministic UI filters (channel, format, locale, etc.) are allowed to influence the score. Each test query is sent as plain language, and the system must rank relevant assets using the same hybrid search path that serves real user queries.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1tdjsayh8b6h6j5yd6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw1tdjsayh8b6h6j5yd6c.png" alt=" " width="800" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What was built
&lt;/h3&gt;

&lt;p&gt;The evaluation flow is split into three scripts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;evals/generate_eval_dataset.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;evals/upload_assets.py&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;evals/run_eval.py&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, they form a reproducible loop from synthetic asset generation to scored retrieval results.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1) Dataset generation (&lt;code&gt;generate_eval_dataset.py&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This script creates a controlled benchmark corpus:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;synthetic creative assets (PNG, HTML, SMS, Push);&lt;/li&gt;
&lt;li&gt;a manifest describing each asset;&lt;/li&gt;
&lt;li&gt;a query specification file (&lt;code&gt;query_specs.json&lt;/code&gt;) containing &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;expected_ids&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The query types were reorganized around the search intent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;exact_intent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;paraphrase_intent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cross_channel_intent&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ambiguous_intent&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes reporting easier to interpret: you can see whether the engine performs differently on literal requests, paraphrased requests, cross-channel intents or harder ambiguous intents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2) Upload + dataset resolution (&lt;code&gt;upload_assets.py&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This script uploads generated assets to the DAM API, waits for indexing, and resolves logical IDs into real API asset IDs. It then builds &lt;code&gt;eval_dataset.json&lt;/code&gt;, which is what the runner consumes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3) Evaluation runner (&lt;code&gt;run_eval.py&lt;/code&gt;)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The runner reads &lt;code&gt;eval_dataset.json&lt;/code&gt;, sends each query to &lt;code&gt;POST /assets/search&lt;/code&gt; and compares ranked results against expected relevant assets.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it measures
&lt;/h3&gt;

&lt;p&gt;The evaluation reports quality per query, per intent category and globally.&lt;/p&gt;

&lt;p&gt;The metrics are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Success@1&lt;/strong&gt; — Did the first result match any expected relevant asset?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Success@3&lt;/strong&gt; — Did at least one relevant asset appear in top 3?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recall@3&lt;/strong&gt; — How much of the relevant set appears in top 3?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MRR&lt;/strong&gt; — How early does the first relevant result appear?&lt;/li&gt;
&lt;/ul&gt;
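
&lt;p&gt;For reference, the four metrics can be written in a few lines each. This is a sketch using the standard definitions, not the code from &lt;code&gt;run_eval.py&lt;/code&gt;. In particular, whether the project's MRR only looks at the top K results is not stated, so the version here scans the whole ranked list:&lt;/p&gt;

```python
def success_at_k(ranked, expected, k):
    """1.0 if any expected asset appears in the top k, else 0.0."""
    return 1.0 if any(doc in expected for doc in ranked[:k]) else 0.0


def recall_at_k(ranked, expected, k):
    """Fraction of the expected assets that appear in the top k."""
    return len(set(ranked[:k]) & set(expected)) / len(expected)


def reciprocal_rank(ranked, expected):
    """1 / position of the first relevant result, or 0.0 if none is found."""
    for position, doc in enumerate(ranked, start=1):
        if doc in expected:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, expected) pairs."""
    return sum(reciprocal_rank(r, e) for r, e in runs) / len(runs)
```

&lt;p&gt;As a quick check: for a ranked list &lt;code&gt;["b", "a", "x"]&lt;/code&gt; and expected set &lt;code&gt;{"a", "c"}&lt;/code&gt;, Success@1 is 0, Success@3 is 1, Recall@3 is 0.5 and the reciprocal rank is 0.5.&lt;/p&gt;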

&lt;h3&gt;
  
  
  How to run it
&lt;/h3&gt;

&lt;p&gt;From the repository root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) Start the stack&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--build&lt;/span&gt;

&lt;span class="c"&gt;# 2) Build eval dataset from existing uploaded mapping&lt;/span&gt;
docker compose &lt;span class="nt"&gt;--profile&lt;/span&gt; &lt;span class="nb"&gt;eval &lt;/span&gt;run &lt;span class="nt"&gt;--rm&lt;/span&gt; eval-upload &lt;span class="nt"&gt;--build-dataset-only&lt;/span&gt;

&lt;span class="c"&gt;# 3) Run evaluation&lt;/span&gt;
docker compose &lt;span class="nt"&gt;--profile&lt;/span&gt; &lt;span class="nb"&gt;eval &lt;/span&gt;run &lt;span class="nt"&gt;--rm&lt;/span&gt; eval-run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Outputs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;evals/output/eval_results.json&lt;/code&gt; for per-query details&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;evals/output/eval_summary.json&lt;/code&gt; for aggregate metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Interpreting results
&lt;/h3&gt;

&lt;p&gt;With &lt;code&gt;K=3&lt;/code&gt;, the benchmark produced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Success@1: 0.7143
Success@3: 1.0000
Recall@3:  0.8988
MRR:       0.8393
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The big picture: the right asset is always somewhere in the top 3, but it only shows up in first place about 71% of the time. The system is good at finding the right assets, but not always at ranking them first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Paraphrase intent — perfect&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S@1 = 1.00, S@3 = 1.00, Recall@3 = 1.00, MRR = 1.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a query describes what is wanted in natural language (&lt;code&gt;pink floral banner for mother's day&lt;/code&gt;, &lt;code&gt;abandoned cart recovery email&lt;/code&gt;), the system got it right every time. All 12 queries in this category landed the correct asset at rank 1.&lt;/p&gt;

&lt;p&gt;This is the category the system is built for: the LLM-generated descriptions and the embeddings are doing exactly what they should, bridging the gap between how a user phrases a request and how the asset was originally described. With the hybrid weighting set at 0.35 lexical / 0.65 vector, this is also the category that benefits most from the current configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Cross-channel intent — mostly limited by the K=3 cap&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S@1 = 1.00, S@3 = 1.00, MRR = 1.00, Recall@3 = 0.7083
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a query expects 4 assets (one per channel) and we only look at the top 3, full recall is impossible: the best achievable Recall@3 is 3/4 = 0.75. That's a limitation of how we chose to measure, not of the system itself. Three of the four queries hit this ceiling cleanly.&lt;/p&gt;

&lt;p&gt;The exception is &lt;code&gt;reactivation of inactive customers with offer&lt;/code&gt;: the email shows up first, but the SMS and Push versions don't make the top 3, even though they exist and the system finds them on other queries. This one query is dragging the average down.&lt;/p&gt;

&lt;p&gt;It's also worth noting that K=3 is a deliberate choice, not the only sensible one. It reflects what users actually see in the first row of results, but for cross-channel queries it under-rewards the system. A small refinement worth considering would be reporting Recall@N (where N matches the number of expected assets).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Exact intent — the weak spot&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S@1 = 0.125, S@3 = 1.00, Recall@3 = 1.00, MRR = 0.5208
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For keyword-style queries (&lt;code&gt;free shipping&lt;/code&gt;, &lt;code&gt;order tracking&lt;/code&gt;, &lt;code&gt;black friday urgency countdown&lt;/code&gt;) the right asset is always in the top 3, but almost never at rank 1. It usually lands at rank 2 or 3, behind thematically similar assets.&lt;/p&gt;

&lt;p&gt;The cause is fairly direct: the hybrid search currently weights lexical matches at 0.35 and vector matches at 0.65. That bias works beautifully for paraphrased queries, but for short, literal queries it lets thematically related assets outrank the one that matches the exact words. Because the right asset still appears in the top 3 every time, this is acceptable for now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Ambiguous intent — better than expected&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S@1 = 0.75, S@3 = 1.00, Recall@3 = 0.5833, MRR = 0.8333
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vague queries actually do better at S@1 than the exact ones above, which reinforces the idea that the system favours semantic matches. For example, &lt;code&gt;creative with warm tone and soft visual elements&lt;/code&gt; correctly surfaces the Mother's Day pink floral banner at rank 1, even though nothing in the query mentions Mother's Day or florals.&lt;/p&gt;

&lt;p&gt;Recall@3 is the lowest of all categories, but that's expected: when a query is broad, more assets could plausibly be relevant, and not all of them fit in the 3 slots.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;This project forced a series of decisions that documentation tends to skip over: where exactly to place filters, why description quality matters, how the failure model of a queue-based pipeline is fundamentally different from a synchronous one. Those trade-offs only become visible when you have to make the decisions yourself.&lt;/p&gt;

&lt;p&gt;One thing I'd revisit is embedding ownership. Generating embeddings in the application layer works fine at this scale, but it's something that Amazon OpenSearch Service can absorb in production through model connectors and neural queries. Whether that trade-off is worth it depends on how much you want to own.&lt;/p&gt;

&lt;p&gt;Evaluation showed that the search reliably finds the right things. The main area that could be improved is ranking, especially for short keyword queries. One option worth exploring is raising the lexical weight above its current 0.35, so that literal word matches count for more in the hybrid score.&lt;/p&gt;

&lt;p&gt;If you've built something similar or made different trade-offs around indexing, search or evaluation, I'd be curious to hear how you approached it.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>opensearch</category>
      <category>ai</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
