<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: rohan bhosale</title>
    <description>The latest articles on DEV Community by rohan bhosale (@rohan_06).</description>
    <link>https://dev.to/rohan_06</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3322978%2Fc286eca8-907d-4bf9-8fde-b524f495df5f.jpg</url>
      <title>DEV Community: rohan bhosale</title>
      <link>https://dev.to/rohan_06</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rohan_06"/>
    <language>en</language>
    <item>
      <title>From 15-Minute Lambda Timeouts to Sub-Second Runs — A DynamoDB Optimization Story</title>
      <dc:creator>rohan bhosale</dc:creator>
      <pubDate>Mon, 11 May 2026 15:58:57 +0000</pubDate>
      <link>https://dev.to/rohan_06/from-15-minute-lambda-timeouts-to-sub-second-runs-a-dynamodb-optimization-story-5717</link>
      <guid>https://dev.to/rohan_06/from-15-minute-lambda-timeouts-to-sub-second-runs-a-dynamodb-optimization-story-5717</guid>
      <description>&lt;h2&gt;
  
  
  The morning the dashboards went quiet
&lt;/h2&gt;

&lt;p&gt;It started, as these things usually do, with a Teams ping.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey, the subscription stats haven't refreshed in two days."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two scheduled Lambdas in our observability project, &lt;code&gt;calculateSubscriptionStats&lt;/code&gt; and &lt;code&gt;calculatePartnerNotificationStats&lt;/code&gt;, were responsible for crunching the daily numbers our internal teams relied on. For a while they had been &lt;em&gt;slow but working&lt;/em&gt;. Now they were just… not finishing.&lt;/p&gt;

&lt;p&gt;A quick look at CloudWatch confirmed the bad news. Both jobs were hammering against the 15-minute Lambda ceiling (900,000 ms), getting killed mid-flight, and occasionally tripping &lt;code&gt;OutOfMemory&lt;/code&gt; errors before the timeout could even register.&lt;/p&gt;

&lt;p&gt;Here's what the duration log looked like before I touched anything — every single run, day after day, glued to the 900,000 ms ceiling:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;code&gt;calculateSubscriptionStats&lt;/code&gt; — multiple invocations per day, all timing out:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij0v5ruod9hvfzuhxw5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fij0v5ruod9hvfzuhxw5m.png" alt="CloudWatch durations before optimization — both Lambdas timing out at 900,000ms" width="354" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;code&gt;calculatePartnerNotificationStats&lt;/code&gt; — multiple invocations per day, all timing out:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwlsxz6kblpqe970vt4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqwlsxz6kblpqe970vt4u.png" alt="CloudWatch durations before optimization — both Lambdas timing out at 900,000ms" width="337" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every row in both panels is &lt;code&gt;900,000.00&lt;/code&gt; — Lambda's hard 15-minute ceiling. The jobs weren't slow, they were &lt;em&gt;dying&lt;/em&gt;. And because they died mid-flight, the downstream stats were either stale or partial.&lt;/p&gt;

&lt;p&gt;The pattern was clear: the more data we accumulated, the closer to the cliff we got. And we'd just gone over it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why it was actually breaking
&lt;/h2&gt;

&lt;p&gt;I expected to find one bad query. Instead I found three compounding problems, each making the others worse.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1 — We were downloading everything, every time
&lt;/h3&gt;

&lt;p&gt;Our DynamoDB access layer was written in the &lt;em&gt;just-give-me-the-row&lt;/em&gt; style. Every call returned the full item schema. For our high-volume tables (Addon, Bundle, Partner Notifications), each item carried fat fields nobody downstream actually used: &lt;code&gt;oldData&lt;/code&gt;, &lt;code&gt;newData&lt;/code&gt;, and deeply nested &lt;code&gt;metadata&lt;/code&gt; blobs from audit history.&lt;/p&gt;

&lt;p&gt;DynamoDB paginates results at a hard 1 MB per page. When 70% of each item's bytes are fields we'll throw away, the math gets ugly fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More pages → more network roundtrips&lt;/li&gt;
&lt;li&gt;More JSON to parse → more CPU&lt;/li&gt;
&lt;li&gt;More objects in heap → Node.js GC working overtime&lt;/li&gt;
&lt;li&gt;Eventually → &lt;code&gt;JavaScript heap out of memory&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
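&lt;p&gt;To put rough numbers on that list, here is the page arithmetic for whole items. It is a back-of-the-envelope sketch that assumes a 1 MB page of 1,048,576 bytes and uniformly sized items; the function names and the one-million-row figure are illustrative, not measurements from our tables.&lt;/p&gt;

```javascript
// Back-of-the-envelope page math for a Query/Scan that returns whole items.
// Assumptions (illustrative only): a page holds at most 1 MB (1,048,576 bytes)
// and every item is the same size.
const PAGE_BYTES = 1024 * 1024;

function itemsPerPage(avgItemBytes) {
  // At least one item per page (DynamoDB items are capped at 400 KB anyway)
  return Math.max(1, Math.floor(PAGE_BYTES / avgItemBytes));
}

function pagesNeeded(itemCount, avgItemBytes) {
  // Each page is one more network round trip whose JSON must be fetched and parsed
  return Math.ceil(itemCount / itemsPerPage(avgItemBytes));
}

// ~8 KB items: 128 items per page, 7,813 round trips for a million rows
console.log(itemsPerPage(8192), pagesNeeded(1000000, 8192));
```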

&lt;h3&gt;
  
  
  Problem 2 — Recomputing history every single day
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;calculatePartnerNotificationStats&lt;/code&gt; was doing something even worse: a full table sweep on every invocation, just to compute running distributions like &lt;code&gt;uniqueCustomers&lt;/code&gt; and &lt;code&gt;monthlyCounts&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This works fine when the table has 10,000 rows. It is a death sentence when the table has millions and grows daily. Each run had to re-read the newest day's data &lt;em&gt;and&lt;/em&gt; every day before it, forever.&lt;/p&gt;
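&lt;p&gt;A quick way to see why the full sweep is a dead end: if a table starts with N rows and gains d rows a day, the full sweep on day k reads N + k·d items, so the total read over K days grows quadratically, while a delta job reads only about d items per run. The steady-growth model and the figures below are assumptions for illustration, not numbers from our tables.&lt;/p&gt;

```javascript
// Total items read over `days` daily runs, assuming the table starts with
// `initialRows` rows and gains `rowsPerDay` rows before each run (illustrative model).

// Full sweep: run k reads initialRows + k * rowsPerDay items
function fullSweepReads(days, initialRows, rowsPerDay) {
  // Closed form of the sum over k = 1..days
  return days * initialRows + (rowsPerDay * days * (days + 1)) / 2;
}

// Delta fetch: each run reads only the rows added since the previous run
// (ignores the one-off backfill of rows that existed before the first run)
function deltaReads(days, rowsPerDay) {
  return days * rowsPerDay;
}

// Hypothetical table: 1M rows, +10k rows/day, one month of daily runs
console.log(fullSweepReads(30, 1000000, 10000)); // 34650000
console.log(deltaReads(30, 10000)); // 300000
```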

&lt;h3&gt;
  
  
  Problem 3 — The accumulators lived in Lambda memory
&lt;/h3&gt;

&lt;p&gt;Because the running totals were rebuilt from scratch each run, there was no concept of "previous state." The Lambda was simultaneously the worker &lt;em&gt;and&lt;/em&gt; the source of truth. If it died, the numbers were just… wrong until the next successful run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix, in three layers
&lt;/h2&gt;

&lt;p&gt;I'll walk through these in the order I implemented them, because each one unlocked the next.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1 — &lt;code&gt;ProjectionExpression&lt;/code&gt; everywhere
&lt;/h3&gt;

&lt;p&gt;The cheapest win, and the one I should have done a year ago. Instead of fetching whole items, ask DynamoDB for exactly the fields you'll use.&lt;/p&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PartnerNotifications&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;partner_id = :pid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;:pid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;partnerId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// result.Items each ~8KB, mostly oldData/newData/metadata we don't use&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PartnerNotifications&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;partner_id = :pid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ProjectionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;smc, sk, created_at, #status, #type, #event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ExpressionAttributeNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;:pid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;partnerId&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="c1"&gt;// result.Items each ~600 bytes — same rows, fraction of the bytes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A gotcha I hit immediately: &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, &lt;code&gt;event&lt;/code&gt;, &lt;code&gt;action&lt;/code&gt;, &lt;code&gt;plan&lt;/code&gt;, and &lt;code&gt;bundle&lt;/code&gt; are all &lt;a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html" rel="noopener noreferrer"&gt;DynamoDB reserved words&lt;/a&gt;. The query throws &lt;code&gt;ValidationException&lt;/code&gt; the moment you use them in a &lt;code&gt;ProjectionExpression&lt;/code&gt; without aliasing. &lt;code&gt;ExpressionAttributeNames&lt;/code&gt; with &lt;code&gt;#&lt;/code&gt;-prefixed placeholders fixes it cleanly.&lt;/p&gt;
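&lt;p&gt;One way to make the aliasing automatic is a tiny helper that turns a list of attribute names into a &lt;code&gt;ProjectionExpression&lt;/code&gt; plus matching &lt;code&gt;ExpressionAttributeNames&lt;/code&gt;, so a reserved word can never reach the expression unaliased. This is a hypothetical helper, not part of the AWS SDK; it uses positional placeholders (&lt;code&gt;#p0&lt;/code&gt;, &lt;code&gt;#p1&lt;/code&gt;, …) and only handles top-level attribute names, not nested paths.&lt;/p&gt;

```javascript
// Hypothetical helper (not an AWS SDK API): alias every projected attribute so
// reserved words such as `status` or `type` can never trigger a ValidationException.
// Handles top-level attribute names only; nested paths would need one alias per segment.
function buildProjection(attributeNames) {
  const names = {};
  const placeholders = attributeNames.map((attr, i) => {
    const placeholder = `#p${i}`;
    names[placeholder] = attr;
    return placeholder;
  });
  return {
    ProjectionExpression: placeholders.join(', '),
    ExpressionAttributeNames: names,
  };
}

// Spread the result into the Query params alongside the key condition
const projection = buildProjection(['smc', 'sk', 'created_at', 'status', 'type', 'event']);
console.log(projection.ProjectionExpression); // #p0, #p1, #p2, #p3, #p4, #p5
```

&lt;p&gt;If the key condition also needs aliases, merge both maps into a single &lt;code&gt;ExpressionAttributeNames&lt;/code&gt; object, since a request takes only one.&lt;/p&gt;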

&lt;p&gt;The payoff: the same rows came back at a fraction of the size (roughly 8 KB down to roughly 600 bytes per item for this query), so there was far less to transfer, parse, and keep on the heap. We estimated network payload and memory dropped by &lt;strong&gt;80–90%&lt;/strong&gt; across these queries. Same data, way less weight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2 — Incremental delta fetching with S3 as the state store
&lt;/h3&gt;

&lt;p&gt;This is where the real timeout fix came from. Even with skinnier items, re-reading the entire history every day was still O(n) in the total number of rows, on a table that grows every day. The trick is to stop doing that.&lt;/p&gt;

&lt;p&gt;The new flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Read previous state&lt;/strong&gt; from S3 at the top of the Lambda — &lt;code&gt;stats.json&lt;/code&gt; and &lt;code&gt;event-stats.json&lt;/code&gt;. Each carries a top-level &lt;code&gt;lastExecutionTime&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
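&lt;strong&gt;Query DynamoDB for the delta only&lt;/strong&gt; by passing &lt;code&gt;lastExecutionTime&lt;/code&gt; into the &lt;code&gt;KeyConditionExpression&lt;/code&gt; when the sort key is time-based, or into a &lt;code&gt;FilterExpression&lt;/code&gt; when it isn't. The second option only slims the response: DynamoDB applies filters after reading the items, so the read itself still covers the whole partition.&lt;/li&gt;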
&lt;li&gt;
&lt;strong&gt;Write the new state&lt;/strong&gt; back to S3 at the end, with the updated &lt;code&gt;lastExecutionTime&lt;/code&gt; set to the start of this run.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// 1. Hydrate previous state from S3&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prevStats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;s3GetJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;observability/stats.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
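&lt;span class="c1"&gt;// First run: no snapshot yet, so start from the epoch (assumes s3GetJson resolves to null for a missing key)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lastExecutionTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prevStats&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;lastExecutionTime&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1970-01-01T00:00:00.000Z&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;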
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;nowIso&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// 2. Fetch only what's new&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PartnerNotifications&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;KeyConditionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;partner_id = :pid AND created_at &amp;gt; :since&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ProjectionExpression&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;smc, sk, created_at, #status, #event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ExpressionAttributeNames&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;#event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;event&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;:pid&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nx"&gt;partnerId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;:since&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lastExecutionTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
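&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fetched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;queryAllPages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ddb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// Cap the window at this run's start (assumes ISO-8601 created_at strings): items written&lt;/span&gt;
&lt;span class="c1"&gt;// after nowIso fall into the next run's window, so keeping them here would count them twice&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;fetched&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;created_at&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nx"&gt;nowIso&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;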

&lt;span class="c1"&gt;// 3. Hand the delta + previous state to the processor (next layer)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newStats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processPartnerEventLogs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prevStats&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// 4. Persist new state&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;s3PutJson&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;observability/stats.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;newStats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;lastExecutionTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;nowIso&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
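&lt;p&gt;The sketch calls a &lt;code&gt;queryAllPages&lt;/code&gt; helper that the post doesn't show. Here is one possible shape, assuming the AWS SDK v2 DocumentClient style used above, where &lt;code&gt;query(params).promise()&lt;/code&gt; resolves to &lt;code&gt;Items&lt;/code&gt; plus an optional &lt;code&gt;LastEvaluatedKey&lt;/code&gt;: keep re-issuing the query with &lt;code&gt;ExclusiveStartKey&lt;/code&gt; until DynamoDB stops returning a &lt;code&gt;LastEvaluatedKey&lt;/code&gt;. It is a hypothetical implementation, not the project's code.&lt;/p&gt;

```javascript
// Hypothetical implementation of the queryAllPages helper used above.
// Assumes an AWS SDK v2 DocumentClient-style `ddb`: query(params).promise()
// resolves to { Items, LastEvaluatedKey }.
async function queryAllPages(ddb, params) {
  const items = [];
  let exclusiveStartKey; // undefined until DynamoDB hands back a cursor

  do {
    // Only send ExclusiveStartKey once we have one from a previous page
    const request = exclusiveStartKey
      ? { ...params, ExclusiveStartKey: exclusiveStartKey }
      : params;
    const page = await ddb.query(request).promise();

    for (const item of page.Items ?? []) {
      items.push(item);
    }
    // DynamoDB omits LastEvaluatedKey on the final page
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);

  return items;
}
```

&lt;p&gt;Collecting every page into one array is fine for a small delta; if a delta can ever be large, processing each page as it arrives keeps memory flat.&lt;/p&gt;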



&lt;p&gt;S3 here is the perfect fit: cheap, durable, atomic on a single PUT, and the file is small enough to read and write in milliseconds. This is the move that took &lt;code&gt;calculatePartnerNotificationStats&lt;/code&gt; from "times out at 15 minutes" to "finishes in under a second."&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3 — In-memory merging instead of recomputation
&lt;/h3&gt;

&lt;p&gt;Incremental fetching breaks if your processors still think they're seeing the whole dataset. They'd overwrite the running totals with just today's slice.&lt;/p&gt;

&lt;p&gt;So I refactored the analytic processors — &lt;code&gt;processPrimePartnerEventLogs&lt;/code&gt;, &lt;code&gt;processMaxPartnerEventLogs&lt;/code&gt;, &lt;code&gt;processScheduledActionsCalculation&lt;/code&gt; — to hydrate themselves from the previous snapshot and &lt;strong&gt;merge&lt;/strong&gt; the delta in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processPartnerEventLogs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deltaItems&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;prevEventAnalysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Start from previous state, not from zero&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;structuredClone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prevEventAnalysis&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nf"&gt;emptyAnalysis&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;deltaItems&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Increment counters&lt;/span&gt;
    &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;totalEvents&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventsByType&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;eventsByType&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="c1"&gt;// Maintain a set-like structure for uniqueness across runs&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seenCustomers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;smc&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seenCustomers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;smc&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;uniqueCustomers&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Bucket into monthly distribution&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;month&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// YYYY-MM&lt;/span&gt;
    &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;monthlyCounts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;month&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;monthlyCounts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;month&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The accumulators now live in S3 between runs. Lambda becomes a pure incremental worker — read state, read delta, merge, write state. Stateless code, stateful job.&lt;/p&gt;
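&lt;p&gt;A cheap way to gain confidence in the merge logic is to check that two incremental runs produce the same totals as one run over all the data. The sketch below copies the processor so it runs on its own; &lt;code&gt;emptyAnalysis&lt;/code&gt; is assumed to return zeroed counters (the post doesn't show it), a JSON clone stands in for &lt;code&gt;structuredClone&lt;/code&gt;, and the events are made up.&lt;/p&gt;

```javascript
// Hypothetical zeroed state; the post doesn't show emptyAnalysis()
function emptyAnalysis() {
  return {
    totalEvents: 0,
    eventsByType: {},
    seenCustomers: {},
    uniqueCustomers: 0,
    monthlyCounts: {},
  };
}

// Copy of the merge-style processor from the post
function processPartnerEventLogs(deltaItems, prevEventAnalysis) {
  // JSON clone: the snapshot has to survive a round trip through S3 anyway
  const analysis = JSON.parse(JSON.stringify(prevEventAnalysis ?? emptyAnalysis()));
  for (const item of deltaItems) {
    analysis.totalEvents += 1;
    analysis.eventsByType[item.event] = (analysis.eventsByType[item.event] ?? 0) + 1;
    if (!analysis.seenCustomers[item.smc]) {
      analysis.seenCustomers[item.smc] = true;
      analysis.uniqueCustomers += 1;
    }
    const month = item.created_at.slice(0, 7); // YYYY-MM
    analysis.monthlyCounts[month] = (analysis.monthlyCounts[month] ?? 0) + 1;
  }
  return analysis;
}

// Made-up events split across two daily runs
const day1 = [
  { smc: 'c1', event: 'activate', created_at: '2026-04-30T10:00:00Z' },
  { smc: 'c2', event: 'activate', created_at: '2026-04-30T11:00:00Z' },
];
const day2 = [
  { smc: 'c1', event: 'renew', created_at: '2026-05-01T09:00:00Z' },
  { smc: 'c3', event: 'activate', created_at: '2026-05-01T12:00:00Z' },
];

// Run 1 "writes" its snapshot (JSON round trip), run 2 hydrates from it and merges
const afterRun1 = JSON.parse(JSON.stringify(processPartnerEventLogs(day1, null)));
const incremental = processPartnerEventLogs(day2, afterRun1);
const fromScratch = processPartnerEventLogs([...day1, ...day2], null);
console.log(JSON.stringify(incremental) === JSON.stringify(fromScratch)); // true
```

&lt;p&gt;The equivalence only holds if every event lands in exactly one delta, so the cursor window has to be bounded the same way on every run.&lt;/p&gt;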

&lt;h2&gt;
  
  
  The numbers, after
&lt;/h2&gt;

&lt;p&gt;Once the changes were deployed, the durations dropped off a cliff — in the good direction this time.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;code&gt;calculateSubscriptionStats&lt;/code&gt; — from 900,000 ms down to ~88,000 ms, every run:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96s9o5xm8bb8tpdn985k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96s9o5xm8bb8tpdn985k.png" alt="CloudWatch durations after optimization — sub-second and ~1.5 min runs" width="346" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;code&gt;calculatePartnerNotificationStats&lt;/code&gt; — from 900,000 ms down to under a second:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xbtgqsaqjj2es84i32v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xbtgqsaqjj2es84i32v.png" alt="CloudWatch durations after optimization — sub-second and ~1 second runs" width="349" height="312"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;calculateSubscriptionStats&lt;/code&gt;&lt;/strong&gt;: 900,000 ms → ~88,000 ms. At least a &lt;strong&gt;10× speedup&lt;/strong&gt; (900,000 ms was the timeout cutoff, not the true runtime), and more importantly, no longer timing out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;calculatePartnerNotificationStats&lt;/code&gt;&lt;/strong&gt;: 900,000 ms → ~700 ms. At least a &lt;strong&gt;~1,300× speedup&lt;/strong&gt;, measured against the same cutoff. From timing out to finishing before you finish reading this sentence.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;calculateSubscriptionStats&lt;/code&gt; is still chunkier because it processes more partners with more per-partner logic. There's room to push it further (parallelizing per-partner work via &lt;code&gt;Promise.all&lt;/code&gt; is the next obvious lever), but it now lives well inside its budget.&lt;/p&gt;
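&lt;p&gt;On the &lt;code&gt;Promise.all&lt;/code&gt; lever: starting every partner at once can swap a timeout for DynamoDB throttling, so capping how many partners are in flight is a gentler first step. A minimal, dependency-free sketch; &lt;code&gt;mapWithConcurrency&lt;/code&gt;, the limit of 5, and &lt;code&gt;processPartner&lt;/code&gt; are placeholders, not code from this project.&lt;/p&gt;

```javascript
// Run `worker` over `items` with at most `limit` calls in flight,
// returning results in the original order (like Promise.all, but capped).
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  async function runLane() {
    while (true) {
      const index = next++;
      if (index >= items.length) return;
      results[index] = await worker(items[index], index);
    }
  }

  // Start up to `limit` lanes; each pulls the next item as soon as it finishes one
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, runLane));
  return results;
}

// Usage (processPartner is a hypothetical per-partner function):
// const perPartnerStats = await mapWithConcurrency(partners, 5, processPartner);
```

&lt;p&gt;If a worker throws, the returned promise rejects just as &lt;code&gt;Promise.all&lt;/code&gt; would, but the other lanes keep pulling items until the list is exhausted.&lt;/p&gt;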

&lt;h2&gt;
  
  
  Lessons I'm taking with me
&lt;/h2&gt;

&lt;p&gt;A few things I'd tell past-me before he started:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;ProjectionExpression&lt;/code&gt; should be the default, not the optimization.&lt;/strong&gt; If you don't need a field downstream, don't ask for it. The 1 MB page boundary makes this not a micro-optimization — it's a structural multiplier.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "Full scan on a schedule" is a time bomb.&lt;/strong&gt; It runs fine until the day it doesn't, and that day is rarely the day you have spare hours to fix it. If a job recomputes history every run, ask whether it needs to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Lambda memory ≠ application state.&lt;/strong&gt; Anything stateful has to live somewhere durable between invocations. S3 is a wildly underrated state store for small-to-medium snapshots — cheap, atomic on single PUTs, and trivial to evolve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Watch out for the double-fire.&lt;/strong&gt; Now that the job merges deltas, an EventBridge misfire (or a manual re-run before S3 is updated) could double-count. We're idempotent-ish today via the &lt;code&gt;lastExecutionTime&lt;/code&gt; cursor, but I want to harden that with an explicit lock or version check next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Reserved words will bite you the moment you reach for &lt;code&gt;ProjectionExpression&lt;/code&gt;.&lt;/strong&gt; Just alias everything via &lt;code&gt;ExpressionAttributeNames&lt;/code&gt; from the start — future-you will thank present-you.&lt;/p&gt;
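&lt;p&gt;For the double-fire problem in lesson 4, one common shape for a version check is optimistic concurrency: the snapshot carries a version number, and a run may only write if the stored version is still the one it read. The sketch uses an in-memory stand-in; in production a conditional write (for example a DynamoDB &lt;code&gt;PutItem&lt;/code&gt; with a &lt;code&gt;ConditionExpression&lt;/code&gt; on the version attribute) would play the role of &lt;code&gt;putIfVersion&lt;/code&gt;. All names here are hypothetical.&lt;/p&gt;

```javascript
// In-memory stand-in for a store that supports compare-and-set writes.
class SnapshotStore {
  constructor(initial) {
    this.snapshot = initial;
  }

  get() {
    // Hand out a copy so callers can't mutate the stored snapshot
    return JSON.parse(JSON.stringify(this.snapshot));
  }

  // Write only if nobody else has committed since `expectedVersion` was read
  putIfVersion(expectedVersion, next) {
    if (this.snapshot.version !== expectedVersion) return false;
    this.snapshot = { ...next, version: expectedVersion + 1 };
    return true;
  }
}

// Two overlapping fires both read version 0; only the first commit wins
const store = new SnapshotStore({ version: 0, totalEvents: 0 });
const a = store.get();
const b = store.get();
console.log(store.putIfVersion(a.version, { ...a, totalEvents: a.totalEvents + 2 })); // true
console.log(store.putIfVersion(b.version, { ...b, totalEvents: b.totalEvents + 2 })); // false
console.log(store.get().totalEvents); // 2
```

&lt;p&gt;A run that loses the race can simply exit, because the winner has already merged that delta.&lt;/p&gt;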




&lt;p&gt;If you've hit the same wall with DynamoDB-backed stats jobs, I'd love to hear what you tried. Especially if you've found a clean pattern for the double-fire idempotency problem — that's the next puzzle on my desk.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>serverless</category>
      <category>performance</category>
    </item>
  </channel>
</rss>
