<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: JOOJO DONTOH</title>
    <description>The latest articles on DEV Community by JOOJO DONTOH (@joojodontoh).</description>
    <link>https://dev.to/joojodontoh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F238457%2Ff673858d-3131-485a-b2c2-f7366fbe00a1.PNG</url>
      <title>DEV Community: JOOJO DONTOH</title>
      <link>https://dev.to/joojodontoh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/joojodontoh"/>
    <language>en</language>
    <item>
      <title>How My personal Agent Alfred talks to my vacuum</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Mon, 23 Mar 2026 08:48:20 +0000</pubDate>
      <link>https://dev.to/joojodontoh/how-my-personal-agent-alfred-talks-to-my-vacuum-4iba</link>
      <guid>https://dev.to/joojodontoh/how-my-personal-agent-alfred-talks-to-my-vacuum-4iba</guid>
      <description>&lt;h2&gt;
  
  
  The Idea Came First
&lt;/h2&gt;

&lt;p&gt;Hi guys, I'm here again. After building Alfred &lt;a href="https://dev.to/joojodontoh/an-autonomous-agentic-ai-assistant-meet-alfred-and-this-is-how-i-built-him-4e7m"&gt;here&lt;/a&gt; I wanted him to be able to control Juliana, my Xiaomi X20+ robot vacuum. I did not know how that was going to work and I did not have a clear path forward, but the goal was clear: ask Alfred something and let him act on the tools available to him. So I started where most network curiosity starts. I ran an nmap scan on my LAN to see what Juliana was actually exposing to the network. All TCP ports were closed, but UDP port 54321 was open and listening. Bingo!&lt;/p&gt;

&lt;p&gt;If you have not read about Alfred yet, I wrote about how I built him &lt;a href="https://dev.to/joojodontoh/an-autonomous-agentic-ai-assistant-meet-alfred-and-this-is-how-i-built-him-4e7m"&gt;here&lt;/a&gt;. This feature is a direct result of what I called the Floodgate Effect in that article. The moment Alfred works in one area of your life, you immediately want to connect everything else. Juliana was next on the list. It's weird that I have names for things in my house, but yeah, that's me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speaking Juliana's Language
&lt;/h2&gt;

&lt;p&gt;Unfortunately, having a port is not the same as having a conversation. I needed to understand what protocol was running on that port and how to speak it. After some research I discovered that Xiaomi smart home devices communicate using MiIO, a proprietary protocol that runs over UDP port 54321. That was the first real breakthrough.&lt;/p&gt;

&lt;p&gt;MiIO follows a three-step flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First you do a handshake. You send a 32 byte "hello" packet made entirely of 0xFF bytes and the device responds with its device ID and a timestamp called a stamp. &lt;/li&gt;
&lt;li&gt;Second you send a command. Every command is a 32 byte header combined with an AES-128-CBC encrypted JSON body. The encryption uses an MD5 derived key and IV generated from the device token. The header carries magic bytes, the packet length, the device ID, the stamp incremented by one, and an MD5 checksum. &lt;/li&gt;
&lt;li&gt;Third you receive a response in the exact same format and you decrypt it using the same token to get your JSON back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The JSON itself follows a JSON-RPC style structure. A request looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_properties"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a response comes back like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is a clean protocol once you understand it, but getting to that understanding took some work.&lt;/p&gt;
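&lt;p&gt;To make the three steps concrete, here is a minimal TypeScript sketch of the hello packet and of wrapping an encrypted command. The key and IV derivation (key = MD5(token), IV = MD5(key + token)) and the exact header layout are assumptions based on common MiIO write-ups, not something Juliana's firmware documents.&lt;/p&gt;

```typescript
// Hedged sketch of MiIO packet construction, assuming the key/IV derivation
// commonly documented for the protocol: key = MD5(token), iv = MD5(key + token).
import { createCipheriv, createHash } from "node:crypto";

const md5 = (data: Buffer): Buffer => createHash("md5").update(data).digest();

// The 32-byte handshake packet: magic bytes 0x21 0x31, a length of 0x0020,
// and the remaining bytes set to 0xFF.
function buildHelloPacket(): Buffer {
  const hello = Buffer.alloc(32, 0xff);
  hello.writeUInt16BE(0x2131, 0); // magic bytes
  hello.writeUInt16BE(0x0020, 2); // total length = 32
  return hello;
}

// Encrypt a JSON command body and prepend the 32-byte MiIO header.
function buildCommandPacket(
  token: Buffer, // 16-byte device token extracted from the Xiaomi cloud
  deviceId: number, // device ID returned by the handshake
  stamp: number, // handshake stamp, incremented for each command
  payload: object
): Buffer {
  const key = md5(token);
  const iv = md5(Buffer.concat([key, token]));
  const cipher = createCipheriv("aes-128-cbc", key, iv);
  const encrypted = Buffer.concat([
    cipher.update(JSON.stringify(payload), "utf8"),
    cipher.final(),
  ]);

  const header = Buffer.alloc(32);
  header.writeUInt16BE(0x2131, 0); // magic bytes
  header.writeUInt16BE(32 + encrypted.length, 2); // total packet length
  header.writeUInt32BE(0x00000000, 4); // reserved
  header.writeUInt32BE(deviceId, 8);
  header.writeUInt32BE(stamp, 12);
  token.copy(header, 16); // token sits in the checksum slot for hashing

  // Checksum: MD5 over the header (token in the checksum slot) plus the body.
  const checksum = md5(Buffer.concat([header, encrypted]));
  checksum.copy(header, 16);

  return Buffer.concat([header, encrypted]);
}
```

&lt;p&gt;Sending these buffers over UDP port 54321 and parsing the device ID and stamp out of the handshake response is the remaining plumbing.&lt;/p&gt;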

&lt;h2&gt;
  
  
  The Roadblocks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Getting the Token
&lt;/h3&gt;

&lt;p&gt;Understanding the protocol was one thing. Actually talking to Juliana required a device token. This is how Xiaomi handles authentication with the device, and getting that token turned out to be its own challenge. My first instinct was to extract it programmatically through the Xiaomi cloud, but that path was immediately blocked by captcha. I had to find another way.&lt;/p&gt;

&lt;p&gt;I ended up using the Xiaomi Cloud Tokens Extractor tool, which supports a QR code based login flow. That got me what I needed: the token, the device ID, and confirmation that Juliana was registered under the MiIO protocol. With the token and device ID in hand, I could finally start sending real commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  The MiOT Property System
&lt;/h3&gt;

&lt;p&gt;Modern Xiaomi devices do not use simple named commands. They use MiOT, the Mi IoT specification, where every property and action is addressed by a service ID called siid and either a property ID called piid or an action ID called aiid. Reading the battery level means sending a get_properties request with siid 3 and piid 1. Setting the suction to strong means sending a set_properties request with siid 4, piid 4, and a value of 2. Starting a cleaning session means triggering an action with siid 2 and aiid 1. Every single thing the vacuum does maps to one of these numeric combinations.&lt;/p&gt;
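&lt;p&gt;As a hedged sketch, the siid/piid/aiid combinations above translate into JSON-RPC payloads roughly like this (the &lt;code&gt;did&lt;/code&gt; field and the exact parameter shapes are my assumption):&lt;/p&gt;

```typescript
// Illustrative MiOT payload builders. The numeric IDs are the ones quoted
// in the text for the X20+; other models map differently.
let nextId = 1;

function getProperties(props: { siid: number; piid: number }[]) {
  return {
    id: nextId++,
    method: "get_properties",
    params: props.map((p) => ({ did: `${p.siid}.${p.piid}`, siid: p.siid, piid: p.piid })),
  };
}

function setProperty(siid: number, piid: number, value: unknown) {
  return {
    id: nextId++,
    method: "set_properties",
    params: [{ did: `${siid}.${piid}`, siid, piid, value }],
  };
}

function callAction(siid: number, aiid: number, args: unknown[] = []) {
  return {
    id: nextId++,
    method: "action",
    params: { did: `${siid}.${aiid}`, siid, aiid, in: args },
  };
}

// Battery level: siid 3, piid 1
const battery = getProperties([{ siid: 3, piid: 1 }]);
// Strong suction: siid 4, piid 4, value 2
const suction = setProperty(4, 4, 2);
// Start cleaning: siid 2, aiid 1
const start = callAction(2, 1);
```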

&lt;h3&gt;
  
  
  Trial and Error All the Way Down
&lt;/h3&gt;

&lt;p&gt;There is no official documentation that maps these IDs for the X20+. I had to probe them manually by iterating through siid values from 1 to 30 and piid values from 1 to 30 and cross referencing what came back against what I could see in the Xiaomi app. It was tedious work. Some of the latest firmware implementations returned values that did not line up with what you would logically expect, which made matching them to real behaviour even harder.&lt;/p&gt;
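&lt;p&gt;The probing itself can be sketched as a simple enumeration. This is illustrative: it only builds the probe batches; the transport that sends them and records which IDs answer is left out:&lt;/p&gt;

```typescript
// Sketch of the brute-force probe described above: enumerate every
// siid/piid pair in a range and chunk them into get_properties batches.
type PropRef = { did: string; siid: number; piid: number };

function buildProbeBatches(maxSiid: number, maxPiid: number, batchSize: number): PropRef[][] {
  const all: PropRef[] = [];
  for (let siid = 1; siid <= maxSiid; siid++) {
    for (let piid = 1; piid <= maxPiid; piid++) {
      all.push({ did: `${siid}.${piid}`, siid, piid });
    }
  }
  // Chunk so each request stays small enough for the device to answer.
  const batches: PropRef[][] = [];
  for (let i = 0; i < all.length; i += batchSize) {
    batches.push(all.slice(i, i + batchSize));
  }
  return batches;
}
```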

&lt;h3&gt;
  
  
  Cleaning History Lives in the Cloud
&lt;/h3&gt;

&lt;p&gt;One limitation I hit fairly early was around maps and cleaning history. Both are stored in the Xiaomi cloud rather than on the device itself. The only thing you can reliably read from Juliana directly is the last cleaning session. I turned this into an advantage by tracking every session result myself and building a local history. That way Alfred always has full context about when Juliana last cleaned, how long it took, and how much area was covered, without depending on the cloud for any of it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Value
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It Is Not About the Commands
&lt;/h3&gt;

&lt;p&gt;Sending commands to Juliana via an API is not the interesting part. The Xiaomi app already does all of that. You can start a clean, dock the vacuum, set suction levels, and check the battery from your phone in seconds. Replicating that through code alone would not be worth writing about.&lt;/p&gt;

&lt;p&gt;The interesting part is what happens when those commands become tools that an AI agent can reason about and invoke on your behalf.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools Alfred Can Use
&lt;/h3&gt;

&lt;p&gt;Every capability I built around Juliana was wrapped into a tool that Alfred can call. There are three of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;executeVacuumStatus&lt;/strong&gt; reads the current state of the device including battery level, cleaning mode, error codes, and consumable wear levels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;executeVacuumCommand&lt;/strong&gt; sends operational commands like start, stop, pause, resume, dock, and locate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;executeVacuumHistory&lt;/strong&gt; pulls from the locally tracked session log so Alfred can reason about when and where Juliana has cleaned.&lt;/li&gt;
&lt;/ul&gt;
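&lt;p&gt;As a rough sketch (the tool-definition shape here is illustrative, not Alfred's actual interface), wrapping those capabilities as agent tools can look like this:&lt;/p&gt;

```typescript
// Hypothetical tool registration: each vacuum capability becomes a named,
// described handler an agent can choose to invoke.
type Tool = {
  name: string;
  description: string;
  run: (args: Record<string, unknown>) => Promise<unknown>;
};

function makeVacuumTools(vacuum: {
  status(): Promise<object>;
  command(cmd: string): Promise<object>;
  history(limit: number): Promise<object[]>;
}): Tool[] {
  return [
    {
      name: "executeVacuumStatus",
      description: "Read battery level, cleaning mode, error codes and consumable wear.",
      run: async () => vacuum.status(),
    },
    {
      name: "executeVacuumCommand",
      description: "Send start, stop, pause, resume, dock or locate.",
      run: async (a) => vacuum.command(String(a.command)),
    },
    {
      name: "executeVacuumHistory",
      description: "Read the locally tracked cleaning session log.",
      run: async (a) => vacuum.history(Number(a.limit ?? 10)),
    },
  ];
}
```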

&lt;h3&gt;
  
  
  What This Actually Looks Like
&lt;/h3&gt;

&lt;p&gt;With those tools in place, my conversations with Alfred around the vacuum feel completely natural. I can ask things like:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status and maintenance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What is Juliana's battery level?"&lt;/li&gt;
&lt;li&gt;"Does Juliana need any maintenance?"&lt;/li&gt;
&lt;li&gt;"How are Juliana's consumables holding up?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Control&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Start cleaning the living room"&lt;/li&gt;
&lt;li&gt;"Clean the master bedroom and the office"&lt;/li&gt;
&lt;li&gt;"Send Juliana home"&lt;/li&gt;
&lt;li&gt;"Find Juliana" (this makes her announce her location out loud)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;History&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"When did Juliana last clean?"&lt;/li&gt;
&lt;li&gt;"How much has Juliana cleaned today?"&lt;/li&gt;
&lt;li&gt;"Show me Juliana's cleaning history"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Combined and conversational&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"How is everything at home?"&lt;/li&gt;
&lt;li&gt;"Start cleaning the guest bedroom and let me know when it is done"&lt;/li&gt;
&lt;li&gt;"Is Juliana's mop pad due for replacement?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alfred does not just relay commands. He reads the context, decides which tools to call, and responds with a full picture. That is the difference between a smart home app and an agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftv7psx6r9pgbyn0csyaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftv7psx6r9pgbyn0csyaf.png" alt=" " width="800" height="959"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From a Single Open Port to a Talking Vacuum
&lt;/h2&gt;

&lt;p&gt;What started as curiosity about controlling my robot vacuum and an open UDP port turned into a fully functional integration between Alfred and Juliana. Along the way there were real obstacles and each one had to be solved before the next step was even possible.&lt;/p&gt;

&lt;p&gt;Getting the device token could not be automated due to captcha blocks on the Xiaomi cloud, so I used the Xiaomi Cloud Tokens Extractor with QR code login instead. The rest of the obstacles, and their fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With no official documentation for the property IDs on the X20+, I probed siid 1 through 30 and piid 1 through 30 manually and matched results against the Xiaomi app.&lt;/li&gt;
&lt;li&gt;UDP has no built-in request-response correlation, so I built a command serialization queue that keeps exactly one command in flight at a time.&lt;/li&gt;
&lt;li&gt;Hello responses were mixing with command responses, so I added an &lt;code&gt;isHelloResponse()&lt;/code&gt; check to skip those 32 byte packets.&lt;/li&gt;
&lt;li&gt;Timeouts were killing subsequent commands, so I reset the stamp to zero on timeout to force a fresh handshake.&lt;/li&gt;
&lt;li&gt;Requesting too many properties at once caused failures, so I batched them into groups of ten.&lt;/li&gt;
&lt;li&gt;The locate command was not triggering a beep until I found the right combination: siid 7, aiid 1, and piid 1 set to 1.&lt;/li&gt;
&lt;li&gt;Consumable values were coming back wrong because the correct properties live at siid 9, 10, 11, and 18 rather than siid 4, where I originally looked.&lt;/li&gt;
&lt;/ul&gt;
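&lt;p&gt;The command serialization queue is worth a closer look, since UDP gives you no request-response correlation. A minimal sketch, assuming each command is an async task:&lt;/p&gt;

```typescript
// Sketch of a serialization queue: each command chains onto the previous
// one, so only a single UDP request/response exchange is ever in flight.
class CommandQueue {
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const next = this.tail.then(() => task());
    // A failed command must not wedge the queue for later commands.
    this.tail = next.then(() => undefined, () => undefined);
    return next;
  }
}
```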

&lt;p&gt;The one limitation I could not fully solve is map data. The X20+ stores its maps in the Xiaomi cloud and not on the device itself. Valetudo would solve this, but it requires flashing the robot's firmware entirely, which voids my warranty.&lt;/p&gt;

&lt;p&gt;Every solution in this list is a direct result of building things the right way rather than the fast way. The command queue, the constants file, the local history store, all of it exists because the goal was never just to control a vacuum. The goal was to give Alfred enough context and capability to reason about the home the same way he reasons about a calendar or an inbox. Juliana is now part of that picture.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>iot</category>
      <category>showdev</category>
    </item>
    <item>
      <title>An Autonomous, Agentic, AI Assistant, Meet Alfred and this is how I built him.</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Mon, 16 Mar 2026 15:39:44 +0000</pubDate>
      <link>https://dev.to/joojodontoh/an-autonomous-agentic-ai-assistant-meet-alfred-and-this-is-how-i-built-him-4e7m</link>
      <guid>https://dev.to/joojodontoh/an-autonomous-agentic-ai-assistant-meet-alfred-and-this-is-how-i-built-him-4e7m</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;My people, it's me again. This time I have built something fun but mostly useful. I gave building an autonomous agent a chance and it's turning out well. I know it's a cliché but his name is Alfred. The thing is, AI agents are no longer a novelty. What started out as simple chatbots chaining a few prompts together has evolved into something far more capable: systems that can "reason" (I know it's just a lot of math and not actual reasoning), plan, use tools, and execute multi-step workflows with minimal human intervention. Agentic flows, where an AI iteratively breaks down a goal, takes actions, evaluates results, and course-corrects, are quickly becoming the backbone of serious productivity tooling.&lt;/p&gt;

&lt;p&gt;But not all models are created equal. The market is crowded. GPT-4o, Gemini, Mistral, Llama, and DeepSeek all have their own strengths, trade-offs, and devoted user bases. Picking the right model for a given task has become something of an art form in itself, especially because the benchmarks keep getting blurrier.&lt;/p&gt;

&lt;p&gt;For me, that choice keeps coming back to Anthropic's Claude and specifically to Opus. As an engineer, I spend a significant portion of my day thinking in systems: abstractions, edge cases, failure modes and architecture trade-offs. Opus is the only model that consistently feels like it's doing the same while cleverly grabbing my immediate system context. Where other models can produce code that technically compiles but misses the intent entirely, Opus tends to understand the why behind what I'm building, not just the what. That distinction, subtle as it sounds, makes an enormous practical difference when you're deep in a complex codebase. Opus has downsides, especially because sometimes it takes shortcuts without adhering to the principles you intended.&lt;/p&gt;

&lt;p&gt;What sealed it for me, though, was the CLI experience. Claude's command-line interface is genuinely pleasant to use: fast, composable, and unobtrusive in a way that fits naturally into my existing workflow. It doesn't feel like a detour. It feels like a tool that belongs in my terminal alongside the rest of my stack.&lt;/p&gt;

&lt;p&gt;In this article I'm going to talk about why I needed Alfred, the problem he solves for me, how I built him, and how I keep improving him in this ever-changing landscape where engineering meets productivity.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Monday Morning Problem Every Developer Knows
&lt;/h2&gt;

&lt;p&gt;It is Monday, 8:30 AM. Before I have written a single line of code, I already have a full-time job just figuring out where to start.&lt;/p&gt;

&lt;p&gt;Over the weekend, 47 new Gmail messages came in. Some are spam. Some are newsletters I never unsubscribed from. But buried somewhere in that pile is an escalation that needs urgent attention and a teammate asking for a code review. I do not know which email it is yet. I have to dig for it.&lt;/p&gt;

&lt;p&gt;That is just Gmail. I also have 12 Outlook emails from work: meeting updates, an HR policy change, and my manager asking about feature progress. Then there are 8 Teams messages spread across 3 different channels covering a production incident from Saturday, a design review thread, and standup notes. On top of that, 3 pull requests were opened against repos I review, and 2 calendar conflicts appeared for Tuesday that I need to sort out before the day gets going.&lt;/p&gt;

&lt;p&gt;None of these systems talk to each other. So my morning routine becomes a manual context-switching exercise. I open Gmail, scan subject lines, try to mentally rank urgency. Then I switch to Outlook and do the same. Then Teams. Then Azure DevOps. By the time I have a rough picture of what actually needs my attention, 45 to 60 minutes have passed. And that client escalation? Still buried under newsletters when I finally find it.&lt;/p&gt;

&lt;p&gt;The frustrating part is that most of that time is not real work. It is just triage. It is the overhead that comes before the actual job even starts. The other option is to close everything and wait for someone to walk to my table. Lmao I do this all the time.&lt;/p&gt;

&lt;p&gt;But well, this is the problem I built Alfred to solve.&lt;/p&gt;

&lt;h2&gt;
  
  
  What do I want from Alfred?
&lt;/h2&gt;

&lt;p&gt;Unification! Alfred is a personal AI agent built around a single idea: collapsing the chaos of my digital workday into one intelligent, unified system. It continuously polls Gmail at configurable intervals and receives Outlook emails and Microsoft Teams messages via Power Automate webhooks, storing everything locally in SQLite so that regardless of the source, nothing slips through the cracks.&lt;br&gt;
Every incoming email is then put through an AI classification pipeline that assigns it one of six categories (Urgent, Personal, Work, Newsletter, Transactional, or Spam), gives it a priority level from 1 to 5, generates a human-readable summary, extracts action items with optional due dates, and flags whether a follow-up is needed.&lt;br&gt;
From there, a configurable rules engine evaluates each classified email and proposes an appropriate action: archive it, delete it, forward it, draft a reply, or surface it for attention via a notify action with quick-action buttons.&lt;br&gt;
Destructive actions like deletions, sends, and PR approvals wait behind an explicit approval gate in the dashboard, while non-destructive ones like classification and drafting execute automatically.&lt;br&gt;
Every action is tracked through a full lifecycle from proposed to executed, with timestamps, rollback data, and execution results all stored in an append-only audit log.&lt;/p&gt;
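&lt;p&gt;A sketch of what a classification record might look like in TypeScript. The six categories and the 1 to 5 priority come straight from the pipeline above; the field names and the validation are my assumptions:&lt;/p&gt;

```typescript
// Illustrative shape for the classifier's output, plus a guard that
// validates an untyped LLM response before trusting it downstream.
type EmailCategory = "Urgent" | "Personal" | "Work" | "Newsletter" | "Transactional" | "Spam";

interface Classification {
  category: EmailCategory;
  priority: 1 | 2 | 3 | 4 | 5;
  summary: string;
  actionItems: { text: string; dueDate?: string }[];
  needsFollowUp: boolean;
}

const CATEGORIES: EmailCategory[] = ["Urgent", "Personal", "Work", "Newsletter", "Transactional", "Spam"];

function parseClassification(raw: any): Classification {
  if (!CATEGORIES.includes(raw.category)) {
    throw new Error(`unknown category: ${raw.category}`);
  }
  const priority = Number(raw.priority);
  if (!Number.isInteger(priority) || priority < 1 || priority > 5) {
    throw new Error("priority out of range");
  }
  return {
    category: raw.category,
    priority: priority as Classification["priority"],
    summary: String(raw.summary ?? ""),
    actionItems: Array.isArray(raw.actionItems) ? raw.actionItems : [],
    needsFollowUp: Boolean(raw.needsFollowUp),
  };
}
```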

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95yuzxjs049aes4in1j0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95yuzxjs049aes4in1j0.png" alt="Email flow" width="800" height="865"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond email, Alfred integrates deeply with the rest of my work toolchain. It connects to Google Calendar and Outlook Calendar for listing, creating, updating, and searching events, and handles Azure DevOps for querying and managing work items, approving pull requests, tracking pipeline runs, and browsing repositories. When a pull request is opened, a dedicated webhook handler automatically fetches the PR details, checks pipeline status, attempts to link related work items from branch name patterns, generates an LLM summary, and proposes approval or work item creation actions accordingly. Microsoft Teams is covered too, with channel message search and webhook-based ingestion keeping Alfred aware of conversations happening outside of email. Tying everything together is a conversational chat interface powered by an agentic loop that extracts intents from natural language, executes them across services, and returns structured, context-aware responses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vd5d31qjizviyka1ty3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1vd5d31qjizviyka1ty3.png" alt="devops" width="800" height="708"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Let's look at some of Alfred's core flows in detail
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Email Polling and Synchronization
&lt;/h3&gt;

&lt;p&gt;Alfred's background worker is built around an &lt;code&gt;AgentLoop&lt;/code&gt; flow. When the server starts, the &lt;code&gt;AgentLoop&lt;/code&gt; runs an initial poll immediately, then sets a repeating &lt;code&gt;setInterval&lt;/code&gt; timer at a configurable cadence. Each tick calls &lt;code&gt;emailPort.listMessages("in:inbox", 50)&lt;/code&gt; to fetch up to 50 messages from Gmail via the Gmail API. 50 is a reasonable number for my personal workflow.&lt;/p&gt;

&lt;p&gt;To avoid reprocessing emails Alfred has already seen, the loop maintains an in-memory string set of message IDs. Every polled message is checked against this set, and only genuinely new messages pass through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newMessages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seenIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;newMessages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;seenIds&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New messages are immediately persisted to SQLite through &lt;code&gt;EmailRepo.upsert()&lt;/code&gt;. The upsert uses SQLite's &lt;code&gt;INSERT ... ON CONFLICT(id) DO UPDATE&lt;/code&gt; pattern, which means if Alfred encounters the same email ID twice (for example after a server restart), it updates the existing row rather than creating a duplicate. The repository stores the full email body, sender, recipients, labels, attachments as serialized JSON, and a &lt;code&gt;source&lt;/code&gt; field that distinguishes Gmail emails from Outlook emails. I cover the exact upsert schema in the Data Integrity section.&lt;/p&gt;

&lt;p&gt;Before sending any email to the classifier, the loop applies a set of skip rules. Social media notifications from Facebook, Instagram, Twitter, TikTok, Reddit, Discord, and similar platforms are matched by regex against the sender address. Emails carrying Gmail's &lt;code&gt;CATEGORY_PROMOTIONS&lt;/code&gt; or &lt;code&gt;CATEGORY_SOCIAL&lt;/code&gt; labels are also skipped. LinkedIn is explicitly exempted from this filter because its emails often contain actionable professional content. This pre-filtering avoids burning LLM API calls on emails that would reliably classify as low priority anyway.&lt;/p&gt;
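&lt;p&gt;A minimal sketch of those skip rules, with the sender regex and label set as illustrative stand-ins for the real lists:&lt;/p&gt;

```typescript
// Pre-filter sketch: social senders skipped by regex, promo/social Gmail
// labels skipped, and LinkedIn explicitly exempted.
const SOCIAL_SENDER = /(facebook|instagram|twitter|tiktok|reddit|discord)\./i;
const SKIP_LABELS = new Set(["CATEGORY_PROMOTIONS", "CATEGORY_SOCIAL"]);

function shouldSkipClassification(email: { from: string; labels: string[] }): boolean {
  // LinkedIn mail often carries actionable professional content.
  if (/linkedin\./i.test(email.from)) return false;
  if (SOCIAL_SENDER.test(email.from)) return true;
  return email.labels.some((l) => SKIP_LABELS.has(l));
}
```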

&lt;p&gt;The loop also checks whether each email already has a classification in the database before sending it to the classifier. If a record exists, the email is skipped entirely. This means restarting the server does not trigger re-classification of previously processed emails. I wrote it this way to ensure minimum cost and idempotency.&lt;/p&gt;

&lt;p&gt;When the classifier encounters a fatal error such as an expired API key, exhausted credit balance, or a 429 rate limit response, the loop enters a paused state rather than crashing or retrying in a tight loop. It sets &lt;code&gt;classifierPaused = true&lt;/code&gt; and stops classifying. This is sort of a circuit breaker. On subsequent polls, it still persists new emails to the database so no mail is lost, but it attempts a single test classification to check whether the service has recovered. Once the test succeeds, classification resumes automatically. Error messages are also deduplicated so the same error is only logged once regardless of how many polls occur while paused.&lt;/p&gt;
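&lt;p&gt;The circuit-breaker behaviour can be sketched like this, with &lt;code&gt;classify&lt;/code&gt; standing in for the real LLM call:&lt;/p&gt;

```typescript
// Sketch of the classifier circuit breaker: on a fatal error the loop
// pauses classification (mail is still persisted elsewhere), probes with
// one call per poll, and resumes once a probe succeeds. Errors are
// deduplicated so each distinct message is logged once.
class ClassifierBreaker {
  private paused = false;
  private loggedErrors = new Set<string>();

  constructor(private classify: (text: string) => Promise<string>) {}

  async classifyOrSkip(text: string): Promise<string | null> {
    if (this.paused) {
      // Single probe per poll: success flips the breaker closed again.
      try {
        const result = await this.classify(text);
        this.paused = false;
        return result;
      } catch {
        return null; // still down; email stays persisted but unclassified
      }
    }
    try {
      return await this.classify(text);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (!this.loggedErrors.has(message)) {
        this.loggedErrors.add(message); // log each distinct error once
        console.error(`classifier paused: ${message}`);
      }
      this.paused = true;
      return null;
    }
  }
}
```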

&lt;p&gt;For Outlook, Alfred does not poll directly. Instead, an adapter calls a Power Automate flow that returns Outlook messages. A dedicated payload mapper normalizes Microsoft field names, timestamp formats, and nested structures into the same &lt;code&gt;EmailMessage&lt;/code&gt; domain object that Gmail produces. This means the rest of the pipeline, including classification, action rules, and chat, works identically regardless of whether an email originated from Gmail or Outlook. I wrote it this way so that I can later extend email providers by just adding a normalization mapper and then it should be plug and play.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafkk78ad1zi7gyavlitq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fafkk78ad1zi7gyavlitq.png" alt=" " width="800" height="1175"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Action Proposal, Approval, and Execution
&lt;/h3&gt;

&lt;p&gt;Actions in Alfred follow an event-sourced lifecycle. Every state transition is recorded as an append-only entry in an action-log SQLite table. No rows are ever updated in place or deleted. The lifecycle flows through a fixed set of &lt;code&gt;ActionStatus&lt;/code&gt; states: &lt;code&gt;Proposed&lt;/code&gt; → &lt;code&gt;Approved&lt;/code&gt; → &lt;code&gt;Executed&lt;/code&gt;, or alternatively &lt;code&gt;Rejected&lt;/code&gt; or &lt;code&gt;RolledBack&lt;/code&gt;. This is purely for auditing, so that I can track the agent's autonomous actions.&lt;/p&gt;

&lt;h4&gt;
  
  
  Proposal
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;ProposeAction&lt;/code&gt; use case starts with an idempotency check. It queries the action log for any existing entry with the same &lt;code&gt;resourceId&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt;. If one already exists, it returns &lt;code&gt;null&lt;/code&gt; and stops. Otherwise, it appends a new entry with &lt;code&gt;status: Proposed&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;From there, the action's &lt;code&gt;RiskLevel&lt;/code&gt; determines what happens next. Low-risk actions like &lt;code&gt;Classify&lt;/code&gt;, &lt;code&gt;Draft&lt;/code&gt;, and &lt;code&gt;Notify&lt;/code&gt; carry &lt;code&gt;RiskLevel.Auto&lt;/code&gt; and execute immediately without my input. High-risk actions like &lt;code&gt;Archive&lt;/code&gt;, &lt;code&gt;Delete&lt;/code&gt;, &lt;code&gt;Send&lt;/code&gt;, and &lt;code&gt;Forward&lt;/code&gt; carry &lt;code&gt;RiskLevel.ApprovalRequired&lt;/code&gt; and sit in the proposed state until I act on them from the dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;risk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ACTION_RISK_LEVELS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;risk&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;RiskLevel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Auto&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;strategies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;canExecute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;resultData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;actionLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;updateStatus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;actionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ActionStatus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Executed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the action produces result data such as a created draft ID or classification details, that data is stored alongside the log entry via &lt;code&gt;updateResultData()&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Approval and Execution
&lt;/h4&gt;

&lt;p&gt;When I click Approve in the dashboard, the &lt;code&gt;ApproveAction&lt;/code&gt; use case first updates the log entry's status to &lt;code&gt;Approved&lt;/code&gt; with a timestamp, then immediately attempts execution. It finds the correct &lt;code&gt;ActionExecutionStrategy&lt;/code&gt; by matching the action's &lt;code&gt;source&lt;/code&gt; field. Three strategies exist: &lt;code&gt;GmailActionStrategy&lt;/code&gt; handles archive, delete, send, and draft operations via the Gmail API; &lt;code&gt;OutlookActionStrategy&lt;/code&gt; handles equivalent operations through Power Automate; and &lt;code&gt;DevOpsActionStrategy&lt;/code&gt; handles work item creation and PR approval via the Azure DevOps REST API. This design follows the open-closed principle: new strategies can be registered without modifying the existing dispatch logic.&lt;/p&gt;

&lt;p&gt;Each strategy declares which action types it supports through a &lt;code&gt;canExecute()&lt;/code&gt; method. If a strategy exists but cannot execute the specific action type, the action is marked as executed without performing any real mutation. If execution succeeds, the status moves to &lt;code&gt;Executed&lt;/code&gt;. If it fails, the error is returned to the caller but the action remains in &lt;code&gt;Approved&lt;/code&gt; state so the user can retry without losing the approval.&lt;/p&gt;
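&lt;p&gt;As a rough sketch of that contract (the interface shape and field names beyond &lt;code&gt;source&lt;/code&gt;, &lt;code&gt;canExecute()&lt;/code&gt;, and &lt;code&gt;execute()&lt;/code&gt; are my assumptions here, not Alfred's actual code), a strategy might look like this:&lt;/p&gt;

```typescript
// Illustrative sketch of the strategy contract described above. Only the
// names ActionExecutionStrategy, source, canExecute, and execute come from
// the article; everything else is an assumption.
type ActionType = "archive" | "delete" | "send" | "draft" | "notify";

interface ActionExecutionStrategy {
  source: string; // e.g. "gmail" | "outlook" | "devops"
  canExecute(type: ActionType): boolean;
  execute(action: {
    type: ActionType;
    resourceId: string;
    payload?: unknown;
  }): Promise<unknown>;
}

// A Gmail-flavoured strategy that only claims the types it supports.
const gmailStrategy: ActionExecutionStrategy = {
  source: "gmail",
  canExecute: (type) => ["archive", "delete", "send", "draft"].includes(type),
  async execute(action) {
    // The real implementation would call the Gmail API here.
    return { ok: true, type: action.type };
  },
};

// Open-closed dispatch: adding a provider means registering a new strategy,
// not editing this lookup.
const strategies: ActionExecutionStrategy[] = [gmailStrategy];
const findStrategy = (source: string) =>
  strategies.find((s) => s.source === source);
```

&lt;p&gt;Adding an Outlook or DevOps provider then means pushing another strategy object into the list, which is the open-closed property the dispatch code above relies on.&lt;/p&gt;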

&lt;p&gt;The &lt;code&gt;Notify&lt;/code&gt; action type is intentionally a no-op at the execution level. It exists so the rules engine can propose surfacing an email to the user without triggering any mutation on the mailbox. The notification itself is handled by the push notification system, not the action executor.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0hlsxet1ww864ovtyvi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi0hlsxet1ww864ovtyvi.png" alt=" " width="474" height="586"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Chat Interface (Intent and Tool Use Modes)
&lt;/h3&gt;

&lt;p&gt;Alfred's chat is the primary way I interact with my workspace data through natural language. I designed it to support two distinct modes of operation: an intent extraction mode (the default) and a &lt;code&gt;tool_use&lt;/code&gt; mode powered by Claude's native tool-use API. Both implement a &lt;code&gt;ChatStrategy&lt;/code&gt; interface defined in a &lt;code&gt;chat-strategy&lt;/code&gt; file, which standardises the input (message, history, context, system prompt, dependencies) and output (response text, result strings, action steps).&lt;/p&gt;
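&lt;p&gt;A minimal sketch of what such a &lt;code&gt;ChatStrategy&lt;/code&gt; contract could look like (the field names beyond the inputs and outputs listed above are illustrative):&lt;/p&gt;

```typescript
// Illustrative shape of the ChatStrategy contract described above;
// exact field names beyond those mentioned in the article are assumptions.
interface ChatInput {
  message: string;
  history: { role: "user" | "assistant"; content: string }[];
  context: string;
  systemPrompt: string;
  deps: unknown; // tool executors, LLM adapters, etc.
}

interface ChatOutput {
  response: string;   // final user-facing text
  results: string[];  // raw tool results gathered along the way
  actions: string[];  // human-readable action steps
}

interface ChatStrategy {
  chat(input: ChatInput): Promise<ChatOutput>;
}

// A trivial strategy that echoes the message, just to show the contract.
const echoStrategy: ChatStrategy = {
  async chat(input) {
    return { response: `You said: ${input.message}`, results: [], actions: [] };
  },
};
```

&lt;p&gt;Standardising on one contract is what lets the two modes below be swapped without touching the rest of the chat service.&lt;/p&gt;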

&lt;h4&gt;
  
  
  Intent Extraction Mode
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;IntentExtractionStrategy&lt;/code&gt; uses a two-LLM architecture. A fast, cheap model (Claude Haiku) handles intent extraction, while the main model (Claude Sonnet) composes the final user-facing response.&lt;/p&gt;

&lt;p&gt;The strategy runs an agentic loop of up to 5 rounds. In each round, it sends the user's message, the last 20 conversation history entries (each truncated to 2000 characters), and any results from prior rounds to the fast LLM. The system prompt includes detailed routing rules that map natural language patterns to intent types: "check my Outlook" routes to &lt;code&gt;search_emails&lt;/code&gt; with &lt;code&gt;source: "outlook"&lt;/code&gt;, "calendar" without a provider routes to &lt;code&gt;list_calendar_events&lt;/code&gt; without a source, and "work items" routes to &lt;code&gt;query_work_items&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The LLM returns a JSON object with an &lt;code&gt;intents&lt;/code&gt; array. Each intent specifies a type matching a registered tool name, along with type-specific fields like &lt;code&gt;query&lt;/code&gt;, &lt;code&gt;source&lt;/code&gt;, and &lt;code&gt;timeMin&lt;/code&gt;. Invalid tool names are filtered out against the &lt;code&gt;ToolRegistry&lt;/code&gt;. The strategy then executes each intent by calling the corresponding tool's &lt;code&gt;execute()&lt;/code&gt; function, which delegates to the appropriate &lt;code&gt;IntentExecutorDeps&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;MAX_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractIntents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extractionLlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recentHistory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;priorResults&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validToolNames&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;intents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;intents&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;allResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`--- Round &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ---\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multi-round execution is what makes complex queries possible. A request like "invite Sabrina to my 3pm meeting tomorrow" requires two rounds: round 1 searches for tomorrow's calendar events, and round 2 uses the event ID from that result to update the event with a new attendee. The LLM receives prior results in an &lt;code&gt;ACTIONS ALREADY EXECUTED THIS TURN&lt;/code&gt; block and can return &lt;code&gt;{"intents": [{"type": "none"}]}&lt;/code&gt; to signal that all needed data has been gathered and the loop should stop.&lt;/p&gt;
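&lt;p&gt;The validation and stop-signal handling can be sketched like this (a hypothetical helper; field names beyond &lt;code&gt;intents&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt; are assumptions):&lt;/p&gt;

```typescript
// Sketch of the intent-validation step described above: intents whose type
// is not a registered tool are dropped, and a turn consisting only of
// "none" intents signals the loop should stop. Illustrative, not Alfred's
// actual code.
type Intent = { type: string; [key: string]: unknown };

const validToolNames = new Set([
  "search_emails",
  "list_calendar_events",
  "query_work_items",
]);

function filterIntents(raw: Intent[]): { intents: Intent[]; done: boolean } {
  // "done" when the LLM explicitly signals it has gathered everything.
  const done = raw.length > 0 && raw.every((i) => i.type === "none");
  // Invalid tool names are silently filtered against the registry.
  const intents = raw.filter((i) => validToolNames.has(i.type));
  return { intents, done };
}
```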

&lt;p&gt;After the loop completes, the &lt;code&gt;ChatService&lt;/code&gt; combines all gathered results with local context (email stats, pending actions, and follow-ups from the database) and sends everything to the main LLM for final response composition, with extended thinking enabled.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tool Use Mode
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;ToolUseStrategy&lt;/code&gt; takes a fundamentally different approach. Rather than extracting intents and executing them as a separate step, it gives the LLM direct access to tools via &lt;code&gt;completeWithTools()&lt;/code&gt;. The LLM decides which tools to call, receives structured results, and continues the conversation until it produces a final text response.&lt;/p&gt;

&lt;p&gt;This mode requires the LLM adapter to support the Claude tool-use API. The strategy converts all registered tools into Claude tool definitions (name, description, input schema) and passes them alongside the message. The loop runs for up to 5 rounds, checking the &lt;code&gt;stopReason&lt;/code&gt; after each response. When the model returns &lt;code&gt;end_turn&lt;/code&gt;, the final text becomes the response. When it returns tool calls, the strategy executes each tool, packages the results as &lt;code&gt;ToolResultBlock&lt;/code&gt; objects with matching &lt;code&gt;tool_use_id&lt;/code&gt;, and sends them back as a user message for the next round:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completeWithTools&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stopReason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;end_turn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allResults&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allActions&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model exhausts all 5 rounds without reaching &lt;code&gt;end_turn&lt;/code&gt;, the strategy returns a graceful fallback message in Alfred's butler voice rather than surfacing a raw error to the user.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tool Registry
&lt;/h4&gt;

&lt;p&gt;Both modes share the &lt;code&gt;ToolRegistry&lt;/code&gt; class in a &lt;code&gt;tool-registry&lt;/code&gt; file, which acts as a central catalogue of all available tools. Each tool is registered with a name, description, JSON input schema, an &lt;code&gt;execute&lt;/code&gt; function, and a &lt;code&gt;summarize&lt;/code&gt; function that produces human-readable action steps such as "Searched Gmail for 'invoice'". The registry can export its tools in two formats: &lt;code&gt;toToolDefinitions()&lt;/code&gt; for Claude's native tool-use API, and &lt;code&gt;toIntentPrompt()&lt;/code&gt; for building the intent extraction system prompt.&lt;/p&gt;
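&lt;p&gt;A stripped-down sketch of such a registry (only the method names &lt;code&gt;toToolDefinitions()&lt;/code&gt; and &lt;code&gt;toIntentPrompt()&lt;/code&gt; come from the description above; the rest is illustrative):&lt;/p&gt;

```typescript
// Minimal sketch of the ToolRegistry idea: one catalogue, two export
// formats. Not Alfred's actual implementation.
interface Tool {
  name: string;
  description: string;
  inputSchema: object;
  execute: (input: unknown) => Promise<string>;
  summarize: (input: unknown) => string;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }

  // Claude tool-use API format: { name, description, input_schema }.
  toToolDefinitions(): object[] {
    return [...this.tools.values()].map((t) => ({
      name: t.name,
      description: t.description,
      input_schema: t.inputSchema,
    }));
  }

  // Plain-text catalogue for the intent-extraction system prompt.
  toIntentPrompt(): string {
    return [...this.tools.values()]
      .map((t) => `- ${t.name}: ${t.description}`)
      .join("\n");
  }
}
```

&lt;p&gt;Because both modes read from the same catalogue, adding a tool once makes it available to intent extraction and tool use simultaneously.&lt;/p&gt;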

&lt;h4&gt;
  
  
  System Prompts
&lt;/h4&gt;

&lt;p&gt;All persona and mode-specific instructions are centralised in a &lt;code&gt;system-prompts&lt;/code&gt; file. The &lt;code&gt;BASE_PERSONA&lt;/code&gt; establishes Alfred's character as a refined English butler who addresses the user as "Master Jo" and has access to Google Workspace, Microsoft 365, and Azure DevOps. (Jeremy Irons is my favorite Alfred, btw.) Mode-specific instructions are appended on top: intent mode tells Alfred that actions have already been executed and results are in context so it should not pretend to be searching, while tool-use mode tells Alfred to actively call tools to fetch fresh data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication and Security
&lt;/h3&gt;

&lt;p&gt;Alfred enforces security at multiple levels across both the dashboard and the agent server.&lt;/p&gt;

&lt;h4&gt;
  
  
  Dashboard Authentication
&lt;/h4&gt;

&lt;p&gt;The dashboard uses NextAuth.js v5 configured in &lt;code&gt;auth.ts&lt;/code&gt; with Google OAuth as the sole provider. Sessions use a JWT strategy with a 7-day maximum age. Access is restricted to a single authorised user through an email allowlist: the &lt;code&gt;signIn&lt;/code&gt; callback compares the Google profile's email against the &lt;code&gt;ALLOWED_EMAIL&lt;/code&gt; environment variable and rejects any mismatch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;callbacks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;signIn&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;allowedEmail&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The auth system uses a custom sign-in page at &lt;code&gt;/auth/login&lt;/code&gt; and redirects errors back to the same page for a clean user experience. Since Alfred is a personal, single-user tool, the allowlist approach is both simpler and more appropriate than a full role-based access system.&lt;/p&gt;

&lt;h4&gt;
  
  
  Server-Side Credentials
&lt;/h4&gt;

&lt;p&gt;The agent server stores sensitive credentials in the macOS Keychain. These credentials are fetched lazily on first use and cached in memory for the lifetime of the process. This means they never appear in environment variables, config files, or logs.&lt;/p&gt;
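&lt;p&gt;The lazy fetch-and-cache pattern can be sketched like this. On macOS, &lt;code&gt;security find-generic-password -s &amp;lt;service&amp;gt; -w&lt;/code&gt; prints a stored secret to stdout; everything else here (names, wiring) is an assumption, and the lookup function is injectable so the caching behaviour can be exercised off-macOS:&lt;/p&gt;

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// A secret lookup reads one credential by service name.
type SecretLookup = (service: string) => Promise<string>;

// macOS Keychain lookup via the `security` CLI (hypothetical service names).
const keychainLookup: SecretLookup = async (service) => {
  const { stdout } = await execFileAsync("security", [
    "find-generic-password", "-s", service, "-w",
  ]);
  return stdout.trim();
};

// Lazy fetch + in-memory cache: the Keychain is read at most once per
// service for the lifetime of the process.
function makeSecretCache(lookup: SecretLookup = keychainLookup) {
  const cache = new Map<string, string>();
  return async function getSecret(service: string): Promise<string> {
    const hit = cache.get(service);
    if (hit !== undefined) return hit;
    const value = await lookup(service);
    cache.set(service, value);
    return value;
  };
}
```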

&lt;h4&gt;
  
  
  Architectural Isolation
&lt;/h4&gt;

&lt;p&gt;The dashboard is a pure client-rendered application. It contains no provider SDK imports, no direct database access, and no secret values. All data access flows through the agent server's HTTP API, and I made sure that no credentials are ever bundled into the dashboard. This means that even if the dashboard source code were fully exposed, it would not leak any credentials or grant any access to the underlying data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resilience and Caching
&lt;/h3&gt;

&lt;p&gt;Alfred applies several resilience patterns across the system to handle network failures, API rate limits, and performance constraints.&lt;/p&gt;

&lt;h4&gt;
  
  
  In-Memory TTL Cache
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;TtlCache&lt;/code&gt; class in &lt;code&gt;cache.ts&lt;/code&gt; provides a simple time-to-live cache backed by a JavaScript &lt;code&gt;Map&lt;/code&gt;. Each entry stores its data alongside an &lt;code&gt;expiresAt&lt;/code&gt; timestamp. The &lt;code&gt;get()&lt;/code&gt; method checks expiration on every access and automatically evicts stale entries. The &lt;code&gt;getOrFetch()&lt;/code&gt; method combines cache lookup with lazy population:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nx"&gt;getOrFetch&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttlMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fetcher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetcher&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ttlMs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is used for calendar events and DevOps data, both cached with a 3-minute TTL. During a multi-round chat conversation where Alfred might query the calendar several times, only the first call hits the API and subsequent calls return the cached result. The 3-minute window balances data freshness with meaningful API call reduction.&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Loop Resilience
&lt;/h4&gt;

&lt;p&gt;The classifier pause behavior is covered in the Email Polling section above. Beyond that, the polling loop is designed so that a failure in any single stage (classification, action proposal, or action execution) does not crash or block the rest of the loop. Each stage fails independently and logs the error without taking down the whole cycle.&lt;/p&gt;
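&lt;p&gt;The isolation pattern itself is simple: wrap each stage in its own &lt;code&gt;try/catch&lt;/code&gt;. A hypothetical sketch (stage names from above, wiring assumed):&lt;/p&gt;

```typescript
// Sketch of per-stage isolation in a polling cycle: each stage runs in its
// own try/catch, so one failure is logged and the cycle continues.
// Illustrative, not Alfred's actual loop.
type Stage = { name: string; run: () => Promise<void> };

async function runCycle(
  stages: Stage[],
  log: (msg: string) => void,
): Promise<void> {
  for (const stage of stages) {
    try {
      await stage.run();
    } catch (err) {
      // A classification failure must not block action execution.
      log(`stage "${stage.name}" failed: ${(err as Error).message}`);
    }
  }
}
```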

&lt;h4&gt;
  
  
  Power Automate Retries
&lt;/h4&gt;

&lt;p&gt;The Power Automate client implements a 3-attempt retry with linear backoff (1s, 2s, 3s) for transient HTTP errors and timeouts. Non-retryable errors such as 4xx client errors (excluding 429) fail immediately without retrying. Each request uses &lt;code&gt;AbortController&lt;/code&gt; with a 30-second timeout to prevent indefinite hangs.&lt;/p&gt;
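&lt;p&gt;A sketch of that retry policy (the helper below is illustrative, not the actual client; the sleep function is injectable so tests don't have to wait):&lt;/p&gt;

```typescript
// Sketch of the retry policy described above: up to 3 attempts with linear
// backoff (1s, 2s, 3s), retrying only transient failures. 4xx errors other
// than 429 fail immediately. Illustrative, not the real Power Automate client.
class HttpError extends Error {
  constructor(public status: number) {
    super(`HTTP ${status}`);
  }
}

// Timeouts/network errors (non-HttpError), 429, and 5xx are retryable.
const isRetryable = (err: unknown): boolean =>
  !(err instanceof HttpError) || err.status === 429 || err.status >= 500;

async function withRetry<T>(
  fn: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= 3 || !isRetryable(err)) throw err;
      await sleep(attempt * 1000); // 1s after attempt 1, 2s after attempt 2
    }
  }
}

// In the real client each attempt would also carry a 30s timeout, e.g.:
//   const ctrl = new AbortController();
//   const timer = setTimeout(() => ctrl.abort(), 30_000);
//   fetch(url, { signal: ctrl.signal }).finally(() => clearTimeout(timer));
```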

&lt;h4&gt;
  
  
  Push Notification Delivery
&lt;/h4&gt;

&lt;p&gt;The web push delivery mechanics, including concurrent sends, &lt;code&gt;Promise.allSettled()&lt;/code&gt;, and automatic cleanup of expired subscriptions, are covered in the Push Notifications section under Discoveries, where the full implementation is explained in context.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment and Operations
&lt;/h3&gt;

&lt;p&gt;Alfred runs as three persistent background services on macOS, managed by launchd, Apple's native process manager. The deployment system is entirely script-based with no containers, no cloud infrastructure, and no external process managers. Everything runs on a single Mac.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Three Services
&lt;/h4&gt;

&lt;p&gt;The agent server is the core process. It runs the Node.js HTTP API, the background email polling loop, the action execution pipeline, and the finance statement processor. It owns all external API calls to Gmail, Google Calendar, Anthropic, Azure DevOps, and Power Automate, along with all OAuth credentials stored in macOS Keychain and the SQLite database.&lt;/p&gt;

&lt;p&gt;The dashboard is a Next.js application serving the client-rendered UI. In production it runs against a pre-built output directory and makes no direct calls to any external service. All data comes through the agent server's HTTP API. It receives a bearer token as an environment variable so it can authenticate its requests to the agent server.&lt;/p&gt;

&lt;p&gt;The Cloudflare tunnel creates an encrypted outbound connection from the Mac to Cloudflare's edge network, making the dashboard publicly accessible without opening any inbound ports or touching the router. It routes HTTPS traffic from the public domain down to the local Next.js server on a local port.&lt;/p&gt;

&lt;h4&gt;
  
  
  launchd Service Configuration
&lt;/h4&gt;

&lt;p&gt;Each service is defined as a &lt;code&gt;.plist&lt;/code&gt; property list file. The plist files use placeholder tokens that are replaced with real values at deploy time using &lt;code&gt;sed&lt;/code&gt;. The key properties are &lt;code&gt;RunAtLoad: true&lt;/code&gt; to start on login, &lt;code&gt;KeepAlive: true&lt;/code&gt; to auto-restart on crash, and &lt;code&gt;ThrottleInterval: 10&lt;/code&gt; to wait at least 10 seconds between restart attempts and prevent tight crash loops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;key&amp;gt;&lt;/span&gt;ProgramArguments&lt;span class="nt"&gt;&amp;lt;/key&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;array&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;string&amp;gt;&lt;/span&gt;PROJECT_ROOT/node_modules/.bin/tsx&lt;span class="nt"&gt;&amp;lt;/string&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;string&amp;gt;&lt;/span&gt;apps/agent-server/src/index.ts&lt;span class="nt"&gt;&amp;lt;/string&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/array&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;key&amp;gt;&lt;/span&gt;KeepAlive&lt;span class="nt"&gt;&amp;lt;/key&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;true/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;key&amp;gt;&lt;/span&gt;ThrottleInterval&lt;span class="nt"&gt;&amp;lt;/key&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;integer&amp;gt;&lt;/span&gt;10&lt;span class="nt"&gt;&amp;lt;/integer&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service logs stdout and stderr to separate files that can be tailed in real time for debugging.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Deploy Script
&lt;/h4&gt;

&lt;p&gt;Deployment runs through a single script that orchestrates six steps in order: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;creating the log directory&lt;/li&gt;
&lt;li&gt;sourcing the &lt;code&gt;.env&lt;/code&gt; file to load environment variables&lt;/li&gt;
&lt;li&gt;running &lt;code&gt;npm install&lt;/code&gt; at the monorepo root to install all workspace dependencies&lt;/li&gt;
&lt;li&gt;running &lt;code&gt;npm run build&lt;/code&gt; to compile all TypeScript packages in dependency order (domain → application → infrastructure → contracts → agent server, then the Next.js dashboard)&lt;/li&gt;
&lt;li&gt;copying each plist template into &lt;code&gt;~/Library/LaunchAgents/&lt;/code&gt; with placeholders replaced by real paths&lt;/li&gt;
&lt;li&gt;finally loading all three services with &lt;code&gt;launchctl load&lt;/code&gt; to start them immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before installing each plist, the script unloads any previously running version to prevent conflicts, resulting in a brief restart with minimal downtime:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;plist &lt;span class="k"&gt;in &lt;/span&gt;com.alfred.agent.plist com.alfred.dashboard.plist com.alfred.cloudflared.plist&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;launchctl unload &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LAUNCH_AGENTS_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$plist&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
  sed&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s|PROJECT_ROOT|&lt;/span&gt;&lt;span class="nv"&gt;$PROJECT_ROOT&lt;/span&gt;&lt;span class="s2"&gt;|g"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s|USER_HOME|&lt;/span&gt;&lt;span class="nv"&gt;$USER_HOME&lt;/span&gt;&lt;span class="s2"&gt;|g"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s|CLOUDFLARED_BIN|&lt;/span&gt;&lt;span class="nv"&gt;$CLOUDFLARED_BIN&lt;/span&gt;&lt;span class="s2"&gt;|g"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s|NODE_BIN_PATH|&lt;/span&gt;&lt;span class="nv"&gt;$NODE_BIN_PATH&lt;/span&gt;&lt;span class="s2"&gt;|g"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"s|BEARER_TOKEN_VALUE|&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;BEARER_TOKEN&lt;/span&gt;&lt;span class="k"&gt;:-}&lt;/span&gt;&lt;span class="s2"&gt;|g"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DEPLOY_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$plist&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LAUNCH_AGENTS_DIR&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$plist&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script automatically detects the Node.js binary path across nvm, Homebrew, and system installs, and locates the &lt;code&gt;cloudflared&lt;/code&gt; binary for both Apple Silicon and Intel Homebrew paths. At the end it prints a macOS settings checklist reminding me to enable auto-login, prevent sleep, and configure startup after power failure, since the Mac effectively acts as a persistent home server.&lt;/p&gt;

&lt;h4&gt;
  
  
  First-Time Setup
&lt;/h4&gt;

&lt;p&gt;Initial installation is handled by a setup script. It checks prerequisites (Homebrew and Node.js 20 or above), installs &lt;code&gt;cloudflared&lt;/code&gt;, creates the &lt;code&gt;.env&lt;/code&gt; file interactively, runs the Google OAuth flow by opening a browser for consent and storing the resulting refresh token in Keychain, authenticates with Cloudflare, creates the tunnel, configures DNS routes, and then kicks off the deploy script to bring everything up.&lt;/p&gt;

&lt;h4&gt;
  
  
  Operational Commands
&lt;/h4&gt;

&lt;p&gt;I have scripts for the full operational lifecycle. A status command shows whether each service is running, its PID, and the last 5 log lines. A teardown command unloads all services and removes the plist files from LaunchAgents while preserving logs. A universal launcher supports multiple modes: &lt;code&gt;all&lt;/code&gt; for full production, &lt;code&gt;dev&lt;/code&gt; for hot-reload development, &lt;code&gt;agent&lt;/code&gt; or &lt;code&gt;dashboard&lt;/code&gt; individually, &lt;code&gt;status&lt;/code&gt; for health checks, and &lt;code&gt;doctor&lt;/code&gt; for preflight validation.&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuration
&lt;/h4&gt;

&lt;p&gt;All configuration flows through environment variables loaded from a &lt;code&gt;.env&lt;/code&gt; file at the project root. A &lt;code&gt;config.ts&lt;/code&gt; module reads these and returns a typed &lt;code&gt;AppConfig&lt;/code&gt; object. Three variables are required: &lt;code&gt;GOOGLE_CLIENT_ID&lt;/code&gt;, &lt;code&gt;GOOGLE_CLIENT_SECRET&lt;/code&gt;, and &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt;. Everything else is optional and enables features progressively. Setting &lt;code&gt;AZURE_DEVOPS_ORG&lt;/code&gt; enables DevOps integration. Setting &lt;code&gt;PA_FLOW_MAIL_SEARCH&lt;/code&gt; enables Outlook. Setting &lt;code&gt;VAPID_PUBLIC_KEY&lt;/code&gt; enables push notifications, and so on. If an optional config block is absent, the composition root simply skips registering those adapters and use cases, so the system degrades gracefully rather than failing to start.&lt;/p&gt;
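&lt;p&gt;A sketch of how such progressive enablement might read (the &lt;code&gt;AppConfig&lt;/code&gt; shape beyond the environment variables named above is an assumption):&lt;/p&gt;

```typescript
// Sketch of progressive feature enablement: required keys are asserted,
// optional blocks become undefined and the composition root skips them.
// Only the environment variable names come from the article.
interface AppConfig {
  googleClientId: string;
  googleClientSecret: string;
  anthropicApiKey: string;
  devOps?: { org: string };
  outlook?: { mailSearchFlowUrl: string };
  push?: { vapidPublicKey: string };
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = (key: string): string => {
    const value = env[key];
    if (!value) throw new Error(`Missing required env var: ${key}`);
    return value;
  };
  return {
    googleClientId: required("GOOGLE_CLIENT_ID"),
    googleClientSecret: required("GOOGLE_CLIENT_SECRET"),
    anthropicApiKey: required("ANTHROPIC_API_KEY"),
    // Each optional block is only present when its key is set.
    devOps: env.AZURE_DEVOPS_ORG ? { org: env.AZURE_DEVOPS_ORG } : undefined,
    outlook: env.PA_FLOW_MAIL_SEARCH
      ? { mailSearchFlowUrl: env.PA_FLOW_MAIL_SEARCH }
      : undefined,
    push: env.VAPID_PUBLIC_KEY
      ? { vapidPublicKey: env.VAPID_PUBLIC_KEY }
      : undefined,
  };
}
```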

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cl18v9vs4hn72k132oh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0cl18v9vs4hn72k132oh.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Integrity
&lt;/h3&gt;

&lt;p&gt;Ensuring that Alfred handles data meticulously was very important to me. It does not make sense to build an assistant that is sloppy with the information it presents. So I built Alfred to prevent duplicate and inconsistent data through idempotency checks, upsert semantics, and schema separation at every data boundary.&lt;/p&gt;

&lt;h4&gt;
  
  
  Idempotent Action Proposals
&lt;/h4&gt;

&lt;p&gt;Before creating a new entry in the action log, the proposal system queries for any existing entry with the same &lt;code&gt;resourceId&lt;/code&gt; and &lt;code&gt;type&lt;/code&gt;. If a match is found, the new proposal is silently skipped and the call returns &lt;code&gt;null&lt;/code&gt;. This means the polling loop can encounter the same email multiple times, such as after a server restart, without generating duplicate action proposals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;actionLog&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getByResourceIdAndType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resourceId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;action&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Email Upsert Semantics
&lt;/h4&gt;

&lt;p&gt;Whether an email arrives via polling, a webhook, or is encountered again after a restart, the upsert guarantees exactly one row per email ID. All fields including subject, body, labels, and read status are updated to their latest values, and an &lt;code&gt;updated_at&lt;/code&gt; timestamp records when the last refresh occurred:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;emails&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;from_address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;threadId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;CONFLICT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DO&lt;/span&gt; &lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt;
  &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;from_address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;excluded&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_address&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'now'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Conversation Ordering
&lt;/h4&gt;

&lt;p&gt;Chat messages are stored with a &lt;code&gt;created_at&lt;/code&gt; timestamp and always queried in chronological order using &lt;code&gt;ORDER BY created_at ASC&lt;/code&gt;. Messages are never reordered, edited, or deleted after creation. This ensures the conversation history Alfred sees when composing a response exactly matches what the user experienced.&lt;/p&gt;
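&lt;p&gt;The append-only property above can be sketched with an in-memory stand-in for the SQLite table (the class and field names are illustrative, not the real code): the store exposes insert and chronological read, and nothing else:&lt;/p&gt;

```typescript
// Sketch of an append-only conversation log: messages get a created_at
// value on insert and are only ever read back in chronological order.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  createdAt: number; // epoch millis, stands in for the created_at column
}

class ConversationLog {
  private rows: ChatMessage[] = [];

  // Insert only; no update or delete path exists on this store.
  append(role: ChatMessage["role"], content: string, createdAt: number): void {
    this.rows.push({ role, content, createdAt });
  }

  // Equivalent of SELECT ... ORDER BY created_at ASC
  history(): ChatMessage[] {
    return [...this.rows].sort((a, b) => a.createdAt - b.createdAt);
  }
}
```

Because mutation paths simply do not exist on the store, the invariant holds by construction rather than by discipline.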

&lt;h4&gt;
  
  
  Normalised Schema Design
&lt;/h4&gt;

&lt;p&gt;Classifications are stored in a separate &lt;code&gt;classifications&lt;/code&gt; table linked to emails by &lt;code&gt;email_id&lt;/code&gt;. This separation means re-classifying an email, whether due to a model update or a rule change, only touches the classification row without affecting the underlying email data. The email's original content, headers, labels, and metadata remain untouched. Follow-ups and action log entries follow the same pattern. Each table has a single source of truth for its own data, and no operation on one table can corrupt another.&lt;/p&gt;
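&lt;p&gt;A toy sketch of that separation, using plain &lt;code&gt;Map&lt;/code&gt;s as stand-ins for the two tables (the row shapes are assumptions): re-classifying upserts one row in &lt;code&gt;classifications&lt;/code&gt; and never writes to &lt;code&gt;emails&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch of normalised schema design: classifications live in their own
// table keyed by email_id, so re-classifying replaces only that row.
interface EmailRow { id: string; subject: string; body: string }
interface ClassificationRow { emailId: string; label: string; classifiedAt: number }

const emails = new Map<string, EmailRow>();
const classifications = new Map<string, ClassificationRow>(); // keyed by email_id

function classify(emailId: string, label: string, now: number): void {
  if (!emails.has(emailId)) throw new Error(`Unknown email: ${emailId}`);
  // Upsert the classification row only; the emails table is never written here.
  classifications.set(emailId, { emailId, label, classifiedAt: now });
}
```

With the foreign key as the map key, there is also exactly one classification per email, so a model update can never leave stale duplicates behind.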

&lt;h2&gt;
  
  
  Pitfalls: From Intent Extraction to Tool Use
&lt;/h2&gt;

&lt;p&gt;I started Alfred's chat system with a pure intent extraction approach. The idea was straightforward: send my message to a fast LLM, ask it to return structured JSON with an intent type and parameters, then map that intent to an executor function. A message like "show me today's calendar" would produce &lt;code&gt;{"type": "list_calendar_events", "timeMin": "2026-03-16", "timeMax": "2026-03-16"}&lt;/code&gt;, and the system would call the calendar adapter directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractIntents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extractionLlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recentHistory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;priorResults&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validToolNames&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intent&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;intents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intentExecutor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I built this following the Open/Closed Principle. Each intent type was a self-contained &lt;code&gt;ToolEntry&lt;/code&gt; registered in a &lt;code&gt;ToolRegistry&lt;/code&gt;. Adding a new capability meant registering a new entry with a name, schema, executor function, and summariser. No existing code needed modification:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;toolRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_emails&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Search emails by query, category, or sender&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;summarize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`Searched emails: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In theory this was clean and extensible. In practice, the cost of adding intents started to compound. Every new capability required writing a system prompt fragment describing the intent format, adding routing rules so the LLM knew when to select it, writing the executor function, and testing that the LLM reliably produced the right JSON structure. At 5 intent types it was manageable. By the time I had 15 (email search, calendar list, calendar create, calendar update, calendar search, work item query, work item create, PR query, pipeline list, Teams messages, follow-ups, actions, repo list, commits, branch list), the intent extraction system prompt had ballooned. The LLM was juggling too many format rules and frequently produced malformed JSON or selected the wrong intent type.&lt;/p&gt;

&lt;p&gt;The extraction prompt had grown to include detailed routing rules, source-specific provider logic, multi-intent support, and follow-up round awareness:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;INTENT_RULES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
ROUTING RULES:
- "check my Outlook" → search_emails with source: "outlook"
- "search Gmail" → search_emails with source: "gmail"
- "Outlook calendar" → list_calendar_events with source: "outlook-calendar"
- "work items" / "tickets" → query_work_items
- "pull requests" / "PRs" → query_source_control with subtype: "pull_requests"
...
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every new intent meant updating these routing rules, testing edge cases, and hoping the model did not confuse the new intent with existing ones. The Open/Closed architecture was holding up at the code level: I was not modifying existing executors. But the prompt was a single growing artifact shared by every intent, so adding one intent risked degrading the reliability of all the others.&lt;/p&gt;

&lt;p&gt;This led me to Claude's native tool use API. Instead of asking the LLM to produce JSON matching my custom schema, I could give it proper tool definitions and let Claude's built-in tool calling handle the routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolRegistry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toToolDefinitions&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completeWithTools&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude's tool use was noticeably more reliable. It natively understands tool schemas, validates parameters against the input schema, and handles multi-tool calls cleanly. The model picks the right tool more consistently than my intent extraction prompt ever did, because tool selection is a first-class capability of the model rather than something I was trying to engineer through prompt instructions.&lt;/p&gt;

&lt;p&gt;But tool use burned through API credits quickly. Each round of the conversation becomes a full API call carrying the entire tool catalogue, conversation history, and system prompt. A simple question like "what meetings do I have today?" that previously cost one cheap Haiku call for intent extraction plus one Sonnet call for response composition now cost one or more full Sonnet calls with tool definitions attached, adding significant token overhead to every request.&lt;/p&gt;

&lt;p&gt;I balanced models to keep costs sustainable. Intent extraction uses Haiku because it only needs to produce structured JSON, not reason deeply. Final response composition uses Sonnet with extended thinking enabled because that is where quality matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;strategyDeps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;// Sonnet — reasoning and response&lt;/span&gt;
  &lt;span class="na"&gt;fastLlm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fastLlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Haiku — intent extraction&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rather than committing to one approach, I gave the chat system the ability to switch between both modes. The &lt;code&gt;mode&lt;/code&gt; parameter on each request selects the active strategy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;strategy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;toolUseStrategy&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;intentStrategy&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;strategyResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;localContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Intent mode is cheaper and faster for straightforward queries where the routing rules work well. Tool use mode is more reliable for complex, ambiguous, or multi-step requests where maintaining routing rules would be impractical. Both strategies implement the same &lt;code&gt;ChatStrategy&lt;/code&gt; interface and share the same &lt;code&gt;ToolRegistry&lt;/code&gt;, so all capabilities are available in both modes without any duplication.&lt;/p&gt;
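&lt;p&gt;To make the shared contract concrete, here is a stripped-down sketch of what that &lt;code&gt;ChatStrategy&lt;/code&gt; interface could look like. The field names are inferred from the snippets in this post and the handler bodies are stubs, not the real strategies:&lt;/p&gt;

```typescript
// Sketch of a shared strategy interface: both modes accept the same input
// shape, so the caller can swap them with a single ternary.
interface ChatStrategyInput {
  message: string;
  history: string[];
  systemPrompt: string;
}

interface ChatStrategyResult { response: string }

interface ChatStrategy {
  run(input: ChatStrategyInput): Promise<ChatStrategyResult>;
}

// Stub implementations standing in for the real intent and tool-use strategies.
const intentStrategy: ChatStrategy = {
  async run(input) {
    return { response: `intent mode handled: ${input.message}` };
  },
};

const toolUseStrategy: ChatStrategy = {
  async run(input) {
    return { response: `tool_use mode handled: ${input.message}` };
  },
};

// The mode parameter on each request picks the active strategy.
function selectStrategy(mode: string): ChatStrategy {
  return mode === "tool_use" ? toolUseStrategy : intentStrategy;
}
```

Because both implementations satisfy the same interface, everything upstream of the ternary (persistence, history loading, response rendering) is identical in both modes.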

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bdmq4xlnm4jmyclpcsh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4bdmq4xlnm4jmyclpcsh.png" alt=" " width="800" height="330"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  From Single Request-Response to Reasoning Loops
&lt;/h2&gt;

&lt;p&gt;Early on, the chat used a single request-response pattern. I ask a question, Alfred gathers context from the database, sends everything to the LLM in one shot, and returns the response. The quality was poor. With 15+ tools and a rich system prompt, the model would frequently miss details, give shallow answers, or fail to connect information across multiple data sources. A question like "what's my schedule like tomorrow and do I have any overdue follow-ups?" would produce a partial answer because the model was trying to handle everything in a single pass.&lt;/p&gt;

&lt;p&gt;My first instinct was to use a better model. I switched from Sonnet to Opus for the response composition step and the quality jumped immediately. Opus reasons more carefully, connects dots across context, and produces noticeably more nuanced responses. But it was expensive. Opus costs significantly more per token than Sonnet, and every chat message was a full context window call carrying email stats, action history, follow-up data, and conversation history.&lt;/p&gt;

&lt;p&gt;This led me to implement reasoning loops. Instead of asking the model to do everything in one pass, I let it work iteratively. In intent mode, the strategy runs up to 5 rounds. Each round extracts intents, executes them, and feeds the results back into the next round's context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;MAX_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;intents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extractIntents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;extractionLlm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;recentHistory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;priorResults&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validToolNames&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;intents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;executeTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;intents&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;allResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`--- Round &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; ---\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In tool use mode, the loop is similar but driven by Claude's stop reason. The model keeps calling tools until it decides it has enough information and returns a final text response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;MAX_ROUNDS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;round&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;completeWithTools&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;systemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4096&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stopReason&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;end_turn&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allResults&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;allActions&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// ... execute tool calls, feed results back&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This multi-round approach means a request like "invite Sarah to my 3pm meeting tomorrow" works naturally.&lt;br&gt;
Round 1 searches tomorrow's calendar events.&lt;br&gt;
Round 2 uses the event ID from that result to update the event with a new attendee. The LLM sees prior results in an &lt;code&gt;ACTIONS ALREADY EXECUTED THIS TURN&lt;/code&gt; block and returns &lt;code&gt;{"intents": [{"type": "none"}]}&lt;/code&gt; when everything is resolved and the loop should stop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:03.210Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;chat:start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"What does my outlook calendar look like ?"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"historyLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:07.854Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"llm:completeWithTools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"inputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8168&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"outputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;131&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"durationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4644&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"stopReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:07.854Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat:tool-use-round"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"round"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"stopReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"toolCallCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hasText"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"durationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;4644&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:07.855Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat:tool-result"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"list_calendar_events"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"resultLength"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"resultPreview"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Calendar Events: No events found."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:13.314Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"llm:completeWithTools"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"llm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"claude-opus-4-6"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"inputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8318&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"outputTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"durationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5458&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"stopReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"end_turn"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:13.315Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat:tool-use-round"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"round"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"stopReason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"end_turn"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"toolCallCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"hasText"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"durationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;5459&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2026-03-16T07:11:13.315Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"info"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"msg"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat:complete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"component"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"chat"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"totalDurationMs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;10106&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"tool_use"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"actionCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reasoning happens where it counts. Mechanical work like deciding which tools to call uses the cheapest model that can do it reliably, and the expensive synthesis step only fires once at the end. A 3-round conversation costs 3 Haiku calls plus 1 Sonnet call rather than 3 Opus calls.&lt;/p&gt;
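&lt;p&gt;To make the trade-off concrete, here is a minimal sketch of that tiering, assuming a two-phase split and hypothetical model identifiers (the real routing in Alfred is richer than this):&lt;/p&gt;

```typescript
// Sketch of per-phase model tiering. The model IDs and phase names here
// are illustrative placeholders, not Alfred's actual configuration.
type Phase = "tool_selection" | "synthesis";

const MODEL_FOR_PHASE: Record<Phase, string> = {
  tool_selection: "claude-haiku", // cheap, reliable enough to pick tools
  synthesis: "claude-sonnet",     // expensive, fires once at the end
};

function estimateCalls(rounds: number): Record<string, number> {
  // Each tool-use round gets a cheap call; one synthesis call closes the turn.
  return {
    [MODEL_FOR_PHASE.tool_selection]: rounds,
    [MODEL_FOR_PHASE.synthesis]: 1,
  };
}
```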

&lt;h2&gt;
  
  
  Prompt Refinement
&lt;/h2&gt;

&lt;p&gt;Prompt refinement turned out to be significantly harder with intent extraction than with tool use. With intent extraction, I was responsible for the entire instruction surface: routing rules, format specifications, edge case handling, multi-intent support, source disambiguation, date inference, and conversational context awareness. Every ambiguous user message required a new rule or clarification in the prompt. The prompt became a fragile, growing document where changing one section could silently break another.&lt;/p&gt;

&lt;p&gt;With tool use, Claude does most of the heavy lifting. I define each tool's name, description, and input schema. Claude figures out when to call it, what parameters to pass, and how to combine results across multiple tools. The refinement effort shifted from "teach the model my custom intent format" to "write clear tool descriptions and let the model's built-in tool selection do its job." This was a dramatically smaller surface area to maintain.&lt;/p&gt;
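&lt;p&gt;As a rough illustration, a tool definition in this style carries the whole instruction surface in its description and input schema. The shape below follows the standard Anthropic tools format; the exact fields of Alfred's registry entries are my assumption:&lt;/p&gt;

```typescript
// Hypothetical tool definition in the Anthropic tools format. The
// description text and schema fields are illustrative.
const listCalendarEventsTool = {
  name: "list_calendar_events",
  description:
    "List the user's calendar events. Use when the user asks about their schedule.",
  input_schema: {
    type: "object" as const,
    properties: {
      start: { type: "string", description: "ISO 8601 start of the window" },
      end: { type: "string", description: "ISO 8601 end of the window" },
      source: { type: "string", enum: ["outlook", "google"] },
    },
    required: ["start", "end"],
  },
};
```

&lt;p&gt;Refining this is a matter of editing one description string, which is the smaller surface area the paragraph above describes.&lt;/p&gt;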

&lt;p&gt;The persona prompt is where I spent the most deliberate effort, and I structured it to follow the Open/Closed Principle. The &lt;code&gt;BASE_PERSONA&lt;/code&gt; defines Alfred's character, his access to workspace systems, and the critical behavioural rules that apply regardless of which mode is active:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;BASE_PERSONA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are Alfred, a distinguished personal workspace assistant. 
You are an old English gentleman — impeccably dressed in a three-piece suit at all times, 
refined in manner, and utterly devoted to your employer. You always address the user as 
"Master Jo". Your speech carries the quiet authority and warmth of a seasoned butler...

CRITICAL RULES:
- ALWAYS address the user as "Master Jo"
- ONLY use the data provided to you. Do not make up emails, events, or results.
- When calendar events were CREATED, confirm this to the user with details and calendar links.
...`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mode-specific instructions are appended on top without touching the base. Intent mode tells Alfred that actions have already been executed and results are already in context, so he should not pretend to be searching. Tool use mode tells Alfred to actively call tools to fetch fresh data. The &lt;code&gt;buildSystemPrompt()&lt;/code&gt; function composes these cleanly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;intent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;modeInstructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;mode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tool_use&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;TOOL_USE_MODE_INSTRUCTIONS&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;INTENT_MODE_INSTRUCTIONS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;BASE_PERSONA&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;modeInstructions&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation means I can refine Alfred's personality, add new behavioural rules, or adjust mode-specific instructions entirely independently. Adding a new mode in the future means writing a new instruction block and adding a case to &lt;code&gt;buildSystemPrompt()&lt;/code&gt;, without touching the persona or any existing mode instructions.&lt;/p&gt;
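&lt;p&gt;A sketch of that extension path, with placeholder instruction texts and a hypothetical third mode, might look like:&lt;/p&gt;

```typescript
// Sketch of how a third mode could slot in without touching the existing
// instruction blocks. Mode names and texts here are placeholders.
const MODE_INSTRUCTIONS: Record<string, string> = {
  intent: "Actions were already executed; results are in your context.",
  tool_use: "Actively call tools to fetch fresh data.",
  // A future mode is one new entry, not an edit to the others:
  briefing: "Summarise overnight activity without calling tools.",
};

function buildPrompt(basePersona: string, mode: string): string {
  const instructions = MODE_INSTRUCTIONS[mode];
  if (instructions === undefined) throw new Error(`unknown mode: ${mode}`);
  return basePersona + "\n" + instructions;
}
```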

&lt;p&gt;The persona itself evolved through iteration. Early versions were too stiff and formal. Later versions overcorrected and became too casual. The current version balances warmth with efficiency, giving Alfred permission to be dry-witted and occasionally opinionated while staying concise and never fabricating data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discoveries
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Floodgate Effect
&lt;/h3&gt;

&lt;p&gt;Once I had the first working version of Alfred deployed, something unexpected happened: my mind would not stop generating ideas. The initial version could poll Gmail, classify emails, propose actions, and let me approve them from a dashboard. It was functional, but using it every day exposed gaps and opportunities I had not anticipated during planning. Every morning I would open the dashboard, see how Alfred handled my overnight inbox, and think "what if he could also do this?" The backlog grew faster than I could build.&lt;/p&gt;

&lt;p&gt;This is something I did not expect about building a personal tool. When you are the only user, the feedback loop is immediate. There is no product manager filtering requests, no sprint planning, no prioritisation meetings. You feel the friction directly, and the fix is always within reach. That immediacy is both a gift and a trap. I had to learn to be disciplined about scope, because every "quick addition" carries a maintenance cost that compounds.&lt;/p&gt;

&lt;h3&gt;
  
  
  Financial Statement Processing
&lt;/h3&gt;

&lt;p&gt;The first major expansion came from a personal pain point. I bank with two banks in Malaysia, and both send monthly e-statements as password-protected PDF attachments to my Gmail. Every month I would download the PDFs, unlock them, manually scan through the transactions, and try to categorise my spending in a spreadsheet. It was tedious and error-prone, and I eventually stopped doing it altogether. Then I realised Alfred already had the infrastructure to solve this: he polls Gmail, he can download attachments, and he has an LLM for classification.&lt;/p&gt;

&lt;p&gt;I built a six-stage pipeline that runs automatically during each polling cycle. Alfred searches Gmail for emails from the configured bank sender addresses, filters for emails with PDF attachments, and checks each against the &lt;code&gt;bank_statements&lt;/code&gt; table to skip already-processed ones. The idempotency check matters because the polling loop runs every 60 seconds and the same bank emails will appear in search results repeatedly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;findUnprocessedIds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BankConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EmailSearchFilters&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;emailRead&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;searchFilteredIds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;unprocessed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;statementRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isStatementProcessed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;unprocessed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;unprocessed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For each unprocessed email, Alfred downloads the PDF attachment and decrypts it using the bank-specific password from environment config. This is where I hit the first real bug. The &lt;code&gt;pdf-parse&lt;/code&gt; library accepts a &lt;code&gt;password&lt;/code&gt; option, but its internal implementation completely ignores it. It passes the raw buffer directly to PDF.js's &lt;code&gt;getDocument()&lt;/code&gt; instead of wrapping it in &lt;code&gt;{ data, password }&lt;/code&gt;. Every statement was failing with a cryptic "No password given" error. The fix was a workaround that tricks &lt;code&gt;pdf-parse&lt;/code&gt; by passing a PDF.js parameter object in place of the buffer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pdfInput&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfBuffer&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;password&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pdf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfInput&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After decryption, the raw text goes to a bank-specific parser. Each bank formats its statements differently, so I built a &lt;code&gt;StatementParserRegistry&lt;/code&gt; that routes to the correct parser based on the &lt;code&gt;BankProvider&lt;/code&gt; enum.&lt;/p&gt;
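&lt;p&gt;The registry itself can be very small. This is a sketch under my own assumptions about its shape; the enum values and parser interface are illustrative, not the real &lt;code&gt;StatementParserRegistry&lt;/code&gt; API:&lt;/p&gt;

```typescript
// Minimal registry keyed by bank. Enum members and the parser interface
// are placeholders for illustration.
enum BankProvider { BankA = "bank_a", BankB = "bank_b" }

interface StatementParser {
  parse(rawText: string): string[]; // returns raw transaction lines
}

class StatementParserRegistry {
  private parsers = new Map<BankProvider, StatementParser>();

  register(bank: BankProvider, parser: StatementParser): void {
    this.parsers.set(bank, parser);
  }

  get(bank: BankProvider): StatementParser {
    const parser = this.parsers.get(bank);
    // Failing loudly beats silently mis-parsing another bank's format.
    if (!parser) throw new Error(`no parser registered for ${bank}`);
    return parser;
  }
}
```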

&lt;p&gt;The parser also strips page noise (headers, footers, and the Chinese and Malay translations that some banks print on every page) and collects multi-line transaction details such as merchant names and reference numbers.&lt;/p&gt;

&lt;p&gt;Once parsed, transactions go through a hybrid classification stage. The &lt;code&gt;HybridTransactionClassifier&lt;/code&gt; first attempts rule-based categorisation using keyword matching (merchant names like "GRAB" map to transport, "MCDONALD'S" maps to food), and falls back to Claude Haiku for ambiguous transactions. This hybrid approach keeps costs low because most transactions have recognisable merchant names that do not need LLM inference.&lt;/p&gt;
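&lt;p&gt;A minimal sketch of that rule-first, LLM-fallback flow (the keyword table and the fallback signature are illustrative, not the real &lt;code&gt;HybridTransactionClassifier&lt;/code&gt;):&lt;/p&gt;

```typescript
// Rule-based categorisation first; only ambiguous descriptions fall
// through to the LLM. Keyword pairs here are illustrative.
const KEYWORD_CATEGORIES: Array<[string, string]> = [
  ["GRAB", "transport"],
  ["MCDONALD", "food"],
];

async function classify(
  description: string,
  llmFallback: (desc: string) => Promise<string>,
): Promise<string> {
  const upper = description.toUpperCase();
  for (const [keyword, category] of KEYWORD_CATEGORIES) {
    if (upper.includes(keyword)) return category; // free, no LLM call
  }
  return llmFallback(description); // only ambiguous rows pay for Haiku
}
```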

&lt;p&gt;The pipeline also handles historical backfill. On first run, it does not just process recent statements. It walks backward through the inbox month by month, processing older statements until it reaches a configurable cutoff, defaulting to 12 months. A &lt;code&gt;backfill_state&lt;/code&gt; table tracks the cursor position per bank so the backfill can resume across server restarts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;processBackfill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bank&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BankConfig&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isComplete&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backfillStateRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isComplete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bank&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bankProvider&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isComplete&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backfillStateRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getCursor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bank&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bankProvider&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="nx"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setMonth&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getMonth&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;deps&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;backfillMonths&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// ... fetch historical emails before cursor, process, advance cursor&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of this produces a normalised &lt;code&gt;finance_transactions&lt;/code&gt; table where every transaction from every bank shares the same schema: date, description, amount, type (credit or debit), balance, category, merchant name, and statement period. Two banks, different formats, one unified table.&lt;/p&gt;
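&lt;p&gt;As I read the columns described above, the unified row shape would look roughly like this; the field names are my guesses, not the actual &lt;code&gt;finance_transactions&lt;/code&gt; schema:&lt;/p&gt;

```typescript
// Assumed shape of a normalised transaction row. Field names are
// illustrative guesses at the real schema.
interface FinanceTransaction {
  date: string;                // ISO date of the transaction
  description: string;         // raw statement line
  amount: number;              // always positive; `type` carries the sign
  type: "credit" | "debit";
  balance: number;             // running balance after the transaction
  category: string;            // e.g. "food", "transport"
  merchantName: string | null; // null when no merchant could be extracted
  statementPeriod: string;     // e.g. "2026-02"
  bank: string;                // which provider the row came from
}

const sample: FinanceTransaction = {
  date: "2026-02-14",
  description: "MCDONALD'S KUALA LUMPUR",
  amount: 24.9,
  type: "debit",
  balance: 1875.1,
  category: "food",
  merchantName: "MCDONALD'S",
  statementPeriod: "2026-02",
  bank: "bank_a",
};
```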

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxmook3xe0gle0qehxs1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmxmook3xe0gle0qehxs1.png" alt=" " width="800" height="970"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n2h9f2986ahp7wnpvwn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3n2h9f2986ahp7wnpvwn.png" alt=" " width="800" height="444"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Making Financial Data Conversational
&lt;/h3&gt;

&lt;p&gt;Having the data in SQLite was useful on its own (the dashboard has a Finance page with tables and charts), but the real power came from wiring it into Alfred's chat. I registered finance-specific tools in the &lt;code&gt;ToolRegistry&lt;/code&gt; so that both chat modes can query transaction data naturally.&lt;/p&gt;

&lt;p&gt;The chat can now answer questions like "how much did I spend on food last month?", "what were my biggest transactions in February?", or "show me all Grab transactions this year." Alfred queries the &lt;code&gt;finance_transactions&lt;/code&gt; table, aggregates the results, and presents them in his butler persona.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo927mb5hlfnhs4xsogf5.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo927mb5hlfnhs4xsogf5.JPG" alt=" " width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhybx150xqnstxvwgkt75.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhybx150xqnstxvwgkt75.png" alt=" " width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What I did not anticipate is that this naturally enabled budgeting. Once Alfred could tell me "you spent RM 2,400 on dining in February, Master Jo," I started asking follow-up questions like "is that more than January?" and "set a reminder if I go over RM 2,000 next month." The transaction data combined with the follow-up system and push notifications created a lightweight budget monitoring capability that I never explicitly designed. It emerged from the intersection of features that already existed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Progressive Web App
&lt;/h3&gt;

&lt;p&gt;The dashboard started as a standard Next.js web app accessed through a browser tab. It worked, but it felt disposable. I would forget to check it, or close the tab and lose my place. Making Alfred a Progressive Web App changed that relationship. With a PWA manifest, a service worker, and the right meta tags, Alfred became an app I could install on my phone and in my Mac's dock. It has its own window, its own icon, and it persists across reboots.&lt;/p&gt;
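&lt;p&gt;The client-side wiring for install-ability is small. A hedged sketch of the registration step, assuming the service worker is served at &lt;code&gt;/sw.js&lt;/code&gt; (the path and function name are mine, not the project's):&lt;/p&gt;

```typescript
// Registers the service worker that makes the dashboard installable.
// The /sw.js path is an assumption for illustration.
function registerServiceWorker(): void {
  if (typeof navigator === "undefined" || !("serviceWorker" in navigator)) {
    return; // not in a browser, or service workers unsupported
  }
  window.addEventListener("load", () => {
    navigator.serviceWorker
      .register("/sw.js")
      .catch((err) => console.error("service worker registration failed", err));
  });
}
```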

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1lzgswgpxmm7wrlipxb9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1lzgswgpxmm7wrlipxb9.png" alt=" " width="468" height="786"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The practical difference is small since it is still the same Next.js app behind the scenes. But the psychological difference is significant. An app in the dock feels like a tool. A browser tab feels temporary. I open Alfred every morning now the way I open Slack or my email client. It has presence.&lt;/p&gt;

&lt;h3&gt;
  
  
  Push Notifications with Service Workers
&lt;/h3&gt;

&lt;p&gt;The feature I am most proud of is the push notification system. Before I built it, Alfred was purely pull-based. I had to open the dashboard to see if anything needed attention. Proposed actions would sit in the approval queue for hours because I simply forgot to check. Follow-ups would go overdue silently.&lt;/p&gt;

&lt;p&gt;Push notifications made Alfred proactive. When the classification pipeline proposes a new action for approval, Alfred sends a push notification to my browser. When a high-priority email arrives, he notifies me immediately. When a DevOps PR webhook fires, I get a notification with a deep link straight to the approvals page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt30tn8f85f2njr0p4up.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgt30tn8f85f2njr0p4up.png" alt=" " width="800" height="558"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The implementation uses the Web Push protocol with VAPID keys for authentication. The &lt;code&gt;SendNotification&lt;/code&gt; use case checks user preferences before sending. I can toggle notifications per event type from the Settings page, and for high-priority emails I can set a minimum priority threshold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;pref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;preferenceRepo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pref&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;pref&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;enabled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;NotificationEventType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;HighPriorityEmail&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;emailPriority&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;PRIORITY_THRESHOLDS&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;minPriority&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;PRIORITY_THRESHOLDS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;high&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;emailPriority&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;WebPushAdapter&lt;/code&gt; sends to all registered browser subscriptions concurrently using &lt;code&gt;Promise.allSettled()&lt;/code&gt;, so a failed delivery to one device does not block others. It automatically cleans up expired subscriptions when the push service returns HTTP 410 or 404, which happens when a user clears browser data or uninstalls the PWA.&lt;/p&gt;
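&lt;p&gt;A rough sketch of that fan-out pattern looks something like this (the function and field names here are illustrative, not Alfred's actual adapter code):&lt;/p&gt;

```typescript
// Sketch of concurrent push delivery with per-subscription failure handling.
// `PushSubscriptionRecord` and `fanOut` are illustrative names, not the
// real Alfred implementation.
interface PushSubscriptionRecord {
  id: string;
  endpoint: string;
}

interface PushResult {
  id: string;
  status: "delivered" | "expired" | "failed";
}

async function fanOut(
  subscriptions: PushSubscriptionRecord[],
  send: (sub: PushSubscriptionRecord) => Promise<void>,
): Promise<PushResult[]> {
  // Promise.allSettled means one failed delivery never blocks the others.
  const settled = await Promise.allSettled(subscriptions.map((sub) => send(sub)));
  return settled.map((result, i) => {
    if (result.status === "fulfilled") return { id: subscriptions[i].id, status: "delivered" };
    // Push services signal an expired subscription with HTTP 410 Gone
    // (some return 404); those should be deleted rather than retried.
    const statusCode = (result.reason as { statusCode?: number } | undefined)?.statusCode;
    return {
      id: subscriptions[i].id,
      status: statusCode === 410 || statusCode === 404 ? "expired" : "failed",
    };
  });
}
```

&lt;p&gt;The caller can then delete every subscription that came back &lt;code&gt;expired&lt;/code&gt; in one pass, which is exactly the cleanup behaviour described above.&lt;/p&gt;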

&lt;p&gt;On the client side, a service worker listens for push events and displays native OS notifications with the app icon, a body preview, and a deep link URL. The &lt;code&gt;notificationclick&lt;/code&gt; handler is smart about reusing existing windows: if the dashboard is already open, it focuses that tab instead of opening a new one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addEventListener&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;notificationclick&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;notification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;waitUntil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;matchAll&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;window&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;includeUncontrolled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;focus&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;focus&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clients&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;openWindow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;usePushNotifications&lt;/code&gt; React hook manages the entire subscription lifecycle from the UI: checking browser support, requesting notification permission, fetching the VAPID public key from the server, subscribing via the Push API, and sending the subscription details to the server for storage. Unsubscribing reverses the process, removing the subscription from both the browser and the server database.&lt;/p&gt;
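&lt;p&gt;One fiddly detail in any subscribe flow like this: the Push API wants the VAPID public key as a &lt;code&gt;Uint8Array&lt;/code&gt;, while servers usually hand it out as a base64url string. A conversion helper along these lines (a generic sketch, not necessarily Alfred's exact code) bridges the two:&lt;/p&gt;

```typescript
// Convert a base64url-encoded VAPID public key into the Uint8Array that
// pushManager.subscribe() expects as `applicationServerKey`.
function urlBase64ToUint8Array(base64url: string): Uint8Array {
  // base64url drops padding and swaps two characters relative to base64.
  const padding = "=".repeat((4 - (base64url.length % 4)) % 4);
  const base64 = (base64url + padding).replace(/-/g, "+").replace(/_/g, "/");
  const raw = atob(base64); // atob exists in browsers and in Node 16+
  return Uint8Array.from(raw, (ch) => ch.charCodeAt(0));
}
```

&lt;p&gt;The hook would pass the result to &lt;code&gt;registration.pushManager.subscribe({ userVisibleOnly: true, applicationServerKey })&lt;/code&gt;.&lt;/p&gt;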

&lt;p&gt;What made this feel like a real discovery is how it changed my workflow. Before push notifications, Alfred was a dashboard I checked. After push notifications, Alfred is an assistant who taps me on the shoulder. The difference between pull and push is the difference between a tool and a colleague. When my phone buzzes with "Action: archive. Proposed archive for 'Your NIKE order has shipped', Master Jo," I smile every time. It feels like Alfred is actually there, running the household.&lt;/p&gt;

&lt;h2&gt;
  
  
  Further Implementations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Retrieval-Augmented Generation for Personal Knowledge
&lt;/h3&gt;

&lt;p&gt;The next frontier I want to explore is giving Alfred deep knowledge of everything I have written. I publish articles, write tweets, draft technical documentation, and take notes across multiple platforms. Right now Alfred knows my emails, my calendar, and my finances, but he does not know my voice. If someone asks me to write a thread about Clean Architecture, I start from scratch every time. If I need to reference a point I made in an article six months ago, I have to search manually.&lt;/p&gt;

&lt;p&gt;I plan to build a RAG pipeline that indexes my published content, tweets, notes, and drafts into a vector store. A good friend of mine, Edem Kumodzi, already does this; read his article &lt;a href="https://edemkumodzi.com/posts/building-a-chatbot-from-15-years-of-my-own-writing/" rel="noopener noreferrer"&gt;here&lt;/a&gt;. When I ask Alfred to help me write something, he would retrieve relevant passages from my own prior work and use them as context for generation. The goal is not for Alfred to write as me, but to write with full awareness of what I have already said, how I say it, and what positions I have taken. He should be able to say: "Master Jo, you wrote about this exact topic in your March article. Shall I pull the relevant points as a starting foundation?"&lt;/p&gt;
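&lt;p&gt;The retrieval step at the heart of such a pipeline is conceptually simple: score stored chunks against a query embedding and keep the best matches. A minimal sketch (with plain arrays standing in for a real embedding model and vector store):&lt;/p&gt;

```typescript
// Illustrative top-k retrieval by cosine similarity. In a real pipeline the
// embeddings would come from an embedding model and live in a vector store.
interface Chunk {
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(query: number[], chunks: Chunk[], k: number): Chunk[] {
  // Sort a copy by descending similarity and keep the top k chunks.
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

&lt;p&gt;The retrieved passages would then be injected into the generation prompt as context, which is the "write with full awareness of what I have already said" part.&lt;/p&gt;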

&lt;p&gt;This is a step toward something larger. I want Alfred to have a total embodiment of who I am — not a shallow personality clone, but a deep contextual understanding of my thinking, my writing style, my professional opinions, and my personal preferences. He should know that I care about Clean Architecture and SOLID principles, that I have strong opinions about over-engineering, and that I prefer concise explanations with concrete examples. At the same time, he should remain his own person: a distinct entity with his butler persona who assists me rather than pretending to be me. The line between "knows me well" and "impersonates me" is one I want to walk carefully.&lt;/p&gt;

&lt;h3&gt;
  
  
  Expanding Service Integrations
&lt;/h3&gt;

&lt;p&gt;Alfred currently connects to Google Workspace, Microsoft 365, and Azure DevOps. I want to push further into the services that shape my daily life.&lt;/p&gt;

&lt;p&gt;WhatsApp is where most of my personal communication happens. The ability to search messages, get summaries of group conversations I have missed, or draft replies through Alfred would close a major gap. The challenge is that WhatsApp's API is designed for businesses rather than personal use, so I will likely need to explore the WhatsApp Business API with creative workarounds.&lt;/p&gt;

&lt;p&gt;LinkedIn is the integration I am most excited about. I got the idea from a podcast about the discipline of maintaining professional relationships, and it resonated because I am genuinely terrible at it. I connect with people at conferences, have great conversations, and then never follow up. Alfred could do something far more personal than LinkedIn's built-in "keep in touch" feature: track my connections, identify people I have not interacted with in a while, cross-reference them with my calendar and email history, and nudge me with context. Not just "you haven't talked to Sarah in 3 months" but "you haven't talked to Sarah in 3 months. You last discussed the migration project at her company. She posted about a promotion last week. Shall I draft a congratulations message, Master Jo?" That level of contextual nudging is what turns a contact list into actual relationships.&lt;/p&gt;

&lt;p&gt;Spotify might seem like an odd fit for a workspace assistant, but I spend a significant amount of my commute and focus time listening to engineering podcasts. I want Alfred to suggest relevant episodes based on what I am currently working on. If I am deep in a week of building a notification system, Alfred could recommend episodes about push notification architecture, service workers, or PWA best practices. The Spotify API is well-documented with solid search and recommendation endpoints, so this should be one of the more straightforward integrations to build.&lt;/p&gt;

&lt;h3&gt;
  
  
  Smart Home Integration
&lt;/h3&gt;

&lt;p&gt;I have been thinking about extending Alfred beyond the digital workspace and into my physical space. Apple Shortcuts provides a bridge between software and home devices. If I can trigger Shortcuts programmatically, Alfred could control lights, check device status, set scenes, and interact with HomeKit accessories through natural language.&lt;/p&gt;

&lt;p&gt;The most entertaining use case involves Juliana, my robot vacuum. She runs on a schedule, but I never actually know if she has finished cleaning or got stuck under the couch again. If I can query her status through a Shortcut or her manufacturer's API, Alfred could include in my morning briefing: "Juliana completed her cleaning cycle at 3 AM, Master Jo. All rooms covered, no incidents to report." Or more usefully: "Juliana appears to be stuck in the bedroom. She has not moved in 40 minutes. Shall I send a rescue party?"&lt;/p&gt;

&lt;p&gt;The broader vision is for Alfred to be aware of my home the same way he is aware of my inbox. When I ask "is everything in order?", he should be able to answer with a status report covering emails, calendar, pending approvals, financial alerts, and whether the house has been cleaned. A proper butler would never limit his awareness to just the mail.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Second Persona
&lt;/h3&gt;

&lt;p&gt;My girlfriend has watched me use Alfred. This sparked an idea I had not considered: cloning Alfred's architecture for a second persona. The entire system is built on Clean Architecture with dependency injection, which means the persona, the rules, and the connected accounts are all configurable. The core infrastructure covering polling, classification, the action lifecycle, push notifications, and chat strategies is entirely provider-agnostic and user-agnostic.&lt;/p&gt;

&lt;p&gt;In theory, creating a second instance means standing up another agent server pointed at different OAuth credentials, a different SQLite database, a different set of action rules, and a different system prompt. The persona would not be Alfred. She would get her own character, her own name, and her own way of speaking. But underneath, the same &lt;code&gt;ChatService&lt;/code&gt;, the same &lt;code&gt;ToolRegistry&lt;/code&gt;, the same &lt;code&gt;AgentLoop&lt;/code&gt;, and the same strategy pattern would power everything.&lt;/p&gt;

&lt;p&gt;The part that interests me most is how the persona shapes the experience. Alfred's butler character is not just flavour text. It affects how he delivers bad news ("I regret to inform you, Master Jo, that your credit card statement shows a rather generous dining budget this month"), how he prioritises information, and how he handles ambiguity. A different persona for a different person would need to match their communication style and preferences entirely. This is where the &lt;code&gt;buildSystemPrompt()&lt;/code&gt; architecture pays off. The base capabilities and mode-specific instructions stay constant, while the persona layer is a separate, swappable block. Building a second agent is less about rewriting code and more about crafting a new character who happens to run on the same engine.&lt;/p&gt;
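&lt;p&gt;The shape of that swappable persona layer can be sketched like this (the signature and field names are assumptions for illustration, not Alfred's actual &lt;code&gt;buildSystemPrompt()&lt;/code&gt;):&lt;/p&gt;

```typescript
// Base capabilities and mode instructions stay constant; the persona is a
// separate, swappable block composed in at prompt-build time.
interface Persona {
  name: string;
  voice: string; // e.g. "formal English butler", or anything else entirely
}

function buildSystemPrompt(base: string, modeInstructions: string, persona: Persona): string {
  return [
    base,
    modeInstructions,
    `You are ${persona.name}. Speak in the style of a ${persona.voice}.`,
  ].join("\n\n");
}
```

&lt;p&gt;With this shape, standing up a second agent with her own character is just a different &lt;code&gt;Persona&lt;/code&gt; object in the composition root; the engine underneath does not change.&lt;/p&gt;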

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building Alfred started as a weekend experiment: a polling loop that checked Gmail and labelled anything that looked important. What it became, over months of iteration, is something I did not fully anticipate: a personal operating system that sits between me and the noise of digital life.&lt;/p&gt;

&lt;p&gt;The biggest lesson was not technical. It was architectural. Clean Architecture is not just an academic exercise you draw on whiteboards. It is the reason I was able to bolt on Microsoft Teams notifications, bank statement processing, and a full chat interface without rewriting the core. When your domain layer knows nothing about Gmail, adding Outlook is just another adapter. When your use cases speak in ports, swapping Claude Haiku for Sonnet is a one-line change in the composition root. The upfront cost of drawing those boundaries paid for itself ten times over.&lt;/p&gt;

&lt;p&gt;That said, the path was not smooth. The jump from intent extraction to native tool use humbled me. Prompt engineering is not engineering in the traditional sense. There is no compiler to catch your mistakes, no type system to lean on. You ship a prompt, watch it hallucinate a tool name that does not exist, and go back to the drawing board. The multi-round reasoning loop took more iterations than any other feature, not because the code was complex, but because coaxing an LLM into reliable, structured behaviour across multiple turns is genuinely hard. Every fix revealed a new edge case. Every edge case demanded a new constraint in the system prompt. I have a much deeper respect now for anyone building production agentic systems.&lt;/p&gt;

&lt;p&gt;The discovery that surprised me most was how naturally financial data fit into the system. I built Alfred to manage emails. The fact that bank statements arrive as email attachments meant the entire PDF extraction and transaction classification pipeline was, architecturally, just another use case plugged into the same ports. The backfill system, the hybrid classifier, the per-bank parser registry: none of it required changes to the core domain. That is Clean Architecture doing exactly what it promises.&lt;/p&gt;

&lt;p&gt;Running everything on a Mac on my desk with a Cloudflare Tunnel was a deliberate choice. There is no monthly cloud bill. There is no cold start. My data never leaves my network unless I am the one requesting it through an encrypted tunnel. For a personal assistant that reads your email, knows your calendar, and processes your bank statements, that is not a nice-to-have. It is a requirement.&lt;/p&gt;

&lt;p&gt;Alfred is far from finished. RAG-powered memory, WhatsApp integration, smart home control: the roadmap is long. But the foundation is solid. Every new capability I have added has reinforced the same pattern: define a port, write the use case, build the adapter, wire it in the composition root. The system grows without becoming fragile because each piece knows only what it needs to know.&lt;/p&gt;

&lt;p&gt;If there is one thing I would tell someone starting a similar project, it is this: invest in the boundaries early. Not the features, not the UI, not the clever LLM tricks. The boundaries. Get the dependency direction right. Make your domain layer boring. Let your infrastructure layer be the only place that knows about the outside world. Everything else follows from that discipline. Alfred taught me that the most powerful personal software is not the one with the most features. It is the one you can keep evolving without fear of breaking what already works.&lt;/p&gt;

&lt;p&gt;See you in the next one 😁&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Increasing Technical Onboarding Velocity for Your Engineering Team</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Fri, 05 Dec 2025 06:16:26 +0000</pubDate>
      <link>https://dev.to/joojodontoh/increasing-technical-onboarding-velocity-for-your-engineering-team-1lmo</link>
      <guid>https://dev.to/joojodontoh/increasing-technical-onboarding-velocity-for-your-engineering-team-1lmo</guid>
      <description>&lt;h2&gt;
  
  
  TLDR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Engineers change teams frequently, and slow onboarding wastes everyone's time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Minimise the time between &lt;code&gt;git clone&lt;/code&gt; and first meaningful pull request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key practices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Setup scripts&lt;/strong&gt; that interactively guide new engineers through environment configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code formatting tools&lt;/strong&gt; (Prettier, ESLint) committed to the repo so standards are automatic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brief READMEs&lt;/strong&gt; focused on "how to run this" rather than business context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Descriptive file naming&lt;/strong&gt; (&lt;code&gt;transaction.service.ts&lt;/code&gt;, &lt;code&gt;pos.client.ts&lt;/code&gt;) so the codebase is navigable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive tests&lt;/strong&gt; that serve as living documentation and give new engineers confidence to make changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; to catch issues before they reach code review&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Protected branches&lt;/strong&gt; and required approvals to prevent accidental mistakes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common libraries and pipeline templates&lt;/strong&gt; for multi-service teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; New engineers get productive in days, reviews are shorter, and service owners spend less time hand-holding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-off:&lt;/strong&gt; Requires upfront investment, but pays dividends with every new team member.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hello my people, it's me again 😄. Today I want to talk about engineering onboarding. So what is that? 🤔 In very simple terms, it is the journey between an engineer's first introduction to a codebase and the moment they can confidently open a meaningful, safe pull request. It's that critical window where confusion transforms into contribution.&lt;/p&gt;

&lt;p&gt;The global job landscape for software engineers is highly dynamic and volatile. According to &lt;a href="https://www.zippia.com/software-engineer-jobs/demographics/" rel="noopener noreferrer"&gt;Zippia's analysis of over 100,000 software developer profiles&lt;/a&gt;, 69% of software engineers have a tenure of less than 2 years at their current job. At large tech companies, this number skews even shorter, with &lt;a href="https://www.centumsearch.com/employee-tenure-and-retention-for-tech-leaders-in-2024" rel="noopener noreferrer"&gt;average tenures ranging from 1 to 3 years&lt;/a&gt;. The tech industry also carries one of the highest turnover rates across all industries, estimated at 13.2% according to LinkedIn workforce data. This reality means that more engineers than ever will find themselves in onboarding situations throughout their careers.&lt;/p&gt;

&lt;p&gt;Onboarding isn't limited to new hires either. It happens when engineers switch teams internally, when a service gets transferred from one squad to another, or when engineering resources are borrowed temporarily for critical projects. Each of these scenarios demands the same thing: getting someone productive in an unfamiliar codebase as quickly as possible.&lt;/p&gt;

&lt;p&gt;As an engineering lead, I've seen firsthand how a rough onboarding experience can slow down delivery, frustrate talented people, and introduce risk into production systems. This article aims to share practical strategies for making onboarding smooth and fast, while minimising the fear of new team members accidentally breaking things.&lt;/p&gt;

&lt;p&gt;A quick note: onboarding involves non-technical aspects as well, such as team rituals, communication norms, and stakeholder relationships. Those matter deeply, but this article will focus specifically on the technical side of getting engineers productive and confident in your codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Aspects of Knowledge a New Engineer Should Be Aware Of
&lt;/h2&gt;

&lt;p&gt;Before an engineer can contribute meaningfully to a codebase, there are several knowledge areas they need to get up to speed on. Some of these are explicit and documented, others are tribal knowledge passed down through code reviews and hallway conversations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain Knowledge
&lt;/h3&gt;

&lt;p&gt;Understanding the business domain of the service you're working on isn't always a prerequisite for making changes. You can fix a bug in a fuel pricing service without fully understanding the intricacies of how pump prices are calculated. However, when it comes to adding features or making architectural decisions, domain knowledge becomes crucial: it leads to higher-quality contributions and fewer review round trips.&lt;/p&gt;

&lt;p&gt;Consider this example: an engineer is tasked with adding a "price override" feature to a convenience retail POS system. Without understanding the domain, they might implement it as a simple field that replaces the scanned price. But someone with domain knowledge would know that price overrides in retail need to account for manager approval workflows, audit trails for loss prevention, tax recalculations, loyalty point adjustments, and integration with the back-office reporting system. They'd also know that certain items like fuel, tobacco, and alcohol often have regulatory restrictions on price modifications. The engineer lacking this context might go through three or four review cycles before landing on the right approach, while someone with domain understanding gets it right the first time.&lt;/p&gt;

&lt;p&gt;This knowledge transfer is typically handled through a buddy system where an assigned team member walks the new joiner through the current architecture at a high level. One important note here: keeping architecture diagrams up to date can feel like thankless work with no short-term rewards, but it pays dividends every time someone new joins the team.&lt;/p&gt;

&lt;h3&gt;
  
  
  Team Rituals
&lt;/h3&gt;

&lt;p&gt;Stand-ups, sprint ceremonies, retrospectives, RCAs (Root Cause Analyses), and other team rituals are also part of onboarding. These won't be covered in this article since we're focusing on technical aspects, but they're worth mentioning for completeness.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech-Stack Familiarity
&lt;/h3&gt;

&lt;p&gt;Tech-stack familiarity is usually filtered for during hiring or internal transfers. If you're hiring a backend engineer for a Java-based integration team, you're likely looking for candidates with Java or similar JVM experience. Knowledge of the stack naturally makes onboarding smoother.&lt;/p&gt;

&lt;p&gt;That said, smooth onboarding practices become even more critical when tech-stack familiarity is low. If you've hired a strong engineer from a Python background into your Apache Camel and Spring Boot codebase, your onboarding process needs to carry more of the load.&lt;/p&gt;

&lt;h3&gt;
  
  
  Coding Standards
&lt;/h3&gt;

&lt;p&gt;Every team develops conventions around how code should be written and organised. These include file naming standards, variable naming conventions, indentation preferences, and file structure patterns.&lt;/p&gt;

&lt;p&gt;Some teams prefer their folder structure to mirror API endpoints. For example, in a retail integration service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
├── api/
│   ├── v1/
│   │   ├── transactions/
│   │   │   ├── transactions.controller.ts
│   │   │   ├── transactions.service.ts
│   │   │   └── transactions.routes.ts
│   │   ├── inventory/
│   │   │   ├── inventory.controller.ts
│   │   │   ├── inventory.service.ts
│   │   │   └── inventory.routes.ts
│   │   └── fuel-prices/
│   │       ├── fuel-prices.controller.ts
│   │       ├── fuel-prices.service.ts
│   │       └── fuel-prices.routes.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this structure, if a new engineer needs to work on the &lt;code&gt;GET /api/v1/inventory&lt;/code&gt; endpoint that returns current tank dip readings, they immediately know to look in &lt;code&gt;src/api/v1/inventory/&lt;/code&gt;. The cognitive load of navigating the codebase drops significantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication, Environment Variables, and Secrets
&lt;/h3&gt;

&lt;p&gt;This is often where onboarding gets frustrating. Different companies handle secrets and environment configuration in vastly different ways, and the friction here can make or break someone's first few days.&lt;/p&gt;

&lt;p&gt;More mature organisations orchestrate access at an enterprise level using tools like ServiceNow, HashiCorp Vault, or AWS Secrets Manager, where permissions are tied to identity and granted automatically based on team membership. The less manual this process is, the better.&lt;/p&gt;

&lt;p&gt;For teams without enterprise-grade tooling, here are some common approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Symmetric encryption within the codebase:&lt;/strong&gt; Some teams encrypt their &lt;code&gt;.env&lt;/code&gt; files using a tool like &lt;a href="https://github.com/AGWA/git-crypt" rel="noopener noreferrer"&gt;git-crypt&lt;/a&gt; or &lt;a href="https://github.com/getsops/sops" rel="noopener noreferrer"&gt;sops&lt;/a&gt; and store them directly in the repository. New engineers just need the decryption password to access everything. This approach is convenient but carries risk since the password becomes a single point of compromise. A sensible mitigation is to only encrypt secrets for lower environments like &lt;code&gt;dev&lt;/code&gt; and &lt;code&gt;staging&lt;/code&gt;, keeping production secrets in a more secure system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Encrypted files outside the codebase:&lt;/strong&gt; Secrets are stored in a shared location (like an S3 bucket or internal file store) with company-wide access controls. Engineers with the right permissions can download what they need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual sharing:&lt;/strong&gt; The most primitive approach. Someone on the team carefully shares env files via secure channels. It works, but it doesn't scale and is prone to human error.&lt;/p&gt;

&lt;p&gt;Whichever approach your team uses, the goal should be minimising the time between "I've cloned the repo" and "I have everything I need to run this locally."&lt;/p&gt;

&lt;h2&gt;
  
  
  Things Needed to Ensure Fast and Clean Onboarding
&lt;/h2&gt;

&lt;p&gt;This section covers the practical tooling and processes that make onboarding frictionless. I've split it into two parts: getting engineers set up quickly (code pickup), and enabling them to make changes safely (change integration).&lt;/p&gt;




&lt;h3&gt;
  
  
  Part 1: Smooth Code Pickup
&lt;/h3&gt;

&lt;p&gt;The goal here is simple: minimise the time between &lt;code&gt;git clone&lt;/code&gt; and "I have a working local environment."&lt;/p&gt;

&lt;h4&gt;
  
  
  Code Formatting Standardisation
&lt;/h4&gt;

&lt;p&gt;Inconsistent code formatting creates unnecessary noise in pull requests and wastes mental energy. A new engineer shouldn't have to guess whether to use tabs or spaces, or whether to add trailing commas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://prettier.io/" rel="noopener noreferrer"&gt;Prettier&lt;/a&gt; is one of the most popular tools for solving this. Commit a &lt;code&gt;.prettierrc&lt;/code&gt; file to your repository and every engineer's code gets formatted the same way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"semi"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"singleQuote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabWidth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"trailingComma"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"es5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"printWidth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't forget a &lt;code&gt;.prettierignore&lt;/code&gt; file to prevent formatting generated files or dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;node_modules/
dist/
coverage/
*.generated.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When a new engineer opens the codebase, these config files immediately communicate the team's standards without anyone needing to explain them.&lt;/p&gt;
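&lt;p&gt;It also helps to expose the formatter through npm scripts so nobody has to remember CLI flags. The script names below are my own convention, not a Prettier requirement:&lt;/p&gt;

```json
// package.json (scripts section only)
{
  "scripts": {
    "format": "prettier --write .",
    "format:check": "prettier --check ."
  }
}
```

&lt;p&gt;&lt;code&gt;format:check&lt;/code&gt; fails without modifying files, which makes it suitable for CI and git hooks.&lt;/p&gt;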

&lt;h4&gt;
  
  
  Handy Scripts
&lt;/h4&gt;

&lt;p&gt;Instead of a README that says "run &lt;code&gt;npm install&lt;/code&gt;, then set up your &lt;code&gt;.env&lt;/code&gt; file, then run &lt;code&gt;docker-compose up&lt;/code&gt;, then..." wrap all of this in scripts. Scripts are executable documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup Script&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A setup script handles dependency installation and environment preparation. Here's an example for a retail POS integration service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# setup.sh - Interactive setup script for POS Integration Service&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🚀 Setting up POS Integration Service..."&lt;/span&gt;

&lt;span class="c"&gt;# Check Node version&lt;/span&gt;
&lt;span class="nv"&gt;REQUIRED_NODE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"18"&lt;/span&gt;
&lt;span class="nv"&gt;CURRENT_NODE_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-v&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'v'&lt;/span&gt; &lt;span class="nt"&gt;-f2&lt;/span&gt; | &lt;span class="nb"&gt;cut&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'.'&lt;/span&gt; &lt;span class="nt"&gt;-f1&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CURRENT_NODE_VERSION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REQUIRED_NODE_VERSION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Node.js version &lt;/span&gt;&lt;span class="nv"&gt;$REQUIRED_NODE_VERSION&lt;/span&gt;&lt;span class="s2"&gt; or higher is required."&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   Current version: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-v&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   Install via: https://nodejs.org/ or use nvm"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Node.js version: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;node &lt;span class="nt"&gt;-v&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Check for .env file&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; .env &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📋 No .env file found."&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   Would you like to:"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   1) Copy from .env.example (recommended for new setup)"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   2) Decrypt from .env.encrypted (requires team password)"&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   3) Skip (I'll set it up manually)"&lt;/span&gt;
    &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"   Enter choice [1-3]: "&lt;/span&gt; env_choice

    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nv"&gt;$env_choice&lt;/span&gt; &lt;span class="k"&gt;in
        &lt;/span&gt;1&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Created .env from .env.example"&lt;/span&gt;
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   ⚠️  Remember to update placeholder values"&lt;/span&gt;
            &lt;span class="p"&gt;;;&lt;/span&gt;
        2&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"   Enter decryption password: "&lt;/span&gt; password
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
            openssl aes-256-cbc &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-in&lt;/span&gt; .env.encrypted &lt;span class="nt"&gt;-out&lt;/span&gt; .env &lt;span class="nt"&gt;-pass&lt;/span&gt; pass:&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$password&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Decrypted .env file"&lt;/span&gt;
            &lt;span class="p"&gt;;;&lt;/span&gt;
        3&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⏭️  Skipping .env setup"&lt;/span&gt;
            &lt;span class="p"&gt;;;&lt;/span&gt;
    &lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ .env file exists"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Validate critical env vars&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; .env &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;source&lt;/span&gt; .env
    &lt;span class="nv"&gt;MISSING_VARS&lt;/span&gt;&lt;span class="o"&gt;=()&lt;/span&gt;

    &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$POS_API_BASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; MISSING_VARS+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"POS_API_BASE_URL"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AZURE_SERVICE_BUS_CONNECTION&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; MISSING_VARS+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"AZURE_SERVICE_BUS_CONNECTION"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$S3_BUCKET_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; MISSING_VARS+&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"S3_BUCKET_NAME"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="k"&gt;${#&lt;/span&gt;&lt;span class="nv"&gt;MISSING_VARS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⚠️  Missing required environment variables:"&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;var &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;MISSING_VARS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
            &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"   - &lt;/span&gt;&lt;span class="nv"&gt;$var&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;done
    else
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ All required environment variables are set"&lt;/span&gt;
    &lt;span class="k"&gt;fi
fi&lt;/span&gt;

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📦 Installing dependencies..."&lt;/span&gt;
npm ci
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Dependencies installed"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🎉 Setup complete! Run './start.sh' to start the service."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice how the script is interactive. It doesn't just fail silently when something is missing. It guides the engineer through decisions and helps them understand what the system needs. This is far more educational than a wall of README text.&lt;/p&gt;
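&lt;p&gt;Option 2 above assumes someone produced &lt;code&gt;.env.encrypted&lt;/code&gt; in the first place. A minimal sketch of that counterpart, assuming OpenSSL is available (the function names are mine, not part of the article's repo), must use the same cipher flags as the setup script or the round trip fails:&lt;/p&gt;

```shell
# Hypothetical helpers mirroring option 2 in setup.sh. encrypt_env produces
# the .env.encrypted that setup.sh later decrypts.
encrypt_env() {
  openssl aes-256-cbc -salt -in .env -out .env.encrypted -pass pass:"$1"
}

decrypt_env() {
  openssl aes-256-cbc -d -in .env.encrypted -out .env -pass pass:"$1"
}
```

&lt;p&gt;Commit &lt;code&gt;.env.encrypted&lt;/code&gt;, never &lt;code&gt;.env&lt;/code&gt; itself, and share the password out of band.&lt;/p&gt;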

&lt;p&gt;&lt;strong&gt;Startup Script&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A startup script gets the application running locally. It should aim to be system-agnostic by leveraging containers where possible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# start.sh - Start the POS Integration Service locally&lt;/span&gt;

&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🚀 Starting POS Integration Service..."&lt;/span&gt;

&lt;span class="c"&gt;# Check if setup has been run&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"node_modules"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Dependencies not installed. Run './setup.sh' first."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Start local dependencies (mocked external services)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📦 Starting local dependencies..."&lt;/span&gt;
docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt; localstack mockpos

&lt;span class="c"&gt;# Wait for dependencies to be healthy&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⏳ Waiting for dependencies..."&lt;/span&gt;
&lt;span class="nb"&gt;sleep &lt;/span&gt;5

&lt;span class="c"&gt;# Start the service&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🏃 Starting service in development mode..."&lt;/span&gt;
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Some teams combine setup and startup into a single script. Others keep them separate so you don't re-run setup every time you want to start the service. Either approach works as long as it's consistent.&lt;/p&gt;
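&lt;p&gt;One refinement worth considering: the &lt;code&gt;sleep 5&lt;/code&gt; in the startup script is the simplest thing that can work, but it's a guess. A sturdier sketch polls a health endpoint until it answers (the URL and retry budget below are illustrative):&lt;/p&gt;

```shell
# Poll a URL until it responds or the retry budget runs out. Replaces the
# fixed sleep with an explicit readiness check.
wait_for() {
  url="$1"
  retries="${2:-30}"
  i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Timed out waiting for $url"
  return 1
}
```

&lt;p&gt;For example, &lt;code&gt;wait_for "http://localhost:4566/_localstack/health"&lt;/code&gt; before starting the service, assuming a recent LocalStack.&lt;/p&gt;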

&lt;p&gt;Example output from another script:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9qwkxs803v76yrw59qu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9qwkxs803v76yrw59qu.png" alt="example output" width="800" height="962"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Functionality Test Script (Optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For extra confidence, you can provide a script that runs a quick smoke test against the locally running service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# smoke-test.sh - Verify the service is running correctly&lt;/span&gt;

&lt;span class="nv"&gt;BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:3000"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🧪 Running smoke tests..."&lt;/span&gt;

&lt;span class="c"&gt;# Health check&lt;/span&gt;
&lt;span class="nv"&gt;HEALTH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /dev/null &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"%{http_code}"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BASE_URL&lt;/span&gt;&lt;span class="s2"&gt;/health"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HEALTH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 200 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Health endpoint responding"&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Health check failed (HTTP &lt;/span&gt;&lt;span class="nv"&gt;$HEALTH&lt;/span&gt;&lt;span class="s2"&gt;)"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Test transaction endpoint with mock data&lt;/span&gt;
&lt;span class="nv"&gt;RESPONSE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$BASE_URL&lt;/span&gt;&lt;span class="s2"&gt;/api/v1/transactions"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"storeId": "TEST001", "items": [{"sku": "MOCK123", "quantity": 1}]}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESPONSE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"transactionId"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Transaction endpoint working"&lt;/span&gt;
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"❌ Transaction endpoint failed"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🎉 All smoke tests passed!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Brief but Useful README
&lt;/h4&gt;

&lt;p&gt;People don't read long READMEs. Keep yours focused on operability rather than explaining the business problem the service solves. Save that for Confluence or your internal docs.&lt;/p&gt;

&lt;p&gt;A good README structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# POS Integration Service&lt;/span&gt;

Handles transaction processing between store POS systems and central data lake.

&lt;span class="gu"&gt;## Quick Start&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
bash&lt;br&gt;
./setup.sh    # First time only&lt;br&gt;
./start.sh    # Start the service&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
Service runs at `http://localhost:3000`

## Useful Commands

- `npm run test` - Run unit tests
- `npm run test:integration` - Run integration tests
- `./smoke-test.sh` - Verify local setup works

## Documentation

- [Architecture Diagram](https://confluence.internal/pos-integration/architecture)
- [API Specification](https://confluence.internal/pos-integration/api-spec)
- [Runbook](https://confluence.internal/pos-integration/runbook)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. A new engineer can get running in under a minute and knows where to find deeper documentation when needed.&lt;/p&gt;




&lt;h3&gt;
  
  
  Part 2: Smooth Change Integration
&lt;/h3&gt;

&lt;p&gt;Once an engineer is set up locally, the next challenge is enabling them to make changes confidently without breaking things.&lt;/p&gt;

&lt;h4&gt;
  
  
  Descriptive File and Function Naming
&lt;/h4&gt;

&lt;p&gt;Clear naming conventions reduce the learning curve dramatically. When files are named descriptively, new engineers can navigate the codebase intuitively.&lt;/p&gt;

&lt;p&gt;Consider a retail integration service with these common file patterns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
├── clients/
│   ├── pos.client.ts           # Handles POS API communication
│   ├── serviceBus.client.ts    # Azure Service Bus operations
│   └── s3.client.ts            # S3 storage operations
├── services/
│   ├── transaction.service.ts  # Transaction business logic
│   └── inventory.service.ts    # Inventory business logic
├── utils/
│   ├── date.utils.ts           # Date formatting helpers
│   └── validation.utils.ts     # Input validation helpers
└── builders/
    └── transaction.builder.ts  # Builds transaction payloads
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a new engineer needs to modify how transactions are sent to S3, they know to look in &lt;code&gt;s3.client.ts&lt;/code&gt;. If they need to change business logic, they check the services folder. The naming convention acts as a map.&lt;/p&gt;

&lt;p&gt;Treat these as principles rather than rigid rules. The goal is descriptive, predictable naming that helps people find what they need.&lt;/p&gt;

&lt;h4&gt;
  
  
  Unit Tests
&lt;/h4&gt;

&lt;p&gt;All those clients, utils, helpers, and services should have accompanying tests. When a new team member modifies &lt;code&gt;transaction.service.ts&lt;/code&gt;, they can run the tests to verify they haven't broken existing functionality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// transaction.service.test.ts&lt;/span&gt;
&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TransactionService&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;processTransaction&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should calculate correct total for multiple items&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;FUEL001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;45.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;unitPrice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.89&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;SNACK001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;unitPrice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;3.50&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;];&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;transactionService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processTransaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;92.995&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should apply fuel discount for loyalty members&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;FUEL001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;quantity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;unitPrice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.89&lt;/span&gt; &lt;span class="p"&gt;}];&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;loyaltyId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LOYALTY123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;transactionService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;processTransaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;loyaltyId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fuelDiscount&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;total&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;71.60&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tests serve as living documentation. A new engineer can read the test file to understand what a function is supposed to do without digging through implementation details.&lt;/p&gt;

&lt;h4&gt;
  
  
  Pre-commit and Pre-push Hooks
&lt;/h4&gt;

&lt;p&gt;Git hooks catch issues before they reach the remote repository. Tools like &lt;a href="https://typicode.github.io/husky/" rel="noopener noreferrer"&gt;Husky&lt;/a&gt; make this easy to set up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;package.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"husky"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pre-commit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm run lint &amp;amp;&amp;amp; npm run format:check"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pre-push"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm run test"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A typical setup runs linting and format checks on commit (fast feedback), and runs tests before push (thorough validation).&lt;/p&gt;

&lt;p&gt;One word of caution: keep these hooks fast. If your pre-commit takes 30 seconds, engineers will start bypassing it with &lt;code&gt;--no-verify&lt;/code&gt;. Aim for under 5 seconds on pre-commit.&lt;/p&gt;
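&lt;p&gt;One way to stay under that budget is to check only the files staged for the current commit rather than the whole repository. This is the idea tools like &lt;code&gt;lint-staged&lt;/code&gt; automate; a hand-rolled sketch:&lt;/p&gt;

```shell
# List staged (added/copied/modified) source files; the extension filter is
# illustrative. grep exits non-zero when nothing matches, hence the || true.
staged_source_files() {
  git diff --cached --name-only --diff-filter=ACM | grep -E '\.(ts|js|json)$' || true
}

# In the pre-commit hook (illustrative):
#   files=$(staged_source_files)
#   if [ -n "$files" ]; then npx prettier --check $files; fi
```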

&lt;h4&gt;
  
  
  Common Libraries (For Multi-Service Teams)
&lt;/h4&gt;

&lt;p&gt;When your team owns multiple services, you'll notice patterns emerging. The same S3 client code, the same transaction builder, the same logging setup. Instead of copy-pasting across repositories, extract these into a shared library.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// @my-org/retail-common&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S3Client&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@my-org/retail-common&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;TransactionBuilder&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@my-org/retail-common&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;S3Client&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;S3_BUCKET&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TransactionBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STORE001&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withItems&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few guidelines for common libraries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use semantic versioning with alpha/beta releases so teams can test changes before they go stable&lt;/li&gt;
&lt;li&gt;Write rigorous tests. A bug in a common library affects every consuming service&lt;/li&gt;
&lt;li&gt;Document breaking changes clearly in your changelog&lt;/li&gt;
&lt;/ul&gt;
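
&lt;p&gt;To make the versioning point concrete, here's one way to cut an alpha release with npm. The package name and version are just for illustration, and your registry setup may differ:&lt;/p&gt;

```json
{
  "name": "@my-org/retail-common",
  "version": "2.4.0-alpha.1",
  "publishConfig": {
    "tag": "alpha"
  }
}
```

&lt;p&gt;Publishing under the &lt;code&gt;alpha&lt;/code&gt; dist-tag means teams opt in with &lt;code&gt;npm install @my-org/retail-common@alpha&lt;/code&gt;, while everyone installing normally stays on the last stable release.&lt;/p&gt;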

&lt;h4&gt;
  
  
  Pipeline Repositories (For Multi-Service Teams)
&lt;/h4&gt;

&lt;p&gt;GitHub Actions, GitLab CI, and Azure Pipelines all support reusable workflow definitions. Instead of duplicating deployment logic across repositories, centralise it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# In your service repository&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;my-org/pipeline-templates/.github/workflows/deploy-to-aws.yml@v2&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;
      &lt;span class="na"&gt;service-name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pos-integration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When deployment processes change, you update one repository instead of twenty.&lt;/p&gt;
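
&lt;p&gt;For completeness, the other half lives in the template repository. A reusable GitHub Actions workflow declares its inputs under &lt;code&gt;workflow_call&lt;/code&gt;. This is a sketch, so the deploy step itself is a placeholder:&lt;/p&gt;

```yaml
# my-org/pipeline-templates/.github/workflows/deploy-to-aws.yml
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
      service-name:
        required: true
        type: string

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the real deployment logic
      - run: ./deploy.sh "${{ inputs.environment }}" "${{ inputs.service-name }}"
```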

&lt;h4&gt;
  
  
  Branching Strategies and Policies
&lt;/h4&gt;

&lt;p&gt;Protect your main branch from direct commits. This is non-negotiable for team safety. Configure your repository to require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pull request reviews (at least one approver)&lt;/li&gt;
&lt;li&gt;Passing CI checks before merge&lt;/li&gt;
&lt;li&gt;No force pushes to main&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This protects new engineers from accidentally pushing directly to production. The guardrails are there before they even make their first commit.&lt;/p&gt;
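
&lt;p&gt;These rules are worth applying through your platform's API rather than clicking through the UI, so they're reproducible across repositories. On GitHub, a payload along these lines (sent with something like &lt;code&gt;gh api -X PUT repos/{owner}/{repo}/branches/main/protection --input -&lt;/code&gt;) covers all three requirements:&lt;/p&gt;

```json
{
  "required_pull_request_reviews": {
    "required_approving_review_count": 1
  },
  "required_status_checks": {
    "strict": true,
    "contexts": ["ci"]
  },
  "allow_force_pushes": false,
  "enforce_admins": true,
  "restrictions": null
}
```

&lt;p&gt;The &lt;code&gt;"ci"&lt;/code&gt; context here is a stand-in for whatever your pipeline's status check is actually called.&lt;/p&gt;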

&lt;h4&gt;
  
  
  Environment Strategies and Policies
&lt;/h4&gt;

&lt;p&gt;Development environments should be open for experimentation. Engineers need a place to break things safely.&lt;/p&gt;

&lt;p&gt;Staging and UAT environments should mirror production as closely as possible, with stricter deployment controls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# azure-pipelines.yml&lt;/span&gt;
&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DeployDev&lt;/span&gt;
    &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eq(variables['Build.SourceBranch'], 'refs/heads/develop')&lt;/span&gt;
    &lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;deployment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DeployToDev&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;development&lt;/span&gt;  &lt;span class="c1"&gt;# Auto-deploys, no approval needed&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DeployStaging&lt;/span&gt;
    &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eq(variables['Build.SourceBranch'], 'refs/heads/main')&lt;/span&gt;
    &lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;deployment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DeployToStaging&lt;/span&gt;
        &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;  &lt;span class="c1"&gt;# Requires manual approval&lt;/span&gt;
        &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;runOnce&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./deploy.sh staging&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that code can flow freely to dev, but staging deployments require explicit approval.&lt;/p&gt;

&lt;h4&gt;
  
  
  Integration Tests
&lt;/h4&gt;

&lt;p&gt;Unit tests verify individual components. Integration tests verify that services work together correctly.&lt;/p&gt;

&lt;p&gt;For a retail integration service, an integration test might verify that a transaction flows correctly from the POS mock through your service and into S3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Transaction Flow Integration&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;should process POS transaction and store in S3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Arrange&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mockTransaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockPOSTransaction&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Act&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;posClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendTransaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockTransaction&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;waitForProcessing&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Assert&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;storedTransaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;s3Client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getTransaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockTransaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storedTransaction&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBeDefined&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;storedTransaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PROCESSED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integration tests give new engineers confidence that their changes haven't broken contracts with other systems.&lt;/p&gt;
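
&lt;p&gt;One detail worth getting right: &lt;code&gt;waitForProcessing(5000)&lt;/code&gt; in the test above is better implemented as a poll than a fixed sleep, so the happy path doesn't always pay the full five seconds. A minimal sketch (the helper and its defaults are my own, not from any library):&lt;/p&gt;

```typescript
// Hypothetical polling helper: resolves once `predicate` returns true,
// throws if the timeout elapses first.
export async function waitFor(
  predicate: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 250,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await predicate()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs}ms`);
}
```

&lt;p&gt;The fixed wait then becomes a &lt;code&gt;waitFor&lt;/code&gt; call whose predicate checks whether the transaction has landed in S3, and the suite finishes as soon as it has.&lt;/p&gt;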

&lt;h4&gt;
  
  
  Culture of Maintenance
&lt;/h4&gt;

&lt;p&gt;Finally, build a culture that keeps quality high as new engineers join:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code coverage thresholds:&lt;/strong&gt; Configure your pipeline to fail if coverage drops below a threshold. This ensures new code comes with tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# jest.config.js&lt;/span&gt;
&lt;span class="na"&gt;coverageThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;
  &lt;span class="nv"&gt;global&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;
    &lt;span class="nv"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;80&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;functions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;80&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;lines&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;80&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt;
    &lt;span class="nv"&gt;statements&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;80&lt;/span&gt;
  &lt;span class="pi"&gt;}&lt;/span&gt;
&lt;span class="pi"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Integration tests as part of feature work:&lt;/strong&gt; A feature isn't done until its integration tests are written. Make this explicit in your definition of done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strict but fair reviews:&lt;/strong&gt; Code reviews should enforce standards consistently, but reviewers should also be helpful and educational. A review that just says "wrong" teaches nothing. A review that explains &lt;em&gt;why&lt;/em&gt; something should change helps engineers grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advantages and How This Helps My Teams
&lt;/h2&gt;

&lt;p&gt;All the upfront investment in scripts, tests, and automation pays dividends quickly. Here's what I've seen in practice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Saves Time for Onboarded Members
&lt;/h3&gt;

&lt;p&gt;New engineers on my team don't spend their first day wrestling with environment setup or hunting down secrets. They clone the repo, run &lt;code&gt;./setup.sh&lt;/code&gt;, and follow the interactive prompts. Within an hour, they have a working local environment and can start exploring the codebase.&lt;/p&gt;

&lt;p&gt;Compare this to the alternative: a new joiner pinging five different people on Slack asking where to find the database credentials, discovering their Node version is wrong after hitting a cryptic error, and spending half a day just getting the service to start. That frustration compounds and sets a negative tone for the entire onboarding experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Saves Time for Service Owners
&lt;/h3&gt;

&lt;p&gt;Before I invested in these practices, onboarding a new engineer meant hours of hand-holding. "Where's the config for X?" "How does Y work?" "Why is Z failing?"&lt;/p&gt;

&lt;p&gt;Now, when someone asks me a question, I can often point them to a specific file or test. "Check &lt;code&gt;transaction.service.test.ts&lt;/code&gt;, the third test case covers exactly that scenario." The tests become documentation. The scripts become guides. I'm not the bottleneck anymore.&lt;/p&gt;

&lt;p&gt;This is especially valuable when you're leading a team and your time is split across architecture decisions, stakeholder meetings, and code reviews. Every hour saved on repetitive explanations is an hour you can spend on higher-leverage work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduces Time in Change Management and Knowledge Transfer
&lt;/h3&gt;

&lt;p&gt;When an engineer leaves the team or moves to another project, the knowledge transfer burden is significantly lighter. The important patterns are encoded in common libraries. The deployment process is captured in pipeline templates. The business logic is documented through tests.&lt;/p&gt;

&lt;p&gt;New engineers inheriting a service don't need a week of shadowing sessions. The codebase is largely self-explanatory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reviews Are Short and Sweet
&lt;/h3&gt;

&lt;p&gt;This one might be my favourite. When automated checks handle formatting, linting, test coverage, and integration verification, code reviews can focus on what actually matters: logic, architecture, and edge cases.&lt;/p&gt;

&lt;p&gt;I no longer leave comments like "missing semicolon" or "incorrect indentation." Prettier handles that. I don't have to verify that tests exist. The coverage threshold enforces it. The review becomes a conversation about the change itself rather than a checklist of mechanical issues.&lt;/p&gt;

&lt;p&gt;Pull requests that used to require three rounds of back-and-forth now get approved on the first or second pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disadvantages
&lt;/h2&gt;

&lt;p&gt;No approach is without trade-offs. Here are the downsides I've encountered and some honest reflections on them.&lt;/p&gt;

&lt;h3&gt;
  
  
  A Lot of Initial Work
&lt;/h3&gt;

&lt;p&gt;Setting up robust scripts, configuring pipelines, writing comprehensive tests, and building common libraries takes time. Time that could otherwise go toward feature delivery.&lt;/p&gt;

&lt;p&gt;I personally don't find this burdensome because I've seen the compounding benefits across multiple teams. But I understand the hesitation. When you're under pressure to ship a fuel pricing integration before the end of the quarter, spending two days writing a setup script feels like a luxury you can't afford.&lt;/p&gt;

&lt;p&gt;The reality is that this investment is easier to justify on greenfield projects or during quieter periods. Retrofitting these practices onto a legacy codebase with looming deadlines is genuinely difficult. Sometimes you have to be pragmatic and introduce improvements incrementally rather than all at once.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can Cause Friction in Delivery If Not Managed Well
&lt;/h3&gt;

&lt;p&gt;Standards are helpful until they become obstacles. If your automated checks are too strict or too slow, they start blocking legitimate work.&lt;/p&gt;

&lt;p&gt;Consider this scenario: your team has a rule that every pull request must have 90% code coverage. An engineer is fixing a critical bug in the loyalty points calculation that's causing customers to lose discounts at checkout. The fix is two lines, but to satisfy the coverage requirement, they'd need to write fifteen new tests for an untested legacy function they happened to touch. The bug sits in production for an extra day while they write tests for unrelated code.&lt;/p&gt;

&lt;p&gt;Another example: you've established a convention that all API responses must follow a specific format. But the convention lives only in a Confluence page that nobody reads. Without automated schema validation, engineers keep forgetting. Reviews become tedious nitpicking sessions, and resentment builds. "Why did my PR get blocked for a formatting issue when the last three PRs got merged without it?"&lt;/p&gt;

&lt;p&gt;The lesson here is that standards need automated enforcement to be sustainable. If it can't be checked by a machine, it will eventually be ignored by humans.&lt;/p&gt;
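
&lt;p&gt;The check doesn't have to be elaborate to be effective. Here's a sketch of a machine-checkable version of that response-format convention. The envelope shape is invented for illustration, and a real project would more likely use JSON Schema with an off-the-shelf validator:&lt;/p&gt;

```typescript
// Hypothetical response envelope the team has agreed on.
interface ApiResponse {
  status: "ok" | "error";
  data: unknown;
  errors: string[];
}

// Returns a list of violations; an empty list means the payload conforms.
// In CI, a sample response from each endpoint gets run through this.
export function checkEnvelope(payload: unknown): string[] {
  const violations: string[] = [];
  if (typeof payload !== "object" || payload === null) {
    return ["payload is not an object"];
  }
  const p = payload as Record<string, unknown>;
  if (p.status !== "ok" && p.status !== "error") {
    violations.push("status must be 'ok' or 'error'");
  }
  if (!("data" in p)) violations.push("missing 'data' field");
  if (!Array.isArray(p.errors)) violations.push("'errors' must be an array");
  return violations;
}
```

&lt;p&gt;Wire it into the pipeline and the convention stops depending on anyone's memory or on a Confluence page nobody reads.&lt;/p&gt;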

&lt;h3&gt;
  
  
  OS Compatibility Issues
&lt;/h3&gt;

&lt;p&gt;Scripts written on macOS often break on Windows, and vice versa. This is a constant source of friction for teams with mixed development environments.&lt;/p&gt;

&lt;p&gt;A simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Works on macOS and Linux, fails on Windows&lt;/span&gt;

&lt;span class="c"&gt;# macOS sed syntax&lt;/span&gt;
&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt; &lt;span class="s1"&gt;'s/old/new/g'&lt;/span&gt; config.json

&lt;span class="c"&gt;# Linux sed syntax (different from macOS)&lt;/span&gt;
&lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="s1"&gt;'s/old/new/g'&lt;/span&gt; config.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or path handling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Unix-style paths&lt;/span&gt;
&lt;span class="nv"&gt;CONFIG_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"./config/local/settings.json"&lt;/span&gt;

&lt;span class="c"&gt;# Windows needs backslashes (or Git Bash to translate)&lt;/span&gt;
&lt;span class="nv"&gt;CONFIG_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;".&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s2"&gt;onfig&lt;/span&gt;&lt;span class="se"&gt;\l&lt;/span&gt;&lt;span class="s2"&gt;ocal&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;ettings.json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mitigation strategies include using cross-platform tools like Node.js scripts instead of bash, containerising your development environment with Docker, or maintaining separate scripts for different platforms. None of these are perfect, but they reduce the pain.&lt;/p&gt;
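
&lt;p&gt;The path problem, for example, disappears entirely once the script is a Node script, because &lt;code&gt;path.join&lt;/code&gt; emits whichever separator the host OS uses:&lt;/p&gt;

```typescript
import * as path from "node:path";

// path.join emits the separator for the host OS, so the same script
// produces config/local/settings.json on macOS and Linux and
// config\local\settings.json on Windows. No per-OS script variants needed.
export function buildConfigPath(...segments: string[]): string {
  return path.join(...segments);
}

const configPath = buildConfigPath("config", "local", "settings.json");
console.log(configPath);
```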

&lt;h3&gt;
  
  
  Script Sprawl
&lt;/h3&gt;

&lt;p&gt;When you start automating everything, you can end up with a dozen scripts scattered across your repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;scripts/
├── setup.sh
├── setup-windows.ps1
├── start.sh
├── start-docker.sh
├── run-tests.sh
├── run-integration-tests.sh
├── deploy-dev.sh
├── deploy-staging.sh
├── generate-mocks.sh
├── update-snapshots.sh
├── clean.sh
└── seed-database.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A new engineer clones the repo and has no idea which script to run first. "Do I run &lt;code&gt;setup.sh&lt;/code&gt; or &lt;code&gt;start.sh&lt;/code&gt;? What's the difference between &lt;code&gt;start.sh&lt;/code&gt; and &lt;code&gt;start-docker.sh&lt;/code&gt;?"&lt;/p&gt;

&lt;p&gt;The solution is consolidation and documentation. Consider a single entry point script with subcommands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./run.sh setup      &lt;span class="c"&gt;# First-time setup&lt;/span&gt;
./run.sh start      &lt;span class="c"&gt;# Start the service&lt;/span&gt;
./run.sh &lt;span class="nb"&gt;test&lt;/span&gt;       &lt;span class="c"&gt;# Run unit tests&lt;/span&gt;
./run.sh &lt;span class="nb"&gt;test&lt;/span&gt;:int   &lt;span class="c"&gt;# Run integration tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use a Makefile, which is language-agnostic and self-documenting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nl"&gt;.PHONY&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;help setup start test&lt;/span&gt;

&lt;span class="nl"&gt;help&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Available commands:"&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  make setup    - First-time environment setup"&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  make start    - Start the service locally"&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  make test     - Run unit tests"&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"  make test-int - Run integration tests"&lt;/span&gt;

&lt;span class="nl"&gt;setup&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    ./scripts/setup.sh

&lt;span class="nl"&gt;start&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
    npm run dev

&lt;span class="nl"&gt;test&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    npm run &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;make help&lt;/code&gt; gives engineers a clear menu of options. No more guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Engineering onboarding isn't a one-time event. With average tenures shrinking and teams constantly evolving, it's a recurring challenge that deserves intentional investment.&lt;/p&gt;

&lt;p&gt;The practices outlined in this article aren't revolutionary. Setup scripts, automated formatting, comprehensive tests, and protected branches are all well-established ideas. The difference lies in treating them as a cohesive system rather than isolated improvements. Each piece reinforces the others. Scripts reduce setup friction. Tests enable confident changes. Automation shortens reviews. Together, they create an environment where a new engineer can go from &lt;code&gt;git clone&lt;/code&gt; to meaningful pull request in days rather than weeks.&lt;/p&gt;

&lt;p&gt;The upfront cost is real. Writing that first setup script takes time you could spend on features. Configuring pipeline templates isn't glamorous work. But every engineer who joins your team after that benefits. The investment amortises quickly, and the compound returns are substantial.&lt;/p&gt;

&lt;p&gt;Start where you are. If your team has none of these practices, don't try to implement everything at once. Pick one pain point. Maybe it's the two hours new joiners spend setting up their environment. Write a setup script. Maybe it's the endless formatting debates in code reviews. Add Prettier. Small improvements stack up.&lt;/p&gt;

&lt;p&gt;Your future team members will thank you. And honestly, so will your future self the next time you have to onboard someone new.&lt;/p&gt;

&lt;p&gt;This article was formatted and grammatically enhanced with AI.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>productivity</category>
      <category>programming</category>
      <category>leadership</category>
    </item>
    <item>
      <title>Your Integration Layer is Probably Over-Engineered (Let's Fix It with Camel DSL)</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Tue, 28 Oct 2025 15:01:17 +0000</pubDate>
      <link>https://dev.to/joojodontoh/your-integration-layer-is-probably-over-engineered-lets-fix-it-with-camel-dsl-35m2</link>
      <guid>https://dev.to/joojodontoh/your-integration-layer-is-probably-over-engineered-lets-fix-it-with-camel-dsl-35m2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hi guys, it's me again 😁. If you've ever wished you could describe complex integration flows in plain, readable language rather than wrestling with boilerplate code, you're going to love DSLs. A &lt;strong&gt;Domain-Specific Language (DSL)&lt;/strong&gt; is essentially a specialized mini-language designed for a particular task. Think of it as a shorthand that lets you express what you want to do without getting lost in the how. Easy peasy.&lt;/p&gt;

&lt;p&gt;Apache Camel embraces this philosophy wholeheartedly and offers developers multiple flavors of DSLs to choose from. Whether you're a fan of the &lt;strong&gt;Java DSL&lt;/strong&gt; with its fluent builder style, prefer the structured clarity of &lt;strong&gt;XML DSL&lt;/strong&gt; in Camel XML files, or lean toward &lt;strong&gt;Spring XML&lt;/strong&gt; for classic Spring configurations, there's literally something for everyone. You can define routes using &lt;strong&gt;YAML DSL&lt;/strong&gt; for a clean, human-readable format, build RESTful services with &lt;strong&gt;Rest DSL&lt;/strong&gt; (including contract-first approaches with OpenAPI specs), or even keep things annotation-based with the &lt;strong&gt;Annotation DSL&lt;/strong&gt; right in your Java beans.&lt;/p&gt;

&lt;p&gt;In this article, we're narrowing down on the &lt;strong&gt;YAML DSL&lt;/strong&gt;. It's a lightweight, intuitive way to define integration routes that feels more like writing a configuration file than coding. It follows the declarative way of engineering. We'll explore how to define routes, configure endpoints, and wire up beans, all while keeping our setup refreshingly simple. If you want to dive deeper into the technical details, the &lt;a href="https://camel.apache.org/components/4.14.x/others/yaml-dsl.html" rel="noopener noreferrer"&gt;official Camel YAML DSL documentation&lt;/a&gt; is a great resource. But for now, let's keep things practical and see what makes YAML DSL such a joy to work with.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Enterprise Integration Patterns?
&lt;/h2&gt;

&lt;p&gt;Before we dive into Camel DSL, let's talk about the "why" behind it all. Enterprise Integration Patterns (EIPs) are tried-and-true solutions to common problems that pop up when you're connecting different systems. These are the plumbing of most of the systems you use. Think of them as design patterns, but specifically for the messy world of enterprise integration where you're constantly moving data between APIs, databases, message queues, and legacy systems that were never meant to talk to each other.&lt;/p&gt;

&lt;p&gt;The concept was popularized by Gregor Hohpe and Bobby Woolf's seminal book, &lt;em&gt;Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions&lt;/em&gt;. Published in 2003, this book catalogued around 65 patterns that have since become the lingua franca for integration architects. Apache Camel was built from the ground up to implement these patterns, making them accessible through its DSL rather than forcing you to reinvent the wheel every time. &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbd15z08y9bcccxty20f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbd15z08y9bcccxty20f.png" alt="Enterprise Integration Patterns" width="800" height="1112"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are five of the most common EIPs you'll encounter:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Content-Based Router&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Routes messages to different destinations based on their content. For example, you might route orders to different processing queues based on their total amount—high-value orders go to manual review, while smaller ones get auto-processed. &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/ContentBasedRouter.html" rel="noopener noreferrer"&gt;Read more here&lt;/a&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysdzgp3xagt2jafrck00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fysdzgp3xagt2jafrck00.png" alt="Content-Based Router" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Message Filter&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Acts as a gatekeeper, allowing only messages that meet certain criteria to pass through. Think of it as a bouncer for your data—only messages with valid formats or specific properties get through. &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/Filter.html" rel="noopener noreferrer"&gt;Read more here&lt;/a&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9666fezpihc4xzo6cxib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9666fezpihc4xzo6cxib.png" alt="Message Filter" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Splitter&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Takes a single message containing multiple items (like a batch of orders) and breaks it into individual messages. This is handy when you need to process each item independently or send them to different endpoints. &lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/Sequencer.html" rel="noopener noreferrer"&gt;Read more here&lt;/a&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkn6eu0mhpowu4dwemei.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkn6eu0mhpowu4dwemei.png" alt="Splitter" width="800" height="266"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Aggregator&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The opposite of a splitter—it combines multiple related messages into a single cohesive message. Perfect for scenarios where you're collecting responses from multiple services before sending a unified reply.&lt;br&gt;
&lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/Aggregator.html" rel="noopener noreferrer"&gt;Read more here&lt;/a&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvq92rgg4qtto9j4jr246.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvq92rgg4qtto9j4jr246.png" alt="Aggregator" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Dead Letter Channel&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Your safety net for when things go wrong. Messages that fail processing get routed to a special "dead letter" queue where you can inspect them, fix issues, and potentially reprocess them later.&lt;br&gt;
&lt;a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/DeadLetterChannel.html" rel="noopener noreferrer"&gt;Read more here&lt;/a&gt; &lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8akr16ubim50jyrwka56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8akr16ubim50jyrwka56.png" alt="Dead Letter Channel" width="800" height="496"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can explore the full catalog of patterns in the &lt;a href="https://camel.apache.org/components/4.14.x/eips/enterprise-integration-patterns.html" rel="noopener noreferrer"&gt;Apache Camel EIP documentation&lt;/a&gt;, but these five alone will cover a huge portion of your integration needs.&lt;/p&gt;
&lt;h2&gt;
  
  
  What is Apache Camel DSL?
&lt;/h2&gt;
&lt;h3&gt;
  
  
  A Brief History
&lt;/h3&gt;

&lt;p&gt;Apache Camel didn't appear out of nowhere. It was born from a real need to implement those Enterprise Integration Patterns we just discussed. Here's how it evolved:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2003&lt;/strong&gt; marked the beginning when Gregor Hohpe and Bobby Woolf's &lt;em&gt;Enterprise Integration Patterns&lt;/em&gt; book was published, laying the groundwork for what would become Apache Camel's DNA.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;June 27, 2007&lt;/strong&gt; saw Camel's initial release. From day one, the DSL was baked into its core design philosophy. The goal was simple: make it ridiculously easy to describe integration flows without drowning in boilerplate code.&lt;/p&gt;

&lt;p&gt;In those &lt;strong&gt;early years&lt;/strong&gt;, Camel established itself as the go-to framework for implementing EIPs in Java. The DSL wasn't just an afterthought, it was &lt;em&gt;the&lt;/em&gt; way you worked with Camel, turning complex integration logic into readable, maintainable code.&lt;/p&gt;

&lt;p&gt;As adoption grew, so did the DSL. It &lt;strong&gt;evolved beyond Java&lt;/strong&gt;, expanding to support XML, YAML, Groovy, and other syntaxes. This gave developers the freedom to pick the language that fit their team's style and existing infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2014 (Camel 2.14)&lt;/strong&gt; brought a game-changer: extensions to the routing DSL specifically for REST endpoints. Suddenly, building RESTful integrations became as straightforward as defining any other route.&lt;/p&gt;

&lt;p&gt;Today, &lt;strong&gt;ongoing development&lt;/strong&gt; keeps the DSL at the forefront of Camel's priorities. The community continuously refines it, making routes easier to write, test, and maintain. New features and improvements roll out regularly, keeping pace with modern integration challenges.&lt;/p&gt;
&lt;h2&gt;
  
  
  How Does Camel YAML DSL Work?
&lt;/h2&gt;
&lt;h3&gt;
  
  
  From YAML to Running Integration
&lt;/h3&gt;

&lt;p&gt;Here's where things get really interesting. You write a simple YAML file declaring your integration patterns, and somehow it becomes a running Java application. But here's the key distinction: &lt;strong&gt;this isn't compilation—it's interpretation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you run a Camel YAML route, you're not generating Java source code, compiling it, and then executing bytecode. Instead, Camel reads your YAML file at runtime, parses it, and dynamically constructs the integration flow in memory. Think of it like the difference between translating a book (compilation) versus having a real-time interpreter translate as you speak (interpretation). The YAML DSL is interpreted into Camel's internal routing model on the fly.&lt;/p&gt;
&lt;h3&gt;
  
  
  Enter JBang
&lt;/h3&gt;

&lt;p&gt;So how do we actually run these YAML files? Meet &lt;strong&gt;JBang&lt;/strong&gt;—a tool that lets you run Java applications with zero ceremony. No project setup, no build files, no IDE required. Just a simple command and you're off to the races.&lt;/p&gt;

&lt;p&gt;JBang is essentially a launcher and script runner for Java. It can download dependencies, manage classpaths, and execute Java code—all from a single command. For Camel, this means you can run integration routes as easily as running a Python script.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Interpretation Process
&lt;/h3&gt;

&lt;p&gt;When you execute &lt;code&gt;jbang camel@apache/camel run route.yaml&lt;/code&gt;, here's what happens under the hood:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JBang Downloads &amp;amp; Starts&lt;/strong&gt;: JBang fetches the Camel catalog if it's not already cached and initializes the runtime environment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;File Detection&lt;/strong&gt;: JBang detects that you've provided a YAML file and determines which parser to use.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;YAML Parsing&lt;/strong&gt;: The YAML content is parsed into a structured format that can be processed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Convert to Camel Model&lt;/strong&gt;: The parsed YAML is transformed into Camel's internal route model—essentially Java objects that represent your integration flow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Component Resolution&lt;/strong&gt;: Camel identifies which components you're using (HTTP, Kafka, databases, etc.) and loads them if needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Endpoint Creation&lt;/strong&gt;: Based on your route definitions, Camel creates endpoint instances that represent the actual connections to external systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Processor Pipeline Creation&lt;/strong&gt;: Any transformations, filters, or logic in your route are assembled into a processing pipeline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Consumer Creation&lt;/strong&gt;: Camel sets up consumers that will trigger your route (like an HTTP listener or a file watcher).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Route Registration &amp;amp; Start&lt;/strong&gt;: Finally, the complete route is registered with Camel's context and started, ready to process messages.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this happens in seconds, and you're left with a fully functional integration running in a JVM.&lt;/p&gt;
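
&lt;p&gt;To make those steps concrete, here is roughly the smallest YAML file they can chew through: a timer that fires every five seconds and logs a message. Save it as, say, &lt;code&gt;hello.yaml&lt;/code&gt; and run it with the same &lt;code&gt;jbang camel@apache/camel run&lt;/code&gt; command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- route:
    id: hello-route
    from:
      # Consumer creation: a timer triggers the route every 5 seconds
      uri: "timer:tick"
      parameters:
        period: "5000"
      steps:
        - setBody:
            constant: "Hello from Camel"
        - to: "log:hello"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;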

&lt;p&gt;For those curious about the inner workings, the &lt;a href="https://github.com/apache/camel" rel="noopener noreferrer"&gt;Apache Camel GitHub repository&lt;/a&gt; is a treasure trove of information.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hands-On: Building a Weather Service
&lt;/h3&gt;

&lt;p&gt;Let's get practical. We'll build a simple REST service that fetches weather information for a given country. This example will demonstrate route definition, endpoint definition, and how Camel handles external API calls.&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 1: Install JBang
&lt;/h4&gt;

&lt;p&gt;First, install JBang. On macOS or Linux:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-Ls&lt;/span&gt; https://sh.jbang.dev | bash &lt;span class="nt"&gt;-s&lt;/span&gt; - app setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On Windows (using PowerShell):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;amp; { &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="n"&gt;iwr&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-useb&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://ps.jbang.dev&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; } app setup"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jbang version
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtopj24aa40xbi8wfgev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtopj24aa40xbi8wfgev.png" alt="jbang version" width="562" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2: Create Your Route
&lt;/h4&gt;

&lt;p&gt;Create a file called &lt;code&gt;weather-service.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;weather-api-route&lt;/span&gt;
    &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform-http:/weather"&lt;/span&gt;
      &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;httpMethodRestrict&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GET"&lt;/span&gt;
      &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setProperty&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;country&lt;/span&gt;
            &lt;span class="na"&gt;simple&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${header.country}"&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;choice&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;when&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;simple&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${header.country}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;==&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;null"&lt;/span&gt;
                &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setHeader&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Content-Type&lt;/span&gt;
                      &lt;span class="na"&gt;constant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;application/json&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setBody&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="na"&gt;constant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{"error":&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;"Country&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;parameter&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;required"}'&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setHeader&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CamelHttpResponseCode&lt;/span&gt;
                      &lt;span class="na"&gt;constant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;400"&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;stop&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;removeHeaders&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;toD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://wttr.in/${exchangeProperty.country}"&lt;/span&gt;
            &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;j1"&lt;/span&gt;
              &lt;span class="na"&gt;bridgeEndpoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;unmarshal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;json&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;library&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Jackson&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setBody&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;simple&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;weather&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;${exchangeProperty.country}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;${body[current_condition][0][temp_C]}&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;degrees&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Celsius&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;${body[current_condition][0][weatherDesc][0][value]}"&lt;/span&gt;

        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;setHeader&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Content-Type&lt;/span&gt;
            &lt;span class="na"&gt;constant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;text/plain&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let me break down what's happening here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Route Definition&lt;/strong&gt;: We define a route with ID &lt;code&gt;weather-api-route&lt;/code&gt; that starts from an HTTP endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Endpoint Definition&lt;/strong&gt;: The &lt;code&gt;platform-http:/weather&lt;/code&gt; endpoint creates an HTTP server listening on the &lt;code&gt;/weather&lt;/code&gt; path, restricted to GET requests only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing Steps&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We extract the &lt;code&gt;country&lt;/code&gt; parameter from the request header and store it as an exchange property&lt;/li&gt;
&lt;li&gt;We validate that the country parameter exists, returning a 400 error if it doesn't&lt;/li&gt;
&lt;li&gt;We remove all incoming HTTP headers with &lt;code&gt;removeHeaders&lt;/code&gt; to prevent them from interfering with our external API call&lt;/li&gt;
&lt;li&gt;We use &lt;code&gt;toD&lt;/code&gt; (dynamic to) to call the wttr.in weather API with the country variable. The &lt;code&gt;D&lt;/code&gt; in &lt;code&gt;toD&lt;/code&gt; means it evaluates expressions at runtime&lt;/li&gt;
&lt;li&gt;We set &lt;code&gt;bridgeEndpoint: "true"&lt;/code&gt; to properly bridge the HTTP connection between our endpoint and the external API&lt;/li&gt;
&lt;li&gt;We unmarshal the JSON response using Jackson&lt;/li&gt;
&lt;li&gt;We transform it into a readable sentence using Camel's Simple language&lt;/li&gt;
&lt;li&gt;We set the response content type to plain text&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important Notes&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use &lt;code&gt;toD&lt;/code&gt; instead of &lt;code&gt;to&lt;/code&gt; because we need to evaluate the &lt;code&gt;${exchangeProperty.country}&lt;/code&gt; expression dynamically at runtime&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;removeHeaders&lt;/code&gt; step is crucial: without it, headers from the incoming request would be passed to the external API, causing errors&lt;/li&gt;
&lt;li&gt;In YAML DSL, query parameters must be in the &lt;code&gt;parameters&lt;/code&gt; section, not embedded in the URI&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
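
&lt;p&gt;The difference between &lt;code&gt;to&lt;/code&gt; and &lt;code&gt;toD&lt;/code&gt; in fragment form (these are route steps, not a complete route):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Static endpoint: the URI is fixed when the route is created
- to:
    uri: "https://wttr.in/London"

# Dynamic endpoint: the expression is re-evaluated for every message
- toD:
    uri: "https://wttr.in/${exchangeProperty.country}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;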

&lt;h4&gt;
  
  
  Step 3: Run Your Route
&lt;/h4&gt;

&lt;p&gt;Execute your route with JBang:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;jbang camel@apache/camel run weather-service.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see Camel start up, and it will tell you which port it's listening on (typically 8080). The first startup might take a bit longer as JBang downloads dependencies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjj2of2dd7p1jyo644vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjj2of2dd7p1jyo644vs.png" alt="jbang run" width="800" height="136"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4: Test It Out
&lt;/h4&gt;

&lt;p&gt;Open another terminal and try it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:8080/weather"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"country: London"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a response like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The weather in London is 15 degrees Celsius with Partly cloudy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqqlntoa78elybqee7eu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffqqlntoa78elybqee7eu.png" alt="Result" width="800" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try different cities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:8080/weather"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"country: Tokyo"&lt;/span&gt;
curl &lt;span class="s2"&gt;"http://localhost:8080/weather"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"country: NewYork"&lt;/span&gt;
curl &lt;span class="s2"&gt;"http://localhost:8080/weather"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"country: Paris"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test the error handling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:8080/weather"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This should return:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Country parameter is required"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkckyxh98endp66j4a3qh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkckyxh98endp66j4a3qh.png" alt="Error" width="800" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note on Response Times&lt;/strong&gt;: You might notice requests take 5-15 seconds to complete. This isn't your Camel route being slow—it's waiting on wttr.in, a free weather service that aggregates data in real-time. If you test the API directly (&lt;code&gt;curl "https://wttr.in/London?format=j1"&lt;/code&gt;), you'll see it takes about the same time. Your Camel route itself is fast; it's just waiting on the external dependency. In production, you'd typically use a faster commercial weather API or implement caching to improve response times.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 5: Visualize with Karavan
&lt;/h4&gt;

&lt;p&gt;Want to see your route visually? Install the &lt;strong&gt;Karavan&lt;/strong&gt; extension in VS Code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code&lt;/li&gt;
&lt;li&gt;Go to Extensions (Ctrl+Shift+X or Cmd+Shift+X on Mac)&lt;/li&gt;
&lt;li&gt;Search for "Karavan"&lt;/li&gt;
&lt;li&gt;Install the "Karavan" extension by Apache Software Foundation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once installed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Right-click on your &lt;code&gt;weather-service.yaml&lt;/code&gt; file in VS Code&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;"Karavan: Open"&lt;/strong&gt; from the context menu&lt;/li&gt;
&lt;li&gt;You'll see a visual representation of your route with all the steps laid out graphically&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Karavan designer shows your route as a flowchart—from the HTTP endpoint through validation, header removal, the external API call, JSON unmarshalling, transformation, and finally the response. It's incredibly helpful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understanding complex routes at a glance&lt;/li&gt;
&lt;li&gt;Debugging integration flows&lt;/li&gt;
&lt;li&gt;Communicating integration logic to non-technical stakeholders&lt;/li&gt;
&lt;li&gt;Editing routes visually instead of writing YAML by hand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can even use Karavan to build routes from scratch by dragging and dropping components, and it will generate the YAML for you!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3yka39dwwsv00vd1xih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh3yka39dwwsv00vd1xih.png" alt="Karavan" width="800" height="738"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Use Camel YAML DSL? Real-World Advantages
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Rapid Project Spin-Up with Proven Patterns
&lt;/h3&gt;

&lt;p&gt;Camel YAML DSL gives you instant access to 65+ pre-built Enterprise Integration Patterns. Need a content-based router, message filter, or splitter? It's just a few lines of YAML. What typically takes weeks to build from scratch—complete with error handling and retry logic—becomes an afternoon of configuration. You're describing what you want, and Camel handles the how.&lt;/p&gt;
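
&lt;p&gt;For instance, a content-based router really is only a few lines. The &lt;code&gt;region&lt;/code&gt; header and endpoint names here are illustrative assumptions:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;- route:
    id: region-router
    from:
      uri: "direct:incoming"
      steps:
        # Route each message based on its content (the region header)
        - choice:
            when:
              - simple: "${header.region} == 'EU'"
                steps:
                  - to: "direct:eu-orders"
            otherwise:
              steps:
                - to: "direct:global-orders"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;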

&lt;h3&gt;
  
  
  MVPs and Proof of Concepts at Lightning Speed
&lt;/h3&gt;

&lt;p&gt;For startups and innovation teams, speed matters. Build a functioning integration layer in hours, not weeks. Connect to Stripe, send Twilio notifications, sync to your CRM—all achievable in a single afternoon. When business requirements change (and they will), your routes change with them. No massive refactoring, just edit YAML and redeploy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Democratizing Integration Development
&lt;/h3&gt;

&lt;p&gt;You don't need to be a Java expert to build integrations anymore. JBang and YAML DSL lower the barrier dramatically. Business analysts, product managers, or junior engineers with basic YAML knowledge can create and modify routes. This shifts senior engineers from implementing routine integrations to building reusable components and solving genuinely complex problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Centralizing Integration Logic Across Teams
&lt;/h3&gt;

&lt;p&gt;Companies struggle with integration sprawl: every team building its own connectors and reinventing the same patterns. Camel DSL solves this. A platform team maintains common components and patterns, while product teams compose them as needed. Want all APIs to use exponential backoff? Configure it once. Need centralized logging? Build it into your base template. One update benefits every route. This is a personal favorite of mine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building Integration Platforms on Kubernetes
&lt;/h3&gt;

&lt;p&gt;Here's where Camel truly shines at scale. Companies build entire integration platforms using this pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Setup&lt;/strong&gt;: Multiple Kubernetes clusters divided into namespaces (workspaces), each owned by a team. Each integration runs as a pod within these namespaces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Smart Routing&lt;/strong&gt;: Route traffic between namespaces or clusters using Kubernetes networking. A payment request flows through payment → order → notification namespaces seamlessly. Add content-based routing to direct European orders to EU clusters for data residency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security&lt;/strong&gt;: Kubernetes secrets hold API keys, passwords, and tokens. Each pod only accesses its required secrets—payment pods can't see shipping credentials, notification pods can't access payment gateways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team Autonomy&lt;/strong&gt;: Teams deploy their integrations independently. They write YAML routes, push to Git, and CI/CD handles the rest. Teams set their own resource limits, autoscaling rules, and health checks while following platform-level guardrails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Payoff&lt;/strong&gt;: Self-service integration platform. Fast team velocity. Enforced security and governance. Kubernetes handles scaling. Clean, readable YAML that stakeholders understand. Enterprises already use this pattern to manage thousands of integrations across distributed teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Other Side: When DSLs Fall Short
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High Abstraction Hides Complexity
&lt;/h3&gt;

&lt;p&gt;YAML DSL sits at a very high level of abstraction. You declare &lt;em&gt;what&lt;/em&gt; you want, and Camel figures out &lt;em&gt;how&lt;/em&gt; to make it happen. This is powerful—until something breaks or behaves unexpectedly.&lt;/p&gt;

&lt;p&gt;Debugging becomes detective work. Your YAML looks correct, but the route isn't working. Is it a timing issue? A header not being set? A component default you didn't know about? Suddenly you need to understand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How Camel interprets and executes routes&lt;/li&gt;
&lt;li&gt;Underlying component implementations&lt;/li&gt;
&lt;li&gt;Message exchange patterns and contexts&lt;/li&gt;
&lt;li&gt;How properties and headers flow through pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The abstraction that made development fast now makes troubleshooting slow. You end up diving into Camel documentation, examining verbose logs, and sometimes reading Java source code to understand what's really happening under your simple YAML declarations.&lt;/p&gt;

&lt;h3&gt;
  
  
  It's Java All the Way Down
&lt;/h3&gt;

&lt;p&gt;Here's the uncomfortable truth: Camel is a Java framework. When things get complex or custom, you're writing Java code.&lt;/p&gt;

&lt;p&gt;Need a transformation more complex than Simple language supports? Write Java. Want a custom component or specialized error handling? Java. Need to debug a route deadlocking under load? Better understand Java concurrency, exception handling, and threading models.&lt;/p&gt;

&lt;p&gt;This creates a skills gap. You might have team members comfortable with YAML who hit a wall when they need to drop into Java. Or you have Java experts who view the DSL as unnecessary abstraction. Either way, the promise of "just write YAML" only holds for straightforward integrations. Anything non-trivial requires Java expertise.&lt;/p&gt;

&lt;h3&gt;
  
  
  Organizational Startup Costs
&lt;/h3&gt;

&lt;p&gt;Building an integration platform around Camel DSL isn't trivial. There's real investment required:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Learning Curve&lt;/strong&gt;: Teams need to learn YAML syntax, Camel concepts, EIP patterns, component configurations, and debugging techniques. This takes time—expect weeks or months before teams are productive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure Setup&lt;/strong&gt;: You need CI/CD pipelines, container registries, Kubernetes clusters (if going that route), monitoring systems, and logging infrastructure. Setting this up properly takes effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Governance and Standards&lt;/strong&gt;: Someone needs to establish coding standards, route templates, security policies, and deployment procedures. Without this, you'll end up with inconsistent routes that are hard to maintain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cultural Change&lt;/strong&gt;: Non-engineers writing integrations sounds great in theory, but requires buy-in. Engineers might resist sharing control. Business analysts might not want the responsibility. Establishing this new way of working takes organizational effort.&lt;/p&gt;

&lt;p&gt;The payoff is significant, but don't underestimate the upfront investment needed to make it work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuration Has Its Limits
&lt;/h3&gt;

&lt;p&gt;YAML DSL is fantastic for standard patterns, but sometimes you need something unique. Maybe you're integrating with a legacy system that has quirky authentication. Or you need custom data transformation logic that's too complex for Simple language. Perhaps you're dealing with binary protocols that don't fit Camel's HTTP-centric model.&lt;/p&gt;

&lt;p&gt;When configuration isn't enough, you have to write custom Java components or processors. Now you're maintaining both YAML routes &lt;em&gt;and&lt;/em&gt; Java code, which defeats some of the simplicity. You also create a two-tier system: simple integrations anyone can handle, and complex ones that require Java developers.&lt;/p&gt;

&lt;p&gt;The line between "configurable" and "needs code" isn't always clear until you're deep into implementation. What starts as "just a simple route" can evolve into something requiring custom Java, at which point you might wonder if starting with code would have been cleaner.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Apache Camel YAML DSL represents a pragmatic approach to enterprise integration. It takes decades of integration patterns, wraps them in readable YAML, and makes them accessible to a broader audience than traditional Java frameworks ever could.&lt;/p&gt;

&lt;p&gt;Is it perfect? No. You're trading explicit control for declarative simplicity, and that tradeoff won't suit every team or every project. But for organizations drowning in integration complexity, or startups needing to move fast, or platform teams trying to democratize development—it's a compelling option.&lt;/p&gt;

&lt;p&gt;The key is knowing when to reach for it. Building a weather API demo? Perfect. Connecting your e-commerce platform to payment gateways, inventory systems, and shipping providers? Absolutely. Implementing something so custom that you're fighting the framework every step of the way? Maybe reconsider.&lt;/p&gt;

&lt;p&gt;Start small. Spin up JBang, write a simple route, see how it feels. You might find that what once took your team weeks now takes hours, and that's worth paying attention to. The patterns are proven, the tooling is mature, and the community is active. Give it a shot: your future self (and your integration backlog) might thank you. See you in the next one!&lt;/p&gt;

</description>
      <category>camel</category>
      <category>apache</category>
      <category>java</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Skip the Database: Building Analytics Dashboards Directly from S3 Files</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Sat, 27 Sep 2025 07:57:42 +0000</pubDate>
      <link>https://dev.to/joojodontoh/skip-the-database-building-analytics-dashboards-directly-from-s3-files-34an</link>
      <guid>https://dev.to/joojodontoh/skip-the-database-building-analytics-dashboards-directly-from-s3-files-34an</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Hello guys, my name is Jo and welcome back to my engineering blog!&lt;br&gt;
I'm gonna talk about something cool today, let's go! 😎&lt;br&gt;
In today's data-driven landscape, organizations collect vast amounts of information from many sources: structured databases, APIs, IoT sensors, and application logs. Much of this data lands in formats like CSV and JSON files that need to be stored, processed, and analyzed efficiently. While Amazon S3 serves as an excellent scalable storage solution for these files, it's a significant engineering challenge 🤕 to extract meaningful insights from data sitting in S3 buckets, especially when stakeholders need regular access to visualizations and reports without the overhead of complex data pipeline management. &lt;br&gt;
Traditional approaches often involve provisioning expensive database instances, writing ETL jobs to move data around, and maintaining multiple data stores just so business intelligence tools like Power BI can access the information they need. In this article, I'll walk you through a cost-effective, serverless architecture that uses AWS Glue for data cataloging, Amazon Athena for SQL querying, and Power BI for visualization. The result is a streamlined solution that keeps S3 as your single source of truth while providing the SQL interface Power BI expects, all without the complexity and cost of traditional database provisioning. Let's dive into it!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;While Amazon S3 serves as an excellent scalable storage solution for these files, it's a significant engineering challenge 🤕 to extract meaningful insights from data sitting in S3 buckets&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;
  
  
  Situation
&lt;/h1&gt;

&lt;p&gt;It's critical to understand that the solution presented in this article is designed for a specific set of circumstances and constraints. While it may apply to many similar use cases, &lt;strong&gt;every engineering decision should start with clearly defined functional and non-functional requirements that match your particular context&lt;/strong&gt;. In our scenario, CSV files are deposited into an S3 bucket at regular intervals, creating a steady stream of data that needs to be accessible for analysis and reporting. The files come pre-structured with a consistent, well-defined schema, and the data has already been cleaned and sanitized upstream, so we don't need to handle data quality issues, type conversions, or missing fields within our solution. Most importantly, the data represents a single business entity without complex relationships or foreign key dependencies that would require maintaining referential integrity across multiple tables, allowing us to treat each file as a self-contained dataset that can be processed and queried independently.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CSV Files (Regular Intervals) → S3 Bucket Storage
     ↓
Well-structured + Pre-sanitized Data
     ↓
Single Entity (No Relational Dependencies)
     ↓
Ready for Processing (SQL)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Problem/Requirements
&lt;/h1&gt;

&lt;p&gt;The core challenge lies in making the data dumped into S3 buckets accessible and actionable for stakeholders who need regular visibility into operational insights through dashboards and reports. Currently, the team handles similar data requirements by loading CSV contents into traditional SQL databases, which then serve as the data source for Power BI through standard database connections. This solution exists primarily because Power BI has robust, well-established SQL connectivity and most teams are familiar with this approach. However, this conventional method introduces several significant problems that compound over time and create unnecessary operational overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Issues with the Current Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Infrastructure Costs&lt;/strong&gt; - Provisioning and maintaining database instances (RDS, SQL Server, etc.) for what is essentially file storage creates ongoing monthly expenses that scale with data volume and performance requirements&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Unnecessary Complexity&lt;/strong&gt; - Data gets written into SQL tables purely for Power BI access, not because relational database features like ACID transactions, foreign keys, or complex joins are actually needed for this use case&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Access Management Overhead&lt;/strong&gt; - Every new database introduces another set of credentials, connection strings, and security policies that need to be managed, rotated, and monitored&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Increased System Dependencies&lt;/strong&gt; - Adding database services means more moving parts that can fail, require updates, need backup strategies, and demand monitoring - each new service multiplies your potential points of failure&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Refresh Constraints&lt;/strong&gt; - Power BI's periodic refresh cycles from SQL databases create timing dependencies where dashboard data freshness becomes a configuration trade-off rather than being driven by actual data availability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Fragmentation&lt;/strong&gt; - Managing multiple data types across different storage solutions (files in S3, structured data in databases) creates inconsistencies in backup strategies, access patterns, and operational procedures&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Solution
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The Proposed Architecture
&lt;/h2&gt;

&lt;p&gt;The solution replaces the traditional database-centric approach with a serverless, event-driven architecture that treats S3 as both storage and the single source of truth for all data operations. This approach maintains data in its original CSV format while providing the SQL query interface that Power BI requires, eliminating the need for intermediate database layers entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core Requirements
&lt;/h3&gt;

&lt;p&gt;Our replacement solution addresses four fundamental requirements:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Single Source of Truth&lt;/strong&gt; - All data remains in S3 in CSV format, eliminating data duplication and synchronization issues&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligent Cataloging&lt;/strong&gt; - Files are automatically cataloged with schema information and logical partitioning, similar to database table structures but without the database overhead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Event-Driven Processing&lt;/strong&gt; - New file arrivals trigger automatic catalog updates, ensuring data is immediately available for querying&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SQL Query Layer&lt;/strong&gt; - Athena provides the SQL interface that Power BI expects, querying directly against cataloged S3 data&lt;/li&gt;
&lt;/ol&gt;
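&lt;p&gt;To make the cataloging requirement concrete, here's a minimal sketch of the partition metadata an event-driven update would register. The database, table, and bucket names are made up, and the object mirrors the shape accepted by &lt;code&gt;CreatePartitionCommand&lt;/code&gt; in &lt;code&gt;@aws-sdk/client-glue&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch: build the Glue CreatePartition input for a newly arrived file.
// Names (analytics_db, sales_data, bucket) are hypothetical; the object
// below has the shape expected by CreatePartitionCommand in
// @aws-sdk/client-glue.

interface PartitionValues {
  year: string;
  month: string;
  day: string;
}

function buildPartitionInput(
  database: string,
  table: string,
  bucket: string,
  p: PartitionValues
) {
  const location = `s3://${bucket}/${table}/year=${p.year}/month=${p.month}/day=${p.day}/`;
  return {
    DatabaseName: database,
    TableName: table,
    PartitionInput: {
      Values: [p.year, p.month, p.day], // must match the table's partition key order
      StorageDescriptor: {
        Location: location,
        InputFormat: "org.apache.hadoop.mapred.TextInputFormat",
        OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
        SerdeInfo: {
          SerializationLibrary: "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
          Parameters: { "field.delim": "," },
        },
      },
    },
  };
}

// Example: a file for 2024-03-15 lands and its partition gets registered.
const input = buildPartitionInput("analytics_db", "sales_data", "my-company-analytics", {
  year: "2024", month: "03", day: "15",
});
console.log(input.PartitionInput.StorageDescriptor.Location);
// s3://my-company-analytics/sales_data/year=2024/month=03/day=15/
```

&lt;p&gt;Building the input as a pure function keeps the Lambda easy to unit test without mocking any AWS calls.&lt;/p&gt;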

&lt;h2&gt;
  
  
  Architecture Components
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   CSV File  │───▶│  S3 Bucket  │───▶│ S3 Event    │───▶│ SQS Queue   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                   │
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────┴───┐
│  Power BI   │◀───│   Athena    │◀───│ Glue Catalog│◀───│   Lambda    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tool Breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Role in Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;S3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Primary data storage&lt;/td&gt;
&lt;td&gt;Houses original CSV files and partitioned data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Event queue&lt;/td&gt;
&lt;td&gt;Buffers S3 file creation events for processing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Event processor&lt;/td&gt;
&lt;td&gt;Reads files, creates partitions, updates Glue catalog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Glue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data catalog&lt;/td&gt;
&lt;td&gt;Maintains schema and partition metadata for Athena&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Athena&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Query engine&lt;/td&gt;
&lt;td&gt;Provides SQL interface for Power BI connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdod7bp04lc2rp186r32z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdod7bp04lc2rp186r32z.png" alt=" " width="800" height="255"&gt;&lt;/a&gt;&lt;/p&gt;
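&lt;p&gt;The S3 → SQS leg of the diagram is just bucket notification configuration. Here's a minimal sketch, assuming SDK v3: the queue ARN is a placeholder, and the object has the shape accepted by &lt;code&gt;PutBucketNotificationConfigurationCommand&lt;/code&gt; in &lt;code&gt;@aws-sdk/client-s3&lt;/code&gt;:&lt;/p&gt;

```typescript
// Sketch: S3 event notification that pushes ObjectCreated events for CSV
// files into SQS. The queue ARN is hypothetical; pass this object as
// NotificationConfiguration to PutBucketNotificationConfigurationCommand
// (@aws-sdk/client-s3).

function buildCsvNotification(queueArn: string) {
  return {
    QueueConfigurations: [
      {
        QueueArn: queueArn,
        Events: ["s3:ObjectCreated:*"], // fire on every newly created object
        Filter: {
          Key: {
            FilterRules: [{ Name: "suffix", Value: ".csv" }], // CSV files only
          },
        },
      },
    ],
  };
}

const notification = buildCsvNotification(
  "arn:aws:sqs:us-east-1:123456789012:file_processing_queue"
);
console.log(notification.QueueConfigurations[0].Events[0]); // s3:ObjectCreated:*
```

&lt;p&gt;The suffix filter matters: without it, Athena query results or other objects written back to the bucket could retrigger the pipeline.&lt;/p&gt;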

&lt;h2&gt;
  
  
  Why This Solution Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Cost Efficiency
&lt;/h3&gt;

&lt;p&gt;Athena operates on a pay-per-query model, charging approximately &lt;strong&gt;$5 per TB of data scanned&lt;/strong&gt; (&lt;a href="https://aws.amazon.com/athena/pricing/" rel="noopener noreferrer"&gt;AWS Athena Pricing&lt;/a&gt;), which is dramatically cheaper than maintaining always-on database instances that can cost hundreds of dollars monthly regardless of usage patterns.&lt;/p&gt;
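&lt;p&gt;A quick back-of-the-envelope sketch shows why this matters (the scan volumes below are illustrative, not measured):&lt;/p&gt;

```typescript
// Rough Athena cost model: you pay only for data scanned, ~$5 per TB.
// Scan volumes here are illustrative examples, not benchmarks.

const PRICE_PER_TB_USD = 5;

function monthlyAthenaCost(gbScannedPerDay: number, daysPerMonth = 30): number {
  const tbPerMonth = (gbScannedPerDay * daysPerMonth) / 1024;
  return tbPerMonth * PRICE_PER_TB_USD;
}

// e.g. dashboards scanning 10 GB of CSV per day:
console.log(monthlyAthenaCost(10).toFixed(2)); // 1.46 (USD per month)
```

&lt;p&gt;Even at 10 GB scanned per day you're looking at under $2 a month, and partition pruning (covered below) pushes the scanned bytes down further.&lt;/p&gt;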

&lt;h3&gt;
  
  
  Operational Simplicity
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single Data Format&lt;/strong&gt; - Everything stays in CSV, eliminating format conversion complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Access Control&lt;/strong&gt; - S3 IAM policies manage all data access, no separate database credentials&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Minimal Dependencies&lt;/strong&gt; - The entire pipeline uses managed AWS services with no servers to maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct Integration&lt;/strong&gt; - Athena provides native Power BI connectivity through ODBC/JDBC drivers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query Caching&lt;/strong&gt; - Athena automatically caches results for repeated queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partition Pruning&lt;/strong&gt; - Only scans relevant data partitions, dramatically reducing query costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Columnar Optimization&lt;/strong&gt; - Can easily migrate to Parquet format later for even better performance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Trade-offs and Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Performance Limitations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Traditional DB: Sub-second response for cached data&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-15'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Athena: 2-10 second response depending on data size&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'2024'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="k"&gt;day&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'15'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Athena queries have higher latency than traditional databases since they scan files rather than using pre-built indexes, but this trade-off is acceptable when query frequency is moderate and cost savings are significant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initial Setup Complexity
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Power BI + Athena Integration Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Driver Installation&lt;/strong&gt; - ODBC/JDBC drivers on dashboard servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Configuration&lt;/strong&gt; - VPC endpoints and security group rules for federated AWS accounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication Setup&lt;/strong&gt; - IAM roles and cross-account access policies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway Configuration&lt;/strong&gt; - On-premises data gateway driver installation
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example driver setup&lt;/span&gt;
wget https://s3.amazonaws.com/athena-downloads/drivers/ODBC/SimbaAthenaODBC-1.1.17.1000-Linux64.zip
&lt;span class="c"&gt;# Configure connection string in Power BI&lt;/span&gt;
&lt;span class="c"&gt;# Server: athena.us-east-1.amazonaws.com&lt;/span&gt;
&lt;span class="c"&gt;# Database: your_glue_database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the initial connectivity setup between Power BI and Athena can be complex, especially in enterprise environments with federated AWS accounts, these are one-time configuration challenges that become valuable learning experiences for teams expanding their cloud-native analytics capabilities.&lt;/p&gt;

&lt;h1&gt;
  
  
  Suggested Implementation
&lt;/h1&gt;

&lt;p&gt;This implementation guide uses TypeScript, but the concepts translate to any language. The key is building a maintainable, extensible solution that follows the &lt;strong&gt;Open/Closed Principle&lt;/strong&gt; - open for extension, closed for modification.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Core Schema Design (Critical Prerequisite)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Schema Analysis and Configuration
&lt;/h3&gt;

&lt;p&gt;Start by examining your S3 files to understand the data structure and extract partition information from filenames:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Example filename: sales_data_2024_03_15.csv&lt;/span&gt;
&lt;span class="c1"&gt;// Partition extraction: year=2024, month=03, day=15&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sales_data_2024_03_15.csv&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;partitionPattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/sales_data_&lt;/span&gt;&lt;span class="se"&gt;(\d{4})&lt;/span&gt;&lt;span class="sr"&gt;_&lt;/span&gt;&lt;span class="se"&gt;(\d{2})&lt;/span&gt;&lt;span class="sr"&gt;_&lt;/span&gt;&lt;span class="se"&gt;(\d{2})\.&lt;/span&gt;&lt;span class="sr"&gt;csv/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[,&lt;/span&gt; &lt;span class="nx"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;day&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;partitionPattern&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
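&lt;p&gt;Those extracted values map directly onto the Hive-style key layout that Glue and Athena use for partition pruning. A small illustrative helper (plus a guard for filenames that don't match the pattern):&lt;/p&gt;

```typescript
// Sketch: turn extracted partition values into the Hive-style S3 prefix
// that Glue/Athena partitioning expects (year=YYYY/month=MM/day=DD/).

function toPartitionPrefix(table: string, year: string, month: string, day: string): string {
  return `${table}/year=${year}/month=${month}/day=${day}/`;
}

// Guard against filenames that don't match the expected pattern.
const filename = "sales_data_2024_03_15.csv";
const match = filename.match(/sales_data_(\d{4})_(\d{2})_(\d{2})\.csv/);
if (!match) throw new Error(`unexpected filename: ${filename}`);

const [, year, month, day] = match;
console.log(toPartitionPrefix("sales_data", year, month, day));
// sales_data/year=2024/month=03/day=15/
```

&lt;p&gt;Writing objects under these prefixes means a query filtered on &lt;code&gt;year&lt;/code&gt;, &lt;code&gt;month&lt;/code&gt;, and &lt;code&gt;day&lt;/code&gt; only scans the matching folder instead of the whole bucket.&lt;/p&gt;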



&lt;h3&gt;
  
  
  Shared Configuration Structure
&lt;/h3&gt;

&lt;p&gt;Create a centralized configuration that serves both your code and Infrastructure as Code (IAC):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;shared-config.json&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"aws"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"region"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"account_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${AWS_ACCOUNT_ID}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"source_bucket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-company-data-source"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"analytics_bucket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-company-analytics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"glue_database"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"analytics_db"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"workgroup_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"analytics_workgroup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sqs_queue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"file_processing_queue"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sales"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"table_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"filename_pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sales_data_(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{4})_(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{2})_(&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;d{2})&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;.csv"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"partition_keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"year"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"month"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"day"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"transaction_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"amount"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"decimal(10,2)"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"transaction_date"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"date"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Infrastructure as Code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Account-Agnostic Infrastructure
&lt;/h3&gt;

&lt;p&gt;Structure your IAC to work across any AWS account with minimal changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# terraform/main.tf or cloudformation template&lt;/span&gt;
&lt;span class="c1"&gt;# All names reference shared config&lt;/span&gt;

&lt;span class="s"&gt;resource "aws_s3_bucket" "analytics_bucket" {&lt;/span&gt;
  &lt;span class="s"&gt;bucket = var.shared_config.resources.analytics_bucket&lt;/span&gt;

  &lt;span class="s"&gt;versioning {&lt;/span&gt;
    &lt;span class="s"&gt;enabled = &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="s"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;

&lt;span class="s"&gt;resource "aws_glue_catalog_database" "analytics_db" {&lt;/span&gt;
  &lt;span class="s"&gt;name = var.shared_config.resources.glue_database&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;

&lt;span class="s"&gt;resource "aws_athena_workgroup" "analytics" {&lt;/span&gt;
  &lt;span class="s"&gt;name = var.shared_config.resources.workgroup_name&lt;/span&gt;

  &lt;span class="s"&gt;configuration {&lt;/span&gt;
    &lt;span class="s"&gt;result_configuration {&lt;/span&gt;
      &lt;span class="s"&gt;output_location = "s3://${aws_s3_bucket.analytics_bucket.bucket}/query-results/"&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;

    &lt;span class="s"&gt;enforce_workgroup_configuration = &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="s"&gt;publish_cloudwatch_metrics = &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="s"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Complete Resource Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Infrastructure components from shared config&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;infraConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;sourceBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source_bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;analyticsBucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;analytics_bucket&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;glue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;glue_database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tables&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;queueName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sqs_queue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// Note: Standard queue only - S3 events don't support FIFO&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Standard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;functionName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;file-processor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;nodejs22.x&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;sqsTrigger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
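&lt;p&gt;To make the shared-config idea concrete, here is a minimal sketch of the shape such a config object might take. The entity names and &lt;code&gt;filename_pattern&lt;/code&gt; values here are illustrative assumptions, not the real file; the point is that every key under &lt;code&gt;entities&lt;/code&gt; becomes a Glue table, so onboarding a new data source is a config change rather than new infrastructure code.&lt;/p&gt;

```typescript
// Hypothetical shape of the shared config consumed above; the real file
// lives with the IaC templates and may differ.
const config = {
  resources: {
    glue_database: "analytics_db",
    analytics_bucket: "my-company-analytics",
  },
  entities: {
    sales_data: { filename_pattern: "sales_data_{year}_{month}_{day}.csv" },
    inventory: { filename_pattern: "inventory_{year}_{month}.csv" },
  },
};

// Every entity key becomes a Glue table, mirroring the
// `tables: Object.keys(config.entities)` line in the IaC config above.
const glueTables = Object.keys(config.entities);
console.log(glueTables); // ["sales_data", "inventory"]
```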



&lt;h2&gt;
  
  
  3. Core Lambda Logic
&lt;/h2&gt;

&lt;h3&gt;
  
  
  File Processing Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/fileProcessor.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;SQSEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;SQSRecord&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;S3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Glue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Athena&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SQSEvent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processS3File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processS3File&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sqsRecord&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SQSRecord&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// 1. Extract S3 event from SQS message&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s3Event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sqsRecord&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s3Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;s3Event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// 2. Get entity configuration&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;entityConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getEntityFromFilename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 3. Extract partition information&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractPartitions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;filename_pattern&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 4. Read and validate file&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// 5. Create partitioned path&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;partitionedPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildPartitionPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 6. Write to analytics bucket&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;putObject&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;analytics_bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;partitionedPath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fileContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Body&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// 7. Update Glue catalog&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;updateGlueCatalog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partitionedPath&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// 8. Optional: Validate with Athena query&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;validateCatalogUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
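&lt;p&gt;The flow above leans on an &lt;code&gt;extractPartitions&lt;/code&gt; helper that isn't shown. Here is a minimal sketch, assuming &lt;code&gt;filename_pattern&lt;/code&gt; marks partition values with &lt;code&gt;{placeholder}&lt;/code&gt; tokens; that token syntax is an assumption for illustration, not the actual implementation.&lt;/p&gt;

```typescript
// Minimal sketch of the extractPartitions helper referenced above. Assumes
// filename_pattern uses {placeholder} tokens for numeric partition values,
// e.g. "sales_data_{year}_{month}_{day}.csv" (illustrative, not the real code).
function extractPartitions(key: string, pattern: string): Record<string, string> {
  const names: string[] = [];
  const regexSrc = pattern
    .replace(/[.*+?^$()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\{(\w+)\}/g, (_: string, name: string) => {
      names.push(name);                   // remember placeholder order
      return "(\\d+)";                    // {year} -> a digit capture group
    });
  // Match against the basename so prefixes like "incoming/" don't interfere.
  const match = key.split("/").pop()!.match(new RegExp(`^${regexSrc}$`));
  if (!match) throw new Error(`key ${key} does not match pattern ${pattern}`);
  return Object.fromEntries(names.map((n, i) => [n, match[i + 1]]));
}

console.log(extractPartitions(
  "incoming/sales_data_2024_03_15.csv",
  "sales_data_{year}_{month}_{day}.csv"
)); // { year: "2024", month: "03", day: "15" }
```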



&lt;h3&gt;
  
  
  Partition Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;buildPartitionPath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Example: sales_data/year=2024/month=03/day=15/sales_data_2024_03_15.csv&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;partitionPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;partition_keys&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;partitionPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;originalFilename&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;updateGlueCatalog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;s3Path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;glue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;AWS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Glue&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;glue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createPartition&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;DatabaseName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;glue_database&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;TableName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;PartitionInput&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;Values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;entityConfig&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;partition_keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;partitions&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
      &lt;span class="na"&gt;StorageDescriptor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`s3://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;analytics_bucket&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;s3Path&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;InputFormat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;org.apache.hadoop.mapred.TextInputFormat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;OutputFormat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;SerdeInfo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;SerializationLibrary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nf"&gt;promise&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
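&lt;p&gt;One operational note on &lt;code&gt;createPartition&lt;/code&gt;: Glue rejects a partition that is already registered with an &lt;code&gt;AlreadyExistsException&lt;/code&gt;, which will happen whenever a file is re-delivered for an existing day (SQS delivery is at-least-once). A small sketch of a guard that keeps the catalog update idempotent; the &lt;code&gt;ensurePartition&lt;/code&gt; wrapper is illustrative, not part of the original code, and relies on the v2 SDK surfacing the error name on &lt;code&gt;err.code&lt;/code&gt;.&lt;/p&gt;

```typescript
// AlreadyExistsException means the partition is already in the catalog,
// which is fine for our purposes: the data landed and the table knows
// about the location. Anything else is a real failure and should propagate
// so the SQS message is retried.
function isAlreadyExists(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { code?: string }).code === "AlreadyExistsException";
}

async function ensurePartition(create: () => Promise<unknown>): Promise<void> {
  try {
    await create();
  } catch (err) {
    if (!isAlreadyExists(err)) throw err; // real failures still surface
  }
}
```

Wrapping the `glue.createPartition(...).promise()` call above in `ensurePartition(() => ...)` makes re-deliveries a no-op instead of a Lambda error.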



&lt;h2&gt;
  
  
  4. Power BI Connection Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites Checklist
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Local Development Setup&lt;/span&gt;
&lt;span class="c"&gt;# Download Athena ODBC driver&lt;/span&gt;
wget https://s3.amazonaws.com/athena-downloads/drivers/ODBC/SimbaAthenaODBC-1.1.17.1000-Windows.msi

&lt;span class="c"&gt;# 2. Gateway Installation (for enterprise distribution)&lt;/span&gt;
&lt;span class="c"&gt;# Install driver on Power BI Gateway machine&lt;/span&gt;
&lt;span class="c"&gt;# Configure data source in gateway admin console&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connection Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# ODBC Connection String
&lt;/span&gt;&lt;span class="py"&gt;Driver&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;{Amazon Athena ODBC Driver};&lt;/span&gt;
&lt;span class="py"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;athena.us-east-1.amazonaws.com;&lt;/span&gt;
&lt;span class="py"&gt;Port&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;443;&lt;/span&gt;
&lt;span class="py"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;analytics_db;&lt;/span&gt;
&lt;span class="py"&gt;Workgroup&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;analytics_workgroup;&lt;/span&gt;
&lt;span class="py"&gt;AuthenticationType&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;IAM Credentials;&lt;/span&gt;
&lt;span class="py"&gt;UID&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;AKIA...;&lt;/span&gt;
&lt;span class="py"&gt;PWD&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;secret_key;&lt;/span&gt;
&lt;span class="py"&gt;S3OutputLocation&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;s3://my-company-analytics/query-results/;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Network Requirements
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# VPC Endpoint Configuration (if using private connectivity)&lt;/span&gt;
&lt;span class="na"&gt;vpc_endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;com.amazonaws.us-east-1.athena&lt;/span&gt;
    &lt;span class="na"&gt;route_table_ids&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtb-xxx"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;com.amazonaws.us-east-1.s3&lt;/span&gt;
    &lt;span class="na"&gt;route_table_ids&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rtb-xxx"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Security Group Rules&lt;/span&gt;
&lt;span class="na"&gt;security_groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;athena_access&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ingress&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;protocol&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tcp&lt;/span&gt;
        &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;443&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;power-bi-gateway-sg"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  IAM Service Account
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Statement"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"athena:*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"glue:GetDatabase"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"glue:GetTable"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"glue:GetPartitions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"s3:ListBucket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"s3:PutObject"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:athena:*:*:workgroup/analytics_workgroup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:glue:*:*:catalog"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:glue:*:*:database/analytics_db"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::my-company-analytics/*"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Security Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workspace Binding&lt;/strong&gt; - Restrict report access to specific distribution lists&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Row-Level Security&lt;/strong&gt; - Implement in Power BI if data contains sensitive information&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential Rotation&lt;/strong&gt; - Set up automatic rotation for IAM access keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC Isolation&lt;/strong&gt; - Use VPC endpoints for private connectivity when possible&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This implementation provides a solid foundation that's extensible for additional entities and maintainable across different AWS environments.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;This serverless analytics architecture demonstrates how modern cloud services can dramatically simplify data pipelines while reducing costs and operational overhead. By treating S3 as both storage and source of truth, we've eliminated the traditional database layer that often exists purely to satisfy business intelligence tool requirements, not actual business logic needs.&lt;/p&gt;

&lt;p&gt;The solution delivers several key advantages: &lt;strong&gt;astronomical cost savings&lt;/strong&gt; through Athena's pay-per-query model versus always-on database instances, &lt;strong&gt;operational simplicity&lt;/strong&gt; with fewer moving parts to manage and monitor, and &lt;strong&gt;architectural flexibility&lt;/strong&gt; that scales effortlessly with data volume and query frequency. The event-driven design ensures data is immediately available for analysis without complex ETL scheduling, while the shared configuration approach makes the entire solution extensible for additional data sources and entities.&lt;/p&gt;

&lt;p&gt;While the initial Power BI connectivity setup requires some networking and driver configuration effort, these are one-time investments that unlock significant long-term value. The slight query latency trade-off compared to traditional databases is typically negligible for most business intelligence use cases, especially when weighed against the dramatic reduction in infrastructure costs and complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway&lt;/strong&gt;: Before defaulting to database solutions for analytics workloads, consider whether your use case actually requires relational database features like transactions, foreign keys, or complex joins. If you're simply storing structured data for reporting and visualization, this S3-native approach can deliver the same business outcomes at a fraction of the cost and complexity.&lt;/p&gt;

&lt;p&gt;The complete implementation, including Infrastructure as Code templates and Lambda functions, provides a foundation that teams can fork, customize, and extend for their specific data analytics needs. As cloud-native architectures continue to mature, patterns like this represent the future of cost-effective, scalable data engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ready to implement this solution?&lt;/strong&gt; Start with the shared configuration design, analyze your existing data patterns, and begin building your serverless analytics pipeline today.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>architecture</category>
      <category>analytics</category>
      <category>aws</category>
    </item>
    <item>
      <title>Building a Secure SFTP Server on a Linode Public Subnet</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Fri, 05 Sep 2025 07:04:52 +0000</pubDate>
      <link>https://dev.to/joojodontoh/building-a-secure-sftp-server-on-a-linode-public-subnet-3b0j</link>
      <guid>https://dev.to/joojodontoh/building-a-secure-sftp-server-on-a-linode-public-subnet-3b0j</guid>
      <description>&lt;p&gt;In the previous post, I walked through setting up a bare-ish-metal cloud environment with Linode, partitioning public and private subnets, and wiring up your own proxies, and firewalls — without handing everything off to someone else. If you haven't read it, &lt;a href="https://dev.to/joojodontoh/reclaiming-engineering-ownership-a-hands-on-guide-to-bare-metal-cloud-1b8f"&gt;I suggest you do!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Today, I intend to go one level deeper:&lt;br&gt;
For basic learning purposes, let’s build a &lt;strong&gt;secure SFTP server&lt;/strong&gt; from scratch using the node in our &lt;strong&gt;public subnet&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why should you do this, you ask?&lt;br&gt;
Because file transfer is a foundational primitive in ops, and there’s no reason to let that knowledge slip through the cracks.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Build Your Own SFTP Server?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You &lt;strong&gt;don’t need to rely on a SaaS&lt;/strong&gt; or managed service (lots of manual ops btw)&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;control access, retention, and isolation&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You &lt;strong&gt;understand how file access and security&lt;/strong&gt; actually work under the hood&lt;/li&gt;
&lt;li&gt;You can &lt;strong&gt;build automations and workflows&lt;/strong&gt; around it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially useful when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re collaborating with partners who need to send you files securely&lt;/li&gt;
&lt;li&gt;You want to ship logs, reports, or ETL inputs into your infra&lt;/li&gt;
&lt;li&gt;You’re learning how SSH, chroot jails, and Linux permissions actually work&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A provisioned &lt;strong&gt;Linode instance in your public subnet&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;public IP address&lt;/strong&gt; and port &lt;code&gt;22&lt;/code&gt; open to trusted IPs&lt;/li&gt;
&lt;li&gt;Basic Linux CLI knowledge&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I handled the first 2 points in &lt;a href="https://dev.to/joojodontoh/reclaiming-engineering-ownership-a-hands-on-guide-to-bare-metal-cloud-1b8f"&gt;this article&lt;/a&gt; &lt;br&gt;
We'll be using &lt;strong&gt;Ubuntu 22.04 LTS&lt;/strong&gt;, but this works on most distros with &lt;code&gt;openssh-server&lt;/code&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step-by-Step Setup
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Install OpenSSH
&lt;/h3&gt;

&lt;p&gt;Make sure SSH is installed and running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;openssh-server &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Create an SFTP-Only User
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;groupadd sftpusers

&lt;span class="nb"&gt;sudo &lt;/span&gt;useradd &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nt"&gt;-G&lt;/span&gt; sftpusers &lt;span class="nt"&gt;-s&lt;/span&gt; /sbin/nologin sftpuser1
&lt;span class="nb"&gt;sudo &lt;/span&gt;passwd sftpuser1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents shell access and groups users logically.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Create a Secure Directory Structure
&lt;/h3&gt;

&lt;p&gt;OpenSSH’s &lt;code&gt;ChrootDirectory&lt;/code&gt; &lt;strong&gt;requires that the parent dir is owned by root and not writable&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /sftp/sftpuser1/upload
&lt;span class="nb"&gt;sudo chown &lt;/span&gt;root:root /sftp/sftpuser1
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;755 /sftp/sftpuser1

&lt;span class="nb"&gt;sudo chown &lt;/span&gt;sftpuser1:sftpusers /sftp/sftpuser1/upload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a writable &lt;code&gt;/upload&lt;/code&gt; directory while keeping the jail secure.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Configure &lt;code&gt;sshd_config&lt;/code&gt; for SFTP Jail
&lt;/h3&gt;

&lt;p&gt;Append this block to the bottom of &lt;code&gt;/etc/ssh/sshd_config&lt;/code&gt; (&lt;code&gt;sudo nano /etc/ssh/sshd_config&lt;/code&gt; to open):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;Match&lt;/span&gt; &lt;span class="n"&gt;Group&lt;/span&gt; &lt;span class="n"&gt;sftpusers&lt;/span&gt;
  &lt;span class="n"&gt;ChrootDirectory&lt;/span&gt; /&lt;span class="n"&gt;sftp&lt;/span&gt;/%&lt;span class="n"&gt;u&lt;/span&gt;
  &lt;span class="n"&gt;ForceCommand&lt;/span&gt; &lt;span class="n"&gt;internal&lt;/span&gt;-&lt;span class="n"&gt;sftp&lt;/span&gt;
  &lt;span class="n"&gt;X11Forwarding&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt;
  &lt;span class="n"&gt;AllowTcpForwarding&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then restart SSH to apply the changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Use SSH Key Authentication
&lt;/h3&gt;

&lt;p&gt;On your &lt;strong&gt;local machine&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh-keygen &lt;span class="nt"&gt;-t&lt;/span&gt; rsa &lt;span class="nt"&gt;-b&lt;/span&gt; 4096 &lt;span class="nt"&gt;-f&lt;/span&gt; ~/.ssh/sftpuser1_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On your &lt;strong&gt;server&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /home/sftpuser1/.ssh
&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /home/sftpuser1/.ssh/authorized_keys
&lt;span class="c"&gt;# Paste public key here&lt;/span&gt;

&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; sftpuser1:sftpusers /home/sftpuser1/.ssh
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;700 /home/sftpuser1/.ssh
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;600 /home/sftpuser1/.ssh/authorized_keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can now disable password login if you wish.&lt;/p&gt;
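<p>If you do, a minimal sketch of what that looks like (added inside the same <code>Match</code> block from step 4, followed by an SSH restart):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight conf"><code>Match Group sftpusers
  # Keys only: reject password attempts for this group
  PasswordAuthentication no
</code></pre>

</div>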

&lt;h3&gt;
  
  
  6. Secure the Server
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open port &lt;strong&gt;22&lt;/strong&gt; to only &lt;strong&gt;your office or VPN IP&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Install &lt;strong&gt;fail2ban&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;fail2ban
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Consider using &lt;strong&gt;logrotate&lt;/strong&gt; and basic audit logging&lt;/li&gt;
&lt;/ul&gt;
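<p>As a starting point, a minimal <code>fail2ban</code> jail for SSH might look like this in <code>/etc/fail2ban/jail.local</code> (the thresholds here are suggestions, not gospel):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight conf"><code>[sshd]
enabled  = true
port     = 22
maxretry = 5
bantime  = 1h
</code></pre>

</div>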

&lt;h3&gt;
  
  
  7. Test the Setup
&lt;/h3&gt;

&lt;p&gt;From your local terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sftp &lt;span class="nt"&gt;-i&lt;/span&gt; ~/.ssh/sftpuser1_key sftpuser1@&amp;lt;your-linode-ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd /upload
put testfile.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Tip: SFTP isn’t a shell. You can’t run &lt;code&gt;cat&lt;/code&gt; or &lt;code&gt;echo&lt;/code&gt; — just &lt;code&gt;put&lt;/code&gt;, &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;ls&lt;/code&gt;, etc.&lt;/p&gt;
&lt;/blockquote&gt;
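<p>Those interactive commands can also be scripted with SFTP’s batch mode, which is useful once you start automating transfers. A hypothetical <code>upload.batch</code> file:<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight plaintext"><code>cd /upload
put testfile.txt
bye
</code></pre>

</div>

<p>Run it with <code>sftp -b upload.batch -i ~/.ssh/sftpuser1_key sftpuser1@&lt;your-linode-ip&gt;</code>. With <code>-b</code>, sftp aborts on the first failed command and exits non-zero, which makes it cron-friendly.</p>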

&lt;h2&gt;
  
  
  Why Public Subnet?
&lt;/h2&gt;

&lt;p&gt;Because this server needs to be accessed &lt;strong&gt;from the internet&lt;/strong&gt;. If it were in a private subnet, you’d need a bastion or VPN to reach it — useful for internal automation, but not external sharing.&lt;/p&gt;

&lt;p&gt;Just like with the previous setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;public subnet gives controlled external access&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Security is enforced via &lt;strong&gt;firewall + SSH key access&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lessons Reinforced
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Chroot directories &lt;strong&gt;must be owned by root&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;SFTP can be &lt;strong&gt;a secure alternative to email attachments or third-party tools&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;You can still &lt;strong&gt;own your file flows&lt;/strong&gt; in a modern, cloud-native way&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Some ideas for future improvements and continued learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automate uploads from other services or cron jobs&lt;/li&gt;
&lt;li&gt;Pipe incoming files into a processing queue (e.g., via inotify or systemd)&lt;/li&gt;
&lt;li&gt;Back up uploaded files to S3&lt;/li&gt;
&lt;li&gt;Add a DNS record if you want: &lt;code&gt;sftp.yourdomain.com&lt;/code&gt; → your Linode IP&lt;/li&gt;
&lt;/ul&gt;
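<p>For the processing-queue idea, one low-dependency sketch uses systemd path units (the unit names and script path below are hypothetical):<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight conf"><code># /etc/systemd/system/sftp-uploads.path
[Path]
# Fires whenever the upload jail contains files
DirectoryNotEmpty=/sftp/sftpuser1/upload

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/sftp-uploads.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/process-uploads.sh
</code></pre>

</div>

<p>Enable it with <code>sudo systemctl enable --now sftp-uploads.path</code>; systemd then activates the matching <code>.service</code> each time the directory becomes non-empty.</p>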

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Owning your infra doesn't mean reinventing everything — it means understanding the tradeoffs and being able to build what you need, when you need it. This is one more building block toward that confidence.&lt;/p&gt;

&lt;p&gt;You’ve got this. Nothing is impossible.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Reclaiming Engineering Ownership: A Hands-On Guide to Bare-ish-Metal Cloud</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Fri, 25 Jul 2025 06:13:05 +0000</pubDate>
      <link>https://dev.to/joojodontoh/reclaiming-engineering-ownership-a-hands-on-guide-to-bare-metal-cloud-1b8f</link>
      <guid>https://dev.to/joojodontoh/reclaiming-engineering-ownership-a-hands-on-guide-to-bare-metal-cloud-1b8f</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;I decided to write this after diving into the ongoing conversation around &lt;em&gt;learned helplessness&lt;/em&gt; in software engineering. This is something David Heinemeier Hansson (creator of Ruby on Rails) has been very vocal about, especially on his Twitter. His points may resonate with you depending on your business needs and the layers of abstraction you are willing to take on. I've seen so many companies rack up huge cloud bills, and that can easily convince smaller teams that they need to do the same to be “serious.” But a lot of this complexity is sold to us by vendors whose business depends on making things look harder than they really are: the so-called “merchants of complexity.”&lt;/p&gt;

&lt;p&gt;Learned helplessness, in this context, happens when engineering teams slowly lose the ability, or even the confidence to manage and understand their own infrastructure. Over time, everything becomes someone else’s service: databases, hosting, even cron jobs. And when that happens, teams risk losing technical depth, the ability to troubleshoot under pressure, and even the curiosity that drives real innovation.&lt;/p&gt;

&lt;p&gt;The truth is, setting up your own infrastructure isn’t always as hard or as costly as it seems. Sometimes, going hands-on—provisioning your own servers, configuring your own network—can teach you more and cost you less. This article walks through a high-level, hands-on setup of a simple app using IaaS-level tools (like VPSs, subnets, and Apache proxies), not because it’s the “right” way for every project, but because understanding the layers &lt;em&gt;beneath&lt;/em&gt; the abstraction gives you real control—and that’s a power every engineer should have.&lt;/p&gt;

&lt;p&gt;All of this has led me to revisit the foundational layers of cloud infrastructure, not to throw shade at modern abstractions, but to get a clearer picture of what they’re built on. The app we’ll build is simple and deliberately non-production-ready; the focus is purely on learning and technical understanding. It’s a hands-on journey that starts at the Infrastructure-as-a-Service (IaaS) layer, the lowest abstraction tier in cloud computing (the others being PaaS and SaaS).&lt;/p&gt;

&lt;p&gt;Most of us have used cloud platforms in some form, whether it’s deploying serverless functions like AWS Lambda or Firebase Cloud Functions, or using tools like Heroku or Vercel that abstract away orchestration entirely. But beneath all that convenience lies real, raw infrastructure: virtual machines, subnets, proxies, and firewalls. This article is a small tutorial that dives into exactly that. I’ll also drop “nuggets” throughout: pointers to deepen your understanding if you’re curious to dig further at any point in the process.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources Used for This Exploration
&lt;/h2&gt;

&lt;p&gt;To keep things practical and grounded, I built a small full-stack notes application that serves as the foundation for the tutorial. The frontend is a &lt;a href="https://github.com/Joojo7/notes-app-frontend" rel="noopener noreferrer"&gt;Next.js&lt;/a&gt; application, responsible for rendering UI and communicating with the backend via API calls. The backend is a lightweight &lt;a href="https://github.com/Joojo7/notes-app-backend" rel="noopener noreferrer"&gt;Node.js app built with Koa&lt;/a&gt;, handling user authentication and CRUD operations for notes. For data storage, I used a PostgreSQL database containerized and hosted within the private subnet. Everything runs on Akamai’s cloud infrastructure—specifically their VPC and Linode offerings—which provide just enough control to explore low-level networking, subnetting, and proxy setups, without overwhelming complexity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1562zsti44t57d2upklb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1562zsti44t57d2upklb.png" alt="backend" width="800" height="714"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj573uam88hvjxogwm2s9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj573uam88hvjxogwm2s9.png" alt="frontend" width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Provisioning a VPC and Partitioning Your Network
&lt;/h2&gt;

&lt;p&gt;The first step is to rent a VPS from a provider that gives you fine-grained control over networking—options include DigitalOcean, Linode, and AWS EC2. For this project, I chose &lt;strong&gt;Akamai’s Linode platform&lt;/strong&gt;, which allowed me to create a Virtual Private Cloud (VPC) and define custom subnets. I partitioned the network into two subnets: a &lt;strong&gt;public subnet&lt;/strong&gt; that can access the internet (ideal for hosting the frontend), and a &lt;strong&gt;private subnet&lt;/strong&gt; that has no direct internet access (reserved for backend services and the database). When creating your subnets, you’ll need to allocate CIDR blocks to define the IP ranges. For example, the public subnet could use &lt;code&gt;10.0.1.0/24&lt;/code&gt;, while the private subnet could use &lt;code&gt;10.0.2.0/24&lt;/code&gt;. These ranges should be chosen with future growth and IP efficiency in mind. A good practice is to size your subnets based on how many nodes or services you expect to scale into each zone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmgnpu9jmkwgg19p6inn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjmgnpu9jmkwgg19p6inn.png" alt=" " width="800" height="727"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;em&gt;Nugget: Take some time to explore how CIDR blocks work, how IP addresses are distributed, and why certain ranges are considered private. It’s a foundational concept for understanding modern networking.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
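<p>As a quick illustration of what the <code>/24</code> suffix buys you: the prefix length fixes how many host addresses a subnet can hold. A small shell sketch:</p>

```shell
# Usable host addresses in a subnet: 2^(32 - prefix), minus the
# reserved network and broadcast addresses.
prefix=24
hosts=$(( (1 << (32 - prefix)) - 2 ))
echo "A /$prefix subnet such as 10.0.1.0/$prefix has $hosts usable hosts"
```

<p>So <code>10.0.1.0/24</code> and <code>10.0.2.0/24</code> give you 254 addresses each, plenty for this project and easy to resize later.</p>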

&lt;h2&gt;
  
  
  Creating Firewalls to Enforce Subnet Isolation
&lt;/h2&gt;

&lt;p&gt;With your subnets in place, the next step is to &lt;strong&gt;enforce network boundaries&lt;/strong&gt; using firewall rules. Firewalls allow you to control which traffic is allowed to enter or leave a node based on IP ranges, ports, and protocols. For this tutorial, we’ll design our firewall to &lt;strong&gt;completely isolate the private subnet from the internet&lt;/strong&gt;, while exposing only the necessary ports in the public subnet. Let’s break this down into &lt;strong&gt;inbound&lt;/strong&gt; and &lt;strong&gt;outbound&lt;/strong&gt; rules.&lt;/p&gt;

&lt;h4&gt;
  
  
  Inbound Rules
&lt;/h4&gt;

&lt;p&gt;Inbound rules govern what kind of traffic is allowed &lt;em&gt;into&lt;/em&gt; your nodes.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Default Deny:&lt;/strong&gt;&lt;br&gt;
By default, deny all inbound traffic. Only allow what’s explicitly needed. This is the safest baseline.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ICMP for Testing (Optional):&lt;/strong&gt;&lt;br&gt;
You may want to temporarily allow ICMP (ping) traffic to help debug connectivity.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Protocol: ICMP&lt;/li&gt;
&lt;li&gt;Source: &lt;code&gt;0.0.0.0/0&lt;/code&gt; (or your own IP)&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Public Subnet — Web Access (Frontend App):&lt;/strong&gt;
Your public-facing frontend must be reachable from the internet.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Ports: &lt;code&gt;80&lt;/code&gt; (HTTP), &lt;code&gt;443&lt;/code&gt; (HTTPS)&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Source: &lt;code&gt;0.0.0.0/0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;
&lt;strong&gt;Private to Public — Forward Proxy Access:&lt;/strong&gt;
To allow your private nodes (backend/DB) to make &lt;strong&gt;outbound&lt;/strong&gt; requests via the &lt;strong&gt;public proxy&lt;/strong&gt;, you must enable &lt;em&gt;inbound&lt;/em&gt; access on the proxy port in the public node.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Port: &lt;code&gt;8080&lt;/code&gt; (or your chosen proxy port)&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Source: &lt;code&gt;10.0.2.0/24&lt;/code&gt; (Private subnet IP range)&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="5"&gt;
&lt;li&gt;
&lt;strong&gt;Private Subnet — Internal Communication (Backend ↔ DB):&lt;/strong&gt;
Backend services in the private subnet need to talk to each other, especially to your database node.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Port: e.g., &lt;code&gt;5432&lt;/code&gt; (PostgreSQL)&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Source: &lt;code&gt;10.0.2.0/24&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="6"&gt;
&lt;li&gt;
&lt;strong&gt;SSH Access for Maintenance:&lt;/strong&gt;
You’ll want to be able to SSH into your Linodes for debugging or setup.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Port: &lt;code&gt;22&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Source: Your public IP (or &lt;code&gt;0.0.0.0/0&lt;/code&gt; if unrestricted, though this is not recommended)&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;li&gt;💡 &lt;em&gt;Tip: For security, restrict this to your personal IP only.&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;em&gt;Nugget: SSH uses asymmetric cryptography—your private key remains on your machine, while the public key is added to the server’s &lt;code&gt;~/.ssh/authorized_keys&lt;/code&gt;. Understanding this is essential when managing key-based access securely.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Outbound Rules
&lt;/h4&gt;

&lt;p&gt;Outbound rules control what kind of traffic your nodes are allowed to initiate.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;ICMP for Testing (Optional):&lt;/strong&gt;
Allow outbound ping (ICMP) for basic connectivity tests.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Protocol: ICMP&lt;/li&gt;
&lt;li&gt;Destination: &lt;code&gt;0.0.0.0/0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Internet Access (Public Subnet Only):&lt;/strong&gt;
Allow HTTP and HTTPS requests from the public subnet.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Ports: &lt;code&gt;80&lt;/code&gt;, &lt;code&gt;443&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Protocol: TCP&lt;/li&gt;
&lt;li&gt;Destination: &lt;code&gt;0.0.0.0/0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Action: Accept&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="3"&gt;
&lt;li&gt;
&lt;strong&gt;Private Subnet via Proxy (Handled Later):&lt;/strong&gt;
The private subnet won’t have direct internet access. Instead, outbound requests will go through the forward proxy configured on the public node. We’ll configure this in a later section.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This firewall setup ensures that your &lt;strong&gt;private services are protected&lt;/strong&gt;, your &lt;strong&gt;frontend is accessible&lt;/strong&gt;, and your &lt;strong&gt;infrastructure remains tightly controlled&lt;/strong&gt;. Always test your rules incrementally—misconfigurations are common but easily fixed if introduced step-by-step.&lt;/p&gt;

&lt;p&gt;An example of inbound rules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Source (IPv4)&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;Public HTTP access (frontend)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;3000&lt;/td&gt;
&lt;td&gt;(Optional) direct access to frontend’s internal port (e.g. for testing or bypassing Apache)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;443&lt;/td&gt;
&lt;td&gt;Public HTTPS access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;ICMP for ping/testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;SSH access (can be restricted to your personal IP)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;10.0.1.2/32&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8000&lt;/td&gt;
&lt;td&gt;Backend service from public proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;10.0.2.0/24&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8080&lt;/td&gt;
&lt;td&gt;Proxy communication from private subnet to public proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;10.0.2.0/24&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5432&lt;/td&gt;
&lt;td&gt;PostgreSQL access for backend services in the same subnet&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Example of Outbound rules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;Destination (IPv4)&lt;/th&gt;
&lt;th&gt;Port&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;443&lt;/td&gt;
&lt;td&gt;HTTPS (package installs, certs, APIs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;td&gt;HTTP (package installs, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ACCEPT&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0.0.0.0/0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;ICMP (ping)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Setting Up Your First Linode (Public Subnet)
&lt;/h2&gt;

&lt;p&gt;With your VPC and firewall rules in place, it's time to spin up your actual infrastructure nodes—starting with a public-facing Linode. This Linode will act as the gateway to your application, serving your frontend (and optionally proxying to your backend), and it’s where we’ll verify that your network and firewall setup is working correctly.&lt;/p&gt;

&lt;p&gt;To begin, provision a low-cost Linode (around $5/month at the time of writing) and &lt;strong&gt;assign it to your public subnet&lt;/strong&gt;. Make sure to also &lt;strong&gt;attach the firewall&lt;/strong&gt; you previously configured, so that all the carefully crafted rules now apply to this instance.&lt;/p&gt;

&lt;p&gt;Once deployed, you’ll need to access the machine. You can do this in two ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SSH into the Linode&lt;/strong&gt; using your terminal and the public IP.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use LISH (Linode Shell)&lt;/strong&gt; from the Akamai console if you don’t have SSH access yet.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Inside your Linode, perform a few critical connectivity tests to ensure your networking is correctly configured:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Test Internet Access&lt;/strong&gt;
Run a simple ping to Google’s DNS to verify outbound access is allowed by your firewall:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   ping 8.8.8.8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Update the Package Index&lt;/strong&gt;
This verifies that outbound HTTPS is working and you can install packages:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If both commands succeed, your firewall and public subnet are configured correctly. You now have a functioning public node, fully capable of installing software, serving applications, and acting as a forward or reverse proxy for your private subnet. This will serve as the entry point to your application infrastructure as we move forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up a Private Linode (Backend Node)
&lt;/h2&gt;

&lt;p&gt;Next, provision your &lt;strong&gt;private Linode&lt;/strong&gt;, which will host your backend application. Just like the public node, this Linode is also very affordable (around $5/month), but unlike the public node, this one will &lt;strong&gt;not&lt;/strong&gt; be assigned a public IP address—&lt;strong&gt;ensuring it has no direct access to or from the internet&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When creating this Linode:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Assign it to the private subnet&lt;/strong&gt; you created earlier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not assign a public IP address&lt;/strong&gt;. This isolation is intentional: your backend should only talk to the frontend and the database, not the public internet.&lt;/li&gt;
&lt;li&gt;Ensure your &lt;strong&gt;firewall rules&lt;/strong&gt; allow this node to communicate with:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;The public node (for outbound access via proxy)&lt;/li&gt;
&lt;li&gt;Other private nodes like the DB server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since the private node will not have internet access, you’ll configure a &lt;strong&gt;forward proxy&lt;/strong&gt; later in the article—hosted on the public Linode—to help it install packages or make outbound HTTP requests securely and indirectly.&lt;/p&gt;

&lt;p&gt;In addition to forward proxying, you’ll also need a &lt;strong&gt;reverse proxy&lt;/strong&gt;, typically using Apache, to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Route external requests to the appropriate internal services (e.g., &lt;code&gt;/api&lt;/code&gt; to the backend)&lt;/li&gt;
&lt;li&gt;Handle SSL termination and clean URL routing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To configure Apache as a reverse proxy, you’ll later modify the default site config file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/apache2/sites-available/000-default.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or better yet, create your own virtual host file to keep things modular and clear.&lt;/p&gt;
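<p>A minimal sketch of such a virtual host (the domain, backend IP, and ports are placeholders; Apache needs <code>mod_proxy</code> and <code>mod_proxy_http</code> enabled via <code>a2enmod</code> first). Note that the more specific <code>/api</code> rule must come before the catch-all <code>/</code> rule, since Apache matches <code>ProxyPass</code> directives in order:<br>
</p>

<div class="highlight js-code-highlight">
<pre class="highlight conf"><code>&lt;VirtualHost *:80&gt;
  ServerName notes.example.com

  # Route API calls to the backend node in the private subnet
  ProxyPass        /api http://10.0.2.2:8000/api
  ProxyPassReverse /api http://10.0.2.2:8000/api

  # Everything else goes to the Next.js frontend on this node
  ProxyPass        / http://127.0.0.1:3000/
  ProxyPassReverse / http://127.0.0.1:3000/
&lt;/VirtualHost&gt;
</code></pre>

</div>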

&lt;h4&gt;
  
  
  Buy and Configure a Domain
&lt;/h4&gt;

&lt;p&gt;To make your app feel more “real” and not just accessible by an IP, you should purchase a cheap domain (e.g., from Namecheap). Once you own a domain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Create an A Record&lt;/strong&gt; that maps your domain (e.g., &lt;code&gt;notes.online&lt;/code&gt;) to the &lt;strong&gt;public IP of your frontend Linode&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Once DNS propagation completes, install a free HTTPS certificate via Let’s Encrypt by running:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;certbot &lt;span class="nt"&gt;--apache&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; yourdomain.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;If your DNS records are correctly set up, this command will:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Validate domain ownership&lt;/li&gt;
&lt;li&gt;Automatically install an HTTPS cert&lt;/li&gt;
&lt;li&gt;Store it in &lt;code&gt;/etc/letsencrypt/live/yourdomain.com&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup ensures that your frontend can be accessed securely via your custom domain, and that traffic can be reverse-proxied to your backend securely—all while maintaining strict network isolation between tiers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up a Private Linode for the Database (PostgreSQL)
&lt;/h2&gt;

&lt;p&gt;The final component of your infrastructure setup is the &lt;strong&gt;database node&lt;/strong&gt;, which will also reside entirely in your &lt;strong&gt;private subnet&lt;/strong&gt;. This ensures that your data is not exposed to the public internet and can only be accessed by other internal services—specifically, your backend application.&lt;/p&gt;

&lt;p&gt;Here’s how to set it up:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provision a new Linode&lt;/strong&gt; (again, a $5/month plan will suffice).&lt;/li&gt;
&lt;li&gt;Assign this Linode to the &lt;strong&gt;private subnet&lt;/strong&gt;, just like your backend node.&lt;/li&gt;
&lt;li&gt;Apply the same &lt;strong&gt;firewall&lt;/strong&gt; to this node so that only traffic from the &lt;strong&gt;private subnet&lt;/strong&gt;—especially your backend—can reach it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do not assign a public IP&lt;/strong&gt;. Your DB should never be exposed to the internet.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Once your Linode is up and running, access it via LISH (since it has no public IP to SSH into directly) or hop through your public Linode as a jump host, and install PostgreSQL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;postgresql postgresql-contrib &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, configure PostgreSQL to &lt;strong&gt;accept connections over the private subnet&lt;/strong&gt;:&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Update &lt;code&gt;postgresql.conf&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This file controls PostgreSQL’s runtime behavior. You need to allow it to listen for connections beyond &lt;code&gt;localhost&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/postgresql/14/main/postgresql.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Find the line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;#listen_addresses = 'localhost'
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;listen_addresses&lt;/span&gt; = &lt;span class="s1"&gt;'*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This tells PostgreSQL to listen on all network interfaces—including the private IP assigned by the subnet. If you want a tighter scope, you can list just the node’s private IP instead of &lt;code&gt;'*'&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 2: Update &lt;code&gt;pg_hba.conf&lt;/code&gt;
&lt;/h4&gt;

&lt;p&gt;This file defines &lt;strong&gt;who&lt;/strong&gt; can connect, &lt;strong&gt;from where&lt;/strong&gt;, and &lt;strong&gt;how&lt;/strong&gt; they authenticate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/postgresql/14/main/pg_hba.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the bottom, add a rule that allows incoming connections &lt;strong&gt;from the private subnet&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;host    all             all             10.0.2.0/24            md5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means: &lt;em&gt;allow all users to connect to all databases from any machine in the private subnet using password (md5) authentication&lt;/em&gt;. On PostgreSQL 14+, &lt;code&gt;scram-sha-256&lt;/code&gt; is the preferred password method; &lt;code&gt;md5&lt;/code&gt; still works but is weaker.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Nugget:&lt;/strong&gt; Spend time reading about how &lt;code&gt;postgresql.conf&lt;/code&gt; and &lt;code&gt;pg_hba.conf&lt;/code&gt; interact. They are the gatekeepers of your DB’s network exposure and authentication model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 3: Restart PostgreSQL
&lt;/h4&gt;

&lt;p&gt;Apply your configuration changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart postgresql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your PostgreSQL instance is now fully isolated within the private subnet and only reachable by other nodes in the same subnet—specifically your backend node. You’ve effectively recreated a secure, cloud-style VPC networking setup, but on your own terms, for a fraction of the cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enhance Communication Between Nodes and the Internet
&lt;/h2&gt;

&lt;p&gt;In previous sections, we intentionally designed the infrastructure so that &lt;strong&gt;nodes in the private subnet do not have direct access to the internet&lt;/strong&gt;. This is a common and recommended security posture—but it introduces a challenge: how do backend services fetch updates, install packages, or interact with external APIs?&lt;/p&gt;

&lt;p&gt;The solution is to introduce a &lt;strong&gt;forward proxy&lt;/strong&gt; in the public subnet. This allows private nodes to &lt;strong&gt;send outbound traffic via a trusted middleman&lt;/strong&gt; (the public node), without exposing themselves directly to the internet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Not Use Basic NAT?
&lt;/h3&gt;

&lt;p&gt;While it's tempting to set up a &lt;strong&gt;1:1 NAT (Basic NAT)&lt;/strong&gt; for simplicity, this approach bypasses the layered security model we’re trying to build. It essentially grants your private nodes direct exposure, undermining the purpose of subnet separation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🔗 This is the official documentation of the &lt;a href="https://techdocs.akamai.com/cloud-computing/docs/forward-proxy-for-vpc#forward-proxy" rel="noopener noreferrer"&gt;Akamai Forward Proxy&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Set Up a Forward Proxy with Apache (on the Public Node)
&lt;/h3&gt;

&lt;p&gt;Let’s walk through setting up a proper &lt;strong&gt;forward proxy&lt;/strong&gt; using Apache.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Access the Public Linode
&lt;/h4&gt;

&lt;p&gt;SSH into your public-facing Linode (in the public subnet) or use the LISH console via Akamai’s dashboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh root@&amp;lt;your-public-linode-ip&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Install &amp;amp; Prepare Apache
&lt;/h4&gt;

&lt;p&gt;Update your packages and ensure Apache is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;apache2 &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable necessary Apache modules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;a2enmod proxy proxy_http proxy_connect
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 3: Create a Forward Proxy Configuration
&lt;/h4&gt;

&lt;p&gt;Open a new config file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/apache2/sites-available/fwd-proxy.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste the following configuration (adjust IPs as needed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight apache"&gt;&lt;code&gt;&lt;span class="c"&gt;# Listen on the internal IP (public node) at port 8080.&lt;/span&gt;
&lt;span class="c"&gt;# This sets up the Apache server to accept proxy requests from the private subnet via port 8080.&lt;/span&gt;
&lt;span class="nc"&gt;Listen&lt;/span&gt; 10.0.2.2:8080

&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;VirtualHost&lt;/span&gt;&lt;span class="sr"&gt; *:8080&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="c"&gt;# Admin email for server issues (not strictly required unless you're sending error reports).&lt;/span&gt;
    &lt;span class="nc"&gt;ServerAdmin&lt;/span&gt; webmaster@localhost

    &lt;span class="c"&gt;# Root directory for served files (not used in proxying but required syntactically).&lt;/span&gt;
    &lt;span class="nc"&gt;DocumentRoot&lt;/span&gt; /var/www/html

    &lt;span class="c"&gt;# Log errors from proxy activity here (useful for debugging)&lt;/span&gt;
    &lt;span class="nc"&gt;ErrorLog&lt;/span&gt; ${APACHE_LOG_DIR}/fwd-proxy-error.log

    &lt;span class="c"&gt;# Log all access through the proxy&lt;/span&gt;
    &lt;span class="nc"&gt;CustomLog&lt;/span&gt; ${APACHE_LOG_DIR}/fwd-proxy-access.log combined

    &lt;span class="c"&gt;# Enable forward proxy mode (Apache acts as a middleman for outbound traffic)&lt;/span&gt;
    &lt;span class="nc"&gt;ProxyRequests&lt;/span&gt; &lt;span class="ss"&gt;On&lt;/span&gt;

    &lt;span class="c"&gt;# Adds headers like Via: to show request went through a proxy (useful for tracing)&lt;/span&gt;
    &lt;span class="nc"&gt;ProxyVia&lt;/span&gt; &lt;span class="ss"&gt;On&lt;/span&gt;

    &lt;span class="c"&gt;# Restrict proxy access to only IPs from the private subnet&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;Proxy&lt;/span&gt;&lt;span class="sr"&gt; "*"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="nc"&gt;Require&lt;/span&gt; ip 10.0.2.0/24
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;Proxy&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;VirtualHost&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Nugget:&lt;/strong&gt; The &lt;code&gt;ProxyRequests On&lt;/code&gt; directive enables forward proxying. The &lt;code&gt;&amp;lt;Proxy "*"&amp;gt;&lt;/code&gt; block restricts usage of this proxy to requests originating from your private subnet only.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Save and close the file.&lt;/p&gt;
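&lt;p&gt;To build intuition for what a &lt;code&gt;/24&lt;/code&gt; rule like &lt;code&gt;Require ip 10.0.2.0/24&lt;/code&gt; actually matches, here is a small helper. It is purely illustrative and not part of the proxy setup; for a &lt;code&gt;/24&lt;/code&gt;, membership is simply "the first three octets agree".&lt;/p&gt;

```shell
# Hypothetical helper, not part of the Apache config: shows which addresses
# a /24 rule such as "Require ip 10.0.2.0/24" would match.
in_subnet_24() {
  local ip="$1" net="$2"        # e.g. in_subnet_24 10.0.2.7 10.0.2.0
  [ "${ip%.*}" = "${net%.*}" ]  # compare everything before the last dot
}

in_subnet_24 10.0.2.7 10.0.2.0 && echo "10.0.2.7 allowed"
in_subnet_24 10.0.1.7 10.0.2.0 || echo "10.0.1.7 denied"
```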

&lt;h4&gt;
  
  
  Step 4: Enable the Proxy Site and Restart Apache
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo chown &lt;/span&gt;root:root /etc/apache2/sites-available/fwd-proxy.conf
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;0644 /etc/apache2/sites-available/fwd-proxy.conf
&lt;span class="nb"&gt;sudo &lt;/span&gt;a2ensite fwd-proxy
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart apache2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test the Proxy
&lt;/h3&gt;

&lt;p&gt;From a &lt;strong&gt;private Linode&lt;/strong&gt;, you can now route outbound traffic through the proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-x&lt;/span&gt; http://10.0.2.2:8080 https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If successful, your private node is now securely communicating with the internet &lt;strong&gt;through your public node&lt;/strong&gt;—without needing a public IP of its own.&lt;/p&gt;

&lt;p&gt;This setup retains your network isolation while still enabling secure, auditable internet access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhance Communication Between Nodes and the Internet (Part 2: Private Nodes)
&lt;/h3&gt;

&lt;p&gt;Now that your forward proxy is configured and running on the public Linode, it’s time to set up your private nodes—specifically the backend and database Linodes—to route their outbound internet traffic through this proxy.&lt;/p&gt;

&lt;h4&gt;
  
  
  Backend Private Node Configuration
&lt;/h4&gt;

&lt;p&gt;On your &lt;strong&gt;backend node&lt;/strong&gt; in the private subnet (which has no direct access to the internet), you’ll need to explicitly configure it to use the forward proxy you previously set up on the public Linode (&lt;code&gt;10.0.2.2:8080&lt;/code&gt;, the public node’s private-subnet address).&lt;/p&gt;

&lt;p&gt;Start by configuring the proxy settings for &lt;code&gt;apt&lt;/code&gt;, so you can perform package installations via the proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'Acquire::http::proxy "http://10.0.2.2:8080";'&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/apt.conf.d/proxy.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
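&lt;p&gt;That rule covers &lt;code&gt;http://&lt;/code&gt; sources. Recent apt releases fall back to the &lt;code&gt;http&lt;/code&gt; proxy setting for &lt;code&gt;https&lt;/code&gt; sources, but you can set the https variant explicitly as belt-and-braces (the IP assumes the public node's private address is &lt;code&gt;10.0.2.2&lt;/code&gt;, as in the proxy's &lt;code&gt;Listen&lt;/code&gt; directive):&lt;/p&gt;

```shell
# Optional: explicitly route https:// apt sources through the same proxy.
echo 'Acquire::https::proxy "http://10.0.2.2:8080";' | sudo tee -a /etc/apt/apt.conf.d/proxy.conf
```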



&lt;p&gt;Once done, test it using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also test general HTTP traffic routing through the proxy with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;--proxy&lt;/span&gt; 10.0.2.2:8080 http://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If both work as expected, proceed to &lt;strong&gt;route all HTTP/HTTPS traffic system-wide&lt;/strong&gt; through the proxy. This ensures that any application or system utility that needs external access will use the proxy automatically.&lt;/p&gt;

&lt;p&gt;Edit the &lt;code&gt;/etc/environment&lt;/code&gt; file to define the proxy variables globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/environment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then add:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;http_proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://10.0.1.2:8080"&lt;/span&gt;
&lt;span class="nv"&gt;https_proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"http://10.0.1.2:8080"&lt;/span&gt;
&lt;span class="nv"&gt;no_proxy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"localhost,127.0.0.1,::1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These environment variables will persist across sessions and reboots. For them to take full effect, reboot the node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
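&lt;p&gt;After logging back in, you can confirm the variables are visible to your session. The sketch below exports them manually so the check is self-contained; on the node itself they come from &lt;code&gt;/etc/environment&lt;/code&gt;, and the values assume the proxy listens on &lt;code&gt;10.0.2.2:8080&lt;/code&gt;.&lt;/p&gt;

```shell
# Confirm the proxy variables are present in the environment.
# (Exported here only to make the check self-contained.)
export http_proxy="http://10.0.2.2:8080"
export https_proxy="http://10.0.2.2:8080"
export no_proxy="localhost,127.0.0.1,::1"
env | grep -i '_proxy'
```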



&lt;h3&gt;
  
  
  Setup Applications and Storage
&lt;/h3&gt;

&lt;p&gt;With your networking, firewall, and proxy configuration complete, the next step is to deploy your applications on the respective nodes. This section guides you through cloning, configuring, and starting both the frontend and backend apps used in this tutorial, along with their storage setup.&lt;/p&gt;

&lt;h4&gt;
  
  
  Public Linode – Frontend Application
&lt;/h4&gt;

&lt;p&gt;Your &lt;strong&gt;public Linode&lt;/strong&gt; is where the frontend (Next.js) application will live, and it’s accessible to the outside world via your domain and Apache reverse proxy.&lt;/p&gt;

&lt;p&gt;Follow these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run an update on your packages:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Install Git:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Clone the frontend repository made specifically for this tutorial:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/Joojo7/notes-app-frontend
   &lt;span class="nb"&gt;cd &lt;/span&gt;notes-app-frontend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="4"&gt;
&lt;li&gt;Install the required Node.js dependencies:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There’s no need to over-engineer this with a CI/CD pipeline, since it’s a one-off learning project. You can start the app directly or with a tool like &lt;code&gt;pm2&lt;/code&gt; if you want to keep it alive in the background.&lt;/p&gt;

&lt;h4&gt;
  
  
  Private Linode – Backend Application
&lt;/h4&gt;

&lt;p&gt;On your &lt;strong&gt;private backend Linode&lt;/strong&gt;, follow similar steps, with additional backend-specific setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Update the system and install Git:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
   &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Clone the backend repository:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   git clone https://github.com/Joojo7/notes-app-backend
   &lt;span class="nb"&gt;cd &lt;/span&gt;notes-app-backend
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Ensure you have the following installed:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Node.js (v18 or higher)&lt;/li&gt;
&lt;li&gt;npm&lt;/li&gt;
&lt;li&gt;Docker + Docker Compose&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Create a &lt;code&gt;.env&lt;/code&gt; file at the root of the project. You can copy from &lt;code&gt;.env.example&lt;/code&gt; and customize as needed:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   DB_USER=your_db_user
   DB_PASSWORD=your_db_password
   DB_HOST=localhost
   DB_PORT=5432
   DB_NAME=notes_db
   JWT_SECRET=your_jwt_secret
   JWT_EXPIRATION=15m
   PORT=8000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="5"&gt;
&lt;li&gt;To simplify the startup process, a &lt;code&gt;startup.sh&lt;/code&gt; script has been provided. Make it executable and run it:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;chmod&lt;/span&gt; +x startup.sh
   ./startup.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script will handle the Docker Compose setup and the backend service bootstrapping.&lt;/p&gt;

&lt;p&gt;More details are available in the backend repo’s README:&lt;br&gt;
🔗 &lt;a href="https://github.com/Joojo7/notes-app-backend?tab=readme-ov-file" rel="noopener noreferrer"&gt;notes-app-backend GitHub&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Setup Applications and Storage (continued)
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Serve and Test the Application
&lt;/h4&gt;

&lt;p&gt;Once both the frontend and backend are installed and configured, it’s time to serve the application to the internet and test its full flow. The public Linode will act as a reverse proxy, routing requests to the appropriate services via Apache.&lt;/p&gt;
&lt;h5&gt;
  
  
  Reverse Proxy from Domain to Frontend (and Backend API)
&lt;/h5&gt;

&lt;p&gt;To serve your frontend from &lt;code&gt;https://yourdomain.com&lt;/code&gt; without needing to append a &lt;code&gt;:3000&lt;/code&gt; port, we’ll configure &lt;strong&gt;Apache as a reverse proxy&lt;/strong&gt;.&lt;/p&gt;
&lt;h5&gt;
  
  
  Steps to Enable Apache Modules
&lt;/h5&gt;

&lt;p&gt;SSH into your public Linode and enable the necessary Apache modules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;a2enmod proxy
&lt;span class="nb"&gt;sudo &lt;/span&gt;a2enmod proxy_http
&lt;span class="nb"&gt;sudo &lt;/span&gt;a2enmod headers
&lt;span class="nb"&gt;sudo &lt;/span&gt;a2enmod rewrite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then restart Apache to apply the changes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart apache2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  Configure Apache Reverse Proxy for HTTP and HTTPS
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Create or modify a site configuration file:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/apache2/sites-available/yourdomain.com.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Paste the following configuration:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;HTTP (Port 80)&lt;/strong&gt; – used for initial redirect or Certbot challenge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight apache"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;VirtualHost&lt;/span&gt;&lt;span class="sr"&gt; *:80&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="nc"&gt;ServerAdmin&lt;/span&gt; webmaster@localhost
    &lt;span class="nc"&gt;DocumentRoot&lt;/span&gt; /var/www/html

    &lt;span class="nc"&gt;ErrorLog&lt;/span&gt; ${APACHE_LOG_DIR}/reverse-proxy-error.log
    &lt;span class="nc"&gt;CustomLog&lt;/span&gt; ${APACHE_LOG_DIR}/reverse-proxy-access.log combined

    &lt;span class="nc"&gt;ProxyRequests&lt;/span&gt; &lt;span class="ss"&gt;Off&lt;/span&gt;

    &lt;span class="c"&gt;# Reverse proxy to backend API&lt;/span&gt;
    &lt;span class="nc"&gt;ProxyPass&lt;/span&gt; /api/notes/ http://10.0.2.2:8000/
    &lt;span class="nc"&gt;ProxyPassReverse&lt;/span&gt; /api/notes/ http://10.0.2.2:8000/

    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;Proxy&lt;/span&gt;&lt;span class="sr"&gt; *&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="nc"&gt;Require&lt;/span&gt; ip 10.0.2.0/24
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;Proxy&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;VirtualHost&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;HTTPS (Port 443)&lt;/strong&gt; – full site access and secure reverse proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight apache"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;IfModule&lt;/span&gt;&lt;span class="sr"&gt; mod_ssl.c&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;VirtualHost&lt;/span&gt;&lt;span class="sr"&gt; *:443&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;    &lt;span class="nc"&gt;ServerAdmin&lt;/span&gt; webmaster@localhost
    &lt;span class="nc"&gt;ServerName&lt;/span&gt; yourdomain.com
    &lt;span class="nc"&gt;DocumentRoot&lt;/span&gt; /var/www/html

    &lt;span class="nc"&gt;ProxyRequests&lt;/span&gt; &lt;span class="ss"&gt;Off&lt;/span&gt;

    &lt;span class="c"&gt;# Reverse proxy to backend API (must come before frontend proxy)&lt;/span&gt;
    &lt;span class="nc"&gt;ProxyPass&lt;/span&gt; /api/ http://10.0.2.2:8000/
    &lt;span class="nc"&gt;ProxyPassReverse&lt;/span&gt; /api/ http://10.0.2.2:8000/

    &lt;span class="c"&gt;# Reverse proxy to Next.js frontend&lt;/span&gt;
    &lt;span class="nc"&gt;ProxyPass&lt;/span&gt; / http://localhost:3000/
    &lt;span class="nc"&gt;ProxyPassReverse&lt;/span&gt; / http://localhost:3000/

    &lt;span class="c"&gt;# Allow Certbot HTTP challenge&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nl"&gt;Location&lt;/span&gt;&lt;span class="sr"&gt; /.well-known/acme-challenge&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;        &lt;span class="nc"&gt;Require&lt;/span&gt; &lt;span class="ss"&gt;all&lt;/span&gt; granted
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;Location&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;
    &lt;span class="c"&gt;# SSL configuration (provided by Certbot)&lt;/span&gt;
    &lt;span class="nc"&gt;SSLEngine&lt;/span&gt; &lt;span class="ss"&gt;on&lt;/span&gt;
    &lt;span class="nc"&gt;SSLCertificateFile&lt;/span&gt; /etc/letsencrypt/live/yourdomain.com/fullchain.pem
    &lt;span class="nc"&gt;SSLCertificateKeyFile&lt;/span&gt; /etc/letsencrypt/live/yourdomain.com/privkey.pem
    &lt;span class="nc"&gt;Include&lt;/span&gt; /etc/letsencrypt/options-ssl-apache.conf

    &lt;span class="c"&gt;# Logging&lt;/span&gt;
    &lt;span class="nc"&gt;ErrorLog&lt;/span&gt; ${APACHE_LOG_DIR}/reverse-proxy-error.log
    &lt;span class="nc"&gt;CustomLog&lt;/span&gt; ${APACHE_LOG_DIR}/reverse-proxy-access.log combined
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;VirtualHost&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&amp;lt;/&lt;/span&gt;&lt;span class="nl"&gt;IfModule&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
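&lt;p&gt;It helps to see how &lt;code&gt;ProxyPass&lt;/code&gt; rewrites paths: the matched prefix is swapped for the backend URL, so &lt;code&gt;/api/notes/123&lt;/code&gt; reaches the backend as &lt;code&gt;/notes/123&lt;/code&gt;. The helper below is a hypothetical illustration of that mapping, not part of the Apache config:&lt;/p&gt;

```shell
# Hypothetical illustration of "ProxyPass /api/ http://10.0.2.2:8000/":
# the /api/ prefix of the incoming path is replaced with the backend root.
map_path() {
  local path="$1"
  echo "http://10.0.2.2:8000/${path#/api/}"
}

map_path /api/notes/123   # -> http://10.0.2.2:8000/notes/123
```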



&lt;h5&gt;
  
  
  Final Steps
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;Enable the new site config:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;a2ensite yourdomain.com.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="2"&gt;
&lt;li&gt;Disable the default config (optional but recommended):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;a2dissite 000-default.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Reload or restart Apache:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl reload apache2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  DNS Setup and Request Flow
&lt;/h5&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Go to your &lt;strong&gt;domain provider&lt;/strong&gt; (e.g., Namecheap, GoDaddy, etc.):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Add an &lt;strong&gt;A record&lt;/strong&gt; that points your domain (e.g., &lt;code&gt;yourdomain.com&lt;/code&gt;) to the &lt;strong&gt;public IP&lt;/strong&gt; of your public Linode.&lt;/li&gt;
&lt;li&gt;Wait for DNS propagation — this may take anywhere from a few minutes to a few hours depending on TTL settings.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;After DNS is live, the &lt;strong&gt;request flow&lt;/strong&gt; will look like this:&lt;br&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;

&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client Browser
    ↓
DNS Lookup (yourdomain.com resolves to public IP)
    ↓
Firewall (allows ports 80/443 to Apache)
    ↓
Apache Web Server (reverse proxy)
    ↓
    - /api/ → Private backend service via 10.0.2.2:8000
    - /     → Local frontend served from port 3000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt;Now visit &lt;code&gt;https://yourdomain.com&lt;/code&gt; — your frontend should load without the port, and API requests to &lt;code&gt;/api/notes&lt;/code&gt; should proxy correctly to the backend on the private node.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5reoypdv44fgibh51jr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5reoypdv44fgibh51jr5.png" alt=" " width="744" height="517"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ymn44esf3p0ptljmnq1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ymn44esf3p0ptljmnq1.png" alt=" " width="798" height="882"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Future Improvements and Deep Dives to Sharpen Your Understanding
&lt;/h3&gt;

&lt;p&gt;First off—if you’ve made it this far, take a moment to acknowledge what you’ve accomplished. You’ve not only provisioned infrastructure at the IaaS level, but also configured firewalls, private/public subnets, secure proxies, reverse routing, and a full-stack deployment—all from scratch. That’s huge.&lt;/p&gt;

&lt;p&gt;But this journey doesn’t end here. There’s so much more to explore, and none of it is out of reach.&lt;/p&gt;

&lt;h4&gt;
  
  
  Add More Services for Realistic Environments
&lt;/h4&gt;

&lt;p&gt;Now that you’ve successfully deployed a frontend, backend, and database, try introducing &lt;strong&gt;additional Linodes&lt;/strong&gt; to simulate multi-service architectures. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add a &lt;strong&gt;Redis node&lt;/strong&gt;, a &lt;strong&gt;message queue&lt;/strong&gt;, or a &lt;strong&gt;monitoring service&lt;/strong&gt; like Prometheus or Grafana.&lt;/li&gt;
&lt;li&gt;Observe how service-to-service communication happens over private IPs.&lt;/li&gt;
&lt;li&gt;Practice managing security and performance as your ecosystem grows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Learn About Load Balancing
&lt;/h4&gt;

&lt;p&gt;Load balancing is a cornerstone of high-availability systems.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Study how &lt;strong&gt;Apache&lt;/strong&gt; or &lt;strong&gt;Nginx&lt;/strong&gt; can distribute requests across multiple backend servers.&lt;/li&gt;
&lt;li&gt;Try simulating stress or high traffic to watch your load balancing strategy in action.&lt;/li&gt;
&lt;li&gt;Experiment with sticky sessions, round-robin, and IP-hash load balancing techniques.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is how production-scale infrastructure starts.&lt;/p&gt;

&lt;h4&gt;
  
  
  Rebuild the Proxy Setup in Nginx
&lt;/h4&gt;

&lt;p&gt;Everything you’ve configured with Apache can be reimagined in &lt;strong&gt;Nginx&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learn how to configure reverse proxies and forward proxies in Nginx.&lt;/li&gt;
&lt;li&gt;Explore advanced modules like &lt;code&gt;ngx_http_proxy_connect_module&lt;/code&gt; for forward proxying.&lt;/li&gt;
&lt;li&gt;Compare the verbosity, performance, and control between Apache and Nginx.&lt;/li&gt;
&lt;li&gt;You’ll appreciate Apache more—and gain a new appreciation for Nginx too.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’re not starting from zero anymore. You now have mental models to guide you.&lt;/p&gt;

&lt;h4&gt;
  
  
  Prepare for Production-Like Workflows
&lt;/h4&gt;

&lt;p&gt;Imagine if this app had users. Or stakeholders. Or deadlines. What would you automate?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Practice triggering deployments from GitHub via &lt;strong&gt;webhooks&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Explore &lt;strong&gt;CI/CD pipelines&lt;/strong&gt; with tools like &lt;strong&gt;GitHub Actions&lt;/strong&gt;, &lt;strong&gt;ArgoCD&lt;/strong&gt;, or &lt;strong&gt;Terraform&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Think about container orchestration and start reading up on &lt;strong&gt;Kubernetes&lt;/strong&gt; or &lt;strong&gt;Nomad&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Look into secrets management, versioned configuration, or observability tooling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are things you’ll naturally grow into—and now you know where to start.&lt;/p&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;This wasn’t just a tutorial. It was a walk down the forgotten path of &lt;strong&gt;technical self-reliance&lt;/strong&gt;—the kind that builds confidence, clarity, and curiosity.&lt;/p&gt;

&lt;p&gt;Yes, modern SaaS and PaaS platforms are convenient—but they abstract away the very systems we’re responsible for. Sometimes, by getting your hands dirty and walking a little closer to the metal, you reclaim something powerful: your understanding.&lt;/p&gt;

&lt;p&gt;So, keep tinkering. Keep asking questions. Keep exploring.&lt;/p&gt;

&lt;p&gt;You don’t need a million-dollar cloud budget to learn this stuff.&lt;br&gt;
You just need a $5 Linode, some grit, and a healthy dose of curiosity.&lt;/p&gt;

&lt;p&gt;You’ve got this.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building practical workflows: data observability, AI trend analysis &amp; proactivity.</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Sun, 29 Jun 2025 10:15:55 +0000</pubDate>
      <link>https://dev.to/joojodontoh/building-practical-workflows-data-observability-ai-trend-analysis-proactivity-3odi</link>
      <guid>https://dev.to/joojodontoh/building-practical-workflows-data-observability-ai-trend-analysis-proactivity-3odi</guid>
      <description>&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Today I want to talk about data observability and a few related ideas. My team and I have continuously worked on this part of our workflow, and I would like to share some of what we have learned. To understand data, one must first recognize that it is not merely an output, but a reflection of the logic that has been executed across systems. Every user action, system response, or triggered event leaves behind a residue in the form of recorded information. Rather than existing in abstraction, data carries the imprint of the logic that shaped it, effectively turning each row, record, or object into a timestamped decision made by code.&lt;/p&gt;

&lt;p&gt;But working with data isn’t always smooth. Problems often arise when different systems store the same data differently, leading to confusion about which version is correct. Logic applied inconsistently across services creates more gaps. Sometimes data is left without clear ownership, making it hard to maintain. As systems grow, understanding what the data means — and how to work with it — becomes harder. And when teams rely on external tools or manual steps to combine or process data, the risk of mistakes increases.&lt;/p&gt;

&lt;p&gt;These issues highlight why data observability matters. Without it, teams can’t easily tell where problems come from or whether their data can be trusted. Observability gives clarity. It helps teams understand how data flows, where it breaks, and how to fix it before it becomes a bigger issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Data Observability?
&lt;/h3&gt;

&lt;p&gt;Data observability is the ability to monitor the health and behavior of data as it moves through a system. At its core, it ensures that data is accurate, consistent, and reliable. This means spotting when data is missing, outdated, duplicated, or corrupted — and knowing where and why it happened.&lt;/p&gt;

&lt;p&gt;With strong observability in place, teams can quickly detect issues and trace them back to the root cause. For example, if a report shows incorrect numbers, observability makes it easier to see whether the issue came from a failed data load, a logic error, or a stale source. Instead of guessing, teams can investigate with confidence and resolve problems faster.&lt;/p&gt;

&lt;p&gt;Beyond fixing errors, data observability plays a key role in decision-making. When teams trust the data, they can make faster and more informed choices. From refining product strategies to debugging subscription flows or interpreting performance metrics, good data leads to better outcomes and observability is what makes that trust possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Data Flow
&lt;/h3&gt;

&lt;p&gt;To practice data observability effectively, a team must first understand how data flows through their system. This means tracking how data is created, how it changes over time, and where it ends up. Without this awareness, it’s difficult to catch issues or explain unexpected results. &lt;/p&gt;

&lt;p&gt;Every piece of data goes through different &lt;strong&gt;states&lt;/strong&gt;. For example, a subscription might start in a &lt;code&gt;PENDING_ACTIVATION&lt;/code&gt; state, move to &lt;code&gt;ACTIVE&lt;/code&gt;, and eventually become &lt;code&gt;EXPIRED&lt;/code&gt; or &lt;code&gt;CANCELLED&lt;/code&gt;. Each of these states has a meaning tied to business logic. &lt;code&gt;PENDING_ACTIVATION&lt;/code&gt; might signal that a user has initiated a subscription but hasn’t yet activated it. &lt;code&gt;EXPIRED&lt;/code&gt; could mean the subscription ended naturally, while &lt;code&gt;CANCELLED&lt;/code&gt; might indicate a user-initiated termination or a system-triggered rollback.&lt;/p&gt;

&lt;p&gt;It’s also important to define how long data is expected to stay in each state. A record stuck in &lt;code&gt;PENDING_ACTIVATION&lt;/code&gt; for more than 24 hours might be a red flag. Without defined time windows, teams won’t know when data is stale or whether something has failed silently.&lt;/p&gt;
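<p>As a rough sketch (the field names and the 24-hour window here are illustrative, not our production schema), a staleness check over subscription records could look like this:</p>

```python
# Timestamps are epoch milliseconds, matching the change-history format used below.
MAX_PENDING_MS = 24 * 60 * 60 * 1000  # illustrative 24-hour window

def find_stale(records, now_ms, max_age_ms=MAX_PENDING_MS):
    """Return records stuck in PENDING_ACTIVATION longer than the allowed window."""
    return [
        r for r in records
        if r["subscriptionStatus"] == "PENDING_ACTIVATION"
        and now_ms - r["updatedAt"] > max_age_ms
    ]

records = [
    {"subscriptionId": "a", "subscriptionStatus": "PENDING_ACTIVATION", "updatedAt": 0},
    {"subscriptionId": "b", "subscriptionStatus": "ACTIVE", "updatedAt": 0},
]
stale = find_stale(records, now_ms=MAX_PENDING_MS + 1)
```

A check like this only works once the time window per state is explicitly defined; without that definition, "stale" has no meaning.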

&lt;p&gt;Equally critical is understanding the &lt;strong&gt;transition routes&lt;/strong&gt; — how data moves from one state to another. Tracking these transitions creates transparency and accountability. The best way to do this is through &lt;strong&gt;change history&lt;/strong&gt;. A well-structured change history logs not just what changed, but also metadata around the change: who or what triggered it, when it happened, and why. The ideal structure includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Metadata&lt;/strong&gt; (timestamp, source, actor),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Old Data&lt;/strong&gt; (previous state or value),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New Data&lt;/strong&gt; (the updated state or value).
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {
    "metadata": {
      "partnerSubscriptionId": "XXXXXXXXXXXXXX",
      "subscriptionEndDate": 1749200000000,
      "smc": "*********",
      "hhid": "*********",
      "salesChannel": "CHANNEL_X",
      "packId": "generic-pack-id",
      "createdAt": 1746000000000,
      "partner": "PARTNER_X",
      "assetId": "GenericAsset",
      "client": "INTERNAL_SYSTEM",
      "subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "subscriptionStartDate": 1746000000000,
      "userProductSubscriptionId": "XXXXXXXXXXXXXX"
    },
    "hhid": "XXXXXXXX",
    "operation": "MODIFY",
    "puid": "anonymous",
    "loggedAt": 1749200000000,
    "accountId": "anonymous",
    "subscriptionId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "table": "subscription_table",
    "smc": "XXXXXXXXXXXX",
    "apId": "anonymous",
    "oldData": {
      "subscriptionStatus": "ACTIVE",
      "autoRenew": true,
      "updatedAt": 1746000000000
    },
    "subscriptionStatus": "EXPIRED",
    "assetId": "GenericAsset",
    "partner": "PARTNER_X",
    "SK": "CHANGE#1749200000000",
    "newData": {
      "subscriptionStatus": "EXPIRED",
      "autoRenew": false,
      "updatedAt": 1749200000000
    },
    "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "packId": "generic-pack-monthly"
  }
]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm81mx8rwg9z6m1d2rot.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdm81mx8rwg9z6m1d2rot.JPG" alt="Change history" width="800" height="318"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this level of tracking, teams gain a clear view into the life cycle of any data point, making root cause analysis, debugging, and auditing significantly easier.&lt;/p&gt;
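<p>One way to make transition routes enforceable is to encode them as an allowed-transition map and scan the change history's <code>oldData</code>/<code>newData</code> pairs against it. The states below match the lifecycle described earlier; the map itself is a hypothetical example:</p>

```python
# Hypothetical allowed-transition map for the subscription states described above.
ALLOWED = {
    "PENDING_ACTIVATION": {"ACTIVE", "CANCELLED"},
    "ACTIVE": {"EXPIRED", "CANCELLED"},
    "EXPIRED": set(),
    "CANCELLED": set(),
}

def invalid_transitions(history):
    """Scan change-history entries for transitions the state machine does not permit."""
    bad = []
    for change in history:
        old = change["oldData"]["subscriptionStatus"]
        new = change["newData"]["subscriptionStatus"]
        if new not in ALLOWED.get(old, set()):
            bad.append((old, new))
    return bad

history = [
    {"oldData": {"subscriptionStatus": "ACTIVE"},
     "newData": {"subscriptionStatus": "EXPIRED"}},   # permitted
    {"oldData": {"subscriptionStatus": "EXPIRED"},
     "newData": {"subscriptionStatus": "ACTIVE"}},    # not permitted
]
```

Running this against a well-structured change history turns "transparency and accountability" into a concrete, automatable audit.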

&lt;h3&gt;
  
  
  Consolidation and Aggregation
&lt;/h3&gt;

&lt;p&gt;Healthy data starts with the ability to see the full picture — not just fragments scattered across systems. In modern architectures, it's common for information about a single entity to live in multiple datastores, maintained by different services. Without aggregation, each piece of data remains incomplete, and insights drawn from them are at best limited, at worst misleading.&lt;/p&gt;

&lt;p&gt;To make data useful, teams must &lt;strong&gt;consolidate it across both internal and external sources&lt;/strong&gt;. This requires a clear understanding of actor profiles, which are, in effect, a logical grouping of all relevant data tied to a single subject, such as a user, account, or device. Without this profile view, systems remain reactive and siloed. Nobody wants silos.&lt;/p&gt;

&lt;p&gt;In our case, aggregation is performed &lt;strong&gt;on demand&lt;/strong&gt; in specific contexts. For example, when we retrieve information tied to a user's smart card, we gather data from several sources at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription data&lt;/strong&gt;, stored internally and reflecting the user's current and historical subscriptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entitlement data&lt;/strong&gt;, calculated dynamically through CRM logic that applies partner-specific rules, eligibility criteria, and service configurations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partner-related data&lt;/strong&gt;, which may be sourced externally and used to contextualize how the user interacts with third-party services.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dob2ztun5wuwycwluc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2dob2ztun5wuwycwluc2.png" alt=" " width="800" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each of these datasets plays a role in shaping the full state of the user. Without aggregation, teams would have to manually stitch these pieces together, which is too slow and fragile to support modern operations.&lt;/p&gt;
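<p>The on-demand aggregation described above can be sketched as concurrent fetches joined into one actor profile. The fetchers here are stubs standing in for the real subscription store, CRM entitlement logic, and partner source:</p>

```python
import asyncio

# Stub fetchers standing in for real datastore / CRM / partner calls.
async def fetch_subscriptions(smc):
    return [{"packId": "generic-pack-id", "status": "ACTIVE"}]

async def fetch_entitlements(smc):
    return ["ENTITLEMENT_A"]

async def fetch_partner_data(smc):
    return {"partner": "PARTNER_X"}

async def build_actor_profile(smc):
    """Aggregate everything tied to one smart card concurrently, on demand."""
    subs, ents, partner = await asyncio.gather(
        fetch_subscriptions(smc), fetch_entitlements(smc), fetch_partner_data(smc)
    )
    return {"smc": smc, "subscriptions": subs, "entitlements": ents, "partner": partner}

profile = asyncio.run(build_actor_profile("XXXX"))
```

Fetching the three sources concurrently keeps the aggregated view responsive even when one upstream is slow.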

&lt;p&gt;Additionally, &lt;strong&gt;system configurational data&lt;/strong&gt; — feature flags, environment-specific settings, or service-level parameters — must be easy to access and interpret. When teams have quick visibility into the configuration that shaped a data point, debugging becomes faster and business behavior easier to explain. It’s not enough to track the data alone; we must also track the rules that govern how that data behaves.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Data Freshness and Completeness
&lt;/h3&gt;

&lt;p&gt;Healthy data must not only be accurate — it must also be timely and complete. This is especially true in environments where data is sourced from multiple partners and systems. For our team, &lt;strong&gt;freshness&lt;/strong&gt; refers to how recently the data was updated and how reliably it reflects the current state of CRM activity (our source of truth). When working with external integrations, it’s important to recognize that not all systems operate in real time, so decisions must be made about how often data is fetched, transformed, and loaded.&lt;/p&gt;

&lt;p&gt;These decisions directly tie into &lt;strong&gt;ETL pipeline design&lt;/strong&gt;. Striking the right balance between &lt;strong&gt;data consistency and performance&lt;/strong&gt; is key. Trying to always stay perfectly in sync with every upstream system can create unnecessary load or latency. On the other hand, overly infrequent updates can make the data stale and unusable.&lt;/p&gt;

&lt;p&gt;To address this, our ETL pipelines are designed to scale both &lt;strong&gt;logically and operationally&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extraction&lt;/strong&gt; is often handled asynchronously by standalone services or serverless functions with dedicated resources. Since this stage can be resource-intensive, especially when pulling from partner APIs or scanning large internal datasets, it’s decoupled from real-time workflows. This decoupling is important for maintaining the availability and durability of your real-time workflows. The &lt;strong&gt;frequency of extraction&lt;/strong&gt; is tuned to the freshness requirement of each data source. Some may be pulled hourly, others daily, depending on how critical and volatile the data is.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Transformation&lt;/strong&gt; covers computations ranging from simple mapping to statistical aggregations like totals, averages, and distributions. In our system, partner-specific subscription data is transformed concurrently, using context-aware processing that segments workloads by partner to avoid bottlenecks. Depending on complexity and resource cost, these transformations either happen within the extraction step or are delegated to separate transformation functions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Load&lt;/strong&gt; is the final stage, where the processed data is stored. For most of our needs, a single structured &lt;strong&gt;JSON file stored in S3&lt;/strong&gt; suffices, given that the data is precalculated and intended for read-heavy use cases. To improve performance, we place a &lt;strong&gt;read-through cache&lt;/strong&gt; in front of this storage, allowing downstream consumers to access the latest data quickly. Whenever the load process completes, the cache is also cleared and refreshed, ensuring consistency between stored data and what consumers read.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
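<p>To illustrate the freshness trade-off in the extraction stage, scheduling can be reduced to a per-source interval check. The sources and intervals below are hypothetical stand-ins for real configuration:</p>

```python
from datetime import datetime, timedelta

# Hypothetical per-source freshness requirements, not our real config.
EXTRACTION_INTERVALS = {
    "partner_api": timedelta(hours=1),   # volatile data, pulled hourly
    "internal_scan": timedelta(days=1),  # stable data, pulled daily
}

def is_due(source, last_run, now):
    """Decide whether an extraction job should run again,
    based on that source's freshness requirement."""
    return now - last_run >= EXTRACTION_INTERVALS[source]

now = datetime(2025, 1, 2, 12, 0)
```

Tuning these intervals per source is how the pipeline avoids both unnecessary upstream load and stale data.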

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38gfl8rip0cp9jsdpcj4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38gfl8rip0cp9jsdpcj4.png" alt=" " width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Completeness is another pillar of data health. Often, the focus is on sanitizing data at entry points — validating input, enforcing schemas, ensuring type safety. But &lt;strong&gt;sanitization during retrieval is just as important&lt;/strong&gt;, especially in systems where manual edits, migrations, or external syncs might have bypassed initial validation. We treat both entry and exit as critical points for enforcing data standards, catching missing attributes, and preserving structural integrity.&lt;/p&gt;
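<p>A minimal sketch of sanitization at the retrieval (exit) point, assuming a small illustrative set of required fields:</p>

```python
# Illustrative required attributes; a real schema would be richer.
REQUIRED_FIELDS = {"subscriptionId", "subscriptionStatus", "updatedAt"}

def sanitize_on_read(record):
    """Validate a record at the exit point: reject incomplete data
    instead of passing it silently downstream."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"incomplete record, missing: {sorted(missing)}")
    return record
```

Running the same check on read as on write catches records that manual edits, migrations, or external syncs slipped past entry validation.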

&lt;p&gt;Without freshness, data becomes misleading. Without completeness, it becomes fragile. Observability into both helps teams ensure that what they’re seeing is both recent and whole.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating Avenues to View Data Health
&lt;/h3&gt;

&lt;p&gt;Observability is only useful when data health can be inspected both broadly and in context. For a team to react quickly to issues, understand root causes, or maintain confidence in their systems, there must be clear and accessible ways to monitor how data is behaving over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Viewing data health can happen at two levels&lt;/strong&gt;: system-wide or context-specific. A broad system view might highlight trends, such as an increase in failed data loads or a drop in expected event volumes. But often, the most meaningful insights come from zooming into a specific &lt;strong&gt;actor&lt;/strong&gt; or &lt;strong&gt;data entity&lt;/strong&gt; — seeing what happened, when, and why.&lt;/p&gt;

&lt;p&gt;In our systems, this context-based observability takes on many forms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User subscriptions&lt;/strong&gt; are a core entity we track. Each subscription carries a lifecycle — from activation to renewal to expiration — and understanding the health of this data involves checking if transitions occurred as expected, if timestamps align, and if associated metadata (like auto-renew flags or entitlement links) are correct and intact. If a subscription appears stuck or missing key attributes, it may indicate a broader system issue.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scheduled actions&lt;/strong&gt; are another important context. These are time-driven operations like renewals, cancellations, or retries. To understand their health, we support queries across time windows and statuses — such as identifying actions that were &lt;code&gt;QUEUED&lt;/code&gt; but never &lt;code&gt;EXECUTED&lt;/code&gt;, or that failed unexpectedly. Being able to slice this by partner, product, or status allows teams to quickly isolate patterns and respond.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
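<p>The time-window and status queries described for scheduled actions can be sketched as follows (field names are illustrative):</p>

```python
def stuck_actions(actions, window_end):
    """Find actions that entered the queue before the window end
    but were never executed."""
    return [
        a for a in actions
        if a["status"] == "QUEUED" and a["queuedAt"] <= window_end
    ]

def slice_by(actions, key):
    """Group actions by partner, product, or status to isolate patterns."""
    groups = {}
    for a in actions:
        groups.setdefault(a[key], []).append(a)
    return groups

actions = [
    {"id": 1, "status": "QUEUED", "queuedAt": 100, "partner": "PARTNER_X"},
    {"id": 2, "status": "EXECUTED", "queuedAt": 100, "partner": "PARTNER_X"},
    {"id": 3, "status": "QUEUED", "queuedAt": 500, "partner": "PARTNER_Y"},
]
```

The same two primitives, a window filter and a group-by, cover most of the "what got stuck, and where" questions teams ask of scheduled actions.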

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgprn316nype3sncbd5x9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgprn316nype3sncbd5x9.png" alt=" " width="800" height="107"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9ofcvnuwiia3zxs90vl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9ofcvnuwiia3zxs90vl.png" alt=" " width="800" height="177"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymgpc90i4svfw2tqu1xo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymgpc90i4svfw2tqu1xo.png" alt=" " width="800" height="183"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Partner events&lt;/strong&gt;, which are signals from external systems, form another layer of contextual health. These events might indicate that a user has activated a service, consumed content, or encountered an error. We monitor if these events are received, verify that they’re parsed accurately, and ensure downstream systems respond as intended. When expected events go missing or arrive malformed, it becomes a signal that something upstream may be broken.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07zbiddd7egqwcj3cefr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F07zbiddd7egqwcj3cefr.png" alt=" " width="800" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By building these contextual views, teams gain the ability not just to observe data but to &lt;strong&gt;understand&lt;/strong&gt; it. Whether investigating a single issue or analyzing long-term trends, these views into data health are what separate reactive problem-solving from proactive improvement.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating Avenues for Meaningful Data Transition, Extraction, and Storytelling
&lt;/h3&gt;

&lt;p&gt;Raw data, no matter how accurate or complete, becomes significantly more valuable when teams can &lt;strong&gt;visualize, interpret, and communicate&lt;/strong&gt; its meaning. Data storytelling transforms numbers and transitions into narratives that drive understanding, alignment, and action — especially for non-technical stakeholders.&lt;/p&gt;

&lt;p&gt;We start with &lt;strong&gt;visualization&lt;/strong&gt;, which is often the most immediate way to surface meaning. &lt;strong&gt;Charts&lt;/strong&gt; help display trends, distributions, and anomalies in a digestible format — whether it's a spike in subscription failures or a dip in partner event delivery. When paired with &lt;strong&gt;color-coded statuses&lt;/strong&gt;, these visualizations can immediately highlight the state of a dataset or flow — for instance, using green for COMPLETED, yellow for PENDING, and red for FAILED — without requiring users to parse detailed logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtc6ki0smqwkv4ly13e2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtc6ki0smqwkv4ly13e2.png" alt=" " width="800" height="115"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4056dyyrhgrzi2jhu9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw4056dyyrhgrzi2jhu9n.png" alt=" " width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ahiis1f81fksokg58gm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ahiis1f81fksokg58gm.png" alt=" " width="800" height="229"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond visuals, we invest heavily in &lt;strong&gt;AI-generated summaries&lt;/strong&gt; to bridge the gap between raw data and human decision-making. Our team uses &lt;strong&gt;in-house agents&lt;/strong&gt; to generate summaries at different levels of granularity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;user actors&lt;/strong&gt;, the agent produces insights such as subscription health, recent failures, entitlement mismatches, or eligibility violations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9o4aj84s8165k3o8ligt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9o4aj84s8165k3o8ligt.png" alt=" " width="800" height="687"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;partners&lt;/strong&gt;, another agent compiles metrics and patterns into &lt;strong&gt;periodic reports and strategic recommendations&lt;/strong&gt;, covering usage, errors, and integration health.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo89biwx7tppv48yds3p8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo89biwx7tppv48yds3p8.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're actively exploring ways to &lt;strong&gt;enhance these summaries with memory and context&lt;/strong&gt;. One improvement involves converting generated summaries into &lt;strong&gt;embeddings&lt;/strong&gt; using NLP (Natural Language Processing) techniques and storing them as &lt;strong&gt;vectors&lt;/strong&gt;. Then, on the next analysis request, the agent could convert the new prompt into an embedding, retrieve the &lt;strong&gt;five most similar historical summaries&lt;/strong&gt;, and enrich the prompt with this context. This approach helps produce better, more informed summaries that evolve over time.&lt;/p&gt;
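<p>A minimal sketch of that retrieval step, using toy two-dimensional vectors and plain cosine similarity in place of a real embedding model:</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, stored, k=5):
    """Return the k stored summaries most similar to the query embedding."""
    ranked = sorted(stored, key=lambda s: cosine(query_vec, s["vector"]), reverse=True)
    return [s["summary"] for s in ranked[:k]]

# Toy stored summaries with hand-made "embeddings".
stored = [
    {"summary": "renewals spiked", "vector": [1.0, 0.0]},
    {"summary": "errors dropped", "vector": [0.0, 1.0]},
    {"summary": "renewals steady", "vector": [0.9, 0.1]},
]
```

In practice the vectors would come from an embedding model and live in a vector store, but the retrieve-then-enrich loop is exactly this ranking step.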

&lt;p&gt;These generated insights are often used in &lt;strong&gt;non-technical decision-making&lt;/strong&gt;, from partner relationship discussions to strategic roadmap planning. For this reason, we also support &lt;strong&gt;easy export options&lt;/strong&gt; — allowing summaries to be copied directly or downloaded as &lt;strong&gt;PDFs&lt;/strong&gt; for distribution in reports and presentations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tp0pntr4bhel4bkd2ki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5tp0pntr4bhel4bkd2ki.png" alt=" " width="800" height="162"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To maintain performance and availability, especially under repeated or automated usage, these agents are backed by &lt;strong&gt;read-through caches&lt;/strong&gt;. This prevents overloading the AI systems, reduces latency for frequent queries, and ensures consistency in outputs for the same context.&lt;/p&gt;
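<p>A read-through cache of this kind can be sketched in a few lines; the real one sits in front of the AI agents rather than an in-memory dict, but the access pattern is the same:</p>

```python
class ReadThroughCache:
    """Minimal read-through cache: serve hits locally, call the loader on a
    miss, and clear on refresh so consumers always see the latest data."""

    def __init__(self, loader):
        self.loader = loader  # fallback called only on a cache miss
        self.store = {}
        self.misses = 0

    def get(self, key):
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.loader(key)
        return self.store[key]

    def invalidate(self):
        """Clear cached entries, e.g. after a load or regeneration completes."""
        self.store.clear()
```

Repeated queries for the same context then hit the cache instead of the model, which is what keeps latency low and outputs consistent.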

&lt;p&gt;Ultimately, storytelling is what allows technical data to influence real-world outcomes. By creating tools that present, explain, and share data meaningfully, we ensure it has the power to inform and guide decisions at every level of the organization.&lt;/p&gt;

&lt;h3&gt;
  
  
  Proactive Data Issue Resolution
&lt;/h3&gt;

&lt;p&gt;While observability helps teams monitor and understand data, the real advantage comes when those insights are used to resolve issues &lt;strong&gt;before&lt;/strong&gt; they escalate. Proactive data resolution means building systems that not only detect anomalies but also guide, automate, or trigger corrective actions across the stack.&lt;/p&gt;

&lt;p&gt;The first step involves &lt;strong&gt;static logical analysis&lt;/strong&gt;, which scans data against clearly defined rules to identify &lt;strong&gt;structured violations&lt;/strong&gt;. These are issues that can be caught with deterministic checks — for example, a subscription marked &lt;code&gt;ACTIVE&lt;/code&gt; but missing a &lt;code&gt;startDate&lt;/code&gt;, or an entitlement with invalid configuration. These checks are currently run on demand during subscription data aggregation. In the future we will automate them to run regularly and help catch data that’s in a broken but detectable state.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet6b2cwew0r0206ownls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fet6b2cwew0r0206ownls.png" alt=" " width="800" height="264"&gt;&lt;/a&gt;&lt;/p&gt;
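<p>A sketch of how such deterministic checks might be expressed as a rule table (the rule names and fields are illustrative, not our actual rules):</p>

```python
# Hypothetical deterministic checks; the real rules live in our aggregation service.
RULES = [
    ("active_missing_start", lambda s: s.get("subscriptionStatus") == "ACTIVE"
        and not s.get("startDate")),
    ("expired_autorenew_on", lambda s: s.get("subscriptionStatus") == "EXPIRED"
        and s.get("autoRenew") is True),
]

def structured_violations(subscription):
    """Run every static rule against a record and return the names of those that fire."""
    return [name for name, rule in RULES if rule(subscription)]
```

Keeping the rules in a table like this makes them easy to extend, and easy to schedule once the checks move from on-demand to regular runs.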

&lt;p&gt;More complex problems — especially those involving pattern recognition or inconsistent data relationships — require &lt;strong&gt;AI-driven suggestions&lt;/strong&gt;. These AI agents help identify &lt;strong&gt;unstructured violations&lt;/strong&gt;, such as unexpected spikes in cancellations, or subtle mismatches between entitlements and partner rules. These suggestions are governed by &lt;strong&gt;configurable guardrails&lt;/strong&gt; to ensure they stay within bounds that are understandable and controllable by the team. On the backend, we track &lt;strong&gt;prompt consumption&lt;/strong&gt;, not just for logging and debugging, but to safeguard against misuse, hallucination, or context drift that could compromise model accuracy or security.&lt;/p&gt;

&lt;p&gt;Once a violation is detected, either through a rule or an AI-generated suggestion, resolution must be actionable. That’s why we’ve built &lt;strong&gt;agents that integrate directly with task management tools like Jira&lt;/strong&gt;. When an issue is confirmed, these agents can suggest Jira tickets with full context: the data in question, violation type, metadata, and a recommended fix path. This shortens the cycle between detection and accountability, allowing issues to be tracked and resolved in standard engineering workflows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1rejong2v19z2ayke8l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1rejong2v19z2ayke8l.png" alt=" " width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy24e1cldegesxq93xzi6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy24e1cldegesxq93xzi6.png" alt=" " width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another key pillar of proactive resolution is maintaining &lt;strong&gt;synchronization between related datastores&lt;/strong&gt;. In systems where multiple services maintain different views of the same data, desyncs are inevitable. To address this, we intend to use both &lt;strong&gt;manual and automated sync pipelines&lt;/strong&gt;. Some pipelines would run on a schedule to reconcile mismatches, while others can be triggered ad hoc when a drift is manually detected. These processes ensure consistency without requiring constant developer intervention.&lt;/p&gt;
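<p>A reconciliation pass of this sort boils down to a keyed diff between the two stores. This sketch compares only a status field; a real pipeline would compare full records and feed the drifted ids into a sync job:</p>

```python
def find_drift(store_a, store_b):
    """Compare two views of the same records (keyed by id) and report ids
    that are missing on one side or disagree on status."""
    drift = []
    for key in store_a.keys() | store_b.keys():
        a, b = store_a.get(key), store_b.get(key)
        if a is None or b is None or a["status"] != b["status"]:
            drift.append(key)
    return sorted(drift)

store_a = {"s1": {"status": "ACTIVE"}, "s2": {"status": "EXPIRED"}}
store_b = {"s1": {"status": "ACTIVE"}, "s2": {"status": "ACTIVE"},
           "s3": {"status": "ACTIVE"}}
```

The same diff works whether it runs on a schedule or is triggered ad hoc when someone spots a drift.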

&lt;p&gt;Proactivity in data management isn’t just about building alerts — it’s about designing flows that detect, explain, and repair issues at the right level of automation. The result is a system that doesn’t just observe its state, but works to maintain its integrity in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Data Observability Has Helped My Team
&lt;/h3&gt;

&lt;p&gt;Adopting data observability has had a transformative effect on how our team operates. What once required manual digging, cross-referencing, and tribal knowledge can now be done quickly, visually, and with far more confidence. The biggest shift has been in &lt;strong&gt;data surveillance&lt;/strong&gt; — we now have a clear, consistent view of how data moves and behaves throughout our systems.&lt;/p&gt;

&lt;p&gt;One of the most immediate benefits is how &lt;strong&gt;easily team members can understand user data profiles&lt;/strong&gt;. A developer debugging a flow, a QA tester verifying a fix, or a product owner validating a new rule can all inspect the full data picture for any given user without jumping across dashboards or databases. This has made &lt;strong&gt;behavioral patterns more traceable&lt;/strong&gt;, allowing us to detect anomalies like missing activations, frequent subscription failures, or inconsistent entitlement states.&lt;/p&gt;

&lt;p&gt;Data gaps such as missing fields, incomplete transitions, or failed triggers now surface visibly, making them easy to &lt;strong&gt;flag and investigate early&lt;/strong&gt;. This has greatly improved our &lt;strong&gt;QA workflow&lt;/strong&gt;, as testers no longer need to manually reconstruct test cases from fragmented logs. Instead, they can validate entire flows from a central point of visibility. Even during &lt;strong&gt;UAT&lt;/strong&gt;, stakeholders can observe how data responds across environments with a &lt;strong&gt;bird’s eye view&lt;/strong&gt;, reducing ambiguity and speeding up feedback cycles. My team is also in the process of extending dashboard access to first responders and customer service, which will greatly boost their ability to help customers.&lt;/p&gt;

&lt;p&gt;Beyond day-to-day operations, observability has helped with &lt;strong&gt;tracking one-time activities&lt;/strong&gt;, such as &lt;strong&gt;bulk email campaigns&lt;/strong&gt; or &lt;strong&gt;pre-provisioning subscription entitlements&lt;/strong&gt;. These kinds of scheduled jobs are notoriously hard to monitor without proper instrumentation. With observability in place, we can now monitor their execution, volume, and any edge-case failures without writing one-off scripts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0ltijfpistme3709013.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0ltijfpistme3709013.png" alt=" " width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, observability has created a direct line of visibility for leadership. &lt;strong&gt;High-level statistics&lt;/strong&gt;, such as total active subscriptions, partner-triggered event rates, or renewal success ratios, are now exposed through curated summaries and dashboards. This allows &lt;strong&gt;management to make decisions based on data&lt;/strong&gt;, without relying on delayed reports or engineering cycles to extract insights.&lt;/p&gt;

&lt;p&gt;In short, observability hasn’t just improved how we handle data — it’s improved how the entire team communicates, collaborates, and aligns around it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Future Improvements in Terms of AI
&lt;/h3&gt;

&lt;p&gt;As our use of data observability matures, we’re looking to expand our AI capabilities with a sharper focus on context-awareness, scalability, and practical integration across teams. One of our main objectives is to maintain a dedicated &lt;strong&gt;Small Language Model (SLM)&lt;/strong&gt; that is trained on internal systems, workflows, and vocabulary. This SLM would act as a lightweight, focused assistant — optimized for our operational context — and &lt;strong&gt;continuously refined&lt;/strong&gt; by internal AI teams to stay aligned with evolving business needs.&lt;/p&gt;

&lt;p&gt;A deeper understanding of &lt;strong&gt;model management&lt;/strong&gt; will be essential. Beyond just deploying models, we’re considering a foundation for version control, prompt governance, feedback loops, and performance evaluation in real-world scenarios. The goal is to ensure that the models we rely on not only produce accurate results but also reflect the nuances of our environment and workflows.&lt;/p&gt;

&lt;p&gt;We also plan to &lt;strong&gt;extend decision-making workflows through AI&lt;/strong&gt;. This could include suggesting data fixes, detecting and prioritizing anomalies, and recommending operational actions based on historical patterns. These automations wouldn’t replace human decisions, but rather &lt;strong&gt;amplify the speed and quality of those decisions&lt;/strong&gt;, especially in high-volume or high-pressure contexts.&lt;/p&gt;

&lt;p&gt;Finally, one of the most exciting frontiers is connecting AI to &lt;strong&gt;team priorities and planning&lt;/strong&gt;. We envision tools that can monitor workstreams, identify friction points, and &lt;strong&gt;suggest roadmap improvements&lt;/strong&gt; based on observed data and recurring pain points. By highlighting areas ripe for automation or surfacing issues that consistently slow down delivery, AI can play a role in shaping strategy, not just supporting it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Data is no longer just an output of system behavior; it is the foundation on which modern decisions, automation, and user experiences are built. As systems scale and complexity grows, it becomes crucial not only to collect data but to observe it meaningfully. &lt;strong&gt;Data observability ensures that data is complete, accurate, and timely&lt;/strong&gt;, enabling teams to debug faster, monitor more effectively, and act with confidence.&lt;/p&gt;

&lt;p&gt;But observability is only one side of the equation. The other is &lt;strong&gt;intelligence&lt;/strong&gt; — and this is where AI comes in. By summarizing, interpreting, and recommending actions based on observed data, AI allows teams to move from passive awareness to proactive resolution and strategic foresight. Whether through summaries tailored to users and partners or through workflow-integrated agents that assist with decision-making, AI transforms observability from a monitoring tool into a driver of improvement.&lt;/p&gt;

&lt;p&gt;Together, data observability and AI form a powerful loop: observability provides the clarity needed to understand the system, while AI brings the intelligence needed to optimize it. The future lies in continuously refining both — building systems that not only see clearly, but think ahead.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Improving Deployment Velocity: How We Rebuilt for Speed and Sustainability</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Thu, 22 May 2025 09:53:06 +0000</pubDate>
      <link>https://dev.to/joojodontoh/improving-deployment-velocity-how-we-rebuilt-for-speed-and-sustainability-7on</link>
      <guid>https://dev.to/joojodontoh/improving-deployment-velocity-how-we-rebuilt-for-speed-and-sustainability-7on</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;When we talk about engineering performance, &lt;strong&gt;deployment velocity&lt;/strong&gt; may be one of the clearest indicators of how effectively a team delivers software. At its core, deployment velocity measures how often code changes are pushed to production. It reflects a team's ability to &lt;strong&gt;move fast without breaking things often&lt;/strong&gt;, respond to change, and continuously improve. High deployment velocity means features, fixes, and improvements reach consumers more quickly, which directly benefits product delivery. For engineers, it creates a healthy rhythm of execution and feedback. It reduces the pressure of large, infrequent releases and gives developers a sense of momentum and progress. When velocity is high and sustainable, it usually points to a team that’s well-organized, technically sound, and empowered to ship confidently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracking What Matters: Deployment as a Reflection of Team Growth
&lt;/h2&gt;

&lt;p&gt;One of the clearest signs of progress we’ve made as a team has been the improvement in our &lt;strong&gt;deployment velocity&lt;/strong&gt;—a reflection not just of speed, but of how well we’ve grown in our ability to plan, execute, and deliver. This success isn’t mine alone; it’s largely the result of a committed, teachable, and resilient team that embraced change and moved with it, and I’m truly grateful to them. From a measurement standpoint, we were in a good position: our team was already using &lt;strong&gt;Jira’s ecosystem effectively&lt;/strong&gt;, with structured &lt;strong&gt;ticket creation&lt;/strong&gt;, &lt;strong&gt;deployment tracking through Bitbucket&lt;/strong&gt;, and &lt;strong&gt;clear release versioning&lt;/strong&gt;. This meant that we had a consistent stream of data about our work, which gave us a solid foundation to assess progress. Having access to this kind of visibility is crucial as it sets the stage not only for identifying what’s going well, but also for spotting where things might need attention. It helps create a culture where improvement isn’t guesswork—it’s informed and intentional.&lt;/p&gt;

&lt;p&gt;To evaluate our delivery progress, I extracted deployment data from Jira’s Deployment Panel and analyzed two distinct 9-month periods: one prior to my joining (October 2023 – July 2024), and one after (August 2024 – May 2025). The analysis focused exclusively on &lt;strong&gt;successful production deployments&lt;/strong&gt;, ensuring that only &lt;strong&gt;one deployment per day&lt;/strong&gt; was acknowledged to avoid overcounting batch releases or automated retries.&lt;/p&gt;

&lt;p&gt;We measured progress by calculating the &lt;strong&gt;average number of production deployments per week&lt;/strong&gt; — a clear, time-normalized metric that reflects delivery cadence. In the 9 months before I joined, there were &lt;strong&gt;4 unique and major production deployments&lt;/strong&gt;, averaging &lt;strong&gt;0.09 deployments per week&lt;/strong&gt;. In the 9 months following my onboarding, that number grew to &lt;strong&gt;28&lt;/strong&gt;, with a corresponding &lt;strong&gt;weekly velocity of 0.67 deployments&lt;/strong&gt;. This represents a &lt;strong&gt;0.58 increase in weekly production deployments&lt;/strong&gt;, or a &lt;strong&gt;600% improvement in deployment velocity&lt;/strong&gt; — a strong signal of enhanced team autonomy, release confidence, and operational maturity.&lt;/p&gt;
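
&lt;p&gt;As a sanity check, the velocity arithmetic can be reproduced in a few lines of JavaScript. Note that computing from the raw counts (4 versus 28 over equal-length windows) yields exactly a 600% improvement, while computing from the rounded per-week rates shifts the figure slightly; the weeks-per-period constant below is an approximation, not the exact calendar week count:&lt;/p&gt;

```javascript
// Back-of-the-envelope velocity math. The weeks figure assumes
// 9 months of roughly 4.33 weeks each; exact calendar weeks differ.
function weeklyVelocity(deployments, weeks) {
  return deployments / weeks;
}

function percentImprovement(before, after) {
  return ((after - before) / before) * 100;
}

const WEEKS_PER_PERIOD = 9 * (52 / 12); // ≈ 39 weeks

const before = weeklyVelocity(4, WEEKS_PER_PERIOD);
const after = weeklyVelocity(28, WEEKS_PER_PERIOD);

// With equal-length periods the percentage reduces to the raw counts:
// (28 - 4) / 4 = 6, i.e. a 600% improvement.
console.log(percentImprovement(before, after).toFixed(0)); // prints "600"
```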

&lt;p&gt;But these numbers don’t exist in a vacuum. They represent a deeper story of teamwork, trust, and continuous learning. They reflect the changes we made together: better processes, clearer workflows, more confident code, and a shared commitment to improving how we deliver. The steep increase in velocity also reflects that we enabled deployments for more services as the team's portfolio expanded. Tracking this wasn’t about proving a point—it was about understanding our pace, staying accountable, and creating space for sustainable growth.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far4rcgk6i4wlptchp972.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far4rcgk6i4wlptchp972.png" alt=" " width="630" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Meeting the Team That Made It Possible
&lt;/h2&gt;

&lt;p&gt;When I first joined the team, I walked into a group of individuals who, despite their different levels of experience, were deeply committed to getting things done. My manager was a key pillar—&lt;strong&gt;resourceful and responsive&lt;/strong&gt;, always quick to remove blockers and bridge communication with upper management so I could focus on solving problems. Our &lt;strong&gt;scrum master&lt;/strong&gt; brought structure and consistency, especially in cross-team coordination, which was critical for syncing dependencies and moving work forward. I also had the support of &lt;strong&gt;two highly detail-oriented QA engineers&lt;/strong&gt; who ensured we maintained quality even under tight timelines. Then there were the &lt;strong&gt;engineers—young, talented, and incredibly teachable&lt;/strong&gt;. While some were more senior and confident in their technical abilities, others were still finding their footing, but all of them shared a willingness to learn and improve. A few had an impressive grasp of the product and its edge cases, which was a huge help in my early days—they helped accelerate my understanding of the system far more than any documentation could have. Looking back, I’m reminded that transformation doesn’t start with tools—it starts with people. And I was fortunate to walk into a team that had the right mix of curiosity, humility, and heart.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Codebase and the System We Serve
&lt;/h2&gt;

&lt;p&gt;When I first met the codebase, I was stepping into a system built to solve a very specific and critical set of problems—&lt;strong&gt;managing user viewership access across multiple OTT partners&lt;/strong&gt;, syncing that with a central CRM, and surfacing valuable data for the analytics team. At its core, the software ensures that when a user is granted access to a service like Prime or Viu, that entitlement is correctly handled, tracked, and communicated across platforms. The stack was familiar: &lt;strong&gt;JavaScript (Node.js)&lt;/strong&gt; on the backend, &lt;strong&gt;DynamoDB and RDS&lt;/strong&gt; for storage, and a broad use of &lt;strong&gt;AWS services&lt;/strong&gt; to handle deployment and orchestration. What made it more interesting, though, was the fact that I joined during a pivotal architectural transition. The team was &lt;strong&gt;shifting from a fragmented service-per-OTT model to a unified, partner-agnostic architecture&lt;/strong&gt;—something that not only streamlined logic, but allowed for better reusability and maintenance. We were also moving away from long-running &lt;strong&gt;EC2-based services&lt;/strong&gt; toward a &lt;strong&gt;modular, event-driven architecture powered by AWS Lambda&lt;/strong&gt;, which significantly reduced costs and simplified scaling.&lt;/p&gt;

&lt;p&gt;The codebase itself was structured as a &lt;strong&gt;collection of discrete Lambda functions&lt;/strong&gt;, each mapped to specific handlers and responsibilities. Shared logic and utilities were published and reused across functions using &lt;strong&gt;private NPM packages&lt;/strong&gt;, allowing for cleaner separation and less duplication. The entire deployment flow was managed using the &lt;strong&gt;Serverless Framework&lt;/strong&gt;, which abstracted much of the infrastructure creation. Serverless allowed us to define shared AWS resources—&lt;strong&gt;API Gateways, IAM roles, queues, and more&lt;/strong&gt;—and expose them cleanly across services, making infrastructure both declarative and portable. It was clear that the building blocks were there. The challenge now was to refactor and elevate what existed, without disrupting what already worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Gaps That Slowed Us Down
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Structure and Duplication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the first things I noticed was the lack of a clear and robust file structure. It wasn’t always obvious where functionality lived, and in some cases, versioning was misunderstood. New features were simply added as "v2" or "v3" rather than being named appropriately. More critically, logic was &lt;strong&gt;heavily duplicated&lt;/strong&gt; across the codebase. Similar functions existed in multiple places, often slightly tweaked but essentially performing the same task. This made maintenance time-consuming and error-prone—changing one behavior often meant hunting down and editing several versions of the same logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Configuration Chaos&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The handling of configurations posed a major challenge. Frequently changing values—such as partner IDs, environment toggles, or feature switches—were &lt;strong&gt;hardcoded directly in the code&lt;/strong&gt;. This led to repeated declarations and multiple sources of truth, making even minor updates feel fragile. Without a centralized config management system, engineers had to manually trace where each variable lived and whether it was safe to change—adding unnecessary complexity to what should’ve been routine work.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Readability and Coupling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The code itself was often difficult to reason about. Naming conventions lacked consistency, semantics were unclear, and logic wasn’t always placed where you’d expect. This made the onboarding experience slower and raised the cost of every change. On top of that, many components were &lt;strong&gt;tightly coupled&lt;/strong&gt;—meaning a small update in one area could cause unexpected issues elsewhere. Without clear boundaries or separation of concerns, engineers were sometimes forced to write new solutions for problems that had already been solved elsewhere—just because the existing ones weren’t reusable or discoverable.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Testing and CI/CD Gaps&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Another big contributor to slow delivery was the &lt;strong&gt;lack of automated testing&lt;/strong&gt;. There were no unit tests or integration tests, so regressions were common. Every change carried risk, and confidence was low. The CI/CD pipeline also wasn’t set up to support iterative development. There was &lt;strong&gt;no continuous delivery flow&lt;/strong&gt;, and previous working features in production were sometimes overwritten by newer, unstable releases. These issues made velocity unpredictable, and it was clear that test coverage and release automation needed to be addressed before we could move faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Environment Bottlenecks&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Finally, the absence of a local development and testing environment severely limited parallel work. Engineers were forced to deploy to shared dev or staging environments just to verify basic functionality—often waiting in line to test their code. This not only delayed releases but also introduced friction into everyday development. It was clear that &lt;strong&gt;having a local sandbox&lt;/strong&gt; wasn’t just a convenience—it was a requirement for a healthy, high-velocity engineering workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Operational Gaps That Introduced Bottlenecks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Gaps in Requirement Gathering and Design Planning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before I joined, there was no dedicated architect or technical lead guiding the product-engineering process. As a result, &lt;strong&gt;requirement gathering was often skipped or done informally&lt;/strong&gt;. Even after stepping in, shifting this habit took time. In the absence of structured discovery, requirements were sometimes misaligned or incomplete—leading to features being built with &lt;strong&gt;incorrect assumptions&lt;/strong&gt; or missing critical edge cases. Key stakeholders were not always engaged early enough, which meant that essential business details were occasionally left out. Additionally, &lt;strong&gt;non-functional requirements&lt;/strong&gt;—like performance, scalability, and maintainability—were rarely discussed, which impacted architectural decisions. There was little focus on translating requirements into thoughtful &lt;strong&gt;system designs&lt;/strong&gt;, leaving modularity, reusability, and extensibility by the wayside.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Inefficient QA Feedback Loops&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Our testing process also posed a challenge to velocity. Because there was limited automated test coverage and no structured regression suite, &lt;strong&gt;QA engineers had to manually retest large parts of the system—even for small changes&lt;/strong&gt;. This led to longer feedback loops, bottlenecks in the staging environment, and delays in releases. The manual nature of testing also made it difficult to move quickly and safely, especially when features or bug fixes affected shared areas of the codebase. As a result, a lot of time was spent in the validation phase, even for otherwise minor adjustments.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Ambiguous or Incomplete Tickets&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Many Jira tickets lacked &lt;strong&gt;clear acceptance criteria&lt;/strong&gt;, which caused confusion during implementation and validation. Engineers often had to chase down clarifications or interpret the requirements on their own, which led to misalignment and rework. For QA, the absence of well-defined success criteria made it harder to validate whether a feature was complete or working as intended. This ambiguity not only slowed development—it also created uncertainty around what “done” actually meant, which is critical when working in a fast-paced environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cleanup and Restructuring
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Laying the Groundwork&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before diving into any cleanup or restructuring, I dedicated the &lt;strong&gt;first few weeks&lt;/strong&gt; to simply &lt;strong&gt;understanding the system and the product&lt;/strong&gt;. It was important to take a step back and observe—&lt;strong&gt;not just the code&lt;/strong&gt;, but the &lt;strong&gt;broader domain we were operating in&lt;/strong&gt;, how the existing &lt;strong&gt;architecture was structured&lt;/strong&gt;, and where the boundaries between &lt;strong&gt;what could be changed&lt;/strong&gt; and &lt;strong&gt;what needed to be worked around&lt;/strong&gt; actually lay. This initial period was essential for building context: what the service was meant to do, how different OTT integrations functioned, and where the pain points lived—both technically and operationally. I also took time to align with stakeholders on &lt;strong&gt;current deliverables&lt;/strong&gt; and expectations. One of the first pressing tasks was to &lt;strong&gt;lead the removal of the payment functionality&lt;/strong&gt; from our service. This part of the system was no longer relevant as it had been marked for migration to the CRM—and its presence was adding unnecessary complexity and risk. Taking ownership of that cleanup gave me an early opportunity to untangle a critical path, work closely with the team, and begin setting a standard for the kind of change we were about to make together.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Reshaping the Codebase&lt;/strong&gt;
&lt;/h3&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.1 Establishing a Consistent Foundation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The first step in cleaning up the codebase was to bring in some &lt;strong&gt;consistency and formatting discipline&lt;/strong&gt;. I introduced &lt;strong&gt;Prettier&lt;/strong&gt; across the entire repository and enforced a standard configuration so all contributors were working from the same baseline. This removed noise from pull requests and made the code easier to read and review. While cosmetic, this change set the tone for a more maintainable codebase and gave us a common starting point as we prepared for deeper structural changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.2 Introducing Safe Refactoring Through Testing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Given how &lt;strong&gt;coupled and fragile&lt;/strong&gt; parts of the system were, it wasn’t safe to dive straight into large refactors. To address this, I &lt;strong&gt;set up a unit testing framework&lt;/strong&gt; and added some &lt;strong&gt;base tests&lt;/strong&gt; as a starting point. I then created &lt;strong&gt;unit test jobs in the CI pipeline&lt;/strong&gt;, and hosted a walkthrough with the team to align on how this would work within our development flow. To encourage meaningful adoption, I added a &lt;strong&gt;coverage enforcement check&lt;/strong&gt; that allowed pull requests to pass only if test coverage increased compared to the current baseline. This ensured that every MR helped improve the safety net, bit by bit.&lt;/p&gt;
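
&lt;p&gt;The coverage-enforcement idea can be sketched as a small ratchet function. This is a minimal illustration, not the team's actual CI script: the function names are made up, and in a real pipeline the current percentage would come from the test runner's coverage summary (for example nyc's &lt;code&gt;coverage-summary.json&lt;/code&gt;):&lt;/p&gt;

```javascript
// Coverage ratchet sketch: a merge request passes only if total
// coverage meets or beats the stored baseline, and a pass moves the
// baseline up so coverage can only grow over time.
function coverageGate(currentPct, baselinePct) {
  const passed = currentPct >= baselinePct;
  return {
    passed,
    // On a pass, record the new (higher or equal) baseline.
    newBaseline: passed ? currentPct : baselinePct,
  };
}

// CI wrapper: fail the job loudly when the gate does not pass.
function runGate(currentPct, baselinePct) {
  const result = coverageGate(currentPct, baselinePct);
  if (!result.passed) {
    throw new Error(
      `Coverage ${currentPct}% is below the baseline ${baselinePct}%`
    );
  }
  return result.newBaseline;
}
```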

&lt;h3&gt;
  
  
  &lt;strong&gt;2.3 Reinforcing Testing Through Example and Guidance&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To avoid wasting engineering effort or creating resistance, I &lt;strong&gt;took the lead in writing initial tests&lt;/strong&gt; for some of the more complex or obscure sections of the code. This helped show what good tests could look like and made it easier for others to follow. I also used &lt;strong&gt;TODO markers within the code&lt;/strong&gt; to flag key functions that needed coverage as they were updated during feature work. Rather than enforcing testing through policy alone, I made a habit of using &lt;strong&gt;code reviews as an opportunity to reinforce quality practices&lt;/strong&gt;—encouraging things like early returns, meaningful naming, modular design, and reusability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1rnhepr6zz5bag7r9a4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft1rnhepr6zz5bag7r9a4.png" alt=" " width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2.4 Cleaning Up Config and Reducing Duplication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As work progressed, one persistent pain point was the handling of configuration values. Critical settings were &lt;strong&gt;hardcoded in multiple places&lt;/strong&gt;, leading to duplication and the risk of inconsistency. To solve this, I wrote &lt;strong&gt;utility scripts that centralized config management into a single folder&lt;/strong&gt;, making updates easier and safer. This drastically reduced context-switching for engineers and helped eliminate a common source of friction. Together, these efforts gave the team a cleaner, more predictable development experience—making it easier to deliver with confidence and iterate quickly.&lt;/p&gt;
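
&lt;p&gt;A minimal sketch of what that centralization can look like (the partner names, keys, and accessor are made up for illustration): one module owns every frequently-changed value, and the rest of the code imports from it instead of hardcoding:&lt;/p&gt;

```javascript
// Single source of truth for frequently-changed values. Everything
// here is illustrative; the real folder held the project's own keys.
const config = {
  partners: {
    prime: { id: 'PRIME_PARTNER_ID', enabled: true },
    viu: { id: 'VIU_PARTNER_ID', enabled: true },
  },
  features: {
    autoRenewal: true,
  },
};

// One accessor with a loud failure mode, so a typo'd key is caught
// immediately instead of silently reading undefined.
function getConfig(path) {
  return path.split('.').reduce((node, key) => {
    if (node === undefined || node[key] === undefined) {
      throw new Error(`Unknown config key: ${path}`);
    }
    return node[key];
  }, config);
}

module.exports = { getConfig };
```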

&lt;h2&gt;
  
  
  &lt;strong&gt;3. Integration and Delivery&lt;/strong&gt;
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.1 Reworking the Branching Strategy&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When I joined, all deployments were happening directly from the &lt;strong&gt;&lt;code&gt;dev&lt;/code&gt; branch&lt;/strong&gt;, which made it hard to manage stability or separate experimental changes from production-ready features. To restore control, I took the &lt;strong&gt;last known stable release branch&lt;/strong&gt;, merged it into &lt;strong&gt;&lt;code&gt;master&lt;/code&gt;&lt;/strong&gt;, and then &lt;strong&gt;rebased &lt;code&gt;master&lt;/code&gt; onto &lt;code&gt;dev&lt;/code&gt;&lt;/strong&gt; to realign the branches. Going forward, we used &lt;code&gt;dev&lt;/code&gt; as a &lt;strong&gt;long-term integration space&lt;/strong&gt;—a place for ongoing cleanup, experimentation, and quick tests—while &lt;code&gt;master&lt;/code&gt; served as the canonical source for release-ready code. This branching model created clear boundaries between work in progress and what was considered deployable, which was a crucial step toward predictable delivery.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.2 Enabling Local Development&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the biggest blockers to delivery was the lack of a &lt;strong&gt;local development environment&lt;/strong&gt;. Engineers were forced to &lt;strong&gt;deploy to shared environments just to test their code&lt;/strong&gt;, meaning only one person could realistically test changes at a time. Since the system was built on a &lt;strong&gt;serverless architecture&lt;/strong&gt;, the team hadn’t yet figured out how to simulate AWS Lambda behavior locally. To solve this, I built a &lt;strong&gt;lightweight Express server&lt;/strong&gt; that mimicked the Lambda runtime. I wired up routes to invoke the existing Lambda handlers and moved all environment variables for staging and dev into &lt;strong&gt;gitignored &lt;code&gt;.env&lt;/code&gt; files&lt;/strong&gt;, using &lt;strong&gt;dotenv&lt;/strong&gt; for local support. I also refactored the handlers to support &lt;strong&gt;dual execution&lt;/strong&gt;—as both Lambda functions and Express route handlers. This allowed engineers to run and test features entirely offline. I documented this setup and added &lt;strong&gt;README steps&lt;/strong&gt;, making it easy for anyone to spin it up and start testing immediately.&lt;/p&gt;
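
&lt;p&gt;The dual-execution idea boils down to keeping handlers in a Lambda-style shape and adding a thin adapter for Express. The handler name and event fields below are illustrative, not the actual project code:&lt;/p&gt;

```javascript
// A Lambda-style entry point: takes an event, returns { statusCode, body }.
async function getEntitlementHandler(event) {
  const userId = event.pathParameters.userId;
  return {
    statusCode: 200,
    body: JSON.stringify({ userId, entitled: true }),
  };
}

// Adapter: build a Lambda-style event from an Express request, invoke
// the handler, and write its result back as the HTTP response. This is
// what lets the same function run offline behind a local Express server.
function asExpressRoute(handler) {
  return async (req, res) => {
    const event = { pathParameters: req.params, body: req.body };
    const result = await handler(event);
    res.status(result.statusCode).send(result.body);
  };
}

// In the local dev server this would be wired up roughly as:
//   app.get('/entitlements/:userId', asExpressRoute(getEntitlementHandler));

module.exports = { getEntitlementHandler, asExpressRoute };
```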

&lt;h3&gt;
  
  
  &lt;strong&gt;3.3 Expanding Testing Capacity with an Additional Environment&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With dev stabilized and local testing unlocked, the next bottleneck was &lt;strong&gt;staging&lt;/strong&gt;. QA typically validated features in this environment, but with multiple releases in play, it often became &lt;strong&gt;a single point of contention&lt;/strong&gt;. To ease the pressure, I created an &lt;strong&gt;additional staging-like environment&lt;/strong&gt; that mirrored the original setup. This provided a second testing lane, allowing the QA team to test features in parallel and helping us reduce wait times during release cycles. It was a simple change with immediate impact—engineers no longer had to wait for the “main” test environment to free up before validating their work.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.4 Building Integration Testing from the Ground Up&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Integration testing was completely absent when I arrived, which meant QA had to &lt;strong&gt;manually retest wide portions of the system&lt;/strong&gt; for even minor changes. To fix this, I created a &lt;strong&gt;dedicated integration test repository&lt;/strong&gt;. I made the tests compact and easy to run by &lt;strong&gt;embedding encrypted environment variables into the repo&lt;/strong&gt;, so engineers could decrypt and run them out of the box. The test structure mirrored the system’s endpoint layout, making them easy to navigate and extend. To drive adoption, I began pairing integration test tickets with each feature or bug fix ticket, so tests could be written alongside product work. And anytime QA uncovered an issue, we didn’t just fix it—we wrote a test for it. This wasn’t easy to automate initially, but it steadily matured into a reliable system that reduced regression risk and increased deployment confidence across the board.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc3p5l1bvm1gi5bizksd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc3p5l1bvm1gi5bizksd.png" alt=" " width="359" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3.5 Automating the Pipeline and Parallelizing Tests&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To bring all the pieces together, I turned my attention to the &lt;strong&gt;CI/CD pipeline&lt;/strong&gt;, which needed significant cleanup to support the environments and workflows we were building. I streamlined the pipeline configuration to properly reflect &lt;strong&gt;all available environments&lt;/strong&gt; and automated critical stages of the deployment process. I integrated &lt;strong&gt;Jira deployments&lt;/strong&gt;, allowing us to track releases directly from our task board. I also ensured that &lt;strong&gt;unit tests would run on every commit and every new merge request&lt;/strong&gt;, creating faster feedback loops and encouraging engineers to catch issues early.&lt;/p&gt;

&lt;p&gt;As we started integrating end-to-end tests, we noticed a drop in regression issues—but also a &lt;strong&gt;slowdown in build times&lt;/strong&gt;, especially since the system handled similar functionality across multiple partners. To address this, I parallelized the test suite by &lt;strong&gt;running tests separately per partner&lt;/strong&gt;, each in its own job. This was done by checking out the repo in multiple runners, tagging test files with &lt;strong&gt;partner-specific annotations&lt;/strong&gt;, and using &lt;strong&gt;Mocha&lt;/strong&gt; to selectively run the right set of tests for each parallel job. The result was a dramatic reduction in test execution time and an overall &lt;strong&gt;increase in velocity&lt;/strong&gt;.&lt;/p&gt;
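
&lt;p&gt;A rough sketch of the per-partner split (the tag format is illustrative): each test title carries a partner annotation, and every parallel CI job greps for its own tag so Mocha runs only that partner's tests, e.g. &lt;code&gt;mocha --grep "\[prime\]"&lt;/code&gt;:&lt;/p&gt;

```javascript
// Partner tag carried in each test title, e.g. "[prime] renews plan".
function partnerTag(partner) {
  return `[${partner}]`;
}

// Mirror of what Mocha's --grep does for one CI job: keep only the
// test titles annotated for a single partner.
function selectTests(titles, partner) {
  const tag = partnerTag(partner);
  return titles.filter((title) => title.includes(tag));
}
```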

&lt;p&gt;Finally, I added &lt;strong&gt;dedicated pipeline jobs for different environments&lt;/strong&gt;, as well as manual release triggers. This gave the team &lt;strong&gt;an abstracted, automated delivery flow&lt;/strong&gt;, where engineers no longer had to manually intervene or piece together build steps. They simply pushed their code, opened a merge request, and the pipeline took care of the rest—only requiring clicks when human validation or release approvals were necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.1 Requirements with Architecture in Mind&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A major part of increasing delivery efficiency came from getting &lt;strong&gt;ahead of the work with clear requirements&lt;/strong&gt;. I made it a point to &lt;strong&gt;collaborate closely with stakeholders&lt;/strong&gt; early in the process—aligning on what needed to be built, freezing requirements where possible, and translating them into &lt;strong&gt;system architecture diagrams&lt;/strong&gt;. These diagrams weren’t just for me—they became a visual communication tool to bounce ideas off other engineers and architects, ensuring the design made sense before we wrote a line of code. Once confident, I broke the requirements into &lt;strong&gt;Jira tickets&lt;/strong&gt;, often with &lt;strong&gt;partial implementations or code snippets&lt;/strong&gt; inside to give engineers a head start and illustrate what clean, modular implementation could look like. When necessary, I would even join the implementation directly, which helped &lt;strong&gt;move things faster&lt;/strong&gt; and reduced context-switching across the team.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0ozs2qssaemp41a62pu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0ozs2qssaemp41a62pu.png" alt=" " width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff49e0fqpckxdjbzinsva.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff49e0fqpckxdjbzinsva.png" alt=" " width="632" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.2 Streamlining Workflows with Jira Automation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To reduce time spent on task management and coordination, I introduced &lt;strong&gt;lightweight automations in Jira&lt;/strong&gt; that aligned with how we actually worked. Our flow moved from &lt;strong&gt;TODO → In Progress → Review → QA → Testing → Done&lt;/strong&gt;, and my scrum master configured Jira so that &lt;strong&gt;tickets automatically moved to “Review” and were assigned to me&lt;/strong&gt; when a merge request was opened. This meant engineers could stay focused on the task itself, without having to manually update the ticket status or chase reviewers. It also helped me stay on top of what needed to be reviewed without delay.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.3 Handling QA Feedback with Structured Ticketing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;During QA testing, we often uncovered bugs or edge cases that weren’t initially accounted for. To manage this smoothly, we established a routine: &lt;strong&gt;categorize the issue&lt;/strong&gt;, assess its impact, and take immediate action. If it was a &lt;strong&gt;functionality break&lt;/strong&gt;, we created a &lt;strong&gt;bug ticket in the current sprint&lt;/strong&gt;. If it was a newly discovered &lt;strong&gt;edge case&lt;/strong&gt;, we’d revisit the requirements, update them if needed, and either add a ticket to the sprint or backlog. For &lt;strong&gt;architectural improvements or design gaps&lt;/strong&gt;, we created &lt;strong&gt;spike issues&lt;/strong&gt; that I usually handled personally. This workflow ensured that feedback loops were tight and transparent—and most importantly, nothing fell through the cracks.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4.4 Evolving into Automated Release Branching&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;As we matured, we moved toward a &lt;strong&gt;release branching strategy&lt;/strong&gt;, but I wanted to validate whether it fit the team’s workflow before enforcing it. So, for about &lt;strong&gt;10–40 releases&lt;/strong&gt;, we did it manually—tracking how the team responded and whether it introduced friction. Once I saw the team was comfortable, I automated the entire release flow using a &lt;strong&gt;small serverless function&lt;/strong&gt;. This script was triggered each time I created a release in Jira and handled the branching logic end-to-end. Automating this step eliminated manual effort, removed the chance of errors, and further &lt;strong&gt;increased velocity by streamlining how we shipped code&lt;/strong&gt;.&lt;/p&gt;
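
&lt;p&gt;As a rough illustration of the idea (not the actual function; the webhook payload shape and the git-host call here are assumptions), the core of such a trigger can be tiny: read the release name from the Jira event, derive a branch name, and ask the hosting API to cut the branch:&lt;/p&gt;

```python
# Hypothetical sketch of a release-branching serverless handler.
# The event shape and `create_branch` callable are illustrative assumptions.

def release_branch_name(version_name: str) -> str:
    """Map a Jira release version like '2.3.0' to a branch name."""
    return f"release/{version_name.strip()}"

def handle_release_created(event: dict, create_branch) -> str:
    """Entry point a serverless runtime could invoke on a Jira
    'version created' webhook. `create_branch` is injected so the
    git-hosting API (GitHub, GitLab, Bitbucket) stays pluggable."""
    version = event["version"]["name"]    # e.g. "2.3.0"
    branch = release_branch_name(version)
    create_branch(branch, source="main")  # cut the branch off the mainline
    return branch
```

&lt;p&gt;Injecting the &lt;code&gt;create_branch&lt;/code&gt; call keeps the branching logic itself trivially testable, which is part of why automating this step could safely remove the chance of manual errors.&lt;/p&gt;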

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyezqyn0ac8sgr90cqhr5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyezqyn0ac8sgr90cqhr5.png" alt=" " width="800" height="84"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk27spfsg4kvaez8ml0mm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk27spfsg4kvaez8ml0mm.png" alt=" " width="661" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5 Enhancing the eagle's eye
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5.1 The Problem: Limited Visibility and High Debugging Cost&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before we had proper observability, investigating issues in the system was a time-consuming process. Engineers often had to &lt;strong&gt;manually query databases or comb through log streams&lt;/strong&gt; just to gather basic information. There was no easy way to trace a user's history, understand recent changes, or view how a partner integration behaved at a specific point in time. Even something as essential as &lt;strong&gt;subscription change history didn’t exist&lt;/strong&gt;, which made debugging regressions or investigating edge cases particularly difficult. This lack of visibility slowed us down in moments when speed and clarity were most needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5.2 The Solution: Observability Dashboard and Data Aggregation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To solve this, I built a &lt;strong&gt;custom observability backend and an internal-only dashboard&lt;/strong&gt; that consolidated the most critical system and user data in one place. At a high level, it provided an &lt;strong&gt;overview of all partner-related activity&lt;/strong&gt;, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total subscriptions per partner&lt;/li&gt;
&lt;li&gt;Sales channel distribution&lt;/li&gt;
&lt;li&gt;Monthly subscription trends and breakdowns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also gave stakeholders powerful tools for &lt;strong&gt;user-level investigation&lt;/strong&gt;. By searching a user, they could instantly access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identity (minus sensitive info)&lt;/li&gt;
&lt;li&gt;Device and eligibility details&lt;/li&gt;
&lt;li&gt;Subscription status and full change history&lt;/li&gt;
&lt;li&gt;All push notifications triggered by the CRM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This dramatically &lt;strong&gt;reduced the turnaround time&lt;/strong&gt; for debugging and helped teams get to the root of issues without needing backend support or deep system access.&lt;/p&gt;
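
&lt;p&gt;To make the aggregation concrete, here is a hypothetical sketch (record and field names are invented for illustration) of how raw subscription records could be rolled up into the per-partner totals and monthly breakdowns described above:&lt;/p&gt;

```python
from collections import Counter

def partner_totals(subscriptions):
    """Total subscriptions per partner."""
    return Counter(s["partner"] for s in subscriptions)

def monthly_breakdown(subscriptions, partner):
    """Subscription counts per 'YYYY-MM' month for one partner,
    assuming ISO 'YYYY-MM-DD' creation dates."""
    months = (s["created_at"][:7] for s in subscriptions if s["partner"] == partner)
    return dict(sorted(Counter(months).items()))
```

&lt;p&gt;The real backend did this server-side over the stores involved, but the shape of the answer—counts keyed by partner and by month—is the same.&lt;/p&gt;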

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz9w14xowmkkrm75hnmk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhz9w14xowmkkrm75hnmk.png" alt=" " width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5.3 Impact: Faster Resolution and Strategic Insight&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In addition to reducing debugging overhead, the dashboard became a &lt;strong&gt;valuable source of insight for product managers and upper leadership&lt;/strong&gt;. It helped them monitor &lt;strong&gt;subscription growth across partners&lt;/strong&gt;, identify patterns in user behavior, and assess the effectiveness of CRM events and entitlements. The real-time &lt;strong&gt;event tracking view&lt;/strong&gt; made it easier to confirm whether user actions had triggered expected flows—or pinpoint where something had silently failed. What started as a tool for engineering observability quickly became a &lt;strong&gt;shared knowledge surface&lt;/strong&gt; across teams, enabling faster collaboration, smarter decisions, and a stronger sense of control over a complex ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  6 Alerting System
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6.1 The Challenge: No Central Alerting System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the time I joined, the system had &lt;strong&gt;no unified alerting mechanism&lt;/strong&gt;, and critical issues often went unnoticed until they became user-facing or required manual inspection. There was no structured way to monitor key system failures or event anomalies, which made it difficult to respond quickly in moments that required urgent action. The absence of real-time visibility into failures not only delayed incident resolution but also made the system feel opaque for both engineers and stakeholders.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6.2 The Solution: Centralized, Reusable Notification System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To address this, I built a &lt;strong&gt;centralized error logging and alerting mechanism&lt;/strong&gt;, powered by &lt;strong&gt;AWS SNS&lt;/strong&gt;. Critical system errors and high-priority events were published to a single topic with &lt;strong&gt;filtered subscribers&lt;/strong&gt;—allowing me to fan out alerts to various consumers (emails, logs, dashboards, etc.) without duplicating logic or tightly coupling components. This architecture ensured the system remained &lt;strong&gt;modular and reusable&lt;/strong&gt;, enabling new subscribers to plug into alert streams effortlessly. More importantly, it gave key stakeholders real-time visibility into what was happening, so they could respond to incidents faster and with context.&lt;/p&gt;
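
&lt;p&gt;A hedged sketch of the pattern (the attribute names and policies here are illustrative, not our production values): each alert carries message attributes, and each subscriber's filter policy selects only the alerts it cares about. The matcher below is a simplified version of what SNS evaluates server-side:&lt;/p&gt;

```python
def build_alert(severity: str, source: str, message: str) -> dict:
    """Shape of an alert published to the shared topic.
    Attribute names ('severity', 'source') are illustrative."""
    return {
        "Message": message,
        "MessageAttributes": {
            "severity": {"DataType": "String", "StringValue": severity},
            "source": {"DataType": "String", "StringValue": source},
        },
    }

# With boto3 the publish itself would be roughly:
#   boto3.client("sns").publish(TopicArn=TOPIC_ARN, **build_alert(...))

def matches(filter_policy: dict, attributes: dict) -> bool:
    """Simplified filter-policy matching: every key in the policy must
    have the message attribute's value in its allow-list."""
    return all(
        attributes.get(key, {}).get("StringValue") in allowed
        for key, allowed in filter_policy.items()
    )
```

&lt;p&gt;Because the topic is the only coupling point, adding a new consumer is just a new subscription with its own filter policy—no publisher code changes.&lt;/p&gt;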

&lt;h3&gt;
  
  
  &lt;strong&gt;6.3 User-Facing Notification View&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To make alerts even more accessible, I also built a &lt;strong&gt;notifications view directly into the dashboard&lt;/strong&gt;, giving users the option to &lt;strong&gt;opt in or out&lt;/strong&gt; of in-app alerts. This view allowed team members and stakeholders to see critical system activity and messages &lt;strong&gt;without relying solely on email&lt;/strong&gt;, creating a more intuitive and centralized experience. By surfacing this information in a user-friendly way, we gave everyone—engineers, QA, product leads—&lt;strong&gt;a shared awareness of system health&lt;/strong&gt;, directly within the tools they already used day-to-day.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61zlmyu01w97xy3iu7tg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F61zlmyu01w97xy3iu7tg.png" alt=" " width="440" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7 The scheduler
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7.1 The Problem: Scattered, Rigid Async Logic&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;When I joined, there were already some solutions in place for handling &lt;strong&gt;asynchronous activity&lt;/strong&gt;, but they were tightly coupled to specific actions—like &lt;strong&gt;subscription renewals&lt;/strong&gt; or &lt;strong&gt;email notifications&lt;/strong&gt;. While these worked in isolation, they weren’t scalable. If a new async task needed to be introduced—say, for downgrading a subscription or sending reminders—&lt;strong&gt;a brand new solution had to be built from scratch&lt;/strong&gt;. For a middleware team expected to handle a wide range of integrations and business workflows, this wasn’t sustainable. We needed something that could adapt with us.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;7.2 The Solution: Designing a Scalable, Generic Scheduler&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;To solve this, I took the initiative to design and implement a &lt;strong&gt;reusable, extensible scheduling service&lt;/strong&gt;—a topic I cover in more detail in another &lt;a href="https://dev.to/joojodontoh/building-a-scalable-reliable-and-cost-effective-event-scheduler-for-asynchronous-jobs-2ac3"&gt;article&lt;/a&gt;. This new &lt;strong&gt;scheduler was built to be action-agnostic&lt;/strong&gt;. Any asynchronous activity could be represented as a scheduled "action" with its own configuration: execution time, repeat logic, and stop condition. It now handles everything from &lt;strong&gt;subscription terminations and downgrades&lt;/strong&gt; to &lt;strong&gt;reminders and notifications&lt;/strong&gt;—all in a single system. On top of that, I integrated it with our dashboard so we could get &lt;strong&gt;hourly visibility&lt;/strong&gt; into scheduled activity, giving us a strong signal on system health and operational progress. This wasn’t just a technical upgrade—it gave us clarity and control.&lt;/p&gt;
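
&lt;p&gt;The full design is in the linked article, but the core idea can be sketched in a few lines (a simplified, illustrative model, not the production code): every action is just a record with an execution time, an optional repeat interval, and a stop condition, and the scheduler's tick is a pure function over those records:&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ScheduledAction:
    name: str                        # e.g. "subscription.downgrade"
    run_at: float                    # next execution time (epoch seconds)
    repeat_every: Optional[float]    # seconds between runs; None = one-shot
    should_stop: Callable[[], bool]  # stop condition, checked each tick

def tick(actions, now):
    """Return the actions due at `now`, rescheduling repeating ones and
    dropping anything whose stop condition is met."""
    due, remaining = [], []
    for a in actions:
        if a.should_stop():
            continue                  # stop condition met: drop the action
        if now >= a.run_at:
            due.append(a)
            if a.repeat_every is not None:
                a.run_at = now + a.repeat_every
                remaining.append(a)   # repeating actions stay alive
        else:
            remaining.append(a)
    return due, remaining
```

&lt;p&gt;Because nothing in this model knows what an action &lt;em&gt;does&lt;/em&gt;, adding a new async flow is just registering a new action name with its fulfillment logic—which is exactly the property that made the real scheduler reusable.&lt;/p&gt;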

&lt;h3&gt;
  
  
  &lt;strong&gt;7.3 The Impact: A Reusable Core That Keeps Improving&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This scheduler became a &lt;strong&gt;central piece of our architecture&lt;/strong&gt;, dramatically reducing the time needed to implement new async flows. Instead of reinventing the wheel, all we needed to do was &lt;strong&gt;schedule a new action&lt;/strong&gt; or &lt;strong&gt;extend the fulfillment logic&lt;/strong&gt;. It became a flexible engine that &lt;strong&gt;directly improved our delivery speed&lt;/strong&gt;, because it removed the need for repetitive, boilerplate async infrastructure. That said, the journey wasn’t without challenges. We faced (and still refine) issues around &lt;strong&gt;load handling&lt;/strong&gt;, &lt;strong&gt;concurrency&lt;/strong&gt;, &lt;strong&gt;rate limiting&lt;/strong&gt;, and &lt;strong&gt;deduplication&lt;/strong&gt;. But the difference now is that we’re improving a single, unified system—not stitching together new ones with every requirement. The scheduler turned a scattered pattern into a strategic capability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiex8dm5no3oj7086h2oz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiex8dm5no3oj7086h2oz.png" alt=" " width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Challenges&lt;/strong&gt;
&lt;/h1&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Understanding a Complex System Without a Map&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One of the toughest parts of stepping into this role was &lt;strong&gt;grasping the entire system end-to-end&lt;/strong&gt;—not because the business model was deeply complex, but because the supporting &lt;strong&gt;documentation was sparse or outdated&lt;/strong&gt;. There were gaps between what the system &lt;em&gt;was supposed&lt;/em&gt; to do and what the code actually did. That disconnect made onboarding harder than it needed to be. I’ve always believed that the best documentation is often the code itself, but that only works when the code is &lt;strong&gt;readable, modular, and semantically meaningful&lt;/strong&gt;. In this case, I was dealing with a codebase that had accumulated &lt;strong&gt;poor naming conventions, logic sprawl, and limited structure&lt;/strong&gt;, which made it feel like I was reverse-engineering behavior instead of working with an intentionally designed system. It took me a while to mentally map how each function connected to a business workflow, and often I had to rely on multiple sources—logs, QA inputs, and even trial-and-error debugging—to fully understand the purpose of certain components. That cognitive overhead slowed me down initially and made early decisions riskier than I liked.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Balancing Leadership, Reviews, and Individual Contribution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Another major challenge was &lt;strong&gt;balancing technical leadership with hands-on contribution&lt;/strong&gt;. To move the transformation forward at a sustainable pace, I had to go beyond guiding the work—I had to get involved in the work. I love writing code, and during the early stages of cleanup, &lt;strong&gt;I was actively implementing changes, setting up tools, fixing tests, and writing automation&lt;/strong&gt;. But that came with a cost. As a lead, I was also pulled into multiple meetings—syncs with stakeholders, platform discussions, issue triage, and architecture planning. Add to that &lt;strong&gt;the weight of code reviews&lt;/strong&gt;, planning sessions, and mentoring, and it became increasingly difficult to manage my time. While it was rewarding to stay hands-on, it required constant context switching and discipline to ensure I wasn’t bottlenecking others or burning out myself. I had to build boundaries around deep work time and become more intentional about &lt;strong&gt;prioritizing leadership tasks without losing my engineering rhythm&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Driving Cultural Change Through Code Reviews&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Introducing cultural change is never instant—especially when it comes to &lt;strong&gt;engineering discipline and code quality expectations&lt;/strong&gt;. Early on, many of our review sessions turned into &lt;strong&gt;mini workshops&lt;/strong&gt;, where I’d explain why we needed early returns, how to name things clearly, or why separating concerns was critical for reusability. While the team was wonderfully teachable and open-minded, these sessions often &lt;strong&gt;made reviews longer and more involved&lt;/strong&gt;. It wasn’t just about green checks—it was about transferring thinking patterns and reshaping habits. I didn’t want to enforce standards through silence or bureaucracy; I wanted to help the team see the “why” behind each change. Over time, this started to stick—engineers began reflecting those practices in their pull requests, asking better questions, and thinking more critically about structure. But the &lt;strong&gt;emotional and cognitive load of being both a gatekeeper and a teacher&lt;/strong&gt; was something I had to carry consistently, and it’s one of the less visible but most persistent challenges in trying to build a better culture.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion: From Foundation to Flow
&lt;/h1&gt;

&lt;p&gt;Looking back, this journey wasn’t just about speeding up deployments or cleaning up a codebase—it was about building &lt;strong&gt;clarity, culture, and confidence&lt;/strong&gt; into a system and a team that were already doing their best with what they had. When I joined, the signs of potential were everywhere: a team that cared, a product with purpose, and a system that—despite its complexity—had survived real-world pressure. But to go from surviving to thriving, we had to be intentional. We had to understand what was slowing us down, challenge it at its roots, and rebuild with scale and sustainability in mind.&lt;/p&gt;

&lt;p&gt;The improvements didn’t happen overnight. From setting up local development environments and refactoring legacy code, to establishing CI/CD pipelines and writing test coverage policies, every step required focus, patience, and a willingness to collaborate. We untangled infrastructure, designed reusable patterns, centralized configurations, and introduced observability tools that brought transparency to everyone—from engineers to product leads. We shifted away from reactive firefighting to proactive design, and began using data to guide our decisions, track our growth, and prove our value.&lt;/p&gt;

&lt;p&gt;But perhaps the most meaningful transformation wasn’t in the code—it was in the team. We evolved how we work together. Engineers became more confident, more consistent, and more aware of their impact. QA gained tools to test smarter and faster. Stakeholders got visibility into what's really happening. And as for me, I got to witness the kind of change that can only happen when &lt;strong&gt;people trust the process and commit to the long haul&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There’s still more to do. There always will be. But the foundation is solid now, and the flow has begun. We’re no longer just delivering—we’re delivering &lt;strong&gt;well&lt;/strong&gt;, and that’s the kind of velocity that matters most.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Understanding My Relationship with AI Through the Lens of TAM</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Sat, 10 May 2025 06:33:29 +0000</pubDate>
      <link>https://dev.to/joojodontoh/understanding-my-relationship-with-ai-through-the-lens-of-tam-368e</link>
      <guid>https://dev.to/joojodontoh/understanding-my-relationship-with-ai-through-the-lens-of-tam-368e</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Today I want to talk about something a little different. During my master’s degree, I wrote a short paper on technology acceptance. Seeing parts of that thinking show up in how people are adopting AI today led me to write this piece.&lt;/p&gt;

&lt;p&gt;We all have our own way of deciding whether a new piece of technology is worth using. Sometimes it just clicks (PlayStation, anyone?). Other times, we hesitate, question it, or drop it altogether. That’s where models come in—to help explain how and why people adopt new tech.&lt;/p&gt;

&lt;p&gt;There are plenty of these models out there. One of the most well-known is the &lt;strong&gt;Technology Acceptance Model (TAM)&lt;/strong&gt;. Another is the &lt;strong&gt;Innovation Diffusion Theory (IDT)&lt;/strong&gt;. A few years ago, researchers combined them to better understand how people were responding to blockchain technology. Their model, still in the hypothesis stage as of 2017, laid out a bunch of helpful ideas that feel just as relevant today—especially when it comes to AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvezq9gr71n8o1dnnhll.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhvezq9gr71n8o1dnnhll.JPG" alt=" " width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article doesn’t try to prove the model’s hypotheses, but I’ll use them to explain how certain ideas from the model have shown up in my personal adoption of AI.&lt;/p&gt;

&lt;p&gt;The model looks at how useful and easy a technology feels, how well it fits into our existing habits (compatibility), how much better it is than what came before (relative advantage), and how complex it seems (complexity). Together, these things shape our attitude, our intention, and ultimately whether we actually use the technology.&lt;/p&gt;

&lt;p&gt;Here’s a quick summary of what the model suggests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If something feels &lt;strong&gt;useful&lt;/strong&gt; and &lt;strong&gt;easy&lt;/strong&gt;, we’re more likely to use it.&lt;/li&gt;
&lt;li&gt;If it &lt;strong&gt;fits our needs&lt;/strong&gt; and seems &lt;strong&gt;better than what we had&lt;/strong&gt;, that boosts our interest.&lt;/li&gt;
&lt;li&gt;If it feels &lt;strong&gt;too complex&lt;/strong&gt;, that can get in the way—even if it’s powerful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even though the original model focused on blockchain, I’ve found that it reflects a lot of how I’ve come to use AI in my own life and work. So in this article, I want to walk through how these ideas apply to &lt;strong&gt;my relationship with AI&lt;/strong&gt;—what pulled me in, what slowed me down, and what finally made it feel worth using.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Deeper dive&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before diving into how this applies to AI, let’s take a closer look at the model. The &lt;strong&gt;Technology Acceptance Model (TAM)&lt;/strong&gt;—especially when extended with ideas from the Innovation Diffusion Theory—tries to explain why we accept or reject new technologies. It starts with three foundational factors:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compatibility&lt;/strong&gt;&lt;br&gt;
This refers to how well the technology fits into your existing habits, tools, or lifestyle. If something aligns with how you already think or work, you're far more likely to adopt it. For example, if you're already using cloud tools, switching to a new cloud-based AI service feels natural—it’s compatible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Relative Advantage&lt;/strong&gt;&lt;br&gt;
This is about whether the new technology clearly offers something &lt;em&gt;better&lt;/em&gt; than what came before. It could be faster, cheaper, more accurate, or just more convenient. If the benefits are obvious and meaningful, people are more motivated to try it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Complexity&lt;/strong&gt;&lt;br&gt;
This looks at how difficult the technology feels to understand or use. The more confusing or overwhelming it is, the less likely people are to embrace it—no matter how powerful it is. Simplicity matters, especially early on.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These three factors shape how we experience the technology in two major ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Perceived Usefulness&lt;/strong&gt; – Does this actually help me get things done better or faster?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perceived Ease of Use&lt;/strong&gt; – Is this simple enough for me to pick up without frustration?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these perceptions form the backbone of our &lt;strong&gt;attitude toward using&lt;/strong&gt; the technology. If that attitude is positive, it leads to &lt;strong&gt;behavioral intention&lt;/strong&gt; (the decision to try or continue using it), which then leads to &lt;strong&gt;actual use&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In short, TAM lays out a kind of emotional and cognitive journey:&lt;br&gt;
&lt;strong&gt;Fit → Benefits → Simplicity → Attitude → Intention → Action.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It’s not just about what the tech can do—it’s about how it feels to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How do these factors apply to me&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Compatibility (How well it fits)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To make sense of how compatibility plays out in my relationship with AI, I’m going to break it down into &lt;strong&gt;entity pairs&lt;/strong&gt;—essentially, how well two things work together. This approach mirrors the basic idea behind compatibility: &lt;em&gt;does this new thing align with what already exists?&lt;/em&gt; So I’ll be looking at how AI fits with &lt;strong&gt;me as a person&lt;/strong&gt;, how it fits with the &lt;strong&gt;tools and devices I use&lt;/strong&gt;, and how I, in turn, relate to those tools. These pairings help unpack not just whether AI &lt;em&gt;works&lt;/em&gt;, but whether it &lt;em&gt;fits&lt;/em&gt;—with my mindset, my habits, and my environment.&lt;/p&gt;

&lt;p&gt;First, there’s no inner conflict between me and the concept of AI. I’ve been in software engineering for years, so the idea of feeding structured/semi-structured data into a system and getting meaningful output doesn’t feel foreign—it feels normal. In that sense, AI and I are a good match. &lt;br&gt;
Then there’s the compatibility between AI and the devices I rely on. Whether it’s my phone, MacBook, iPad, or smart speakers, AI already integrates with many of them, often invisibly, through personal assistants and smart apps. &lt;br&gt;
Finally, there’s the link between me and those devices themselves. I spend a lot of time working with technology and virtual assistants, which makes the transition to AI-enhanced workflows almost frictionless. In short, AI fits well into both my mindset and my digital environment—it didn’t need to force its way in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Relative Advantage (How it’s better than before)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Relative advantage&lt;/strong&gt; is all about whether AI gives me a &lt;strong&gt;noticeable improvement&lt;/strong&gt; over the way I used to work, think, or solve problems. It’s not just about what AI can do—it’s about what it can do &lt;strong&gt;better&lt;/strong&gt;, faster, or more intelligently compared to my previous tools or workflows. In this section, I’ll explore this idea by looking at how AI has helped me in &lt;strong&gt;both professional and personal contexts&lt;/strong&gt;, using a few practical examples to show where the advantage really shows up.&lt;/p&gt;

&lt;p&gt;At work, AI has become more than a novelty; tbh it’s a &lt;strong&gt;strategic companion&lt;/strong&gt;. I use it to break down complex technical problems into more manageable parts. It gives me access to a curated knowledge base that spans domains and industries—something no single search or documentation site can offer. When I'm stuck on a design decision, I can use AI as a sounding board to validate assumptions or explore alternatives. It has helped me &lt;strong&gt;spot edge cases&lt;/strong&gt; I hadn’t considered in architectural discussions, and has become a go-to tool for &lt;strong&gt;code generation&lt;/strong&gt;, &lt;strong&gt;implementation strategies&lt;/strong&gt;, and even &lt;strong&gt;migration planning&lt;/strong&gt;. In moments when I need to move fast without losing depth, AI offers clarity and speed.&lt;/p&gt;

&lt;p&gt;Outside of work, the relative advantage shows up in &lt;strong&gt;small but meaningful ways&lt;/strong&gt;. I often use AI to answer quick questions or explore unfamiliar topics—replacing the old process of jumping through multiple tabs or forums. It helps me refine emails (honestly this is a big one), especially when I want to strike the right tone or simplify a message. And sometimes, when I need dinner ideas, it becomes a personalized recipe engine that saves me time and mental energy. My current banana cake recipe is AI generated.&lt;/p&gt;

&lt;p&gt;In all of these cases, AI doesn’t just replicate what I used to do—it &lt;strong&gt;elevates it&lt;/strong&gt;, and I think this is one of the most important bits: enhancement, not replacement. That’s what makes it relatively advantageous: it extends my capability, accelerates my thinking, and fills in knowledge gaps I didn’t even know I had.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Complexity (How hard it is to understand or use)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI is, by nature, incredibly complex. Behind the scenes, there are &lt;strong&gt;machine learning models&lt;/strong&gt;, &lt;strong&gt;massive datasets&lt;/strong&gt;, and layers of statistical reasoning that all work together to produce something that feels intelligent. It involves everything from neural networks to natural language processing, and these systems don’t just respond—they learn, adapt, and generate context-aware answers. When you think about the engineering and math that goes into building these systems, it’s no surprise that AI can seem intimidating.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuvqc3y1u6xwdbul6hnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuvqc3y1u6xwdbul6hnz.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But here’s the thing: all that complexity has been &lt;strong&gt;beautifully abstracted&lt;/strong&gt;. Though I have a fair understanding of how things like transformer models, embeddings, vectors, or search algorithms work under the hood, I really appreciate that the apps and platforms I use don’t require me to understand how embeddings represent meaning in high-dimensional space. Instead, I interact with AI through &lt;strong&gt;simple input boxes&lt;/strong&gt;, I get help building &lt;strong&gt;context naturally&lt;/strong&gt; through prompts and clarifications, and I receive &lt;strong&gt;surprisingly meaningful outputs&lt;/strong&gt;—whether I’m asking for architectural advice or rewriting a sentence. Pure gold.&lt;/p&gt;

&lt;p&gt;This abstraction removes the complexity from my day-to-day experience almost entirely. As a user, I don’t feel overwhelmed or buried in technical detail. I just use it—and it works. In many ways, it’s like driving a high-performance car without needing to know how the engine is tuned.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;All roads lead to actual use&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;When I reflect on why AI has become part of my life, it’s because of how well it fits for certain tasks, how much better it makes things, and how effortlessly I’m able to interact with it. AI feels like a seamless extension of the tools I already use. I don’t have to jump through hoops to access it or figure out how it works. The input-prompt model mirrors how I’ve always used search engines, which makes it feel familiar and keeps the barrier to adoption low for me. Beyond that, AI actually helps me get where I want to go. In my work, it allows me to reach my goals efficiently and effectively. Outside of work, it’s just as valuable—whether I’m refining an email, learning something new, or figuring out what to cook. The tools I use abstract away all the complexity happening under the hood, so I never feel intimidated or overwhelmed. It doesn’t feel like I’m using advanced machine learning models; it feels like I’m just having a conversation. And thanks to my small background in machine learning, I have an idea of how complex these systems really are, which only deepens my appreciation for how simple they’ve made the experience.&lt;/p&gt;

&lt;p&gt;All of that naturally shapes how I feel about using AI. I have a positive attitude toward it because my experience has consistently been smooth, helpful, and efficient. That attitude makes me more likely to keep using it, and more open to exploring new ways it could support my work and life. It’s not a tool I’m testing anymore—it’s a tool I &lt;em&gt;intend&lt;/em&gt; to use safely and regularly. And that intention shows up in my behavior. The line between trying it out and actually using it has quietly disappeared. Heck I pay for it 😂&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The Spoof, the bad and the fugly&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The journey through the TAM model hasn’t always been smooth. One thing that’s easy to overlook is that TAM isn’t a one-time path you walk through and complete. It’s not a straight line from "this is useful" to "I use it forever." Instead, it’s &lt;strong&gt;dynamic&lt;/strong&gt;—your perceptions can shift, and in some cases, the very things that once made a tool feel useful and easy can start to erode. Over time, the &lt;strong&gt;good aspects need to be reinforced&lt;/strong&gt;, or else the &lt;strong&gt;bad ones begin to grow&lt;/strong&gt;, especially as the product evolves—or fails to.&lt;/p&gt;

&lt;p&gt;In my experience with AI, there have been clear breaches in that positive flow. I have to admit, there are moments when the system just doesn’t deliver. During research, I’ve encountered &lt;strong&gt;hallucinations&lt;/strong&gt;—responses that sound confident but are completely made up. When I push deeper with more nuanced or technical questions, I start to see the &lt;strong&gt;limits in reasoning&lt;/strong&gt; or coherence. Sometimes, AI forgets the context I’ve already provided, or applies it incorrectly, forcing me to rephrase, reframe, or start again. And in those moments, my perception of ease and usefulness takes a hit. In moments like these, it feels more brittle than smooth.&lt;/p&gt;

&lt;p&gt;When that happens, I find myself taking a step back. I double-check everything. I slow down. I cross-reference with my own knowledge or other sources. That extra cognitive load—the need to &lt;strong&gt;proofread, verify, and interpret&lt;/strong&gt;—interrupts the simplicity I had come to expect. Thankfully, things have improved over time. The models have gotten better, and so have the interfaces that help manage these limitations. But those setbacks still remind me that &lt;strong&gt;usefulness is fragile&lt;/strong&gt; and &lt;strong&gt;ease of use is conditional&lt;/strong&gt;. Those are two of the key things to note in my experience with this model.&lt;/p&gt;

&lt;p&gt;There’s also something more personal I’ve noticed. I’ve come to believe that keeping a &lt;strong&gt;healthy distance from AI&lt;/strong&gt; is important—not just for accuracy, but for maintaining my own cognitive edge. It’s easy to slip into a mode of over-reliance, especially when a tool is so capable and responsive. But leaning on AI for things I already know—or could figure out with a bit of effort—can dull that edge over time. I think there’s value in doing things the “manual” way sometimes. Recalling knowledge. Struggling a bit. Solving things on my own. That’s not a rejection of AI; it’s a reminder that &lt;strong&gt;human ability still matters&lt;/strong&gt;, and that AI is a tool—not a crutch.&lt;/p&gt;

&lt;p&gt;So yes, the TAM can work in reverse. If a system starts to feel less reliable, or too controlling, or too limiting, it can cause attitudes to shift. Intention drops. Actual use fades. And unless those gaps are acknowledged and improved, even the most impressive tech can lose its place in your workflow. For me, staying conscious of these cracks helps me use AI wisely—not blindly. It keeps the relationship healthy.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;OK, the Technology Acceptance Model (TAM) matters. But why should we care?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Startups and Product Design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For startups, TAM isn’t just a theoretical model—it’s a tactical advantage. In the early stages of building a product, every decision counts. TAM provides a structured way to think about how real users will perceive and interact with the product. Startups can intentionally design their tools to be more &lt;strong&gt;compatible&lt;/strong&gt; with users’ existing habits and workflows, reducing the barrier to entry. They can also focus on delivering a &lt;strong&gt;clear relative advantage&lt;/strong&gt; over existing solutions—whether it's faster execution, better insights, or increased convenience. At the same time, simplifying onboarding and usage helps reduce &lt;strong&gt;perceived complexity&lt;/strong&gt;, allowing users to quickly experience value without feeling overwhelmed.&lt;/p&gt;

&lt;p&gt;By keeping these factors in mind—compatibility, advantage, and simplicity—startups are better positioned to shape user perceptions of usefulness and ease. These are critical to early adoption. Additionally, TAM helps startups test whether they’re truly solving a meaningful problem in a way that users will embrace. If users don’t find it useful or easy to adopt, that’s a signal for iteration. TAM trends can also be used to forecast whether the product will gain traction in a competitive market or fall into the trap of being too niche, too hard to use, or just not compelling enough.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Understanding Technology Adoption Over Time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;TAM, along with the models that have extended or evolved from it, offers a window into how people have historically accepted (or resisted) new technology. What makes TAM especially interesting is that it doesn't isolate adoption to just features or technical specs—it looks at &lt;strong&gt;human perception&lt;/strong&gt;, which is often influenced by much more than just functionality. Over time, researchers have built on TAM to incorporate new dimensions like trust, user experience, hype, and social influence.&lt;/p&gt;

&lt;p&gt;These models reveal a lot about how &lt;strong&gt;technology acceptance isn’t always driven by individual logic&lt;/strong&gt;. In many cases, people use new tools because they’re &lt;strong&gt;socially pressured&lt;/strong&gt; to, or because someone they trust has validated the tool. This is especially true in workplace environments or tight-knit communities. Influencers, managers, or even teammates can significantly affect whether a tool is picked up or pushed aside. TAM captures this broader narrative—that &lt;strong&gt;adoption is both individual and collective&lt;/strong&gt;, and shaped by shifting social, technical, and emotional factors. Understanding these patterns helps us predict not only &lt;em&gt;if&lt;/em&gt; a tool will be adopted, but &lt;em&gt;why&lt;/em&gt; and &lt;em&gt;under what conditions&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Enterprise Software Adoption&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In larger organizations, where buying and rolling out software affects hundreds or thousands of users, TAM becomes a critical decision-making tool. It allows companies to think beyond cost and features, and instead ask: &lt;em&gt;Will our employees actually use this?&lt;/em&gt; Companies can apply TAM to assess internal attitudes toward new tools—either through surveys, small pilots, or feedback loops—and use that insight to design better onboarding, training, and internal communication strategies. For example, if employees perceive a tool as too complex or irrelevant, adoption rates will be low, no matter how powerful the tool is on paper. A clear example of this for me was the timesheet software at one of my earlier companies.&lt;/p&gt;

&lt;p&gt;To counteract this, some companies introduce &lt;strong&gt;gamification&lt;/strong&gt;—reward systems, challenges, or social features—to increase engagement and shift employee attitudes toward the tool. Over time, these tactics can improve perceived ease of use and usefulness, which in turn boosts intention and actual use. These days, businesses don’t have to start from scratch. They can look at how other companies in their industry have applied TAM principles to software adoption, learning from their successes or failures. In this way, TAM becomes not just a research tool, but a &lt;strong&gt;practical playbook for driving internal tech adoption&lt;/strong&gt; at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;My journey with AI has mirrored many of the ideas in the Technology Acceptance Model—how something feels useful, how easy it is to use, how well it fits into my life, and how much better it makes things compared to what came before. But just like any relationship with technology, it hasn’t been perfect. There have been moments of friction, doubt, and necessary distance. What this model reminds me is that adoption isn’t a single decision—it’s a continuous process. As AI continues to evolve, so will my experience with it. And by staying aware of what makes it valuable—and where it falls short—I can keep using it in a way that supports me, not replaces me.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Scalable, Reliable, and Cost-Effective Event Scheduler for Asynchronous Jobs</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Mon, 20 Jan 2025 09:32:04 +0000</pubDate>
      <link>https://dev.to/joojodontoh/building-a-scalable-reliable-and-cost-effective-event-scheduler-for-asynchronous-jobs-2ac3</link>
      <guid>https://dev.to/joojodontoh/building-a-scalable-reliable-and-cost-effective-event-scheduler-for-asynchronous-jobs-2ac3</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Welcome back to my blog! 😁 This is where I talk to myself—and hopefully, to you—about the engineering problems I solve at work. I do this mainly because finding solutions excites me. My journey of identifying inefficiencies, bottlenecks, and challenges has led me to tackle a common yet critical problem in software engineering.&lt;/p&gt;

&lt;p&gt;That problem is the need to execute actions asynchronously—often with precise timing and sometimes on a recurring basis. Following my core approach to problem-solving (across space and time), I decided to build a solution that wasn’t just tailored to a single action but was extendable to various use cases. Whether it's sending notifications, processing transactions, or triggering system workflows, many tasks require scheduled execution. Without a robust scheduling mechanism, handling these jobs efficiently can quickly become complex, unreliable, and costly.&lt;/p&gt;

&lt;p&gt;To address this, I set out to build a &lt;strong&gt;scalable, reliable, and cost-effective event scheduler&lt;/strong&gt;—one that could manage delayed, immediate and recurring actions seamlessly.  &lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The problem that led to the need for an event scheduler
&lt;/li&gt;
&lt;li&gt;The functional and non-functional requirements for an ideal solution
&lt;/li&gt;
&lt;li&gt;The system design and architecture decisions behind the implementation
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, you’ll have a clear understanding of how to build a &lt;strong&gt;serverless scheduled actions system&lt;/strong&gt; that ensures &lt;strong&gt;accuracy, durability, and scalability&lt;/strong&gt; while keeping costs in check. Let’s dive in!  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FD8wcIurZrew5drwWdl%2Fgiphy.gif%3Fcid%3D790b76110iuxn9y1efgjo6yhbi0d7hazjc2dsez1ufa0i6ju%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FD8wcIurZrew5drwWdl%2Fgiphy.gif%3Fcid%3D790b76110iuxn9y1efgjo6yhbi0d7hazjc2dsez1ufa0i6ju%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" alt="GIF" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Problem: Managing Subscription Changes Across a Calendar Cycle&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Subscription management comes with unique challenges, especially when handling cancellations or downgrades 😭. Users can request these changes at any time during their billing cycle, but due to the &lt;strong&gt;prepaid nature of subscriptions&lt;/strong&gt;, such modifications can only take effect at the &lt;strong&gt;end of the cycle&lt;/strong&gt;. This delay introduces a need for &lt;strong&gt;asynchronous execution&lt;/strong&gt;—a system that can record these requests immediately but defer their execution until the appropriate time.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Solution: A proper scheduling mechanism&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Without a proper scheduling mechanism, managing these deferred actions efficiently becomes complex. The system must ensure that every request is executed &lt;strong&gt;at the right time&lt;/strong&gt; while preventing missed or duplicate actions. Furthermore, frequent executions—such as batch processing of multiple scheduled changes—must be handled without overwhelming the system. To address this, we needed a &lt;strong&gt;reliable, scalable, and cost-effective scheduler&lt;/strong&gt; capable of handling &lt;strong&gt;delayed and recurring execution&lt;/strong&gt; seamlessly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu950h0gavjrck1gcmtgm.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu950h0gavjrck1gcmtgm.jpeg" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Functional Requirements: Defining the Core Capabilities
&lt;/h3&gt;

&lt;p&gt;A robust and scalable scheduled actions system must be able to efficiently schedule, execute, update, monitor, and retry actions while ensuring reliability and flexibility.  &lt;/p&gt;

&lt;h3&gt;
  
  
  1. Scheduling and Creating Actions
&lt;/h3&gt;

&lt;p&gt;The system must allow users to schedule actions with:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Action type, execution time, execution data, and metadata as required fields.
&lt;/li&gt;
&lt;li&gt;Optional fields like repeat, frequency, and execution remainder.
&lt;/li&gt;
&lt;li&gt;Early validation to ensure actions conform to their fulfillment requirements.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Updating or Deleting an Action
&lt;/h3&gt;

&lt;p&gt;Users can update or delete an action before it is locked (2 minutes before execution). Once locked, no external changes are allowed.  &lt;/p&gt;
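&lt;p&gt;A minimal sketch of how that lock window could be enforced—the constant and function names here are illustrative, not taken from the actual codebase:&lt;/p&gt;

```typescript
// Hypothetical guard for the lock window described above: an action
// becomes immutable once execution is less than 2 minutes away.
const LOCK_WINDOW_MS = 2 * 60 * 1000;

function isLocked(executionTime: number, now: number = Date.now()): boolean {
  return executionTime - now <= LOCK_WINDOW_MS;
}

// Called before any update or delete request is honoured.
function assertMutable(executionTime: number, now: number = Date.now()): void {
  if (isLocked(executionTime, now)) {
    throw new Error("Action is locked: under 2 minutes to execution");
  }
}
```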

&lt;h3&gt;
  
  
  3. Action Status Management
&lt;/h3&gt;

&lt;p&gt;Each action must have an internally managed status that reflects its execution progress. Status transitions and results must be logged in metadata for tracking.  &lt;/p&gt;

&lt;h3&gt;
  
  
  4. Action Fulfillment Mapping
&lt;/h3&gt;

&lt;p&gt;Every action must map to a specific fulfillment service responsible for its execution. Actions without a matching fulfillment service must be flagged to prevent execution errors.  &lt;/p&gt;
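&lt;p&gt;One way to picture that mapping, as a sketch: &lt;code&gt;SEND_NOTIFICATION&lt;/code&gt; appears in the example payload later in this article, while &lt;code&gt;CANCEL_SUBSCRIPTION&lt;/code&gt; is an assumed action type added purely for illustration.&lt;/p&gt;

```typescript
// Illustrative action-to-fulfillment mapping. Unmapped actions are
// flagged (thrown) instead of being executed.
type Fulfillment = (data: Record<string, unknown>) => Promise<void>;

const fulfillmentMap: Record<string, Fulfillment> = {
  SEND_NOTIFICATION: async (_data) => {
    // would call the notification service here
  },
  CANCEL_SUBSCRIPTION: async (_data) => {
    // would call the billing/subscription service here
  },
};

function resolveFulfillment(action: string): Fulfillment {
  const handler = fulfillmentMap[action];
  if (!handler) {
    throw new Error(`No fulfillment service mapped for action: ${action}`);
  }
  return handler;
}
```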

&lt;h3&gt;
  
  
  5. Retrying Failed Actions
&lt;/h3&gt;

&lt;p&gt;Failed actions must retry using exponential backoff to handle temporary failures. Actions that exceed the maximum retry limit must be flagged for manual intervention.  &lt;/p&gt;
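&lt;p&gt;One way to realise that retry policy, sketched with illustrative constants (the real base delay, cap, and retry limit would be configuration):&lt;/p&gt;

```typescript
// Exponential backoff with a cap. Returning null signals that the
// retry limit is exhausted and the action should be flagged for
// manual intervention rather than retried again.
const BASE_DELAY_MS = 30_000;    // first retry after 30 seconds
const MAX_DELAY_MS = 3_600_000;  // never wait longer than 1 hour
const MAX_RETRIES = 5;

function nextRetryDelayMs(retryCount: number): number | null {
  if (retryCount >= MAX_RETRIES) return null; // exceeded: flag, don't retry
  return Math.min(BASE_DELAY_MS * 2 ** retryCount, MAX_DELAY_MS);
}
```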

&lt;h3&gt;
  
  
  6. Handling Immediate vs. Delayed Actions vs. Repeated Actions
&lt;/h3&gt;

&lt;p&gt;The system must distinguish between immediate and delayed actions to ensure timely execution:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Immediate actions (execution within 2 minutes) must be processed in real-time without scheduling delays.
&lt;/li&gt;
&lt;li&gt;Delayed actions (execution after 2 minutes) must be scheduled and processed at the correct time.&lt;/li&gt;
&lt;li&gt;Repeated actions must be processed the required number of times at the required frequency&lt;/li&gt;
&lt;/ul&gt;
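&lt;p&gt;The dispatch decision above can be sketched as a single predicate on the 2-minute threshold (names are illustrative):&lt;/p&gt;

```typescript
// Anything due within 2 minutes is processed in real time;
// everything else goes through the scheduler.
const IMMEDIATE_THRESHOLD_MS = 2 * 60 * 1000;

type Dispatch = "IMMEDIATE" | "DELAYED";

function classifyAction(executionTime: number, now: number = Date.now()): Dispatch {
  return executionTime - now <= IMMEDIATE_THRESHOLD_MS ? "IMMEDIATE" : "DELAYED";
}
```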




&lt;h3&gt;
  
  
  &lt;strong&gt;Non-Functional Requirements (NFR): Ensuring a Reliable and Scalable System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A scheduled actions system must meet key &lt;strong&gt;NFRs&lt;/strong&gt; to guarantee reliability, scalability, security, and maintainability.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Reliability and Durability&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Actions must execute &lt;strong&gt;correctly and on time&lt;/strong&gt; (±2 minutes).
&lt;/li&gt;
&lt;li&gt;Repeating actions must execute &lt;strong&gt;exactly as scheduled&lt;/strong&gt; with the correct frequency.
&lt;/li&gt;
&lt;li&gt;Failed actions must &lt;strong&gt;retry with exponential backoff&lt;/strong&gt;, and non-repeating actions must execute &lt;strong&gt;only once&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Scalability&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The system must &lt;strong&gt;scale dynamically&lt;/strong&gt; to handle high request loads.
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;serverless architecture&lt;/strong&gt; ensures cost-efficiency and flexibility.
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;queue-based approach&lt;/strong&gt; (e.g., AWS SQS) must regulate execution frequency to prevent overloading downstream services.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Availability&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The system must be &lt;strong&gt;always available&lt;/strong&gt; with &lt;strong&gt;no cold starts&lt;/strong&gt;, ensuring immediate execution when needed. A serverless architecture supports this at a reasonable cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Security&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Signature-based validation&lt;/strong&gt; must secure requests and prevent unauthorized execution.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Maintainability&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;The system must be &lt;strong&gt;modular, encapsulated, and organized&lt;/strong&gt; within a &lt;strong&gt;single repository&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure and database indexing rules&lt;/strong&gt; must be codified.
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;typed language&lt;/strong&gt; must be used for better reliability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local testing&lt;/strong&gt; must be enabled with encrypted environment variables.
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;startup script&lt;/strong&gt; can automate package installation and environment setup.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comprehensive tests&lt;/strong&gt; must ensure safe changes and integration.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;6. Observability&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;API endpoints must expose:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;All scheduled actions&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actions filtered by status&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retry functionality for failed actions&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;A &lt;strong&gt;centralized logging system&lt;/strong&gt; must track execution issues consistently.
&lt;/li&gt;

&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Tools: Powering the Scheduled Actions System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp2eblgjfvcy8vuy1i12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxp2eblgjfvcy8vuy1i12.png" alt="Infrastructure Diagram" width="800" height="157"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;AWS Lambda: Serverless Compute for Execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Enables event-driven execution without managing servers.
&lt;/li&gt;
&lt;li&gt;Handles action scheduling and validation.
&lt;/li&gt;
&lt;li&gt;Processes immediate actions using real-time event streams.
&lt;/li&gt;
&lt;li&gt;Executes delayed actions at the scheduled time.
&lt;/li&gt;
&lt;li&gt;Manages fulfillment tasks based on the action type.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Amazon EventBridge: Managing Scheduled Execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Acts as a scheduler for delayed actions.
&lt;/li&gt;
&lt;li&gt;Polls for due pending actions every 5 minutes and enqueues them for processing.
&lt;/li&gt;
&lt;li&gt;Ensures execution happens within ±2 minutes of the scheduled time.
&lt;/li&gt;
&lt;/ul&gt;
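&lt;p&gt;The core of that poll step can be sketched as pure logic—&lt;code&gt;ActionRow&lt;/code&gt; here is a stand-in for the DynamoDB record, not the real model:&lt;/p&gt;

```typescript
// From the pending actions, pick those due within the 2-minute
// tolerance so the poller can enqueue them for processing.
interface ActionRow {
  id: string;
  executionTime: number; // epoch milliseconds
  status: string;
}

const TOLERANCE_MS = 2 * 60 * 1000;

function selectDueActions(rows: ActionRow[], now: number): ActionRow[] {
  return rows.filter(
    (row) => row.status === "PENDING" && row.executionTime <= now + TOLERANCE_MS
  );
}
```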

&lt;h3&gt;
  
  
  &lt;strong&gt;Amazon SQS: Queueing Actions for Scalability&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Decouples execution workloads by handling scheduled actions asynchronously.
&lt;/li&gt;
&lt;li&gt;Controls fulfillment request frequency to prevent system overload.
&lt;/li&gt;
&lt;li&gt;Uses FIFO (First-In-First-Out) processing to maintain execution order and prevent duplicate executions.
&lt;/li&gt;
&lt;/ul&gt;
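&lt;p&gt;A sketch of what the FIFO enqueue parameters could look like: reusing the action id as the deduplication id means a re-polled action cannot be queued twice, and a single message group preserves order. The field names mirror the SQS &lt;code&gt;SendMessage&lt;/code&gt; API; the queue URL is a placeholder.&lt;/p&gt;

```typescript
// Builds the parameter object for an SQS FIFO SendMessage call.
function buildFifoMessage(actionId: string, payload: Record<string, unknown>) {
  return {
    QueueUrl: "https://sqs.eu-west-1.amazonaws.com/000000000000/scheduled-actions.fifo",
    MessageBody: JSON.stringify(payload),
    MessageGroupId: "scheduled-actions",
    MessageDeduplicationId: actionId, // same id => SQS drops the duplicate
  };
}
```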

&lt;h3&gt;
  
  
  &lt;strong&gt;Amazon DynamoDB: Storing Scheduled Actions&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Serves as the primary database for storing scheduled actions.
&lt;/li&gt;
&lt;li&gt;Provides fast read/write operations for handling high workloads.
&lt;/li&gt;
&lt;li&gt;Stores metadata for tracking execution status, retries, and results.
&lt;/li&gt;
&lt;li&gt;Uses DynamoDB Streams to trigger immediate executions.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Amazon API Gateway: Exposing Endpoints for Management&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Provides HTTP endpoints for creating, updating, and deleting scheduled actions.
&lt;/li&gt;
&lt;li&gt;Exposes monitoring endpoints to retrieve actions by status and retry failed actions.
&lt;/li&gt;
&lt;li&gt;Ensures secure access with authentication and authorization mechanisms.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;System Design: Database Schema for Scheduled Actions&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;id&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unique identifier for each scheduled action.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;data&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stores execution-specific details.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;action&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defines the type of action to execute.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;executionTime&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specifies when the action should run.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;repeat&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Indicates if the action should repeat.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;frequency&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defines the interval for recurring actions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;executionRemainder&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tracks the remaining number of executions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;status&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Execution state ("PENDING", "IN_PROGRESS", "COMPLETED", "FAILED").&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;createdAt&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Timestamp when the action was created.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;updatedAt&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Last modified timestamp.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;retryCount&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Counts failed execution retries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;metadata&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stores logs and additional execution details.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
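&lt;p&gt;The schema above, expressed as a TypeScript type. This is a sketch: the status union comes straight from the table, and the optionality of the repeat-related fields is inferred from the "optional fields" requirement earlier.&lt;/p&gt;

```typescript
type ActionStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED" | "FAILED";

interface ScheduledAction {
  id: string;
  data: Record<string, unknown>;   // execution-specific details
  action: string;                  // e.g. "SEND_NOTIFICATION"
  executionTime: number;           // epoch milliseconds
  repeat?: boolean;
  frequency?: string;              // e.g. "DAILY"
  executionRemainder?: number;     // executions left for repeating actions
  status: ActionStatus;
  createdAt: number;
  updatedAt: number;
  retryCount: number;
  metadata: Record<string, unknown>; // logs and execution results
}
```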

&lt;h3&gt;
  
  
  &lt;strong&gt;Example: Scheduled Notification Action&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"mobile"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"60123456789"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"subject"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Test"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Joojo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"templateType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USER_LATE_PAYMENT_NOTIFICATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"notificationType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SMS"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"repeat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"frequency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DAILY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"executionRemainder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SEND_NOTIFICATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"executionTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1736930117120&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Project structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I used a &lt;strong&gt;layered-modular approach&lt;/strong&gt; for maintainability, scalability, and ease of change. Many times, different teams may want to extend changes in a service without introducing unintended side effects. I tried to achieve this by organizing components into distinct modules. Let's dive deeper below&lt;/p&gt;

&lt;h5&gt;
  
  
  &lt;strong&gt;1. Single Application with a Modular Design&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;The entire system is built as a &lt;strong&gt;single application&lt;/strong&gt;, but with a &lt;strong&gt;modular structure&lt;/strong&gt; that separates concerns. Each module is responsible for a specific aspect of the system, making the codebase easier to navigate and modify.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./src
├── app.ts
├── clients
├── config
├── controllers
├── handlers
├── helpers
├── middleware
├── models
├── routes
├── service
├── types
└── utils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  &lt;strong&gt;2. Serverless Handlers for Distributed Execution&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;The project is designed around AWS Lambda, with different &lt;strong&gt;handlers&lt;/strong&gt; exported and structured to allow seamless execution of scheduled actions. These handlers ensure that various tasks are processed independently, improving fault tolerance and scalability.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Action Handlers&lt;/strong&gt;: Manage creating, scheduling, retrieving, updating, deleting, and processing scheduled actions. This keeps all action-related logic centralized, making it easy to modify without affecting other parts of the system.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delayed Action Handlers&lt;/strong&gt;: Specifically handle actions that need to be initiated &lt;strong&gt;at a later time&lt;/strong&gt;. This separation ensures that delayed actions are efficiently scheduled and processed without interfering with real-time execution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immediate Action Handlers&lt;/strong&gt;: Trigger execution for actions that &lt;strong&gt;must start within 2 minutes&lt;/strong&gt;, using &lt;strong&gt;DynamoDB Streams&lt;/strong&gt; to detect changes and initiate execution instantly. This ensures timely processing of urgent tasks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fulfillment Handlers&lt;/strong&gt;: Ensure that scheduled actions are executed properly by interacting with the appropriate fulfillment services. This design allows fulfillment logic to evolve independently of action scheduling.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;├── handlers
│   ├── fulfillment.ts
│   ├── http-apis.ts
│   ├── initiate-scheduled-actions.ts
│   ├── initiate-stream-actions.ts
│   └── process.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  &lt;strong&gt;3. Maintainability through Separation of Concerns&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;Each module in the project is &lt;strong&gt;self-contained&lt;/strong&gt;, meaning changes to one component do not directly impact others. This reduces the risk of breaking existing functionality and simplifies debugging.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Controllers&lt;/strong&gt; handle request routing and execution logic.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Services&lt;/strong&gt; manage business logic and data interactions.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clients&lt;/strong&gt; interact with external services like databases, queues, and APIs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models&lt;/strong&gt; define the data structures used across the system.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Middleware&lt;/strong&gt; ensures that requests pass through validation and authentication layers.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Utilities&lt;/strong&gt; provide reusable helper functions for logging, error handling, and retries.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./src
├── clients
├── controllers
├── middleware
├── models
├── service
└── utils
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  &lt;strong&gt;4. Ease of Extensibility&lt;/strong&gt;
&lt;/h5&gt;

&lt;p&gt;With a modular design, &lt;strong&gt;new features can be added without modifying core components&lt;/strong&gt;. For example:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A new type of scheduled action can be introduced by &lt;strong&gt;adding a new action in the fulfillment service&lt;/strong&gt; without modifying the existing scheduling or queuing logic.
&lt;/li&gt;
&lt;li&gt;A new external service integration can be implemented by &lt;strong&gt;extending the clients module&lt;/strong&gt;, ensuring seamless communication with third-party systems.
&lt;/li&gt;
&lt;/ul&gt;
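&lt;p&gt;As a sketch of the second point, a new client could live under &lt;code&gt;./src/clients&lt;/code&gt; behind a small interface. The names below (&lt;code&gt;NotificationClient&lt;/code&gt;, &lt;code&gt;WebhookNotificationClient&lt;/code&gt;) are illustrative, not taken from the actual codebase:&lt;/p&gt;

```typescript
// Hypothetical sketch of a new external-service client added under
// ./src/clients without touching scheduling or queuing logic.
export interface NotificationClient {
  send(recipient: string, message: string): Promise<boolean>;
}

export class WebhookNotificationClient implements NotificationClient {
  constructor(private readonly endpoint: string) {}

  async send(recipient: string, message: string): Promise<boolean> {
    // A real client would POST to the endpoint; this sketch only
    // illustrates the shape of the integration point.
    console.log(`POST ${this.endpoint}`, { recipient, message });
    return true;
  }
}
```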

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ageha1u742heply4okf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ageha1u742heply4okf.jpeg" alt=" " width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Delayed Execution: Ensuring Timely Processing&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The system processes scheduled actions through &lt;strong&gt;periodic execution&lt;/strong&gt;, ensuring that pending actions are picked up and executed on time.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Periodic Execution for Scheduled Actions&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Lambda function&lt;/strong&gt; periodically scans the database for actions with &lt;strong&gt;PENDING status&lt;/strong&gt; and an &lt;strong&gt;executionTime&lt;/strong&gt; that is due.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon EventBridge&lt;/strong&gt; acts as a scheduler, triggering this Lambda function &lt;strong&gt;every 5 minutes&lt;/strong&gt; to ensure that actions are picked up on time.
&lt;/li&gt;
&lt;li&gt;The function &lt;strong&gt;enqueues these pending actions into Amazon SQS&lt;/strong&gt;, ensuring a reliable and scalable execution pipeline.
&lt;/li&gt;
&lt;/ul&gt;
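&lt;p&gt;The flow above can be sketched as a small handler. The injected dependency names (&lt;code&gt;scanDueActions&lt;/code&gt;, &lt;code&gt;enqueue&lt;/code&gt;) are placeholders, not the project's real functions:&lt;/p&gt;

```typescript
// Illustrative sketch of the periodic pickup: EventBridge triggers this
// handler every 5 minutes; it scans for due PENDING actions and enqueues
// them into SQS. Dependency names here are placeholders.
type Action = { id: string; status: string; executionTime: number };

export const initiateScheduledActions = async (
  scanDueActions: (now: number) => Promise<Action[]>,
  enqueue: (action: Action) => Promise<void>,
): Promise<number> => {
  const now = Date.now();
  // PENDING actions whose executionTime is due
  const due = await scanDueActions(now);
  await Promise.all(due.map((action) => enqueue(action)));
  return due.length;
};
```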

&lt;p&gt;&lt;strong&gt;Why This Approach Works&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficient batch processing&lt;/strong&gt; ensures that multiple actions can be picked up at once.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; is maintained by decoupling execution with &lt;strong&gt;SQS&lt;/strong&gt;, preventing system overload. Queues are critical for smoothing load on downstream systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State Management:&lt;/strong&gt; Actions follow a lifecycle (&lt;code&gt;PENDING → IN_PROGRESS → COMPLETED/FAILED/NO_ACTION&lt;/code&gt;), with each state &lt;strong&gt;persisted in the database&lt;/strong&gt; for tracking and recovery.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution Handling:&lt;/strong&gt; Successful executions are marked &lt;strong&gt;COMPLETED&lt;/strong&gt;, failures are marked &lt;strong&gt;FAILED&lt;/strong&gt;, and recurring actions update their execution remainder before resetting to &lt;strong&gt;PENDING&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Retries:&lt;/strong&gt; Failed actions use &lt;strong&gt;exponential backoff&lt;/strong&gt; for retries. If retries exceed the limit, the action remains &lt;strong&gt;FAILED&lt;/strong&gt; until manually reset.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Idempotency &amp;amp; Data Integrity:&lt;/strong&gt; Execution remainders prevent duplicate executions, and invalid operations (e.g., negative remainders) are blocked.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Metadata stores logs, execution timestamps, API responses, and failure reasons for easy debugging.
&lt;/li&gt;
&lt;/ul&gt;
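&lt;p&gt;The retry rule can be expressed in a few lines. The constants here (a 1-minute base delay and a retry cap of 5) are assumptions for illustration, not the system's actual configuration:&lt;/p&gt;

```typescript
// A minimal sketch of the exponential backoff rule, assuming a 1-minute
// base delay and a retry cap of 5; both constants are illustrative.
const BASE_DELAY_MS = 60_000;
const MAX_RETRY = 5;

export const nextRetryDelay = (retryCount: number): number | null => {
  // Past the cap the action stays FAILED until manually reset.
  if (retryCount >= MAX_RETRY) return null;
  return BASE_DELAY_MS * 2 ** retryCount; // 1m, 2m, 4m, 8m, 16m
};
```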

&lt;h3&gt;
  
  
  &lt;strong&gt;Immediate Execution: Handling Time-Sensitive Actions Using DynamoDB Streams&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Some scheduled actions require &lt;strong&gt;immediate execution&lt;/strong&gt; if their execution time is within &lt;strong&gt;2 minutes&lt;/strong&gt; of creation. To handle these efficiently, the system leverages &lt;strong&gt;DynamoDB Streams&lt;/strong&gt; and AWS Lambda for real-time processing.  &lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. How Immediate Execution Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DynamoDB Streams&lt;/strong&gt; detect changes in the database when a new action is inserted or modified.
&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Lambda function listens to these changes&lt;/strong&gt;, processes new actions, and determines whether they require immediate execution.
&lt;/li&gt;
&lt;li&gt;If an action is scheduled to execute &lt;strong&gt;within 2 minutes&lt;/strong&gt;, the Lambda function &lt;strong&gt;enqueues it into Amazon SQS&lt;/strong&gt; for execution.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Breakdown of the Processing Logic&lt;/strong&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Listening to DynamoDB Stream Events&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The function &lt;code&gt;initiateProcessFromDynamoStream&lt;/code&gt; is triggered whenever a new record is &lt;strong&gt;INSERTED&lt;/strong&gt; or &lt;strong&gt;MODIFIED&lt;/strong&gt; in DynamoDB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;initiateProcessFromDynamoStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBStreamEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Records&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No records to process in DynamoDB stream event.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; records received.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The function &lt;strong&gt;checks if there are new records&lt;/strong&gt; in the event.
&lt;/li&gt;
&lt;li&gt;If no records exist, the function exits early.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Processing Each Record&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The function &lt;strong&gt;loops through each record&lt;/strong&gt;, extracts its details, and determines whether it needs to be processed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processingPromises&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;DynamoDBRecord&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;eventName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;record&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;NewImage&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Skipping record: Missing NewImage.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleanedImage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unmarshall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NewImage&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cleaned NewImage object:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleanedImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;INSERT&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;MODIFY&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;eventName&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Skipping record with eventName &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;eventName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Processing record with eventName: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;eventName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The function extracts &lt;strong&gt;newly inserted or modified data&lt;/strong&gt; from the DynamoDB stream.
&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;filters out irrelevant records&lt;/strong&gt; (i.e., records that don’t have a &lt;code&gt;NewImage&lt;/code&gt; or are not newly inserted/modified).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Checking for Immediate Execution&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The function then checks whether the action &lt;strong&gt;needs immediate execution&lt;/strong&gt; by calculating the time difference.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;executionTime&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cleanedImage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Check buffer time logic&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timeUntilExecution&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;executionTime&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;currentTime&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeUntilExecution&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;TWO_MINUTES_IN_MS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`Skipping record with id &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;: Execution time is outside the 2-minute buffer window.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;current time&lt;/strong&gt; is compared with the action’s &lt;strong&gt;executionTime&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;If the action is &lt;strong&gt;more than 2 minutes away&lt;/strong&gt;, it is &lt;strong&gt;skipped&lt;/strong&gt; (it will be picked up later by the periodic execution).
&lt;/li&gt;
&lt;li&gt;If the action &lt;strong&gt;needs immediate execution&lt;/strong&gt;, it continues processing.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Ensuring Valid Status and Retry Limits&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;STATUSES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PENDING&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Skipping record: Missing or invalid status.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;CONSTANTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MAX_RETRY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`Skipping record with retryCount exceeding limit: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Skipping record: Missing id.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resolve&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Ensures the action &lt;strong&gt;has a valid &lt;code&gt;PENDING&lt;/code&gt; status&lt;/strong&gt; before processing.
&lt;/li&gt;
&lt;li&gt;Checks whether the &lt;strong&gt;retry limit has been exceeded&lt;/strong&gt; to prevent infinite retries.
&lt;/li&gt;
&lt;li&gt;Ensures the &lt;strong&gt;action has a valid ID&lt;/strong&gt; before sending it to the queue.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Sending the Action to SQS for Execution&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleanedImage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`Failed to add action to the queue: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If the action qualifies for &lt;strong&gt;immediate execution&lt;/strong&gt;, it is &lt;strong&gt;sent to SQS&lt;/strong&gt;, where it will be processed by the fulfillment service.
&lt;/li&gt;
&lt;li&gt;If &lt;strong&gt;SQS fails&lt;/strong&gt;, the action is &lt;strong&gt;marked as FAILED&lt;/strong&gt; and logged for debugging.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Why This Approach Is Reliable&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Execution:&lt;/strong&gt; Actions scheduled within 2 minutes execute immediately instead of waiting for periodic polling.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic Filtering:&lt;/strong&gt; Actions scheduled for later execution are &lt;strong&gt;skipped&lt;/strong&gt; and processed by EventBridge at the right time.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling:&lt;/strong&gt; If an action &lt;strong&gt;fails to enqueue in SQS&lt;/strong&gt;, it is marked as &lt;strong&gt;FAILED&lt;/strong&gt; instead of being lost.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; The Lambda function can &lt;strong&gt;process multiple events concurrently&lt;/strong&gt;, ensuring no action is delayed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Repeated Execution: Managing Recurring Actions&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Some scheduled actions need to &lt;strong&gt;execute multiple times&lt;/strong&gt; at fixed intervals. The system handles repeated execution using three key fields:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;repeat&lt;/strong&gt; – Indicates whether the action should run multiple times.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;executionRemainder&lt;/strong&gt; – Tracks how many more times the action should execute.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;frequency&lt;/strong&gt; – Defines the time interval between executions.
&lt;/li&gt;
&lt;/ul&gt;
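&lt;p&gt;These fields might be modeled as follows. The &lt;code&gt;Frequency&lt;/code&gt; values and the &lt;code&gt;getFrequencyInMilliseconds&lt;/code&gt; implementation are assumptions based on how the helper is used, not the project's actual code:&lt;/p&gt;

```typescript
// Sketch of the three recurrence fields, plus a hypothetical
// frequency-to-milliseconds helper; the Frequency values and the
// helper's implementation are assumptions.
type Frequency = "HOURLY" | "DAILY" | "WEEKLY";

interface RecurringAction {
  repeat: boolean;            // should the action run more than once?
  executionRemainder: number; // executions still outstanding
  frequency?: Frequency;      // interval between executions
}

export const getFrequencyInMilliseconds = (
  frequency: Frequency,
): number | undefined =>
  ({
    HOURLY: 60 * 60 * 1000,
    DAILY: 24 * 60 * 60 * 1000,
    WEEKLY: 7 * 24 * 60 * 60 * 1000,
  })[frequency];
```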




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Handling Repeated Execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The function &lt;code&gt;complete(id, notes)&lt;/code&gt; is responsible for managing the completion of actions. If an action is &lt;strong&gt;set to repeat&lt;/strong&gt;, it updates the execution time and tracks how many executions remain.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Deducting Execution Remainder&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;newExecutionRemainder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;repeat&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;executionRemainder&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;executionRemainder&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newExecutionRemainder&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Execution remainder cannot be negative.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If the action is &lt;strong&gt;repeating&lt;/strong&gt;, the execution remainder &lt;strong&gt;decreases by 1&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;If the remainder is &lt;strong&gt;less than 0&lt;/strong&gt;, an error is thrown to prevent unintended behavior.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Completing the Final Execution&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repeat&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;newExecutionRemainder&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;STATUSES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;calculateTTL&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;executionRemainder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;executionResponses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...(&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;executionResponses&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Final execution completed successfully:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If &lt;strong&gt;no executions remain&lt;/strong&gt;, the action is marked as &lt;strong&gt;COMPLETED&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;TTL (Time-To-Live)&lt;/strong&gt; is set to &lt;strong&gt;delete the record after two weeks&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Execution metadata is updated for tracking and observability.
&lt;/li&gt;
&lt;/ul&gt;
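&lt;p&gt;A minimal sketch of what &lt;code&gt;calculateTTL&lt;/code&gt; could look like, assuming DynamoDB's TTL attribute (an epoch timestamp in seconds) and the two-week retention described above:&lt;/p&gt;

```typescript
// Hypothetical calculateTTL: DynamoDB TTL expects an epoch timestamp
// in seconds; completed records are kept for two weeks, then expired.
const TWO_WEEKS_IN_SECONDS = 14 * 24 * 60 * 60;

export const calculateTTL = (): number =>
  Math.floor(Date.now() / 1000) + TWO_WEEKS_IN_SECONDS;
```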

&lt;h4&gt;
  
  
  &lt;strong&gt;3. Scheduling the Next Execution&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If the action &lt;strong&gt;still has remaining executions&lt;/strong&gt;, the function schedules the next execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;repeat&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;newExecutionRemainder&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;frequency&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;frequencyInMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getFrequencyInMilliseconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;frequency&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;frequencyInMs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Invalid frequency: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;frequency&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;STATUSES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PENDING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;executionTime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;executionTime&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;frequencyInMs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;executionRemainder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;newExecutionRemainder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;executionResponses&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...(&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;executionResponses&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[]),&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Recurring action updated successfully:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;execution time is updated&lt;/strong&gt; by adding the interval from the frequency field.
&lt;/li&gt;
&lt;li&gt;The action status is set to &lt;strong&gt;PENDING&lt;/strong&gt; so it can be picked up again.
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;metadata is updated&lt;/strong&gt; to log execution history.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  &lt;strong&gt;4. Frequency Conversion&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The system converts predefined frequencies into milliseconds to update the execution time.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;frequencyDurations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;TEN_MINS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;HOURLY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;DAILY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;WEEKLY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;MONTHLY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;//Not accurate and for demonstration purposes&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This allows flexible scheduling based on predefined intervals.  &lt;/p&gt;
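The lookup helper itself is not shown in the article, but assuming `getFrequencyInMilliseconds` is a plain lookup into the `frequencyDurations` map above (repeated here so the sketch is self-contained), it could look like this:

```typescript
// Durations in milliseconds, mirroring the map shown above.
const frequencyDurations: Record<string, number> = {
  TEN_MINS: 10 * 60 * 1000,
  HOURLY: 60 * 60 * 1000,
  DAILY: 24 * 60 * 60 * 1000,
  WEEKLY: 7 * 24 * 60 * 60 * 1000,
  MONTHLY: 30 * 24 * 60 * 60 * 1000, // approximation, as noted above
};

// Sketch: the real body is an assumption. Returning undefined for an
// unknown frequency lets the caller raise its "Invalid frequency" error.
function getFrequencyInMilliseconds(frequency: string): number | undefined {
  return frequencyDurations[frequency];
}
```

Returning `undefined` rather than throwing keeps validation in one place: the scheduling code shown earlier already guards with `if (!frequencyInMs) { throw ... }`.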




&lt;h4&gt;
  
  
  &lt;strong&gt;5. Ensuring Idempotency and Data Integrity&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For non-repeating actions, the system ensures that execution happens &lt;strong&gt;only once&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;STATUSES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;COMPLETED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;calculateTTL&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;notes&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Action completed successfully with TTL:&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Actions that &lt;strong&gt;do not repeat&lt;/strong&gt; are marked &lt;strong&gt;COMPLETED&lt;/strong&gt; immediately.
&lt;/li&gt;
&lt;li&gt;The TTL ensures that &lt;strong&gt;data is retained for a limited time before deletion&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
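The article uses `calculateTTL()` without showing its body. Assuming a DynamoDB-style store where TTL is an epoch timestamp in seconds, a minimal sketch matching the two-week retention described above might be:

```typescript
// Two weeks of retention, expressed in seconds.
const TWO_WEEKS_IN_SECONDS = 14 * 24 * 60 * 60;

// Sketch of calculateTTL: this body is an assumption, not the article's code.
// Date.now() returns milliseconds, but TTL attributes are compared against
// epoch *seconds*, hence the division before adding the retention window.
function calculateTTL(nowMs: number = Date.now()): number {
  return Math.floor(nowMs / 1000) + TWO_WEEKS_IN_SECONDS;
}
```

A common bug here is storing milliseconds instead of seconds, which pushes expiry roughly 30,000 years into the future, so the unit conversion is worth calling out.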




&lt;h4&gt;
  
  
  &lt;strong&gt;Why This Approach Works&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated Rescheduling&lt;/strong&gt; – The system automatically sets the next execution time.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preventing Overexecution&lt;/strong&gt; – Execution stops when the remainder reaches &lt;strong&gt;zero&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Efficient Tracking&lt;/strong&gt; – Each execution updates metadata for debugging and observability.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integrity&lt;/strong&gt; – Ensures that frequency values are valid and that the execution remainder is correctly decremented. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fua50x9rwhvciuekxxmhr.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fua50x9rwhvciuekxxmhr.jpeg" alt="Process" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Action Processing: Ensuring Reliable Execution with Deduplication&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The system processes scheduled actions using &lt;strong&gt;Amazon SQS FIFO Queues&lt;/strong&gt; or &lt;strong&gt;Redis-based deduplication&lt;/strong&gt; to ensure each action is executed only once, preventing duplicate processing.  &lt;/p&gt;




&lt;h4&gt;
  
  
  &lt;strong&gt;1. Handling Action Processing with SQS FIFO&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Messages are sent to an &lt;strong&gt;Amazon SQS FIFO queue&lt;/strong&gt;, ensuring actions are processed in &lt;strong&gt;first-in-first-out order&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FIFO queues deduplicate messages&lt;/strong&gt; within a five-minute window, preventing the same message from being processed multiple times.
&lt;/li&gt;
&lt;li&gt;This approach is &lt;strong&gt;ideal for strict ordering and exactly-once processing&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
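No FIFO snippet appears in the article, so here is a sketch of how the send parameters would differ from the standard-queue example later on. The message shape and `queueUrl` name are assumptions; FIFO queues require a `MessageGroupId` (the ordering scope) and accept a `MessageDeduplicationId` that SQS enforces within a five-minute window:

```typescript
// Assumed message shape, based on the deduplication ID used elsewhere.
interface ScheduledActionMessage {
  id: string;
  status: string;
  retryCount?: number;
}

// Builds the parameter object that would be passed to
// `new SendMessageCommand(params)` from @aws-sdk/client-sqs.
function buildFifoSendParams(queueUrl: string, messageBody: ScheduledActionMessage) {
  return {
    QueueUrl: queueUrl, // FIFO queue URLs end in ".fifo"
    MessageBody: JSON.stringify(messageBody),
    // Messages sharing a group ID are delivered in order relative to each other.
    MessageGroupId: messageBody.id,
    // Same ID within 5 minutes => SQS drops the duplicate.
    MessageDeduplicationId: `${messageBody.id}-${messageBody.status}-${messageBody.retryCount || 0}`,
  };
}
```

Using the action's ID as the group ID keeps retries of one action ordered without serializing unrelated actions behind each other.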




&lt;h4&gt;
  
  
  &lt;strong&gt;2. Alternative Deduplication Using Redis&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If &lt;strong&gt;a FIFO queue is not used&lt;/strong&gt;, the system leverages &lt;strong&gt;Redis&lt;/strong&gt; to manage deduplication before sending messages to a standard SQS queue.  &lt;/p&gt;

&lt;h5&gt;
  
  
  &lt;strong&gt;How Redis Deduplication Works&lt;/strong&gt;
&lt;/h5&gt;

&lt;ul&gt;
&lt;li&gt;Each message is assigned a &lt;strong&gt;deduplication ID&lt;/strong&gt; based on:

&lt;ul&gt;
&lt;li&gt;The action’s &lt;strong&gt;unique ID&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;status&lt;/strong&gt; of the action
&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;retry count&lt;/strong&gt; (if applicable)
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deduplicationId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;messageBody&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;messageBody&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;messageBody&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redisKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`sqs-deduplication:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;deduplicationId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Before sending a message to SQS, Redis &lt;strong&gt;checks if the deduplication ID exists&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redisCheck&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;RedisClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;redisKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;redisCheck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;redisCheck&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s2"&gt;`Duplicate message detected. Skipping send for ID: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;deduplicationId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If a duplicate is &lt;strong&gt;detected&lt;/strong&gt;, the message is &lt;strong&gt;not sent&lt;/strong&gt;, avoiding redundant processing.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  &lt;strong&gt;3. Sending Messages to SQS&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If the action is &lt;strong&gt;not a duplicate&lt;/strong&gt;, it is sent to the &lt;strong&gt;SQS queue&lt;/strong&gt; for processing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;SendMessageCommand&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;QueueUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;queueUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;MessageBody&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messageBody&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;executeSQSCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Message sent successfully to &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;queueUrl&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The system ensures messages are delivered &lt;strong&gt;without unnecessary duplicates&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Actions proceed to the &lt;strong&gt;fulfillment stage&lt;/strong&gt; after entering the queue.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;4. Storing Deduplication Data in Redis&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;After sending a message, the deduplication ID is &lt;strong&gt;stored in Redis&lt;/strong&gt; with a &lt;strong&gt;TTL of 5 minutes&lt;/strong&gt; to ensure temporary deduplication.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;RedisClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;redisKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 300 seconds = 5 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;short expiration time&lt;/strong&gt; means a &lt;strong&gt;later retry&lt;/strong&gt; of the same action can still be sent once the key expires.
&lt;/li&gt;
&lt;li&gt;Redis helps manage &lt;strong&gt;temporary deduplication&lt;/strong&gt; without affecting long-term action execution.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;5. Handling Errors Gracefully&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If sending a message to SQS &lt;strong&gt;fails&lt;/strong&gt;, errors are handled based on the failure type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;TimeoutError&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AppError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;CommonErrors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REQUEST_TIMEOUT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Timeout occurred while sending the message to SQS.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;queueUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AppError&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;CommonErrors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;INTERNAL_SERVER_ERROR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Failed to send message to SQS.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;queueUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageBody&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Timeouts&lt;/strong&gt; raise a dedicated &lt;code&gt;REQUEST_TIMEOUT&lt;/code&gt; error so callers can apply a retry strategy.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other failures&lt;/strong&gt; log metadata to help diagnose issues.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Why This Approach Works&lt;/strong&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FIFO queues&lt;/strong&gt; ensure &lt;strong&gt;strict ordering and deduplication&lt;/strong&gt; for time-sensitive tasks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis deduplication&lt;/strong&gt; prevents &lt;strong&gt;unnecessary duplicate processing&lt;/strong&gt; when a FIFO queue is not available.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error handling mechanisms&lt;/strong&gt; ensure messages are retried when necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Action Fulfillment: Processing Scheduled Actions from SQS&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Once scheduled actions reach their execution time, they are &lt;strong&gt;processed by a Lambda function&lt;/strong&gt; that reads messages from &lt;strong&gt;Amazon SQS&lt;/strong&gt;. The function ensures that actions are executed correctly, updates their status accordingly, and handles errors or retries when needed.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;1. Processing Actions from SQS&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;fulfill&lt;/code&gt; function listens for &lt;strong&gt;SQS events&lt;/strong&gt;, where each record represents a scheduled action that needs execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fulfill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SQSRecord&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;}):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Records&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;No records to process in SQS event.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;The function checks if there are &lt;strong&gt;new records to process&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;If no records exist, it exits early.
&lt;/li&gt;
&lt;/ul&gt;
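The snippets that follow use `id`, `retryCount`, `metadata`, and `receiptHandle` without showing where they come from. Presumably each record's `body` is parsed back into the scheduled action inside a loop; a sketch of that step (field names are assumptions based on the surrounding code):

```typescript
// Minimal shape of the SQS record fields used here.
interface SQSRecord {
  body: string;
  receiptHandle: string;
}

// Sketch: parse each record's JSON body and pull out the fields the
// fulfillment loop needs. The receiptHandle comes from the record itself,
// since it is required later to delete the message from the queue.
function parseRecords(records: SQSRecord[]) {
  return records.map((record) => {
    const scheduledAction = JSON.parse(record.body);
    return {
      id: scheduledAction.id,
      retryCount: scheduledAction.retryCount || 0,
      metadata: scheduledAction.metadata,
      receiptHandle: record.receiptHandle,
      scheduledAction,
    };
  });
}
```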

&lt;h4&gt;
  
  
  &lt;strong&gt;2. Ensuring Actions are Executed Correctly&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For each action in the queue:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the action has exceeded the &lt;strong&gt;maximum retry attempts&lt;/strong&gt;, it is &lt;strong&gt;marked as FAILED&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nx"&gt;CONSTANTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MAX_RETRY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handleFailure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;retryReason&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Exceeded maximum retry attempts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If the action is &lt;strong&gt;being executed for the first time&lt;/strong&gt;, its status is set to &lt;strong&gt;IN_PROGRESS&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;3. Handling Different Types of Actions&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Even though the &lt;strong&gt;scheduler does not allow an action to be scheduled without a valid fulfillment service&lt;/strong&gt;, there is a &lt;strong&gt;defensive mechanism (NO_ACTION)&lt;/strong&gt; to handle cases where an action is &lt;strong&gt;manually altered or corrupted in the database&lt;/strong&gt;.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If the &lt;strong&gt;action type is not recognized&lt;/strong&gt;, it is marked as &lt;strong&gt;NO_ACTION&lt;/strong&gt; and removed from the queue.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ACTIONS&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scheduledAction&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Actions&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;noAction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;deleteMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If the action is &lt;strong&gt;valid&lt;/strong&gt;, it is processed based on its type.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Processing a General Task&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nx"&gt;ACTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;EXECUTE_TASK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;taskExecutionService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;performTask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scheduledAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Calls an &lt;strong&gt;external service&lt;/strong&gt; to execute a &lt;strong&gt;generic task&lt;/strong&gt; (e.g., processing a user request).
&lt;/li&gt;
&lt;li&gt;Marks the action as &lt;strong&gt;COMPLETED&lt;/strong&gt; once finished.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Processing a Notification&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nx"&gt;ACTIONS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SEND_ALERT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;messageType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;messageData&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;scheduledAction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;processedMessageData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messageData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;notificationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;recipient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;messageType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;processedMessageData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Sends an &lt;strong&gt;alert or notification&lt;/strong&gt; using a &lt;strong&gt;notification service&lt;/strong&gt;. &lt;/li&gt;
&lt;li&gt;Marks the action as &lt;strong&gt;COMPLETED&lt;/strong&gt; after execution.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Handling Failures and Retries&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;If an action fails, the system &lt;strong&gt;applies exponential backoff&lt;/strong&gt; and &lt;strong&gt;retries the execution&lt;/strong&gt; before marking it as permanently failed.  &lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Marking an Action as Failed&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handleFailure&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Action with id: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; failed after maximum retries. Reason: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fail&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;deleteMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If an action &lt;strong&gt;exceeds retry limits&lt;/strong&gt;, it is &lt;strong&gt;marked as FAILED&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;The message is removed from the queue to prevent further processing.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Retrying an Action with Backoff&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handleProcessingError&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Error processing message with id: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;. Error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;applyExponentialBackoff&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retryCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;actionMarkedForRetry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;AppError&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Processing error : &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;actionMarkedForRetry&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;deleteMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;If an error occurs, the system &lt;strong&gt;applies exponential backoff&lt;/strong&gt; and &lt;strong&gt;increments the retry count&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;The failed action is &lt;strong&gt;re-enqueued into SQS&lt;/strong&gt; for a retry.
&lt;/li&gt;
&lt;/ul&gt;
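&lt;p&gt;The &lt;code&gt;applyExponentialBackoff&lt;/code&gt; helper is not shown above, so here is a minimal sketch of what it might look like. The base delay and cap are assumptions for illustration, not the system's actual constants:&lt;/p&gt;

```typescript
// Hypothetical sketch of the backoff helper; delay constants are assumptions.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// The delay doubles with each retry and is capped so retries never stall too long.
const backoffDelay = (retryCount: number): number =>
  Math.min(BASE_DELAY_MS * 2 ** retryCount, MAX_DELAY_MS);

const applyExponentialBackoff = async (retryCount: number, id: string) => {
  const delayMs = backoffDelay(retryCount);
  console.log(`Waiting ${delayMs}ms before retrying action ${id}`);
  await new Promise((resolve) => setTimeout(resolve, delayMs));
};
```

&lt;p&gt;With these numbers, retry 0 waits 1s, retry 2 waits 4s, and everything past retry 5 waits the capped 30s.&lt;/p&gt;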

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Finalizing Execution&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Once an action &lt;strong&gt;completes successfully&lt;/strong&gt;, it is removed from SQS.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;deleteMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;receiptHandle&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Message with id: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; processed successfully.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once &lt;strong&gt;all records&lt;/strong&gt; from the SQS event have been processed, a summary log is printed.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;Records&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; records from the SQS event have been processed.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Why This Approach Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ensures Actions are Always Executed&lt;/strong&gt; – Each action is &lt;strong&gt;retried with backoff&lt;/strong&gt; before failing permanently.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handles Different Action Types&lt;/strong&gt; – Supports &lt;strong&gt;notifications, tasks, subscription updates, and other scheduled jobs&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prevents Duplicate Execution&lt;/strong&gt; – Uses &lt;strong&gt;SQS FIFO or Redis deduplication&lt;/strong&gt; to avoid duplicate processing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliable State Management&lt;/strong&gt; – Updates the database with &lt;strong&gt;IN_PROGRESS, COMPLETED, FAILED, or NO_ACTION&lt;/strong&gt; statuses.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defensive Handling&lt;/strong&gt; – &lt;strong&gt;NO_ACTION&lt;/strong&gt; is a safeguard in case an action is altered manually in the database.
&lt;/li&gt;
&lt;/ul&gt;
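&lt;p&gt;The deduplication point can be made concrete with SQS FIFO: a stable &lt;code&gt;MessageDeduplicationId&lt;/code&gt; makes the queue drop duplicate sends that arrive within its 5-minute deduplication window. This is a hedged sketch; the action shape, group id, and id scheme below are illustrative assumptions, not the article's actual code:&lt;/p&gt;

```typescript
// Illustrative shape of a scheduled action (assumed for this sketch).
interface ScheduledAction {
  id: string;
  action: string;
  executeAt: string;
}

// Builds the parameters for an SQS FIFO SendMessage call. SQS FIFO drops
// any message whose MessageDeduplicationId matches one already seen in the
// 5-minute deduplication window, so re-sending the same action is safe.
const buildFifoMessageParams = (queueUrl: string, action: ScheduledAction) => ({
  QueueUrl: queueUrl,
  MessageBody: JSON.stringify(action),
  MessageGroupId: "scheduled-actions",
  MessageDeduplicationId: `${action.id}-${action.executeAt}`,
});
```

&lt;p&gt;Keying the deduplication id on both the action id and its execution time means a rescheduled action still goes through, while a double-enqueue of the same run does not.&lt;/p&gt;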

&lt;h3&gt;
  
  
  &lt;strong&gt;Presentation Layer: Exposing Endpoints for Observability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;presentation layer&lt;/strong&gt; consists of a &lt;strong&gt;single Lambda function&lt;/strong&gt; that serves as an API, exposing HTTP endpoints through &lt;strong&gt;Amazon API Gateway&lt;/strong&gt;. These endpoints allow users to &lt;strong&gt;observe, manage, and interact&lt;/strong&gt; with scheduled actions, ensuring real-time monitoring and control.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Exposing HTTP Endpoints via API Gateway&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;serverless API&lt;/strong&gt; is built using &lt;strong&gt;AWS Lambda&lt;/strong&gt; and &lt;strong&gt;API Gateway&lt;/strong&gt;, providing access to key functionalities related to scheduled actions.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Get Actions by Status and Counts&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetches actions grouped by their &lt;strong&gt;current status&lt;/strong&gt; (e.g., pending, completed, failed).
&lt;/li&gt;
&lt;li&gt;Provides &lt;strong&gt;count summaries&lt;/strong&gt; to track execution trends.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Initiate Failed Actions&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows users to &lt;strong&gt;retry&lt;/strong&gt; failed actions manually.
&lt;/li&gt;
&lt;li&gt;Ensures failed jobs can be &lt;strong&gt;reprocessed&lt;/strong&gt; without waiting for an automated retry cycle.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Delete Actions&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provides an endpoint to &lt;strong&gt;remove old or unnecessary actions&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Helps maintain a &lt;strong&gt;clean database&lt;/strong&gt; by managing expired records.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
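&lt;p&gt;As a rough sketch of the core of the "counts" endpoint (the statuses come from the article; the routing and data access are omitted, and the shapes here are assumptions):&lt;/p&gt;

```typescript
// Statuses used by the scheduled actions system, per the article.
type ActionStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED" | "FAILED" | "NO_ACTION";

// Groups actions by status and counts them -- the summary the dashboard
// would render from GET /actions/counts (endpoint path is an assumption).
const countsByStatus = (actions: { status: ActionStatus }[]) => {
  const counts: { [status: string]: number } = {};
  for (const a of actions) {
    counts[a.status] = (counts[a.status] ?? 0) + 1;
  }
  return counts;
};
```

&lt;p&gt;Inside the Lambda, this result would simply be serialized into the API Gateway response body.&lt;/p&gt;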

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Integrating with a Monitoring Dashboard&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The data exposed by these endpoints can be &lt;strong&gt;visualized on a dashboard&lt;/strong&gt; for real-time observability.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqljjjz6l312dy5tzpscp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqljjjz6l312dy5tzpscp.png" alt=" " width="800" height="131"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Displays &lt;strong&gt;action counts by status&lt;/strong&gt; to track performance.
&lt;/li&gt;
&lt;li&gt;Allows users to &lt;strong&gt;manually retry or delete actions&lt;/strong&gt; via an interface.
&lt;/li&gt;
&lt;li&gt;Provides insights into &lt;strong&gt;system health and execution reliability&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75xu0fpixwz9pvxoqunp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75xu0fpixwz9pvxoqunp.png" alt=" " width="800" height="519"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Challenges in Building the Scheduled Actions System&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Developing a reliable and scalable scheduled actions system comes with several challenges that need to be carefully addressed.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Race Conditions&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When multiple processes attempt to update or execute the same action simultaneously, inconsistencies can occur.
&lt;/li&gt;
&lt;li&gt;Proper locking mechanisms, deduplication, and FIFO queues help prevent duplicate execution.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Load Testing at Every Point of the Cycle&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The system must be tested under &lt;strong&gt;high loads&lt;/strong&gt; to ensure that scheduling, execution, retries, and fulfillment scale properly.
&lt;/li&gt;
&lt;li&gt;Testing includes &lt;strong&gt;database performance, SQS message handling, Lambda execution limits, and API response times&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Jobs Being Picked Up by Both Streams and the Scheduler&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actions scheduled within &lt;strong&gt;2 minutes of execution&lt;/strong&gt; are processed by &lt;strong&gt;DynamoDB Streams&lt;/strong&gt;, while others rely on &lt;strong&gt;EventBridge&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Without proper coordination, &lt;strong&gt;duplicate executions&lt;/strong&gt; may occur. Ensuring actions transition correctly between pending, in-progress, and completed states prevents this issue.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Understanding Limitations of Tools&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concurrency Handling:&lt;/strong&gt; AWS Lambda &lt;strong&gt;scales automatically&lt;/strong&gt;, but high concurrency can lead to &lt;strong&gt;throttling and delays&lt;/strong&gt; in processing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Runtime Limits:&lt;/strong&gt; Since &lt;strong&gt;Lambda has a max execution time&lt;/strong&gt;, long-running tasks must be &lt;strong&gt;broken into smaller executions or offloaded to a worker service&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
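&lt;p&gt;One way to picture the state-transition guard that stops double pickup: a DynamoDB conditional update that only moves an action from &lt;strong&gt;PENDING&lt;/strong&gt; to &lt;strong&gt;IN_PROGRESS&lt;/strong&gt; if no other consumer has claimed it first. This is a sketch under assumptions; the table and attribute names are invented for illustration:&lt;/p&gt;

```typescript
// Builds the params for a DynamoDB UpdateItem call that "claims" an action.
// The ConditionExpression makes the write succeed for exactly one caller:
// if the stream consumer and the scheduler race, the loser gets a
// ConditionalCheckFailedException and simply skips the action.
const buildClaimParams = (tableName: string, id: string) => ({
  TableName: tableName,
  Key: { id },
  UpdateExpression: "SET #s = :inProgress",
  ConditionExpression: "#s = :pending",
  ExpressionAttributeNames: { "#s": "status" },
  ExpressionAttributeValues: {
    ":inProgress": "IN_PROGRESS",
    ":pending": "PENDING",
  },
});
```

&lt;p&gt;Because the condition is evaluated atomically on the table, this works even when both paths fire within milliseconds of each other.&lt;/p&gt;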

&lt;h3&gt;
  
  
  &lt;strong&gt;Future Improvements&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Webhooks for Real-Time Notifications&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implementing &lt;strong&gt;webhooks&lt;/strong&gt; would allow external services that schedule actions to receive &lt;strong&gt;real-time updates&lt;/strong&gt; when an action &lt;strong&gt;executes, fails, or retries&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;This reduces the need for polling and improves system responsiveness.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Handler for Missed Actions&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A dedicated handler to &lt;strong&gt;detect and process actions&lt;/strong&gt; that are still in a &lt;strong&gt;PENDING&lt;/strong&gt; state but have an &lt;strong&gt;execution time in the past&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;This ensures no scheduled action is permanently missed due to &lt;strong&gt;system failures, delays, or scaling issues&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
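&lt;p&gt;The proposed missed-action handler could start from a check like this. The field names are assumptions for the sketch, not the system's actual schema:&lt;/p&gt;

```typescript
// Assumed shape of a stored action for this sketch.
interface PendingAction {
  id: string;
  status: string;
  executeAt: number; // scheduled execution time, epoch milliseconds
}

// An action is "missed" when it is still PENDING but its execution time
// has already passed. A periodic sweep would re-enqueue whatever this finds.
const findMissedActions = (actions: PendingAction[], now: number) =>
  actions.filter((a) => {
    if (a.status !== "PENDING") return false;
    return now > a.executeAt;
  });
```

&lt;p&gt;In practice this query would run against the database on a schedule (e.g. via EventBridge), with each hit pushed back onto the queue.&lt;/p&gt;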

&lt;p&gt;These improvements would &lt;strong&gt;enhance reliability, observability, and integration&lt;/strong&gt; with external systems, making the scheduling system even more robust.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Conclusion: Scheduling, Scaling, and Sanity&lt;/strong&gt; 🚀
&lt;/h4&gt;

&lt;p&gt;Building a &lt;strong&gt;reliable, scalable, and fault-tolerant&lt;/strong&gt; scheduled actions system isn’t just about setting up a cron job and hoping for the best—it’s about &lt;strong&gt;engineering resilience&lt;/strong&gt; into every step of the process. From &lt;strong&gt;scheduling and execution to retries and observability&lt;/strong&gt;, every component must work together to ensure that &lt;strong&gt;no action is lost, no notification is forgotten, and no subscription goes unmanaged.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Through this journey, we've tackled:&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Dynamic scheduling&lt;/strong&gt; with &lt;strong&gt;DynamoDB Streams, EventBridge, and SQS&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Precise execution&lt;/strong&gt; using a mix of &lt;strong&gt;immediate and delayed actions&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Reliable processing&lt;/strong&gt; with &lt;strong&gt;deduplication, retries, and exponential backoff&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Scalability&lt;/strong&gt; by leveraging &lt;strong&gt;serverless architecture&lt;/strong&gt; to handle high loads.&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Observability&lt;/strong&gt; with APIs to &lt;strong&gt;monitor, retry, and delete scheduled actions&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Of course, every system has its &lt;strong&gt;quirks and challenges&lt;/strong&gt;—race conditions, tool limitations, and unexpected failures—but with the right &lt;strong&gt;design patterns, defensive coding, and future improvements (like webhooks and missed action recovery)&lt;/strong&gt;, this system can evolve into an even more &lt;strong&gt;powerful, intelligent, and autonomous scheduler.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;At the end of the day, &lt;strong&gt;automation is about making life easier&lt;/strong&gt;—whether it’s managing user subscriptions, sending notifications, or processing time-sensitive transactions. And while &lt;strong&gt;computers never sleep, we certainly need to&lt;/strong&gt;, which is why designing a system that can handle its own problems &lt;strong&gt;before waking us up at 3 AM&lt;/strong&gt; is always worth the effort.  &lt;/p&gt;

&lt;p&gt;So here’s to building &lt;strong&gt;systems that work while we don’t!&lt;/strong&gt; 🎉&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Building a Robust and Cost effective Token Verification Service: A Guide to Secure API Integrations</title>
      <dc:creator>JOOJO DONTOH</dc:creator>
      <pubDate>Sun, 10 Nov 2024 11:56:26 +0000</pubDate>
      <link>https://dev.to/joojodontoh/building-a-robust-and-cost-effective-token-verification-service-a-guide-to-secure-api-integrations-35ka</link>
      <guid>https://dev.to/joojodontoh/building-a-robust-and-cost-effective-token-verification-service-a-guide-to-secure-api-integrations-35ka</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In today’s interconnected systems, secure communication between services is very important 🤔. APIs are increasingly used to bridge services across organizational boundaries, often involving sensitive data exchanges such as top secret pancake recipes😅. This has amplified the need for robust mechanisms to verify and authenticate requests between systems securely. Imagine a token verification service that not only signs requests but also automates key rotation, providing security and convenience for API integrations.&lt;/p&gt;

&lt;p&gt;In this article, we’ll walk through a solution architecture to build a scalable and secure token verification service using AWS tools like Lambda, Secrets Manager, and API Gateway 🎉. This service is designed to enable partners and internal teams to verify requests originating from your service while maintaining high availability and resilience. The solution balances functional requirements with performance and cost-effectiveness, making it an ideal addition to your organization’s security toolkit. Now let's look at the problem 😢&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;br&gt;
Imagine needing to authenticate and authorize requests from internal or external services, sign requests, or implement role-based access control. A straightforward solution could be to integrate symmetric verification endpoints into an existing service and expose them to other services that require these capabilities. &lt;br&gt;
This approach provides a quick and accessible way to add secure, verifiable authentication without extensive restructuring.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2F9104Ok6PDF4SCzNwvm%2Fgiphy.gif%3Fcid%3D790b76116w6xgek5169rsb8zl0wv3d893s0bg3cecmurmy9i%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2F9104Ok6PDF4SCzNwvm%2Fgiphy.gif%3Fcid%3D790b76116w6xgek5169rsb8zl0wv3d893s0bg3cecmurmy9i%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" width="480" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But wait! While this quick fix might address the immediate problem, it may not scale well or handle similar challenges beyond the current scope. Here’s why: if the verification is symmetric, meaning the same key is used for both signing and verifying requests, it can create inefficiencies. Depending on the architecture of the host service, this could quickly turn into a bottleneck, especially under high load or across multiple services needing validation.&lt;br&gt;
So, what’s the alternative? Imagine solving authentication and authorization issues “across space and time.” 🧐&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExNWY2OXNqbjZxODV6d2s3ZWRjdGFsd3lweG5sdHF2aW94aGljNXg2dyZlcD12MV9naWZzX3NlYXJjaCZjdD1n/wpvrvjjDf7uRoM61w5/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExNWY2OXNqbjZxODV6d2s3ZWRjdGFsd3lweG5sdHF2aW94aGljNXg2dyZlcD12MV9naWZzX3NlYXJjaCZjdD1n/wpvrvjjDf7uRoM61w5/giphy.gif" width="500" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Solving problems "across space" means designing a solution that addresses the current issue not just for a single team or service, but for any team or service that might face the same challenge. Meanwhile, solving "across time" involves creating a solution that can address similar problems that may have existed in the past or could arise in the future—a versatile solution that can be applied in various scenarios or extended to tackle new issues.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, how do we devise a solution that achieves this?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FSKGo6OYe24EBG%2Fgiphy.gif%3Fcid%3D790b7611py0tcnknm0u2myrmffdt0yqhnql8trqgqgd3gpuk%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FSKGo6OYe24EBG%2Fgiphy.gif%3Fcid%3D790b7611py0tcnknm0u2myrmffdt0yqhnql8trqgqgd3gpuk%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" width="400" height="287"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;We’ll create a service with the following capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exposing public keys for token verification&lt;/li&gt;
&lt;li&gt;Signing payloads with private keys on a secure, protected endpoint&lt;/li&gt;
&lt;li&gt;Automating key rotation to avoid manual intervention&lt;/li&gt;
&lt;li&gt;Ensuring high availability, security, and cost-effectiveness&lt;/li&gt;
&lt;/ul&gt;
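&lt;p&gt;To make the asymmetric idea concrete, here is a minimal sketch using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module: the private key signs, and anyone holding the public key can verify. In the real service the private key would live in Secrets Manager rather than in process memory; the key generation here stands in for that:&lt;/p&gt;

```typescript
import { generateKeyPairSync, createSign, createVerify } from "node:crypto";

// Stand-in for keys fetched from Secrets Manager / the public endpoint.
const { publicKey, privateKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// Signing happens on the protected endpoint, with the private key.
const signPayload = (payload: string): string => {
  const signer = createSign("RSA-SHA256");
  signer.update(payload);
  return signer.sign(privateKey, "base64");
};

// Verification needs only the public key, so any partner can do it
// without ever touching the signing secret.
const verifyPayload = (payload: string, signature: string): boolean => {
  const verifier = createVerify("RSA-SHA256");
  verifier.update(payload);
  return verifier.verify(publicKey, signature, "base64");
};
```

&lt;p&gt;This is the property that removes the bottleneck of the symmetric approach: verification scales out to every consumer, while signing stays locked down in one place.&lt;/p&gt;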




&lt;p&gt;&lt;strong&gt;Solution Overview&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our proposed solution leverages AWS’s serverless offerings—such as Secrets Manager, Lambda, and API Gateway—to handle token signing, verification, and key rotation efficiently, ensuring security, scalability, and maintainability. Here’s a breakdown of the tools and technologies we’ll use:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fde7wlsl5ofhk863522vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fde7wlsl5ofhk863522vm.png" alt="Tools" width="800" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt;: Manages HTTP requests, allowing secure access to endpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda&lt;/strong&gt;: Facilitates secure, on-demand execution for signing, public key retrieval, and key rotation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS S3&lt;/strong&gt;: Stores public keys for easy access through the public endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Secrets Manager&lt;/strong&gt;: Manages private keys securely and supports automated rotation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Framework&lt;/strong&gt;: Simplifies deployment and infrastructure management.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination allows us to build a service that is secure, highly available, and optimized for fast public key retrieval, making it both lightweight and cost-effective.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FlOfOrezUUVNAhtGveA%2Fgiphy.gif%3Fcid%3D790b7611d4m4vwsj5uwocfih6shkcprz19fmyw0e2o2rfpm2%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FlOfOrezUUVNAhtGveA%2Fgiphy.gif%3Fcid%3D790b7611d4m4vwsj5uwocfih6shkcprz19fmyw0e2o2rfpm2%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" width="480" height="270"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Let's scope out the requirements&lt;/p&gt;

&lt;h3&gt;
  
  
  Functional Requirements
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Public Key Retrieval&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A public endpoint will serve public keys for verifying tokens issued by the service, allowing clients to authenticate tokens securely.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Private Key Management&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The service will securely store private keys in AWS Secrets Manager, accessible either through a protected endpoint or directly via authorized IAM roles.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Authorization Control for Signing&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only authorized clients will access the signing endpoint, and each client must have explicit permissions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Automated Key Rotation&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keys will be rotated automatically to ensure that they remain secure, with an immediate rotation option for emergency cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Non-Functional Requirements
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;High Availability&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The service should be designed with minimal downtime to guarantee constant availability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Robustness and Maintainability&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy-to-maintain code that other teams can access, use, and adapt as needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Security and Performance&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong security practices, including efficient public key retrieval and endpoint protection, without impacting speed or availability.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExMTd5dXJodDQ0M3c0dmt5OGV3NTYwbTcxaG42cGFqZ2lyN2loZXJhNCZlcD12MV9naWZzX3NlYXJjaCZjdD1n/usz0fqhUiVxSs6IUKB/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExMTd5dXJodDQ0M3c0dmt5OGV3NTYwbTcxaG42cGFqZ2lyN2loZXJhNCZlcD12MV9naWZzX3NlYXJjaCZjdD1n/usz0fqhUiVxSs6IUKB/giphy.gif" width="480" height="270"&gt;&lt;/a&gt; &lt;/p&gt;




&lt;h2&gt;
  
  
  Service Architecture and Flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Components
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;🔐 Token Signing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Clients authorized for signing can request token generation by calling a protected API endpoint. The process works as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Permissions Check:&lt;/strong&gt; Only clients with explicit permissions can access this endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private Key Retrieval:&lt;/strong&gt; The service retrieves the private key securely from AWS Secrets Manager.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token Generation:&lt;/strong&gt; The payload is signed using the private key, and the resulting token is returned to the client.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;🔑 Token Verification&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Token verification is facilitated by a &lt;strong&gt;JWKS (JSON Web Key Set) endpoint&lt;/strong&gt;, allowing clients to retrieve public keys for verifying tokens. The process is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public Key Access:&lt;/strong&gt; The JWKS endpoint exposes public keys that match the key ID in signed tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification:&lt;/strong&gt; Clients use the retrieved public keys to validate the authenticity and integrity of the token.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;🔄 Key Rotation&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Key rotation is handled automatically by AWS Secrets Manager:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Periodic Rotation:&lt;/strong&gt; Secrets Manager rotates keys at set intervals.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Handling:&lt;/strong&gt; An AWS Lambda function manages the updates to public and private keys in Secrets Manager, ensuring smooth transitions.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;☁️ Serverless Framework and AWS Integration&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Resource Provisioning&lt;/strong&gt;: Serverless automates AWS Lambda and API Gateway setups, delivering responsive, load-adaptive infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment Configuration&lt;/strong&gt;: Enables secure, isolated environment configurations (e.g., dev, staging, production) with controlled access to AWS Secrets Manager and S3.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD Pipeline&lt;/strong&gt;: Integrates with deployment pipelines for automated, reliable releases without manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Efficiency and Scalability&lt;/strong&gt;: Combines AWS Lambda’s pay-as-you-go model with Serverless’s efficient setup, reducing idle costs and scaling resources on demand.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  High-Level Flow
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir6rh87km4nzeyjc30nn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir6rh87km4nzeyjc30nn.png" alt="High-Level Flow" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Token Signing Process
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; A client with the necessary authorization initiates a request to sign a payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2:&lt;/strong&gt; The signing endpoint verifies the client’s permissions, retrieves the private key from Secrets Manager, signs the payload, and returns a token with a unique key ID (&lt;code&gt;kid&lt;/code&gt;) in the header for easy verification. If the private key is missing, the service automatically rotates keys to ensure availability before signing the payload.&lt;/li&gt;
&lt;/ul&gt;
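
&lt;p&gt;As a rough sketch of the two steps above: the real service signs with RSA private keys pulled from Secrets Manager, but the shape of the token (base64url header carrying a &lt;code&gt;kid&lt;/code&gt;, then payload, then signature) can be shown self-contained using HMAC instead. All names here are illustrative, not the service's actual API.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding for each segment.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_token(payload: dict, key: bytes, kid: str) -> str:
    # The kid in the header is what lets verifiers pick the matching
    # public key from the JWKS endpoint later.
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

token = sign_token({"sub": "client-42"}, b"demo-secret", kid="key-2024-01")
```

&lt;p&gt;The production flow swaps the HMAC call for an RSA signature (e.g. RS256) with the private key fetched from Secrets Manager at sign time.&lt;/p&gt;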

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foznsfon9z9swz6pd71oc.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foznsfon9z9swz6pd71oc.jpeg" alt="Token Signing Process" width="800" height="242"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2. Token Verification Process
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1:&lt;/strong&gt; A client receives a signed token, which includes a key ID in its header.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2:&lt;/strong&gt; The client fetches the matching public key from the JWKS endpoint and verifies the token.&lt;/li&gt;
&lt;/ul&gt;
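
&lt;p&gt;Verification hinges on matching the token's &lt;code&gt;kid&lt;/code&gt; against the published key set. A minimal sketch of that lookup, using a hypothetical JWKS document (key IDs and fields invented for illustration):&lt;/p&gt;

```python
# Hypothetical JWKS document, as the public endpoint might serve it.
jwks = {
    "keys": [
        {"kid": "key-2024-01", "kty": "RSA", "alg": "RS256", "e": "AQAB"},
        {"kid": "key-2024-02", "kty": "RSA", "alg": "RS512", "e": "AQAB"},
    ]
}

def find_key(jwks: dict, kid: str) -> dict:
    # Match the kid from the token header against the published set;
    # a miss usually means the verifier's cached JWKS is stale.
    for key in jwks["keys"]:
        if key["kid"] == kid:
            return key
    raise KeyError("no public key with kid " + kid)
```

&lt;p&gt;A real verifier would then build an RSA public key from the JWK fields and check the token signature with a JWT library.&lt;/p&gt;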

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvznuhkis3sz2f5biiq7w.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvznuhkis3sz2f5biiq7w.jpeg" alt="Token Verification Process" width="800" height="217"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Key Rotation Process
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Generates two RSA key pairs (2048 and 4096 bits) with unique key IDs.&lt;/li&gt;
&lt;li&gt;Retrieves existing public keys, appends the new public keys, and saves the updated set of keys.&lt;/li&gt;
&lt;li&gt;Updates or ensures the existence of a private key in AWS Secrets Manager.&lt;/li&gt;
&lt;li&gt;Returns metadata upon successful rotation, with error handling for issues during the key generation or save process.&lt;/li&gt;
&lt;/ul&gt;
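
&lt;p&gt;The append-rather-than-replace step above is what keeps verification working across a rotation: tokens signed with the previous private key stay verifiable as long as the old public key remains in the set. A sketch of that step, with the actual RSA generation and Secrets Manager calls elided:&lt;/p&gt;

```python
import uuid

def rotate(published_keys: list) -> list:
    # Two fresh pairs (2048 and 4096 bits) get unique kids; the real key
    # material would come from an RSA keygen, not shown here.
    new_keys = [{"kid": str(uuid.uuid4()), "bits": bits} for bits in (2048, 4096)]
    # Append rather than replace, so tokens signed before the rotation
    # can still be verified during the transition window.
    return published_keys + new_keys

keys = rotate([{"kid": "key-2024-01", "bits": 2048}])
```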

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tmt9q18d4ump63kqjkp.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tmt9q18d4ump63kqjkp.jpg" alt="Key Rotation Process" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  🎉 Advantages of the Solution
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔒 Security&lt;/strong&gt;: Private keys are securely managed in Secrets Manager, and only authorized clients can access sensitive endpoints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🌐 Availability&lt;/strong&gt;: AWS’s serverless infrastructure ensures high availability, even under heavy traffic.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;💰 Cost-Effectiveness&lt;/strong&gt;: Serverless architecture allows on-demand scaling, making it a highly cost-effective solution.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔄 Resilience&lt;/strong&gt;: Automated key rotation, with the option for immediate rotation, boosts both security and resilience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;⚙️ Rotation Atomicity&lt;/strong&gt;: Key rotation is atomic, ensuring public keys update first. If public key storage fails, the process stops, keeping the current key pair intact. If private key storage fails, the failure isn't harmful because the current key pair is intact; however, a configured alert will call for manual intervention.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔧 Key Signing Dynamism&lt;/strong&gt;: The sign endpoint supports flexible payload signing, allowing dynamic expiry, key length, and algorithm selection.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;📜 Key Signing Availability&lt;/strong&gt;: If the private key is missing, the sign endpoint triggers a key rotation, ensuring keys are always available for signing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;📉 Pay-as-You-Go Model&lt;/strong&gt;: AWS Lambda charges only for compute time, making this solution ideal for short-lived signing and verification tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🚫 No Dedicated Infrastructure&lt;/strong&gt;: Serverless architecture eliminates the need for infrastructure management, as AWS handles scaling and security patches.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;🔐 Efficient Secret Management&lt;/strong&gt;: AWS Secrets Manager securely stores and rotates private keys, reducing the risks and costs of manual management.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;📦 Scalable Storage with Amazon S3&lt;/strong&gt;: Public keys stored on Amazon S3 enjoy durable, scalable storage. Paired with CloudFront, this minimizes data transfer costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;⚖️ Dynamic Scaling &amp;amp; No Downtime Costs&lt;/strong&gt;: AWS’s serverless infrastructure scales automatically, eliminating downtime costs and avoiding idle capacity.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
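
&lt;p&gt;The rotation-atomicity ordering described in point 5 can be sketched with two stand-in stores; the store class and return strings are illustrative only, not the service's real interfaces:&lt;/p&gt;

```python
class Store:
    """Stand-in for S3 (public keys) or Secrets Manager (private key)."""
    def __init__(self, fail=False):
        self.fail = fail
        self.items = []

    def put(self, item):
        if self.fail:
            raise IOError("store unavailable")
        self.items.append(item)

def rotate(public_store, private_store):
    # Publish the public key first: if this fails, abort and leave the
    # current key pair fully intact.
    try:
        public_store.put("new-public-key")
    except IOError:
        return "aborted-old-pair-intact"
    # A private-key failure afterwards is non-fatal (the old pair still
    # signs), but should raise an alert for manual intervention.
    try:
        private_store.put("new-private-key")
    except IOError:
        return "rotated-public-only-alert-raised"
    return "rotated"
```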

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExODFsdDRjNnUyZDBoNXA5azNuZDAweXF5MmNlamprYzhxczVxajg3ZiZlcD12MV9naWZzX3NlYXJjaCZjdD1n/cdXpgeB32BekIGzBNh/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExODFsdDRjNnUyZDBoNXA5azNuZDAweXF5MmNlamprYzhxczVxajg3ZiZlcD12MV9naWZzX3NlYXJjaCZjdD1n/cdXpgeB32BekIGzBNh/giphy.gif" width="500" height="281"&gt;&lt;/a&gt; &lt;/p&gt;




&lt;h3&gt;
  
  
  🛠️ Use Cases
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;🔑 Access and Refresh Token Management&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Securely issues and manages access and refresh tokens, allowing clients to verify token validity with ease.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;📜 API Request Signing&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enables clients to securely sign and validate API requests, ensuring the integrity and authenticity of requests.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;👥 Role-Based Authentication&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports role-specific authentication for actions, restricting access to authorized roles only.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;🚨 Emergency Key Rotation and Compliance&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows for immediate key rotation in case of a security incident, ensuring quick compliance with security best practices.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2Fa5viI92PAF89q%2Fgiphy.gif%3Fcid%3D790b7611qhw7luy16p7g19ktfiwfmmglbh2kpt931jcbeexn%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2Fa5viI92PAF89q%2Fgiphy.gif%3Fcid%3D790b7611qhw7luy16p7g19ktfiwfmmglbh2kpt931jcbeexn%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" width="500" height="345"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Creating a token verification service with automated key rotation and secure signing endpoints is essential in modern API ecosystems. Our solution not only provides a robust token verification framework but also ensures security, availability, and cost-effectiveness. Leveraging AWS’s serverless and managed services makes it easy to scale, maintain, and secure the service, giving partners and teams the confidence to rely on the service for request verification.&lt;/p&gt;

&lt;p&gt;This service blueprint provides a powerful framework for implementing secure API integrations. By focusing on both functional and non-functional requirements, you can ensure that your token verification process is resilient, fast, and secure, setting the stage for a more connected and protected API ecosystem.&lt;/p&gt;

&lt;p&gt;PS: Though the idea, solution, and implementation are mine, this article was reviewed and enhanced by my Uncle: Mr Chat GPT.😁&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExaXlkODY2djlwazFkdm9ydG45MHhwMXRmY2podzFzaDU4ajlta2ljNiZlcD12MV9naWZzX3NlYXJjaCZjdD1n/12noFudALzfIynHuUp/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://i.giphy.com/media/v1.Y2lkPTc5MGI3NjExaXlkODY2djlwazFkdm9ydG45MHhwMXRmY2podzFzaDU4ajlta2ljNiZlcD12MV9naWZzX3NlYXJjaCZjdD1n/12noFudALzfIynHuUp/giphy.gif" width="480" height="270"&gt;&lt;/a&gt; &lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
