<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rahul Reddy Talatala</title>
    <description>The latest articles on DEV Community by Rahul Reddy Talatala (@rahul_talatala).</description>
    <link>https://dev.to/rahul_talatala</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2548667%2F146aed7a-634e-4cd1-b499-b08544ed9470.jpg</url>
      <title>DEV Community: Rahul Reddy Talatala</title>
      <link>https://dev.to/rahul_talatala</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rahul_talatala"/>
    <language>en</language>
    <item>
      <title>I Got Tired of Filling Out the Same Form 50 Times, So I Built an AI to Do It</title>
      <dc:creator>Rahul Reddy Talatala</dc:creator>
      <pubDate>Fri, 06 Mar 2026 06:44:27 +0000</pubDate>
      <link>https://dev.to/rahul_talatala/i-got-tired-of-filling-out-the-same-form-50-times-so-i-built-an-ai-to-do-it-4jfj</link>
      <guid>https://dev.to/rahul_talatala/i-got-tired-of-filling-out-the-same-form-50-times-so-i-built-an-ai-to-do-it-4jfj</guid>
      <description>&lt;p&gt;Every time I applied for a job, I faced the same ritual. Open the application form. Type my full name. Type my email. Paste my LinkedIn URL. Type my phone number. Select my country from a dropdown. Answer "Are you authorized to work in the US?" for the fifteenth time that week.&lt;/p&gt;

&lt;p&gt;The entire process takes about ten minutes per application, and roughly eight of those minutes are spent on fields I have answered hundreds of times before. The two minutes that actually matter, the cover letter, the portfolio link, the thoughtful answers to specific questions, get squeezed into whatever mental energy I have left.&lt;/p&gt;

&lt;p&gt;I am a GenAI engineer. I spend my days building systems that make computers do repetitive cognitive work. The irony of manually typing my zip code into yet another Greenhouse form at midnight was not lost on me.&lt;/p&gt;

&lt;p&gt;So I built ApplyAI: a Chrome extension that reads a job application form, sends the fields to an AI agent, gets back a fill plan, and applies it to the page in under ten seconds.&lt;/p&gt;

&lt;p&gt;🚀 &lt;a href="https://chromewebstore.google.com/detail/ApplyAI/ckknfphllkanlgikfaadoikjionkbmpf" rel="noopener noreferrer"&gt;Try it on the Chrome Web Store&lt;/a&gt; | 🌐 &lt;a href="https://apply-ai-extension.vercel.app/" rel="noopener noreferrer"&gt;Web App&lt;/a&gt; | 🔗 &lt;a href="https://github.com/rahult18/apply-ai-extension" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🎥 &lt;a href="https://youtu.be/k-OzDV68eeA?si=CnqCoK-jrmCEW2JQ" rel="noopener noreferrer"&gt;Full product demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This post walks through how it works, the actual architecture, the hard problems, and the design decisions that shaped the final system.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chrome Extension&lt;/strong&gt;: Vanilla JS with a React popup (Vite + Tailwind)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: FastAPI (Python)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Agent&lt;/strong&gt;: LangGraph StateGraph with Gemini 2.5 Flash&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database and Auth&lt;/strong&gt;: Supabase (PostgreSQL + Auth + Storage)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend Dashboard&lt;/strong&gt;: Next.js 14 with TypeScript&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Three components. One clear job each. The extension is the hands. The backend is the brain. The frontend is the face.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;Here is the data flow for a single autofill run:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User navigates to a job application form (Lever, Ashby, Greenhouse, or any careers page)&lt;/li&gt;
&lt;li&gt;The Chrome extension extracts every form field from the live DOM, including dropdown options&lt;/li&gt;
&lt;li&gt;The extension sends those fields plus the raw DOM HTML to the FastAPI backend&lt;/li&gt;
&lt;li&gt;The backend runs a LangGraph agent that calls Gemini 2.5 Flash with the user's profile, resume, and the extracted job description&lt;/li&gt;
&lt;li&gt;Gemini returns a structured JSON answer for every field&lt;/li&gt;
&lt;li&gt;The backend assembles a fill plan and sends it back to the extension&lt;/li&gt;
&lt;li&gt;The extension applies each value to the correct form element using CSS selectors&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key insight that made the whole thing feasible: &lt;strong&gt;the browser is the only environment that can see a fully rendered React form&lt;/strong&gt;. A server-side scraper sees the HTML skeleton. A real browser running JavaScript sees the actual dropdown options, the dynamic field states, and the ARIA attributes that React Select generates at runtime. So the extraction has to happen inside the browser, and the AI reasoning has to happen on the server where I have access to the user's data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frke045f0k6q5ottm6477.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frke045f0k6q5ottm6477.png" alt="ApplyAI Architecture" width="800" height="522"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 1: Getting Form Fields Out of a Live Page
&lt;/h2&gt;

&lt;p&gt;This turned out to be the hardest non-AI problem in the whole project.&lt;/p&gt;

&lt;p&gt;Job application forms are not plain HTML. They are React components. The dropdowns are typically React Select, which renders options into a floating portal that only exists in the DOM when the dropdown is actually open. If you scrape the page while all dropdowns are closed, you get combobox elements with no options. The AI has no idea what values are valid.&lt;/p&gt;

&lt;p&gt;My solution: open every dropdown programmatically before scraping anything.&lt;/p&gt;

&lt;p&gt;The extension injects a script into the active tab using &lt;code&gt;chrome.scripting.executeScript&lt;/code&gt;. Before touching a single field, it finds every &lt;code&gt;[role="combobox"]&lt;/code&gt; element and simulates a real user opening it by dispatching mouse and keyboard events in sequence, then waits 300ms for React to render the options into the DOM.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;combobox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MouseEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mousedown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="nx"&gt;combobox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MouseEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;click&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="nx"&gt;combobox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KeyboardEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;keydown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ArrowDown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// wait for React to paint the listbox&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 300ms wait is not arbitrary. I tested against Lever, Ashby, and Greenhouse. At 200ms the options were missing about 30% of the time. At 300ms the failure rate dropped to near zero.&lt;/p&gt;

&lt;p&gt;Once the dropdown is open, options are extracted from the ARIA-controlled listbox. Each field is then serialized into a structured object containing its type, label, CSS selector, required status, and available options. That object is what gets sent to the backend.&lt;/p&gt;

&lt;p&gt;The CSS selector is computed at extraction time using the element's &lt;code&gt;id&lt;/code&gt; or &lt;code&gt;name&lt;/code&gt; attribute. This same selector is used later during the apply step to locate the exact element on the page, so precision matters here.&lt;/p&gt;

&lt;p&gt;Label detection uses multiple fallback strategies in order: the &lt;code&gt;for&lt;/code&gt; attribute on an associated &lt;code&gt;&amp;lt;label&amp;gt;&lt;/code&gt;, a parent &lt;code&gt;&amp;lt;label&amp;gt;&lt;/code&gt; element, &lt;code&gt;aria-label&lt;/code&gt;, and finally &lt;code&gt;placeholder&lt;/code&gt;. When none of those exist, the field name or id becomes the label.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 2: The LangGraph Autofill Agent
&lt;/h2&gt;

&lt;p&gt;Once the extension sends the extracted fields to the backend, a LangGraph &lt;code&gt;StateGraph&lt;/code&gt; takes over. The autofill pipeline has four clearly separated concerns, and a DAG maps naturally onto that structure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;START -&amp;gt; initialize -&amp;gt; extract_form_fields -&amp;gt; generate_answers -&amp;gt; assemble_autofill_plan -&amp;gt; END
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each node receives the full shared state, does one job, returns its updates, and passes control to the next node.&lt;/p&gt;
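&lt;p&gt;To make that node contract concrete, here is a minimal, dependency-free sketch of the same four-step chain. It is a stand-in written in plain Python, not LangGraph's API, and the node bodies, state keys, and the &lt;code&gt;run_pipeline&lt;/code&gt; helper are illustrative assumptions rather than ApplyAI's actual code.&lt;/p&gt;

```python
# Dependency-free stand-in for the four-node chain; this is NOT LangGraph's API.
# Node bodies and state keys are illustrative assumptions.
import uuid


def initialize(state):
    # Bookkeeping: a run ID plus empty collections for later nodes.
    return {"run_id": str(uuid.uuid4()), "form_fields": [], "answers": {}}


def extract_form_fields(state):
    # Placeholder: pass the extension's raw fields through unchanged.
    return {"form_fields": list(state["raw_fields"])}


def generate_answers(state):
    # Placeholder for the LLM call: one empty answer per field.
    return {"answers": {f["selector"]: "" for f in state["form_fields"]}}


def assemble_autofill_plan(state):
    plan = [
        {"selector": f["selector"], "value": state["answers"][f["selector"]]}
        for f in state["form_fields"]
    ]
    return {"plan": plan}


NODES = [initialize, extract_form_fields, generate_answers, assemble_autofill_plan]


def run_pipeline(initial_state):
    state = dict(initial_state)
    for node in NODES:
        # Each node sees the full state and returns only its updates.
        state.update(node(state))
    return state


result = run_pipeline({
    "page_url": "https://example.com/apply",
    "raw_fields": [{"selector": "#email", "label": "Email"}],
})
print(result["plan"])
```

&lt;p&gt;The point of the shape is that every node is a small function from state to a partial update, which keeps each stage easy to test on its own.&lt;/p&gt;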

&lt;h3&gt;
  
  
  Node 1: Initialize
&lt;/h3&gt;

&lt;p&gt;Sets the run ID, page URL, and initial empty collections. Straightforward bookkeeping.&lt;/p&gt;

&lt;h3&gt;
  
  
  Node 2: Extract Form Fields
&lt;/h3&gt;

&lt;p&gt;Converts the JavaScript field objects from the extension into typed Python &lt;code&gt;FormField&lt;/code&gt; dictionaries. This handles type mapping (React Select comboboxes become &lt;code&gt;select&lt;/code&gt; type), deduplication by field signature, and one non-obvious enrichment.&lt;/p&gt;

&lt;p&gt;If a &lt;code&gt;select&lt;/code&gt; field has "country", "nationality", or "citizenship" in its label and has zero extracted options, the backend automatically injects the full list of 196 standard country names. This is a safety net for the cases where the browser-side dropdown opening fails. Some Greenhouse forms use a custom country component that does not respond to standard mouse events. The backend catches this gap and fills in the options so the LLM still has something to work with.&lt;/p&gt;
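&lt;p&gt;A rough sketch of what this node does might look like the following. The names &lt;code&gt;normalize_fields&lt;/code&gt;, &lt;code&gt;COUNTRY_HINTS&lt;/code&gt;, and &lt;code&gt;COUNTRY_NAMES&lt;/code&gt; are hypothetical, the signature used for deduplication is an assumption, and the three-entry country list stands in for the full list.&lt;/p&gt;

```python
# Hypothetical names throughout; the real node is not shown in this post.
COUNTRY_HINTS = ("country", "nationality", "citizenship")
# Stand-in for the full list of country names the backend would inject.
COUNTRY_NAMES = ["Canada", "India", "United States"]


def normalize_fields(raw_fields):
    seen = set()
    fields = []
    for raw in raw_fields:
        # React Select comboboxes are treated as plain select fields.
        field_type = "select" if raw.get("type") == "combobox" else raw.get("type", "text")
        label = (raw.get("label") or "").strip()
        # Assumed signature: selector plus type plus lowercased label.
        signature = (raw.get("selector"), field_type, label.lower())
        if signature in seen:
            continue
        seen.add(signature)
        options = list(raw.get("options") or [])
        is_country = any(hint in label.lower() for hint in COUNTRY_HINTS)
        if field_type == "select" and is_country and not options:
            # Safety net for dropdowns the browser could not open.
            options = list(COUNTRY_NAMES)
        fields.append({
            "type": field_type,
            "label": label,
            "selector": raw.get("selector"),
            "options": options,
        })
    return fields


raw = [
    {"type": "combobox", "label": "Country", "selector": "#country", "options": []},
    {"type": "combobox", "label": "Country", "selector": "#country", "options": []},
    {"type": "text", "label": "Full name", "selector": "#name"},
]
fields = normalize_fields(raw)
print(len(fields), fields[0]["type"], fields[0]["options"])
```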

&lt;h3&gt;
  
  
  Node 3: Generate Answers
&lt;/h3&gt;

&lt;p&gt;This is where Gemini does the work.&lt;/p&gt;

&lt;p&gt;I spent a lot of time on the prompt design. The format that worked best is &lt;strong&gt;structured JSON as the prompt body&lt;/strong&gt; rather than prose. The task description, rules, context, and output format are all keys in a JSON object. This consistently outperformed plain English paragraphs for precision tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt_obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generate answers for ALL &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fields_spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; form fields.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical_rules&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MANDATORY: Set action=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;autofill&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; for ALL fields. Never use &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;skip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; or &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;suggest&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;If you don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know an answer, use action=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;autofill&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; with value=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="s"&gt; and low confidence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_ctx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# profile fields
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;job_ctx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;job_ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# extracted job description
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resume_ctx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;resume_ctx&lt;/span&gt; &lt;span class="c1"&gt;# parsed resume data
&lt;/span&gt;    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;form_fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;fields_spec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my testing, the model treated JSON keys as hard constraints rather than suggestions. Prose prompts produced more hedging and more "I cannot fill this" responses, while JSON prompts produced precise, consistent output.&lt;/p&gt;

&lt;p&gt;Gemini is called with &lt;code&gt;response_mime_type: "application/json"&lt;/code&gt; and a &lt;code&gt;response_json_schema&lt;/code&gt; derived directly from a Pydantic model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_json_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LLMAnswersResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_json_schema&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This eliminates an entire class of failure modes. No regex extraction. No JSON fence stripping. No trying to parse Gemini's explanation text alongside the output. The response is always valid JSON that maps directly to the Pydantic model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The aggressive autofill rule&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Early versions would skip EEO fields ("Race", "Veteran Status") and optional demographics because the LLM would flag them as sensitive. Users found this confusing: they had no idea why some fields were left blank.&lt;/p&gt;

&lt;p&gt;The fix was a two-layer no-skip guarantee. The prompt says &lt;code&gt;action='autofill'&lt;/code&gt; is mandatory. Then a &lt;code&gt;_normalize_answer()&lt;/code&gt; post-processing step converts any &lt;code&gt;skip&lt;/code&gt; or &lt;code&gt;suggest&lt;/code&gt; action to &lt;code&gt;autofill&lt;/code&gt; at plan assembly time, even if the LLM ignored the instruction. If the LLM has no idea what value to use, it returns an empty string with low confidence. The field still appears in the plan with an empty value and a low confidence score, so the user can see that it needs attention and fill it in manually. This is a much better experience than silently dropping fields.&lt;/p&gt;
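&lt;p&gt;The second layer can be sketched in a few lines. The real &lt;code&gt;_normalize_answer()&lt;/code&gt; is not shown in this post, so the answer shape and the &lt;code&gt;0.1&lt;/code&gt; low-confidence value below are assumptions for illustration only.&lt;/p&gt;

```python
# Sketch of a no-skip post-processing step. The answer shape and the
# 0.1 low-confidence value are assumptions, not the project's actual code.
LOW_CONFIDENCE = 0.1


def normalize_answer(answer):
    normalized = dict(answer)
    if normalized.get("action") != "autofill":
        # Coerce 'skip', 'suggest', or anything unexpected to 'autofill'.
        normalized["action"] = "autofill"
    if normalized.get("value") is None:
        # No usable value: keep the field in the plan, empty and low confidence.
        normalized["value"] = ""
        normalized["confidence"] = min(
            normalized.get("confidence", LOW_CONFIDENCE), LOW_CONFIDENCE
        )
    return normalized


print(normalize_answer({"action": "skip", "value": None, "confidence": 0.9}))
```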

&lt;p&gt;&lt;strong&gt;Option matching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The LLM might return "United States" when the dropdown says "USA". Or it might get the case right but add extra whitespace. After receiving the LLM response, each answer for a &lt;code&gt;select&lt;/code&gt;, &lt;code&gt;radio&lt;/code&gt;, or &lt;code&gt;checkbox&lt;/code&gt; field goes through a normalizer that strips both the LLM value and each available option down to lowercase alphanumeric characters, then tries an exact match first and falls back to substring containment. This handles most real-world mismatches without any hardcoded synonym maps.&lt;/p&gt;
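&lt;p&gt;Here is a minimal sketch of that two-pass matcher. The function names are hypothetical, and checking containment in both directions is an assumption, since the description above only says "substring containment".&lt;/p&gt;

```python
import re


def canon(text):
    # Lowercase and drop everything except letters and digits.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_option(llm_value, options):
    target = canon(llm_value)
    if not target:
        return None
    canon_options = [(canon(opt), opt) for opt in options]
    # Pass 1: exact match on the normalized form.
    for key, original in canon_options:
        if key == target:
            return original
    # Pass 2: substring containment in either direction (assumed).
    for key, original in canon_options:
        if key and (key in target or target in key):
            return original
    return None


print(match_option("  united states ", ["Canada", "United States"]))
print(match_option("United States", ["Canada", "United States of America"]))
print(match_option("United States", ["Canada", "USA"]))
```

&lt;p&gt;The first two calls resolve to a dropdown option. The third returns &lt;code&gt;None&lt;/code&gt;: a pure abbreviation such as "USA" shares no substring with "unitedstates" after normalization, so a matcher built exactly this way would need some other fallback for that case.&lt;/p&gt;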

&lt;p&gt;&lt;strong&gt;File inputs are handled separately, outside the LLM entirely&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;File upload fields are detected by &lt;code&gt;input_type == "file"&lt;/code&gt;. The code checks the label for "cover letter" keywords. If matched, the field is skipped because no one should submit an AI-generated cover letter without reviewing it. Everything else gets &lt;code&gt;value: "resume"&lt;/code&gt; with maximum confidence. No LLM call needed.&lt;/p&gt;
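&lt;p&gt;That rule is simple enough to sketch directly. The helper name &lt;code&gt;plan_file_field&lt;/code&gt;, the keyword tuple, and the returned dictionary shape are assumptions made for this example.&lt;/p&gt;

```python
# Hypothetical helper; the keyword list and return shape are assumptions.
COVER_LETTER_KEYWORDS = ("cover letter", "cover_letter", "coverletter")


def plan_file_field(field):
    if field.get("input_type") != "file":
        return None  # not a file input; leave it to the LLM path
    label = (field.get("label") or "").lower()
    if any(keyword in label for keyword in COVER_LETTER_KEYWORDS):
        # Cover letters are left for the user to attach after review.
        return {"selector": field["selector"], "action": "skip", "value": None}
    # Every other file input is assumed to want the stored resume.
    return {
        "selector": field["selector"],
        "action": "autofill",
        "value": "resume",
        "confidence": 1.0,
    }


print(plan_file_field({"input_type": "file", "label": "Resume/CV", "selector": "#cv"}))
```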

&lt;h3&gt;
  
  
  Node 4: Assemble Autofill Plan
&lt;/h3&gt;

&lt;p&gt;Combines &lt;code&gt;form_fields&lt;/code&gt; and &lt;code&gt;answers&lt;/code&gt; into the final plan structure, generates a summary with field counts, and writes the completed plan to the database. The plan contains each field's selector, value, action, and confidence score.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 3: Applying the Plan Back in the Browser
&lt;/h2&gt;

&lt;p&gt;The backend returns the plan. Now the extension has to actually fill the form.&lt;/p&gt;

&lt;p&gt;Another injected script iterates through every field in the plan and uses a different fill strategy based on input type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text and textarea&lt;/strong&gt;: Set the value through the element's native &lt;code&gt;value&lt;/code&gt; property setter, which gets around React's own value tracking, then dispatch &lt;code&gt;input&lt;/code&gt; and &lt;code&gt;change&lt;/code&gt; events so the React component knows the value changed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native &lt;code&gt;&amp;lt;select&amp;gt;&lt;/code&gt;&lt;/strong&gt;: Find the matching option by text content. Set &lt;code&gt;selectedIndex&lt;/code&gt;. Fire a &lt;code&gt;change&lt;/code&gt; event.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;React Select (combobox)&lt;/strong&gt;: Type the value into the input, wait for the listbox, find the matching option, click it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Radio and checkbox groups&lt;/strong&gt;: Find the label whose text matches the target value and click it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File inputs&lt;/strong&gt;: This one deserves its own explanation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The browser's security model prevents setting &lt;code&gt;input.value&lt;/code&gt; on a file input for obvious reasons. The only legitimate way to attach a file programmatically is through the &lt;code&gt;DataTransfer&lt;/code&gt; API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fillFileInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;arrayBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;File&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;resume.pdf&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/pdf&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;dataTransfer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;DataTransfer&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;dataTransfer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;dataTransfer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;files&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dispatchEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;change&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;bubbles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;fileUrl&lt;/code&gt; is a Supabase Storage signed URL generated by the backend at plan time with a one-hour TTL. The extension fetches the user's resume as an ArrayBuffer, wraps it in a &lt;code&gt;File&lt;/code&gt; object, and attaches it to the input through the &lt;code&gt;DataTransfer&lt;/code&gt; API. This works reliably across Greenhouse, Lever, and Ashby.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 4: The Authentication Design
&lt;/h2&gt;

&lt;p&gt;The system runs two completely separate auth systems side by side.&lt;/p&gt;

&lt;p&gt;The web frontend uses Supabase's standard JWT (email/password and Google OAuth). The token lives in a cookie and is sent as a Bearer token on API calls.&lt;/p&gt;

&lt;p&gt;The Chrome extension cannot participate in cookie-based sessions. It runs in a sandboxed context with no access to the frontend's cookies. So I built a separate auth flow using custom JWTs.&lt;/p&gt;

&lt;p&gt;The connection process works like a one-time code exchange. When the user clicks "Connect" in the popup, the frontend generates a 32-character URL-safe random code, stores its SHA-256 hash in the database with a 10-minute expiry, and sends the plaintext code to the extension via &lt;code&gt;window.postMessage()&lt;/code&gt;. The extension exchanges this code at &lt;code&gt;POST /extension/connect/exchange&lt;/code&gt; for a 7-day JWT signed with a custom secret.&lt;/p&gt;

&lt;p&gt;The JWT carries an &lt;code&gt;audience&lt;/code&gt; claim set to &lt;code&gt;applyai-extension&lt;/code&gt;. The backend validates this on every extension endpoint, so the extension token cannot be replayed against any other part of the API. It also carries an &lt;code&gt;install_id&lt;/code&gt; (a UUID stored in &lt;code&gt;chrome.storage.local&lt;/code&gt;) for device-level tracking without building a separate device registry.&lt;/p&gt;
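&lt;p&gt;The one-time code half of this flow can be sketched with the Python standard library alone. This is an illustration under assumptions: the in-memory &lt;code&gt;record&lt;/code&gt; dictionary stands in for the database row, the &lt;code&gt;used&lt;/code&gt; flag and the function names are hypothetical, and JWT issuance is left out entirely.&lt;/p&gt;

```python
import hashlib
import hmac
import operator
import secrets
from datetime import datetime, timedelta, timezone

CODE_TTL = timedelta(minutes=10)


def issue_code(now):
    # 24 random bytes encode to a 32-character URL-safe string.
    code = secrets.token_urlsafe(24)
    record = {
        # Only the hash is stored; the plaintext code goes to the extension.
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
        "expires_at": now + CODE_TTL,
        "used": False,  # assumed flag to enforce one-time use
    }
    return code, record


def redeem_code(code, record, now):
    # Reject codes that were already used or whose expiry has passed.
    if record["used"] or operator.ge(now, record["expires_at"]):
        return False
    presented = hashlib.sha256(code.encode()).hexdigest()
    # Constant-time comparison of the presented and stored hashes.
    if not hmac.compare_digest(presented, record["code_hash"]):
        return False
    record["used"] = True  # a second exchange with the same code must fail
    return True


now = datetime.now(timezone.utc)
code, record = issue_code(now)
print(len(code), redeem_code(code, record, now), redeem_code(code, record, now))
```

&lt;p&gt;Storing only the hash means a leaked database row cannot be replayed as a code, and the short expiry limits how long an intercepted code is useful.&lt;/p&gt;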




&lt;h2&gt;
  
  
  Part 5: Three Hard Problems I Did Not Expect
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Chrome extension popups throttle CSS animations
&lt;/h3&gt;

&lt;p&gt;I built the popup with loading spinners and skeleton loaders. They worked fine during development in the browser. Once I loaded the extension and opened the actual popup, every animation was frozen solid.&lt;/p&gt;

&lt;p&gt;As far as I could tell, Chrome throttles work in extension popup contexts to save battery, and Tailwind's default &lt;code&gt;animate-spin&lt;/code&gt; and &lt;code&gt;animate-pulse&lt;/code&gt; classes end up paused as a result.&lt;/p&gt;

&lt;p&gt;The fix is to redefine the keyframes explicitly in CSS and force the play state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="k"&gt;@keyframes&lt;/span&gt; &lt;span class="n"&gt;spin&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nt"&gt;to&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rotate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="m"&gt;360deg&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;.animate-spin&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;animation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;spin&lt;/span&gt; &lt;span class="m"&gt;0.85s&lt;/span&gt; &lt;span class="n"&gt;linear&lt;/span&gt; &lt;span class="n"&gt;infinite&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;animation-play-state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;running&lt;/span&gt; &lt;span class="cp"&gt;!important&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;!important&lt;/code&gt; on &lt;code&gt;animation-play-state&lt;/code&gt; overrides whatever Chrome tries to set. Every animation class in the popup needs this treatment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. DOM hashing breaks plan caching
&lt;/h3&gt;

&lt;p&gt;The first version of plan caching computed a SHA-256 hash of the full DOM HTML and used it as part of the cache key. If the hash matched a previous run, the backend returned the cached plan.&lt;/p&gt;

&lt;p&gt;This broke constantly. Every Greenhouse page load includes a fresh CSRF token embedded in the HTML. Every page view produces a different hash even for the exact same form. The cache was useless.&lt;/p&gt;

&lt;p&gt;The fix: cache by &lt;code&gt;job_application_id + normalized_page_url&lt;/code&gt;. The DOM hash is still stored in the database but ignored for cache lookups. The same form at the same URL always returns the same cached plan regardless of what changed in the page source between visits.&lt;/p&gt;
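&lt;p&gt;A small sketch of such a cache key follows. The exact normalization rules are not spelled out above, so dropping the query string and fragment, lowercasing the host, and trimming a trailing slash are assumptions, as are the function names.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_page_url(url):
    parts = urlsplit(url)
    # Assumed normalization: lowercase scheme and host, drop query and
    # fragment, and trim a trailing slash from the path.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def plan_cache_key(job_application_id, page_url):
    return f"{job_application_id}:{normalize_page_url(page_url)}"


a = plan_cache_key("job-123", "https://boards.greenhouse.io/acme/jobs/42?token=abc#app")
b = plan_cache_key("job-123", "https://boards.greenhouse.io/acme/jobs/42/")
print(a == b)
```

&lt;p&gt;Because nothing from the page source feeds into the key, a rotating token in the HTML no longer affects cache hits.&lt;/p&gt;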

&lt;h3&gt;
  
  
  3. Lever and Ashby split one job across two URLs
&lt;/h3&gt;

&lt;p&gt;On Lever, the job description lives at &lt;code&gt;jobs.lever.co/company/slug&lt;/code&gt;. The application form lives at &lt;code&gt;jobs.lever.co/company/slug/apply&lt;/code&gt;. These look like two different pages but they represent one job.&lt;/p&gt;

&lt;p&gt;If a user extracts the job on the description page and then navigates to the apply page, the backend needs to match the apply URL back to the original job record. The &lt;code&gt;extract_job_url_info()&lt;/code&gt; utility detects the &lt;code&gt;/apply&lt;/code&gt; or &lt;code&gt;/application&lt;/code&gt; suffix and strips it to get the canonical base URL before doing any database lookups.&lt;/p&gt;
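&lt;p&gt;The suffix stripping can be sketched as below. This is not the actual &lt;code&gt;extract_job_url_info()&lt;/code&gt; implementation; the helper name &lt;code&gt;canonical_job_url&lt;/code&gt; and the choice to drop the query string and fragment are assumptions.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit

APPLY_SUFFIXES = ("/apply", "/application")


def canonical_job_url(url):
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    for suffix in APPLY_SUFFIXES:
        if path.endswith(suffix):
            # Strip the apply-page suffix to recover the job's base URL.
            path = path[: -len(suffix)]
            break
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(canonical_job_url("https://jobs.lever.co/company/slug/apply"))
print(canonical_job_url("https://jobs.lever.co/company/slug"))
```

&lt;p&gt;Both calls print the same base URL, which is what lets the apply page be matched back to the job record saved from the description page.&lt;/p&gt;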

&lt;p&gt;On the extension side, when the popup detects the user is on a job description page with a saved job, it shows an amber banner telling them to navigate to the application form. For Lever and Ashby, the banner includes a direct link constructed entirely client-side by appending &lt;code&gt;/apply&lt;/code&gt; or &lt;code&gt;/application&lt;/code&gt; to the current URL. No extra backend call needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 6: What Is Coming Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Streaming the autofill plan&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Right now the user sees a spinner for 5 to 10 seconds while the full plan generates. Gemini supports streaming and LangGraph supports streaming node outputs. The plan is to emit node completion events to the extension popup as each stage finishes, so users see live progress instead of one long blocking wait.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A retry node in the LangGraph DAG&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;On very long forms, Gemini occasionally returns fewer answers than there are fields. The current system fills missing fields with empty-string defaults. A better approach is a conditional retry edge in the graph: if the answer count is less than the field count, route back to &lt;code&gt;generate_answers&lt;/code&gt; with only the missing field signatures. LangGraph's conditional edges make this a clean addition.&lt;/p&gt;
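&lt;p&gt;The routing decision behind such an edge is small enough to sketch. The version below is plain Java for illustration only and does not use LangGraph. Everything except the &lt;code&gt;generate_answers&lt;/code&gt; node name is an assumption, including the &lt;code&gt;finish&lt;/code&gt; node and the retry cap, which I would add so that a model that keeps skipping a field cannot loop forever.&lt;/p&gt;

```java
import java.util.Arrays;

// Hypothetical routing helper; only "generate_answers" comes from the article.
final class RetryRouter {
    static final int MAX_RETRIES = 2; // assumed cap, not from the article

    private RetryRouter() {}

    // Field signatures that still have no answer.
    static String[] missingFields(String[] fieldSignatures, String[] answeredSignatures) {
        var answered = Arrays.asList(answeredSignatures);
        return Arrays.stream(fieldSignatures)
                .filter(sig -> !answered.contains(sig))
                .toArray(String[]::new);
    }

    // Name of the node to run next: retry only while answers are missing
    // and the retry budget is not used up.
    static String nextNode(String[] fieldSignatures, String[] answeredSignatures, int retriesSoFar) {
        boolean incomplete = missingFields(fieldSignatures, answeredSignatures).length > 0;
        boolean mayRetry = MAX_RETRIES > retriesSoFar;
        if (incomplete) {
            if (mayRetry) {
                return "generate_answers";
            }
        }
        return "finish";
    }
}
```

&lt;p&gt;On a retry, only the output of &lt;code&gt;missingFields&lt;/code&gt; would be sent back to the model, which keeps the second prompt short.&lt;/p&gt;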

&lt;p&gt;&lt;strong&gt;Semantic resume matching&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The current resume match feature does a keyword overlap check between the job's required skills and the user's resume skills list. Exact string comparison misses near-synonyms: it fails to match "REST APIs" with "API development" or "React.js" with "React". The plan is to replace this with embedding-based similarity using a small vector store per user, which should give a more accurate match score and surface genuinely missing skills rather than just unmatched strings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Playwright-based extraction fallback&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The browser-side dropdown opening works on about 90% of forms. The remaining 10% use heavy custom focus trapping, animated dropdowns with delays over 300ms, or React Select versions that ignore standard mouse events. A Playwright-powered extraction service on the backend would handle these edge cases by running a full headless browser with complete control over the page lifecycle, without requiring any changes to the extension.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;The most interesting thing I learned building this is that the Chrome extension is not just a UI layer. It is the only component in the system with access to the live, JavaScript-executed version of the page. That makes it the source of truth for form structure. The AI cannot do its job without what the browser extracts first.&lt;/p&gt;

&lt;p&gt;The second thing: &lt;strong&gt;structured JSON prompts beat prose prompts for precision tasks&lt;/strong&gt;. When I need Gemini to return exactly N answers with specific fields and constrained action values, a JSON prompt with rules expressed as an array performs better than a paragraph of instructions. The model treats it like a spec, not a suggestion.&lt;/p&gt;

&lt;p&gt;The combination of LangGraph for agent orchestration, Gemini's native JSON schema output mode, and the &lt;code&gt;DataTransfer&lt;/code&gt; API for file uploads turned out to be a surprisingly complete toolkit for this problem.&lt;/p&gt;

&lt;p&gt;If you are building something similar or have questions about any part of the architecture, drop a comment below.&lt;/p&gt;




&lt;p&gt;Built with ❤️ by Rahul&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.linkedin.com/in/rahul-reddy-t/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/rahult18" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://rahultalatala.netlify.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>Spring Core Fundamentals: A Beginner Guide</title>
      <dc:creator>Rahul Reddy Talatala</dc:creator>
      <pubDate>Wed, 12 Mar 2025 05:48:22 +0000</pubDate>
      <link>https://dev.to/rahul_talatala/spring-core-fundamentals-a-beginner-guide-3daa</link>
      <guid>https://dev.to/rahul_talatala/spring-core-fundamentals-a-beginner-guide-3daa</guid>
      <description>&lt;p&gt;Spring is one of the most popular frameworks in the Java ecosystem, known for its powerful dependency injection capabilities and extensive ecosystem. Whether you're just starting with Spring or looking to refresh your knowledge, this guide will walk you through the core concepts that make Spring such a powerful framework.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Spring vs Spring Boot
&lt;/h2&gt;

&lt;p&gt;Before diving into the core concepts, let's clarify the difference between Spring and Spring Boot:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Spring Framework&lt;/th&gt;
&lt;th&gt;Spring Boot&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Definition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A comprehensive framework providing infrastructure support for Java applications&lt;/td&gt;
&lt;td&gt;An extension of Spring that simplifies development with pre-configured defaults&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Configuration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires extensive manual configuration&lt;/td&gt;
&lt;td&gt;Provides &lt;strong&gt;auto-configuration&lt;/strong&gt;, reducing manual setup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Standalone Apps&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web applications are typically packaged and deployed to an external server&lt;/td&gt;
&lt;td&gt;Comes with &lt;strong&gt;embedded servers&lt;/strong&gt; for standalone applications&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Boilerplate Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires more configuration code&lt;/td&gt;
&lt;td&gt;Reduces boilerplate with &lt;strong&gt;opinionated defaults&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dependency Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual dependency management&lt;/td&gt;
&lt;td&gt;Uses &lt;strong&gt;Spring Boot Starter dependencies&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At its core, the Spring Framework is primarily a dependency injection container with additional convenience layers for database access, proxies, aspect-oriented programming, and web MVC.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dependency Injection: The Heart of Spring
&lt;/h2&gt;

&lt;p&gt;Dependency Injection (DI) is a design pattern where a class receives its dependencies from an external source rather than creating them itself. This promotes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loose coupling between components&lt;/li&gt;
&lt;li&gt;Improved testability&lt;/li&gt;
&lt;li&gt;Better code maintainability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Spring's core job is to implement this pattern for you through its IoC container.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inversion of Control: Shifting Responsibility
&lt;/h2&gt;

&lt;p&gt;Inversion of Control (IoC) is a design principle where the control of object creation and lifecycle management is transferred from your application code to the Spring framework.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traditional Approach vs IoC
&lt;/h3&gt;

&lt;p&gt;Without IoC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Car&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Object creation inside the class (tight coupling)&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With IoC (Spring's approach):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="nd"&gt;@Autowired&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Car&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Spring injects the dependency&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key difference? With Spring's IoC, your application code doesn't control dependency creation - Spring does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Types of Dependency Injection in Spring
&lt;/h2&gt;

&lt;p&gt;Spring supports three primary types of dependency injection:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Constructor-Based Injection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// immutable&lt;/span&gt;
    &lt;span class="nd"&gt;@Autowired&lt;/span&gt; 
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;Car&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependencies are explicit&lt;/li&gt;
&lt;li&gt;Encourages immutability&lt;/li&gt;
&lt;li&gt;Objects are always created with required dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is generally the recommended approach in modern Spring applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Setter-Based Injection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;@Autowired&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setEngine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// optional dependency&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Allows optional dependencies&lt;/li&gt;
&lt;li&gt;Provides flexibility in modifying dependencies at runtime&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Objects might be in an incomplete state if dependencies aren't provided&lt;/li&gt;
&lt;li&gt;More verbose than constructor injection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Field-Based Injection
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Car&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Autowired&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less boilerplate code&lt;/li&gt;
&lt;li&gt;Simple to implement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Disadvantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Harder to test&lt;/li&gt;
&lt;li&gt;Cannot enforce immutability&lt;/li&gt;
&lt;li&gt;Dependencies aren't explicitly declared&lt;/li&gt;
&lt;/ul&gt;
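&lt;p&gt;To make the testability point concrete, here is a minimal sketch in plain Java with no Spring on the classpath. With constructor injection, a test can hand the class a stub directly. The &lt;code&gt;start()&lt;/code&gt; and &lt;code&gt;drive()&lt;/code&gt; methods and the &lt;code&gt;CarTestSketch&lt;/code&gt; class are illustrative additions, not part of the examples above.&lt;/p&gt;

```java
// Plain Java, no container involved. Method names are illustrative.
interface Engine {
    String start();
}

// Constructor injection: the dependency is a required, final constructor argument.
class Car {
    private final Engine engine;

    Car(Engine engine) {
        this.engine = engine;
    }

    String drive() {
        return "Car moving: " + engine.start();
    }
}

class CarTestSketch {
    public static void main(String[] args) {
        // A lambda serves as a stub Engine; no reflection or Spring context needed.
        Engine stub = () -> "stub engine started";
        Car car = new Car(stub);
        System.out.println(car.drive()); // prints "Car moving: stub engine started"
    }
}
```

&lt;p&gt;With field injection, the same test would have to start a Spring context or set the private field through reflection, because there is no constructor or setter to pass the stub through.&lt;/p&gt;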

&lt;h2&gt;
  
  
  Spring IoC Container: Managing Your Beans
&lt;/h2&gt;

&lt;p&gt;Spring provides two types of IoC containers to manage the lifecycle of beans:&lt;/p&gt;

&lt;h3&gt;
  
  
  BeanFactory: The Lightweight Container
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Provides basic bean management capabilities&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;lazy initialization&lt;/strong&gt; - beans are created only when requested&lt;/li&gt;
&lt;li&gt;Uses less memory and is suitable for resource-constrained environments&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ApplicationContext: The Feature-Rich Container
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Extends BeanFactory with enterprise features&lt;/li&gt;
&lt;li&gt;Uses &lt;strong&gt;eager initialization&lt;/strong&gt; - singleton beans are created at startup by default&lt;/li&gt;
&lt;li&gt;Provides advanced features like:

&lt;ul&gt;
&lt;li&gt;Annotation-based dependency injection&lt;/li&gt;
&lt;li&gt;Event handling&lt;/li&gt;
&lt;li&gt;Internationalization support&lt;/li&gt;
&lt;li&gt;AOP integration&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Spring Boot automatically uses ApplicationContext for its enhanced feature set.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Spring Beans
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;bean&lt;/strong&gt; in Spring is simply an object created and managed by the Spring IoC container. Spring creates instances of classes annotated with &lt;code&gt;@Component&lt;/code&gt; (or its specialized annotations), manages their lifecycle, and injects them where needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bean Scopes: Controlling Instance Creation
&lt;/h3&gt;

&lt;p&gt;Spring offers several bean scopes that determine how instances are created and shared:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;singleton&lt;/strong&gt; (default)&lt;/td&gt;
&lt;td&gt;A single shared instance of the bean is created per Spring IoC container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;prototype&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A new instance is created each time the bean is requested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;request&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A new instance is created for each HTTP request (web applications only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A new instance is created for each user session (web applications only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;application&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A single instance for the entire ServletContext (web applications only)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can specify the scope using the &lt;code&gt;@Scope&lt;/code&gt; annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="nd"&gt;@Scope&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"prototype"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PrototypeBean&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Each request for this bean will create a new instance&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
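&lt;p&gt;The difference between the two most common scopes can be modelled in a few lines of plain Java. This is a toy model under my own names (&lt;code&gt;BeanRecipe&lt;/code&gt;, &lt;code&gt;SingletonHolder&lt;/code&gt;, &lt;code&gt;PrototypeHolder&lt;/code&gt;), not how Spring's container is implemented.&lt;/p&gt;

```java
// Toy model of two scopes; Spring's real container does far more.
interface BeanRecipe {
    Object create();
}

// Singleton: build the object once, then hand out the same reference.
class SingletonHolder {
    private final BeanRecipe recipe;
    private Object instance;

    SingletonHolder(BeanRecipe recipe) {
        this.recipe = recipe;
    }

    synchronized Object get() {
        if (instance == null) {
            instance = recipe.create();
        }
        return instance;
    }
}

// Prototype: build a fresh object on every request.
class PrototypeHolder {
    private final BeanRecipe recipe;

    PrototypeHolder(BeanRecipe recipe) {
        this.recipe = recipe;
    }

    Object get() {
        return recipe.create();
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        SingletonHolder singleton = new SingletonHolder(Object::new);
        PrototypeHolder prototype = new PrototypeHolder(Object::new);
        System.out.println(singleton.get() == singleton.get()); // prints "true"
        System.out.println(prototype.get() == prototype.get()); // prints "false"
    }
}
```

&lt;p&gt;One practical consequence: a prototype bean injected into a singleton is resolved only once, when the singleton is created, so the singleton keeps using that one instance.&lt;/p&gt;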



&lt;h2&gt;
  
  
  Essential Spring Annotations
&lt;/h2&gt;

&lt;p&gt;Spring provides a rich set of annotations to simplify configuration:&lt;/p&gt;

&lt;h3&gt;
  
  
  Component Annotations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;@Component&lt;/strong&gt;: Generic annotation for any Spring-managed component&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@Service&lt;/strong&gt;: For service layer classes containing business logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@Repository&lt;/strong&gt;: For data access objects handling database operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@Controller&lt;/strong&gt;: For Spring MVC controllers handling web requests&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@RestController&lt;/strong&gt;: For RESTful controllers (combines @Controller and @ResponseBody)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Configuration Annotations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;@Configuration&lt;/strong&gt;: Marks a class as a source of bean definitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@Bean&lt;/strong&gt;: Explicitly declares a bean method within a @Configuration class&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;@ComponentScan&lt;/strong&gt;: Tells Spring where to look for annotated components&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Dependency Injection Annotations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;@Autowired&lt;/strong&gt;: Marks a constructor, field, or setter method for automatic dependency injection
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="nd"&gt;@ComponentScan&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.example"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AppConfig&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Engine&lt;/span&gt; &lt;span class="nf"&gt;customEngine&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;TurboEngine&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Aspect-Oriented Programming with Spring
&lt;/h2&gt;

&lt;p&gt;Spring AOP allows you to separate cross-cutting concerns (like logging, security, transactions) from your business logic. It does this by using proxies to intercept method calls and apply additional behavior.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Aspect&lt;/span&gt;
&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoggingAspect&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Before&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"execution(* com.example.Service.*(..))"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;logBefore&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; 
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Method execution started..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; 
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key AOP concepts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Aspect&lt;/strong&gt;: Module containing cross-cutting logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join Point&lt;/strong&gt;: A point during execution where an aspect can be applied (in Spring AOP, always a method execution)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advice&lt;/strong&gt;: Action performed at a join point (before, after, around)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pointcut&lt;/strong&gt;: Expression that determines where advice should be applied&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weaving&lt;/strong&gt;: Process of applying aspects to target objects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A real-world example is Spring's &lt;code&gt;@Transactional&lt;/code&gt; annotation, which uses AOP to handle transaction management without cluttering your business logic.&lt;/p&gt;
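&lt;p&gt;To see what "using proxies to intercept method calls" means, here is a minimal sketch built on the JDK's own &lt;code&gt;java.lang.reflect.Proxy&lt;/code&gt;. It is not Spring AOP, which creates its proxies for you (JDK dynamic proxies or CGLIB subclasses), and the names &lt;code&gt;GreetingService&lt;/code&gt;, &lt;code&gt;SimpleGreetingService&lt;/code&gt;, and &lt;code&gt;withLogging&lt;/code&gt; are made up for this example.&lt;/p&gt;

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Illustrative names; this uses JDK dynamic proxies directly, not Spring AOP.
interface GreetingService {
    String greet(String name);
}

class SimpleGreetingService implements GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

class LoggingProxyDemo {
    // Wraps the target so a log line is recorded before every method call,
    // similar in spirit to @Before advice.
    static GreetingService withLogging(GreetingService target, StringBuilder log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.append("before ").append(method.getName()).append('\n');
            return method.invoke(target, args); // delegate to the real object
        };
        return (GreetingService) Proxy.newProxyInstance(
                GreetingService.class.getClassLoader(),
                new Class[] {GreetingService.class},
                handler);
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        GreetingService service = withLogging(new SimpleGreetingService(), log);
        System.out.println(service.greet("Spring")); // prints "Hello, Spring"
        System.out.print(log);                       // prints "before greet"
    }
}
```

&lt;p&gt;The caller only ever sees a &lt;code&gt;GreetingService&lt;/code&gt;, so the logging is added without touching the business class. The same mechanism explains a common surprise: a method that calls another method on &lt;code&gt;this&lt;/code&gt; bypasses the proxy, so advice on the inner method does not run.&lt;/p&gt;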

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Spring Core provides a robust foundation for building Java applications by implementing dependency injection and inversion of control. By managing dependencies externally, Spring helps create more modular, testable, and maintainable applications.&lt;/p&gt;

&lt;p&gt;Understanding these fundamental concepts - IoC, DI, beans, annotations, and AOP - will give you a solid foundation for working with the Spring ecosystem, whether you're using Spring Framework directly or Spring Boot.&lt;/p&gt;

&lt;p&gt;The true power of Spring comes from how these concepts work together, allowing you to focus on your business logic while Spring handles the infrastructure concerns. &lt;/p&gt;

&lt;p&gt;Curious about building a rock-solid e-commerce system with Spring Boot microservices? Dive into my latest blog where I break it all down—resilience, scalability, and the magic of Spring! Check it out &lt;a href="https://dev.to/rahul_talatala/spring-commerce-building-a-resilient-e-commerce-system-with-spring-boot-microservices-43ek"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Happy Springing!🚀&lt;/p&gt;

&lt;p&gt;Built with ❤️ by Rahul&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/rahul-reddy-t/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/rahult18" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://rahultalatala.netlify.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Spring Commerce: Building a Resilient E-commerce System with Spring Boot Microservices</title>
      <dc:creator>Rahul Reddy Talatala</dc:creator>
      <pubDate>Wed, 12 Mar 2025 05:32:14 +0000</pubDate>
      <link>https://dev.to/rahul_talatala/spring-commerce-building-a-resilient-e-commerce-system-with-spring-boot-microservices-43ek</link>
      <guid>https://dev.to/rahul_talatala/spring-commerce-building-a-resilient-e-commerce-system-with-spring-boot-microservices-43ek</guid>
      <description>&lt;p&gt;Hola 👋, &lt;/p&gt;

&lt;p&gt;I recently completed a comprehensive microservices project using Spring Boot 3, implementing various modern patterns and technologies. Here's an overview of what I built and the key learnings from this journey.&lt;/p&gt;

&lt;p&gt;The project code is available &lt;a href="https://github.com/rahult18/springcommerce/" rel="noopener noreferrer"&gt;here&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;p&gt;🌱 Before diving deep into the project, make sure to lay a solid foundation! Check out my Spring Core fundamentals guide first. Trust me, it'll make everything click! 🔗 &lt;a href="https://dev.to/rahul_talatala/spring-core-fundamentals-a-beginner-guide-3daa"&gt;Click Here&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Overview
&lt;/h2&gt;

&lt;p&gt;I developed a simple e-commerce application with a microservices architecture. The system allows customers to browse products, place orders, and receive notifications. Instead of building a monolithic application, I split functionality into specialized services that communicate with each other.&lt;/p&gt;

&lt;p&gt;The project implements several crucial microservice architectural patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Service Discovery&lt;/li&gt;
&lt;li&gt;API Gateway&lt;/li&gt;
&lt;li&gt;Circuit Breaker&lt;/li&gt;
&lt;li&gt;Event-Driven Architecture&lt;/li&gt;
&lt;li&gt;Observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Project Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1feeot3hg1rqvkduast6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1feeot3hg1rqvkduast6.png" alt="Project Architecture" width="800" height="672"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture consists of multiple services, an API Gateway, and supporting components like Kafka for messaging and Keycloak for security. Services communicate both synchronously (via REST) and asynchronously (via Kafka).&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Services
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Product Service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Acts as a product catalog&lt;/li&gt;
&lt;li&gt;Built with Spring Boot, MongoDB&lt;/li&gt;
&lt;li&gt;Provides REST APIs to create and view products&lt;/li&gt;
&lt;li&gt;Implements CRUD operations for product management&lt;/li&gt;
&lt;li&gt;Technical implementation:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Document&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"product"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Product&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;    
    &lt;span class="nd"&gt;@Id&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Uses records for DTOs (Java 16+ feature) for clean, immutable data transfer:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;ProductRequest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;BigDecimal&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
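&lt;p&gt;Records can also validate their own input, which suits request DTOs. The sketch below is a hypothetical variant named &lt;code&gt;ProductRequestSketch&lt;/code&gt;. The validation rules are my assumption for illustration and are not part of the project's record shown above.&lt;/p&gt;

```java
import java.math.BigDecimal;

// Hypothetical variant of the DTO; the validation rules are assumptions.
record ProductRequestSketch(String name, String description, BigDecimal price) {
    // Compact constructor: runs before the fields are assigned.
    ProductRequestSketch {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name is required");
        }
        if (price == null || price.signum() == -1) {
            throw new IllegalArgumentException("price must be zero or positive");
        }
    }
}

class RecordDemo {
    public static void main(String[] args) {
        var a = new ProductRequestSketch("Phone", "A phone", new BigDecimal("499.99"));
        var b = new ProductRequestSketch("Phone", "A phone", new BigDecimal("499.99"));
        System.out.println(a.name());    // prints "Phone"
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

&lt;p&gt;The compiler generates the accessors, &lt;code&gt;equals&lt;/code&gt;, &lt;code&gt;hashCode&lt;/code&gt;, and &lt;code&gt;toString&lt;/code&gt;, so two requests with the same field values compare as equal without any extra code.&lt;/p&gt;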



&lt;h3&gt;
  
  
  Order Service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Handles customer orders&lt;/li&gt;
&lt;li&gt;Uses MySQL via Spring Data JPA with Flyway for database migrations&lt;/li&gt;
&lt;li&gt;Communicates with Inventory Service to check product availability via REST&lt;/li&gt;
&lt;li&gt;Sends events to Notification Service via Kafka&lt;/li&gt;
&lt;li&gt;Implements Resilience4J for circuit breaking and retry mechanisms&lt;/li&gt;
&lt;li&gt;Technical implementation includes transaction management:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="nd"&gt;@RequiredArgsConstructor&lt;/span&gt;
&lt;span class="nd"&gt;@Transactional&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;OrderRepository&lt;/span&gt; &lt;span class="n"&gt;orderRepository&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;InventoryClient&lt;/span&gt; &lt;span class="n"&gt;inventoryClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;    
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;KafkaTemplate&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;OrderPlacedEvent&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;kafkaTemplate&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;        
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;placeOrder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderRequest&lt;/span&gt; &lt;span class="n"&gt;orderRequest&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;        
        &lt;span class="c1"&gt;// Check inventory        &lt;/span&gt;
        &lt;span class="c1"&gt;// Create and save order        &lt;/span&gt;
        &lt;span class="c1"&gt;// Send event via Kafka &lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
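&lt;p&gt;The three commented steps in placeOrder could be filled in roughly as follows. This is a minimal plain-Java sketch, not the project's actual code: the Inventory, OrderStore and EventPublisher interfaces and the Order record are hypothetical stand-ins for the InventoryClient, OrderRepository and KafkaTemplate, and the event is reduced to just the order number.&lt;/p&gt;

```java
import java.util.UUID;

// Hypothetical stand-ins for the real collaborators; illustration only.
class PlaceOrderSketch {
    interface Inventory { boolean isInStock(String skuCode, int quantity); }
    interface OrderStore { void save(Order order); }
    interface EventPublisher { void publish(String topic, String orderNumber); }

    record Order(String orderNumber, String skuCode, int quantity) {}

    private final Inventory inventory;
    private final OrderStore store;
    private final EventPublisher publisher;

    PlaceOrderSketch(Inventory inventory, OrderStore store, EventPublisher publisher) {
        this.inventory = inventory;
        this.store = store;
        this.publisher = publisher;
    }

    // Returns the generated order number, or throws if the product is unavailable.
    String placeOrder(String skuCode, int quantity) {
        // 1. Check inventory (a REST call in the real service)
        if (!inventory.isInStock(skuCode, quantity)) {
            throw new IllegalStateException("Product " + skuCode + " is not in stock");
        }
        // 2. Create and save the order
        Order order = new Order(UUID.randomUUID().toString(), skuCode, quantity);
        store.save(order);
        // 3. Publish the event (a Kafka send in the real service)
        publisher.publish("order-placed", order.orderNumber());
        return order.orderNumber();
    }

    public static void main(String[] args) {
        var service = new PlaceOrderSketch(
                (sku, qty) -> true, // pretend everything is in stock
                order -> System.out.println("saved " + order),
                (topic, number) -> System.out.println("published to " + topic + ": " + number));
        service.placeOrder("iphone_15", 1);
    }
}
```

&lt;p&gt;One thing the sketch does not model: a database transaction started by @Transactional does not, by itself, cover the Kafka send, so an event that has already been published is not undone if the database commit fails afterwards.&lt;/p&gt;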



&lt;h3&gt;
  
  
  Inventory Service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Manages product stock information&lt;/li&gt;
&lt;li&gt;Provides REST API to check if products are in stock&lt;/li&gt;
&lt;li&gt;Uses MySQL database with JPA&lt;/li&gt;
&lt;li&gt;Uses Flyway for database schema migrations&lt;/li&gt;
&lt;li&gt;Implementation example:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Transactional&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;readOnly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;isInStock&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;skuCode&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;    
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;inventoryRepository&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;existsBySkuCodeAndQuantityIsGreaterThanEqual&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skuCode&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
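&lt;p&gt;The derived query name reads as "a row exists with this SKU code and a quantity greater than or equal to the requested amount". As a rough illustration of that check, here is an in-memory equivalent in plain Java. The InventoryRow record, the array standing in for the table, and the pixel_8 SKU are assumptions made for the example.&lt;/p&gt;

```java
// Hypothetical in-memory stand-in for the inventory table; illustration only.
class InStockSketch {
    record InventoryRow(String skuCode, int quantity) {}

    // True if some row has the given SKU and at least `requested` units.
    static boolean isInStock(InventoryRow[] rows, String skuCode, int requested) {
        for (InventoryRow row : rows) {
            if (row.skuCode().equals(skuCode)) {
                if (row.quantity() >= requested) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InventoryRow[] rows = {
            new InventoryRow("iphone_15", 100),
            new InventoryRow("pixel_8", 0)
        };
        System.out.println(isInStock(rows, "iphone_15", 1));   // prints true
        System.out.println(isInStock(rows, "pixel_8", 1));     // prints false
        System.out.println(isInStock(rows, "iphone_15", 101)); // prints false
    }
}
```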



&lt;h3&gt;
  
  
  Notification Service
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Sends notification emails to customers&lt;/li&gt;
&lt;li&gt;Listens to Kafka events from Order Service&lt;/li&gt;
&lt;li&gt;Uses Spring Kafka and Avro for message serialization&lt;/li&gt;
&lt;li&gt;Implementation with Kafka listener:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@KafkaListener&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"order-placed"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderPlacedEvent&lt;/span&gt; &lt;span class="n"&gt;orderPlacedEvent&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;   
    &lt;span class="c1"&gt;// Send email notification&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Patterns &amp;amp; Technologies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  API Gateway (Spring Cloud Gateway MVC)
&lt;/h3&gt;

&lt;p&gt;I implemented an API Gateway that serves as the entry point for all client requests. This simplified client interactions and provided a centralized point for cross-cutting concerns like security and routing.&lt;/p&gt;

&lt;p&gt;The routing configuration uses Spring's functional endpoint API (WebMvc.fn router functions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Routes&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RouterFunction&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ServerResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;productServiceRoute&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;GatewayRouterFunctions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"product_service"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;route&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RequestPredicates&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/product"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt; 
                    &lt;span class="nc"&gt;HandlerFunctions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"http://localhost:8080"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;circuitBreaker&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"productServiceCircuitBreaker"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"/fallbackRoute"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// More routes for other services...&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Security with Keycloak
&lt;/h3&gt;

&lt;p&gt;The API Gateway integrates with Keycloak for authentication and authorization, providing OAuth2 security for all services behind it. I configured the API Gateway as an OAuth2 Resource Server, so every incoming request must carry a valid JWT issued by Keycloak.&lt;/p&gt;

&lt;p&gt;Security configuration in the API Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SecurityConfig&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;SecurityFilterChain&lt;/span&gt; &lt;span class="nf"&gt;securityFilterChain&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;HttpSecurity&lt;/span&gt; &lt;span class="n"&gt;httpSecurity&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;httpSecurity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;authorizeHttpRequests&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;authorize&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;authorize&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;anyRequest&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;authenticated&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;oauth2ResourceServer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;oauth2&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;oauth2&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;jwt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Customizer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withDefaults&lt;/span&gt;&lt;span class="o"&gt;()))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Application properties configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;spring.security.oauth2.resourceserver.jwt.issuer-uri=http://localhost:8181/realms/springcommerce&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Circuit Breaker Pattern (Resilience4J)
&lt;/h3&gt;

&lt;p&gt;To prevent cascading failures between services, I implemented circuit breakers using Resilience4J. This helps maintain system stability when individual services fail.&lt;/p&gt;

&lt;p&gt;A Resilience4J circuit breaker moves between three main states:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Closed&lt;/strong&gt;: Normal operation, requests flow through&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open&lt;/strong&gt;: The failure threshold has been reached, so calls are rejected immediately instead of being sent to the failing service&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Half-Open&lt;/strong&gt;: Testing recovery by allowing limited requests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration in application.properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Circuit Breaker configuration
&lt;/span&gt;&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.registerHealthIndicator&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.slidingWindowType&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;COUNT_BASED&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.slidingWindowSize&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;10&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.failureRateThreshold&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;50&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.waitDurationInOpenState&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5s&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.permittedNumberOfCallsInHalfOpenState&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.circuitbreaker.configs.default.automaticTransitionFromOpenToHalfOpenEnabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Timeout configuration
&lt;/span&gt;&lt;span class="py"&gt;resilience4j.timelimiter.configs.default.timeout-duration&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3s&lt;/span&gt;

&lt;span class="c"&gt;# Retry configuration
&lt;/span&gt;&lt;span class="py"&gt;resilience4j.retry.configs.default.max-attempts&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;resilience4j.retry.configs.default.wait-duration&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my implementation, if at least 50% of the last 10 calls fail, the circuit opens for 5 seconds. It then moves to half-open automatically and permits 3 trial calls, whose outcome decides whether the circuit closes again or reopens.&lt;/p&gt;
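&lt;p&gt;To make the state transitions concrete, here is a simplified plain-Java model of a count-based circuit breaker that uses the same numbers as the configuration above. It is a sketch under stated assumptions, not Resilience4J's implementation: the class CircuitBreakerSketch and its methods are hypothetical, the clock is passed in by the caller, the move from open to half-open happens lazily on the next request rather than on a timer, and thread safety is ignored.&lt;/p&gt;

```java
import java.util.Arrays;

// Simplified, hypothetical model of a count-based circuit breaker.
// Illustration only: not Resilience4J's implementation, and not thread-safe.
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final int WINDOW_SIZE = 10;           // slidingWindowSize
    static final int FAILURE_THRESHOLD = 50;     // failureRateThreshold, in percent
    static final long OPEN_WAIT_MILLIS = 5_000;  // waitDurationInOpenState
    static final int HALF_OPEN_CALLS = 3;        // permittedNumberOfCallsInHalfOpenState

    private final boolean[] failed = new boolean[WINDOW_SIZE]; // ring buffer: true = failed call
    private int next;       // next ring buffer slot to overwrite
    private int recorded;   // outcomes recorded since the last state change (capped when closed)
    private int failures;   // failures counted while half-open
    private long openedAt;
    private State state = State.CLOSED;

    State state() { return state; }

    // May a call go through at the given time? Moves OPEN to HALF_OPEN once the wait has elapsed.
    boolean allowRequest(long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt >= OPEN_WAIT_MILLIS) {
                transitionTo(State.HALF_OPEN, nowMillis);
            }
        }
        return state != State.OPEN;
    }

    // Record the outcome of a call that was allowed through.
    void record(boolean success, long nowMillis) {
        if (state == State.OPEN) {
            return; // ignore late results while open
        }
        if (state == State.HALF_OPEN) {
            recorded++;
            if (!success) failures++;
            if (recorded == HALF_OPEN_CALLS) {
                boolean stillFailing = failures * 100 >= FAILURE_THRESHOLD * HALF_OPEN_CALLS;
                transitionTo(stillFailing ? State.OPEN : State.CLOSED, nowMillis);
            }
            return;
        }
        // CLOSED: overwrite the oldest outcome, then check the failure rate of a full window.
        failed[next] = !success;
        next = (next + 1) % WINDOW_SIZE;
        if (WINDOW_SIZE > recorded) recorded++;
        if (recorded == WINDOW_SIZE) {
            int failedInWindow = 0;
            for (boolean f : failed) {
                if (f) failedInWindow++;
            }
            if (failedInWindow * 100 >= FAILURE_THRESHOLD * WINDOW_SIZE) {
                transitionTo(State.OPEN, nowMillis);
            }
        }
    }

    private void transitionTo(State target, long nowMillis) {
        state = target;
        next = 0;
        recorded = 0;
        failures = 0;
        Arrays.fill(failed, false);
        if (target == State.OPEN) openedAt = nowMillis;
    }

    public static void main(String[] args) {
        var breaker = new CircuitBreakerSketch();
        for (int i = 0; i != 10; i++) {
            boolean success = 5 > i; // first five calls succeed, last five fail
            breaker.record(success, 0);
        }
        System.out.println(breaker.state());             // prints OPEN
        System.out.println(breaker.allowRequest(4_999)); // prints false
        System.out.println(breaker.allowRequest(5_000)); // prints true
        for (int i = 0; i != 3; i++) breaker.record(true, 5_000);
        System.out.println(breaker.state());             // prints CLOSED
    }
}
```

&lt;p&gt;With these numbers, 5 failures among the last 10 calls are enough to open the circuit, and 2 failures among the 3 trial calls send it back to open.&lt;/p&gt;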

&lt;h3&gt;
  
  
  Asynchronous Communication (Kafka)
&lt;/h3&gt;

&lt;p&gt;For order notifications, I set up asynchronous communication between the Order Service and the Notification Service using Kafka. The Order Service publishes an event and moves on; it does not wait for the email to be sent.&lt;/p&gt;

&lt;p&gt;I used Apache Avro for schema definition and Confluent Schema Registry for schema management:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"record"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OrderPlacedEvent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"com.springcommerce.order_service.event"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fields"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"orderNumber"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"firstName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lastName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Producing events in the Order Service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Inside OrderService.java&lt;/span&gt;
&lt;span class="nc"&gt;OrderPlacedEvent&lt;/span&gt; &lt;span class="n"&gt;orderPlacedEvent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OrderPlacedEvent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getOrderNumber&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;orderRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;userDetails&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;orderRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;userDetails&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;firstName&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;orderRequest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;userDetails&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;lastName&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;kafkaTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"order-placed"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orderPlacedEvent&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Consuming events in the Notification Service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="nd"&gt;@RequiredArgsConstructor&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NotificationService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;JavaMailSender&lt;/span&gt; &lt;span class="n"&gt;mailSender&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;@KafkaListener&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"order-placed"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderPlacedEvent&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Send email notification logic&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Observability with Grafana Stack
&lt;/h3&gt;

&lt;p&gt;I integrated the complete Grafana stack to implement comprehensive observability across all services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grafana&lt;/strong&gt;: Visualization dashboard for all observability data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus&lt;/strong&gt;: Time-series database for collecting and querying metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loki&lt;/strong&gt;: Log aggregation system (it fills a role similar to Elasticsearch, but indexes labels rather than the full log text)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tempo&lt;/strong&gt;: Distributed tracing backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For logging, I configured Loki integration using logback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;configuration&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;appender&lt;/span&gt; &lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;"LOKI"&lt;/span&gt; &lt;span class="na"&gt;class=&lt;/span&gt;&lt;span class="s"&gt;"com.github.loki4j.logback.Loki4jAppender"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;http&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;url&amp;gt;&lt;/span&gt;http://localhost:3100/loki/api/v1/push&lt;span class="nt"&gt;&amp;lt;/url&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/http&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;format&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;label&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;application=${appName},host=${HOSTNAME},level=%level&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/label&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;message&amp;gt;&lt;/span&gt;
                &lt;span class="nt"&gt;&amp;lt;pattern&amp;gt;&lt;/span&gt;${FILE_LOG_PATTERN}&lt;span class="nt"&gt;&amp;lt;/pattern&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;/message&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/format&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/appender&amp;gt;&lt;/span&gt;

    &lt;span class="nt"&gt;&amp;lt;root&lt;/span&gt; &lt;span class="na"&gt;level=&lt;/span&gt;&lt;span class="s"&gt;"INFO"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;appender-ref&lt;/span&gt; &lt;span class="na"&gt;ref=&lt;/span&gt;&lt;span class="s"&gt;"LOKI"&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;/root&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/configuration&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For metrics, I used Spring Boot Actuator with Micrometer Prometheus registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;management.endpoints.web.exposure.include&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;health, info, metrics, prometheus&lt;/span&gt;
&lt;span class="py"&gt;management.metrics.distribution.percentiles-histogram.http.server.requests&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;management.observations.key-values.application&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;order-service&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For tracing, I implemented distributed tracing with Micrometer Tracing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Observed&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderRepository&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Repository methods&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup provides a complete picture of system behavior, making debugging and performance analysis much easier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing Strategy
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Integration Testing with Testcontainers
&lt;/h3&gt;

&lt;p&gt;Rather than mocking databases, I used Testcontainers to spin up real database instances (MongoDB, MySQL) in Docker during tests. This provides a more realistic testing environment.&lt;/p&gt;

&lt;p&gt;Testcontainers implementation for Product Service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@SpringBootTest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webEnvironment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SpringBootTest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;WebEnvironment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;RANDOM_PORT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ProductServiceApplicationTests&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@ServiceConnection&lt;/span&gt;
    &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;MongoDBContainer&lt;/span&gt; &lt;span class="n"&gt;mongoDBContainer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;MongoDBContainer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"mongo:7.0.5"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nd"&gt;@LocalServerPort&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;@BeforeEach&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;setup&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;RestAssured&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;baseURI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://localhost"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="nc"&gt;RestAssured&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;mongoDBContainer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;shouldCreateProduct&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Test code using RestAssured&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  WireMock for Service Simulation
&lt;/h3&gt;

&lt;p&gt;For testing the Order Service without depending on the actual Inventory Service, I used WireMock to simulate the Inventory Service's responses, making tests more reliable and independent.&lt;/p&gt;

&lt;p&gt;Setting up WireMock for testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@SpringBootTest&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;webEnvironment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SpringBootTest&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;WebEnvironment&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;RANDOM_PORT&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@AutoConfigureWireMock&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// Dynamic port allocation&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OrderServiceApplicationTests&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Test&lt;/span&gt;
    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;shouldSubmitOrder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Stub the inventory service call&lt;/span&gt;
        &lt;span class="n"&gt;stubFor&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urlEqualTo&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api/inventory?skuCode=iphone_15&amp;amp;quantity=1"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;willReturn&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;aResponse&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withStatus&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withHeader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"application/json"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withBody&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="o"&gt;)));&lt;/span&gt;

        &lt;span class="c1"&gt;// Test order submission&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;inventory.url=http://localhost:${wiremock.server.port}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach ensures that the Order Service tests aren't affected by the availability or behavior of the actual Inventory Service.&lt;/p&gt;

&lt;h2&gt;
  
  
  Development Challenges &amp;amp; Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Challenge 1: Implementing Retry Mechanisms in Order Service
&lt;/h3&gt;

&lt;p&gt;When the Inventory Service was temporarily unavailable, the Order Service would fail completely. I needed a way to make it more resilient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implemented Resilience4J's retry mechanism in the Order Service&lt;/li&gt;
&lt;li&gt;Configured the retry parameters (up to 3 attempts, counting the initial call, with a wait between attempts)&lt;/li&gt;
&lt;li&gt;Added fallback mechanisms for when retries were exhausted&lt;/li&gt;
&lt;li&gt;Used Circuit Breaker to prevent overwhelming the system with retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Implementation in the Inventory Client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@CircuitBreaker&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"inventory"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallbackMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"fallbackMethod"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@Retry&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"inventory"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;isInStock&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestParam&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;skuCode&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nd"&gt;@RequestParam&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="nf"&gt;fallbackMethod&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt; &lt;span class="n"&gt;quantity&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cannot get inventory for skucode {}, failure reason: {}"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; 
             &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exception&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configuration for the retry mechanism:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;resilience4j.retry.instances.inventory.max-attempts=3&lt;/span&gt;
&lt;span class="s"&gt;resilience4j.retry.instances.inventory.wait-duration=5s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trickiest part was finding the right balance: retrying too aggressively would flood an already struggling service, while retrying too cautiously would hurt the user experience. Testing different configurations under several failure scenarios was essential to settle on a setup that worked.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenge 2: Setting Up Distributed Tracing
&lt;/h3&gt;

&lt;p&gt;Tracking requests as they traveled between services was difficult. When issues occurred, I couldn't easily see which service was causing the problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrated Spring Boot's tracing capabilities using Micrometer&lt;/li&gt;
&lt;li&gt;Set up Grafana Tempo to collect and store traces so they can be explored in Grafana&lt;/li&gt;
&lt;li&gt;Added trace IDs to logs for correlation&lt;/li&gt;
&lt;li&gt;Ensured proper propagation of trace context between services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Dependencies added to implement tracing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.micrometer&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;micrometer-tracing-bridge-brave&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.zipkin.reporter2&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;zipkin-reporter-brave&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configuration for tracing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ObservationConfig&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="nc"&gt;ObservedAspect&lt;/span&gt; &lt;span class="nf"&gt;observedAspect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ObservationRegistry&lt;/span&gt; &lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;ObservedAspect&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;registry&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Property configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;management.tracing.sampling.probability=1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tempo configuration in docker-compose.yml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tempo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana/tempo:2.2.2&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-config.file=/etc/tempo.yaml'&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./docker/tempo/tempo.yml:/etc/tempo.yaml:ro&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./docker/tempo/tempo-data:/tmp/tempo&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3110:3100'&lt;/span&gt;  &lt;span class="c1"&gt;# Tempo&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;9411:9411'&lt;/span&gt;  &lt;span class="c1"&gt;# zipkin&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gave me an end-to-end view of request flows, making it much easier to debug issues across service boundaries. The most challenging aspect was ensuring that trace context properly propagated across all services, especially when using different communication methods (REST vs Kafka).&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;p&gt;This project reinforced that a successful microservices architecture needs more than splitting an application into services; it requires careful implementation of patterns for resilience, observability, and communication.&lt;/p&gt;

&lt;p&gt;Key technical insights gained:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Design for Failure&lt;/strong&gt;: In distributed systems, failures are inevitable. Implementing circuit breakers, retries, and fallbacks is essential.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability is Not Optional&lt;/strong&gt;: Without proper logging, metrics, and tracing, debugging distributed systems becomes nearly impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Consistency Challenges&lt;/strong&gt;: Managing data consistency across services requires careful design. Use eventual consistency where appropriate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Infrastructure Matters&lt;/strong&gt;: Using tools like Testcontainers and WireMock resulted in more reliable tests that better reflect production behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Requires Planning&lt;/strong&gt;: Implementing OAuth2 with Keycloak showed the importance of designing security from the beginning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Happy Springing! 🚀&lt;/p&gt;

&lt;p&gt;Built with ❤️ by Rahul&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/rahul-reddy-t/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/rahult18" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://rahultalatala.netlify.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/p&gt;

</description>
      <category>springboot</category>
      <category>microservices</category>
      <category>kubernetes</category>
      <category>programming</category>
    </item>
    <item>
      <title>AtmoFlow: Breathing Life into Data - Real Time Weather and Air Quality Insights</title>
      <dc:creator>Rahul Reddy Talatala</dc:creator>
      <pubDate>Mon, 20 Jan 2025 07:26:13 +0000</pubDate>
      <link>https://dev.to/rahul_talatala/atmoflow-breathing-life-into-data-real-time-weather-and-air-quality-insights-5db0</link>
      <guid>https://dev.to/rahul_talatala/atmoflow-breathing-life-into-data-real-time-weather-and-air-quality-insights-5db0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In the realm of data engineering, few challenges are as exciting and impactful as harnessing the power of weather and air quality data. The AtmoFlow project embarks on a thrilling journey to create a robust, scalable data pipeline that leverages the capabilities of Google Cloud Platform (GCP) to process, analyze, and visualize this critical environmental data. Join us as we dive deep into the technical intricacies and architectural decisions that bring AtmoFlow to life.&lt;/p&gt;

&lt;p&gt;The project code is available &lt;a href="https://github.com/rahult18/atmo-flow" rel="noopener noreferrer"&gt;here&lt;/a&gt; on GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  GCP Architecture Overview
&lt;/h2&gt;

&lt;p&gt;At the heart of AtmoFlow lies a carefully crafted GCP architecture that enables seamless data ingestion, processing, and storage. The key components include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Functions: Serverless functions that handle data collection from APIs and historical sources.&lt;/li&gt;
&lt;li&gt;Pub/Sub: A reliable messaging system for ingesting streaming data.&lt;/li&gt;
&lt;li&gt;Cloud Storage: Scalable object storage for raw data files.&lt;/li&gt;
&lt;li&gt;Dataproc: Managed Hadoop and Spark clusters for distributed data processing.&lt;/li&gt;
&lt;li&gt;BigQuery: A serverless, highly scalable data warehouse for storing processed data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These GCP services work in harmony to create a robust foundation for AtmoFlow's data engineering pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Architecture
&lt;/h2&gt;

&lt;p&gt;AtmoFlow's architecture is designed to handle both batch and streaming data efficiently. Here's a high-level overview:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fue6uaxzjkha6dyxu1mo0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fue6uaxzjkha6dyxu1mo0.png" alt=" " width="800" height="1411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Data Ingestion:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Functions collect data from weather and air quality APIs and historical files.&lt;/li&gt;
&lt;li&gt;Streaming data is published to Pub/Sub topics.&lt;/li&gt;
&lt;li&gt;Batch data is stored in Cloud Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Data Processing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataproc clusters are created on-demand using the &lt;code&gt;DataprocCreateClusterOperator&lt;/code&gt; in the Cloud Composer DAG.&lt;/li&gt;
&lt;li&gt;PySpark jobs are submitted to the cluster using the &lt;code&gt;DataprocSubmitJobOperator&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The PySpark code (&lt;code&gt;dataproc_stream_batch_pyspark.py&lt;/code&gt;) handles the core data processing logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Data Storage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processed data is stored in BigQuery tables, partitioned and clustered for optimal querying.&lt;/li&gt;
&lt;li&gt;Fact and dimension tables are created to enable efficient analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Orchestration:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloud Composer, built on Apache Airflow, orchestrates the entire pipeline.&lt;/li&gt;
&lt;li&gt;The DAG (&lt;code&gt;cloud_composer.py&lt;/code&gt;) defines the tasks and their dependencies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;5. Visualization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Looker is integrated with BigQuery to create interactive dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data Sources and Ingestion
&lt;/h2&gt;

&lt;p&gt;AtmoFlow combines data from multiple sources to create a comprehensive dataset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Weather data is fetched from the Open-Meteo API using the &lt;code&gt;openmeteo-requests&lt;/code&gt; library.&lt;/li&gt;
&lt;li&gt;Air quality data is fetched from the Open-Meteo Air Quality API.&lt;/li&gt;
&lt;li&gt;Historical data files are stored in Cloud Storage.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Batch Data Collection&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_air_quality_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch historical air quality data&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;air_quality_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LATITUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;longitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LONGITUDE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;start_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;end_date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hourly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pm10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pm2_5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;carbon_monoxide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nitrogen_dioxide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sulphur_dioxide&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ozone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aerosol_optical_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dust&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;uv_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us_aqi&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# API request and response handling
&lt;/span&gt;    &lt;span class="c1"&gt;# Returns: DataFrame with historical air quality metrics
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Streaming Data Collection&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;publish_to_topic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Publish data to a Pub/Sub topic with dead letter handling&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;json_string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;data_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_string&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Dead letter queue handling
&lt;/span&gt;        &lt;span class="n"&gt;error_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;original_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="n"&gt;dead_letter_future&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;publisher&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;DEAD_LETTER_TOPIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
            &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_data&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
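        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;dead_letter_future&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# Wait until the dead letter message is published&lt;/span&gt;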
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;batch_cloud_function.py&lt;/code&gt; and &lt;code&gt;streaming_cloud_function.py&lt;/code&gt; scripts handle data ingestion. They use the &lt;code&gt;requests-cache&lt;/code&gt; library to cache API responses and avoid repeated calls for the same data, and they implement retry logic so that transient API failures do not stop a run.&lt;/p&gt;
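
&lt;p&gt;As a rough illustration of the retry idea, here is a minimal sketch that uses only the Python standard library. The helper name &lt;code&gt;call_with_retries&lt;/code&gt;, its parameters, and the flaky demo function are hypothetical; they are not taken from the project's Cloud Functions, which may rely on a library for this instead.&lt;/p&gt;

```python
import time


def call_with_retries(func, max_attempts=5, base_delay=0.2, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Hypothetical helper for illustration only; not the project's code.
    """
    attempt = 1
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_attempts:
                raise  # Out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


def make_flaky(failures):
    """Return a function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] > failures:
            return {"status": "ok", "calls": state["calls"]}
        raise ConnectionError("simulated transient API failure")

    return flaky
```

&lt;p&gt;Passing the &lt;code&gt;sleep&lt;/code&gt; function as a parameter keeps the sketch easy to test, because the waits can be recorded instead of actually performed.&lt;/p&gt;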

&lt;h2&gt;
  
  
  Data Processing with Dataproc and PySpark
&lt;/h2&gt;

&lt;p&gt;The heart of AtmoFlow's data processing lies in the &lt;code&gt;dataproc_stream_batch_pyspark.py&lt;/code&gt; script, which runs on Dataproc clusters. The script performs the following key tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads historical data from Cloud Storage using the defined schemas.&lt;/li&gt;
&lt;li&gt;Processes streaming data from Pub/Sub topics.&lt;/li&gt;
&lt;li&gt;Merges batch and streaming data using time windows and watermarks.&lt;/li&gt;
&lt;li&gt;Validates data quality using configurable thresholds and required columns.&lt;/li&gt;
&lt;li&gt;Creates dimension tables (e.g., time, location, weather condition, air quality status).&lt;/li&gt;
&lt;li&gt;Creates fact tables with derived metrics and aggregations.&lt;/li&gt;
&lt;li&gt;Writes processed data to BigQuery tables with partitioning and clustering.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Data Quality Monitoring&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;required_columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Enhanced data validation with configurable thresholds&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;total_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Validate required columns
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;required_columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;missing_columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;required_columns&lt;/span&gt; 
                         &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;missing_columns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;DataQualityError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing columns in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;missing_columns&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check for null values
&lt;/span&gt;    &lt;span class="n"&gt;null_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isNull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
    &lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
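
&lt;p&gt;The excerpt above stops once the null counts are collected. One plausible next step, sketched here in plain Python without Spark, is to compare each column's share of nulls against the threshold. The function name &lt;code&gt;find_columns_over_threshold&lt;/code&gt; is hypothetical, and treating &lt;code&gt;threshold&lt;/code&gt; as a percentage of rows is an assumption, since the excerpt does not show how the value is used.&lt;/p&gt;

```python
def find_columns_over_threshold(null_counts, total_rows, threshold=10.0):
    """Return {column: null_percentage} for columns above the threshold.

    Assumes `threshold` is a percentage of rows (an assumption, since the
    excerpt does not show how the value is used).
    """
    if total_rows == 0:
        return {}  # Nothing to validate in an empty table
    over = {}
    for column, nulls in null_counts.items():
        null_pct = 100.0 * nulls / total_rows
        if null_pct > threshold:
            over[column] = round(null_pct, 2)
    return over
```

&lt;p&gt;With the &lt;code&gt;Row&lt;/code&gt; returned by &lt;code&gt;collect()[0]&lt;/code&gt;, calling &lt;code&gt;null_counts.asDict()&lt;/code&gt; would give a dictionary in the shape this sketch expects.&lt;/p&gt;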



&lt;p&gt;&lt;strong&gt;Merging Stream and Batch Data&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge_batch_and_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                          &lt;span class="n"&gt;window_duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1 hour&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                          &lt;span class="n"&gt;watermark_delay&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10 minutes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Merge batch and streaming data with deduplication&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;stream_with_watermark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withWatermark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;watermark_delay&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create time windows
&lt;/span&gt;    &lt;span class="n"&gt;windowed_batch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nf"&gt;window&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;window_duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
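    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;windowed_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream_with_watermark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;window&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nf"&gt;window&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;window_duration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;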

    &lt;span class="c1"&gt;# Combine and deduplicate
&lt;/span&gt;    &lt;span class="n"&gt;merged_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;windowed_batch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unionByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;windowed_stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;allowMissingColumns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
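
&lt;p&gt;To make the windowing and deduplication idea concrete, here is a small plain-Python sketch that does not use Spark. It assigns each record to a fixed one-hour window and keeps one record per location and window. The record fields (&lt;code&gt;location&lt;/code&gt;, &lt;code&gt;timestamp&lt;/code&gt;, &lt;code&gt;source&lt;/code&gt;) and the rule that stream records replace batch records are assumptions made for the example, not details confirmed by the project.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def window_start(ts, duration=timedelta(hours=1)):
    """Return the start of the fixed-size window that contains ts."""
    epoch = datetime(1970, 1, 1)
    offset = (ts - epoch) % duration  # Time elapsed since the window opened
    return ts - offset


def merge_and_deduplicate(batch_rows, stream_rows):
    """Union both inputs, keeping one row per (location, window).

    Stream rows are applied last, so they replace batch rows that fall in
    the same window. This rule is an assumption made for the sketch.
    """
    merged = {}
    for row in list(batch_rows) + list(stream_rows):
        key = (row["location"], window_start(row["timestamp"]))
        merged[key] = row
    return sorted(merged.values(), key=lambda r: r["timestamp"])
```

&lt;p&gt;Aligning windows to a fixed origin means that two records from the same hour always land in the same window, regardless of which input they came from.&lt;/p&gt;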



&lt;p&gt;The script leverages PySpark's DataFrame API and SQL functions extensively to perform complex transformations and aggregations efficiently.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dimensional Modeling and Data Warehouse
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwotuy3cmd5r9g2tuwln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwotuy3cmd5r9g2tuwln.png" alt="Facts and Dimensions" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AtmoFlow employs dimensional modeling techniques to create a structured and optimized data warehouse in BigQuery. The key components include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Fact Tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;AirQualityFact&lt;/code&gt;: Stores air quality measurements with time, location, and status dimensions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WeatherFact&lt;/code&gt;: Stores weather measurements with time, location, condition, severity, and season dimensions.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Dimension Tables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;TimeDim&lt;/code&gt;: Represents time hierarchy with various time attributes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LocationDim&lt;/code&gt;: Represents location hierarchy with geographic attributes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;AirQualityStatusDim&lt;/code&gt;: Stores air quality status classifications and descriptions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WeatherConditionDim&lt;/code&gt;: Stores weather condition classifications and descriptions.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SeasonDim&lt;/code&gt;: Represents seasonal characteristics.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;SeverityDim&lt;/code&gt;: Represents severity levels and impact information.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;The dimension tables give the fact rows descriptive context, so queries can filter and group measurements by time, location, status, condition, season, or severity.&lt;/p&gt;
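&lt;p&gt;Fact rows point at dimension rows through keys. The following is a minimal sketch of how a time key, a location key, and a &lt;code&gt;TimeDim&lt;/code&gt; row could be derived in plain Python. The key formats, function names, and column names are assumptions made for illustration and are not taken from the AtmoFlow script. The location key uses &lt;code&gt;hashlib&lt;/code&gt; because Python's built-in &lt;code&gt;hash()&lt;/code&gt; on strings is salted per process and would produce different keys on different runs.&lt;/p&gt;

```python
import hashlib
from datetime import datetime, timezone


def time_key(ts):
    """Hour-grain surrogate key such as 2024031514 (hypothetical format)."""
    return int(ts.strftime("%Y%m%d%H"))


def location_key(latitude, longitude):
    """Stable key from coordinates; the same input gives the same key on every run."""
    text = f"{latitude:.4f}_{longitude:.4f}"
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep 8 bytes and reduce modulo 2**63 so the value fits a signed 64-bit column
    return int.from_bytes(digest[:8], "big") % (2 ** 63)


def time_dim_row(ts):
    """Attributes a TimeDim row might carry (column names are assumptions)."""
    return {
        "time_key": time_key(ts),
        "date": ts.date().isoformat(),
        "hour": ts.hour,
        "day_of_week": ts.strftime("%A"),
        "month": ts.month,
        "year": ts.year,
        "is_weekend": ts.weekday() >= 5,
    }


row = time_dim_row(datetime(2024, 3, 15, 14, 0, tzinfo=timezone.utc))
print(row)
print(location_key(40.7128, -74.0060))
```

&lt;p&gt;Because both keys are pure functions of their inputs, a fact row and its dimension row can be built independently and still join correctly.&lt;/p&gt;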

&lt;h2&gt;
  
  
  Data Harmonization
&lt;/h2&gt;

&lt;p&gt;To prepare the data for machine learning applications, AtmoFlow creates a harmonized dataset that combines weather and air quality features. The &lt;code&gt;create_harmonized_data&lt;/code&gt; function in the PySpark script joins the relevant data sources and performs feature engineering tasks, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Adding derived features like air density and dew point.&lt;/li&gt;
&lt;li&gt;Normalizing and scaling numerical features.&lt;/li&gt;
&lt;li&gt;Encoding categorical variables.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_harmonized_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;air_quality_df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Create harmonized dataset combining weather and air quality metrics.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Add location key to both datasets
&lt;/span&gt;        &lt;span class="n"&gt;location_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LATITUDE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LONGITUDE&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Select and rename weather features
&lt;/span&gt;        &lt;span class="n"&gt;weather_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_df&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;lit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location_key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="bp"&gt;...&lt;/span&gt;
            &lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Select air quality features
&lt;/span&gt;        &lt;span class="n"&gt;air_quality_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;air_quality_df&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;lit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location_key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
               &lt;span class="bp"&gt;...&lt;/span&gt;
            &lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# Combine features and add derived metrics
&lt;/span&gt;        &lt;span class="n"&gt;harmonized_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_features&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;air_quality_features&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;air_density&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Calculate air density
&lt;/span&gt;                &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;surface_pressure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;287.05&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature_2m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;273.15&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dew_point&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Calculate dew point
&lt;/span&gt;                &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature_2m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;relative_humidity_2m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;harmonized_df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
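&lt;p&gt;The two derived features can be checked by hand. The sketch below restates the same formulas as plain Python functions (the function names are mine, not the script's). One assumption worth calling out: the air density formula only yields kg/m^3 when pressure is in pascals, so a feed that reports &lt;code&gt;surface_pressure&lt;/code&gt; in hPa would need to be multiplied by 100 first.&lt;/p&gt;

```python
R_DRY_AIR = 287.05  # J/(kg*K), specific gas constant for dry air


def air_density(pressure_pa, temperature_c):
    """Dry-air density in kg/m^3 from the ideal gas law."""
    return pressure_pa / (R_DRY_AIR * (temperature_c + 273.15))


def dew_point_approx(temperature_c, relative_humidity):
    """Rule-of-thumb dew point in Celsius; reasonable above about 50% humidity."""
    return temperature_c - (100 - relative_humidity) / 5


# Sea-level standard atmosphere: 101325 Pa at 15 C
print(round(air_density(101325, 15.0), 3))   # 1.225
# The same reading passed in hPa comes out 100 times too small
print(round(air_density(1013.25, 15.0), 5))  # 0.01225
print(dew_point_approx(20.0, 60.0))          # 12.0
```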



&lt;p&gt;The resulting harmonized dataset is stored in BigQuery and can be easily consumed by machine learning pipelines.&lt;/p&gt;
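&lt;p&gt;The scaling and encoding steps listed earlier do not appear in the excerpt. As a rough illustration of what they involve, here is a plain-Python sketch of min-max scaling and one-hot encoding. The helper names and the category labels are hypothetical, and the real pipeline may well use Spark ML transformers instead.&lt;/p&gt;

```python
def min_max_scale(values):
    """Scale numbers to [0, 1]; None stays None, a constant column maps to 0.0."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    low, high = min(present), max(present)
    span = high - low
    return [
        None if v is None else (0.0 if span == 0 else (v - low) / span)
        for v in values
    ]


def one_hot(values, categories):
    """Return one 0/1 list per value, ordered like `categories`."""
    return [[1 if v == c else 0 for c in categories] for v in values]


print(min_max_scale([10.0, 15.0, None, 20.0]))
# [0.0, 0.5, None, 1.0]
print(one_hot(["Good", "Moderate"], ["Good", "Moderate", "Unhealthy"]))
# [[1, 0, 0], [0, 1, 0]]
```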

&lt;h2&gt;
  
  
  Orchestration with Cloud Composer
&lt;/h2&gt;

&lt;p&gt;Cloud Composer, based on Apache Airflow, serves as the orchestration layer for AtmoFlow. The &lt;code&gt;cloud_composer.py&lt;/code&gt; script defines the DAG that manages the entire pipeline. The key tasks include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Triggering the batch and streaming data collection Cloud Functions.&lt;/li&gt;
&lt;li&gt;Creating a Dataproc cluster with the specified configuration.&lt;/li&gt;
&lt;li&gt;Submitting the PySpark job to the Dataproc cluster.&lt;/li&gt;
&lt;li&gt;Deleting the Dataproc cluster after the job completes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlmo2maaanur3b1fsh3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmlmo2maaanur3b1fsh3y.png" alt="Cloud Composer DAG" width="800" height="671"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cloud Composer makes it straightforward to schedule, monitor, and manage the pipeline, so runs stay reliable and the data stays fresh.&lt;/p&gt;
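&lt;p&gt;The task order can be pictured as a small dependency graph. The sketch below is not Airflow code and does not reproduce &lt;code&gt;cloud_composer.py&lt;/code&gt;. It uses Python's standard &lt;code&gt;graphlib&lt;/code&gt; module with hypothetical task names, assuming both collection functions must finish before the cluster is created.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical; they mirror the four pipeline steps.
dependencies = {
    "trigger_batch_function": set(),
    "trigger_stream_function": set(),
    "create_dataproc_cluster": {"trigger_batch_function", "trigger_stream_function"},
    "submit_pyspark_job": {"create_dataproc_cluster"},
    "delete_dataproc_cluster": {"submit_pyspark_job"},
}

# static_order() yields the tasks in an order that respects every dependency
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

&lt;p&gt;In an Airflow DAG the same ordering is normally expressed by chaining tasks with the &lt;code&gt;&gt;&gt;&lt;/code&gt; operator. It is common to give the cluster-deletion task a trigger rule such as &lt;code&gt;all_done&lt;/code&gt; so the cluster is removed even when the job fails.&lt;/p&gt;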

&lt;h2&gt;
  
  
  Data Visualization and Dashboarding
&lt;/h2&gt;

&lt;p&gt;To make the processed data easy to explore, AtmoFlow integrates with Looker Studio for data visualization and dashboarding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0o1aisqg6u1053cd8mpf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0o1aisqg6u1053cd8mpf.png" alt="Looker Studio Dashboard UI" width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looker Studio connects directly to the BigQuery tables, so users can build interactive visualizations and explore key metrics and trends in weather patterns and air quality levels.&lt;/p&gt;

&lt;p&gt;The dashboard is available at this &lt;a href="https://lookerstudio.google.com/reporting/8e41259e-4a59-4f36-9af3-98e54c02ac4f" rel="noopener noreferrer"&gt;Looker Studio link&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges and Lessons Learned
&lt;/h2&gt;

&lt;p&gt;Building AtmoFlow raised several challenges. The main ones, and how each was handled, are listed below:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Complex Data Processing Architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge:&lt;/strong&gt; The system needed to handle both historical weather data (batch) and real-time updates (streaming) while maintaining data consistency and accuracy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Built a dual-pipeline system using PySpark that processes both batch and streaming data. Implemented time-based windows to organize data arrival and created a deduplication system to ensure data integrity.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;merge_batch_and_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stream_df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Add time tracking for late data
&lt;/span&gt;    &lt;span class="n"&gt;stream_with_watermark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stream_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withWatermark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10 minutes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Combine streams and remove duplicates
&lt;/span&gt;    &lt;span class="n"&gt;merged_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;batch_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unionByName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream_with_watermark&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;deduplicated_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;merged_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropDuplicates&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
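&lt;p&gt;To make the windowing and deduplication ideas concrete without a Spark cluster, here is a plain-Python sketch. It is an illustration under stated assumptions, not the pipeline's code: the helper names are hypothetical, rows are keyed on both &lt;code&gt;timestamp&lt;/code&gt; and &lt;code&gt;location_key&lt;/code&gt; so readings from different locations at the same time are not collapsed, and stream rows are assumed to win over batch rows on conflict.&lt;/p&gt;

```python
from datetime import datetime, timedelta


def window_start(ts, minutes=10):
    """Floor a timestamp to the start of its tumbling window.

    Works for window sizes that divide an hour evenly (5, 10, 15, ...).
    """
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def merge_and_dedupe(batch_rows, stream_rows):
    """Union both inputs and keep one row per (timestamp, location_key).

    Stream rows are applied last, so they win on conflicts (an assumed policy).
    """
    merged = {}
    for row in list(batch_rows) + list(stream_rows):
        merged[(row["timestamp"], row["location_key"])] = row
    return sorted(merged.values(), key=lambda r: (r["timestamp"], r["location_key"]))


batch = [
    {"timestamp": datetime(2024, 3, 15, 14, 0), "location_key": 1, "source": "batch"},
    {"timestamp": datetime(2024, 3, 15, 15, 0), "location_key": 1, "source": "batch"},
]
stream = [
    {"timestamp": datetime(2024, 3, 15, 15, 0), "location_key": 1, "source": "stream"},
]

rows = merge_and_dedupe(batch, stream)
print([r["source"] for r in rows])                     # ['batch', 'stream']
print(window_start(datetime(2024, 3, 15, 14, 27, 41)))  # 2024-03-15 14:20:00
```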



&lt;p&gt;&lt;strong&gt;Data Quality Management:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge:&lt;/strong&gt; Weather and air quality data frequently contained missing values, incorrect readings, or arrived late. The system required robust validation to ensure data reliability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Implemented configurable validation rules, built comprehensive error tracking, and developed automatic data cleaning procedures with alert systems for quality issues.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Monitor data quality with null checks
&lt;/span&gt;    &lt;span class="n"&gt;null_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isNull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;int&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;alias&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Raise alerts for quality issues
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;null_percentage&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;DataQualityError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Missing value threshold exceeded in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
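&lt;p&gt;The threshold logic is easier to follow on a handful of rows. Below is a self-contained plain-Python sketch of a per-column null check. &lt;code&gt;DataQualityError&lt;/code&gt; is defined locally here, the helper names are hypothetical, and the choice to fail when any single column exceeds the threshold is an assumption about how the rule might be applied.&lt;/p&gt;

```python
class DataQualityError(Exception):
    """Raised when a table has too many missing values."""


def null_percentages(rows, columns):
    """Return {column: percent of rows where the value is None or absent}."""
    total = len(rows)
    if total == 0:
        return {c: 0.0 for c in columns}
    return {
        c: 100.0 * sum(1 for r in rows if r.get(c) is None) / total
        for c in columns
    }


def validate_rows(rows, columns, table_name, threshold=10.0):
    """Raise if any column exceeds the null threshold (in percent)."""
    percentages = null_percentages(rows, columns)
    failing = {c: p for c, p in percentages.items() if p > threshold}
    if failing:
        raise DataQualityError(
            f"Missing value threshold exceeded in {table_name}: {failing}"
        )
    return percentages


sample = [
    {"pm25": 12.0, "pm10": 20.0},
    {"pm25": None, "pm10": 22.0},
    {"pm25": 9.5, "pm10": 18.0},
    {"pm25": 11.0, "pm10": 19.0},
]

print(null_percentages(sample, ["pm25", "pm10"]))  # {'pm25': 25.0, 'pm10': 0.0}

try:
    validate_rows(sample, ["pm25", "pm10"], "AirQualityFact")
except DataQualityError as err:
    print(err)
```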



&lt;p&gt;&lt;strong&gt;BigQuery Optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge:&lt;/strong&gt; Growing data volumes led to increased query latency and costs. The system needed optimization for both performance and cost-effectiveness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Implemented strategic partitioning and clustering in BigQuery tables. Designed efficient table structures and query patterns to minimize resource usage.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="k"&gt;REPLACE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;weather_air_quality&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WeatherFact&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;CLUSTER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;location_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;severity_key&lt;/span&gt;
&lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;staging_weather_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Complex Data Harmonization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Challenge:&lt;/strong&gt; The project required combining weather and air quality data from different formats into a unified, analyzable dataset.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Solution:&lt;/strong&gt; Developed a standardized data format and built a harmonization system that combines multiple data sources while adding derived metrics for enhanced analysis.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_harmonized_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;weather_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;air_quality_df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Combine different data sources
&lt;/span&gt;    &lt;span class="n"&gt;harmonized_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;weather_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;air_quality_df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add calculated metrics
&lt;/span&gt;    &lt;span class="n"&gt;harmonized_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;harmonized_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;air_quality_index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nf"&gt;calculate_aqi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pm25&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pm10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
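&lt;p&gt;The join type matters for harmonization. The snippet above uses Spark's default inner join, while the fuller &lt;code&gt;create_harmonized_data&lt;/code&gt; excerpt earlier in the article passes &lt;code&gt;"outer"&lt;/code&gt;. The plain-Python sketch below, with a hypothetical helper and made-up readings, shows what a full outer join on &lt;code&gt;timestamp&lt;/code&gt; and &lt;code&gt;location_key&lt;/code&gt; keeps that an inner join would drop.&lt;/p&gt;

```python
def outer_join(weather_rows, air_rows, keys=("timestamp", "location_key")):
    """Full outer join of two lists of dicts on the given key columns."""
    def key_of(row):
        return tuple(row[k] for k in keys)

    weather_cols = {c for r in weather_rows for c in r} - set(keys)
    air_cols = {c for r in air_rows for c in r} - set(keys)
    weather_by_key = {key_of(r): r for r in weather_rows}
    air_by_key = {key_of(r): r for r in air_rows}

    joined = []
    for key in sorted(set(weather_by_key) | set(air_by_key)):
        row = dict(zip(keys, key))
        # A side with no matching row contributes None for its columns
        for col in sorted(weather_cols):
            row[col] = weather_by_key.get(key, {}).get(col)
        for col in sorted(air_cols):
            row[col] = air_by_key.get(key, {}).get(col)
        joined.append(row)
    return joined


weather = [
    {"timestamp": "2024-03-15T14:00", "location_key": 1, "temperature_2m": 11.0},
    {"timestamp": "2024-03-15T15:00", "location_key": 1, "temperature_2m": 12.5},
]
air = [
    {"timestamp": "2024-03-15T15:00", "location_key": 1, "pm25": 8.0},
    {"timestamp": "2024-03-15T16:00", "location_key": 1, "pm25": 9.1},
]

joined = outer_join(weather, air)
matched = [r for r in joined if r["temperature_2m"] is not None and r["pm25"] is not None]
print(len(joined), len(matched))  # 3 1
```

&lt;p&gt;With an outer join, the unmatched hours survive with missing values on one side, and any metric derived from a missing input is missing too. Downstream validation therefore has to expect those gaps.&lt;/p&gt;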



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;AtmoFlow shows how GCP's managed services fit together for data engineering in the cloud. By combining Cloud Functions, Pub/Sub, Dataproc, BigQuery, and Cloud Composer, it forms an end-to-end pipeline for processing and analyzing weather and air quality data.&lt;/p&gt;

&lt;p&gt;Happy data engineering!🚀&lt;/p&gt;

&lt;p&gt;Built with ❤️ by Rahul&lt;br&gt;
&lt;a href="https://www.linkedin.com/in/rahul-reddy-t/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://github.com/rahult18" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://rahultalatala.netlify.app/" rel="noopener noreferrer"&gt;Portfolio&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>googlecloud</category>
      <category>datascience</category>
      <category>learning</category>
    </item>
  </channel>
</rss>
