<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kamaumbugua-dev</title>
    <description>The latest articles on DEV Community by Kamaumbugua-dev (@kamaumbuguadev).</description>
    <link>https://dev.to/kamaumbuguadev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3040183%2Fda0611d3-58a1-45f7-8ec1-ed0c4e14ea8a.png</url>
      <title>DEV Community: Kamaumbugua-dev</title>
      <link>https://dev.to/kamaumbuguadev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kamaumbuguadev"/>
    <language>en</language>
    <item>
      <title>I Built an AI That Finds Your Bugs and Rewrites Your Code to Fix Them.</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Sun, 15 Mar 2026 21:36:43 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/i-built-an-ai-that-finds-your-bugs-and-rewrites-your-code-to-fix-them-4e21</link>
      <guid>https://dev.to/kamaumbuguadev/i-built-an-ai-that-finds-your-bugs-and-rewrites-your-code-to-fix-them-4e21</guid>
      <description>&lt;p&gt;How I built CodeLens — a Groq-powered code review tool that detects SQL injection, memory leaks, and O(n²) algorithms, then rewrites your entire file with all issues resolved. Full breakdown of the architecture, prompt engineering tricks, and the LLM hallucination problem I had to solve.&lt;/p&gt;

&lt;p&gt;Every developer has shipped a bug they should have caught.&lt;/p&gt;

&lt;p&gt;Not because they were careless. Because code review is expensive. You're scanning hundreds of lines for subtle patterns: a missing &lt;code&gt;conn.close()&lt;/code&gt;, an f-string wired directly into a SQL query, a nested loop that looks innocent at &lt;code&gt;n = 10&lt;/code&gt; but detonates at &lt;code&gt;n = 10,000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I wanted to build a tool that never gets tired, never misses a pattern, and can tell you exactly what will go wrong in production — before you push.&lt;/p&gt;

&lt;p&gt;That's &lt;strong&gt;CodeLens&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;Paste any code. In seconds you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;health score&lt;/strong&gt; (0–100) with an animated gauge&lt;/li&gt;
&lt;li&gt;Every vulnerability categorized by severity: &lt;code&gt;CRITICAL&lt;/code&gt;, &lt;code&gt;WARNING&lt;/code&gt;, &lt;code&gt;INFO&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Exact line numbers, descriptions, fix suggestions, and predicted production impact&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;"Rework Code"&lt;/strong&gt; button that rewrites your entire file with every issue resolved, with inline &lt;code&gt;# FIX:&lt;/code&gt; comments explaining each change&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what it catches on a simple Python file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CRITICAL  SQL Injection            L7     f-string in cursor.execute()
CRITICAL  Hardcoded Credentials    L27    password = "admin123"
CRITICAL  Unsafe eval()            L29    eval(open("config.txt").read())
CRITICAL  Plaintext Card Numbers   L15    print(f"...card {card_number}")
WARNING   Resource Leak             L16    file handle never closed
WARNING   Resource Leak             L42    db connection never closed
WARNING   O(n²) Complexity          L46    nested loop over same list
WARNING   Unbounded Cache           L38    dict with no eviction policy
INFO      Division by Zero Risk     L50    len(transactions) unchecked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Health score: &lt;strong&gt;28 / 100&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One click later, the LLM rewrites the file. Every issue fixed. Every change commented.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;p&gt;Deliberately lean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;React 19 (Vercel)  →  FastAPI (Render)  →  Groq API (llama-3.3-70b)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No database. No auth. No queue. Every request is stateless — code goes in, analysis comes out.&lt;/p&gt;

&lt;p&gt;The frontend is a three-panel layout:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Code editor&lt;/strong&gt; — line numbers highlight affected lines in red&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analysis dashboard&lt;/strong&gt; — health gauge, metric bars, issue list with severity filters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vulnerability slides&lt;/strong&gt; — right panel with CSS scroll-snap, one full-height card per vulnerability&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The backend has three endpoints worth talking about: &lt;code&gt;/analyze&lt;/code&gt;, &lt;code&gt;/fix&lt;/code&gt;, and &lt;code&gt;/github/analyze&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hard Part: Getting the LLM to Return Valid JSON Every Time
&lt;/h2&gt;

&lt;p&gt;The analysis response needs to be machine-parseable. Every time. Across any language, any code quality, any edge case.&lt;/p&gt;

&lt;p&gt;This is harder than it sounds. By default, models wrap JSON in markdown fences, add explanatory preamble, or truncate responses mid-object when they hit a token limit. Any of these breaks the frontend.&lt;/p&gt;

&lt;p&gt;My system prompt ends with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Return ONLY valid JSON. No markdown, no code fences, no explanation outside the JSON.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And I strip artifacts post-response with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;^```

(?:json)?\s*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\s*

```$&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This handles 99% of cases. The remaining 1% raises a &lt;code&gt;json.JSONDecodeError&lt;/code&gt; that returns a structured 500 to the client.&lt;/p&gt;
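&lt;p&gt;To make the stripping concrete, here's what those two substitutions do to a typical fenced response (the sample response below is illustrative, not real model output):&lt;/p&gt;

```python
import json
import re

# A typical fenced response from the model (illustrative sample).
raw_text = '```json\n{"health_score": 28, "issues": []}\n```'

# Strip a leading ```json fence and a trailing ``` fence.
raw_text = re.sub(r"^```(?:json)?\s*", "", raw_text)
raw_text = re.sub(r"\s*```$", "", raw_text)

analysis = json.loads(raw_text)
print(analysis["health_score"])  # 28
```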




&lt;h2&gt;
  
  
  The Line Number Hallucination Problem
&lt;/h2&gt;

&lt;p&gt;This was the most interesting bug I fixed.&lt;/p&gt;

&lt;p&gt;Early versions of CodeLens would confidently report issues on lines that didn't exist. A 50-line file would get issues flagged at lines 73, 91, 108. The model was pattern-matching against training data — it recognized the &lt;em&gt;type&lt;/em&gt; of bug and estimated a line number based on where it typically appears in codebases it had seen, not in the code you gave it.&lt;/p&gt;

&lt;p&gt;The fix is obvious in hindsight: &lt;strong&gt;give the model line numbers to reference.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of sending:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE username = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I send:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlite3&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;     &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE username = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And I add an explicit constraint to the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The code has 50 lines total. You MUST only reference line numbers
that actually exist (1 to 50).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_line_numbers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;rjust&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; | &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hallucinated line numbers dropped to near zero. The model now has a concrete anchor instead of a floating reference.&lt;/p&gt;
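&lt;p&gt;The prompt constraint does most of the work, but a server-side guard is cheap insurance on top. This is a sketch, not the CodeLens source — the &lt;code&gt;line&lt;/code&gt; field name is an assumption about the analysis JSON:&lt;/p&gt;

```python
def clamp_issues(issues, total_lines):
    """Drop any issue whose reported line number falls outside the file.

    Assumes each issue is a dict carrying an integer "line" field,
    as the analysis JSON in this post implies.
    """
    return [
        issue for issue in issues
        if issue["line"] >= 1 and total_lines >= issue["line"]
    ]

issues = [
    {"line": 7, "title": "SQL Injection"},
    {"line": 91, "title": "Hallucinated issue"},  # beyond a 50-line file
]
print(clamp_issues(issues, 50))  # only the line-7 issue survives
```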




&lt;h2&gt;
  
  
  The Rework Pipeline
&lt;/h2&gt;

&lt;p&gt;The "Rework Code" feature is a second LLM call chained to the first.&lt;/p&gt;

&lt;p&gt;After analysis, the frontend sends the original code + the full issue list to &lt;code&gt;/fix&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FixRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix prompt encodes every issue as a line-referenced instruction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Fix&lt;/span&gt; &lt;span class="n"&gt;ALL&lt;/span&gt; &lt;span class="n"&gt;of&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;following&lt;/span&gt; &lt;span class="n"&gt;issues&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

&lt;span class="n"&gt;ISSUES&lt;/span&gt; &lt;span class="n"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;FIX&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Line&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CRITICAL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;SQL&lt;/span&gt; &lt;span class="n"&gt;Injection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Use&lt;/span&gt; &lt;span class="n"&gt;parameterized&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Line&lt;/span&gt; &lt;span class="mi"&gt;27&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CRITICAL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;Hardcoded&lt;/span&gt; &lt;span class="n"&gt;Credentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Use&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
  &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Line&lt;/span&gt; &lt;span class="mi"&gt;29&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CRITICAL&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;Unsafe&lt;/span&gt; &lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="n"&gt;Use&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="n"&gt;instead&lt;/span&gt;
  &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;ORIGINAL&lt;/span&gt; &lt;span class="n"&gt;CODE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;Return&lt;/span&gt; &lt;span class="n"&gt;the&lt;/span&gt; &lt;span class="n"&gt;complete&lt;/span&gt; &lt;span class="n"&gt;fixed&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt; &lt;span class="n"&gt;FIX&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
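&lt;p&gt;Assembling that block from the analysis output is a few lines of string formatting. A sketch — the &lt;code&gt;line&lt;/code&gt;, &lt;code&gt;severity&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, and &lt;code&gt;fix&lt;/code&gt; field names are assumptions about the issue schema, not the actual CodeLens code:&lt;/p&gt;

```python
def build_fix_prompt(code, language, issues):
    # One line-referenced instruction per issue, mirroring the prompt above.
    issue_lines = "\n".join(
        f'  - [Line {i["line"]}] [{i["severity"]}] {i["title"]}: {i["fix"]}'
        for i in issues
    )
    return (
        f"Fix ALL of the following issues in this {language} code:\n\n"
        f"ISSUES TO FIX:\n{issue_lines}\n\n"
        f"ORIGINAL CODE:\n{code}\n\n"
        "Return the complete fixed code with inline FIX comments."
    )
```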



&lt;p&gt;The system prompt is strict:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Return&lt;/span&gt; &lt;span class="n"&gt;RAW&lt;/span&gt; &lt;span class="n"&gt;CODE&lt;/span&gt; &lt;span class="n"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;No&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt; &lt;span class="n"&gt;fences&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;no&lt;/span&gt; &lt;span class="n"&gt;preamble&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Add&lt;/span&gt; &lt;span class="n"&gt;inline&lt;/span&gt; &lt;span class="n"&gt;comments&lt;/span&gt; &lt;span class="n"&gt;prefixed&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="c1"&gt;# FIX: explaining each change.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result gets placed back into the editor. The user sees their fixed file immediately.&lt;/p&gt;




&lt;h2&gt;
  
  
  The CORS Bug That Burned Two Hours
&lt;/h2&gt;

&lt;p&gt;Deploying to Vercel + Render exposed something I'd glossed over: &lt;code&gt;allow_origins=["*"]&lt;/code&gt; and &lt;code&gt;allow_credentials=True&lt;/code&gt; is &lt;strong&gt;invalid&lt;/strong&gt; per the CORS specification.&lt;/p&gt;

&lt;p&gt;Browsers enforce this at the preflight stage. Your OPTIONS request returns 200, but the browser rejects the response because the spec says wildcard origins cannot coexist with credentials. You get a cryptic console error and a silent failure in the UI.&lt;/p&gt;

&lt;p&gt;The fix is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;CORSMiddleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;allow_origins&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;allow_credentials&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# must be False with wildcard origin
&lt;/span&gt;    &lt;span class="n"&gt;allow_methods&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;allow_headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Worth knowing before you spend two hours debugging network tab preflight responses.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Vulnerability Slides
&lt;/h2&gt;

&lt;p&gt;The right panel uses CSS &lt;code&gt;scroll-snap-type: y mandatory&lt;/code&gt;. Each vulnerability gets its own full-height card:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight css"&gt;&lt;code&gt;&lt;span class="nt"&gt;scroll-snap-type&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="nt"&gt;y&lt;/span&gt; &lt;span class="nt"&gt;mandatory&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;style&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;height&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;100%&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;scrollSnapAlign&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;VulnSlide&lt;/span&gt; &lt;span class="na"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;issue&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a dot navigation sidebar that syncs with the scroll position:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight jsx"&gt;&lt;code&gt;&lt;span class="nx"&gt;onScroll&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;scrollTop&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;target&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientHeight&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;setActiveSlide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;idx&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rounding (not flooring) prevents the active dot from flickering during the snap animation — the snap always settles on an integer, but &lt;code&gt;scrollTop&lt;/code&gt; passes through fractional values mid-animation.&lt;/p&gt;
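&lt;p&gt;The arithmetic is easy to check — here in Python for brevity, with an illustrative mid-animation ratio of 0.96 (i.e. &lt;code&gt;scrollTop / clientHeight&lt;/code&gt; just before the snap settles on slide 1):&lt;/p&gt;

```python
import math

# Mid-snap, scrollTop passes through fractional ratios like 0.96.
ratio = 0.96

print(math.floor(ratio))  # 0 -- flooring flickers back to the previous dot
print(round(ratio))       # 1 -- rounding already points at the settling slide
```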

&lt;p&gt;Each slide has a "SLIDE" button in the issue list that calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;slidesRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scrollTo&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;top&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;idx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;slidesRef&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;current&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;clientHeight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;behavior&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;smooth&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bi-directional sync between the list and the slides, no state management library needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Notes
&lt;/h2&gt;

&lt;p&gt;A few things that bit me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Render cold starts.&lt;/strong&gt; The free tier sleeps services after 15 minutes of inactivity. The first request after sleep takes 30–50 seconds. I added a loading state with an explanation so users wait instead of leaving.&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;Vite bakes env vars at build time.&lt;/strong&gt; &lt;code&gt;VITE_API_BASE&lt;/code&gt; is injected into the bundle when Vercel builds — not at runtime. Old preview deployment URLs serve old bundles permanently. The production domain always reflects the latest build. If your frontend is still hitting the wrong backend, you're on an old preview URL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Railway port mismatch.&lt;/strong&gt; I originally deployed on Railway. The dashboard had the networking port set to 8000, but the &lt;code&gt;$PORT&lt;/code&gt; environment variable was 8080. Internal healthchecks passed (Railway probed the container directly), but external traffic failed at the edge with persistent 502s. Moved to Render, problem gone.&lt;/p&gt;
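&lt;p&gt;The portable lesson: bind to whatever &lt;code&gt;$PORT&lt;/code&gt; the platform injects rather than hardcoding one. A minimal entrypoint sketch, not the CodeLens source:&lt;/p&gt;

```python
import os

def resolve_port(default=8000):
    # Bind to the platform-injected $PORT; fall back to a local default.
    return int(os.environ.get("PORT", default))

# The real entrypoint would then run something like:
#   uvicorn.run(app, host="0.0.0.0", port=resolve_port())
```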




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live:&lt;/strong&gt; &lt;a href="https://codelens-new.vercel.app" rel="noopener noreferrer"&gt;codelens-new.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/Kamaumbugua-dev/CODELENS" rel="noopener noreferrer"&gt;github.com/Kamaumbugua-dev/CODELENS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paste the worst code you can find. The demo loads a Python file with SQL injection, hardcoded secrets, unsafe &lt;code&gt;eval()&lt;/code&gt;, and an O(n²) algorithm. Hit Analyze, then Rework. The whole thing takes about 10 seconds on a warm backend.&lt;/p&gt;




&lt;p&gt;Built by &lt;strong&gt;Steven K.&lt;/strong&gt; — Head of &lt;strong&gt;AXON LATTICE LABS™&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;CodeLens™ — See your code's future before it ships.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>react</category>
      <category>ai</category>
      <category>security</category>
    </item>
    <item>
      <title>I Built an AI That Sees Your Screen and Speaks Your Answers, Here's How</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Thu, 26 Feb 2026 22:26:53 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/i-built-an-ai-that-sees-your-screen-and-speaks-your-answers-heres-how-5dhl</link>
      <guid>https://dev.to/kamaumbuguadev/i-built-an-ai-that-sees-your-screen-and-speaks-your-answers-heres-how-5dhl</guid>
      <description>&lt;h1&gt;
  
  
  I Built an AI That Sees Your Screen and Speaks Your Answers — Here's How
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This post was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem With Typing
&lt;/h2&gt;

&lt;p&gt;Every day we spend hours switching between tabs, typing search queries, copying text, and manually reading through pages trying to find answers. What if you could just &lt;strong&gt;look at your screen and ask a question out loud&lt;/strong&gt; — and get an answer spoken back to you instantly?&lt;/p&gt;

&lt;p&gt;That's exactly what I built.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Voice UI Navigator&lt;/strong&gt; is an AI agent that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;👁️ &lt;strong&gt;Sees your browser screen&lt;/strong&gt; using Gemini multimodal vision&lt;/li&gt;
&lt;li&gt;🎙️ &lt;strong&gt;Listens to your voice&lt;/strong&gt; via the Gemini Live API&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Searches Google in real time&lt;/strong&gt; to research answers&lt;/li&gt;
&lt;li&gt;🔊 &lt;strong&gt;Speaks results back&lt;/strong&gt; to you naturally&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No typing. No DOM access. No browser extensions. Just pure visual AI understanding — the same way a human would look at a screen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://voice-navigator-913580598688.us-central1.run.app" rel="noopener noreferrer"&gt;https://voice-navigator-913580598688.us-central1.run.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE" rel="noopener noreferrer"&gt;https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The agent has three core capabilities wired together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User uploads screenshot + speaks query
              ↓
       ADK Web Server (Cloud Run)
              ↓
    root_agent [gemini-2.0-flash-live-001]
         ↓                    ↓
analyze_screenshot()     google_search()
         ↓                    ↓
  Gemini Vision         Google Search API
  (reads pixels)        (real-time results)
         ↓                    ↓
     Voice response spoken back to user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  1. Screen Vision (No DOM Required)
&lt;/h3&gt;

&lt;p&gt;The user takes a screenshot of their browser and attaches it in the chat. The agent calls &lt;code&gt;analyze_screenshot()&lt;/code&gt;, which sends the image to &lt;code&gt;gemini-2.0-flash&lt;/code&gt; with a structured prompt asking it to identify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Page type and title&lt;/li&gt;
&lt;li&gt;Visible UI elements (buttons, links, inputs)&lt;/li&gt;
&lt;li&gt;Main content summary&lt;/li&gt;
&lt;li&gt;Suggested next actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: &lt;strong&gt;Gemini doesn't need DOM access to understand a UI&lt;/strong&gt;. It reads pixels the way a human does — and it's surprisingly accurate.&lt;/p&gt;
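&lt;p&gt;For illustration, a response to the structured prompt might look like this (a hypothetical shape; the field names here are my own, not the tool's exact schema):&lt;/p&gt;

```json
{
  "page_type": "e-commerce product page",
  "title": "Wireless Headphones - Example Store",
  "ui_elements": [
    {"type": "button", "label": "Add to Cart"},
    {"type": "link", "label": "Reviews"},
    {"type": "input", "label": "Quantity"}
  ],
  "content_summary": "Product listing with price, rating, and shipping options.",
  "suggested_actions": ["Click 'Add to Cart'", "Open the Reviews section"]
}
```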

&lt;h3&gt;
  
  
  2. Real-Time Voice (Gemini Live API)
&lt;/h3&gt;

&lt;p&gt;The agent runs on &lt;code&gt;gemini-2.0-flash-live-001&lt;/code&gt;, which supports bidirectional audio streaming. Google's ADK handles the &lt;code&gt;/run_live&lt;/code&gt; WebSocket endpoint automatically — users just click the microphone button and start talking. The agent can be interrupted mid-sentence, just like a real conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Google Search Grounding
&lt;/h3&gt;

&lt;p&gt;When the user asks about something that needs current information, the ADK &lt;code&gt;google_search&lt;/code&gt; tool kicks in — pulling real-time web results and weaving them into the spoken response.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent Framework&lt;/td&gt;
&lt;td&gt;Google ADK v1.25.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Live Voice Model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-2.0-flash-live-001&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision Model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-2.0-flash&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search&lt;/td&gt;
&lt;td&gt;ADK &lt;code&gt;google_search&lt;/code&gt; tool&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hosting&lt;/td&gt;
&lt;td&gt;Google Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Container Registry&lt;/td&gt;
&lt;td&gt;Google Artifact Registry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI/CD&lt;/td&gt;
&lt;td&gt;Google Cloud Build&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Python 3.11&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Building It: The Code
&lt;/h2&gt;

&lt;p&gt;The entire agent lives in two main components.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Agent (&lt;code&gt;app/agent.py&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google_search&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voice_ui_navigator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash-live-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Voice-powered agent that sees your screen and searches the web.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;You are a Voice UI Navigator.
    When the user shares a screenshot, call analyze_screenshot.
    Use google_search for research questions.
    Always respond conversationally — you are speaking to the user.
    Never access the DOM. Read screens visually only.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;analyze_screenshot&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;generate_content_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;speech_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SpeechConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;prebuilt_voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;voice_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Puck&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Vision Tool (&lt;code&gt;analyze_screenshot&lt;/code&gt;)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_screenshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Load the screenshot the user attached in the chat
&lt;/span&gt;    &lt;span class="n"&gt;screenshot_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;screenshot.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Fall back: find any image artifact in the session
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;screenshot_part&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;artifact_names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list_artifacts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;image_artifacts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;artifact_names&lt;/span&gt;
                          &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_artifacts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;screenshot_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;tool_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_artifact&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_artifacts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Send image + structured prompt to Gemini vision
&lt;/span&gt;    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.0-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;screenshot_part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;analysis_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;])]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trick here is &lt;strong&gt;ADK's artifact system&lt;/strong&gt;. When a user attaches a file in the ADK web UI, it's automatically stored as a session artifact. The tool retrieves it with &lt;code&gt;tool_context.load_artifact()&lt;/code&gt; — no custom file upload endpoint needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deploying to Google Cloud Run
&lt;/h2&gt;

&lt;p&gt;The entire deployment is containerized with Docker and deployed to Cloud Run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /workspace&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; app/ ./app/&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["adk", "web", ".", "--host", "0.0.0.0", "--port", "8080"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Run &lt;code&gt;adk web .&lt;/code&gt; from the &lt;strong&gt;parent&lt;/strong&gt; directory of your agent folder — not from inside it. ADK scans for agent packages one level down from where the command runs.&lt;/p&gt;
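&lt;p&gt;As a sketch, the layout ADK expects looks roughly like this (folder names taken from this project; your agent package name may differ):&lt;/p&gt;

```text
workspace/            # run `adk web .` from here
└── app/              # the agent package ADK discovers
    ├── __init__.py   # typically re-exports the agent module
    ├── agent.py      # defines root_agent
    └── .env
```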

&lt;h3&gt;
  
  
  Build and Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build and push to Artifact Registry&lt;/span&gt;
gcloud builds submit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tag&lt;/span&gt; us-central1-docker.pkg.dev/YOUR_PROJECT/voice-navigator-repo/voice-navigator

&lt;span class="c"&gt;# Deploy to Cloud Run&lt;/span&gt;
gcloud run deploy voice-navigator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--image&lt;/span&gt; us-central1-docker.pkg.dev/YOUR_PROJECT/voice-navigator-repo/voice-navigator &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--platform&lt;/span&gt; managed &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_API_KEY=your_key,GOOGLE_GENAI_USE_VERTEXAI=False"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Lessons Learned (The Hard Way)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. ADK directory structure is strict
&lt;/h3&gt;

&lt;p&gt;ADK's web loader scans ALL subdirectories of the agents folder looking for agent packages. I had a &lt;code&gt;tools/&lt;/code&gt; subfolder inside my &lt;code&gt;app/&lt;/code&gt; agent directory — ADK tried to load it as a separate agent and threw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;No root_agent found for 'tools'.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Move all tools into the main &lt;code&gt;agent.py&lt;/code&gt; file, removing any subdirectories inside the agent package.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Not all Gemini models support Live API
&lt;/h3&gt;

&lt;p&gt;I wasted time with &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt; (doesn't exist), &lt;code&gt;gemini-1.5-flash&lt;/code&gt; (no live support), and &lt;code&gt;gemini-2.0-flash&lt;/code&gt; (no live support). Only &lt;strong&gt;&lt;code&gt;gemini-2.0-flash-live-001&lt;/code&gt;&lt;/strong&gt; works with ADK's &lt;code&gt;/run_live&lt;/code&gt; WebSocket for real-time audio.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Cloud Build uses the Compute Engine service account
&lt;/h3&gt;

&lt;p&gt;When &lt;code&gt;gcloud builds submit&lt;/code&gt; fails with a permission-denied error on Artifact Registry, the fix is NOT to grant the role to the Cloud Build service account. Grant it to the &lt;strong&gt;Compute Engine default service account&lt;/strong&gt; instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud artifacts repositories add-iam-policy-binding voice-navigator-repo &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/artifactregistry.writer"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. gcr.io is deprecated
&lt;/h3&gt;

&lt;p&gt;Google Container Registry (&lt;code&gt;gcr.io&lt;/code&gt;) is being replaced by Artifact Registry (&lt;code&gt;pkg.dev&lt;/code&gt;). Use Artifact Registry for new projects — &lt;code&gt;gcr.io&lt;/code&gt; pushes will fail silently on newer GCP projects.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Separate the vision call from the live audio session
&lt;/h3&gt;

&lt;p&gt;Initially I tried to have the live model handle both audio streaming AND vision analysis simultaneously. This caused instability. The cleaner pattern: make a &lt;strong&gt;separate synchronous &lt;code&gt;gemini-2.0-flash&lt;/code&gt; call&lt;/strong&gt; inside the tool for image analysis, while the live session stays focused on audio I/O.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Surprised Me About Gemini Vision
&lt;/h2&gt;

&lt;p&gt;I expected to need DOM access or accessibility APIs to understand UI elements. I was wrong.&lt;/p&gt;

&lt;p&gt;Given just a raw screenshot, &lt;code&gt;gemini-2.0-flash&lt;/code&gt; correctly identified:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Button labels and their positions on screen&lt;/li&gt;
&lt;li&gt;Navigation menus and their items&lt;/li&gt;
&lt;li&gt;Form fields and their purposes&lt;/li&gt;
&lt;li&gt;The page's primary content and intent&lt;/li&gt;
&lt;li&gt;Actionable next steps the user could take&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This opens up a genuinely powerful use case: &lt;strong&gt;an AI that works on ANY screen&lt;/strong&gt; — web apps, desktop software, mobile screenshots — without needing any special integration or API access.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser extension&lt;/strong&gt; — automatically capture screenshots without manual attachment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action execution&lt;/strong&gt; — integrate with Playwright to actually perform the suggested navigation steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-turn screen memory&lt;/strong&gt; — remember previous screenshots to understand navigation flow over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile support&lt;/strong&gt; — accept screenshots from phone cameras for on-device assistance&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE.git
&lt;span class="nb"&gt;cd&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_CODING_CHALLENGE/ADK-STREAMING"&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt

&lt;span class="c"&gt;# Add your Gemini API key to app/.env&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"GEMINI_API_KEY=your_key_here"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; app/.env
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_GENAI_USE_VERTEXAI=False"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; app/.env

adk web &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--no-reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;code&gt;http://localhost:8000&lt;/code&gt;, attach a screenshot, and ask the agent what it sees.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE/blob/master/ADK-STREAMING/deploy.sh" rel="noopener noreferrer"&gt;https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE/blob/master/ADK-STREAMING/deploy.sh&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE/blob/master/ADK-STREAMING/cloudbuild.yaml" rel="noopener noreferrer"&gt;https://github.com/Kamaumbugua-dev/GEMINI_CODING_CHALLENGE/blob/master/ADK-STREAMING/cloudbuild.yaml&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Google ADK Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/live" rel="noopener noreferrer"&gt;Gemini Live API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://google.github.io/adk-docs/get-started/streaming/quickstart-streaming/" rel="noopener noreferrer"&gt;ADK Streaming Quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/run/docs" rel="noopener noreferrer"&gt;Google Cloud Run Docs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built for the Gemini Live Agent Challenge. #GeminiLiveAgentChallenge&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you found this useful, drop a ❤️ and follow for more AI agent builds!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>showdev</category>
    </item>
    <item>
      <title>My BCG X GenAI Job Simulation: Building a Financial Analysis Chatbot &amp; Key Learnings</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Thu, 27 Nov 2025 16:23:07 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/my-bcg-x-genai-job-simulation-building-a-financial-analysis-chatbot-key-learnings-1kh3</link>
      <guid>https://dev.to/kamaumbuguadev/my-bcg-x-genai-job-simulation-building-a-financial-analysis-chatbot-key-learnings-1kh3</guid>
      <description>&lt;h4&gt;
  
  
  How a virtual internship sharpened my skills in data wrangling, business logic, and user-centric AI development.
&lt;/h4&gt;




&lt;h3&gt;
  
  
  From Theory to (Simulated) Practice
&lt;/h3&gt;

&lt;p&gt;We all have projects in our portfolios, but how do we know if they truly reflect the skills needed in a real-world, high-stakes environment? That was my goal when I completed the &lt;strong&gt;BCG X GenAI Job Simulation&lt;/strong&gt; on Forage.&lt;/p&gt;

&lt;p&gt;The task was classic BCG: practical, business-focused, and impactful. I was challenged to build a functional prototype of a &lt;strong&gt;Generative AI tool for financial statement analysis&lt;/strong&gt;. This wasn't just about writing code; it was about creating a solution that a consultant could use to quickly derive insights from complex company data.&lt;/p&gt;

&lt;p&gt;In this article, I'll walk you through the project and, more importantly, the core skills I honed along the way.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Project: A Financial Query Bot
&lt;/h3&gt;

&lt;p&gt;The core deliverable was a Python-based chatbot that could answer natural language questions about a dataset of company financials. The dataset was in a long format, meaning financial terms (like 'Total_Revenue' and 'Net_Income') were rows, not columns.&lt;/p&gt;

&lt;p&gt;Here's a high-level overview of the solution I built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 1. Data Wrangling: From Long to Wide Format
&lt;/span&gt;&lt;span class="n"&gt;df_wide&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pivot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Financial_Term&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Core Logic &amp;amp; Metric Calculation
&lt;/span&gt;&lt;span class="n"&gt;total_revenue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_wide&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total_Revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;net_income_change&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_wide_sorted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Net_Income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;df_wide_sorted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Net_Income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;iloc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;highest_revenue_company&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_wide&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Company&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Total_Revenue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;idxmax&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 3. The Chatbot Interface
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;simple_chatbot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_query&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# ... NLP logic to match queries with pre-calculated answers
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;what is the total revenue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The total revenue is &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_revenue&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="c1"&gt;# ... other queries
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final product was an interactive command-line tool that could instantly answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  "What is the total revenue?"&lt;/li&gt;
&lt;li&gt;  "How has net income changed over the last year?"&lt;/li&gt;
&lt;li&gt;  "Which company has the highest revenue?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Skills Forged in the (Simulated) Fire
&lt;/h3&gt;

&lt;p&gt;While the code is crucial, the simulation forced me to think and operate like a BCG X developer. Here are the key skills I developed:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Data Engineering &amp;amp; Wrangling with &lt;code&gt;pandas&lt;/code&gt;&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;The raw data wasn't ready for analysis. My first step was to &lt;strong&gt;pivot the DataFrame&lt;/strong&gt; from a long to a wide format. This is a common, critical task in data science. I solidified my understanding of &lt;code&gt;pandas&lt;/code&gt; operations like &lt;code&gt;pivot()&lt;/code&gt;, &lt;code&gt;groupby()&lt;/code&gt;, and &lt;code&gt;sort_values()&lt;/code&gt; to structure data for efficient computation.&lt;/p&gt;
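&lt;p&gt;That reshaping step can be sketched end-to-end with a tiny synthetic dataset (the real one has more companies and terms):&lt;/p&gt;

```python
import pandas as pd

# Tiny synthetic long-format dataset (illustrative values only)
df = pd.DataFrame({
    "Company": ["Acme", "Acme", "Acme", "Acme"],
    "Year": [2022, 2022, 2023, 2023],
    "Financial_Term": ["Total_Revenue", "Net_Income",
                       "Total_Revenue", "Net_Income"],
    "Value": [100.0, 10.0, 120.0, 15.0],
})

# Pivot: each financial term becomes a column, one row per (Company, Year)
df_wide = df.pivot(index=["Company", "Year"],
                   columns="Financial_Term",
                   values="Value").reset_index()

print(df_wide)
```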

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Translating Business Logic into Code&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;This was the heart of the simulation. It wasn't enough to just calculate a sum; I had to understand &lt;em&gt;what&lt;/em&gt; to calculate and &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;"How has net income changed?"&lt;/strong&gt; required me to sort the data by year and calculate the difference between the two most recent entries. This honed my ability to implement &lt;strong&gt;time-series analysis&lt;/strong&gt; logic.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;"Which company has the highest revenue?"&lt;/strong&gt; involved a &lt;code&gt;groupby&lt;/code&gt; operation to aggregate data by company before identifying the maximum. This reinforced &lt;strong&gt;data aggregation&lt;/strong&gt; techniques for business intelligence.&lt;/li&gt;
&lt;/ul&gt;
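&lt;p&gt;Both query patterns can be sketched against a small synthetic wide-format frame (numbers are made up for illustration):&lt;/p&gt;

```python
import pandas as pd

# Synthetic wide-format data (illustrative values only)
df_wide = pd.DataFrame({
    "Company": ["Acme", "Acme", "Beta", "Beta"],
    "Year": [2022, 2023, 2022, 2023],
    "Net_Income": [10.0, 15.0, 8.0, 7.0],
    "Total_Revenue": [100.0, 120.0, 90.0, 95.0],
})

# "How has net income changed?": sort by year, diff the two latest entries
acme = df_wide[df_wide["Company"] == "Acme"].sort_values("Year")
net_income_change = acme["Net_Income"].iloc[-1] - acme["Net_Income"].iloc[-2]

# "Which company has the highest revenue?": aggregate per company, take the argmax
highest = df_wide.groupby("Company")["Total_Revenue"].sum().idxmax()

print(net_income_change, highest)  # 5.0 Acme
```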

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Prototyping with Generative AI Principles&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;While the chatbot used a rule-based approach (pattern matching), its design embodies a core principle of GenAI: creating a &lt;strong&gt;natural language interface&lt;/strong&gt; for complex systems. I learned to structure a system where a user's unstructured query is mapped to a structured data operation, which is the foundational concept behind many sophisticated AI tools.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;User-Centric Development &amp;amp; UX&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;A tool is useless if no one can use it. I built an &lt;strong&gt;interactive loop&lt;/strong&gt; with clear user prompts, a &lt;code&gt;help&lt;/code&gt; menu, and graceful handling of exit commands. Focusing on the user experience, even in a CLI, taught me to think beyond the algorithm and consider the human interacting with my code.&lt;/p&gt;
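
&lt;p&gt;&lt;em&gt;The shape of such a loop, sketched with illustrative commands and replies (the real chatbot's wording differs). Splitting reply logic from the I/O loop keeps it testable:&lt;/em&gt;&lt;/p&gt;

```python
def handle_command(cmd: str) -> str:
    """Return the reply for one command; 'bye' signals the loop to exit."""
    cmd = cmd.strip().lower()
    if cmd == "help":
        return "Try: 'net income change', 'highest revenue', or 'exit'."
    if cmd in ("exit", "quit"):
        return "bye"
    return f"You asked: {cmd!r}"

def chat_loop(read=input, write=print):
    """Interactive CLI loop; read/write are injectable for testing."""
    write("Financial chatbot ready. Type 'help' for options.")
    while True:
        reply = handle_command(read("> "))
        if reply == "bye":
            write("Goodbye!")
            break
        write(reply)
```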

&lt;h4&gt;
  
  
  5. &lt;strong&gt;Code Readability and Maintainability&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;I made sure to write clean, commented, and well-structured code. Using functions, clear variable names (&lt;code&gt;net_income_change&lt;/code&gt; instead of &lt;code&gt;nic&lt;/code&gt;), and f-strings for formatted output are essential practices for collaborating in a professional environment, just as one would at a firm like BCG X.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Takeaways for My Developer Journey
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Business Acumen is a Feature:&lt;/strong&gt; The most elegant code is worthless if it doesn't solve a real business problem. This simulation was a constant exercise in aligning technical execution with business needs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Start Simple, Then Scale:&lt;/strong&gt; A rule-based chatbot was the perfect prototype. It proved the concept's value before needing the complexity of LLM API calls, prompt engineering, and associated costs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The "Why" Matters as Much as the "How":&lt;/strong&gt; Understanding &lt;em&gt;why&lt;/em&gt; a consultant needs to see net income change is what led me to implement the correct sorting logic. Context is everything.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;The BCG X GenAI Job Simulation was more than a certificate for my LinkedIn profile. It was a rigorous, practical test of my ability to deliver an end-to-end solution under realistic constraints. It pushed me to merge data skills with business thinking, and it gave me a tangible project that demonstrates my readiness to contribute in a tech-driven, strategic role.&lt;/p&gt;

&lt;p&gt;If you're a student or a developer looking to break into the tech/consulting space, I highly recommend seeking out these simulations. They are a powerful way to bridge the gap between academic knowledge and industry application.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Have you completed a similar job simulation or built a project that taught you unexpected skills? Share your experience in the comments below! Let's learn from each other.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>career</category>
      <category>ai</category>
      <category>learning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Revolutionizing Loan Risk Assessment: How I Built a Smarter Default Prediction Model That Actually Understands Finance</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Wed, 19 Nov 2025 19:08:03 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/revolutionizing-loan-risk-assessment-how-i-built-a-smarter-default-prediction-model-that-actually-59e7</link>
      <guid>https://dev.to/kamaumbuguadev/revolutionizing-loan-risk-assessment-how-i-built-a-smarter-default-prediction-model-that-actually-59e7</guid>
      <description>&lt;h2&gt;
  
  
  The $5 Million Problem That Broke My Model
&lt;/h2&gt;

&lt;p&gt;It was supposed to be a straightforward machine learning project: build a loan default prediction model. I had the algorithms, I had the data, and I had the code. But then I tested a scenario that should have been a no-brainer: a borrower with a $5 million annual income applying for a $15,000 loan. My model panicked and flagged it as "HIGH RISK."&lt;/p&gt;

&lt;p&gt;That's when I realized: most machine learning models understand data, but they don't understand finance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond the Algorithm: When Math Meets Reality
&lt;/h2&gt;

&lt;p&gt;The initial approach was technically sound: logistic regression combined with decision trees, proper normalization, all the ML best practices. But the real world doesn't care about technical purity. A billionaire applying for a car loan isn't high-risk, no matter what the raw numbers say.&lt;/p&gt;

&lt;p&gt;The breakthrough came when I stopped treating this as purely a machine learning problem and started treating it as a financial intelligence problem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;apply_business_rules&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_prediction&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;The secret sauce: common sense meets machine learning&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;income&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;loan_amount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;loanamount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;credit_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;creditscore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;base_prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model_prediction&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_prob&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;adjusted_prob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;base_prob&lt;/span&gt;

    &lt;span class="c1"&gt;# Rule 1: Debt-to-Income Ratio Reality Check
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;income&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;dti&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;loan_amount&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;income&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dti&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Tiny loan for this income level
&lt;/span&gt;            &lt;span class="n"&gt;adjusted_prob&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;  &lt;span class="c1"&gt;# Drastically reduce risk
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Architecture That Actually Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dual-Layer Intelligence
&lt;/h3&gt;

&lt;p&gt;Most models stop at the algorithm. Ours has two brain hemispheres:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The Machine Learning Brain&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logistic regression for linear patterns&lt;/li&gt;
&lt;li&gt;Decision trees for complex interactions&lt;/li&gt;
&lt;li&gt;Ensemble averaging for stability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. The Financial Expert Brain&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debt-to-income ratio analysis&lt;/li&gt;
&lt;li&gt;Income tier adjustments
&lt;/li&gt;
&lt;li&gt;Credit score reality checks&lt;/li&gt;
&lt;li&gt;Employment stability factors
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This simple ratio check fixes 80% of "obvious" errors
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;income&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;loan_amount&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;income&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;adjusted_prob&lt;/span&gt; &lt;span class="o"&gt;*=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;  &lt;span class="c1"&gt;# Halve the risk for tiny relative loans
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Smart Data Agnosticism
&lt;/h3&gt;

&lt;p&gt;The biggest headache in financial ML? Every dataset has different column names. Instead of forcing users to reformat their data, I built a detective:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;detect_column_types&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Speaks the language of finance, not just data science&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;feature_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;income&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;annual&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;wage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;earnings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;loansoutstanding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;loan&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outstanding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;current&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;existing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="c1"&gt;# ... and so on for other financial concepts
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
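
&lt;p&gt;&lt;em&gt;A minimal sketch of how such patterns might be matched against incoming column names (the column names here are hypothetical):&lt;/em&gt;&lt;/p&gt;

```python
def match_columns(columns, feature_patterns):
    """Map each canonical feature to the first column whose name contains a known pattern."""
    mapping = {}
    for feature, patterns in feature_patterns.items():
        for col in columns:
            if any(p in col.lower() for p in patterns):
                mapping[feature] = col
                break
    return mapping

patterns = {
    "income": ["income", "salary", "annual", "wage", "earnings"],
    "loansoutstanding": ["loan", "outstanding", "current", "existing"],
}
mapping = match_columns(["Annual_Salary", "Loans_Outstanding"], patterns)
```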



&lt;h2&gt;
  
  
  The "Aha!" Moments That Transformed the Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Moment 1: The Debt-to-Income Epiphany
&lt;/h3&gt;

&lt;p&gt;I was so focused on absolute numbers that I missed the most basic concept in lending: relative capacity. A $15,000 loan means completely different things to someone making $50,000 versus $5,000,000.&lt;/p&gt;

&lt;h3&gt;
  
  
  Moment 2: The Credit Score Reality Check
&lt;/h3&gt;

&lt;p&gt;Credit scores follow predictable patterns. Excellent credit (750+) isn't just slightly better than good credit (700-750)—it's a fundamentally different risk category that needed exponential, not linear, adjustment.&lt;/p&gt;
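
&lt;p&gt;&lt;em&gt;One way to encode that exponential tiering. The decay rate and pivot score below are illustrative assumptions, not the model's production values:&lt;/em&gt;&lt;/p&gt;

```python
import math

def credit_score_multiplier(score: int) -> float:
    """Risk multiplier that decays exponentially with credit score.

    The 0.005 decay rate and the 650 pivot are illustrative assumptions.
    """
    return math.exp(-0.005 * (score - 650))
```

&lt;p&gt;With these numbers a 750 score earns roughly a 0.61x multiplier and an 800 score roughly 0.47x, so each tier is a multiplicative step down in risk rather than an additive one.&lt;/p&gt;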

&lt;h3&gt;
  
  
  Moment 3: The Employment Stability Insight
&lt;/h3&gt;

&lt;p&gt;Two years at a job isn't the same as twenty years. The model needed to understand that employment duration has diminishing returns on risk reduction.&lt;/p&gt;
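
&lt;p&gt;&lt;em&gt;Diminishing returns can be captured with a logarithmic curve. The scale and cap constants here are illustrative assumptions:&lt;/em&gt;&lt;/p&gt;

```python
import math

def employment_stability_factor(years: float) -> float:
    """Risk reduction from job tenure, with diminishing returns.

    log1p flattens quickly: going from 2 to 4 years reduces risk more than
    going from 18 to 20. The 0.15 scale and 0.5 cap are illustrative assumptions.
    """
    return min(0.5, 0.15 * math.log1p(years))
```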

&lt;h2&gt;
  
  
  Technical Innovation: Making Complex Simple
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Performance That Doesn't Compromise Accuracy
&lt;/h3&gt;

&lt;p&gt;The initial model took minutes to train. The final version? Seconds. Here's how:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;train_logistic_regression_fast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Vectorized operations instead of Python loops&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;
    &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;zeros&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Vectorized forward pass - 100x faster than loops
&lt;/span&gt;        &lt;span class="n"&gt;z&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;bias&lt;/span&gt;
        &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sigmoid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;z&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Vectorized backward pass
&lt;/span&gt;        &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictions&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;
        &lt;span class="n"&gt;dw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;

        &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dw&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Error Resilience That Actually Works
&lt;/h3&gt;

&lt;p&gt;Instead of crashing on missing data, the model adapts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# If default column not found, create reasonable defaults
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;prepared_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prepared_data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;st&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warning&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No default column found. Using dummy values for model training.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Impact: From Theoretical to Practical
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Before Business Rules:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;$5M income + $15K loan = "HIGH RISK" (30% PD)&lt;/li&gt;
&lt;li&gt;Recent graduate with good credit = "MODERATE RISK"&lt;/li&gt;
&lt;li&gt;Long-term employee with minor credit issues = "HIGH RISK"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  After Business Rules:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;$5M income + $15K loan = "VERY LOW RISK" (2% PD)&lt;/li&gt;
&lt;li&gt;Recent graduate with good credit = "LOW RISK" &lt;/li&gt;
&lt;li&gt;Long-term employee with minor credit issues = "MODERATE RISK"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Streamlit Revolution: Democratizing Financial AI
&lt;/h2&gt;

&lt;p&gt;What makes this project truly powerful isn't just the model; it's the accessibility. With Streamlit, we transformed complex financial modeling into:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;One-click setup&lt;/strong&gt; - No installation headaches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automatic data understanding&lt;/strong&gt; - Upload any CSV format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time explanations&lt;/strong&gt; - Not just predictions, but reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional risk assessment&lt;/strong&gt; - Actionable insights, not just percentages
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Transparent risk factors that build trust
&lt;/span&gt;&lt;span class="n"&gt;factors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;income&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;200000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;factors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ High income level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;credit_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;750&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;factors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Excellent credit score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;loan_amount&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;income&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;factors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅ Low debt-to-income ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Lessons for the Next Generation of Financial ML
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Domain Knowledge Beats Algorithm Complexity
&lt;/h3&gt;

&lt;p&gt;The business rules layer provided more value than any sophisticated algorithm ever could.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Performance Matters for Adoption
&lt;/h3&gt;

&lt;p&gt;A model that trains in 30 seconds gets used. One that takes 5 minutes gets abandoned.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Explainability Builds Trust
&lt;/h3&gt;

&lt;p&gt;Showing the "why" behind predictions makes the model credible to financial professionals.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Resilience Beats Perfection
&lt;/h3&gt;

&lt;p&gt;A model that works with imperfect data is more valuable than one that only works with perfect data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future Is Adaptive Intelligence
&lt;/h2&gt;

&lt;p&gt;This project proved something crucial: the next breakthrough in financial technology won't come from better algorithms alone. It will come from models that understand the context, the nuances, and the real-world logic of finance.&lt;/p&gt;

&lt;p&gt;The code is open, the approach is proven, and the results speak for themselves. We're not just predicting defaults anymore—we're building financial intelligence that actually understands what it means to lend money.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Want to see the model in action or implement these concepts in your organization? The complete code is available on &lt;a href="https://github.com/Kamaumbugua-dev/Loan-Default-Prediction-Model" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, and I'm always open to discussing how adaptive financial intelligence can transform your risk assessment processes.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The future of financial ML isn't smarter algorithms; it's algorithms that understand finance.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>devjournal</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From JPMorgan's Trading Desk to Your Terminal: Building a Natural Gas Storage Valuation Engine</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Sun, 09 Nov 2025 21:54:03 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/from-jpmorgans-trading-desk-to-your-terminal-building-a-natural-gas-storage-valuation-engine-1am9</link>
      <guid>https://dev.to/kamaumbuguadev/from-jpmorgans-trading-desk-to-your-terminal-building-a-natural-gas-storage-valuation-engine-1am9</guid>
      <description>&lt;p&gt;&lt;em&gt;How I reverse-engineered Wall Street's approach to energy trading and built a production-ready quantitative pricing system&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Billion-Dollar Problem
&lt;/h2&gt;

&lt;p&gt;Imagine you're an energy trader staring at a complex proposal: a client wants to store 1 million units of natural gas for 6 months. They'll inject in summer when prices are low and withdraw in winter when prices typically spike. The question every trading desk faces: &lt;strong&gt;"What's the fair price for this storage contract?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This isn't academic; it's the exact challenge I tackled in a JPMorgan Chase quantitative research simulation. The result? A sophisticated valuation engine that bridges the gap between complex energy markets and executable trading decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;At its core, my system solves the fundamental equation of energy storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Contract Value = (Withdrawal Revenue - Injection Costs) - (Storage + Fees + Transport)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
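
&lt;p&gt;&lt;em&gt;As a toy calculation with assumed prices and costs (not figures from the simulation), the equation plays out like this:&lt;/em&gt;&lt;/p&gt;

```python
# Toy numbers (assumptions, not figures from the simulation)
volume = 1_000_000            # MMBtu stored
injection_price = 2.00        # $/MMBtu paid in summer
withdrawal_price = 3.00       # $/MMBtu received in winter
storage_cost = 100_000 * 6    # $100K/month facility rental for 6 months
fees = 0.01 * volume * 2      # $0.01/MMBtu charged on injection and on withdrawal
transport = 50_000 * 2        # one delivery in, one pickup out

gross_spread = (withdrawal_price - injection_price) * volume
contract_value = gross_spread - (storage_cost + fees + transport)
```

&lt;p&gt;A $1.00/MMBtu seasonal spread grosses $1M here, but $720K of storage, fee, and transport costs leaves only $280K of value, which is why the cost terms get as much attention as the spread.&lt;/p&gt;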



&lt;p&gt;But the devil is in the details. Here's how we tackled the complexity:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Architecture Deep Dive
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NaturalGasStorageValuation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_contract_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;injection_dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;withdrawal_dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                               &lt;span class="n"&gt;injection_volumes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;withdrawal_volumes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
        &lt;span class="c1"&gt;# 1. Validate physical constraints
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_validate_inputs&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

        &lt;span class="c1"&gt;# 2. Create chronological operation timeline
&lt;/span&gt;        &lt;span class="n"&gt;operations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_operations_timeline&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

        &lt;span class="c1"&gt;# 3. Calculate detailed cash flows
&lt;/span&gt;        &lt;span class="n"&gt;cash_flows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calculate_cash_flows&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

        &lt;span class="c1"&gt;# 4. Return comprehensive valuation
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;net_present_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cash_flow_details&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;operations_summary&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Innovations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Multi-Period Scheduling&lt;/strong&gt;&lt;br&gt;
Unlike simple buy-low-sell-high models, our system handles complex schedules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple injection/withdrawal dates&lt;/li&gt;
&lt;li&gt;Varying volumes at each operation&lt;/li&gt;
&lt;li&gt;Storage level tracking across time&lt;/li&gt;
&lt;/ul&gt;
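
&lt;p&gt;&lt;em&gt;A minimal sketch of storage-level tracking across a multi-period schedule, with an assumed operations list and simplified function names:&lt;/em&gt;&lt;/p&gt;

```python
from datetime import date

def storage_levels(operations, max_capacity):
    """Track inventory through a chronological list of (date, kind, volume) ops."""
    level = 0
    history = []
    for when, kind, volume in sorted(operations):
        if kind == "inject":
            if level + volume > max_capacity:
                raise ValueError(f"Exceeds storage capacity on {when}")
            level += volume
        else:
            if volume > level:
                raise ValueError(f"Insufficient inventory on {when}")
            level -= volume
        history.append((when, level))
    return history

# Assumed schedule: summer injections, winter withdrawals
ops = [
    (date(2024, 6, 15), "inject", 500_000),
    (date(2024, 7, 15), "inject", 500_000),
    (date(2024, 12, 15), "withdraw", 500_000),
    (date(2025, 1, 15), "withdraw", 500_000),
]
```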

&lt;p&gt;&lt;strong&gt;2. Real-World Constraints&lt;/strong&gt;&lt;br&gt;
We enforced physical realities that make or break deals:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Can't inject more than storage capacity
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_storage&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;volume&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_capacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Exceeds storage capacity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Can't withdraw more than available
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_storage&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;volume&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Insufficient inventory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Comprehensive Cost Modeling&lt;/strong&gt;&lt;br&gt;
Every dollar counts in energy trading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage Costs&lt;/strong&gt;: Daily rental fees for the facility&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Injection/Withdrawal Fees&lt;/strong&gt;: Per-unit charges for moving gas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport Costs&lt;/strong&gt;: Fixed fees for each delivery/pickup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Price Spread&lt;/strong&gt;: The core profit driver&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Sample Trade Analysis
&lt;/h3&gt;

&lt;p&gt;Let's value a realistic storage contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;valuation_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_contract_value&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;injection_dates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2024-06-15&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2024-07-15&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;withdrawal_dates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2024-12-15&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2025-01-15&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;injection_volumes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# 1M MMBtu total
&lt;/span&gt;    &lt;span class="n"&gt;withdrawal_volumes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500000&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;storage_cost_per_day&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;3333.33&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;       &lt;span class="c1"&gt;# $100K/month
&lt;/span&gt;    &lt;span class="n"&gt;transport_cost_per_trip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50000&lt;/span&gt;       &lt;span class="c1"&gt;# $50K per operation
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Contract NPV: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;net_present_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Output: Contract NPV: $589,966.67
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters&lt;/strong&gt;: That $589K isn't just a number; it's the difference between profitable trading and catastrophic losses.&lt;/p&gt;
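&lt;p&gt;For readers newer to the NPV side: the discounting step behind any net present value is a one-liner per cash flow. This is a generic sketch with hypothetical dates, amounts, and a 5% rate, not the repo's implementation:&lt;/p&gt;

```python
# Generic NPV sketch: discount each dated cash flow back to a valuation date.
# Dates, amounts, and the 5% rate below are hypothetical.
from datetime import date

def npv(cash_flows, annual_rate, valuation_date):
    """cash_flows is a list of (date, amount) pairs; amount is signed."""
    total = 0.0
    for cf_date, amount in cash_flows:
        years = (cf_date - valuation_date).days / 365.25
        total += amount / (1 + annual_rate) ** years
    return total

flows = [(date(2024, 6, 15), -5_250_000),  # buy gas in June
         (date(2024, 12, 15), 6_000_000)]  # sell it in December
value = npv(flows, 0.05, date(2024, 6, 1))
```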

&lt;h2&gt;
  
  
  The Quant's Toolkit: Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Intelligent Price Interpolation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Handles dates between monthly settlement points
&lt;/span&gt;    &lt;span class="c1"&gt;# Uses linear interpolation for realistic pricing
&lt;/span&gt;    &lt;span class="c1"&gt;# Falls back gracefully when data is sparse
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
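&lt;p&gt;A minimal version of that interpolation step, with hypothetical function and parameter names, might look like this:&lt;/p&gt;

```python
# Linear interpolation between two monthly settlement points.
# Hypothetical sketch; the production _get_price also handles sparse data.
from datetime import date

def interpolate_price(target, d0, p0, d1, p1):
    """Linearly interpolate the price at target between two known dates."""
    span = (d1 - d0).days
    if span == 0:
        return p0
    weight = (target - d0).days / span
    return p0 + weight * (p1 - p0)

# Price on 2024-06-20, between hypothetical June and July settlements
p = interpolate_price(date(2024, 6, 20),
                      date(2024, 6, 1), 10.50,
                      date(2024, 7, 1), 11.10)
```

&lt;p&gt;Here the 20th sits 19/30 of the way through the month, so the estimate lands proportionally between the two settlement prices.&lt;/p&gt;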



&lt;h3&gt;
  
  
  2. Production-Grade Error Handling
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_contract_value&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# Trade with confidence
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Valuation failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Never break the trading desk
&lt;/span&gt;    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Comprehensive Reporting
&lt;/h3&gt;

&lt;p&gt;Every valuation returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Net Present Value&lt;/strong&gt;: The bottom line&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cash Flow Details&lt;/strong&gt;: Daily money movements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operation Summary&lt;/strong&gt;: Volume and efficiency metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Breakdown&lt;/strong&gt;: Where money is spent&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Technical Deep Dive: The Cash Flow Engine
&lt;/h2&gt;

&lt;p&gt;The heart of our system is the cash flow calculator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_calculate_cash_flows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...):&lt;/span&gt;
    &lt;span class="n"&gt;cash_flows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;current_storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;operations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Calculate storage costs since last operation
&lt;/span&gt;        &lt;span class="n"&gt;storage_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calculate_storage_cost&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;injection&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Purchase gas + pay injection fees + transport
&lt;/span&gt;            &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;volume&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                    &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;volume&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;injection_fee&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                    &lt;span class="n"&gt;transport_cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;current_storage&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;volume&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# withdrawal
&lt;/span&gt;            &lt;span class="c1"&gt;# Sell gas - pay withdrawal fees - transport
&lt;/span&gt;            &lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;volume&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;withdrawal_fees&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;transport_cost&lt;/span&gt;
            &lt;span class="n"&gt;current_storage&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;volume&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="n"&gt;cash_flows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;operation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;net_cash_flow&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;revenue&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;storage_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;storage_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;current_storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="c1"&gt;# ... detailed breakdown
&lt;/span&gt;        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This granular approach means traders understand exactly when money moves and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond the Code: The Trading Desk Impact
&lt;/h2&gt;

&lt;h3&gt;
  
  
  For Quantitative Analysts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Transparency&lt;/strong&gt;: Every calculation is traceable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scenario Analysis&lt;/strong&gt;: Test "what-if" scenarios instantly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk Identification&lt;/strong&gt;: Spot constraint violations before execution&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Energy Traders
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid Pricing&lt;/strong&gt;: Value complex contracts in milliseconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client Confidence&lt;/strong&gt;: Explain pricing with detailed breakdowns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk Management&lt;/strong&gt;: Avoid physically impossible operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  For Software Engineers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Production Ready&lt;/strong&gt;: Error handling, logging, validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible Architecture&lt;/strong&gt;: Easy to add new cost components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Ready&lt;/strong&gt;: Structured for integration into larger systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Surprising Lessons from the Trading Desk
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Simple Beats Complex (Sometimes)
&lt;/h3&gt;

&lt;p&gt;I started with sophisticated stochastic models, but the clean, interpretable approach won. Trading desks need to understand &lt;em&gt;why&lt;/em&gt; a price is what it is, not just trust a black box.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Constraints Drive Value
&lt;/h3&gt;

&lt;p&gt;The most insightful moment was realizing that storage contracts aren't about predicting prices; they're about efficiently managing constraints. The money isn't made in forecasting; it's made in optimization.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Error Messages Are Risk Controls
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This isn't just coding, it's risk management
&lt;/span&gt;&lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Withdrawal would exceed available storage&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every validation check is a potential million-dollar save.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started with Energy Trading Analytics
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Install dependencies
&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;

&lt;span class="c1"&gt;# Load your market data
&lt;/span&gt;&lt;span class="n"&gt;price_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_natural_gas_prices&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the model
&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NaturalGasStorageValuation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start valuing contracts
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Example Analysis
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Test seasonal storage strategy
&lt;/span&gt;&lt;span class="n"&gt;summer_price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;10.50&lt;/span&gt;  &lt;span class="c1"&gt;# June injection
&lt;/span&gt;&lt;span class="n"&gt;winter_price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;12.00&lt;/span&gt;  &lt;span class="c1"&gt;# December withdrawal
&lt;/span&gt;&lt;span class="n"&gt;spread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;winter_price&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;summer_price&lt;/span&gt;  &lt;span class="c1"&gt;# $1.50/MMBtu
&lt;/span&gt;
&lt;span class="c1"&gt;# Our engine calculates if this spread covers:
# - 6 months of storage costs
# - Injection/withdrawal fees
# - Transport costs
# - And still leaves profit
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
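&lt;p&gt;Rough per-unit arithmetic makes the same point. The fee levels below are illustrative assumptions layered on the seasonal prices above:&lt;/p&gt;

```python
# Back-of-the-envelope check: does the $1.50/MMBtu spread survive the costs?
# Fee levels here are illustrative assumptions, not market quotes.
volume = 1_000_000                       # MMBtu
gross_spread = (12.00 - 10.50) * volume  # $1.5M before costs

storage = 3333.33 * 183                  # ~6 months at roughly $100K/month
fees = (0.01 + 0.01) * volume            # injection + withdrawal at $0.01/unit
transport = 2 * 50_000                   # one delivery, one pickup

net = gross_spread - storage - fees - transport  # positive means profitable
```

&lt;p&gt;Under these assumptions the trade clears roughly $770K, but a spread only half as wide would barely cover the carry, which is exactly the question the engine answers systematically.&lt;/p&gt;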



&lt;h2&gt;
  
  
  Visualization: Seeing the Money Flow
&lt;/h2&gt;

&lt;p&gt;We built a comprehensive dashboard system that shows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cash Flow Timeline&lt;/strong&gt;: When money moves in and out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage Levels&lt;/strong&gt;: Inventory tracking across time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Breakdown&lt;/strong&gt;: Where expenses accumulate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sensitivity Analysis&lt;/strong&gt;: How NPV changes with key inputs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvia.placeholder.com%2F800x400%2F374151%2FFFFFFF%3Ftext%3DStorage%2BContract%2BDashboard" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fvia.placeholder.com%2F800x400%2F374151%2FFFFFFF%3Ftext%3DStorage%2BContract%2BDashboard" alt="Storage Contract Dashboard" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Your Career
&lt;/h2&gt;

&lt;p&gt;This project demonstrates the exact skills that separate good developers from great quantitative engineers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Financial Acumen&lt;/strong&gt;: Understanding trading economics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production Mindset&lt;/strong&gt;: Building robust, error-resistant systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Knowledge&lt;/strong&gt;: Speaking the language of energy markets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architectural Thinking&lt;/strong&gt;: Designing for scale and integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;The future of energy trading analytics includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Market Integration&lt;/strong&gt;: Live price feeds and volatility modeling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning&lt;/strong&gt;: Predictive models for optimal injection/withdrawal timing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blockchain&lt;/strong&gt;: Smart contracts for automated settlement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Deployment&lt;/strong&gt;: RESTful services for front-office integration&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Join the Discussion
&lt;/h2&gt;

&lt;p&gt;I'm curious to hear from the community:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Energy Professionals&lt;/strong&gt;: What other factors would you include in storage valuation?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quant Developers&lt;/strong&gt;: How would you enhance the modeling approach?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trading Desk Veterans&lt;/strong&gt;: What features would make this indispensable for daily use?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ML Engineers&lt;/strong&gt;: Where would machine learning provide the most value?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Check out the complete code on GitHub&lt;/strong&gt; and star the repo if you find this approach valuable for your own quantitative finance journey!&lt;/p&gt;




&lt;h2&gt;
  
  
  Ready to Build?
&lt;/h2&gt;

&lt;p&gt;Whether you're interested in quantitative finance, energy markets, or building production financial systems, this project offers a realistic starting point. The code is battle-tested, well-documented, and ready for extension.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line&lt;/strong&gt;: We've taken the black magic out of energy storage valuation and replaced it with transparent, reproducible mathematics. And in today's volatile energy markets, that's not just good engineering; it's good business.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What complex financial system will you build next?&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Tags
&lt;/h3&gt;

&lt;p&gt;#quantitativefinance #energytrading #python #financialengineering #jpmorgan #algorithmictrading #datascience #fintech #machinelearning #tradingSystems&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This project was developed as part of a JPMorgan Chase quantitative research simulation, demonstrating real-world skills in financial modeling and software engineering.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>algorithms</category>
      <category>datascience</category>
      <category>showdev</category>
      <category>cli</category>
    </item>
    <item>
      <title>From JPMorgan's Trading Desk to Your GitHub: Building a Natural Gas Price Forecasting Engine</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Sun, 09 Nov 2025 20:24:39 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/from-jpmorgans-trading-desk-to-your-github-building-a-natural-gas-price-forecasting-engine-539m</link>
      <guid>https://dev.to/kamaumbuguadev/from-jpmorgans-trading-desk-to-your-github-building-a-natural-gas-price-forecasting-engine-539m</guid>
      <description>&lt;h2&gt;
  
  
  &lt;em&gt;How I reverse-engineered Wall Street quantitative research and what it taught me about production ML systems&lt;/em&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Quant's Crystal Ball
&lt;/h2&gt;

&lt;p&gt;What if you could predict natural gas prices months in advance? What if you could build the same type of forecasting systems used by Wall Street energy traders? That's exactly what I did in a JPMorgan Chase quantitative research simulation, and I'm opening up the complete engine for everyone to see.&lt;/p&gt;

&lt;p&gt;This isn't just another ML tutorial; this is a production-ready forecasting system that demonstrates how quantitative research meets MLOps in real-world financial applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Business Problem
&lt;/h2&gt;

&lt;p&gt;Energy companies and traders face a critical challenge: &lt;strong&gt;how to price long-term natural gas storage contracts&lt;/strong&gt; when prices fluctuate daily. The solution requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accurate price estimates for any historical date&lt;/li&gt;
&lt;li&gt;Reliable 12-month future forecasts&lt;/li&gt;
&lt;li&gt;Understanding of seasonal patterns and market trends&lt;/li&gt;
&lt;li&gt;A system robust enough for million-dollar decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Hybrid Forecasting Model
&lt;/h3&gt;

&lt;p&gt;The core innovation lies in combining multiple analytical approaches:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;NaturalGasPriceAnalyzer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build_prediction_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Polynomial regression captures market trends
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trend_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;poly&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;PolynomialFeatures&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;degree&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
            &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;linear&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;LinearRegression&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="c1"&gt;# Seasonal adjustments handle recurring patterns
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;calculate_seasonal_adjustments&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Secret Sauce: Trend + Seasonality
&lt;/h3&gt;

&lt;p&gt;Most forecasting tutorials stop at basic time series. Our approach mirrors professional quant systems:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Price_estimate = Trend_prediction + Seasonal_adjustment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Trend Component&lt;/strong&gt;: Uses polynomial regression to capture long-term market movements, economic factors, and structural changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Seasonal Component&lt;/strong&gt;: Identifies recurring monthly patterns (winter heating demand spikes, summer price dips) that repeat annually.&lt;/p&gt;
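&lt;p&gt;The decomposition above can be sketched end to end in plain Python. Everything below (a degree-1 trend as a stand-in for the polynomial, toy data, helper names) is illustrative, not the repo's exact code:&lt;/p&gt;

```python
# Sketch of Price_estimate = Trend_prediction + Seasonal_adjustment.
# Degree-1 trend, toy data, and helper names are all illustrative.

def fit_trend(ts, prices):
    """Least-squares straight line: a stand-in for the polynomial trend."""
    n = len(ts)
    mt = sum(ts) / n
    mp = sum(prices) / n
    num = sum((t - mt) * (p - mp) for t, p in zip(ts, prices))
    den = sum((t - mt) ** 2 for t in ts)
    slope = num / den
    intercept = mp - slope * mt
    return lambda t: intercept + slope * t

def seasonal_adjustments(months, residuals):
    """Average residual (price minus trend) per calendar month."""
    buckets = {}
    for m, r in zip(months, residuals):
        buckets.setdefault(m, []).append(r)
    return {m: sum(rs) / len(rs) for m, rs in buckets.items()}

# Toy data: a rising trend plus a winter premium in month 12
ts     = [0, 1, 2, 3]
months = [6, 12, 6, 12]
prices = [10.0, 11.5, 10.4, 11.9]

trend = fit_trend(ts, prices)
resid = [p - trend(t) for t, p in zip(ts, prices)]
seasonal = seasonal_adjustments(months, resid)

def estimate(t, month):
    return trend(t) + seasonal.get(month, 0.0)
```

&lt;p&gt;On this toy series, December picks up a positive seasonal adjustment and June a negative one, mirroring the winter-premium pattern the article describes.&lt;/p&gt;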

&lt;h2&gt;
  
  
  Key Technical Insights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Seasonal Pattern Discovery
&lt;/h3&gt;

&lt;p&gt;After analyzing 4 years of data, clear patterns emerged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_seasonal_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;monthly_avg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High season: December ($&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;monthly_avg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Low season: May ($&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;monthly_avg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Finding&lt;/strong&gt;: Prices peak in winter (December-February) due to heating demand and dip in late spring (May-June) when demand is lowest.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Market Volatility Quantification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;print_statistical_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;returns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;pct_change&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;volatility&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;returns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Annualized
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Annualized volatility: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;volatility&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: The series shows 7.8% annualized volatility, moderate fluctuations that create both risk and opportunity for traders.&lt;/p&gt;

&lt;h2&gt;
  
  
  From Research to Production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The MLOps Bridge
&lt;/h3&gt;

&lt;p&gt;This project demonstrates crucial MLOps principles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Production Data Pipelines&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data_string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Parse financial data with proper error handling
&lt;/span&gt;    &lt;span class="n"&gt;dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_financial_format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dates&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prices&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Model Interpretability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clear separation between trend and seasonal components&lt;/li&gt;
&lt;li&gt;Statistical summaries that business users understand&lt;/li&gt;
&lt;li&gt;Visualization that tells the price story intuitively&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. API-Ready Design&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_date&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Public method for integration into larger systems&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trend_prediction&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;seasonal_adjustment&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Surprising Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Simple Models Often Win
&lt;/h3&gt;

&lt;p&gt;I started with complex LSTM networks, but polynomial regression + seasonal adjustments provided better interpretability and nearly identical accuracy for this use case.&lt;/p&gt;
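&lt;p&gt;As a rough illustration of that simpler approach (a sketch with hypothetical names, not the project's actual implementation), the trend can come from a low-order polynomial fit and the seasonality from per-month residual averages:&lt;/p&gt;

```python
import numpy as np
import pandas as pd

# Toy monthly series: upward trend plus a winter premium plus noise
rng = np.random.default_rng(0)
months = pd.date_range("2020-01-31", periods=48, freq="M")
t = np.arange(len(months))
winter = months.month.isin([12, 1, 2]).astype(float) * 0.4
prices = 2.0 + 0.01 * t + winter + rng.normal(0, 0.05, len(t))

# Trend: low-order polynomial on the time index
trend_coefs = np.polyfit(t, prices, deg=2)
trend = np.polyval(trend_coefs, t)

# Seasonality: mean residual for each calendar month
residuals = pd.Series(prices - trend, index=months)
monthly_adjustment = residuals.groupby(residuals.index.month).mean()

def estimate(month_index, calendar_month):
    # Trend prediction plus the learned seasonal offset
    return np.polyval(trend_coefs, month_index) + monthly_adjustment[calendar_month]
```

&lt;p&gt;Unlike an LSTM, every term here can be explained to a stakeholder: a trend coefficient and twelve seasonal offsets.&lt;/p&gt;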

&lt;h3&gt;
  
  
  2. Domain Knowledge &amp;gt; Algorithm Complexity
&lt;/h3&gt;

&lt;p&gt;Understanding &lt;em&gt;why&lt;/em&gt; gas prices behave certain ways (winter demand, storage cycles) proved more valuable than sophisticated algorithms.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Financial-Grade Code Matters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Proper datetime handling&lt;/li&gt;
&lt;li&gt;Scientific notation parsing&lt;/li&gt;
&lt;li&gt;Edge case management&lt;/li&gt;
&lt;li&gt;Statistical rigor&lt;/li&gt;
&lt;/ul&gt;
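&lt;p&gt;To make those bullets concrete, here is a minimal sketch of what "financial-grade" parsing can look like. The helper names and the specific date formats are assumptions for illustration, not the project's real code:&lt;/p&gt;

```python
from datetime import datetime

def parse_price(raw):
    # Exported price files sometimes use scientific notation,
    # e.g. "2.31e+00"; float() accepts both plain and scientific forms.
    value = float(raw.strip())
    if value != max(value, 0.0):
        raise ValueError("negative price: " + repr(raw))
    return value

def parse_date(raw):
    # Try a handful of date formats commonly seen in CSV exports (illustrative list)
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%y"):
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized date: " + repr(raw))
```

&lt;p&gt;The point is the posture: validate every input, fail loudly on edge cases, and never let a malformed row silently corrupt a statistic.&lt;/p&gt;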

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize and analyze
&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;NaturalGasPriceAnalyzer&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_price_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build_prediction_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Get price estimates
&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;estimate_price&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;January 2025 forecast: $&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Advanced Features
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 12-month forecast
&lt;/span&gt;&lt;span class="n"&gt;future_prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extrapolate_future_prices&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Comprehensive visualization
&lt;/span&gt;&lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;visualize_analysis&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Seasonal pattern analysis
&lt;/span&gt;&lt;span class="n"&gt;seasonal_insights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;analyzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze_seasonal_patterns&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;p&gt;This system demonstrates skills that directly translate to financial technology roles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quantitative Research&lt;/strong&gt;: Statistical analysis, pattern recognition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk Management&lt;/strong&gt;: Volatility calculation, confidence intervals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trading Systems&lt;/strong&gt;: Price forecasting, market analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MLOps&lt;/strong&gt;: Production model deployment, monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Matters for Your Career
&lt;/h2&gt;

&lt;p&gt;As I discovered through this JPMorgan simulation, the bridge between academic ML and production financial systems requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Business Acumen&lt;/strong&gt;: Understanding the "why" behind the analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Rigor&lt;/strong&gt;: Production-quality code and statistical validity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communication Skills&lt;/strong&gt;: Explaining complex models to non-technical stakeholders&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next?
&lt;/h2&gt;

&lt;p&gt;Potential enhancements for the ambitious:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time data integration from market APIs&lt;/li&gt;
&lt;li&gt;Confidence intervals and probability distributions&lt;/li&gt;
&lt;li&gt;Multiple scenario analysis (bull/bear cases)&lt;/li&gt;
&lt;li&gt;Web dashboard with Streamlit or Dash&lt;/li&gt;
&lt;li&gt;Integration with trading platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Join the Discussion
&lt;/h2&gt;

&lt;p&gt;I'm curious to hear from the community:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What forecasting challenges have you faced in your projects?&lt;/li&gt;
&lt;li&gt;How do you balance model complexity with interpretability?&lt;/li&gt;
&lt;li&gt;Have you worked with energy or financial time series data?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Check out the complete code on GitHub&lt;/strong&gt; and star the repo if you find it useful for your own learning journey!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This project was completed as part of a JPMorgan Chase quantitative research simulation, demonstrating real-world skills in financial analysis and machine learning operations.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Tags
&lt;/h3&gt;

&lt;p&gt;#machinelearning #quantitativefinance #datascience #python #mlops #timeseries #forecasting #jpmorgan&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>From Raw Data to HR Insights: My Journey Through Python-Powered Analytics</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Wed, 03 Sep 2025 19:15:41 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/from-raw-data-to-hr-insights-my-journey-through-python-powered-analytics-n8k</link>
      <guid>https://dev.to/kamaumbuguadev/from-raw-data-to-hr-insights-my-journey-through-python-powered-analytics-n8k</guid>
      <description>&lt;p&gt;Over the past few weeks, I’ve taken a deep dive into HR analytics using Python. Starting with a dataset of employee records, I explored everything from basic data cleaning to advanced dimensionality reduction with PCA. This post is a reflection of what I’ve learned—broken down into four key stages: Exploratory Data Analysis (EDA), Business Analysis, Data Visualization, and PCA.&lt;/p&gt;

&lt;p&gt;Whether you're an aspiring data analyst or an HR professional curious about data-driven decision-making, this walkthrough will show you how Python can turn spreadsheets into strategy.&lt;/p&gt;




&lt;h2&gt;
  
  
  Part A: Basic Exploratory Data Analysis (EDA)
&lt;/h2&gt;

&lt;p&gt;Before diving into insights, I had to understand the data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Loaded the dataset using Pandas and previewed the first few rows&lt;/li&gt;
&lt;li&gt;Checked the shape to see how many rows and columns I was working with&lt;/li&gt;
&lt;li&gt;Inspected column types to identify numerical, categorical, and date fields&lt;/li&gt;
&lt;li&gt;Counted unique values to spot identifiers and categorical features&lt;/li&gt;
&lt;li&gt;Identified missing values using &lt;code&gt;.isnull()&lt;/code&gt; and planned data cleaning&lt;/li&gt;
&lt;li&gt;Described numerical columns with &lt;code&gt;.describe()&lt;/code&gt; to understand distributions&lt;/li&gt;
&lt;li&gt;Plotted salary distribution with Matplotlib to detect skewness&lt;/li&gt;
&lt;li&gt;Calculated average age from the DOB column using datetime operations&lt;/li&gt;
&lt;li&gt;Compared employment status (active vs terminated) using &lt;code&gt;.value_counts()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Identified largest departments using Seaborn’s &lt;code&gt;countplot()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
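&lt;p&gt;The first few of those EDA steps can be sketched in a handful of Pandas calls. The tiny DataFrame below is made-up sample data, not the actual HR dataset:&lt;/p&gt;

```python
import pandas as pd

# Illustrative stand-in for the employee records
employees = pd.DataFrame({
    "EmpID": [1, 2, 3, 4],
    "Salary": [52000.0, 61000.0, None, 58000.0],
    "Department": ["IT", "Sales", "IT", "HR"],
    "EmploymentStatus": ["Active", "Terminated", "Active", "Active"],
})

print(employees.shape)                      # (rows, columns)
print(employees.dtypes)                     # numeric vs. categorical fields
print(employees.isnull().sum())             # missing values per column
print(employees.describe())                 # distributions of numeric columns
print(employees["EmploymentStatus"].value_counts())  # active vs. terminated
```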




&lt;h2&gt;
  
  
  Part B: Business Analysis
&lt;/h2&gt;

&lt;p&gt;Next, I tackled questions that HR teams care about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Average salary by department using &lt;code&gt;groupby()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Employment status breakdown with a pie chart&lt;/li&gt;
&lt;li&gt;Gender pay comparison using Seaborn’s &lt;code&gt;boxplot()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Top recruitment sources via &lt;code&gt;.value_counts()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Diversity Job Fair attendance calculated from a Boolean column&lt;/li&gt;
&lt;li&gt;Engagement scores by department with a barplot&lt;/li&gt;
&lt;li&gt;Race-based salary averages using &lt;code&gt;groupby()&lt;/code&gt; and &lt;code&gt;.mean()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Projects vs salary correlation visualized with a scatterplot&lt;/li&gt;
&lt;li&gt;Marital status and salary compared using a barplot&lt;/li&gt;
&lt;li&gt;Manager team sizes identified with &lt;code&gt;groupby().size()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
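&lt;p&gt;The &lt;code&gt;groupby()&lt;/code&gt; pattern behind several of those questions looks like this (toy data, illustrative column names):&lt;/p&gt;

```python
import pandas as pd

hr = pd.DataFrame({
    "Department": ["IT", "IT", "Sales", "Sales", "HR"],
    "Salary": [60000, 70000, 50000, 55000, 48000],
    "ManagerName": ["Ann", "Ann", "Bob", "Bob", "Cara"],
})

# Average salary by department
avg_salary = hr.groupby("Department")["Salary"].mean()

# Team size per manager
team_sizes = hr.groupby("ManagerName").size()
```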




&lt;h2&gt;
  
  
  Part C: Data Visualization
&lt;/h2&gt;

&lt;p&gt;To make the data speak visually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Salary histogram to show distribution&lt;/li&gt;
&lt;li&gt;Department headcount with a countplot&lt;/li&gt;
&lt;li&gt;Satisfaction scores by department using a barplot&lt;/li&gt;
&lt;li&gt;Termination trends over time with datetime plots&lt;/li&gt;
&lt;li&gt;Gender-based salary boxplot to highlight disparities&lt;/li&gt;
&lt;li&gt;Performance vs salary stripplot to spot trends&lt;/li&gt;
&lt;li&gt;Correlation heatmap to reveal relationships between variables&lt;/li&gt;
&lt;li&gt;Engagement vs satisfaction scatterplot to explore alignment&lt;/li&gt;
&lt;li&gt;Stacked bar chart of employment status across departments&lt;/li&gt;
&lt;li&gt;Absenteeism distribution with a histogram&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part D: PCA (Dimensionality Reduction)
&lt;/h2&gt;

&lt;p&gt;Finally, I explored Principal Component Analysis (PCA) to simplify the dataset:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standardized features using &lt;code&gt;StandardScaler()&lt;/code&gt; to prep for PCA&lt;/li&gt;
&lt;li&gt;Applied PCA and interpreted the first two components&lt;/li&gt;
&lt;li&gt;Plotted explained variance to understand dimensional importance&lt;/li&gt;
&lt;li&gt;Visualized PCA-reduced data colored by department&lt;/li&gt;
&lt;li&gt;Identified top contributing variables to PC1 and PC2&lt;/li&gt;
&lt;li&gt;Condensed engagement, satisfaction, and absences into one dimension&lt;/li&gt;
&lt;li&gt;Grouped employees by performance in PCA space&lt;/li&gt;
&lt;li&gt;Compared clustering before and after PCA using KMeans&lt;/li&gt;
&lt;li&gt;Created a PCA biplot to show feature loadings&lt;/li&gt;
&lt;li&gt;Discussed PCA use cases in HR—like simplifying survey data or improving clustering&lt;/li&gt;
&lt;/ul&gt;
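&lt;p&gt;The scale-then-reduce core of that workflow fits in a few lines. The synthetic data below just mimics the structure (correlated engagement and satisfaction, independent absences):&lt;/p&gt;

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Engagement and satisfaction move together; absences are independent
engagement = rng.normal(3.5, 0.5, 200)
satisfaction = engagement + rng.normal(0.0, 0.2, 200)
absences = rng.poisson(5, 200).astype(float)
X = np.column_stack([engagement, satisfaction, absences])

# Standardize first: PCA is sensitive to feature scale
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_
```

&lt;p&gt;Because two of the three inputs are correlated, the first component absorbs most of the variance, which is exactly why PCA condenses survey-style features so well.&lt;/p&gt;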




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This journey taught me how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Clean and explore data with Pandas
&lt;/li&gt;
&lt;li&gt;Visualize insights with Seaborn and Matplotlib
&lt;/li&gt;
&lt;li&gt;Answer strategic HR questions with analytics
&lt;/li&gt;
&lt;li&gt;Simplify complexity using PCA
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;HR analytics isn’t just about dashboards—it’s about understanding people through data. Whether you're optimizing recruitment, improving engagement, or analyzing performance, Python gives you the tools to make smarter decisions.&lt;/p&gt;

&lt;p&gt;Thanks for reading! If you’ve worked with HR data or PCA, I’d love to hear your experiences. Drop a comment or share your favorite Python trick for workforce analytics.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Supervised Learning and the Power of Classification.</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Fri, 22 Aug 2025 00:37:03 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/supervised-learning-and-the-power-of-classification-3lli</link>
      <guid>https://dev.to/kamaumbuguadev/supervised-learning-and-the-power-of-classification-3lli</guid>
      <description>&lt;p&gt;In the ever-evolving world of machine learning, supervised learning stands out as one of the most intuitive and widely used approaches. At its core, supervised learning is about teaching machines to learn from labeled data—just like a student learns from examples given by a teacher. The goal is to build models that can make predictions or decisions based on new, unseen data.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Supervised Learning?
&lt;/h3&gt;

&lt;p&gt;Supervised learning involves training a model on a dataset that includes both input features and known output labels. The model learns the relationship between the inputs and outputs during training, and then applies that knowledge to predict outcomes for new data. It’s called “supervised” because the learning process is guided by the correct answers—like having an answer key during practice.&lt;/p&gt;

&lt;p&gt;There are two main types of supervised learning:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regression&lt;/strong&gt;: Predicting continuous values (e.g., house prices).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification&lt;/strong&gt;: Predicting discrete categories (e.g., spam vs. not spam).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article focuses on classification, which is arguably the most practical and exciting branch of supervised learning.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Classification Works
&lt;/h3&gt;

&lt;p&gt;Classification is about sorting data into categories. For example, given a set of features about a student’s interaction with an AI tutor, can we predict whether they’ll use the system again? That’s a binary classification problem—yes or no.&lt;/p&gt;

&lt;p&gt;The process typically involves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Data Preparation&lt;/strong&gt;: Cleaning, encoding categorical variables, and scaling numerical features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Training&lt;/strong&gt;: Feeding the labeled data into a classification algorithm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation&lt;/strong&gt;: Measuring performance using metrics like accuracy, precision, recall, and F1-score.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prediction&lt;/strong&gt;: Applying the trained model to new data.&lt;/li&gt;
&lt;/ol&gt;
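&lt;p&gt;Those four steps map onto a short scikit-learn pipeline. The synthetic dataset here is a stand-in for any labeled yes/no problem:&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for a labeled binary dataset
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# 1. Data preparation: split, with scaling handled inside the pipeline
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())

# 2. Model training
model.fit(X_train, y_train)

# 3. Evaluation
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
f1 = f1_score(y_test, predictions)

# 4. Prediction on new data
new_prediction = model.predict(X_test[:1])
```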

&lt;h3&gt;
  
  
  Models Used for Classification
&lt;/h3&gt;

&lt;p&gt;There’s no one-size-fits-all model. Each has its strengths depending on the data and the problem:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Logistic Regression&lt;/strong&gt;: Simple, interpretable, and surprisingly powerful for linearly separable data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decision Trees&lt;/strong&gt;: Easy to visualize and understand, but prone to overfitting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random Forests&lt;/strong&gt;: An ensemble of decision trees that improves accuracy and reduces overfitting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naive Bayes&lt;/strong&gt;: Fast and effective, especially for text classification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;K-Nearest Neighbors (KNN)&lt;/strong&gt;: Classifies based on similarity to nearby data points.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient Boosting&lt;/strong&gt;: Builds models sequentially to correct previous errors—great for complex patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;XGBoost&lt;/strong&gt;: A high-performance version of gradient boosting, often winning machine learning competitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  My Personal Views and Insights
&lt;/h3&gt;

&lt;p&gt;What fascinates me most about classification is its versatility. Whether you're predicting customer churn, diagnosing diseases, or filtering spam, classification models are everywhere. I’ve found that the real magic lies not just in choosing the right algorithm, but in understanding the data deeply. Feature engineering—creating meaningful inputs—is often more impactful than tweaking hyperparameters.&lt;/p&gt;

&lt;p&gt;I also appreciate how classification forces you to think critically about fairness and bias. A model that predicts loan approvals or job suitability must be scrutinized to ensure it doesn’t perpetuate discrimination. That ethical dimension makes classification not just technical, but profoundly human.&lt;/p&gt;

&lt;h3&gt;
  
  
  Challenges I’ve Faced
&lt;/h3&gt;

&lt;p&gt;Working with classification hasn’t always been smooth sailing. Some of the hurdles I’ve encountered include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Imbalanced Data&lt;/strong&gt;: When one class dominates, models tend to ignore the minority class. Techniques like SMOTE or adjusting class weights help, but it’s tricky.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overfitting&lt;/strong&gt;: Especially with decision trees, models can memorize the training data instead of generalizing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature Selection&lt;/strong&gt;: Including irrelevant features can confuse the model, while excluding important ones can cripple it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interpretability vs. Accuracy&lt;/strong&gt;: Complex models like XGBoost offer high accuracy but are harder to explain, which can be a problem in sensitive domains.&lt;/li&gt;
&lt;/ul&gt;
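&lt;p&gt;The class-weight adjustment mentioned above is a one-argument change in scikit-learn. A minimal sketch on a synthetic 95/5 split:&lt;/p&gt;

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# 95/5 imbalance: class 1 is the rare class we actually care about
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Plain model vs. one that reweights classes inversely to their frequency
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_tr, y_tr)

recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
```

&lt;p&gt;The weighted model trades some precision for minority-class recall, which is usually the right trade when the rare class is the costly one to miss.&lt;/p&gt;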

&lt;p&gt;Despite these challenges, classification remains one of the most rewarding areas of machine learning. It’s where theory meets real-world impact, and every dataset tells a story waiting to be decoded.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Navigating the Trade-Off Between Type I and Type II Errors: A Medical Perspective</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Thu, 21 Aug 2025 19:03:42 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/navigating-the-trade-off-between-type-i-and-type-ii-errors-a-medical-perspective-2dcp</link>
      <guid>https://dev.to/kamaumbuguadev/navigating-the-trade-off-between-type-i-and-type-ii-errors-a-medical-perspective-2dcp</guid>
      <description>&lt;p&gt;In the world of data science and machine learning, classification models are powerful tools for decision-making. However, every model comes with the risk of making mistakes—specifically, Type I and Type II errors. Understanding where to trade off between these errors is crucial, especially in high-stakes fields like medicine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Type I and Type II Errors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Type I Error (False Positive):&lt;/strong&gt; The model incorrectly predicts a positive result when the truth is negative. In medical terms, this could mean diagnosing a healthy patient as sick.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type II Error (False Negative):&lt;/strong&gt; The model incorrectly predicts a negative result when the truth is positive. In medicine, this means failing to diagnose a sick patient.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Medical Scenario: Cancer Screening
&lt;/h2&gt;

&lt;p&gt;Imagine a classification model designed to detect cancer from patient data. The stakes are high—both errors have serious consequences, but their impacts differ.&lt;/p&gt;

&lt;h3&gt;
  
  
  Type I Error in Cancer Screening
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What happens?&lt;/strong&gt; A healthy patient is told they might have cancer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consequences:&lt;/strong&gt; Emotional distress, unnecessary further testing (which may be invasive or expensive), and potential side effects from unwarranted treatments.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Type II Error in Cancer Screening
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What happens?&lt;/strong&gt; A patient with cancer is told they are healthy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consequences:&lt;/strong&gt; Missed early treatment opportunities, disease progression, and potentially fatal outcomes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Where to Trade Off: The Decision
&lt;/h2&gt;

&lt;p&gt;The trade-off between Type I and Type II errors is often visualized using the &lt;strong&gt;confusion matrix&lt;/strong&gt; and controlled by adjusting the model’s &lt;strong&gt;decision threshold&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lowering the threshold&lt;/strong&gt; increases sensitivity (recall), reducing Type II errors but increasing Type I errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raising the threshold&lt;/strong&gt; increases specificity, reducing Type I errors but increasing Type II errors.&lt;/li&gt;
&lt;/ul&gt;
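&lt;p&gt;The trade-off can be seen directly by counting errors at two thresholds on the same model. A sketch on synthetic data (the 0.2 and 0.8 cutoffs are illustrative):&lt;/p&gt;

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # P(positive) per patient

def errors_at(threshold):
    # Classify positive whenever the probability reaches the threshold
    predicted = np.greater_equal(proba, threshold).astype(int)
    false_pos = int(np.sum(np.logical_and(predicted == 1, y_te == 0)))  # Type I
    false_neg = int(np.sum(np.logical_and(predicted == 0, y_te == 1)))  # Type II
    return false_pos, false_neg

fp_low, fn_low = errors_at(0.2)    # sensitive screening setting
fp_high, fn_high = errors_at(0.8)  # conservative setting
```

&lt;p&gt;Lowering the threshold can only add positive predictions, so Type II errors fall while Type I errors rise, and vice versa.&lt;/p&gt;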

&lt;h3&gt;
  
  
  In Medical Practice
&lt;/h3&gt;

&lt;p&gt;In cancer screening, &lt;strong&gt;minimizing Type II errors is usually prioritized&lt;/strong&gt;. Missing a cancer diagnosis can be life-threatening, so the model is tuned to catch as many true cases as possible—even if it means more false alarms (Type I errors). This is why many screening tests are designed to be highly sensitive, accepting a higher rate of false positives to ensure that no true cases are missed.&lt;/p&gt;

&lt;p&gt;However, the balance isn’t always the same. For diseases where treatment is risky or expensive, or where false positives cause significant harm, the threshold may be adjusted to reduce Type I errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The trade-off between Type I and Type II errors is context-dependent. In medical scenarios like cancer screening, the cost of missing a diagnosis (Type II error) often outweighs the cost of a false alarm (Type I error). As data scientists and practitioners, it’s essential to understand the domain and collaborate with experts to set thresholds that best serve patient outcomes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://en.wikipedia.org/wiki/Confusion_matrix" rel="noopener noreferrer"&gt;Confusion Matrix Explained&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2636062/" rel="noopener noreferrer"&gt;Sensitivity and Specificity in Medical Testing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;If you found this article helpful, follow me on &lt;a href="https://dev.to/"&gt;Dev.to&lt;/a&gt; for more insights on data science in healthcare!&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Predicting House Prices with Python: Data Cleaning, Modeling, and Feature Importance</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Thu, 21 Aug 2025 16:23:09 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/predicting-house-prices-with-python-data-cleaning-modeling-and-feature-importance-2o4l</link>
      <guid>https://dev.to/kamaumbuguadev/predicting-house-prices-with-python-data-cleaning-modeling-and-feature-importance-2o4l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this project, I tackled a classic machine learning problem: predicting house prices based on various property features. The journey involved real-world data cleaning, feature engineering, model building, and interpreting results using both standard regression metrics and ANOVA-based feature importance. Here’s a summary of my approach, key insights, and the skills I developed along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Data Cleaning
&lt;/h3&gt;

&lt;p&gt;Real-world datasets are rarely perfect. My first step was to ensure the data was clean and consistent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standardized column names&lt;/strong&gt; by removing extra spaces, converting to lowercase, and replacing spaces with underscores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handled missing values&lt;/strong&gt; by filling numeric columns with their mean values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardized categorical values&lt;/strong&gt; (like &lt;code&gt;location&lt;/code&gt;, &lt;code&gt;furnishing&lt;/code&gt;, and &lt;code&gt;house_condition&lt;/code&gt;) by correcting typos and ensuring consistent capitalization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Converted categorical variables&lt;/strong&gt; to numeric using one-hot encoding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensured all features were numeric&lt;/strong&gt; and dropped or filled any remaining missing values.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Removed duplicate rows&lt;/strong&gt; to avoid bias in modeling.&lt;/li&gt;
&lt;/ul&gt;
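&lt;p&gt;A condensed sketch of those cleaning steps in Pandas (the tiny DataFrame and column names are illustrative, not the project's dataset):&lt;/p&gt;

```python
import pandas as pd

houses = pd.DataFrame({
    " House Condition ": ["new", "OLD", "New", "old"],
    "Price($)": [250000.0, None, 310000.0, 275000.0],
})

# Standardize column names: strip, lowercase, underscores
houses.columns = (houses.columns.str.strip().str.lower()
                  .str.replace(r"[^\w]+", "_", regex=True).str.strip("_"))

# Fill missing numeric values with the column mean
houses["price"] = houses["price"].fillna(houses["price"].mean())

# Fix inconsistent capitalization, then one-hot encode and deduplicate
houses["house_condition"] = houses["house_condition"].str.strip().str.title()
houses = pd.get_dummies(houses, columns=["house_condition"]).drop_duplicates()
```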

&lt;h3&gt;
  
  
  2. Feature Engineering
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Derived new columns where useful (e.g., converting year built to house age).&lt;/li&gt;
&lt;li&gt;Prepared categorical features for modeling by encoding them numerically.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Model Building
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Used:&lt;/strong&gt; Linear Regression from scikit-learn.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training/Test Split:&lt;/strong&gt; 80% of the data was used for training, 20% for testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation Metrics:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Mean Squared Error (MSE)&lt;/li&gt;
&lt;li&gt;Root Mean Squared Error (RMSE)&lt;/li&gt;
&lt;li&gt;Mean Absolute Error (MAE)&lt;/li&gt;
&lt;li&gt;R² Score&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Results:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MSE:&lt;/strong&gt; 7.80e-22 (almost zero)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RMSE:&lt;/strong&gt; 2.79e-11 (almost zero)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MAE:&lt;/strong&gt; 1.74e-11 (almost zero)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;R² Score:&lt;/strong&gt; 1.0 (perfect fit)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Such perfect results are rare in real-world scenarios and may indicate a very simple dataset or potential data leakage. Always double-check your data pipeline!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  4. Feature Importance with ANOVA
&lt;/h3&gt;

&lt;p&gt;To understand which features most influence house prices, I used ANOVA (Analysis of Variance) via &lt;code&gt;f_regression&lt;/code&gt; from scikit-learn. This provided F-values and p-values for each feature, highlighting their statistical significance.&lt;/p&gt;
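&lt;p&gt;The mechanics are compact: &lt;code&gt;f_regression&lt;/code&gt; returns one F-value and one p-value per feature. A self-contained sketch on synthetic data (one informative feature, one pure-noise feature):&lt;/p&gt;

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_regression

rng = np.random.default_rng(1)
size_sqft = rng.uniform(800, 3000, 300)
noise = rng.normal(0.0, 1.0, 300)          # a feature unrelated to price
price = 100.0 * size_sqft + rng.normal(0.0, 20000.0, 300)

X = pd.DataFrame({"size_sqft": size_sqft, "noise": noise})
f_values, p_values = f_regression(X, price)

anova_results = pd.DataFrame({"Feature": X.columns,
                              "F_value": f_values,
                              "p_value": p_values})
```

&lt;p&gt;The informative feature gets a huge F-value and a near-zero p-value; the noise feature does not, which is exactly the separation the real analysis relies on.&lt;/p&gt;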

&lt;h4&gt;
  
  
  Key Insights from ANOVA:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Most Important Predictors:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Converted_datatype_for_price($)&lt;/code&gt;, &lt;code&gt;Size_sqft&lt;/code&gt;, &lt;code&gt;Converted_datatype_for_size_sqft&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;House_condition_New&lt;/code&gt;, &lt;code&gt;House_condition_Old&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Has_pool&lt;/code&gt;, &lt;code&gt;Year_built&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Moderately Important:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Furnishing_Semi-Furnished&lt;/code&gt;, &lt;code&gt;Lot_size&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Not Significant:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Bath_rooms&lt;/code&gt;, &lt;code&gt;Garage_available&lt;/code&gt;, &lt;code&gt;Location_Urban&lt;/code&gt;, and others&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h4&gt;
  
  
  Visualization
&lt;/h4&gt;

&lt;p&gt;I visualized the F-values and p-values using a heatmap to quickly identify the most influential features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;span class="n"&gt;heatmap_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anova_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Feature&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;F_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p_value&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;figure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;sns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;heatmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;heatmap_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;annot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;YlGnBu&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.2e&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ANOVA F-value and p-value Heatmap for Features&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Skills and Experience Gained
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Cleaning:&lt;/strong&gt; Learned to handle missing values, standardize data, and ensure consistency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature Engineering:&lt;/strong&gt; Gained experience in transforming and encoding features for machine learning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Evaluation:&lt;/strong&gt; Used multiple regression metrics to assess model performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Statistical Analysis:&lt;/strong&gt; Applied ANOVA to interpret feature importance and guide model refinement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualization:&lt;/strong&gt; Created clear plots to communicate results and insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Critical Thinking:&lt;/strong&gt; Recognized the importance of checking for data leakage and overfitting.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This project was a comprehensive exercise in the end-to-end machine learning workflow, from raw data to actionable insights. The experience reinforced the importance of data preparation, careful model evaluation, and statistical interpretation in building robust predictive models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thanks for reading!&lt;/strong&gt; If you have questions or want to discuss more about data science and machine learning, feel free to reach out in the comments.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI Meets Cloud: My Experience Passing Oracle’s AI Foundations Associate Exam</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Thu, 21 Aug 2025 16:21:55 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/ai-meets-cloud-my-experience-passing-oracles-ai-foundations-associate-exam-1f3a</link>
      <guid>https://dev.to/kamaumbuguadev/ai-meets-cloud-my-experience-passing-oracles-ai-foundations-associate-exam-1f3a</guid>
      <description>&lt;p&gt;&lt;a href="https://i.postimg.cc/ydgfZpnt/oci-exam-pass-badge.png" rel="noopener noreferrer"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;👋 Hi Dev Community!&lt;/p&gt;

&lt;p&gt;I’m thrilled to share that I recently passed the &lt;strong&gt;Oracle Cloud Infrastructure (OCI) 2025 AI Foundations Associate Exam (1Z0-1122-25)&lt;/strong&gt; with a score of &lt;strong&gt;88%&lt;/strong&gt;, well above the passing threshold of 65%! 🎉&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🆔 &lt;strong&gt;Oracle Testing ID&lt;/strong&gt;: OC6613094
&lt;/li&gt;
&lt;li&gt;📅 &lt;strong&gt;Exam Date&lt;/strong&gt;: August 5, 2025
&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Result&lt;/strong&gt;: Pass
&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Score&lt;/strong&gt;: 88% (Passing Score: 65%)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This certification wasn’t just a badge to collect — it was a deep dive into the growing convergence between &lt;strong&gt;cloud computing, artificial intelligence, and machine learning&lt;/strong&gt;. As a developer and data science enthusiast, the journey equipped me with technical insights and practical knowledge that I’m already applying in real-world scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Certification Covers
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Oracle Cloud Infrastructure AI Foundations Associate exam&lt;/strong&gt; focuses on critical AI and ML concepts within the Oracle Cloud environment. Here are the core areas I mastered during preparation and testing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OCI Generative AI Services&lt;/strong&gt;&lt;br&gt;
Learned how to use Oracle’s Generative AI APIs to build intelligent apps that can understand, summarize, translate, and generate human-like content. These services are key for modern applications that rely on &lt;strong&gt;LLMs (Large Language Models)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Use Case: Automating content creation, chatbots, document analysis, and code generation.&lt;/p&gt;

&lt;p&gt;🔎 &lt;strong&gt;OCI AI Services Overview&lt;/strong&gt;&lt;br&gt;
Explored Oracle’s prebuilt AI services, which include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Language and Speech&lt;/li&gt;
&lt;li&gt;Vision (Image Analysis)&lt;/li&gt;
&lt;li&gt;Anomaly Detection&lt;/li&gt;
&lt;li&gt;Forecasting&lt;/li&gt;
&lt;li&gt;Document Understanding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These allow developers to integrate AI into their applications &lt;strong&gt;without needing to build models from scratch&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OCI ML Services Overview&lt;/strong&gt;&lt;br&gt;
Gained insight into OCI's machine learning lifecycle: from data ingestion and preprocessing to model training, deployment, and monitoring. The platform supports both automated ML (AutoML) and custom model development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oracle Vector Search&lt;/strong&gt;&lt;br&gt;
This was a particularly exciting topic! Vector Search enables semantic search by matching the meaning of queries with the content — not just the keywords. It's critical for applications like recommendation systems, search engines, and AI chat assistants.&lt;/p&gt;
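&lt;p&gt;A minimal sketch of the idea behind vector search, using plain NumPy and made-up 2-D embeddings (real systems use high-dimensional vectors produced by an embedding model, such as one served by OCI Generative AI):&lt;/p&gt;

```python
import numpy as np

# Hypothetical 2-D "embeddings" -- placeholders standing in for vectors
# an embedding model would produce
docs = {
    "refund policy": np.array([0.9, 0.1]),
    "shipping times": np.array([0.1, 0.9]),
}

def cosine_similarity(a, b):
    # Compares direction (meaning), not magnitude
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The query shares no keywords with "refund policy", but its vector is close
query = np.array([0.8, 0.2])  # e.g. "how do I get my money back?"
best = max(docs, key=lambda name: cosine_similarity(docs[name], query))
print(best)  # refund policy
```

&lt;p&gt;This is why vector search retrieves by meaning rather than keyword overlap.&lt;/p&gt;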

&lt;p&gt;&lt;strong&gt;Supervised Learning Fundamentals&lt;/strong&gt;&lt;br&gt;
The foundation of many AI systems. I reviewed the core principles of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regression&lt;/strong&gt; – predicting numerical values (e.g., stock prices, weather)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classification&lt;/strong&gt; – categorizing data (e.g., spam vs. not spam, fraud detection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are essential concepts for any data scientist or machine learning practitioner.&lt;/p&gt;
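&lt;p&gt;A tiny scikit-learn sketch of both tasks on made-up data (the models and numbers here are illustrative only):&lt;/p&gt;

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]

# Regression: predict a continuous value (here, data following y = 2x)
reg = LinearRegression().fit(X, [2.0, 4.0, 6.0, 8.0])
pred_value = round(float(reg.predict([[5]])[0]), 1)
print(pred_value)  # 10.0

# Classification: predict a discrete label (e.g. spam vs. not spam)
clf = LogisticRegression().fit(X, [0, 0, 1, 1])
pred_label = int(clf.predict([[4]])[0])
print(pred_label)  # 1
```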

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Cloud-native AI and ML tools are becoming essential for building scalable, intelligent applications. Earning this certification confirms my capabilities not just in theory, but in applying these technologies using Oracle's enterprise-grade infrastructure.&lt;/p&gt;

&lt;p&gt;Whether you're a developer breaking into AI or a cloud architect expanding your toolkit, understanding how services like OCI AI, ML, and Generative AI work together is a huge advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;This certification is just one step in my learning journey. I'm now focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Experimenting with Oracle’s Generative AI SDKs&lt;/li&gt;
&lt;li&gt;Building real-world applications using &lt;strong&gt;Vector Search&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Exploring advanced ML model deployment in OCI&lt;/li&gt;
&lt;li&gt;Contributing more AI and ML content here on Dev.to&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;The world is moving fast — and those of us in tech need to move with it. This certification has given me both the &lt;strong&gt;confidence and competence&lt;/strong&gt; to build in the AI space, and I encourage anyone interested to explore the Oracle Cloud learning path.&lt;/p&gt;

&lt;p&gt;Have questions about the exam or want to discuss AI/cloud careers? Let’s chat in the comments! &lt;/p&gt;

&lt;p&gt;Let’s connect:&lt;br&gt;
🔗 &lt;a href="http://www.linkedin.com/in/steven-mbugua-kamau" rel="noopener noreferrer"&gt;www.linkedin.com/in/steven-mbugua-kamau&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;#oracle #cloudcomputing #AI #MachineLearning #OCI #GenerativeAI #certification #career #datascience #developers&lt;/p&gt;

</description>
    </item>
    <item>
      <title>From Data to Predictions: My Journey Building a California Housing Price Model.</title>
      <dc:creator>Kamaumbugua-dev</dc:creator>
      <pubDate>Wed, 13 Aug 2025 11:24:22 +0000</pubDate>
      <link>https://dev.to/kamaumbuguadev/from-data-to-predictions-my-journey-building-a-california-housing-price-model-49nd</link>
      <guid>https://dev.to/kamaumbuguadev/from-data-to-predictions-my-journey-building-a-california-housing-price-model-49nd</guid>
      <description>&lt;p&gt;Over the past few weeks, I’ve been diving deep into machine learning by working on a project that predicts California housing prices. This hands-on journey not only strengthened my technical skills but also gave me a clearer understanding of the workflow that turns raw data into actionable insights.&lt;/p&gt;

&lt;p&gt;In this article, I’ll walk you through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What I built&lt;/li&gt;
&lt;li&gt;The skills I gained&lt;/li&gt;
&lt;li&gt;Why these skills matter in the real world&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Project Overview&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
The goal was to build a regression model that could predict median house prices in California using the California Housing dataset.&lt;/p&gt;

&lt;p&gt;Here’s the process I followed:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Loading the dataset&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;housing = datasets.fetch_california_housing()
x = housing.data
y = housing.target
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This dataset contains information such as median income, house age, and average rooms per household.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feature Engineering&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
I expanded the dataset using Polynomial Features to capture more complex relationships between the variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;poly = PolynomialFeatures()
x = poly.fit_transform(x)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This generated 37 additional features: pairwise products and squared values of the original eight, giving the model more information to learn from.&lt;/p&gt;
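&lt;p&gt;The feature count is easy to verify. A degree-2 expansion of the dataset's 8 features yields a bias column, the 8 originals, 8 squares, and 28 pairwise products — shown here on a placeholder sample rather than the full dataset:&lt;/p&gt;

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with the dataset's 8 features (values are placeholders)
X = np.arange(8, dtype=float).reshape(1, -1)

poly = PolynomialFeatures()  # degree=2, include_bias=True by default
expanded = poly.fit_transform(X)

# 1 bias + 8 linear + 8 squared + 28 pairwise products = 45 columns,
# i.e. 37 new features on top of the original 8
print(expanded.shape[1])  # 45
```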

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Train-Test Split&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
To ensure the model could generalize, I split the data into training (80%) and testing (20%) sets.&lt;/p&gt;
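&lt;p&gt;A minimal sketch of that split using scikit-learn's &lt;code&gt;train_test_split&lt;/code&gt; on synthetic data (&lt;code&gt;random_state&lt;/code&gt; just makes the split reproducible):&lt;/p&gt;

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 50 samples, 2 features
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Hold out 20% of the samples for testing
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(x_train), len(x_test))  # 40 10
```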

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model Optimization&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
I experimented with different learning rates and iteration counts using the HistGradientBoostingRegressor, a powerful gradient boosting algorithm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;model = HistGradientBoostingRegressor(
    max_iter=350,
    learning_rate=0.05
)
model.fit(x_train, y_train)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Evaluation&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
I measured model performance using the R² score:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;r2 = r2_score(y_test, y_pred)
print(r2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This score reflects how well the model explains the variation in housing prices.&lt;/p&gt;
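&lt;p&gt;Concretely, R² = 1 - SS_res / SS_tot: the fraction of the target's variance that the model explains. A quick check of that formula against scikit-learn, on made-up numbers:&lt;/p&gt;

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 2.5, 4.0, 5.5])
y_hat = np.array([2.8, 2.7, 4.2, 5.3])

# R^2 = 1 - (sum of squared residuals) / (total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual = 1 - ss_res / ss_tot

matches = bool(np.isclose(manual, r2_score(y_true, y_hat)))
print(matches)  # True
```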

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model Deployment&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
I saved the trained model using joblib so it can be reused in future applications without retraining:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;joblib.dump(model, "housing_price_model.joblib")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
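&lt;p&gt;Loading the model back is symmetric. A sketch using a small stand-in estimator (the article's model is a &lt;code&gt;HistGradientBoostingRegressor&lt;/code&gt;, but any fitted estimator persists the same way):&lt;/p&gt;

```python
import joblib
from sklearn.linear_model import LinearRegression

# Train and persist a stand-in model on toy data following y = 2x
model = LinearRegression().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
joblib.dump(model, "housing_price_model.joblib")

# Later (or in another process): load and predict without retraining
loaded = joblib.load("housing_price_model.joblib")
prediction = round(float(loaded.predict([[4]])[0]), 1)
print(prediction)  # 8.0
```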



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Key Skills I Gained&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Data Preprocessing &amp;amp; Feature Engineering&lt;/p&gt;

&lt;p&gt;Learned how to transform raw datasets into forms that machine learning models can better understand.&lt;/p&gt;

&lt;p&gt;Understood the importance of feature interactions through polynomial feature expansion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model Selection &amp;amp; Optimization&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Experimented with different learning rates, iteration counts, and model architectures.&lt;/p&gt;

&lt;p&gt;Gained experience in tuning hyperparameters to balance accuracy and computational efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model Evaluation&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Applied the R² score to assess model performance.&lt;/p&gt;

&lt;p&gt;Learned how to interpret evaluation metrics in a real-world context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model Persistence&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Used joblib to save and load trained models — a critical skill for deploying ML solutions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why These Skills Matter&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
These skills aren’t just academic exercises — they’re exactly what data scientists and machine learning engineers use in real-world projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Feature engineering&lt;/em&gt;&lt;/strong&gt; is the backbone of improving model performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Hyperparameter tuning&lt;/em&gt;&lt;/strong&gt; can make the difference between an okay model and a production-ready one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model evaluation&lt;/em&gt;&lt;/strong&gt; ensures you’re building something that works beyond your own dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Model persistence&lt;/em&gt;&lt;/strong&gt; bridges the gap between experimentation and real-world application.&lt;/p&gt;

&lt;p&gt;With these capabilities, I can confidently approach real-world datasets, build predictive models, and prepare them for production environments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Next Steps&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
This project has been a solid step forward in my machine learning journey. My plan is to:&lt;/p&gt;

&lt;p&gt;Experiment with ensemble models to further improve performance.&lt;/p&gt;

&lt;p&gt;Deploy the trained model via an API so it can be used in web applications.&lt;/p&gt;

&lt;p&gt;Apply similar workflows to other datasets, such as sales forecasting and recommendation systems.&lt;/p&gt;

&lt;p&gt;If you’re a developer or employer looking for someone who can turn data into decisions, this project is a small window into how I approach machine learning challenges in a way that is methodical, curious, and results-driven.&lt;/p&gt;

&lt;p&gt;I’d love to hear your thoughts: how would you have improved this model?&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
