<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: maryu0</title>
    <description>The latest articles on DEV Community by maryu0 (@maryu0).</description>
    <link>https://dev.to/maryu0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1902798%2F745433e7-d6aa-4f0d-973d-fb5bcacf2b4b.png</url>
      <title>DEV Community: maryu0</title>
      <link>https://dev.to/maryu0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/maryu0"/>
    <language>en</language>
    <item>
      <title>I built an AI debugging assistant with Llama 3.3 — here's what actually worked</title>
      <dc:creator>maryu0</dc:creator>
      <pubDate>Fri, 15 May 2026 19:13:56 +0000</pubDate>
      <link>https://dev.to/maryu0/i-built-an-ai-debugging-assistant-with-llama-33-heres-what-actually-worked-ind</link>
      <guid>https://dev.to/maryu0/i-built-an-ai-debugging-assistant-with-llama-33-heres-what-actually-worked-ind</guid>
      <description>&lt;p&gt;Every developer has been there. It's 2am, your CI pipeline is red, and you're staring at a wall of error logs trying to figure out which of the 47 things that could be wrong is actually wrong.&lt;/p&gt;

&lt;p&gt;That pain is what made me build &lt;strong&gt;FailSense&lt;/strong&gt; — an AI debugging assistant that ingests error logs and returns ranked, actionable fixes using Llama 3.3. Here's an honest breakdown of what I built, the mistakes I made, and what I'd do differently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;~40% reduction in debugging time · ~99% uptime · two services, one pipeline&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with debugging + LLMs
&lt;/h2&gt;

&lt;p&gt;The naive approach is obvious: dump the error into ChatGPT and hope for the best. It kind of works. But it breaks down quickly when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your error spans multiple files and stack frames&lt;/li&gt;
&lt;li&gt;The root cause is buried 3 levels deep in a dependency&lt;/li&gt;
&lt;li&gt;You need ranked fixes, not a monologue&lt;/li&gt;
&lt;li&gt;You want this in your own pipeline, not a chat UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I decided to build something purpose-built for error log analysis — with structured output, confidence-ranked fixes, and a real deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture: keep it boring
&lt;/h2&gt;

&lt;p&gt;The stack is deliberately simple. Two services. One job each.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Next.js (Frontend) → FastAPI (Backend) → Llama 3.3 via Groq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Next.js frontend handles log input and renders ranked fixes. The FastAPI backend owns all the prompt logic, output parsing, and error handling. Llama 3.3 runs on Groq for low-latency inference — this matters more than you'd think when users are already frustrated.&lt;/p&gt;
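
&lt;p&gt;To make the flow concrete, here's a minimal sketch of what that backend wiring can look like. This is a sketch, not the actual FailSense source: it assumes the &lt;code&gt;groq&lt;/code&gt; Python SDK, the &lt;code&gt;system_prompt&lt;/code&gt; and &lt;code&gt;parse_fixes&lt;/code&gt; pieces shown later in this post, and an illustrative &lt;code&gt;/analyze&lt;/code&gt; route name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from fastapi import FastAPI
from pydantic import BaseModel
from groq import Groq

app = FastAPI()
client = Groq()  # reads GROQ_API_KEY from the environment

class LogInput(BaseModel):
    log: str

@app.post("/analyze")
def analyze(body: LogInput) -&amp;gt; list:
    # system_prompt and parse_fixes are defined later in this post
    resp = client.chat.completions.create(
        model="llama-3.3-70b-versatile",  # Groq's Llama 3.3 model id
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": body.log},
        ],
    )
    return parse_fixes(resp.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;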

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Lesson learned:&lt;/strong&gt; Don't add a third service just because you can. Every hop between services is a new failure point, a new auth layer, and a new thing to monitor at 2am.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The prompt that actually works
&lt;/h2&gt;

&lt;p&gt;This took the most iteration. The first version just said "here's an error, fix it." The output was verbose, unstructured, and hard to parse programmatically. Here's the version that works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
You are a senior software engineer debugging production errors.
Given an error log, return ONLY a JSON array of fixes, ranked by likelihood.
Each fix must have:
  - rank (int): 1 = most likely cause
  - cause (str): one sentence root cause
  - fix (str): exact steps to resolve
  - confidence (float): 0.0 to 1.0

Return nothing else. No preamble. No markdown. Raw JSON only.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things made this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explicit output format&lt;/strong&gt; — telling the model to return raw JSON (not markdown-wrapped JSON) saved me a ton of parsing headaches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role framing&lt;/strong&gt; — "senior software engineer" shifts the model toward precise, opinionated output over safe hedging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ranked by likelihood&lt;/strong&gt; — forcing a ranking means the most actionable fix is always first, which is what a tired developer actually wants&lt;/li&gt;
&lt;/ol&gt;
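
&lt;p&gt;For reference, the raw model output for a typical log ends up shaped like this (the values here are illustrative, not real output):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[
  {"rank": 1, "cause": "DATABASE_URL is unset in the CI environment", "fix": "Add DATABASE_URL to the pipeline secrets and re-run", "confidence": 0.85},
  {"rank": 2, "cause": "The Postgres service container failed to start", "fix": "Check the service health logs and pin the image version", "confidence": 0.4}
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;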

&lt;h2&gt;
  
  
  Parsing LLM output without going insane
&lt;/h2&gt;

&lt;p&gt;LLMs are not deterministic JSON machines. Sometimes Llama 3.3 returns perfect JSON. Sometimes it adds a sentence before it. Sometimes the confidence is a string instead of a float. Here's the defensive parsing layer I built:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_fixes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Strip markdown fences if present
&lt;/span&gt;    &lt;span class="n"&gt;clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```(?:json)?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fixes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Try to extract the JSON array from within a larger string
&lt;/span&gt;        &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\[.*\]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fixes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Normalize confidence to float
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fixes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
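
&lt;p&gt;A quick usage sketch of the worst case this handles: a preamble sentence, markdown fences, and a stringly-typed confidence, all in one response (the input below is made up):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;raw = """Here are the fixes:
```json
[{"rank": 1, "cause": "Lockfile is out of sync", "fix": "Run npm ci", "confidence": "0.9"}]
```"""

fixes = parse_fixes(raw)
# The preamble defeats json.loads, so the regex fallback extracts the array,
# and the string confidence is coerced to a float.
assert fixes[0]["confidence"] == 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;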



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hot take:&lt;/strong&gt; If you're not writing a fallback parser for LLM output, you're writing a bug. Models drift, prompts drift, and what works today breaks next month.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Deployment: boring is good
&lt;/h2&gt;

&lt;p&gt;Next.js on Vercel. FastAPI on Railway. Both wired up with GitHub Actions for CI/CD. Every push to main triggers a deploy. The whole thing costs under $5/month to run.&lt;/p&gt;

&lt;p&gt;The ~99% uptime wasn't magic — it was just not doing anything clever. No custom load balancers, no exotic infra. Just two managed services that restart themselves when they crash.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add evals from day one.&lt;/strong&gt; I had no systematic way to know if a prompt change made things better or worse. I was eyeballing it. Don't eyeball it (see the sketch after this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stream the response.&lt;/strong&gt; Waiting 3-4 seconds for the full JSON response feels slow. Streaming partial results — even just a loading state with intermediate tokens — makes it feel snappy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log everything.&lt;/strong&gt; What errors are users pasting in? What fixes are they ignoring? This data is gold for improving the prompt, and I threw it away by not logging it.&lt;/li&gt;
&lt;/ul&gt;
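
&lt;p&gt;Here's the kind of minimal eval harness I mean: a handful of known logs paired with a keyword the top-ranked cause should contain. &lt;code&gt;get_fixes&lt;/code&gt; is a hypothetical stand-in for the full prompt-and-parse pipeline, and the cases are made up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical: get_fixes(log) wraps the full prompt + parse pipeline.
EVAL_CASES = [
    ("ModuleNotFoundError: No module named 'requests'", "module"),
    ("ECONNREFUSED 127.0.0.1:5432", "connection"),
]

def run_evals(get_fixes) -&amp;gt; float:
    passed = 0
    for log, keyword in EVAL_CASES:
        fixes = get_fixes(log)
        # Pass if the top-ranked cause mentions the expected keyword
        if fixes and keyword in fixes[0]["cause"].lower():
            passed += 1
    return passed / len(EVAL_CASES)

# Re-run after every prompt change; a score drop means the change hurt.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;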

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;Building production AI tools is less about the model and more about the scaffolding around it. The prompt, the output parser, the fallback handling, the latency — that's where the real engineering happens.&lt;/p&gt;

&lt;p&gt;FailSense isn't magic. It's a well-prompted LLM with a defensive parser and a boring deployment. That's enough to cut debugging time by ~40% and actually ship something people use.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Check out the full source on &lt;a href="https://github.com/maryu0/FailSense.git" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · Built with Next.js, FastAPI, Groq, and Llama 3.3&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
