<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ahmad Humayun</title>
    <description>The latest articles on DEV Community by Ahmad Humayun (@ahmadhumayun).</description>
    <link>https://dev.to/ahmadhumayun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3971399%2F9da70778-1a1f-44bd-bf65-48eda361c7ff.jpg</url>
      <title>DEV Community: Ahmad Humayun</title>
      <link>https://dev.to/ahmadhumayun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ahmadhumayun"/>
    <language>en</language>
    <item>
      <title>Why I Don’t Let the LLM Decide Issue State</title>
      <dc:creator>Ahmad Humayun</dc:creator>
      <pubDate>Sat, 06 Jun 2026 17:23:56 +0000</pubDate>
      <link>https://dev.to/ahmadhumayun/why-i-dont-let-the-llm-decide-issue-state-dip</link>
      <guid>https://dev.to/ahmadhumayun/why-i-dont-let-the-llm-decide-issue-state-dip</guid>
      <description>&lt;p&gt;When you build an AI system for marketing performance monitoring, one tempting idea is to let the LLM decide everything.&lt;/p&gt;

&lt;p&gt;Campaign pacing is off.&lt;/p&gt;

&lt;p&gt;Creative frequency is too high.&lt;/p&gt;

&lt;p&gt;A product category is spending inefficiently.&lt;/p&gt;

&lt;p&gt;So the natural thought is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let’s send the current issue and previous issue to the LLM and ask if this is new, recurring, worsening, or improving.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Last week this advertiser had issue: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Previous metric: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prev_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Current metric: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Is this issue new, recurring, worsening, or improving?
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks fine in a demo.&lt;/p&gt;

&lt;p&gt;But I would not use this in production.&lt;/p&gt;

&lt;p&gt;The issue state is not a language problem. It is a data/history problem.&lt;/p&gt;

&lt;p&gt;If the same issue existed last week and the metric is now 25% worse, the issue is worsening. If it disappeared, it is resolved. If it appeared for the first time, it is new.&lt;/p&gt;

&lt;p&gt;There is no reason to spend tokens or accept non-determinism for that.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mistake: using the LLM for decisions that already have an algorithmic answer
&lt;/h2&gt;

&lt;p&gt;LLMs are useful when the output needs language, judgment, or explanation.&lt;/p&gt;

&lt;p&gt;But deciding whether an issue is &lt;code&gt;NEW&lt;/code&gt;, &lt;code&gt;RECURRING&lt;/code&gt;, or &lt;code&gt;WORSENING&lt;/code&gt; should not depend on prompt wording.&lt;/p&gt;

&lt;p&gt;In a monitoring system, this becomes a real problem.&lt;/p&gt;

&lt;p&gt;If the same campaign issue is classified differently on different runs, the whole weekly comparison becomes noisy. You cannot reliably tell whether performance is actually getting worse or the LLM just described it differently this time.&lt;/p&gt;

&lt;p&gt;That is why I keep the issue lifecycle deterministic.&lt;/p&gt;

&lt;p&gt;The LLM can write the explanation later.&lt;/p&gt;

&lt;p&gt;It should not decide the state.&lt;/p&gt;

&lt;h2&gt;
  
  
  The state machine I use
&lt;/h2&gt;

&lt;p&gt;For each detected issue, I track it across analysis periods using a small state machine.&lt;/p&gt;

&lt;p&gt;The states are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;NEW
RECURRING
WORSENING
IMPROVING
RESOLVED
STALE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
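
&lt;p&gt;The code in this post refers to these states through an &lt;code&gt;IssueState&lt;/code&gt; enum. Below is a minimal definition. The lowercase string values are an assumption, chosen so that &lt;code&gt;state.value&lt;/code&gt; stays readable when it is stored or passed into a prompt.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum


class IssueState(Enum):
    # Lifecycle states from the list above; the string values are illustrative.
    NEW = "new"
    RECURRING = "recurring"
    WORSENING = "worsening"
    IMPROVING = "improving"
    RESOLVED = "resolved"
    STALE = "stale"


# Loading a stored string back into the enum:
state = IssueState("stale")  # IssueState.STALE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
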



&lt;p&gt;The meaning is simple.&lt;/p&gt;

&lt;h3&gt;
  
  
  NEW
&lt;/h3&gt;

&lt;p&gt;The issue is detected for the first time for this advertiser or entity.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Creative fatigue was not present last week, but it is detected this week.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  RECURRING
&lt;/h3&gt;

&lt;p&gt;The same issue is still present, but the metric did not move enough to call it better or worse.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Frequency was high last week and is still high this week, but it changed by only 3%.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  WORSENING
&lt;/h3&gt;

&lt;p&gt;The issue is still present and the underlying metric degraded beyond a threshold.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;CPA was already high, and now it is 28% higher than the previous period.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  IMPROVING
&lt;/h3&gt;

&lt;p&gt;The issue is still present, but the metric moved in the right direction.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Frequency is still above the target range, but it dropped by 22%.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  RESOLVED
&lt;/h3&gt;

&lt;p&gt;The issue was present before, but it is no longer detected.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Budget pacing was behind last week, but spend is now back within the expected range.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  STALE
&lt;/h3&gt;

&lt;p&gt;This one is useful in real systems.&lt;/p&gt;

&lt;p&gt;Sometimes an issue does not get resolved cleanly. Data is missing, an account stops syncing, or a campaign disappears from the source data.&lt;/p&gt;

&lt;p&gt;In those cases, I do not always want to mark the issue as resolved immediately.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;STALE&lt;/code&gt; means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This issue was previously active, but we have not seen enough recent evidence to confidently call it resolved.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This avoids false “good news” when the real problem is a data gap.&lt;/p&gt;
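
&lt;p&gt;One way to make the RESOLVED-versus-STALE decision explicit is sketched below. It assumes the pipeline can tell whether the entity's source data synced normally for the current period. That is the hypothetical &lt;code&gt;data_is_fresh&lt;/code&gt; flag, and the function name is also illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum


class IssueState(Enum):
    NEW = "new"
    RECURRING = "recurring"
    WORSENING = "worsening"
    IMPROVING = "improving"
    RESOLVED = "resolved"
    STALE = "stale"


def state_when_not_detected(previous_state: IssueState, data_is_fresh: bool) -&amp;gt; IssueState:
    """Decide the state of a tracked issue that was NOT detected this period.

    data_is_fresh is a hypothetical flag: True when the entity's source
    data arrived normally for the current period.
    """
    if previous_state is IssueState.RESOLVED:
        # Already closed; nothing to update.
        return IssueState.RESOLVED
    if data_is_fresh:
        # Complete data and no issue: this is real good news.
        return IssueState.RESOLVED
    # Missing data: the issue's absence is not evidence that it was fixed.
    return IssueState.STALE
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this rule, a STALE issue becomes RESOLVED only after a period with fresh data confirms that it is gone.&lt;/p&gt;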

&lt;h2&gt;
  
  
  The core logic
&lt;/h2&gt;

&lt;p&gt;The basic version is not complicated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_issue_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;current_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;previous_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IssueRecord&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;threshold_pct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;previous_record&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEW&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;previous_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RESOLVED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEW&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RECURRING&lt;/span&gt;

    &lt;span class="n"&gt;previous_value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;previous_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_value&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;previous_value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;previous_value&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold_pct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WORSENING&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;threshold_pct&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IMPROVING&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;IssueState&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RECURRING&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives me a predictable result every time.&lt;/p&gt;

&lt;p&gt;Same input.&lt;/p&gt;

&lt;p&gt;Same previous state.&lt;/p&gt;

&lt;p&gt;Same threshold.&lt;/p&gt;

&lt;p&gt;Same output.&lt;/p&gt;

&lt;p&gt;That matters a lot more than people realize.&lt;/p&gt;
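
&lt;p&gt;The function above depends on &lt;code&gt;IssueState&lt;/code&gt; and &lt;code&gt;IssueRecord&lt;/code&gt; types that are not shown. Below is a self-contained sketch that fills them in, under some assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;IssueRecord&lt;/code&gt; is a hypothetical dataclass that holds the previous period's state and value.&lt;/li&gt;
&lt;li&gt;A missing or zero baseline falls back to RECURRING.&lt;/li&gt;
&lt;li&gt;A hypothetical &lt;code&gt;higher_is_worse&lt;/code&gt; flag flips the comparison for metrics such as ROAS, where a drop is the bad direction.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class IssueState(Enum):
    NEW = "new"
    RECURRING = "recurring"
    WORSENING = "worsening"
    IMPROVING = "improving"
    RESOLVED = "resolved"
    STALE = "stale"


@dataclass
class IssueRecord:
    # Hypothetical shape of what is stored for an issue from the previous period.
    state: IssueState
    value: Optional[Union[float, bool]]


def compute_issue_state(
    current_value: Union[float, bool],
    previous_record: Optional[IssueRecord],
    threshold_pct: float = 0.20,
    higher_is_worse: bool = True,
) -&amp;gt; IssueState:
    # First sighting, or the issue came back after being resolved.
    if previous_record is None or previous_record.state is IssueState.RESOLVED:
        return IssueState.NEW

    # Boolean issues have no magnitude to compare.
    if isinstance(current_value, bool):
        return IssueState.RECURRING

    # A missing or zero baseline has no meaningful relative change.
    previous_value = previous_record.value
    if not previous_value:
        return IssueState.RECURRING

    delta = (current_value - previous_value) / previous_value
    # Flip the sign for metrics where a drop is bad (e.g. ROAS),
    # so that a positive delta always means "worse".
    if not higher_is_worse:
        delta = -delta

    if delta &amp;gt; threshold_pct:
        return IssueState.WORSENING
    if delta &amp;lt; -threshold_pct:
        return IssueState.IMPROVING
    return IssueState.RECURRING


previous = IssueRecord(state=IssueState.RECURRING, value=1.20)
print(compute_issue_state(1.55, previous))                         # IssueState.WORSENING
print(compute_issue_state(1.55, previous, higher_is_worse=False))  # IssueState.IMPROVING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
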

&lt;h2&gt;
  
  
  Boolean issues need different handling
&lt;/h2&gt;

&lt;p&gt;Not every issue has a numeric severity value.&lt;/p&gt;

&lt;p&gt;Some issues are just present or absent.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tracking pixel missing
Creative disapproved
Campaign has no active ads
Required targeting field missing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are boolean issues.&lt;/p&gt;

&lt;p&gt;They cannot really become “20% worse.”&lt;/p&gt;

&lt;p&gt;A creative is either disapproved or it is not.&lt;/p&gt;

&lt;p&gt;So for boolean issues, the lifecycle is simpler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;first detection  -&amp;gt; NEW
detected again   -&amp;gt; RECURRING
not detected     -&amp;gt; RESOLVED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
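
&lt;p&gt;The boolean lifecycle fits in a few lines. This sketch assumes the caller passes two things: whether the issue was detected this period, and the previous state (&lt;code&gt;None&lt;/code&gt; if the issue was never seen). The function name is hypothetical, and the enum is trimmed to the three states this lifecycle uses.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum
from typing import Optional


class IssueState(Enum):
    NEW = "new"
    RECURRING = "recurring"
    RESOLVED = "resolved"


def boolean_issue_state(
    detected: bool,
    previous_state: Optional[IssueState],
) -&amp;gt; Optional[IssueState]:
    """Apply the NEW / RECURRING / RESOLVED lifecycle to a present-or-absent issue."""
    if detected:
        # Never seen, or seen again after being resolved: a fresh issue.
        if previous_state is None or previous_state is IssueState.RESOLVED:
            return IssueState.NEW
        return IssueState.RECURRING
    # Not detected: close it if it was tracked; otherwise there is nothing to record.
    if previous_state is None:
        return None
    return IssueState.RESOLVED
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
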



&lt;p&gt;For numeric issues, I can compare magnitude.&lt;/p&gt;

&lt;p&gt;Examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;High CPM
Creative fatigue
Budget pacing gap
CPA increase
ROAS drop
Frequency increase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These can move up or down, so they can be worsening or improving.&lt;/p&gt;

&lt;p&gt;One simple way to model this is to keep the issue kind inside the enum.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;enum&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Enum&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;NUMERIC&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numeric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;BOOLEAN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boolean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IssueType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;HIGH_CPM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high_cpm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;CREATIVE_FATIGUE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creative_fatigue&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;BUDGET_PACING_GAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;budget_pacing_gap&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;TRACKING_FAILURE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tracking_failure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BOOLEAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;CREATIVE_DISAPPROVED&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;creative_disapproved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BOOLEAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;TARGETING_ERROR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;targeting_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BOOLEAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;IssueKind&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;slug&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kind&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kind&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the state transition does not need to guess what kind of issue it is dealing with.&lt;/p&gt;
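
&lt;p&gt;Concretely, the transition can branch on &lt;code&gt;issue_type.kind&lt;/code&gt; instead of on &lt;code&gt;isinstance(current_value, bool)&lt;/code&gt; as the core logic above does. The sketch below makes several simplifications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The enums are trimmed to one member per kind.&lt;/li&gt;
&lt;li&gt;The hypothetical &lt;code&gt;next_state&lt;/code&gt; helper receives the previous value directly (&lt;code&gt;None&lt;/code&gt; meaning first detection).&lt;/li&gt;
&lt;li&gt;It returns state names as strings to stay short.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum
from typing import Optional


class IssueKind(Enum):
    NUMERIC = "numeric"
    BOOLEAN = "boolean"


class IssueType(Enum):
    HIGH_CPM = ("high_cpm", IssueKind.NUMERIC)
    TRACKING_FAILURE = ("tracking_failure", IssueKind.BOOLEAN)

    def __init__(self, slug: str, kind: IssueKind):
        self.slug = slug
        self.kind = kind


def next_state(
    issue_type: IssueType,
    current_value: float,
    previous_value: Optional[float],
    threshold_pct: float = 0.20,
) -&amp;gt; str:
    if previous_value is None:
        return "NEW"
    # The declared kind, not the Python type of the value, decides
    # whether magnitude is meaningful. A zero baseline is also skipped.
    if issue_type.kind is IssueKind.BOOLEAN or not previous_value:
        return "RECURRING"
    delta = (current_value - previous_value) / previous_value
    if delta &amp;gt; threshold_pct:
        return "WORSENING"
    if delta &amp;lt; -threshold_pct:
        return "IMPROVING"
    return "RECURRING"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
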

&lt;h2&gt;
  
  
  Thresholds should not be global
&lt;/h2&gt;

&lt;p&gt;I usually do not like using one fixed threshold for every issue type.&lt;/p&gt;

&lt;p&gt;A 20% change can mean different things depending on the metric.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A 20% CPA increase may be meaningful.&lt;/li&gt;
&lt;li&gt;A 20% CPM change may be normal in some accounts.&lt;/li&gt;
&lt;li&gt;A 10% frequency increase may already matter if the audience is small.&lt;/li&gt;
&lt;li&gt;A small budget pacing gap may not be worth escalating.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the threshold should be configurable by issue type.&lt;/p&gt;

&lt;p&gt;Something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ISSUE_THRESHOLDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;IssueType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HIGH_CPM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IssueType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CREATIVE_FATIGUE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;IssueType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BUDGET_PACING_GAP&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the state machine simple, but still lets each metric behave differently.&lt;/p&gt;
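
&lt;p&gt;At call time, the transition looks up the threshold for the issue type and falls back to a default for types that are not listed. In this small sketch, the 0.20 default and the &lt;code&gt;threshold_for&lt;/code&gt; helper are assumptions. The enum is a simplified stand-in for the tuple-valued one above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum


class IssueType(Enum):
    HIGH_CPM = "high_cpm"
    CREATIVE_FATIGUE = "creative_fatigue"
    BUDGET_PACING_GAP = "budget_pacing_gap"
    TRACKING_FAILURE = "tracking_failure"  # boolean: has no threshold entry


DEFAULT_THRESHOLD = 0.20  # assumed fallback for types without an explicit entry

ISSUE_THRESHOLDS = {
    IssueType.HIGH_CPM: 0.20,
    IssueType.CREATIVE_FATIGUE: 0.10,
    IssueType.BUDGET_PACING_GAP: 0.20,
}


def threshold_for(issue_type: IssueType) -&amp;gt; float:
    # dict.get keeps the lookup total: an unlisted type gets the default
    # instead of raising KeyError.
    return ISSUE_THRESHOLDS.get(issue_type, DEFAULT_THRESHOLD)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The result is what gets passed as &lt;code&gt;threshold_pct&lt;/code&gt; to &lt;code&gt;compute_issue_state&lt;/code&gt;, so the state machine itself never needs to know which metric it is looking at.&lt;/p&gt;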

&lt;h2&gt;
  
  
  Where the LLM actually belongs
&lt;/h2&gt;

&lt;p&gt;I still use the LLM.&lt;/p&gt;

&lt;p&gt;Just not for the lifecycle decision.&lt;/p&gt;

&lt;p&gt;Once the issue state is already computed, I pass structured context to the LLM and ask it to write the recommendation.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Issue type: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;issue_type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;slug&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
State: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Current value: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Previous value: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;previous_value&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
Delta: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delta_pct&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

Write a concise recommendation for this issue.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
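
&lt;p&gt;One detail is worth guarding here. The &lt;code&gt;:.1%&lt;/code&gt; format spec multiplies by 100, so &lt;code&gt;delta_pct&lt;/code&gt; has to be stored as a fraction (0.24, not 24). It also raises &lt;code&gt;TypeError&lt;/code&gt; when the value is &lt;code&gt;None&lt;/code&gt;, which is what a NEW issue with no previous period would have. The sketch below handles both cases. It assumes a hypothetical &lt;code&gt;IssueContext&lt;/code&gt; dataclass in place of the real issue object.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueContext:
    # Hypothetical container for the already-computed, structured facts.
    slug: str
    state: str
    current_value: float
    previous_value: Optional[float]
    delta_pct: Optional[float]  # a fraction, e.g. 0.24 for +24%


def build_prompt(issue: IssueContext) -&amp;gt; str:
    # Format optional fields up front so None never reaches a format spec.
    if issue.delta_pct is None:
        delta_text = "n/a (no previous period)"
    else:
        delta_text = f"{issue.delta_pct:+.1%}"
    previous_text = "n/a" if issue.previous_value is None else str(issue.previous_value)
    return (
        f"Issue type: {issue.slug}\n"
        f"State: {issue.state}\n"
        f"Current value: {issue.current_value}\n"
        f"Previous value: {previous_text}\n"
        f"Delta: {delta_text}\n"
        "\n"
        "Write a concise recommendation for this issue."
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
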



&lt;p&gt;Now the LLM is doing the part it is actually good at:&lt;/p&gt;

&lt;p&gt;Turning structured data into a readable explanation.&lt;/p&gt;

&lt;p&gt;It can say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Creative fatigue is worsening. Frequency increased by 24% compared to the previous period, while CTR continued to decline. Consider rotating in fresh creatives or reducing spend on the affected ad set.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But it did not decide that the issue was worsening.&lt;/p&gt;

&lt;p&gt;The system already knew that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this separation matters
&lt;/h2&gt;

&lt;p&gt;This design gives a few practical benefits.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The system is auditable
&lt;/h3&gt;

&lt;p&gt;If someone asks why an issue was marked as worsening, I can show the exact values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;previous value: 1.20
current value: 1.55
delta: +29.2%
threshold: 20%
state: WORSENING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No prompt guessing.&lt;/p&gt;

&lt;p&gt;No “the model thought it was worse.”&lt;/p&gt;
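
&lt;p&gt;Because every input is explicit, the audit trail can be printed from the same numbers the state machine used. Here (1.55 − 1.20) / 1.20 ≈ 0.2917, which rounds to +29.2%. This sketch assumes a hypothetical &lt;code&gt;audit_lines&lt;/code&gt; helper that receives the already-computed state.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def audit_lines(previous: float, current: float, threshold: float, state: str) -&amp;gt; str:
    # Recompute the delta exactly as the state machine does, then format it.
    delta = (current - previous) / previous
    return "\n".join([
        f"previous value: {previous:.2f}",
        f"current value: {current:.2f}",
        f"delta: {delta:+.1%}",
        f"threshold: {threshold:.0%}",
        f"state: {state}",
    ])


print(audit_lines(1.20, 1.55, 0.20, "WORSENING"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
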

&lt;h3&gt;
  
  
  2. The system is consistent
&lt;/h3&gt;

&lt;p&gt;The same input produces the same state every time.&lt;/p&gt;

&lt;p&gt;This is important for weekly monitoring, dashboards, alerts, and Slack summaries.&lt;/p&gt;

&lt;p&gt;If the state changes, it changed because the data changed.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. It is cheaper
&lt;/h3&gt;

&lt;p&gt;There is no reason to call an LLM thousands of times just to classify a numeric delta.&lt;/p&gt;

&lt;p&gt;That cost adds up quickly if you are checking many advertisers, campaigns, creatives, products, or funnel stages.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Debugging is easier
&lt;/h3&gt;

&lt;p&gt;If the state assignment is wrong, I know where to look.&lt;/p&gt;

&lt;p&gt;Maybe the threshold is too sensitive.&lt;/p&gt;

&lt;p&gt;Maybe the previous period logic is wrong.&lt;/p&gt;

&lt;p&gt;Maybe the metric should be inverted because higher is better for it, as with ROAS.&lt;/p&gt;

&lt;p&gt;These are engineering problems.&lt;/p&gt;

&lt;p&gt;They are much easier to fix than prompt behavior.&lt;/p&gt;

&lt;h2&gt;
  
  
  The general rule I follow
&lt;/h2&gt;

&lt;p&gt;For AI systems, I try to separate the work into two layers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Deterministic layer:
- detection
- thresholds
- state transitions
- deduplication
- severity scoring
- history checks

LLM layer:
- explanation
- summarization
- recommendation wording
- stakeholder-friendly language
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deterministic layer decides what happened.&lt;/p&gt;

&lt;p&gt;The LLM explains it.&lt;/p&gt;
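
&lt;p&gt;In code, the split can be as simple as two steps. The deterministic layer returns a finished result, and the LLM layer receives that result through a single injected callable. In this sketch, &lt;code&gt;Finding&lt;/code&gt;, &lt;code&gt;classify&lt;/code&gt;, &lt;code&gt;report&lt;/code&gt; and &lt;code&gt;explain&lt;/code&gt; are hypothetical names. &lt;code&gt;explain&lt;/code&gt; stands in for whatever LLM client you use.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Finding:
    # Output of the deterministic layer: everything the LLM needs, nothing it must decide.
    slug: str
    state: str
    delta: Optional[float]


def classify(slug: str, current: float, previous: Optional[float], threshold: float) -&amp;gt; Finding:
    # Deterministic layer: the same inputs always produce the same Finding.
    if previous is None:
        return Finding(slug, "NEW", None)
    if not previous:
        return Finding(slug, "RECURRING", None)
    delta = (current - previous) / previous
    if delta &amp;gt; threshold:
        state = "WORSENING"
    elif delta &amp;lt; -threshold:
        state = "IMPROVING"
    else:
        state = "RECURRING"
    return Finding(slug, state, delta)


def report(finding: Finding, explain: Callable[[Finding], str]) -&amp;gt; str:
    # LLM layer: only turns the finished Finding into wording; it never re-derives the state.
    return f"[{finding.state}] {explain(finding)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;explain&lt;/code&gt; is injected, tests can pass a plain function in its place, and the state logic stays testable without any model calls.&lt;/p&gt;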

&lt;p&gt;That separation makes the system much easier to trust.&lt;/p&gt;

&lt;p&gt;And in production, trust matters more than making the architecture look “more AI.”&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>dataengineering</category>
      <category>marketinganalytics</category>
    </item>
  </channel>
</rss>
