<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gehini Busarapalli</title>
    <description>The latest articles on DEV Community by Gehini Busarapalli (@gehini).</description>
    <link>https://dev.to/gehini</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3938045%2F9a2c4cb1-eab9-4bf7-8409-f987e0bfd655.webp</url>
      <title>DEV Community: Gehini Busarapalli</title>
      <link>https://dev.to/gehini</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gehini"/>
    <language>en</language>
    <item>
      <title>I Used Hindsight to Make My Groq Agent Decisions Auditable — Here's What That Actually Looks Like</title>
      <dc:creator>Gehini Busarapalli</dc:creator>
      <pubDate>Mon, 18 May 2026 12:44:59 +0000</pubDate>
      <link>https://dev.to/gehini/i-used-hindsight-to-make-my-groq-agent-decisions-auditable-heres-what-that-actually-looks-like-3nnj</link>
      <guid>https://dev.to/gehini/i-used-hindsight-to-make-my-groq-agent-decisions-auditable-heres-what-that-actually-looks-like-3nnj</guid>
      <description>&lt;p&gt;The hardest part of running LLMs inside a production pipeline isn't the inference. It's figuring out, three hours later, why the model classified a specific user as AT_RISK when you expected POWER_USER, and what that decision caused downstream. Groq gives you fast inference. It doesn't give you memory. I added &lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; to fill that gap, and the difference between debugging with it and without it is large enough that I'd wire it in before writing a single prompt.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitt4ciewxydev4wtqvu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitt4ciewxydev4wtqvu3.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Pipeline Looks Like From the LLM's Perspective
&lt;/h2&gt;

&lt;p&gt;VORTEX uses Groq (llama3-70b-8192) in two places: Agent 2 classifies user intent and produces a score from 0–100, and Agent 3 generates a personalized email draft. Both agents receive a structured activity atom as input and return structured JSON as output.&lt;/p&gt;

&lt;p&gt;Agent 2's prompt looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Agent 2 — Intent Architect: Groq call&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
You are an intent classification engine for a B2B SaaS product.

Given this user activity atom, return a JSON object with:
- intent_score: integer 0-100
- tier: "POWER_USER" | "AT_RISK" | "PASSIVE"  
- urgency: "HIGH" | "MEDIUM" | "LOW"
- primary_pain: string describing the core friction point

Activity atom:
&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;atom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;

Return only valid JSON. No explanation.
`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://api.groq.com/openai/v1/chat/completions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
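  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Bearer &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;GROQ_API_KEY&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// the request body is JSON&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;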
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;llama3-70b-8192&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// low temperature for consistent scoring&lt;/span&gt;
    &lt;span class="na"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;temperature: 0.1&lt;/code&gt; is the most important parameter here. Intent scoring needs to be repeatable: the same activity atom should produce the same score, or very nearly the same, on repeated runs. A low temperature does not guarantee identical output, but it narrows the variance that would otherwise make the scores unreliable as routing inputs. With 0.1, the model is consistent enough that Agent 7's threshold logic (&lt;code&gt;≥ 80 = HOT_LEAD&lt;/code&gt;) produces predictable outcomes.&lt;/p&gt;
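
&lt;p&gt;To see why variance matters at the boundary, here is a minimal sketch of threshold routing. Only the &lt;code&gt;≥ 80 = HOT_LEAD&lt;/code&gt; rule comes from the pipeline; the function name &lt;code&gt;routeByScore&lt;/code&gt; and the &lt;code&gt;STANDARD&lt;/code&gt; fallback tier are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch only. The 80 threshold is from the pipeline; STANDARD is a hypothetical fallback tier.
const HOT_LEAD_THRESHOLD = 80;

function routeByScore(intentScore) {
  // Reject anything outside the integer 0-100 range the prompt asks for.
  if (!Number.isInteger(intentScore) || intentScore &amp;lt; 0 || intentScore &amp;gt; 100) {
    throw new RangeError(`intent_score out of range: ${intentScore}`);
  }
  return intentScore &amp;gt;= HOT_LEAD_THRESHOLD ? 'HOT_LEAD' : 'STANDARD';
}

// A two-point swing in the score flips the routing decision.
console.log(routeByScore(79)); // STANDARD
console.log(routeByScore(81)); // HOT_LEAD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Any run-to-run noise near the threshold changes who gets a Slack alert, which is the practical reason to keep the temperature low.&lt;/p&gt;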

&lt;h2&gt;
  
  
  The Problem Hindsight Solves
&lt;/h2&gt;

&lt;p&gt;Before &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt;, when a lead didn't get a Slack alert, my debugging process was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check Firestore &lt;code&gt;leads&lt;/code&gt; collection — see the current &lt;code&gt;status&lt;/code&gt; field&lt;/li&gt;
&lt;li&gt;Check &lt;code&gt;activity_feed&lt;/code&gt; collection — see which agents fired&lt;/li&gt;
&lt;li&gt;Check &lt;code&gt;agent_logs&lt;/code&gt; collection — read each agent's output by timestamp&lt;/li&gt;
&lt;li&gt;Manually reconstruct the sequence from three separate documents&lt;/li&gt;
&lt;li&gt;Realize the timestamps are in different formats across agents&lt;/li&gt;
&lt;li&gt;Give up and re-trigger the event to watch it live&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This process took 15–20 minutes for a simple routing failure. The root issue was that I had logs, not memory. Logs tell you what happened in isolation. &lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;Agent memory&lt;/a&gt; tells you what happened in sequence, causally linked, for a specific input.&lt;/p&gt;

&lt;p&gt;Hindsight stores each agent's contribution keyed by lead ID, in order, with the full input and output at each step. Reconstructing the chain for any lead is one query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Retrieve full decision chain for a lead&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;leadId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Returns ordered array of agent decisions:&lt;/span&gt;
&lt;span class="c1"&gt;// [&lt;/span&gt;
&lt;span class="c1"&gt;//   { agent: 'behavioral_scout', input: atom, output: atom, timestamp },&lt;/span&gt;
&lt;span class="c1"&gt;//   { agent: 'intent_architect', input: atom, output: { intent_score: 91, tier: 'POWER_USER' }, timestamp },&lt;/span&gt;
&lt;span class="c1"&gt;//   { agent: 'persona_scriptwriter', input: scoredLead, output: { subject, body }, timestamp },&lt;/span&gt;
&lt;span class="c1"&gt;//   { agent: 'executive_router', input: scoredLead, output: { tier: 'HOT_LEAD', actions: ['slack', 'email'] }, timestamp }&lt;/span&gt;
&lt;span class="c1"&gt;// ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the full reasoning chain, in order, with inputs and outputs at every step. What used to take 15–20 minutes takes seconds.&lt;/p&gt;
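
&lt;p&gt;When a lead never triggers a Slack alert, the first question is which agent failed to write to the chain. A small sketch of that check, assuming each entry carries an &lt;code&gt;agent&lt;/code&gt; field as in the example above; &lt;code&gt;findMissingAgents&lt;/code&gt; is a hypothetical helper, not part of Hindsight.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Agent names come from the example chain above; the helper itself is hypothetical.
const EXPECTED_AGENTS = [
  'behavioral_scout',
  'intent_architect',
  'persona_scriptwriter',
  'executive_router',
];

// Returns the expected agents that never wrote an entry for this lead.
function findMissingAgents(chain) {
  const seen = new Set(chain.map(function (entry) { return entry.agent; }));
  return EXPECTED_AGENTS.filter(function (name) { return !seen.has(name); });
}

// Example: the pipeline stopped after the intent step.
const partialChain = [{ agent: 'behavioral_scout' }, { agent: 'intent_architect' }];
console.log(findMissingAgents(partialChain));
// [ 'persona_scriptwriter', 'executive_router' ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;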

&lt;h2&gt;
  
  
  What the Debate Log Actually Shows
&lt;/h2&gt;

&lt;p&gt;The Hindsight chain surfaces in the dashboard as the Debate Log — a terminal-style view that replays each agent's contribution for a selected lead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// AgentActivity.jsx — maps Hindsight chain to display lines&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allLines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;DEBATE_LOG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatMap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;isHeader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agentName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a lead with &lt;code&gt;intent_score: 91&lt;/code&gt; — a user who hit their API export limit after a 47-minute session with 3 teammates invited — the Debate Log renders this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;[14:03:01] &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;AGT-01 — BEHAVIORAL SCOUT
&lt;span class="go"&gt;  Event: api_limit_hit · Session: 47 min
  API calls today: 98 / 100
  Teammates invited: 3 — adoption signal detected
  → Routing to Executive Router

&lt;/span&gt;&lt;span class="gp"&gt;[14:03:02] &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;AGT-02 — INTENT ARCHITECT  
&lt;span class="go"&gt;  Invoking Groq llama3-70b-8192...
  Classification: POWER_USER
  Intent score: 91 / 100 · Urgency: HIGH
  Primary pain: USAGE_LIMIT

&lt;/span&gt;&lt;span class="gp"&gt;[14:03:03] &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;AGT-03 — PERSONA SCRIPTWRITER
&lt;span class="go"&gt;  Subject: You hit the Data Export limit — here's how to unblock your team
  Body: 178 words · Personalization tokens: 4

&lt;/span&gt;&lt;span class="gp"&gt;[14:03:05] &amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;AGT-07 — EXECUTIVE ROUTER
&lt;span class="go"&gt;  Score 91 ≥ 80 → HOT_LEAD tier
&lt;/span&gt;&lt;span class="gp"&gt;  ✓ Slack fired → #&lt;/span&gt;sales
&lt;span class="go"&gt;  ✓ Email queued for approval
  Pipeline complete · 4.2s
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every line in this output came from Hindsight. The sales rep reading the Slack alert can pull up this log and see exactly why they're being notified about this lead, what the model saw, and what it decided.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feksr4ev9b4o4nxffge2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feksr4ev9b4o4nxffge2h.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Mistake: Storing Predictions Instead of Inputs
&lt;/h2&gt;

&lt;p&gt;My first Hindsight integration stored the model's output, not its input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Wrong — stores the prediction, not what produced it&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;leadId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;intent_architect&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;intent_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useless for debugging. When the score is wrong, you need to know what the model saw — the full activity atom — not just what it returned. Storing only the output tells you the model was wrong. It doesn't tell you why.&lt;/p&gt;

&lt;p&gt;The correct version stores the full input alongside the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Correct — stores input + output so you can reconstruct the reasoning&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;leadId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;intent_architect&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;atom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;               &lt;span class="c1"&gt;// what the model received&lt;/span&gt;
    &lt;span class="na"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completion_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;intent_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;urgency&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;primary_pain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;primary_pain&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;llama3-70b-8192&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now when a score looks wrong, I can pull the input atom and re-run the prompt manually against Groq to reproduce the result. Without the input stored, reproduction is impossible.&lt;/p&gt;
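
&lt;p&gt;Reproduction then amounts to rebuilding the original request from the stored entry. A sketch under two assumptions: &lt;code&gt;stored&lt;/code&gt; has the same fields as the &lt;code&gt;data&lt;/code&gt; object above, and &lt;code&gt;renderPrompt&lt;/code&gt; is whatever function builds the Agent 2 prompt from an atom. Both names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper: rebuilds the Groq request body from a stored entry.
function buildReplayBody(stored, renderPrompt) {
  return {
    model: stored.model,             // replay against the model that made the decision
    temperature: stored.temperature, // and at the same temperature
    max_tokens: 200,                 // matches the original Agent 2 call
    messages: [{ role: 'user', content: renderPrompt(stored.input) }],
  };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Posting &lt;code&gt;JSON.stringify(buildReplayBody(stored, renderPrompt))&lt;/code&gt; to the same chat completions endpoint repeats the call with the stored input. At temperature 0.1 the result should be close to the stored output, though it is not guaranteed to be identical.&lt;/p&gt;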

&lt;h2&gt;
  
  
  Why Groq Specifically
&lt;/h2&gt;

&lt;p&gt;The latency profile matters for this architecture. Agent 7 calls Agent 2 and Agent 3 synchronously and waits for both before updating Firestore and firing Slack. If each LLM call takes 8–10 seconds (typical for a large model on a slower provider), the two calls alone add 16–20 seconds, and the total pipeline time for a HOT_LEAD climbs past 20 seconds once the remaining steps run. That's too slow to feel real-time on the dashboard.&lt;/p&gt;

&lt;p&gt;Groq's inference for llama3-70b-8192 runs at roughly 800 tokens per second. Agent 2's completion is under 200 tokens. Agent 3's email draft is under 250 tokens. End-to-end LLM time is around 1.5–2 seconds per call, which keeps total pipeline time under 5 seconds.&lt;/p&gt;
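
&lt;p&gt;Those figures can be sanity-checked with simple arithmetic: decode time is roughly output tokens divided by throughput. The sketch below uses only the numbers quoted above; &lt;code&gt;decodeSeconds&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough decode-time estimate: output tokens divided by tokens per second.
function decodeSeconds(outputTokens, tokensPerSecond) {
  return outputTokens / tokensPerSecond;
}

console.log(decodeSeconds(200, 800)); // 0.25   (Agent 2 upper bound)
console.log(decodeSeconds(250, 800)); // 0.3125 (Agent 3 upper bound)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On these numbers, decoding accounts for well under half a second per call, so most of the 1.5–2 seconds is presumably spent outside token generation, for example on the network round trip and prompt processing. That split is an inference from the quoted figures, not a measurement.&lt;/p&gt;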

&lt;p&gt;The tradeoff: Groq is fast but the model selection is limited compared to providers like Anthropic or OpenAI. For intent classification and email generation at low temperature, llama3-70b-8192 is sufficient. For tasks requiring more nuanced reasoning or longer context, you'd want to evaluate other options — but you'd also be accepting higher latency.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zbk2rxw6i1nugxfzk5m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zbk2rxw6i1nugxfzk5m.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompt Engineering for Consistent JSON Output
&lt;/h2&gt;

&lt;p&gt;Getting Groq to return valid, parseable JSON consistently required a few specific practices:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explicit schema in the prompt.&lt;/strong&gt; Describing the exact field names and types in the prompt reduces hallucinated field names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Good — explicit schema&lt;/span&gt;
&lt;span class="s2"&gt;`Return a JSON object with exactly these fields:
{ "intent_score": &amp;lt;integer 0-100&amp;gt;, "tier": &amp;lt;"POWER_USER"|"AT_RISK"|"PASSIVE"&amp;gt;, ... }`&lt;/span&gt;

&lt;span class="c1"&gt;// Bad — vague&lt;/span&gt;
&lt;span class="s2"&gt;`Analyze this and return a JSON with the user's intent score.`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;"Return only valid JSON. No explanation."&lt;/strong&gt; Without this instruction, llama3-70b-8192 frequently wraps the JSON in markdown code fences or adds a preamble sentence. The instruction eliminates both in most runs, though not in every one, which is why the parsing step still strips fences defensively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low temperature (0.1).&lt;/strong&gt; Reduces variance in field naming and value ranges. At higher temperatures, the model occasionally returns &lt;code&gt;"score"&lt;/code&gt; instead of &lt;code&gt;"intent_score"&lt;/code&gt;, or returns a string where you expected an integer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Parse defensively in the Code node:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Agent 2 — Parse Intent (n8n Code node)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;$input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Strip markdown fences if present despite instructions&lt;/span&gt;
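&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/```json\n?|\n?```/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cleaned&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Log to Hindsight before throwing&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;hindsight&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;leadId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;intent_architect&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;raw&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`JSON parse failed: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;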

&lt;/div&gt;



&lt;p&gt;Logging parse failures to Hindsight before throwing means failed runs are still queryable. Without this, a JSON parse error produces a gap in the decision chain that's invisible unless you're watching the n8n execution log in real time.&lt;/p&gt;
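
&lt;p&gt;A response can parse cleanly and still miss the schema, for example by returning &lt;code&gt;"score"&lt;/code&gt; instead of &lt;code&gt;"intent_score"&lt;/code&gt;. A minimal validation sketch that could run after &lt;code&gt;JSON.parse&lt;/code&gt;: the allowed values are copied from the Agent 2 prompt, while &lt;code&gt;validateIntent&lt;/code&gt; and its error messages are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Allowed values are copied from the Agent 2 prompt; validateIntent is a hypothetical helper.
const TIERS = ['POWER_USER', 'AT_RISK', 'PASSIVE'];
const URGENCIES = ['HIGH', 'MEDIUM', 'LOW'];

// Returns a list of problems; an empty list means the object matches the schema.
function validateIntent(parsed) {
  if (parsed === null || typeof parsed !== 'object') {
    return ['response is not a JSON object'];
  }
  const errors = [];
  const score = parsed.intent_score;
  if (!Number.isInteger(score) || score &amp;lt; 0 || score &amp;gt; 100) {
    errors.push('intent_score must be an integer from 0 to 100');
  }
  if (!TIERS.includes(parsed.tier)) {
    errors.push('tier must be one of ' + TIERS.join(', '));
  }
  if (!URGENCIES.includes(parsed.urgency)) {
    errors.push('urgency must be one of ' + URGENCIES.join(', '));
  }
  if (typeof parsed.primary_pain !== 'string' || parsed.primary_pain.length === 0) {
    errors.push('primary_pain must be a non-empty string');
  }
  return errors;
}

// Example: the model renamed the score field and returned it as a string.
console.log(validateIntent({ score: '91', tier: 'POWER_USER', urgency: 'HIGH', primary_pain: 'USAGE_LIMIT' }));
// [ 'intent_score must be an integer from 0 to 100' ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A non-empty error list can be stored the same way as a parse failure, so schema drift shows up in the decision chain as well.&lt;/p&gt;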

&lt;h2&gt;
  
  
  What I'd Add Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hindsight namespaces per agent.&lt;/strong&gt; Currently all entries go into a single workspace queryable by lead ID. Fleet-level analytics — how often does Agent 2 return each tier, what's the score distribution, how has it changed as the prompt evolved — require per-agent namespaces. That's the next thing I'd add.&lt;/p&gt;
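
&lt;p&gt;Until that is in place, the same numbers can be approximated by aggregating recalled entries in application code. A sketch, assuming entries shaped like the ones in the chain example (an &lt;code&gt;agent&lt;/code&gt; name plus an &lt;code&gt;output&lt;/code&gt; with a &lt;code&gt;tier&lt;/code&gt;); &lt;code&gt;tierCounts&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical helper: counts how often one agent returned each tier.
function tierCounts(entries, agentName) {
  const counts = {};
  for (const entry of entries) {
    // Skip other agents and entries without an output (for example logged parse failures).
    if (entry.agent !== agentName || !entry.output) continue;
    const tier = entry.output.tier;
    counts[tier] = (counts[tier] || 0) + 1;
  }
  return counts;
}

const sample = [
  { agent: 'intent_architect', output: { tier: 'POWER_USER' } },
  { agent: 'intent_architect', output: { tier: 'AT_RISK' } },
  { agent: 'intent_architect', output: { tier: 'POWER_USER' } },
  { agent: 'executive_router', output: { tier: 'HOT_LEAD' } },
];
console.log(tierCounts(sample, 'intent_architect'));
// { POWER_USER: 2, AT_RISK: 1 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;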

&lt;p&gt;&lt;strong&gt;Input hashing for response caching.&lt;/strong&gt; Groq supports prompt caching for repeated prefixes, but that only makes a repeated call cheaper; it does not skip the call. If two leads have identical activity atoms (same event type, same feature, same score), the second Groq call is redundant. Hashing the input atom and checking an application-side cache before calling Groq would reduce both latency and API costs for common event patterns.&lt;/p&gt;
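
&lt;p&gt;A minimal sketch of the hash-then-check idea, assuming a Node.js runtime where the built-in &lt;code&gt;node:crypto&lt;/code&gt; module is available (an n8n Code node may restrict it). The names &lt;code&gt;canonicalize&lt;/code&gt;, &lt;code&gt;atomHash&lt;/code&gt;, and &lt;code&gt;classifyWithCache&lt;/code&gt; are hypothetical, and the in-memory &lt;code&gt;Map&lt;/code&gt; stands in for whatever store the pipeline would really use.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const { createHash } = require('node:crypto');

// Sort object keys recursively so that key order does not change the hash.
function canonicalize(value) {
  if (value === null || typeof value !== 'object') return value;
  if (Array.isArray(value)) return value.map(canonicalize);
  const sorted = {};
  for (const key of Object.keys(value).sort()) {
    sorted[key] = canonicalize(value[key]);
  }
  return sorted;
}

// SHA-256 over the canonical JSON form of the atom, as a hex string.
function atomHash(atom) {
  return createHash('sha256').update(JSON.stringify(canonicalize(atom))).digest('hex');
}

// In-memory stand-in for a real cache; entries never expire in this sketch.
const cache = new Map();

// classify is any async function that takes an atom and returns the model's result.
async function classifyWithCache(atom, classify) {
  const key = atomHash(atom);
  if (cache.has(key)) return cache.get(key);
  const result = await classify(atom);
  cache.set(key, result);
  return result;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Canonicalizing before hashing matters because two atoms with the same fields in a different order would otherwise produce different keys. A cached result also needs to be invalidated when the prompt or model changes, which this sketch does not handle.&lt;/p&gt;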

&lt;p&gt;&lt;strong&gt;Structured evals against the Hindsight log.&lt;/strong&gt; Every stored input/output pair in Hindsight is a potential eval case. Running the current prompt against historical inputs and comparing outputs is how you know whether a prompt change improved or regressed classification quality. Right now that comparison is manual.&lt;/p&gt;
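
&lt;p&gt;The comparison itself is simple to automate once stored and replayed outputs sit side by side. A sketch, assuming each case pairs a stored output with the output of a re-run under the new prompt; &lt;code&gt;tierAgreement&lt;/code&gt;, the case shape, and the lead IDs are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical eval helper: compares stored outputs with outputs from a re-run.
// Each case is { leadId, stored: { tier }, replayed: { tier } }.
function tierAgreement(cases) {
  const changed = [];
  for (const c of cases) {
    if (c.stored.tier !== c.replayed.tier) changed.push(c.leadId);
  }
  // With no cases there is nothing to disagree about, so report full agreement.
  const agreement = cases.length === 0 ? 1 : (cases.length - changed.length) / cases.length;
  return { agreement, changed };
}

const cases = [
  { leadId: 'lead_1', stored: { tier: 'POWER_USER' }, replayed: { tier: 'POWER_USER' } },
  { leadId: 'lead_2', stored: { tier: 'AT_RISK' }, replayed: { tier: 'AT_RISK' } },
  { leadId: 'lead_3', stored: { tier: 'PASSIVE' }, replayed: { tier: 'AT_RISK' } },
  { leadId: 'lead_4', stored: { tier: 'PASSIVE' }, replayed: { tier: 'PASSIVE' } },
];
console.log(tierAgreement(cases));
// { agreement: 0.75, changed: [ 'lead_3' ] }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Agreement alone does not say which version is right; the &lt;code&gt;changed&lt;/code&gt; list is the set of leads worth reviewing by hand.&lt;/p&gt;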

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Store inputs, not just outputs.&lt;/strong&gt; The output tells you what the model decided. The input tells you why. Without both, failed runs are not reproducible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Low temperature is not optional for scoring pipelines.&lt;/strong&gt; If the model's output drives routing decisions, variance is a bug. &lt;code&gt;temperature: 0.1&lt;/code&gt; keeps the scoring consistent enough to trust the thresholds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Log parse failures to memory before throwing.&lt;/strong&gt; A JSON parse error is not just a code failure — it's a data point about prompt reliability. Storing it in &lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; means you can query how often it happens and under what inputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit trails are not optional when LLMs make decisions.&lt;/strong&gt; A sales rep acting on a Slack alert needs to understand why they're being alerted. A developer debugging a misclassified lead needs to reproduce the model's reasoning. &lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;Agent memory&lt;/a&gt; is what makes both possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Fast inference without observability is just fast failure. Groq gets the pipeline under 5 seconds end-to-end. Hindsight makes every decision in that pipeline inspectable, reproducible, and queryable by lead ID. The combination is what makes the system trustworthy enough to act on — not just fast enough to impress in a demo.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
    </item>
  </channel>
</rss>
