<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Navaneeth K</title>
    <description>The latest articles on DEV Community by Navaneeth K (@cyberkunju).</description>
    <link>https://dev.to/cyberkunju</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3940391%2Fac680bd1-b89a-46c5-a611-64ac0fc33c15.png</url>
      <title>DEV Community: Navaneeth K</title>
      <link>https://dev.to/cyberkunju</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/cyberkunju"/>
    <language>en</language>
    <item>
      <title>Four modes, one cockpit: how I designed graceful degradation up front</title>
      <dc:creator>Navaneeth K</dc:creator>
      <pubDate>Tue, 19 May 2026 15:59:24 +0000</pubDate>
      <link>https://dev.to/cyberkunju/four-modes-one-cockpit-how-i-designed-graceful-degradation-up-front-3a00</link>
      <guid>https://dev.to/cyberkunju/four-modes-one-cockpit-how-i-designed-graceful-degradation-up-front-3a00</guid>
      <description>&lt;p&gt;The cockpit I built has to work in four different operating modes&lt;br&gt;
without code changes. Live cloud memory plus live model. Live cloud&lt;br&gt;
memory plus deterministic model. Local fallback memory plus live&lt;br&gt;
model. Local fallback memory plus deterministic model. Every&lt;br&gt;
combination has to render correctly, produce a complete audit&lt;br&gt;
trace, and not lie to the user about what's happening.&lt;/p&gt;

&lt;p&gt;This is the design choice I'm proudest of, because I made it on day&lt;br&gt;
one and it kept paying off through every demo, every CI run, and&lt;br&gt;
every connectivity hiccup since.&lt;/p&gt;
&lt;h2&gt;
  
  
  Why four modes
&lt;/h2&gt;

&lt;p&gt;The two axes are the two external services the system depends on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory&lt;/strong&gt; — the agent memory layer is&lt;br&gt;
&lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt;, a managed cloud store&lt;br&gt;
for retain/recall/reflect operations. When the cloud is reachable&lt;br&gt;
and the API key is valid, recall returns memory matches. When it&lt;br&gt;
isn't, the system falls back to a local JSON store that ships with&lt;br&gt;
seed memories.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model&lt;/strong&gt; — the language model is called through cascadeflow's&lt;br&gt;
Groq adapter. With &lt;code&gt;CASCADEFLOW_LIVE_GROQ=true&lt;/code&gt; and a Groq API key set,&lt;br&gt;
calls are real. With the flag false or the key missing, the same code path&lt;br&gt;
returns deterministic, prerecorded root-cause analysis (RCA) output. No&lt;br&gt;
exceptions, no degraded shape: the response has exactly the same structure,&lt;br&gt;
with &lt;code&gt;live_call=False&lt;/code&gt; set.&lt;/p&gt;
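&lt;p&gt;A minimal sketch of that switch. The real cascadeflow adapter isn't shown in this post, so the live call is injected as a plain function instead of calling Groq; &lt;code&gt;ModelResult&lt;/code&gt;, &lt;code&gt;complete&lt;/code&gt;, &lt;code&gt;CANNED_RCA&lt;/code&gt;, and the &lt;code&gt;GROQ_API_KEY&lt;/code&gt; variable name are assumptions for illustration.&lt;/p&gt;

```python
import os
from dataclasses import dataclass
from typing import Callable

# Hypothetical placeholder; the real project ships its own prerecorded RCA text.
CANNED_RCA = "[deterministic] prerecorded RCA for the demo alert"


@dataclass(frozen=True)
class ModelResult:
    text: str
    live_call: bool  # identical shape in both modes; only this flag differs


def live_enabled() -> bool:
    # CASCADEFLOW_LIVE_GROQ comes from the post; GROQ_API_KEY is an assumed name.
    flag = os.environ.get("CASCADEFLOW_LIVE_GROQ", "false").strip().lower() == "true"
    return flag and bool(os.environ.get("GROQ_API_KEY"))


def complete(prompt: str, call_live: Callable[[str], str]) -> ModelResult:
    """Return a live completion when enabled, otherwise the deterministic output."""
    if live_enabled():
        return ModelResult(text=call_live(prompt), live_call=True)
    # No exception and no partial object: same structure, live_call=False.
    return ModelResult(text=CANNED_RCA, live_call=False)
```

&lt;p&gt;With the flag off, &lt;code&gt;call_live&lt;/code&gt; is never invoked, so tests can pass any stub without touching the network.&lt;/p&gt;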

&lt;p&gt;Multiply the two and you get four quadrants:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Live model&lt;/th&gt;
&lt;th&gt;Deterministic model&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Live demo path&lt;/td&gt;
&lt;td&gt;Demo without API spend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local fallback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Disconnected dev&lt;/td&gt;
&lt;td&gt;Hermetic CI&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each quadrant has a distinct failure mode if you don't plan for it.&lt;br&gt;
The hermetic CI quadrant is the one most people skip — they assume&lt;br&gt;
the cloud will be up — and that's the one that bites you the night&lt;br&gt;
before a demo.&lt;/p&gt;
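&lt;p&gt;The table is the product of two independent booleans. A small sketch (the &lt;code&gt;quadrant&lt;/code&gt; and &lt;code&gt;all_modes&lt;/code&gt; helpers are hypothetical; the labels are copied from the table) that enumerates every combination, so none of the four can be silently forgotten:&lt;/p&gt;

```python
from itertools import product

# Keys are (cloud_memory, live_model); labels come from the quadrant table.
QUADRANTS = {
    (True, True): "Live demo path",
    (True, False): "Demo without API spend",
    (False, True): "Disconnected dev",
    (False, False): "Hermetic CI",
}


def quadrant(cloud_memory: bool, live_model: bool) -> str:
    """Name the operating mode for one combination of the two axes."""
    return QUADRANTS[(cloud_memory, live_model)]


def all_modes() -> list[tuple[bool, bool, str]]:
    # Two axes with two states each gives 2 x 2 = 4 modes.
    return [(mem, mdl, quadrant(mem, mdl)) for mem, mdl in product((True, False), repeat=2)]


for cloud, live, name in all_modes():
    print(f"cloud_memory={cloud!s:5} live_model={live!s:5} {name}")
```

&lt;p&gt;Feeding &lt;code&gt;all_modes()&lt;/code&gt; into &lt;code&gt;pytest.mark.parametrize&lt;/code&gt; would give one test run per quadrant.&lt;/p&gt;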
&lt;h2&gt;
  
  
  The connection probe
&lt;/h2&gt;

&lt;p&gt;Both adapters are constructed inside functions decorated with Streamlit's&lt;br&gt;
&lt;code&gt;@st.cache_resource&lt;/code&gt;, so each is built once and reused across&lt;br&gt;
reruns instead of being rebuilt on every interaction. Each one runs a&lt;br&gt;
probe on construction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
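&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# incident_agent/memory.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IncidentMemory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;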
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HINDSIGHT_BASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HINDSIGHT_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;bank_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HINDSIGHT_BANK_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openrecall&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

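        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_bank_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bank_id&lt;/span&gt;  &lt;span class="c1"&gt;# recall() and retain() read this later
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fallback_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;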
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;disconnected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;base_url&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_flip_to_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing HINDSIGHT_BASE_URL or HINDSIGHT_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_build_hindsight_client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Cheap probe: list banks or fetch bank metadata
&lt;/span&gt;            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_probe&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;connected to Hindsight Cloud at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; / bank &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_flip_to_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;probe failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;!s}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The probe does one cheap call. If it succeeds, the cloud client&lt;br&gt;
sticks around for the session. If it fails, &lt;code&gt;_flip_to_fallback&lt;/code&gt;&lt;br&gt;
fires.&lt;/p&gt;
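&lt;p&gt;The post doesn't show &lt;code&gt;_probe&lt;/code&gt; itself. To illustrate what "cheap and fails fast" can look like, here is a stand-alone TCP reachability check with a short timeout. It is a swapped-in technique, not the project's probe, and &lt;code&gt;tcp_reachable&lt;/code&gt; is a hypothetical name: it only proves a socket opens, not that the API key or bank is valid, so a real probe should still make one authenticated call.&lt;/p&gt;

```python
import socket
from urllib.parse import urlsplit


def tcp_reachable(base_url: str, timeout: float = 1.0) -> bool:
    """Hypothetical pre-check: can a TCP connection to base_url's host/port be opened?"""
    parts = urlsplit(base_url)
    host = parts.hostname
    if host is None:
        return False
    # Use the scheme's default port when the URL doesn't name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

&lt;p&gt;On a machine where nothing listens on the discard port, &lt;code&gt;tcp_reachable("http://127.0.0.1:9")&lt;/code&gt; should return &lt;code&gt;False&lt;/code&gt; almost immediately, because the connection is refused rather than left to time out.&lt;/p&gt;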
&lt;h2&gt;
  
  
  The flip-to-fallback contract
&lt;/h2&gt;

&lt;p&gt;This is a single function with a single job, and it is the most&lt;br&gt;
important function in the memory module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# incident_agent/memory.py
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_flip_to_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;close&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getattr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;close&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fallback_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;local fallback active — &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things happen, in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The cloud client gets &lt;code&gt;close()&lt;/code&gt; called if it exposes one.&lt;/strong&gt;
This releases the underlying HTTP connections and their file
descriptors right away instead of leaving cleanup to the garbage collector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The client reference is dropped to &lt;code&gt;None&lt;/code&gt;.&lt;/strong&gt; No retry on the
same client. If the cloud comes back mid-session, the system
stays on local fallback until the next session. This is
deliberate — flapping between cloud and local would produce
inconsistent memory recall results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The status string and the &lt;code&gt;fallback_mode&lt;/code&gt; flag are updated.&lt;/strong&gt;
The cockpit reads both.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The contract is written down as a requirement: every failed&lt;br&gt;
&lt;code&gt;client.recall&lt;/code&gt; or &lt;code&gt;client.retain&lt;/code&gt; call must go through this helper. No&lt;br&gt;
silent retries. No "let me just try the cloud one more time."&lt;br&gt;
The first failure flips the mode for the whole session.&lt;/p&gt;
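&lt;p&gt;To make the "first failure flips the whole session" rule concrete, here is a self-contained sketch with a fake client that fails once and then recovers. &lt;code&gt;FlakyClient&lt;/code&gt; and &lt;code&gt;SessionMemory&lt;/code&gt; are hypothetical stand-ins for the Hindsight client and &lt;code&gt;IncidentMemory&lt;/code&gt;, and the local store is reduced to a string prefix. The point is that once the flip happens, the recovered cloud is never asked again.&lt;/p&gt;

```python
class FlakyClient:
    """Hypothetical cloud client: the first recall fails, later ones would succeed."""

    def __init__(self) -> None:
        self.calls = 0
        self.closed = False

    def recall(self, query: str) -> list[str]:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("transient blip")
        return [f"cloud:{query}"]

    def close(self) -> None:
        self.closed = True


class SessionMemory:
    def __init__(self, client: FlakyClient) -> None:
        self._client: FlakyClient | None = client
        self.fallback_mode = False
        self.status = "connected"

    def _flip_to_fallback(self, reason: str) -> None:
        # close, drop, flag: the same three steps as the helper in the post.
        if self._client is not None:
            try:
                self._client.close()
            except Exception:
                pass
        self._client = None
        self.fallback_mode = True
        self.status = f"local fallback active: {reason}"

    def recall(self, query: str) -> list[str]:
        if self.fallback_mode or self._client is None:
            return [f"local:{query}"]
        try:
            return self._client.recall(query)
        except Exception as e:
            self._flip_to_fallback(f"recall failed: {e!s}")
            return [f"local:{query}"]


client = FlakyClient()
mem = SessionMemory(client)
print(mem.recall("crashloop"))  # the cloud fails once, so this answers from local
print(mem.recall("crashloop"))  # the cloud would answer now, but it is never asked
```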
&lt;h2&gt;
  
  
  What every code path has to look like
&lt;/h2&gt;

&lt;p&gt;Once you commit to this contract, every cloud call has the same&lt;br&gt;
shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MemoryMatch&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fallback_mode&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_recall_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bank_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_bank_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_to_match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_flip_to_fallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recall failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;!s}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_recall_local&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pattern is identical for retain. Try the cloud; on failure,&lt;br&gt;
flip and fall back; never let the exception escape to the workflow&lt;br&gt;
layer. The workflow layer can stay ignorant of which mode it's in,&lt;br&gt;
and that's the point — the bypass logic, the cost curve, the audit&lt;br&gt;
trace, none of them branch on cloud-vs-local.&lt;/p&gt;
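&lt;p&gt;Since recall and retain share the same try, flip, fall-back shape, one option is to factor that shape into a decorator so no cloud call can forget it. This is a sketch under assumptions, not necessarily how the project is written: &lt;code&gt;cloud_or_local&lt;/code&gt;, &lt;code&gt;Memory&lt;/code&gt;, &lt;code&gt;_retain_local&lt;/code&gt;, &lt;code&gt;DownClient&lt;/code&gt;, and the in-memory list standing in for the local JSON store are all hypothetical, and the simplified flip skips the &lt;code&gt;close()&lt;/code&gt; step.&lt;/p&gt;

```python
import functools
from typing import Any, Callable


def cloud_or_local(local_name: str) -> Callable:
    """Run the decorated cloud method; on any failure, flip and use the local method."""

    def decorate(cloud_method: Callable) -> Callable:
        @functools.wraps(cloud_method)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            local = getattr(self, local_name)
            if self.fallback_mode or self._client is None:
                return local(*args, **kwargs)
            try:
                return cloud_method(self, *args, **kwargs)
            except Exception as e:
                # Never let the exception reach the workflow layer.
                self._flip_to_fallback(f"{cloud_method.__name__} failed: {e!s}")
                return local(*args, **kwargs)

        return wrapper

    return decorate


class Memory:
    def __init__(self, client: Any) -> None:
        self._client = client
        self.fallback_mode = False
        self.reason = ""
        self.local_store: list[str] = []  # stand-in for the local JSON store

    def _flip_to_fallback(self, reason: str) -> None:
        # Simplified: the full helper also calls close() on the client first.
        self._client = None
        self.fallback_mode = True
        self.reason = reason

    def _retain_local(self, text: str) -> str:
        self.local_store.append(text)
        return "local"

    def _recall_local(self, query: str) -> list[str]:
        return [m for m in self.local_store if query in m]

    @cloud_or_local("_retain_local")
    def retain(self, text: str) -> str:
        self._client.retain(text)
        return "cloud"

    @cloud_or_local("_recall_local")
    def recall(self, query: str) -> list[str]:
        return list(self._client.recall(query))


class DownClient:
    """Hypothetical client whose every call fails."""

    def retain(self, text: str) -> None:
        raise TimeoutError("cloud unreachable")

    def recall(self, query: str) -> list[str]:
        raise TimeoutError("cloud unreachable")


mem = Memory(DownClient())
print(mem.retain("OOMKilled on api-7"))  # flips, then stores locally
print(mem.recall("OOMKilled"))
```

&lt;p&gt;The workflow only ever calls &lt;code&gt;mem.retain&lt;/code&gt; and &lt;code&gt;mem.recall&lt;/code&gt;; neither raises, and neither tells the caller which store answered.&lt;/p&gt;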
&lt;h2&gt;
  
  
  The badges in the cockpit
&lt;/h2&gt;

&lt;p&gt;The cockpit renders three runtime badges in the hero card:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Memory state&lt;/strong&gt; — &lt;code&gt;Hindsight connected&lt;/code&gt; (green) or
&lt;code&gt;Local fallback active&lt;/code&gt; (amber)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Routing&lt;/strong&gt; — &lt;code&gt;Standard route&lt;/code&gt; (gray) or &lt;code&gt;Escalated route&lt;/code&gt;
(purple) — set by the most recent triage decision&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model output&lt;/strong&gt; — &lt;code&gt;Live Groq calls enabled&lt;/code&gt; (blue) or
&lt;code&gt;Deterministic model output&lt;/code&gt; (slate)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each badge reads one piece of session state. The first reads&lt;br&gt;
&lt;code&gt;mem.fallback_mode&lt;/code&gt;. The second reads the last &lt;code&gt;RouteTrace&lt;/code&gt; step's&lt;br&gt;
&lt;code&gt;route&lt;/code&gt;. The third reads the env flag plus the most recent&lt;br&gt;
&lt;code&gt;live_call&lt;/code&gt; field. The user can see in one glance which of the&lt;br&gt;
four quadrants they're operating in:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzbvuapz95zbmz38kk2t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzbvuapz95zbmz38kk2t.png" alt="Cockpit hero card showing the memory, routing, and model-output runtime badges" width="799" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The screenshot above shows the cloud-memory + deterministic-model&lt;br&gt;
combination. If the cloud were unreachable, the first badge would&lt;br&gt;
be amber and say "Local fallback active." If&lt;br&gt;
&lt;code&gt;CASCADEFLOW_LIVE_GROQ&lt;/code&gt; were set to true (with a Groq key present), the&lt;br&gt;
third badge would say "Live Groq calls enabled."&lt;/p&gt;
&lt;h2&gt;
  
  
  The Demo Mode end-to-end test
&lt;/h2&gt;

&lt;p&gt;The most useful test in the suite checks that the hermetic&lt;br&gt;
quadrant (local memory, deterministic model) still produces a complete&lt;br&gt;
result with the same shape the live quadrant produces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/property/test_workflow.py
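&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_demo_mode_end_to_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Feature: openrecall, Demo_Mode end-to-end pipeline.

    Local memory + deterministic model produces a complete
    AnalysisResult with full route_trace and audit_trace, with
    len(audit_trace) == len(route_trace).
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# monkeypatch undoes these after the test, so no env state leaks into other tests
&lt;/span&gt;    &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CASCADEFLOW_LIVE_GROQ&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# dummy key: gets past the missing-key guard so the probe itself runs
&lt;/span&gt;    &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HINDSIGHT_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;monkeypatch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HINDSIGHT_BASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# unreachable
&lt;/span&gt;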
    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_workflow_for_demo_mode&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;analyze&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SAMPLE_CRASHLOOP_ALERT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;incident&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error_type&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;route_trace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audit_trace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;route_trace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;live_call&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;step&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;route_trace&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test runs in CI on every commit. The Hindsight base URL&lt;br&gt;
points at port 9 (the discard port; nothing is listening), so the&lt;br&gt;
probe fails fast and the system flips to local fallback. The Groq&lt;br&gt;
flag is off, so the model adapter returns deterministic output.&lt;br&gt;
Both axes are forced to their fallback positions and the workflow&lt;br&gt;
still has to produce a complete result.&lt;/p&gt;

&lt;p&gt;The first time I ran this test, it failed because I had a code&lt;br&gt;
path that returned an empty audit trace when the Groq adapter was&lt;br&gt;
deterministic. The fix was a one-liner — emit the audit entry&lt;br&gt;
unconditionally — but I would have shipped without it if the&lt;br&gt;
hermetic quadrant test hadn't existed.&lt;/p&gt;
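&lt;p&gt;One plausible shape for that bug is an audit append that only runs on the live branch. A sketch of the fixed shape, with hypothetical &lt;code&gt;RouteStep&lt;/code&gt;, &lt;code&gt;AuditEntry&lt;/code&gt;, and &lt;code&gt;Trace&lt;/code&gt; types (the project's real models aren't shown in the post): every step appends to both traces before anything depends on &lt;code&gt;live_call&lt;/code&gt;, so &lt;code&gt;len(audit_trace) == len(route_trace)&lt;/code&gt; holds in every quadrant.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class RouteStep:
    name: str
    live_call: bool


@dataclass
class AuditEntry:
    step: str
    source: str  # "live" or "deterministic"


@dataclass
class Trace:
    route_trace: list[RouteStep] = field(default_factory=list)
    audit_trace: list[AuditEntry] = field(default_factory=list)

    def record(self, name: str, live_call: bool) -> None:
        # Unconditional: both traces grow together, whatever the mode.
        self.route_trace.append(RouteStep(name, live_call))
        self.audit_trace.append(
            AuditEntry(name, "live" if live_call else "deterministic")
        )


trace = Trace()
for step in ("triage", "recall", "rca"):
    trace.record(step, live_call=False)  # hermetic quadrant
assert len(trace.audit_trace) == len(trace.route_trace) == 3
```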

&lt;h2&gt;
  
  
  The thing I almost got wrong
&lt;/h2&gt;

&lt;p&gt;My first instinct was to make the &lt;code&gt;_flip_to_fallback&lt;/code&gt; helper&lt;br&gt;
attempt one retry against the cloud before giving up. I deleted&lt;br&gt;
that idea after thinking about what it would mean for the audit&lt;br&gt;
trace.&lt;/p&gt;

&lt;p&gt;If the system retries silently, the user has no way to know whether&lt;br&gt;
they're seeing cloud memory or local memory in their results. The&lt;br&gt;
audit trace would show &lt;code&gt;Hindsight Cloud&lt;/code&gt; recall steps that&lt;br&gt;
sometimes succeeded and sometimes returned local results, with no&lt;br&gt;
indication of which. That's a worse user experience than a clean&lt;br&gt;
flip.&lt;/p&gt;

&lt;p&gt;The clean flip costs you the cloud for the rest of the session if&lt;br&gt;
the cloud has a transient blip. That's a real cost. But it buys&lt;br&gt;
you a session-wide invariant: every recall in this session came&lt;br&gt;
from the same source. The user can read the badge and trust the&lt;br&gt;
output.&lt;/p&gt;

&lt;p&gt;I picked the invariant. If a session-wide flip is the wrong choice&lt;br&gt;
for someone else's use case, the helper is a single function in a&lt;br&gt;
single module and they can swap in retry logic. But for a triage&lt;br&gt;
co-pilot where the analyst has to be able to trace decisions, I&lt;br&gt;
wanted the predictability.&lt;/p&gt;
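&lt;p&gt;For a use case that does want the retry, the audit-trace objection goes away only if every result carries its actual source. A sketch of that alternative, which is not what this project does (&lt;code&gt;recall_with_retry&lt;/code&gt; and &lt;code&gt;Sourced&lt;/code&gt; are hypothetical): a bounded number of cloud attempts, then local, with the source recorded per call so the trace never labels a local answer as cloud.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sourced:
    source: str  # "cloud" or "local", kept per call for the audit trace
    items: tuple[str, ...]


def recall_with_retry(
    query: str,
    cloud: Callable[[str], list[str]],
    local: Callable[[str], list[str]],
    retries: int = 1,
    backoff_s: float = 0.0,
) -> Sourced:
    """Try the cloud up to 1 + retries times, then fall back, tagging the source."""
    for attempt in range(retries + 1):
        try:
            return Sourced("cloud", tuple(cloud(query)))
        except Exception:
            if attempt != retries:
                time.sleep(backoff_s)  # brief pause before the next cloud attempt
    return Sourced("local", tuple(local(query)))
```

&lt;p&gt;The trade-off described above still applies: cloud and local answers can now interleave within one session, so the memory badge alone no longer tells the analyst where a given answer came from, and the per-call &lt;code&gt;source&lt;/code&gt; has to be surfaced in the trace as well.&lt;/p&gt;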

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;When your agent depends on two external services, you have four&lt;br&gt;
modes whether you planned for them or not. Plan for them. Pick a&lt;br&gt;
fallback contract — close, drop, flag — and apply it identically&lt;br&gt;
to every external call. Render the mode in the UI so the user&lt;br&gt;
always knows which quadrant they're in. Test the worst-case&lt;br&gt;
quadrant in CI on every commit.&lt;/p&gt;

&lt;p&gt;The&lt;br&gt;
&lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight&lt;/a&gt; memory&lt;br&gt;
client and the&lt;br&gt;
&lt;a href="https://github.com/lemony-ai/cascadeflow" rel="noopener noreferrer"&gt;cascadeflow&lt;/a&gt; routing&lt;br&gt;
layer both expose enough metadata to do this cleanly. The&lt;br&gt;
&lt;a href="https://vectorize.io/what-is-agent-memory" rel="noopener noreferrer"&gt;Vectorize agent-memory overview&lt;/a&gt;&lt;br&gt;
explains why memory has to be auditable in the first place; the&lt;br&gt;
&lt;a href="https://docs.cascadeflow.ai/" rel="noopener noreferrer"&gt;cascadeflow docs&lt;/a&gt; cover the&lt;br&gt;
live-vs-deterministic switch on the model side.&lt;/p&gt;

&lt;p&gt;Code is at &lt;a href="https://github.com/Dawn-Fighter/openrecall" rel="noopener noreferrer"&gt;https://github.com/Dawn-Fighter/openrecall&lt;/a&gt;.&lt;br&gt;
The four-quadrant mode matrix is&lt;br&gt;
documented in &lt;code&gt;docs/ARCHITECTURE.md&lt;/code&gt; and the&lt;br&gt;
&lt;code&gt;_flip_to_fallback&lt;/code&gt; helper is in &lt;code&gt;incident_agent/memory.py&lt;/code&gt;. If&lt;br&gt;
you copy one pattern from this project for your own agent, copy&lt;br&gt;
that one.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>agents</category>
      <category>monitoring</category>
      <category>python</category>
    </item>
  </channel>
</rss>
