<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: vishal-dehurdle</title>
    <description>The latest articles on DEV Community by vishal-dehurdle (@vishaldehurdle).</description>
    <link>https://dev.to/vishaldehurdle</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3964596%2F428fa4af-da0e-4e15-8b7d-c7e926e86cd1.jpeg</url>
      <title>DEV Community: vishal-dehurdle</title>
      <link>https://dev.to/vishaldehurdle</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vishaldehurdle"/>
    <language>en</language>
    <item>
      <title>I Used Lyapunov Stability Theory to Monitor LLM Agents — Here's What Actually Worked and What Didn't</title>
      <dc:creator>vishal-dehurdle</dc:creator>
      <pubDate>Tue, 02 Jun 2026 12:12:56 +0000</pubDate>
      <link>https://dev.to/vishaldehurdle/i-used-lyapunov-stability-theory-to-monitor-llm-agents-heres-what-actually-worked-and-what-didnt-4i71</link>
      <guid>https://dev.to/vishaldehurdle/i-used-lyapunov-stability-theory-to-monitor-llm-agents-heres-what-actually-worked-and-what-didnt-4i71</guid>
      <description>&lt;h2&gt;
  
  
  The Elephant in the Room: "Isn't This Just max_iterations?"
&lt;/h2&gt;

&lt;p&gt;Let me address this up front.&lt;/p&gt;

&lt;p&gt;If you're building a ReAct loop with a single LLM and 10 tool calls, you do not need a physics-inspired monitoring library. Set &lt;code&gt;max_iterations=10&lt;/code&gt;, add a budget cap, and move on. LangGraph, CrewAI, and every modern agent framework already support this natively.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;state-harness&lt;/strong&gt; because I ran into a problem that &lt;code&gt;max_iterations&lt;/code&gt; doesn't solve. And after benchmarking it across 2,367 runs, I also learned what it &lt;em&gt;can't&lt;/em&gt; do — which I'll be equally transparent about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem That max_iterations Doesn't Solve
&lt;/h2&gt;

&lt;p&gt;There are two specific scenarios where simple iteration caps fall short:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Search-Tree Agents (MCTS, Beam Search)
&lt;/h3&gt;

&lt;p&gt;Advanced coding agents — the kind that solve SWE-bench tasks, or the architecture behind tools like Devin — don't run a flat loop. They explore a &lt;strong&gt;search tree&lt;/strong&gt;. Each node branches into multiple candidate solutions. A node that spirals doesn't just waste one turn; it inflates every downstream branch.&lt;/p&gt;

&lt;p&gt;In a 50-node search tree, you can't set &lt;code&gt;max_iterations=50&lt;/code&gt; and call it a day. The agent isn't iterating; it's &lt;em&gt;branching&lt;/em&gt;. Cumulative token usage can grow roughly quadratically with depth, because every node re-sends the context accumulated along its path. A single stuck branch can burn thousands of tokens before the tree-level budget cap even notices, because the per-branch cost looks normal in isolation.&lt;/p&gt;
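
&lt;p&gt;One way to see where quadratic growth can come from is a toy cost model. The sketch below assumes every node re-sends its whole path from the root and that each step adds a fixed 500 tokens; both numbers and the helper name &lt;code&gt;cumulative_tokens&lt;/code&gt; are assumptions for illustration, not measurements from a real agent.&lt;/p&gt;

```python
def cumulative_tokens(depths, tokens_per_step=500):
    """Total prompt tokens when each node re-sends its whole path from the root.

    `depths` holds the depth of every expanded node (root children = 1).
    `tokens_per_step` is an assumed, fixed cost per step on the path.
    """
    return sum(depth * tokens_per_step for depth in depths)


# A single branch that keeps going deeper: node depths 1, 2, ..., n.
for n in (5, 10, 20, 40):
    chain = range(1, n + 1)
    print(f"depth {n:>2}: {cumulative_tokens(chain):>7} tokens")
```

&lt;p&gt;Under these assumptions, doubling the depth of a single branch roughly quadruples its total cost, even though each individual step still adds only 500 tokens.&lt;/p&gt;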

&lt;h3&gt;
  
  
  2. Failure Pattern Aggregation at Scale
&lt;/h3&gt;

&lt;p&gt;If you run 100 agent tasks a day, you open LangSmith, look at the traces of the 5 that failed, and debug them manually. That works.&lt;/p&gt;

&lt;p&gt;If you run &lt;strong&gt;10,000+ tasks a day&lt;/strong&gt;, manual trace inspection no longer scales. Your observability bill alone (storing and indexing millions of multi-turn traces) becomes significant. What you actually need is to classify the failure pattern at the edge, with no extra LLM calls, and export it as a structured attribute to your metrics pipeline. Then your Grafana dashboard shows: &lt;em&gt;"This week, 40% of failures are retry storms on the SQL tool → add exponential backoff."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's not something &lt;code&gt;max_iterations&lt;/code&gt; gives you. It's not something LangSmith gives you (at least not without paying for indexing every trace). It's what state-harness was designed for.&lt;/p&gt;
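
&lt;p&gt;A minimal sketch of the aggregation step, assuming each failed run has already been exported as a record with a pattern label and a tool name. The field names, pattern labels, and sample records are made up for this example and are not state-harness output.&lt;/p&gt;

```python
from collections import Counter


def failure_shares(records):
    """Return {(pattern, tool): share of all failures}, largest share first."""
    counts = Counter((r["pattern"], r["tool"]) for r in records)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.most_common()}


# Made-up records standing in for one week of exported failure attributes.
records = (
    [{"pattern": "retry_storm", "tool": "sql"}] * 4
    + [{"pattern": "context_spiral", "tool": "search"}] * 3
    + [{"pattern": "budget_exhausted", "tool": "shell"}] * 3
)

for (pattern, tool), share in failure_shares(records).items():
    print(f"{share:.0%} of failures: {pattern} on {tool}")
```

&lt;p&gt;In production this counting would normally happen inside the metrics backend; the point is only that a short categorical label per run is enough to drive the dashboard.&lt;/p&gt;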




&lt;h2&gt;
  
  
  The Core Insight: Growth-Ratio Normalization
&lt;/h2&gt;

&lt;p&gt;In dynamical systems and control theory, &lt;a href="https://en.wikipedia.org/wiki/Lyapunov_stability" rel="noopener noreferrer"&gt;Lyapunov stability&lt;/a&gt; describes whether a system stays near an equilibrium after a small perturbation or drifts away from it.&lt;/p&gt;

&lt;p&gt;I modeled LLM agent token consumption as a discrete-time dynamical system where the "energy" V(k) is a function of cumulative token growth at turn k. The stability criterion is straightforward: if the energy difference ΔV(k) = V(k+1) − V(k) is ≥ 0 for several consecutive steps, the monitor treats the system as diverging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; In any multi-turn conversation, token usage grows monotonically because the context window accumulates history. A naive Lyapunov monitor would trip on every healthy conversation — you'd get 100% false positives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The solution:&lt;/strong&gt; Instead of monitoring raw token counts, normalize each turn against a running baseline to compute a &lt;strong&gt;growth ratio&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Growth ratio ≈ 1.0 → the agent is consuming tokens at its expected rate (stable)&lt;/li&gt;
&lt;li&gt;Growth ratio &amp;gt; 2.0× for 3+ consecutive turns → the agent is consuming disproportionately more each turn (diverging)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This normalization is the key insight. It's analogous to the distinction between &lt;a href="https://en.wikipedia.org/wiki/Intensive_and_extensive_properties" rel="noopener noreferrer"&gt;intensive and extensive quantities&lt;/a&gt; in thermodynamics — monitoring density (ratio) rather than mass (absolute count).&lt;/p&gt;
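
&lt;p&gt;As a rough illustration of the growth-ratio rule described above, here is a plain-Python sketch. It is not state-harness's actual implementation: the class name &lt;code&gt;GrowthRatioSketch&lt;/code&gt;, the exponential moving average used as the baseline, and the smoothing factor are assumptions made for this example. Only the 2.0× threshold and the 3-consecutive-turn rule come from the text.&lt;/p&gt;

```python
class GrowthRatioSketch:
    """Trips when per-turn token usage exceeds `threshold` x baseline
    for `patience` consecutive turns. Illustrative only."""

    def __init__(self, threshold=2.0, patience=3, alpha=0.5):
        self.threshold = threshold  # ratio above which a turn counts as "hot"
        self.patience = patience    # consecutive hot turns needed to trip
        self.alpha = alpha          # EMA smoothing factor (assumed value)
        self.baseline = None        # running baseline of per-turn tokens
        self.streak = 0
        self.tripped = False

    def record_step(self, tokens_used):
        """Record one turn and return its growth ratio against the baseline."""
        if self.baseline is None:
            # First turn: nothing to compare against yet.
            self.baseline = float(tokens_used)
            return 1.0
        ratio = tokens_used / max(self.baseline, 1.0)
        self.streak = self.streak + 1 if ratio > self.threshold else 0
        if self.streak >= self.patience:
            self.tripped = True
        # Update the baseline after the comparison so a spike cannot hide itself.
        self.baseline = self.alpha * tokens_used + (1 - self.alpha) * self.baseline
        return ratio


# Steady context accumulation: usage grows, but in line with the baseline.
healthy = GrowthRatioSketch()
for tokens in [1000, 1100, 1200, 1300, 1400, 1500]:
    healthy.record_step(tokens)

# Runaway growth: each turn costs three times the previous one.
spiral = GrowthRatioSketch()
for tokens in [1000, 3000, 9000, 27000]:
    spiral.record_step(tokens)

print("healthy tripped:", healthy.tripped)
print("spiral tripped:", spiral.tripped)
```

&lt;p&gt;Because the baseline follows the conversation, ordinary history growth keeps the ratio close to 1.0, while a sustained multiplicative blow-up stays above the threshold long enough to trip.&lt;/p&gt;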




&lt;h2&gt;
  
  
  Integration: 5 Lines of Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;state_harness&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GrowthRatioGuard&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FailureReport&lt;/span&gt;

&lt;span class="n"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GrowthRatioGuard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;turn&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent_loop&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tokens_used&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FailureReport&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_guard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the guard trips, the diagnostic report classifies the failure pattern — at zero cost, with no LLM calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;⚠️  STABILITY TRIPPED at turn 12

Pattern: Context Accumulation Spiral (confidence: 92%)
  • Last 5 turns all exceeded 1.5× baseline (4/4 were accelerating).
  • Peak growth ratio: 5.2× baseline.
  • Without intervention, projected cost was $0.0396 (actual: $0.0039).

Suggested actions:
  🔴 1. Enable history compression in your agent loop.
  🟡 2. Lower the growth ratio threshold to 1.8×.
  🟢 3. Add a sliding-window context strategy.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The classified pattern and suggested actions export cleanly to OpenTelemetry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;opentelemetry&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;
&lt;span class="n"&gt;span&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_current_span&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attributes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_otel_attributes&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="c1"&gt;# Adds: state_harness.pattern, state_harness.confidence, etc.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Framework Integrations
&lt;/h3&gt;

&lt;p&gt;LangGraph:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.prebuilt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_react_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;state_harness.adapters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;monitor_graph&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_react_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;calculate&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;safe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;monitor_graph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;safe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Fix the login bug&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]})&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;safe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CrewAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
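&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;crewai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Crew&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;state_harness.adapters&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CrewAICallback&lt;/span&gt;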

&lt;span class="n"&gt;callback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CrewAICallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token_budget&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;crew&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Crew&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt; &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt; &lt;span class="n"&gt;step_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;step_callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kickoff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Benchmarks: What Worked and What Didn't
&lt;/h2&gt;

&lt;p&gt;We evaluated state-harness across &lt;strong&gt;2,367 total runs&lt;/strong&gt; with a 5-condition ablation study on three benchmarks.&lt;/p&gt;

&lt;h3&gt;
  
  
  What worked: Zero false positives on stable tasks
&lt;/h3&gt;

&lt;p&gt;Across &lt;strong&gt;1,136 MINT runs&lt;/strong&gt; (short-loop reasoning) and &lt;strong&gt;750 τ³-bench runs&lt;/strong&gt; (medium-loop customer service), state-harness &lt;strong&gt;never tripped once&lt;/strong&gt;. The growth-ratio normalization correctly identified these as stable conversations and introduced &amp;lt;2% token overhead.&lt;/p&gt;

&lt;p&gt;This is the most important result. A monitoring tool that interferes with healthy agents is worse than no monitoring at all.&lt;/p&gt;

&lt;h3&gt;
  
  
  What worked: Compute savings on search trees
&lt;/h3&gt;

&lt;p&gt;On &lt;strong&gt;SWE-bench Verified&lt;/strong&gt; (37 Django instances, Moatless-tools SearchTree agent, Gemini 2.5 Flash):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Compute (nodes)&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A. Baseline (no monitoring)&lt;/td&gt;
&lt;td&gt;945&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;B. + Lyapunov monitor only&lt;/td&gt;
&lt;td&gt;620&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;34.4%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D. Full-stack (Lyapunov+RG+VSA)&lt;/td&gt;
&lt;td&gt;580&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;38.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The monitor eliminated all max-budget burnout events (7 tasks hitting the 50-node ceiling → 0) and reduced wall time by 30%.&lt;/p&gt;

&lt;h3&gt;
  
  
  What didn't work: Improving resolve rates
&lt;/h3&gt;

&lt;p&gt;This is the honest part that most open-source projects would hide.&lt;/p&gt;

&lt;p&gt;We ran 3 independent trials per condition (333 total runs) to measure nondeterminism:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Condition&lt;/th&gt;
&lt;th&gt;Mean ± σ&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;A. Baseline&lt;/td&gt;
&lt;td&gt;44.1% ± 4.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;D. Full-stack&lt;/td&gt;
&lt;td&gt;40.5% ± 2.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E. Naive Cap&lt;/td&gt;
&lt;td&gt;45.9% ± 5.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;A naive budget cap achieves comparable resolve rates.&lt;/strong&gt; The spread across conditions (2.9%) is smaller than the run-to-run variation within a single condition (4.1% for the baseline), so these differences are indistinguishable from noise. state-harness doesn't make agents smarter; it makes failures &lt;em&gt;diagnosable&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bonus finding: The nondeterminism floor
&lt;/h3&gt;

&lt;p&gt;Both τ³-bench and SWE-bench converged on a &lt;strong&gt;~4–5% intrinsic nondeterminism floor&lt;/strong&gt; for Gemini 2.5 Flash, across customer-service and code tasks alike. This means any single-run benchmark comparison reporting performance deltas under 8% (roughly twice that floor) is statistically unreliable. If you see a paper claiming "our agent is 6% better," ask how many trials they ran.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Mechanisms (and an Honest Ablation)
&lt;/h2&gt;

&lt;p&gt;state-harness has three components, all written in Rust and exposed to Python via PyO3:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Lyapunov Monitor&lt;/strong&gt; (~1μs/step): The growth-ratio energy function described above.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RG Decimator&lt;/strong&gt; (~100μs/compress): TF-IDF-based history compression inspired by Renormalization Group theory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Holographic Engine&lt;/strong&gt; (~10μs/check): Semantic drift detection based on a vector symbolic architecture (VSA), using 10,000-dimensional bipolar vectors.&lt;/li&gt;
&lt;/ol&gt;
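
&lt;p&gt;To give a feel for the VSA idea, here is a small pure-Python sketch of drift detection with 10,000-dimensional bipolar vectors. It is not the Rust engine's implementation: the word-level encoding, the majority-vote bundling, and the example word lists are all assumptions chosen for brevity.&lt;/p&gt;

```python
import random

DIM = 10_000  # dimensionality quoted in the article


def random_hv(rng):
    """A random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def bundle(vectors):
    """Superpose vectors by elementwise majority vote (ties resolve to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalized dot product: 1.0 for identical vectors, near 0 for unrelated ones."""
    return sum(x * y for x, y in zip(a, b)) / DIM


rng = random.Random(0)
codebook = {}  # one fixed random hypervector per word


def encode(words):
    """Encode a bag of words as the bundle of its word hypervectors."""
    for word in words:
        if word not in codebook:
            codebook[word] = random_hv(rng)
    return bundle([codebook[word] for word in words])


task = encode(["fix", "login", "bug", "auth", "session"])
on_topic = encode(["fix", "login", "auth", "session", "token"])
off_topic = encode(["weather", "recipe", "pasta", "travel", "hotel"])

print("on-topic similarity: ", similarity(task, on_topic))
print("off-topic similarity:", similarity(task, off_topic))
```

&lt;p&gt;A drift check in this style would compare each turn's encoding against the task encoding and flag turns whose similarity falls toward zero.&lt;/p&gt;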

&lt;p&gt;&lt;strong&gt;The honest ablation result:&lt;/strong&gt; Lyapunov alone delivers &lt;strong&gt;~90% of the total benefit&lt;/strong&gt; (34.4% out of 38.6%). RG and VSA add incremental value. If you want maximum simplicity, just use the &lt;code&gt;GrowthRatioGuard&lt;/code&gt; with default settings and ignore the rest.&lt;/p&gt;
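
&lt;p&gt;The figures above can be re-derived from the node counts in the SWE-bench table with a few lines of arithmetic; the variable and function names below are just for this check:&lt;/p&gt;

```python
# Node counts taken from the SWE-bench Verified table.
baseline_nodes = 945   # A. Baseline (no monitoring)
lyapunov_nodes = 620   # B. + Lyapunov monitor only
full_stack_nodes = 580  # D. Full-stack (Lyapunov+RG+VSA)


def reduction_pct(before, after):
    """Percentage reduction in compute relative to `before`."""
    return 100 * (before - after) / before


lyapunov_reduction = reduction_pct(baseline_nodes, lyapunov_nodes)
full_stack_reduction = reduction_pct(baseline_nodes, full_stack_nodes)
share_from_lyapunov = 100 * lyapunov_reduction / full_stack_reduction

print(f"Lyapunov only: {lyapunov_reduction:.1f}%")
print(f"Full stack:    {full_stack_reduction:.1f}%")
print(f"Share of benefit from Lyapunov alone: {share_from_lyapunov:.0f}%")
```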




&lt;h2&gt;
  
  
  Who Should (and Shouldn't) Use This
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;If you're...&lt;/th&gt;
&lt;th&gt;Use state-harness?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Building a chatbot or RAG pipeline&lt;/td&gt;
&lt;td&gt;❌ No. These don't spiral.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running a simple ReAct agent (&amp;lt;10 turns)&lt;/td&gt;
&lt;td&gt;❌ No. &lt;code&gt;max_iterations&lt;/code&gt; is enough.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running coding/DevOps agents with search trees&lt;/td&gt;
&lt;td&gt;✅ Yes. Branch explosion is real.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Running 1000+ agent tasks/day in production&lt;/td&gt;
&lt;td&gt;✅ Yes. Edge-classified failure patterns at zero cost.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benchmarking agents and publishing results&lt;/td&gt;
&lt;td&gt;✅ Yes. The nondeterminism floor matters.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/vishal-dehurdle/state-harness" rel="noopener noreferrer"&gt;github.com/vishal-dehurdle/state-harness&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;code&gt;pip install state-harness&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research Paper:&lt;/strong&gt; &lt;a href="https://vishalvermalabs.com/papers/empirical-lyapunov-stability-agent-failure" rel="noopener noreferrer"&gt;vishalvermalabs.com/papers/empirical-lyapunov-stability-agent-failure&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built this as a research project exploring whether control theory can provide useful runtime guarantees for stochastic software. If you're running agents at scale and want zero-cost failure diagnostics, or if you're just curious about applying physics to AI systems, I'd love your feedback.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>rust</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
