<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jiří Joneš</title>
    <description>The latest articles on DEV Community by Jiří Joneš (@ghostfactory).</description>
    <link>https://dev.to/ghostfactory</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3978256%2Fa0880912-9473-45ec-b8fe-4cc0c7bc6e03.png</url>
      <title>DEV Community: Jiří Joneš</title>
      <link>https://dev.to/ghostfactory</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ghostfactory"/>
    <language>en</language>
    <item>
      <title>How to debug AI agent failures for $0 - VCR Cassette Replay explained</title>
      <dc:creator>Jiří Joneš</dc:creator>
      <pubDate>Tue, 16 Jun 2026 10:28:01 +0000</pubDate>
      <link>https://dev.to/ghostfactory/how-to-debug-ai-agent-failures-for-0-vcr-cassette-replay-explained-58ec</link>
      <guid>https://dev.to/ghostfactory/how-to-debug-ai-agent-failures-for-0-vcr-cassette-replay-explained-58ec</guid>
      <description>&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;Your agent failed. So you run it again with the exact same inputs.&lt;/p&gt;

&lt;p&gt;It succeeds. Or it fails differently. You are chasing a Heisenbug.&lt;/p&gt;

&lt;p&gt;This is the fundamental problem with debugging AI systems. LLM sampling is stochastic. Tool calls hit live APIs. External state changes between runs. Re-running is not replaying. The original execution context is gone forever.&lt;/p&gt;

&lt;p&gt;There is also the cost trap. Every debug retry is another live LLM API call. If your agent makes ten reasoning steps and calls GPT-4o or Claude Sonnet five times per session, debugging a single logic error burns real API credits with zero guarantee you will reproduce the original failure.&lt;/p&gt;

&lt;h2&gt;The VCR Cassette pattern&lt;/h2&gt;

&lt;p&gt;To escape the cost trap and defeat non-determinism, stop re-running live models. Use the VCR Cassette pattern instead.&lt;/p&gt;

&lt;p&gt;A cassette is a database-backed, immutable snapshot of the exact payload stream from a real agent run. When your agent executes in production, it generates a tree of spans: LLM calls, tool executions, reasoning steps. A cassette captures every raw payload, meaning the exact inputs sent to the LLM and the exact outputs returned, with millisecond precision.&lt;/p&gt;

&lt;p&gt;The pattern splits debugging into two phases: &lt;strong&gt;record&lt;/strong&gt; and &lt;strong&gt;replay&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;During replay, the system feeds the exact historical payload stream back from the database. The live LLM is never called. External tools are never executed. The result is strictly deterministic — same span data, stochastic model completely bypassed.&lt;/p&gt;

&lt;p&gt;The analogy is a VCR from the 1980s. When you record a sports match, you capture the state of reality. You can rewind and replay it as many times as you want. The athletes are not playing again — you are watching the deterministic tape. Cassette replay brings this exact mechanic to AI agent debugging.&lt;/p&gt;
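
&lt;p&gt;A minimal Python sketch of the record and replay split, assuming an in-memory list in place of the database. The &lt;code&gt;Recorder&lt;/code&gt; and &lt;code&gt;Cassette&lt;/code&gt; classes and the &lt;code&gt;fake_llm&lt;/code&gt; function are hypothetical names for illustration, not part of Span Chain:&lt;/p&gt;

```python
import json


class Cassette:
    """Immutable snapshot of the payloads captured during one run (hypothetical)."""

    def __init__(self, payloads):
        # Serialize once so later mutation of the caller's objects cannot alter the tape.
        self._frozen = tuple(json.dumps(p, sort_keys=True) for p in payloads)

    def replay(self):
        # Yield fresh copies in recorded order; nothing live is called.
        for raw in self._frozen:
            yield json.loads(raw)


class Recorder:
    """Wraps a live callable and stores every input/output pair it sees."""

    def __init__(self, live_call):
        self._live_call = live_call
        self._payloads = []

    def call(self, name, request):
        response = self._live_call(request)
        self._payloads.append({"span": name, "input": request, "output": response})
        return response

    def to_cassette(self):
        return Cassette(self._payloads)


calls = {"count": 0}


def fake_llm(prompt):
    # Stand-in for a stochastic, billable model call.
    calls["count"] += 1
    return "answer #%d to %s" % (calls["count"], prompt)


# Record phase: the live callable runs exactly twice.
recorder = Recorder(fake_llm)
recorder.call("llm_call", "plan the task")
recorder.call("llm_call", "pick a tool")
cassette = recorder.to_cassette()

# Replay phase: read the tape as often as needed.
first = list(cassette.replay())
second = list(cassette.replay())
```

&lt;p&gt;Both replays return identical payloads and the call counter stays at two, because replay only reads the frozen tape.&lt;/p&gt;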

&lt;h2&gt;How Span Chain implements it&lt;/h2&gt;

&lt;p&gt;In Span Chain, the VCR Cassette pattern is a first-class citizen of the architecture, implemented in the Elixir/OTP backend.&lt;/p&gt;

&lt;p&gt;Recording is handled by &lt;code&gt;Cassettes.record(run_id)&lt;/code&gt;. The backend reads all payload rows for the specified run from the Ledger, ordered by &lt;code&gt;epoch_id&lt;/code&gt; and &lt;code&gt;seq&lt;/code&gt;, and inserts a &lt;code&gt;%Cassette{}&lt;/code&gt; snapshot. Span Chain follows a payload-first principle: it stores raw, unmodified payload maps, with no truncation and no loss of nested data.&lt;/p&gt;

&lt;p&gt;The replay path uses the exact same ingestion pipeline as live traffic. Data flows through &lt;code&gt;SessionGenServer&lt;/code&gt; (which computes the hash), into the &lt;code&gt;BufferProducer&lt;/code&gt; queue, through the Broadway pipeline, and into &lt;code&gt;Ledger.insert_batch&lt;/code&gt;. Because it runs under a brand new &lt;code&gt;run_id&lt;/code&gt;, it computes a fresh SHA-256 hash chain. Once replay finishes, the system calls &lt;code&gt;verify_ledger&lt;/code&gt; automatically. If the ingestion is clean, you get &lt;code&gt;hash_valid: true&lt;/code&gt; — cryptographic proof that the replay is structurally sound.&lt;/p&gt;

&lt;p&gt;Finally, the &lt;code&gt;Evals.Comparator&lt;/code&gt; performs a structural tree-diff between the source run and the replay. It pairs spans by name and sibling position, flags &lt;code&gt;span_added&lt;/code&gt;, &lt;code&gt;span_removed&lt;/code&gt;, and &lt;code&gt;duration_diff&lt;/code&gt;, and marks the exact &lt;code&gt;deviation_point&lt;/code&gt;: the first divergent span in every branch.&lt;/p&gt;
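
&lt;p&gt;One way to picture the comparison is the Python sketch below. It is not the &lt;code&gt;Evals.Comparator&lt;/code&gt; source: the span dictionaries, the &lt;code&gt;duration_ms&lt;/code&gt; field, and the reading of "sibling position" as the nth occurrence of a name among siblings are all assumptions made for illustration:&lt;/p&gt;

```python
def keyed(children):
    """Index sibling spans by (name, nth occurrence of that name)."""
    seen, out = {}, {}
    for span in children:
        n = seen.get(span["name"], 0)
        seen[span["name"]] = n + 1
        out[(span["name"], n)] = span
    return out


def diff_spans(a_children, b_children, diverged=False):
    """Return a flat list of divergences between two sibling lists.

    deviation_point is True only for the first divergent span on a
    root-to-leaf path, i.e. when no ancestor has already diverged.
    """
    a_map, b_map = keyed(a_children), keyed(b_children)
    keys = list(a_map) + [k for k in b_map if k not in a_map]
    diffs = []
    for key in keys:
        a, b = a_map.get(key), b_map.get(key)
        entry = None
        if b is None:
            entry = {"type": "span_removed", "span_name": key[0]}
        elif a is None:
            entry = {"type": "span_added", "span_name": key[0]}
        elif a["duration_ms"] != b["duration_ms"]:
            entry = {"type": "duration_diff", "span_name": key[0],
                     "val_a": a["duration_ms"], "val_b": b["duration_ms"]}
        if entry is not None:
            entry["deviation_point"] = not diverged
            diffs.append(entry)
        if a is not None and b is not None:
            # Only paired spans have two subtrees to compare.
            below = diverged or entry is not None
            diffs.extend(diff_spans(a.get("children", []),
                                    b.get("children", []), below))
    return diffs


run_a = [{"name": "agent_run", "duration_ms": 1500, "children": [
    {"name": "llm_call", "duration_ms": 1200,
     "children": [{"name": "parse", "duration_ms": 5}]},
    {"name": "tool_call", "duration_ms": 300},
]}]
run_b = [{"name": "agent_run", "duration_ms": 1500, "children": [
    {"name": "llm_call", "duration_ms": 0,
     "children": [{"name": "parse", "duration_ms": 0}]},
    {"name": "llm_call", "duration_ms": 0},
]}]

diffs = diff_spans(run_a, run_b)
```

&lt;p&gt;In this toy input the first &lt;code&gt;llm_call&lt;/code&gt; is a deviation point, while its &lt;code&gt;parse&lt;/code&gt; child is reported but not flagged, because its parent already diverged.&lt;/p&gt;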

&lt;h2&gt;Instrumenting your agent&lt;/h2&gt;

&lt;p&gt;Use the Span Chain Python SDK. It is intentionally dumb — just an OTLP exporter. All cryptographic sequencing happens server-side.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ghostfactory&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;gf&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;gf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:4000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GF_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@gf.trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;span&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_attributes&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once spans are flushed and a cassette has been recorded for the run, trigger a replay:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:4001/api/cassettes/&amp;lt;cassette_id&amp;gt;/replay &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;your_token&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response is an immediate &lt;code&gt;202 Accepted&lt;/code&gt; with a &lt;code&gt;job_id&lt;/code&gt;. Poll the job until its &lt;code&gt;status&lt;/code&gt; is &lt;code&gt;completed&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"run_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"replay-abc-123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"span_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hash_valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"diff"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"duration_diff"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"span_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llm_call"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deviation_point"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"val_a"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"val_b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;The difference&lt;/h2&gt;

&lt;p&gt;Without cassette replay: every retry is a live LLM call that is non-deterministic and costs money, and structural deviations between runs stay invisible. You are reading plain JSON logs and guessing.&lt;/p&gt;

&lt;p&gt;With Span Chain: replay reads from the historical cassette. No live APIs hit. Cost is $0. The Comparator gives you an explicit structural diff with the exact &lt;code&gt;deviation_point&lt;/code&gt;. And because replay flows through the real pipeline, it generates its own SHA-256 hash chain — &lt;code&gt;hash_valid: true&lt;/code&gt;. Your debug session leaves a tamper-evident audit trail.&lt;/p&gt;

&lt;p&gt;Stop guessing what your agent did. Record the reality, replay it for free, prove it cryptographically.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Span Chain is MIT licensed and self-hosted. &lt;code&gt;git clone&lt;/code&gt;, set &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; and &lt;code&gt;GF_API_KEY&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt;, then &lt;code&gt;docker compose up&lt;/code&gt;. The repo is at &lt;a href="https://github.com/ghostfactory-art/spanchain" rel="noopener noreferrer"&gt;github.com/ghostfactory-art/spanchain&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: &lt;code&gt;spanchain&lt;/code&gt; will be on PyPI shortly. For now: &lt;code&gt;pip install ./sdk/python&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>python</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>Why your AI agent logs are not evidence and what to do about it</title>
      <dc:creator>Jiří Joneš</dc:creator>
      <pubDate>Fri, 12 Jun 2026 14:51:47 +0000</pubDate>
      <link>https://dev.to/ghostfactory/why-your-ai-agent-logs-are-not-evidence-and-what-to-do-about-it-4j61</link>
      <guid>https://dev.to/ghostfactory/why-your-ai-agent-logs-are-not-evidence-and-what-to-do-about-it-4j61</guid>
      <description>&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;Your agent failed in production. You look at the logs. They don't give you the full picture. So you run the agent again with the exact same inputs. It succeeds. Or it fails differently. Classic.&lt;/p&gt;

&lt;p&gt;LLM calls, time-dependent code, tool side effects, and stochastic sampling mean "same inputs, same outputs" is completely false for AI systems. You have no idea what actually happened in the first run. The original context is gone, and re-running is not replaying.&lt;/p&gt;

&lt;p&gt;This is the problem Span Chain was built to solve.&lt;/p&gt;

&lt;h2&gt;Logs vs evidence&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Logs are claims. Not evidence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A standard log or trace is just a JSON blob. A buggy retention job can orphan a span. An attacker can rewrite it. The agent itself might hallucinate and log bad data.&lt;/p&gt;

&lt;p&gt;If your trace data is mutable, it is not evidence. It is a claim about what happened, written after the fact. Span Chain treats every event as an immutable, cryptographically sealed record. You cannot rewrite history without breaking the chain.&lt;/p&gt;

&lt;h2&gt;What tamper-evident means in practice&lt;/h2&gt;

&lt;p&gt;Span Chain uses a SHA-256 hash chain. Every event during an agent session is appended to an immutable ledger. The hash input covers the sequence, the previous hash, the exact payload, the parent span, the run ID, and the epoch. Change one byte of an old span and the chain breaks.&lt;/p&gt;

&lt;p&gt;This is what separates Span Chain from standard LLM observability tools like LangSmith or Langfuse. Those show you what happened. Span Chain lets you prove it.&lt;/p&gt;

&lt;p&gt;Verification is a single API call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl http://localhost:4001/api/runs/your-run-id/verify &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &amp;lt;token&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The response either confirms the chain or reports the sequence number where it breaks:&lt;/p&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"span_count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"valid"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"chain_broken_at_seq"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One changed byte anywhere in history. You know immediately.&lt;/p&gt;
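
&lt;p&gt;The chain itself can be sketched in a few lines of Python. This illustrates the idea only: the field names, the canonical JSON encoding, and the all-zero genesis hash are assumptions, and the real ledger's byte layout may differ. The return values mirror the two sample responses above:&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" for the first event


def event_hash(prev_hash, event):
    # Canonical JSON so the same fields always serialize to the same bytes.
    material = json.dumps(
        {
            "seq": event["seq"],
            "prev_hash": prev_hash,
            "payload": event["payload"],
            "parent_span": event["parent_span"],
            "run_id": event["run_id"],
            "epoch": event["epoch"],
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def append(ledger, event):
    # Each row's hash commits to the hash of the row before it.
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(dict(event, hash=event_hash(prev_hash, event)))


def verify(ledger):
    prev_hash = GENESIS
    for row in ledger:
        event = {k: v for k, v in row.items() if k != "hash"}
        if event_hash(prev_hash, event) != row["hash"]:
            return {"valid": False, "chain_broken_at_seq": row["seq"]}
        prev_hash = row["hash"]
    return {"valid": True, "span_count": len(ledger)}


ledger = []
for seq in range(1, 13):
    append(ledger, {
        "seq": seq,
        "payload": {"output": "step %d" % seq},
        "parent_span": None,
        "run_id": "agent-run-001",
        "epoch": 0,
    })

ok = verify(ledger)

# Tamper with the stored payload of the row at seq 7, then verify again.
ledger[6]["payload"]["output"] = "edited"
broken = verify(ledger)
```

&lt;p&gt;Here &lt;code&gt;ok&lt;/code&gt; reports a valid chain of 12 spans, and &lt;code&gt;broken&lt;/code&gt; points at seq 7, the row whose payload was edited after the fact.&lt;/p&gt;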

&lt;h2&gt;The replay cost trap&lt;/h2&gt;

&lt;p&gt;Debugging by re-running the agent is a trap. Every retry is another live LLM call. That costs money and latency.&lt;/p&gt;

&lt;p&gt;Span Chain solves this with VCR-style cassette replay. It reads the exact payload stream from the database and feeds it back to the system. No LLM, no API credits. Replay costs $0.&lt;/p&gt;

&lt;p&gt;Here is how you instrument an agent with the Span Chain Python SDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;spanchain&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;gf&lt;/span&gt;

&lt;span class="n"&gt;gf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:4000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent-run-001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@gf.trace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llm_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;span&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool_call&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Span Chain SDK is intentionally dumb. It exports spans as OTLP to the backend and nothing else. All cryptographic sequencing happens server-side. The client cannot forge a clean chain even if it tries.&lt;/p&gt;

&lt;h2&gt;Model upgrades&lt;/h2&gt;

&lt;p&gt;When you swap models, your agent's behavior changes. How do you know what broke?&lt;/p&gt;

&lt;p&gt;Span Chain lets you replay old cassettes through the new model and run a structural comparison. The comparator flags the exact span where behavior diverged. Not just "Run B was slower" but the first point where the two runs split. If the new model added a tool call or skipped a step, you see it immediately. You stop guessing.&lt;/p&gt;

&lt;h2&gt;How I got here&lt;/h2&gt;

&lt;p&gt;I kept running into the same wall: agent fails, logs tell you nothing useful, you re-run and get a different result. Existing tools were not built for this. They produce mutable data with no replay capability.&lt;/p&gt;

&lt;p&gt;So I built Span Chain, an auditable harness for production AI agents. The backend runs on Elixir/OTP, where every agent session gets its own isolated BEAM process (~2 KB heap). A crash in one agent does not touch the others. That isolation is how Span Chain handled 1,000 concurrent agents and 10,000 spans at 571 spans/sec with 0 corrupted chains.&lt;/p&gt;

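&lt;p&gt;Span Chain is MIT licensed and self-hosted. Clone the repo and copy &lt;code&gt;.env.example&lt;/code&gt; to &lt;code&gt;.env&lt;/code&gt;, then set &lt;code&gt;POSTGRES_PASSWORD&lt;/code&gt; and &lt;code&gt;GF_API_KEY&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt; before running &lt;code&gt;docker compose up&lt;/code&gt;:&lt;/p&gt;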

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ghostfactory-art/spanchain
&lt;span class="nb"&gt;cd &lt;/span&gt;spanchain
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
docker compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The repo is at &lt;a href="https://github.com/ghostfactory-art/spanchain" rel="noopener noreferrer"&gt;github.com/ghostfactory-art/spanchain&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Footnote: EU AI Act Article 12 requires automatic event logging and traceability for high-risk AI systems (Annex III obligations expected from December 2027, pending formal adoption of the AI Omnibus agreed in May 2026). The law does not mandate tamper-evidence, but a log that can be silently rewritten is hard to defend as traceability. Span Chain gives you evidence-grade records that stand up to scrutiny.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note: &lt;code&gt;spanchain&lt;/code&gt; will be on PyPI shortly. For now, install from source:&lt;br&gt;
&lt;code&gt;pip install ./sdk/python&lt;/code&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>agents</category>
      <category>python</category>
      <category>opensource</category>
      <category>elixir</category>
    </item>
  </channel>
</rss>
