<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: 林子超（子超）</title>
    <description>The latest articles on DEV Community by 林子超（子超） (@tznthou).</description>
    <link>https://dev.to/tznthou</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4016020%2F28afc7c4-63e1-4068-b0c4-17599c1d2e89.png</url>
      <title>DEV Community: 林子超（子超）</title>
      <link>https://dev.to/tznthou</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tznthou"/>
    <language>en</language>
    <item>
      <title>Parsing Claude Code's JSONL: patterns for a schema that keeps moving</title>
      <dc:creator>林子超（子超）</dc:creator>
      <pubDate>Sun, 05 Jul 2026 10:01:55 +0000</pubDate>
      <link>https://dev.to/tznthou/parsing-claude-codes-jsonl-patterns-for-a-schema-that-keeps-moving-2dcj</link>
      <guid>https://dev.to/tznthou/parsing-claude-codes-jsonl-patterns-for-a-schema-that-keeps-moving-2dcj</guid>
      <description>&lt;p&gt;Every conversation you have with Claude Code is written to disk as JSONL, under &lt;code&gt;~/.claude/projects/&lt;/code&gt;. Your decisions, your dead ends, the bug hunt that took three sessions: it is all there. You have probably never opened one.&lt;/p&gt;

&lt;p&gt;The catch: the format is an internal implementation detail. No documentation, no version field, no stability guarantee. The schema can change with any CLI update, and updates ship, at the current pace, almost daily.&lt;/p&gt;

&lt;p&gt;The patterns below come from building a read-only replay and search tool on top of these files, and from keeping it alive through a dozen CLI releases. Each pattern survived contact with real data. Four were learned the hard way, from bugs worth retelling.&lt;/p&gt;

&lt;h2&gt;The ground rule: you are an archaeologist, not a validator&lt;/h2&gt;

&lt;p&gt;A parser for someone else's internal format has a different job than a parser for your own API. Rejecting malformed input is not an option: the input is the historical record, and whatever is on disk is all there will ever be.&lt;/p&gt;

&lt;p&gt;The contract that follows from this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A bad line never kills the file.&lt;/strong&gt; Skip it, count it, move on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An unknown shape is preserved, not dropped.&lt;/strong&gt; You can re-parse later; you cannot un-drop data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalize at the boundary, once.&lt;/strong&gt; Downstream code (search index, UI, export) should never see the mess.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything below is one of these rules meeting reality.&lt;/p&gt;

&lt;h2&gt;Pattern 1: tolerant line parsing with an explicit whitelist&lt;/h2&gt;

&lt;p&gt;The naive loop (&lt;code&gt;JSON.parse&lt;/code&gt; each line, switch on &lt;code&gt;type&lt;/code&gt;) works on day one. The question is what happens when a CLI update introduces a &lt;code&gt;type&lt;/code&gt; nobody has seen before. This is not hypothetical; a real batch of them appears at the end of this post.&lt;/p&gt;

&lt;p&gt;The approach that holds up: keep an explicit whitelist of known types, and treat everything outside it as "parse failed, but preserved":&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;KNOWN_MESSAGE_TYPES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;system&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;queue-operation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;last-prompt&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;progress&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;attachment&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file-history-snapshot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;permission-mode&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;custom-title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai-title&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent-name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;pr-link&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;parseLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;ParsedLine&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;

  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;obj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="c1"&gt;// malformed line: skip, never throw&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;parseFailed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;KNOWN_MESSAGE_TYPES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="c1"&gt;// unknown type → keep the raw JSON string for later re-parse&lt;/span&gt;
  &lt;span class="c1"&gt;// known type → extracted fields are enough, raw can be dropped&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The whitelist does double duty as a storage policy. For known types, the extracted columns are sufficient and the raw JSON can be discarded; that alone reclaims most of the disk space. For unknown types, the raw line goes into an archive table untouched. When a future version of the parser learns the new shape, the evidence is still there.&lt;/p&gt;
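&lt;p&gt;A fuller version of the fragment above might look like the following. It is a minimal sketch, not the tool's actual code: the &lt;code&gt;ParsedLine&lt;/code&gt; fields, the &lt;code&gt;parseFile&lt;/code&gt; helper, and the shortened whitelist are assumptions made for illustration. It also guards a case the fragment leaves open: &lt;code&gt;JSON.parse&lt;/code&gt; accepts &lt;code&gt;null&lt;/code&gt;, numbers, and arrays, and reading &lt;code&gt;.type&lt;/code&gt; off &lt;code&gt;null&lt;/code&gt; would throw.&lt;/p&gt;

```typescript
// Hypothetical result shape; the field names are assumptions for this sketch.
type ParsedLine = {
  type: string
  parseFailed: boolean
  raw: string | null // kept only when the type is not on the whitelist
}

// Shortened whitelist, for illustration only.
const KNOWN_MESSAGE_TYPES = new Set(['user', 'assistant', 'system'])

function isPlainObject(value: unknown): value is { [key: string]: unknown } {
  if (typeof value !== 'object') return false
  if (value === null) return false
  return !Array.isArray(value)
}

function parseLine(line: string): ParsedLine | null {
  if (!line.trim()) return null

  let value: unknown
  try {
    value = JSON.parse(line)
  } catch {
    return null // malformed line: skip, never throw
  }
  // 'null', '42' and '[1]' are valid JSON but not entries
  if (!isPlainObject(value)) return null

  const type = typeof value.type === 'string' ? value.type : 'unknown'
  const parseFailed = !KNOWN_MESSAGE_TYPES.has(type)
  // unknown type: keep the raw line; known type: raw can be dropped
  return { type, parseFailed, raw: parseFailed ? line : null }
}

// Parses a whole file; bad lines are counted, never fatal.
function parseFile(text: string): { lines: ParsedLine[]; skipped: number } {
  const lines: ParsedLine[] = []
  let skipped = 0
  for (const rawLine of text.split('\n')) {
    if (!rawLine.trim()) continue // blank lines are not errors
    const parsed = parseLine(rawLine)
    if (parsed === null) skipped += 1
    else lines.push(parsed)
  }
  return { lines, skipped }
}
```

&lt;p&gt;The &lt;code&gt;skipped&lt;/code&gt; counter is what makes "skip it, count it, move on" observable: a sudden jump after a CLI update is an early hint that something about the format moved.&lt;/p&gt;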

&lt;p&gt;One more detail that pays off: cap the length of identifier fields (&lt;code&gt;uuid&lt;/code&gt;, &lt;code&gt;requestId&lt;/code&gt;) at something sane like 128 chars before trusting them. Parsing files you do not control calls for a little paranoia at the boundary, and it is cheap.&lt;/p&gt;
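&lt;p&gt;One way to apply that cap, as a sketch. The helper name &lt;code&gt;safeId&lt;/code&gt; is hypothetical, and rejecting an over-long value (rather than truncating it) is this sketch's own choice: a truncated identifier could collide with a real one.&lt;/p&gt;

```typescript
const MAX_ID_LENGTH = 128

// Returns the identifier when it is a non-empty string of sane length, else null.
function safeId(value: unknown): string | null {
  if (typeof value !== 'string') return null
  if (value.length === 0) return null
  if (value.length > MAX_ID_LENGTH) return null
  return value
}
```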

&lt;h2&gt;Pattern 2: version your derived data, not just your schema&lt;/h2&gt;

&lt;p&gt;Preserving unknowns only matters if you can act on them later. The mechanism is a &lt;code&gt;SUMMARY_VERSION&lt;/code&gt; integer stored per session. When the parser learns new tricks, bump the version; the indexer sees stale versions and re-parses those sessions automatically on the next sync.&lt;/p&gt;

&lt;p&gt;This turns "the schema changed again" from a migration crisis into a routine: extend the parser, bump the version, let the backfill run. No manual steps, no data loss, no "please delete your index and start over" release notes.&lt;/p&gt;
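&lt;p&gt;The stale check itself can be tiny. The sketch below assumes a per-session row with a &lt;code&gt;summaryVersion&lt;/code&gt; column; the row shape, the function name, and the version number are illustrative, not taken from the tool.&lt;/p&gt;

```typescript
const SUMMARY_VERSION = 3 // bump whenever the parser learns a new shape

// Hypothetical row shape for the per-session index table.
type SessionRow = { sessionId: string; summaryVersion: number | null }

// Sessions whose stored version lags the parser's are due for a re-parse.
// A null version means the session was never summarized at all.
function findStaleSessions(rows: SessionRow[], current: number = SUMMARY_VERSION): string[] {
  return rows
    .filter((row) => row.summaryVersion === null || current > row.summaryVersion)
    .map((row) => row.sessionId)
}
```

&lt;p&gt;Running this on every sync is what makes the backfill automatic: after a bump, every older session shows up in the result until it has been re-parsed and stamped with the new version.&lt;/p&gt;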

&lt;h2&gt;War story 1: the lone surrogates&lt;/h2&gt;

&lt;p&gt;One day the indexer started producing strings that crashed downstream consumers. The cause: some JSONL lines contained &lt;em&gt;unpaired UTF-16 surrogates&lt;/em&gt;. Half an emoji, lurking in a tool-error message.&lt;/p&gt;

&lt;p&gt;How does half an emoji end up on disk? Older Claude Code versions (up to around 2.1.132, judging by the archived sessions) truncated long tool outputs at a fixed length, and the cut sometimes landed mid-emoji, between the two UTF-16 code units of a surrogate pair. &lt;code&gt;JSON.stringify&lt;/code&gt; happily writes the lone surrogate as a &lt;code&gt;\udXXX&lt;/code&gt; escape, the file looks like clean ASCII, and &lt;code&gt;JSON.parse&lt;/code&gt; faithfully reconstructs the broken string at read time. The corruption stays invisible until something refuses it: SQLite, an IPC bridge, a &lt;code&gt;TextEncoder&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The fix is one line, &lt;em&gt;if&lt;/em&gt; it lands in the right place:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// at the parser's exit boundary, applied to every extracted string&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;ensureWellFormed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toWellFormed&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;// lone surrogates → U+FFFD&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The placement is the actual lesson. Normalize once, at ingestion, and every consumer downstream (search index, renderer, Markdown export) gets to assume well-formed strings forever. Unicode normalization (NFC/NFD) deliberately stays out of this step: it would change user-visible text, which an archival tool has no business doing. Fix what is broken, touch nothing else.&lt;/p&gt;

&lt;p&gt;(&lt;code&gt;String.prototype.toWellFormed()&lt;/code&gt; needs Node.js 20+. Before that, the surrogate scan has to be written by hand.)&lt;/p&gt;
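&lt;p&gt;For older runtimes, the hand-written scan could look like this. It is a minimal sketch with a hypothetical name, &lt;code&gt;toWellFormedFallback&lt;/code&gt;, intended to mirror what &lt;code&gt;toWellFormed()&lt;/code&gt; does: replace each unpaired surrogate with U+FFFD and leave everything else alone.&lt;/p&gt;

```typescript
// High surrogates occupy U+D800..U+DBFF, low surrogates U+DC00..U+DFFF.
function isHighSurrogate(unit: number): boolean {
  return !(0xd800 > unit || unit > 0xdbff)
}

function isLowSurrogate(unit: number): boolean {
  return !(0xdc00 > unit || unit > 0xdfff)
}

// Replaces every unpaired surrogate code unit with U+FFFD.
function toWellFormedFallback(s: string): string {
  let out = ''
  let i = 0
  while (s.length > i) {
    const unit = s.charCodeAt(i)
    if (isHighSurrogate(unit)) {
      const hasNext = s.length > i + 1
      if (hasNext ? isLowSurrogate(s.charCodeAt(i + 1)) : false) {
        out += s.slice(i, i + 2) // valid pair: copy both units
        i += 2
        continue
      }
      out += '\ufffd' // high surrogate with no partner
    } else if (isLowSurrogate(unit)) {
      out += '\ufffd' // low surrogate that no high surrogate introduced
    } else {
      out += s.charAt(i)
    }
    i += 1
  }
  return out
}
```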

&lt;h2&gt;War story 2: the tokens that counted themselves twice&lt;/h2&gt;

&lt;p&gt;The tool's token dashboard once reported usage numbers roughly &lt;strong&gt;2.3× higher&lt;/strong&gt; than reality, measured across a few hundred real sessions. The cause is a JSONL quirk worth knowing even if you never touch tokens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One API response can become several JSONL lines.&lt;/strong&gt; When a response contains multiple content blocks (text plus tool calls, for instance), Claude Code writes one &lt;code&gt;assistant&lt;/code&gt; entry per block, and each entry carries a &lt;em&gt;copy of the same&lt;/em&gt; &lt;code&gt;usage&lt;/code&gt; object. Sum them naively and every multi-block turn is counted once per block.&lt;/p&gt;

&lt;p&gt;The entries share a &lt;code&gt;requestId&lt;/code&gt;, which is the dedup key. But there is a trap inside the trap: it is tempting to &lt;em&gt;merge&lt;/em&gt; the entries into one logical message. Don't: entries of different requests can interleave on disk (streaming order), and merging would scramble the conversation. The entries themselves are fine; only the usage is duplicated.&lt;/p&gt;

&lt;p&gt;So: keep every entry, and null out the usage fields on all but the last entry per &lt;code&gt;requestId&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;deduplicateTokensByRequestId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ParsedLine&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="nx"&gt;ParsedLine&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lastIndex&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;lastIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;assistant&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;lastIndex&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;requestId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;
      &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;inputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;outputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;cacheReadTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;cacheCreationTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;line&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The general lesson: &lt;strong&gt;one JSONL line is not one logical event.&lt;/strong&gt; Never assume a 1:1 mapping between physical lines and semantic units in a format you do not own.&lt;/p&gt;
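&lt;p&gt;To see the size of the error on a toy case, compare a naive sum with one that counts each &lt;code&gt;requestId&lt;/code&gt; once. The &lt;code&gt;Entry&lt;/code&gt; shape and both function names are assumptions for this sketch, and it looks at output tokens only.&lt;/p&gt;

```typescript
// Hypothetical minimal entry shape for this sketch.
type Entry = { requestId: string | null; outputTokens: number | null }

// Adds up every line as if it were its own API response.
function naiveSum(entries: Entry[]): number {
  return entries.reduce((sum, e) => sum + (e.outputTokens ?? 0), 0)
}

// Counts each requestId once; entries without a requestId count individually.
function sumOncePerRequest(entries: Entry[]): number {
  const perRequest: { [requestId: string]: number } = Object.create(null)
  let withoutRequest = 0
  for (const entry of entries) {
    const tokens = entry.outputTokens ?? 0
    if (entry.requestId === null) withoutRequest += tokens
    else perRequest[entry.requestId] = tokens // later copies overwrite earlier ones
  }
  return Object.values(perRequest).reduce((sum, n) => sum + n, withoutRequest)
}

// One response split into three blocks, plus a single-block response.
const sample: Entry[] = [
  { requestId: 'req_1', outputTokens: 120 },
  { requestId: 'req_1', outputTokens: 120 },
  { requestId: 'req_1', outputTokens: 120 },
  { requestId: 'req_2', outputTokens: 50 },
]
const naiveTotal = naiveSum(sample) // 410
const realTotal = sumOncePerRequest(sample) // 170
```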

&lt;h2&gt;War story 3: resumed sessions replay the past&lt;/h2&gt;

&lt;p&gt;When a Claude Code session is resumed, the new JSONL file starts with copies of messages from the original session: same &lt;code&gt;uuid&lt;/code&gt;, same content, written again. Index both files naively and every resumed conversation shows up with duplicated history.&lt;/p&gt;

&lt;p&gt;The remedy is UUID-level dedup against what is already indexed. The trap hiding inside that fix: the dedup query must &lt;strong&gt;exclude the session currently being indexed&lt;/strong&gt;. Otherwise, re-indexing an existing session matches its own previously-indexed messages, concludes that every line is a duplicate, and quietly drops the entire session. A dedup check that can self-match is a data-loss machine with good intentions.&lt;/p&gt;
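&lt;p&gt;The self-match guard is easiest to see with an in-memory stand-in for the index. In the sketch below, &lt;code&gt;UuidOwnerLookup&lt;/code&gt;, &lt;code&gt;Message&lt;/code&gt;, and &lt;code&gt;dropReplayed&lt;/code&gt; are hypothetical names; a real index would answer the same question with a query that filters out the current session's own rows.&lt;/p&gt;

```typescript
// Anything with a Map-like get() works here: message uuid in, owning session out.
interface UuidOwnerLookup {
  get(uuid: string): string | undefined
}

type Message = { uuid: string; text: string }

// Keeps messages that are new, or that this same session already owns.
// Only messages owned by a different session count as replayed history.
function dropReplayed(messages: Message[], sessionId: string, index: UuidOwnerLookup): Message[] {
  return messages.filter((message) => {
    const owner = index.get(message.uuid)
    return owner === undefined || owner === sessionId
  })
}
```

&lt;p&gt;Without the &lt;code&gt;owner === sessionId&lt;/code&gt; clause, re-indexing a session whose messages are already in the index would return an empty array, which is exactly the silent data loss described above.&lt;/p&gt;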

&lt;h2&gt;War story 4: screenshots will eat your index&lt;/h2&gt;

&lt;p&gt;Claude Code conversations can contain images: screenshots pasted into the prompt, arriving as content blocks with base64 data inline. Store message content verbatim and a handful of screenshots will outweigh thousands of text messages in the database.&lt;/p&gt;

&lt;p&gt;The pattern: strip the payload, keep the shape.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;block&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[base64-stripped]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The block structure survives, so the UI can still render an "image was here" placeholder at the right position, and a &lt;code&gt;has_image&lt;/code&gt; flag stays queryable. Only the megabytes are gone. Same archaeology principle as everywhere else: preserve the &lt;em&gt;evidence of structure&lt;/em&gt;, not necessarily every byte of payload.&lt;/p&gt;
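&lt;p&gt;Applied to a whole message, the fragment above might be wrapped like this. The &lt;code&gt;ContentBlock&lt;/code&gt; type and the &lt;code&gt;stripImages&lt;/code&gt; helper are assumptions for this sketch; only the &lt;code&gt;image&lt;/code&gt; / &lt;code&gt;base64&lt;/code&gt; check and the placeholder string come from the fragment.&lt;/p&gt;

```typescript
// Hypothetical block shape; unknown extra fields are carried through untouched.
type ContentBlock = {
  type: string
  source?: { type: string; data?: string; [key: string]: unknown }
  [key: string]: unknown
}

// Returns a copy of the blocks with inline image payloads replaced by a marker,
// plus a flag that can be stored alongside the message for querying.
function stripImages(blocks: ContentBlock[]): { blocks: ContentBlock[]; hasImage: boolean } {
  const hasImage = blocks.some((block) => block.type === 'image')
  const stripped = blocks.map((block): ContentBlock => {
    const source = block.source
    if (block.type !== 'image' || source === undefined || source.type !== 'base64') {
      return block
    }
    return { ...block, source: { ...source, data: '[base64-stripped]' } }
  })
  return { blocks: stripped, hasImage }
}
```

&lt;p&gt;Because the function copies rather than mutates, the original blocks stay intact for any caller that still needs the payload, for example to compute a size or hash before discarding it.&lt;/p&gt;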

&lt;h2&gt;The schema will move again&lt;/h2&gt;

&lt;p&gt;In case "the schema keeps moving" sounds abstract, here is what diffing real session files before and after one CLI release (v2.1.168, June 2026) turned up: top-level attribution fields on assistant entries (which skill, plugin, or MCP server produced a reply), &lt;code&gt;image&lt;/code&gt; content blocks, structured &lt;code&gt;system&lt;/code&gt; subtypes carrying API error status codes, and an &lt;code&gt;edited_text_file&lt;/code&gt; attachment type. Four schema extensions, zero announcements. A normal month.&lt;/p&gt;

&lt;p&gt;With the patterns above, absorbing that release was: extend the whitelist, extract the new fields, bump &lt;code&gt;SUMMARY_VERSION&lt;/code&gt;, ship. The sessions written before the parser update backfilled themselves on the next sync. Nothing was lost in the weeks where the parser did not yet understand the new shapes: the unknown parts were sitting in the archive table, waiting.&lt;/p&gt;

&lt;h2&gt;Takeaways&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Skip bad lines, never throw.&lt;/strong&gt; The file is the historical record; the parser's opinion of it is irrelevant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whitelist known shapes; archive unknown ones raw.&lt;/strong&gt; Storage policy and forward compatibility in one mechanism.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version your derived data.&lt;/strong&gt; Re-parsing should be a routine background event, not a migration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalize at the ingestion boundary, exactly once&lt;/strong&gt;, and only what is actually broken (&lt;code&gt;toWellFormed&lt;/code&gt;, yes; NFC, no).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distrust the line/event mapping.&lt;/strong&gt; Duplicated &lt;code&gt;usage&lt;/code&gt; across entries, replayed messages across files: physical lines lie.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To see these patterns in production context, the tool is open source: &lt;a href="https://github.com/tznthou/ccRewind" rel="noopener noreferrer"&gt;ccRewind on GitHub&lt;/a&gt;, a read-only, offline replay and search tool for Claude Code history. It never writes a byte to &lt;code&gt;~/.claude/&lt;/code&gt;. Why that constraint exists, and what it cost, is the next post.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclosure: this post was drafted with Claude and edited by the human who debugged every story in it. The drafting sessions are, naturally, JSONL files under &lt;code&gt;~/.claude/projects/&lt;/code&gt; now.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>typescript</category>
      <category>parsing</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
