<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ivan Maksimov</title>
    <description>The latest articles on DEV Community by Ivan Maksimov (@jazzmax).</description>
    <link>https://dev.to/jazzmax</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3995597%2Fb7b7c66b-b8bd-4b11-bf65-30a189f97993.jpg</url>
      <title>DEV Community: Ivan Maksimov</title>
      <link>https://dev.to/jazzmax</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jazzmax"/>
    <language>en</language>
    <item>
      <title>How I tracked down a 36GB memory leak in a Claude Code memory server</title>
      <dc:creator>Ivan Maksimov</dc:creator>
      <pubDate>Mon, 22 Jun 2026 05:45:29 +0000</pubDate>
      <link>https://dev.to/jazzmax/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server-27bo</link>
      <guid>https://dev.to/jazzmax/how-i-tracked-down-a-36gb-memory-leak-in-a-claude-code-memory-server-27bo</guid>
      <description>&lt;p&gt;A debugging story about heap snapshots, native memory that &lt;code&gt;--max-old-space-size&lt;/code&gt; can't touch, and a WebAssembly filesystem quietly hoarding files.&lt;/p&gt;

&lt;h2&gt;The setup&lt;/h2&gt;

&lt;p&gt;I run a small service that gives a team of Claude Code users one shared memory store. Mechanically it's a Node/Express proxy that wraps a stdio MCP server (&lt;code&gt;ruflo&lt;/code&gt;) and exposes it over HTTP. You don't need the product to follow the bug — just one fact: a long-lived Node process serves memory operations, and underneath it uses &lt;strong&gt;sql.js&lt;/strong&gt; (SQLite compiled to WebAssembly) to hold the store.&lt;/p&gt;

&lt;p&gt;One instance in production kept growing. Not spiking — &lt;em&gt;creeping&lt;/em&gt;. ~36 GB RSS over six weeks, then the cgroup OOM-killer would reap it and the clock reset. Classic leak shape.&lt;/p&gt;

&lt;h2&gt;Step 1: is it even my code?&lt;/h2&gt;

&lt;p&gt;The proxy and the wrapped MCP child are separate processes. &lt;code&gt;ps&lt;/code&gt; settled it fast: the proxy sat flat at ~60 MB; the &lt;code&gt;ruflo mcp start&lt;/code&gt; child was the one ballooning. So the leak was below my code, in the wrapped process. Good — narrower problem.&lt;/p&gt;

&lt;h2&gt;Step 2: the heap that wasn't&lt;/h2&gt;

&lt;p&gt;The first instinct on a Node leak is to suspect the V8 heap, so I looked at &lt;code&gt;process.memoryUsage()&lt;/code&gt; on the live child:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rss            1385 MB
heapTotal        24 MB
heapUsed         21 MB
external       1286 MB
arrayBuffers    995 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the whole story in five numbers. &lt;code&gt;heapTotal&lt;/code&gt; — the V8 JS heap — is flat at 24 MB. The growth is entirely in &lt;strong&gt;&lt;code&gt;external&lt;/code&gt; / &lt;code&gt;arrayBuffers&lt;/code&gt;&lt;/strong&gt;: native memory backing &lt;code&gt;ArrayBuffer&lt;/code&gt;s, &lt;em&gt;outside&lt;/em&gt; the GC'd JS heap.&lt;/p&gt;

&lt;p&gt;That immediately kills two "obvious" fixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;--max-old-space-size&lt;/code&gt;&lt;/strong&gt; does nothing — it bounds the old space (21 MB here), not native buffers.&lt;/li&gt;
&lt;li&gt;Forcing GC does nothing either, &lt;em&gt;if&lt;/em&gt; something still references those buffers.&lt;/li&gt;
&lt;/ul&gt;
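
&lt;p&gt;To make that split concrete, here is a minimal sketch that turns a &lt;code&gt;process.memoryUsage()&lt;/code&gt; reading into a rough "where do the bytes live" verdict. The helper name &lt;code&gt;classifyMemory&lt;/code&gt; and the 0.5 cutoff are my own illustrative assumptions, not anything Node provides, and RSS also contains memory that is neither heap nor &lt;code&gt;external&lt;/code&gt;, so treat the ratio as a hint rather than an accounting.&lt;/p&gt;

```javascript
// Sketch: split a process.memoryUsage() reading into V8-heap vs off-heap bytes.
// classifyMemory and the 0.5 cutoff are illustrative assumptions.
const MB = 1024 * 1024;

function classifyMemory(usage, offHeapCutoff = 0.5) {
  // `external` already includes the bytes reported under `arrayBuffers`.
  const share = usage.rss > 0 ? usage.external / usage.rss : 0;
  return {
    heapUsedMB: Math.round(usage.heapUsed / MB),
    externalMB: Math.round(usage.external / MB),
    arrayBuffersMB: Math.round(usage.arrayBuffers / MB),
    offHeapShare: Number(share.toFixed(2)),
    verdict: share > offHeapCutoff ? "mostly off-heap" : "mostly V8 heap or other",
  };
}

// The reading from this post, converted to bytes.
const reading = {
  rss: 1385 * MB,
  heapTotal: 24 * MB,
  heapUsed: 21 * MB,
  external: 1286 * MB,
  arrayBuffers: 995 * MB,
};
console.log(classifyMemory(reading));

// On a live process, pass the real reading instead.
console.log(classifyMemory(process.memoryUsage()));
```

&lt;p&gt;Logging this on a timer and watching which field moves is usually enough to decide whether a heap-size flag is even relevant.&lt;/p&gt;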

&lt;p&gt;So: what holds ~1 GB of &lt;code&gt;ArrayBuffer&lt;/code&gt;s?&lt;/p&gt;

&lt;h2&gt;Step 3: the heap snapshot&lt;/h2&gt;

&lt;p&gt;I opened the inspector on the live process (&lt;code&gt;kill -USR1 &amp;lt;pid&amp;gt;&lt;/code&gt;, then connected over the WebSocket; Node 22 has a global &lt;code&gt;WebSocket&lt;/code&gt;, so a 30-line script does it) and took a snapshot with &lt;code&gt;HeapProfiler.takeHeapSnapshot&lt;/code&gt;. The snapshot was only ~18 MB, which is itself a clue: if the leak were &lt;em&gt;hundreds of thousands of small&lt;/em&gt; JS objects, the graph would be huge. A small graph holding a lot of bytes means &lt;strong&gt;a few big buffers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once I parsed the snapshot (the format is just &lt;code&gt;nodes&lt;/code&gt; / &lt;code&gt;edges&lt;/code&gt; / &lt;code&gt;strings&lt;/code&gt; arrays), the top retained objects were unambiguous:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;203 × native:system / JSArrayBufferData @ 11.0 MB = 2233 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
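
&lt;p&gt;A grouping like that line can be produced with a few lines of code. This is a simplified sketch, not my actual script: it assumes the snapshot JSON is already loaded (for example with &lt;code&gt;JSON.parse&lt;/code&gt;), the name &lt;code&gt;topNodesBySelfSize&lt;/code&gt; is made up for illustration, it sums &lt;em&gt;self&lt;/em&gt; size rather than computing retained size, and the tiny snapshot at the bottom is synthetic data just to show the shape.&lt;/p&gt;

```javascript
// Sketch: group heap-snapshot nodes by "type:name" and sum their self sizes.
// Field positions are read from snapshot.meta instead of being hard-coded.
function topNodesBySelfSize(snap, limit = 5) {
  const fields = snap.snapshot.meta.node_fields;
  const typeNames = snap.snapshot.meta.node_types[0];
  const stride = fields.length;
  const typeAt = fields.indexOf("type");
  const nameAt = fields.indexOf("name");
  const sizeAt = fields.indexOf("self_size");

  const groups = new Map();
  // `nodes` is one flat array; each node occupies `stride` consecutive slots.
  for (let i = 0; snap.nodes.length > i; i += stride) {
    const type = typeNames[snap.nodes[i + typeAt]];
    const name = snap.strings[snap.nodes[i + nameAt]];
    const key = type + ":" + name;
    const group = groups.get(key) || { key, count: 0, bytes: 0 };
    group.count += 1;
    group.bytes += snap.nodes[i + sizeAt];
    groups.set(key, group);
  }
  return [...groups.values()].sort((a, b) => b.bytes - a.bytes).slice(0, limit);
}

// Synthetic three-node snapshot (made-up data) in the same layout.
const tiny = {
  snapshot: {
    meta: {
      node_fields: ["type", "name", "id", "self_size", "edge_count"],
      node_types: [["object", "native"], "string", "number", "number", "number"],
    },
  },
  nodes: [
    1, 0, 1, 11000000, 0,
    1, 0, 3, 11000000, 0,
    0, 1, 5, 64, 0,
  ],
  edges: [],
  strings: ["system / JSArrayBufferData", "Database"],
};
console.log(topNodesBySelfSize(tiny));
```

&lt;p&gt;For big, flat buffers like these, self size is already the interesting number; walking &lt;code&gt;edges&lt;/code&gt; is only needed once you want the retainer chain.&lt;/p&gt;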



&lt;p&gt;203 buffers, &lt;strong&gt;11 MB each&lt;/strong&gt;. And 11 MB was exactly the size of the on-disk &lt;code&gt;memory.db&lt;/code&gt;. The retainer chain:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;JSArrayBufferData (11 MB)
  &amp;lt;- ArrayBuffer
  &amp;lt;- Buffer
  &amp;lt;- (MEMFS file node).contents
  &amp;lt;- FS.nodes  (an Array)
  &amp;lt;- Context  (the sql.js Emscripten module — has WebAssembly.Memory, HEAPF32, createNode, /dev/tty…)
  &amp;lt;- SqlJsBackend.db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That &lt;code&gt;Context&lt;/code&gt; with &lt;code&gt;createNode&lt;/code&gt;, &lt;code&gt;/dev/tty&lt;/code&gt;, and a &lt;code&gt;WebAssembly.Memory&lt;/code&gt; is the tell: it's &lt;strong&gt;Emscripten's in-memory filesystem (MEMFS)&lt;/strong&gt;. The file names confirmed it — each buffer was a MEMFS file called &lt;code&gt;dbfile_&amp;lt;random&amp;gt;&lt;/code&gt;, and there were ~200 of them, each a full copy of the database.&lt;/p&gt;

&lt;h2&gt;Step 4: the root cause&lt;/h2&gt;

&lt;p&gt;Here's the mechanism. sql.js's &lt;code&gt;Database&lt;/code&gt; constructor writes its input bytes into a MEMFS file (&lt;code&gt;dbfile_&amp;lt;random&amp;gt;&lt;/code&gt;) via &lt;code&gt;FS.createDataFile&lt;/code&gt;. &lt;code&gt;Database.prototype.close()&lt;/code&gt; is what removes it (&lt;code&gt;FS.unlink&lt;/code&gt;). And the sql.js module is a &lt;strong&gt;process-wide singleton&lt;/strong&gt; — one MEMFS shared by every &lt;code&gt;Database&lt;/code&gt; you ever open.&lt;/p&gt;

&lt;p&gt;The backend opened the database like this, per operation path, with no caching:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;SQL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFileSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt; &lt;span class="c1"&gt;// loads the whole 11MB image&lt;/span&gt;
&lt;span class="c1"&gt;// ...used, then the wrapper goes out of scope&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When that JS &lt;code&gt;Database&lt;/code&gt; wrapper is dropped, V8 garbage-collects the &lt;em&gt;wrapper object&lt;/em&gt; — but &lt;strong&gt;GC has no idea about the MEMFS file&lt;/strong&gt; it created inside the WASM module. Only an explicit &lt;code&gt;close()&lt;/code&gt; unlinks it. No &lt;code&gt;close()&lt;/code&gt; → the 11 MB &lt;code&gt;dbfile_&amp;lt;random&amp;gt;&lt;/code&gt; lives in MEMFS forever. One leaked DB image per open. Multiply by traffic and you get 36 GB.&lt;/p&gt;

&lt;p&gt;This is the trap in one sentence: &lt;strong&gt;garbage-collecting a JS handle does not free native/WASM memory it allocated.&lt;/strong&gt; The GC sees a tiny wrapper; the cost is in a buffer the GC doesn't manage.&lt;/p&gt;
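
&lt;p&gt;The shape of the bug is easy to reproduce without sql.js at all. The following is a toy model, not the sql.js API: a module-level &lt;code&gt;Map&lt;/code&gt; stands in for MEMFS, and &lt;code&gt;ToyDatabase&lt;/code&gt; is a hypothetical class that copies its input into that table on construction and removes it only on &lt;code&gt;close()&lt;/code&gt;.&lt;/p&gt;

```javascript
// Toy model, NOT sql.js: a module-level file table stands in for MEMFS.
const memfs = new Map(); // file name to contents, shared by every ToyDatabase
let nextId = 0;

class ToyDatabase {
  constructor(bytes) {
    this.filename = "dbfile_" + nextId;
    nextId += 1;
    memfs.set(this.filename, Buffer.from(bytes)); // full copy of the image
  }
  close() {
    memfs.delete(this.filename); // the only thing that releases the copy
  }
}

const image = Buffer.alloc(1024); // stand-in for the 11 MB memory.db

function leakyOperation() {
  const db = new ToyDatabase(image);
  // ...use db, then return without close(): the wrapper becomes garbage,
  // but the shared table still references the copy.
}

function carefulOperation() {
  const db = new ToyDatabase(image);
  try {
    // ...use db
  } finally {
    db.close();
  }
}

for (let i = 0; i !== 200; i += 1) leakyOperation();
const afterLeaky = memfs.size;
for (let i = 0; i !== 200; i += 1) carefulOperation();
console.log(afterLeaky, memfs.size); // 200 200
```

&lt;p&gt;Two hundred leaky opens leave two hundred copies behind; two hundred careful ones add nothing. No amount of garbage collection changes the first number, because the table itself is still reachable.&lt;/p&gt;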

&lt;h2&gt;Step 5: two fixes&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Containment (ship today).&lt;/strong&gt; I added an RSS watchdog to the proxy: it reads the child's RSS from &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/status&lt;/code&gt;, and when it crosses a threshold it gracefully respawns the child once it's idle (reusing an existing single-flight reconnect path — kill the old child, spawn a fresh one). A respawn drops the entire bloated MEMFS at once. Symptomatic, but it bounds memory with zero dropped requests.&lt;/p&gt;
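
&lt;p&gt;The core of such a watchdog is small. This is a rough sketch under my own naming, not the code that ships in the proxy: &lt;code&gt;parseVmRssKb&lt;/code&gt;, &lt;code&gt;readChildRssKb&lt;/code&gt;, &lt;code&gt;shouldRespawn&lt;/code&gt; and the in-flight counter are all hypothetical, and the sample status text is made up. It relies only on the &lt;code&gt;VmRSS&lt;/code&gt; line that Linux exposes in &lt;code&gt;/proc/&amp;lt;pid&amp;gt;/status&lt;/code&gt;.&lt;/p&gt;

```javascript
const fs = require("node:fs");

// Parse the VmRSS line of /proc/PID/status; the kernel reports it in kB.
function parseVmRssKb(statusText) {
  const match = /^VmRSS:\s+(\d+)\s+kB$/m.exec(statusText);
  return match ? Number(match[1]) : null;
}

// Returns null when the process is gone or /proc is unavailable.
function readChildRssKb(pid) {
  try {
    return parseVmRssKb(fs.readFileSync("/proc/" + pid + "/status", "utf8"));
  } catch {
    return null;
  }
}

// Respawn only when the child is over the limit and nothing is in flight.
function shouldRespawn(rssKb, limitKb, inFlightRequests) {
  if (rssKb === null) return false;
  if (inFlightRequests !== 0) return false;
  return rssKb > limitKb;
}

// Made-up sample in the /proc status layout.
const sample = "Name:\tnode\nVmRSS:\t 1418240 kB\nThreads:\t11\n";
const limitKb = 1024 * 1024; // 1 GiB, an arbitrary example threshold
console.log(parseVmRssKb(sample)); // 1418240
console.log(shouldRespawn(parseVmRssKb(sample), limitKb, 0)); // true
console.log(shouldRespawn(parseVmRssKb(sample), limitKb, 2)); // false
```

&lt;p&gt;Keeping the decision in a pure function makes the threshold logic testable without spawning anything; the timer and the actual respawn stay in the proxy.&lt;/p&gt;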

&lt;p&gt;&lt;strong&gt;Root cause (fix it properly).&lt;/strong&gt; Cache the backend per database path so the DB opens &lt;strong&gt;once&lt;/strong&gt; and is reused, instead of a fresh &lt;code&gt;SQL.Database&lt;/code&gt; per call. No repeated loads → no new &lt;code&gt;dbfile_*&lt;/code&gt;. I baked this into the image as a build-time patch and filed it upstream with the snapshot.&lt;/p&gt;
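
&lt;p&gt;The caching idea, reduced to its skeleton, looks roughly like this. It is a sketch of the pattern, not the upstream patch: &lt;code&gt;createBackendCache&lt;/code&gt; is a hypothetical helper, and the factory passed to it is a stand-in that only counts how often it runs. It also assumes this process is the only writer, since a cached in-memory copy will not notice changes made on disk by someone else.&lt;/p&gt;

```javascript
const path = require("node:path");

// Hypothetical helper: one backend per resolved database path.
function createBackendCache(openBackend) {
  const backends = new Map();
  return {
    get(dbPath) {
      // Resolve first so "data/x.db" and "./data/x.db" share one entry.
      const key = path.resolve(dbPath);
      if (!backends.has(key)) backends.set(key, openBackend(key));
      return backends.get(key);
    },
    closeAll() {
      for (const backend of backends.values()) backend.close();
      backends.clear();
    },
  };
}

// Stand-in for the real open; it only counts how often it runs.
let opens = 0;
const cache = createBackendCache((file) => {
  opens += 1;
  return { file, close() {} };
});

const a = cache.get("data/memory.db");
const b = cache.get("./data/memory.db");
console.log(a === b, opens); // true 1
```

&lt;p&gt;Calling &lt;code&gt;closeAll()&lt;/code&gt; on shutdown keeps the explicit-close rule intact even though individual operations no longer open or close anything.&lt;/p&gt;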

&lt;h2&gt;The bonus disaster: a corrupted database&lt;/h2&gt;

&lt;p&gt;The earlier hard OOM-kills had interrupted a sql.js write mid-flight and left one &lt;code&gt;memory.db&lt;/code&gt; corrupted — &lt;code&gt;database disk image is malformed&lt;/code&gt;, busted overflow pages in the B-tree. Recovery turned into its own adventure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.recover&lt;/code&gt; (SQLite's salvage mode) reconstructed the bulk of the rows by walking the B-tree fragments.&lt;/li&gt;
&lt;li&gt;But the &lt;em&gt;newest&lt;/em&gt; writes weren't in the main file — they lived in the &lt;strong&gt;WAL&lt;/strong&gt; (&lt;code&gt;-wal&lt;/code&gt;), which &lt;code&gt;.recover&lt;/code&gt; doesn't replay, and some sat on the corrupted pages. I ended up parsing WAL frames by hand (apply page images by page number) and carving SQLite leaf-page records directly to recover the rest.&lt;/li&gt;
&lt;/ul&gt;
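
&lt;p&gt;For a sense of what "parsing WAL frames by hand" involves, here is a reduced sketch based on the published WAL layout: a 32-byte file header, then frames made of a 24-byte header plus one page image. The function names are mine, checksum verification is left out, and only frames up to the last commit frame are applied, which is the normal replay rule and stricter than what a desperate salvage might accept.&lt;/p&gt;

```javascript
// Sketch: list the frames of a SQLite WAL image held in a Buffer.
// Salts are checked; checksums are NOT verified, so treat the result
// as candidate frames rather than verified ones.
const WAL_HEADER_BYTES = 32;
const FRAME_HEADER_BYTES = 24;

function listWalFrames(wal) {
  const magic = wal.readUInt32BE(0);
  if (magic !== 0x377f0682) {
    if (magic !== 0x377f0683) throw new Error("not a WAL file");
  }
  const pageSize = wal.readUInt32BE(8);
  const salt1 = wal.readUInt32BE(16);
  const salt2 = wal.readUInt32BE(20);
  const frameBytes = FRAME_HEADER_BYTES + pageSize;

  const frames = [];
  for (let at = WAL_HEADER_BYTES; wal.length - at >= frameBytes; at += frameBytes) {
    // A salt mismatch marks a leftover from an earlier WAL generation.
    if (wal.readUInt32BE(at + 8) !== salt1) break;
    if (wal.readUInt32BE(at + 12) !== salt2) break;
    frames.push({
      pageNumber: wal.readUInt32BE(at),
      commitDbSizePages: wal.readUInt32BE(at + 4), // 0 unless a commit frame
      dataOffset: at + FRAME_HEADER_BYTES,
    });
  }
  return { pageSize, frames };
}

// Keep the newest committed image of each page. Frames after the last
// commit frame belong to an unfinished transaction and are ignored.
function latestCommittedPages(frames) {
  const committed = new Map(); // page number to offset of its image in the WAL
  let pending = [];
  for (const frame of frames) {
    pending.push(frame);
    if (frame.commitDbSizePages !== 0) {
      for (const f of pending) committed.set(f.pageNumber, f.dataOffset);
      pending = [];
    }
  }
  return committed;
}
```

&lt;p&gt;Given the bytes of a &lt;code&gt;-wal&lt;/code&gt; file read with &lt;code&gt;fs.readFileSync&lt;/code&gt;, the resulting map says which page images to copy over the main file, at byte offset &lt;code&gt;(pageNumber - 1) * pageSize&lt;/code&gt;, before attempting anything else.&lt;/p&gt;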

&lt;p&gt;Lesson burned in: &lt;strong&gt;a WAL-mode SQLite backup is three files&lt;/strong&gt;: &lt;code&gt;.db&lt;/code&gt; + &lt;code&gt;-wal&lt;/code&gt; + &lt;code&gt;-shm&lt;/code&gt;. Copy only the &lt;code&gt;.db&lt;/code&gt; while the WAL still holds committed pages and you can end up with exactly that "malformed" error, or with silently stale data, because the latest committed state is still in the WAL.&lt;/p&gt;

&lt;h2&gt;Takeaways&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Split RSS by origin first.&lt;/strong&gt; &lt;code&gt;heapTotal&lt;/code&gt; flat + &lt;code&gt;external&lt;/code&gt;/&lt;code&gt;arrayBuffers&lt;/code&gt; rising = native leak. Don't reach for &lt;code&gt;--max-old-space-size&lt;/code&gt;; it can't help.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GC ≠ free for native/WASM memory.&lt;/strong&gt; Anything backed by a WASM heap, an Emscripten MEMFS, or a native addon needs an explicit close/free. Dropping the JS handle isn't enough.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heap snapshots find native retainers too.&lt;/strong&gt; The &lt;code&gt;JSArrayBufferData&lt;/code&gt; nodes and their retainer chain pointed straight at the owning structure. A small snapshot holding big bytes = few large buffers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAL backups are three files.&lt;/strong&gt; Copy all three, or the backup may not be recoverable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source means you can actually fix the dependency.&lt;/strong&gt; The leak was three layers down in someone else's package. I snapshotted it, found the cause, patched it locally, and sent it upstream — instead of filing a ticket into a void.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Upstream writeup with the full retainer trace: &lt;a href="https://github.com/ruvnet/ruflo/issues/2432" rel="noopener noreferrer"&gt;&lt;code&gt;ruvnet/ruflo#2432&lt;/code&gt;&lt;/a&gt;. The wrapper itself, if you're curious: &lt;a href="https://github.com/jazz-max/ruflo-hub" rel="noopener noreferrer"&gt;&lt;code&gt;jazz-max/ruflo-hub&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>debugging</category>
      <category>node</category>
      <category>webassembly</category>
      <category>sqlite</category>
    </item>
  </channel>
</rss>
