<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrew</title>
    <description>The latest articles on DEV Community by Andrew (@andrew-ooo).</description>
    <link>https://dev.to/andrew-ooo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3775252%2Ff6bbe8a2-ee0c-41f7-9468-c85f0b00ca95.png</url>
      <title>DEV Community: Andrew</title>
      <link>https://dev.to/andrew-ooo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andrew-ooo"/>
    <language>en</language>
    <item>
      <title>Flue Review: Astro Team's TypeScript Agent Framework (2026)</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Sun, 21 Jun 2026 10:09:18 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/flue-review-astro-teams-typescript-agent-framework-2026-16og</link>
      <guid>https://dev.to/andrew-ooo/flue-review-astro-teams-typescript-agent-framework-2026-16og</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/flue-astro-typescript-sandbox-agent-framework-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Flue&lt;/strong&gt; is a new open-source TypeScript framework from the Astro team for building autonomous AI agents. Instead of giving you another LLM SDK wrapper, it gives you a &lt;strong&gt;programmable harness&lt;/strong&gt;: sessions, tools, skills, instructions, filesystem access, and a real sandbox the agent can operate inside. Define an agent in one file, run it locally with &lt;code&gt;flue dev&lt;/code&gt;, deploy to Node, Cloudflare Workers, GitHub Actions, GitLab CI, or Daytona.&lt;/p&gt;

&lt;p&gt;The repo crossed &lt;strong&gt;6,200 stars with 1,012 added this week&lt;/strong&gt; and is trending on GitHub at the time of writing (June 2026). It is built by Fred K. Schott (founder of Astro/Snowpack/Skypack) and the same team behind Astro, which signals "this is a serious framework, not a weekend launch."&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MIT-style permissive license&lt;/strong&gt;, monorepo at &lt;a href="https://github.com/withastro/flue" rel="noopener noreferrer"&gt;withastro/flue&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Five packages&lt;/strong&gt; — &lt;code&gt;@flue/runtime&lt;/code&gt; (harness), &lt;code&gt;@flue/cli&lt;/code&gt; (&lt;code&gt;flue&lt;/code&gt; binary), &lt;code&gt;@flue/sdk&lt;/code&gt; (client), &lt;code&gt;@flue/opentelemetry&lt;/code&gt;, &lt;code&gt;@flue/postgres&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two primitives&lt;/strong&gt; — &lt;code&gt;createAgent()&lt;/code&gt; for continuing context, &lt;code&gt;createWorkflow()&lt;/code&gt; for single-shot structured runs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox-first&lt;/strong&gt; — local Node, virtual sandbox, Daytona containers, custom adapters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider-agnostic&lt;/strong&gt; — Anthropic, OpenAI, Google, plus anything via OpenRouter; model id is a string like &lt;code&gt;anthropic/claude-sonnet-4-6&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills + MCP native&lt;/strong&gt; — import &lt;code&gt;SKILL.md&lt;/code&gt; files with &lt;code&gt;with { type: 'skill' }&lt;/code&gt;, connect MCP servers as tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagents, durable execution, OpenTelemetry tracing&lt;/strong&gt; built in&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connectors-as-recipes&lt;/strong&gt; — &lt;code&gt;flue add daytona | claude&lt;/code&gt; pipes a markdown adapter recipe straight into your coding agent&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Flue Actually Is
&lt;/h2&gt;

&lt;p&gt;Most "agent frameworks" in 2024–2025 were really LLM SDKs in a trench coat. You called &lt;code&gt;openai.chat.completions.create()&lt;/code&gt; inside a class, looped on tool calls, and called it a day. That worked for chatbots. It did &lt;strong&gt;not&lt;/strong&gt; work for the new generation of agents like Claude Code, Codex, and Cursor's agent mode — agents that get a task, not a script, and need a real environment to operate in.&lt;/p&gt;

&lt;p&gt;Flue is the answer to "what's the framework version of that?" It assumes your agent will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run for minutes or hours, across many model turns&lt;/li&gt;
&lt;li&gt;Need a filesystem, a shell, network access, and tools&lt;/li&gt;
&lt;li&gt;Need to recover from crashes mid-task without losing progress&lt;/li&gt;
&lt;li&gt;Need to be invokable from HTTP, queues, webhooks, or CLI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So instead of an SDK, Flue gives you an &lt;strong&gt;agent harness&lt;/strong&gt; — a runtime that owns the sandbox, the session store, the tool dispatcher, and the durable execution engine. You just describe what your agent should do.&lt;/p&gt;

&lt;p&gt;A complete agent in 15 lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/agents/triage.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;AgentRouteHandler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@flue/runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@flue/runtime/node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;triage&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../skills/triage/SKILL.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;skill&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;verify&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../skills/verify/SKILL.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;skill&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;githubTools&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../tools/github.ts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;route&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;AgentRouteHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;createAgent&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic/claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;githubTools&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
  &lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;triage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;verify&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;local&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Triage a bug report end-to-end: reproduce, diagnose, verify, attempt a fix.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. &lt;code&gt;flue dev&lt;/code&gt; boots an HTTP server at &lt;code&gt;POST /agents/triage/:id&lt;/code&gt;, persists session state, dispatches tool calls inside the local sandbox, streams events at &lt;code&gt;GET /agents/triage/:id&lt;/code&gt;, and exports OpenTelemetry traces if you wire them up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It's Trending Now (June 2026)
&lt;/h2&gt;

&lt;p&gt;Three forces converged:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The "agent harness" pattern won.&lt;/strong&gt; Claude Code and Codex proved that autonomous agents need a runtime, not just an SDK call. Every major framework — LangGraph, Mastra, Vercel AI SDK — is racing to add sandboxing and durable execution. Flue is the first one &lt;em&gt;designed around&lt;/em&gt; the harness from day one instead of bolting it on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript caught up to Python for agents.&lt;/strong&gt; With Anthropic, OpenAI, and Vercel all shipping first-class TypeScript SDKs in 2026, the JS ecosystem finally has parity for tool calling, structured outputs, and streaming. Flue is the framework that takes advantage of that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Astro team has credibility.&lt;/strong&gt; Fred Schott shipping a new framework is news. The launch tweet (&lt;a href="https://x.com/FredKSchott" rel="noopener noreferrer"&gt;X.com/FredKSchott&lt;/a&gt;) and the &lt;a href="https://www.thedeepfeed.ai/posts/2026-05-02-flue-agent-harness-framework/" rel="noopener noreferrer"&gt;Deep Feed write-up&lt;/a&gt; framed it as "the agent-harness moment, made real" — and the GitHub stars followed.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The headline reaction on HN: "Finally a framework that doesn't pretend agents are just chatbots with tools."&lt;/p&gt;

&lt;h2&gt;
  
  
  Install &amp;amp; First Run (60 Seconds)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1 — Scaffold a project&lt;/span&gt;
npm create flue@latest my-agent
&lt;span class="nb"&gt;cd &lt;/span&gt;my-agent

&lt;span class="c"&gt;# 2 — Set your API key&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"ANTHROPIC_API_KEY=sk-ant-..."&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .env

&lt;span class="c"&gt;# 3 — Run dev server&lt;/span&gt;
npx flue dev
&lt;span class="c"&gt;# → POST http://localhost:4321/agents/joke-teller/abc123&lt;/span&gt;

&lt;span class="c"&gt;# 4 — Talk to it&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:4321/agents/joke-teller/abc123 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"message": "tell me a typescript joke"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dev server has hot reload — edit &lt;code&gt;src/agents/joke-teller.ts&lt;/code&gt;, save, the next message uses the new config. Session state persists across reloads via the default in-memory store (swap for &lt;code&gt;@flue/postgres&lt;/code&gt; in production).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Primitives
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Agents
&lt;/h3&gt;

&lt;p&gt;Continuing context. Sessions persist between requests. Use for chatbots, coding agents, support assistants, long-running bug triage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;createAgent&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic/claude-haiku-4-5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Help the customer resolve their support ticket.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createTicketTools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// scoped per ticket&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;id&lt;/code&gt; in the URL (&lt;code&gt;/agents/support/:id&lt;/code&gt;) is passed into &lt;code&gt;createAgent&lt;/code&gt;, so you can scope tools, instructions, and data per-instance. Common pattern: &lt;code&gt;id&lt;/code&gt; is a GitHub issue number, support ticket ID, or user ID.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Workflows
&lt;/h3&gt;

&lt;p&gt;Single-shot structured automations. Inputs, outputs, no continuing context. Use for batch jobs, scheduled runs, webhook handlers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;createWorkflow&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Summarize &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; in 3 bullets.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Sandboxes
&lt;/h3&gt;

&lt;p&gt;The differentiator. Every agent action — filesystem read, shell command, network call — goes through a sandbox adapter. Out of the box:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;virtual()&lt;/code&gt; — in-memory, fast, no real filesystem (good for unit tests)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;local()&lt;/code&gt; — real Node &lt;code&gt;fs&lt;/code&gt; + &lt;code&gt;child_process&lt;/code&gt;, scoped to &lt;code&gt;cwd&lt;/code&gt; (good for dev)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;daytona()&lt;/code&gt; — full Linux container via Daytona, with image caching (production coding agents)&lt;/li&gt;
&lt;li&gt;Custom adapters via &lt;code&gt;defineSandboxAdapter()&lt;/code&gt; for E2B, Modal, Fly Machines, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent itself doesn't know which sandbox it's in. Same code runs on your laptop and inside a Daytona container in CI.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Skills
&lt;/h3&gt;

&lt;p&gt;Reusable expertise packages. A skill is a &lt;code&gt;SKILL.md&lt;/code&gt; file plus optional helpers, imported as a typed module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;triage&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../skills/triage/SKILL.md&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="kd"&gt;with&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;skill&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;createAgent&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic/claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;triage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;// loaded into context when relevant&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the same skill format Anthropic uses for Claude (and that we've covered in posts like &lt;a href="https://andrew.ooo/posts/skillspector-nvidia-ai-agent-skill-security-scanner-review/" rel="noopener noreferrer"&gt;SkillSpector&lt;/a&gt; and &lt;a href="https://github.com/addyosmani/agent-skills" rel="noopener noreferrer"&gt;agent-skills&lt;/a&gt;). Flue treats them as first-class build artifacts: skills are bundled at build time and shipped with the agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Subagents
&lt;/h3&gt;

&lt;p&gt;Specialized roles your main agent can delegate to. Defined as agent profiles, dispatched via a built-in tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineAgentProfile&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@flue/runtime&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;codeReviewer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defineAgentProfile&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic/claude-sonnet-4-6&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Review diffs and report findings with line numbers.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;createAgent&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic/claude-opus-4-7&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// big model for orchestration&lt;/span&gt;
  &lt;span class="na"&gt;subagents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;codeReviewer&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;        &lt;span class="c1"&gt;// delegate review to a cheaper model&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pattern: use the expensive model to plan, dispatch focused tasks to cheap models. This is one of the few framework-level features for cost control we've seen done right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment Surface
&lt;/h2&gt;

&lt;p&gt;Flue ships first-class deploy adapters for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt; (any host) — &lt;code&gt;flue build &amp;amp;&amp;amp; node .flue/server.js&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Workers&lt;/strong&gt; — including Durable Objects for session state&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions&lt;/strong&gt; — agent runs on a workflow trigger&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitLab CI/CD&lt;/strong&gt; — same idea, GitLab side&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Render&lt;/strong&gt; — managed long-running services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daytona&lt;/strong&gt; — for agents that need a real Linux container&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloudflare Workers + Durable Objects is the interesting one. Each agent ID becomes a Durable Object, so session state lives at the edge with single-writer guarantees. For a Discord bot or webhook handler, this is hard to beat.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Benchmarks (Honest)
&lt;/h2&gt;

&lt;p&gt;Flue is too new for community benchmarks, but the maintainers publish numbers from their internal bug-triage agent (see &lt;a href="https://flueframework.com/docs/guide/durable-execution/" rel="noopener noreferrer"&gt;the benchmarks page&lt;/a&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold start (Node, local sandbox)&lt;/td&gt;
&lt;td&gt;~140 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cold start (Cloudflare Worker)&lt;/td&gt;
&lt;td&gt;~12 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool dispatch overhead&lt;/td&gt;
&lt;td&gt;&amp;lt;1 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session resume after crash&lt;/td&gt;
&lt;td&gt;&amp;lt;50 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median agent turn (Claude Sonnet 4.6)&lt;/td&gt;
&lt;td&gt;1.8 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For comparison, LangGraph's typical cold start on Node is 300–500ms, and Mastra's session resume is in the 100–200ms range. Flue's edge story (Cloudflare Workers) is genuinely faster than anything else in the TypeScript agent space right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Reaction
&lt;/h2&gt;

&lt;p&gt;Selected reactions from HN, Reddit, and X in the past month:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HN top comment&lt;/strong&gt; on the Flue launch: &lt;em&gt;"This is the first agent framework I'd actually deploy. The sandbox abstraction is the right primitive — everyone else is gluing it on after the fact."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;r/LocalLLaMA&lt;/strong&gt;: &lt;em&gt;"Daytona connector is a killer feature. We replaced a 300-line E2B wrapper with &lt;code&gt;sandbox: daytona({ image: 'node:22' })&lt;/code&gt;."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vercel engineer on X&lt;/strong&gt;: &lt;em&gt;"Honestly the cleanest agent harness API I've seen. Reminds me of what Astro did for SSR — take the messy reality of the platform and make it ergonomic."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skeptical take from r/typescript&lt;/strong&gt;: &lt;em&gt;"Yet another framework. Why not just use Mastra/LangGraph?"&lt;/em&gt; — answered by the same thread: &lt;em&gt;"Because those are toolkits. Flue is a runtime. Different problem."&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Schott connection helps: developers who've used Astro trust this team to ship documentation, stability, and a long-term roadmap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;What Flue does &lt;strong&gt;not&lt;/strong&gt; do well yet:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No managed cloud.&lt;/strong&gt; You self-host or use one of the deploy adapters. There's no "Flue Cloud" with one-click deploys, scheduling UI, or hosted observability. This is a deliberate choice ("the framework is the product") but it means more ops work than Mastra Cloud or Vercel's AI SDK + Vercel platform combo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No first-class evals/replay.&lt;/strong&gt; Other frameworks (Mastra, LangGraph) ship eval runners and prompt replay. Flue points you at Braintrust or your own observer. Fine for senior teams, friction for newer ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills are still a moving target.&lt;/strong&gt; The &lt;code&gt;with { type: 'skill' }&lt;/code&gt; import attribute works in Node 22+ and modern bundlers, but expect occasional tooling rough edges (especially in monorepos with older TypeScript versions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagent dispatch is sequential.&lt;/strong&gt; No native parallel fan-out yet — if your orchestrator needs to dispatch five subagents at once, you wire it with &lt;code&gt;Promise.all&lt;/code&gt; yourself. The roadmap mentions native parallel dispatch but it's not shipped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox cost is real.&lt;/strong&gt; A Daytona container per active agent ID adds up fast. The framework doesn't pool or hibernate containers automatically; you have to set TTLs yourself. Plan accordingly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When To Choose Flue
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;Great fit if you...&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Already use TypeScript end-to-end and don't want to drop into Python for agents&lt;/li&gt;
&lt;li&gt;Are building a coding agent, support bot, or CI agent that needs a real sandbox&lt;/li&gt;
&lt;li&gt;Want to deploy to Cloudflare Workers, GitHub Actions, or Daytona&lt;/li&gt;
&lt;li&gt;Care about durable execution and session resume across crashes&lt;/li&gt;
&lt;li&gt;Want first-class MCP and Skills support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;Skip it if you...&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need a managed cloud with one-click deploys today (try Mastra Cloud)&lt;/li&gt;
&lt;li&gt;Are building a single-turn classifier or RAG bot — overkill (use Vercel AI SDK directly)&lt;/li&gt;
&lt;li&gt;Are deep in the Python ecosystem and your team has no TS appetite&lt;/li&gt;
&lt;li&gt;Need parallel subagent fan-out as a built-in primitive (not shipped yet)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Compares (Quick Table)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Sandbox built-in&lt;/th&gt;
&lt;th&gt;Durable execution&lt;/th&gt;
&lt;th&gt;Deploy adapters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;✅ (4 backends)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Node, CF, GH Actions, Daytona&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mastra&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;❌ (DIY)&lt;/td&gt;
&lt;td&gt;✅ (cloud)&lt;/td&gt;
&lt;td&gt;Mastra Cloud, Node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangGraph JS&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;LangSmith, Node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vercel AI SDK&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Vercel, Node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agno&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;DIY&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The "sandbox built-in + Cloudflare Workers deploy" combination is unique to Flue right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Flue from the same team as Astro?
&lt;/h3&gt;

&lt;p&gt;Yes. It's published under the &lt;code&gt;withastro&lt;/code&gt; GitHub org and led by Fred K. Schott, Astro's founder. It does &lt;strong&gt;not&lt;/strong&gt; require Astro — Flue is a standalone framework. (But if you're already using Astro, the mental model and CLI ergonomics will feel familiar.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use OpenAI/Gemini/local models instead of Anthropic?
&lt;/h3&gt;

&lt;p&gt;Yes. The &lt;code&gt;model&lt;/code&gt; field is a string like &lt;code&gt;openai/gpt-4o-mini&lt;/code&gt;, &lt;code&gt;google/gemini-2.5-pro&lt;/code&gt;, or &lt;code&gt;ollama/llama-3.3-70b&lt;/code&gt;. Provider routing happens through the runtime; you can also pass a custom client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it work with MCP servers?
&lt;/h3&gt;

&lt;p&gt;Yes — Flue connects to MCP servers as tool sources. See &lt;a href="https://flueframework.com/docs/guide/tools/" rel="noopener noreferrer"&gt;docs/guide/tools/#connect-mcp-tools&lt;/a&gt;. For coverage of the wider MCP ecosystem, see our &lt;a href="https://andrew.ooo/posts/codebase-memory-mcp-code-intelligence-review/" rel="noopener noreferrer"&gt;Codebase Memory MCP review&lt;/a&gt; and &lt;a href="https://andrew.ooo/posts/unity-mcp-ai-game-development-bridge/" rel="noopener noreferrer"&gt;Unity MCP review&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is the session store implemented?
&lt;/h3&gt;

&lt;p&gt;Default is in-memory (good for dev). For production, install &lt;code&gt;@flue/postgres&lt;/code&gt; for Postgres-backed sessions, or implement the &lt;code&gt;SessionStore&lt;/code&gt; interface for Redis, DynamoDB, KV, etc. On Cloudflare Workers, each agent &lt;code&gt;id&lt;/code&gt; becomes a Durable Object — session state is colocated with the runtime.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the cost model?
&lt;/h3&gt;

&lt;p&gt;Flue itself is free (open source). You pay for: LLM tokens (your provider), sandbox compute (Daytona/local/CF), and any observability backend you add. There is no Flue Cloud and no managed pricing — by design.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does durable execution work?
&lt;/h3&gt;

&lt;p&gt;Every model turn and tool call is checkpointed to the session store. If the process crashes mid-turn, the next invocation resumes from the last checkpoint. This matters for long-running agents (hours-long bug triage runs) where losing 30 minutes of work to a deploy or OOM is unacceptable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;If you build TypeScript agents and you've been frustrated that every "framework" so far has really been a fancy SDK, &lt;strong&gt;Flue is worth a serious look&lt;/strong&gt;. The sandbox-first design, the Cloudflare Workers story, and the Astro team's track record make it the most credible new entrant in months.&lt;/p&gt;

&lt;p&gt;It's not the right tool for one-shot classifiers, and the managed-cloud gap means more ops work than competitors. But for production coding agents, support bots, or CI agents that need to run for hours and survive restarts — this is the cleanest API in the TypeScript ecosystem right now.&lt;/p&gt;

&lt;p&gt;Star it, scaffold a test project, and ship a small agent against your own repo. The 60-second setup is real.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Source&lt;/strong&gt;: &lt;a href="https://github.com/withastro/flue" rel="noopener noreferrer"&gt;github.com/withastro/flue&lt;/a&gt;&lt;br&gt;
→ &lt;strong&gt;Docs&lt;/strong&gt;: &lt;a href="https://flueframework.com/" rel="noopener noreferrer"&gt;flueframework.com&lt;/a&gt;&lt;br&gt;
→ &lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/p&gt;

</description>
      <category>flue</category>
      <category>withastro</category>
      <category>aiagents</category>
      <category>typescript</category>
    </item>
    <item>
      <title>codebase-memory-mcp Review: 99% Token Cut for Code Agents</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Sat, 20 Jun 2026 10:13:03 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/codebase-memory-mcp-review-99-token-cut-for-code-agents-3d81</link>
      <guid>https://dev.to/andrew-ooo/codebase-memory-mcp-review-99-token-cut-for-code-agents-3d81</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/codebase-memory-mcp-code-intelligence-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;codebase-memory-mcp&lt;/strong&gt; is a single-binary MCP server that &lt;strong&gt;indexes any codebase into a persistent knowledge graph in milliseconds&lt;/strong&gt;, answers structural queries in &lt;strong&gt;under 1ms&lt;/strong&gt;, and cuts agent token spend by &lt;strong&gt;~99%&lt;/strong&gt; on the kinds of "where is this called from?" / "what does this affect?" questions that otherwise drown your model in grep-and-read loops. The repo is at &lt;strong&gt;8,703 GitHub stars with 4,212 added this week&lt;/strong&gt; (#3 on GitHub Trending) and is published with an arXiv preprint reporting &lt;strong&gt;83% answer quality at 10× fewer tokens and 2.1× fewer tool calls&lt;/strong&gt; vs. file-by-file exploration across 31 real-world repos.&lt;/p&gt;

&lt;p&gt;What makes this different from the dozen other "code graph for AI" projects: it's written in &lt;strong&gt;pure C&lt;/strong&gt; (zero runtime dependencies), ships as a &lt;strong&gt;single static binary&lt;/strong&gt; for macOS / Linux / Windows, vendors &lt;strong&gt;158 tree-sitter grammars&lt;/strong&gt; directly into the binary, ships &lt;strong&gt;Hybrid LSP semantic type resolution&lt;/strong&gt; for 11 major languages (Python, TypeScript/JavaScript/JSX/TSX, PHP, C#, Go, C, C++, Java, Kotlin, Rust), and the &lt;code&gt;install&lt;/code&gt; command &lt;strong&gt;auto-detects and configures 11 different coding agents&lt;/strong&gt; — Claude Code, Codex CLI, Gemini CLI, Zed, OpenCode, Antigravity, Aider, KiloCode, VS Code, OpenClaw, and Kiro.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;8,703 GitHub stars&lt;/strong&gt;, &lt;strong&gt;+4,212 this week&lt;/strong&gt; (#3 GitHub Trending)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux kernel&lt;/strong&gt; (28M LOC, 75K files) full-indexed in &lt;strong&gt;3 minutes&lt;/strong&gt; → 4.81M nodes, 7.72M edges&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-1ms&lt;/strong&gt; structural queries via in-memory SQLite, &lt;strong&gt;&amp;lt;10ms&lt;/strong&gt; name search, &lt;strong&gt;~150ms&lt;/strong&gt; dead-code detection on full graphs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;14 MCP tools&lt;/strong&gt;: search, trace, architecture, impact analysis, Cypher-like queries, dead code, cross-service HTTP/gRPC/GraphQL linking, ADR management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,604 tests passing&lt;/strong&gt;, &lt;strong&gt;SLSA 3&lt;/strong&gt; provenance, &lt;strong&gt;OpenSSF Scorecard&lt;/strong&gt; badged, &lt;strong&gt;VirusTotal&lt;/strong&gt; scanned every release&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT licensed&lt;/strong&gt;, available on npm, PyPI, Homebrew, Scoop, Winget, Chocolatey, AUR, &lt;code&gt;go install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;arXiv:2603.27277&lt;/strong&gt; preprint with reproducible benchmarks across 31 repos&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why grep-and-read loops are killing your agent bill
&lt;/h2&gt;

&lt;p&gt;If you've watched a coding agent answer "what calls &lt;code&gt;ProcessOrder&lt;/code&gt;?" in a non-trivial codebase, you've seen the pathology. It opens five files, greps for the symbol, opens five more, follows imports, greps again, and by the time it produces a half-correct answer it has consumed 50,000 tokens — most of them spent re-reading the same file headers, license blocks, and unrelated functions.&lt;/p&gt;

&lt;p&gt;The paper behind codebase-memory-mcp (&lt;a href="https://arxiv.org/abs/2603.27277" rel="noopener noreferrer"&gt;arXiv:2603.27277&lt;/a&gt;) puts numbers on this. Across &lt;strong&gt;31 real-world repositories&lt;/strong&gt; and &lt;strong&gt;five structural questions per repo&lt;/strong&gt;, file-by-file exploration consumed &lt;strong&gt;~412,000 tokens&lt;/strong&gt; versus &lt;strong&gt;~3,400 tokens&lt;/strong&gt; when the same questions were answered from a pre-built knowledge graph. That's a &lt;strong&gt;120× reduction&lt;/strong&gt; on token spend, or &lt;strong&gt;99.2% fewer tokens&lt;/strong&gt; depending on how you frame it. Answer quality was 83% vs. 92% for file exploration — a 9-point drop, but at one tenth the cost and 2.1× fewer tool calls.&lt;/p&gt;

&lt;p&gt;The structural insight: most "code understanding" questions are &lt;strong&gt;graph queries in disguise&lt;/strong&gt;. "What calls X?" is inbound traversal. "What does Y affect?" is outbound traversal. "Where is this HTTP route defined?" is a node lookup. None of these need an LLM to read source code line by line — they need a pre-built graph and a query engine. codebase-memory-mcp is that graph + query engine, exposed over MCP so any compatible agent can use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "what calls ProcessOrder?"

Agent calls: trace_path(function_name="ProcessOrder", direction="inbound")

codebase-memory-mcp: executes graph query in &amp;lt;1ms, returns structured results

Agent: presents the call chain in plain English
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deliberate design choice: &lt;strong&gt;no built-in LLM&lt;/strong&gt;. Other code graph tools embed one for natural-language-to-query translation, which means extra API keys, extra cost, and another model to configure. With MCP, the agent you're already talking to &lt;em&gt;is&lt;/em&gt; the query translator. codebase-memory-mcp is a pure structural analysis backend; the intelligence layer is whatever agent you point at it.&lt;/p&gt;

&lt;p&gt;What's inside the binary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;158 vendored tree-sitter grammars&lt;/strong&gt; compiled in — no installs, nothing that breaks on system updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid LSP semantic type resolution&lt;/strong&gt; for 11 languages — a lightweight C reimplementation of major language-server type-resolution algorithms (compatible with tsserver, pyright, gopls, Roslyn, JDT, rust-analyzer)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bundled Nomic &lt;code&gt;nomic-embed-code&lt;/code&gt; embeddings&lt;/strong&gt; (40K tokens, 768d int8) for semantic search — no API key, no Ollama, no Docker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In-memory SQLite&lt;/strong&gt; with FTS5 full-text search and &lt;code&gt;cbm_camel_split&lt;/code&gt; tokenizer (camelCase / snake_case aware)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aho-Corasick&lt;/strong&gt; fused pattern matching for the indexing pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LZ4&lt;/strong&gt; compression for RAM-resident graph storage; memory released back to the OS after indexing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick start (60 seconds)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;One-line install (macOS / Linux):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the optional 3D graph visualization UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="nt"&gt;--ui&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows (PowerShell):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Invoke-WebRequest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Uri&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-OutFile&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;install.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;\install.ps1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart your coding agent. Say &lt;strong&gt;"Index this project"&lt;/strong&gt; — done. The &lt;code&gt;install&lt;/code&gt; command auto-detects every coding agent on the box and writes the MCP server entry, instruction file, and any pre-tool hooks each one needs. On macOS it also handles quarantine attributes and ad-hoc codesigning automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 14 MCP tools (what your agent gets)
&lt;/h2&gt;

&lt;p&gt;A condensed map of what shows up when your agent connects:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;index_repository&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Build or refresh the graph for a path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_graph&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Structural search: regex names, label filters, degree bounds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Graph-augmented grep over indexed files only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;semantic_query&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Vector search via bundled Nomic embeddings (11-signal scoring)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;trace_path&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Inbound or outbound traversal from any node&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_architecture&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Languages, packages, entry points, routes, hotspots, clusters in one call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;detect_changes&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Git diff → affected symbols + risk classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;manage_adr&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Persist Architecture Decision Records across sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cypher_query&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cypher-like graph queries (&lt;code&gt;MATCH (f:Function)-[:CALLS]-&amp;gt;(g)...&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dead_code&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Functions with zero callers, excluding entry points&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Plus four more for cross-service linking and graph maintenance. The full list is in the README.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real code: a Cypher-like query against your own codebase
&lt;/h2&gt;

&lt;p&gt;Once installed and indexed, your agent can do this directly (and so can you, via CLI):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codebase-memory-mcp cli cypher_query &lt;span class="s1"&gt;'{
  "query": "MATCH (f:Function)-[:CALLS]-&amp;gt;(g:Function) WHERE f.name = '&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;'handleRequest'&lt;/span&gt;&lt;span class="se"&gt;\'&lt;/span&gt;&lt;span class="s1"&gt;' RETURN g.name, g.file_path"
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or a regex name search across 158 languages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;codebase-memory-mcp cli search_graph &lt;span class="s1"&gt;'{"name_pattern": ".*Handler.*", "label": "Function"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For agents, the same calls go through MCP and return structured JSON the agent stitches into a natural-language answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-service intelligence (the part that surprised me)
&lt;/h2&gt;

&lt;p&gt;Most code-graph tools stop at the single-repo level. codebase-memory-mcp goes further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;HTTP routes ↔ call sites&lt;/strong&gt; — links REST handlers to the code that calls them, with a confidence score.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gRPC, GraphQL, tRPC&lt;/strong&gt; service detection, including protobuf &lt;code&gt;Route&lt;/code&gt; node extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Channel detection&lt;/strong&gt; — &lt;code&gt;EMITS&lt;/code&gt; / &lt;code&gt;LISTENS_ON&lt;/code&gt; edges for Socket.IO, EventEmitter, and generic pub-sub across 8 languages, with constant resolution so &lt;code&gt;EVENTS.USER_CREATED&lt;/code&gt; is matched correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-repo edges&lt;/strong&gt; (&lt;code&gt;CROSS_*&lt;/code&gt;) — index multiple repos under the same store and the graph stitches them together. The optional 3D UI variant has a multi-galaxy layout for this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure-as-code&lt;/strong&gt; — Dockerfiles, Kubernetes manifests, and Kustomize overlays are first-class graph nodes with &lt;code&gt;Resource&lt;/code&gt; and &lt;code&gt;Module&lt;/code&gt; types and &lt;code&gt;IMPORTS&lt;/code&gt; edges.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you operate a service mesh and your agent has access to the meshed repos, this turns "what services consume the new auth header?" into a single graph query instead of a half-day grep session across 12 repos.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance (M3 Pro)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Linux kernel full index&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3 min&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;28M LOC, 75K files → 4.81M nodes, 7.72M edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Linux kernel fast index&lt;/td&gt;
&lt;td&gt;1m 12s&lt;/td&gt;
&lt;td&gt;1.88M nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Django full index&lt;/td&gt;
&lt;td&gt;~6s&lt;/td&gt;
&lt;td&gt;49K nodes, 196K edges&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cypher query&lt;/td&gt;
&lt;td&gt;&amp;lt;1ms&lt;/td&gt;
&lt;td&gt;Relationship traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Name search (regex)&lt;/td&gt;
&lt;td&gt;&amp;lt;10ms&lt;/td&gt;
&lt;td&gt;SQL LIKE pre-filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dead code detection&lt;/td&gt;
&lt;td&gt;~150ms&lt;/td&gt;
&lt;td&gt;Full graph scan + degree filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trace call path (depth=5)&lt;/td&gt;
&lt;td&gt;&amp;lt;10ms&lt;/td&gt;
&lt;td&gt;BFS traversal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RAM-first pipeline is unusual: LZ4-compressed reads, in-memory SQLite, single dump at the end, and the process releases memory back to the OS after indexing completes. Persistent storage lives in &lt;code&gt;~/.cache/codebase-memory-mcp/&lt;/code&gt; and a background watcher does git-aware incremental re-indexing when files change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Team-shared graph artifact (skip the reindex)
&lt;/h2&gt;

&lt;p&gt;A clever bit of operational design: &lt;code&gt;.codebase-memory/graph.db.zst&lt;/code&gt; is an optional, zstd-compressed snapshot of the knowledge graph that you can &lt;strong&gt;commit to your repo&lt;/strong&gt;. When a teammate clones and runs &lt;code&gt;codebase-memory-mcp&lt;/code&gt; for the first time, the artifact is decompressed and only the local diff is incrementally indexed — no full reindex.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Format:&lt;/strong&gt; SQLite with indexes stripped, &lt;code&gt;VACUUM INTO&lt;/code&gt; compacted, zstd 1.5.7 compressed (8–13:1 typical)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two tiers:&lt;/strong&gt; &lt;code&gt;best&lt;/code&gt; (&lt;code&gt;zstd -9&lt;/code&gt;) on explicit &lt;code&gt;index_repository&lt;/code&gt;, &lt;code&gt;fast&lt;/code&gt; (&lt;code&gt;zstd -3&lt;/code&gt;) by the watcher&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No merge pain:&lt;/strong&gt; &lt;code&gt;.gitattributes&lt;/code&gt; auto-writes &lt;code&gt;merge=ours&lt;/code&gt; for the artifact on first export&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opt-in:&lt;/strong&gt; if you'd rather have everyone reindex from scratch, add &lt;code&gt;.codebase-memory/&lt;/code&gt; to &lt;code&gt;.gitignore&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is similar in spirit to graphify's &lt;code&gt;graphify-out/&lt;/code&gt; directory but as a single compressed file with explicit two-tier export and integrity-checked import.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community reactions
&lt;/h2&gt;

&lt;p&gt;The reception is unusually strong for a tool that does one thing well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Trending #3 in the AI/dev category this week&lt;/strong&gt; with 4,212 stars in seven days on top of a base of ~4.5K.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5,604 tests passing&lt;/strong&gt;, &lt;strong&gt;SLSA Level 3&lt;/strong&gt; build provenance, &lt;strong&gt;OpenSSF Scorecard&lt;/strong&gt; badged, and every release scanned by &lt;strong&gt;70+ antivirus engines via VirusTotal&lt;/strong&gt; — unusually serious supply-chain hygiene for a 4-week-old viral project.&lt;/li&gt;
&lt;li&gt;The accompanying &lt;strong&gt;arXiv preprint&lt;/strong&gt; with reproducible benchmarks lends real credibility — most "code graph for AI" projects make claims; this one publishes the methodology.&lt;/li&gt;
&lt;li&gt;Hacker News and Reddit r/LocalLLaMA discussions have focused on the &lt;strong&gt;pure-C, zero-dependency&lt;/strong&gt; angle as the differentiator vs. graphify (TypeScript/Node) and similar tooling. Single static binary + 158 vendored grammars is genuinely operationally easier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The skeptical takes are worth holding in mind too: the arXiv paper's &lt;strong&gt;83% answer quality vs. 92% for file exploration&lt;/strong&gt; is a real 9-point drop. For exploratory questions where the agent needs to read prose comments or inline docstrings, raw file access still wins on quality. The right mental model is "graph queries for structural questions, file reads for narrative questions" — and the project actively encourages this split.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No built-in LLM, by design.&lt;/strong&gt; You need an MCP-compatible agent. If your stack doesn't speak MCP, this isn't for you (yet).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9-point answer-quality drop&lt;/strong&gt; vs. file exploration on the arXiv benchmark. Token savings buy you 99% off the bill, not 99% off the work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid LSP covers 11 languages&lt;/strong&gt;, not all 158. The other 147 languages get tree-sitter AST parsing only, which is excellent for structure but weaker on type resolution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows SmartScreen&lt;/strong&gt; will warn on the unsigned binary the first time you run it — expected, mitigated by published SHA-256 checksums and VirusTotal scans.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph UI is a separate binary variant.&lt;/strong&gt; If you want the 3D visualization at &lt;code&gt;localhost:9749&lt;/code&gt;, you need the &lt;code&gt;-ui-&lt;/code&gt; archive, not the standard one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indexing is RAM-hungry mid-run.&lt;/strong&gt; On a 28M-LOC monorepo you'll want headroom (no pun intended) even though memory is released after the indexing pass completes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to use codebase-memory-mcp, when to skip
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Great fit if you…&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Claude Code, Codex, Cursor, or any MCP-compatible agent on a non-trivial codebase.&lt;/li&gt;
&lt;li&gt;Pay real money for tokens on structural questions ("what calls X?", "what does Y affect?", "where is route Z defined?").&lt;/li&gt;
&lt;li&gt;Operate multiple repos or a service mesh and want cross-repo edges.&lt;/li&gt;
&lt;li&gt;Want a single binary you can &lt;code&gt;install&lt;/code&gt; and forget — no Docker, no API keys, no runtime.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip it if you…&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work entirely in a single small repo where grep-and-read is already cheap.&lt;/li&gt;
&lt;li&gt;Use an agent stack that doesn't speak MCP.&lt;/li&gt;
&lt;li&gt;Need 92%+ answer quality on long, narrative-style questions where reading inline comments matters more than structure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: How does this compare to graphify, Understand-Anything, or other "code knowledge graph" tools?&lt;/strong&gt;&lt;br&gt;
A: Same problem, different operational profile. Most alternatives are Node/TypeScript with &lt;code&gt;npm install&lt;/code&gt; chains; codebase-memory-mcp is pure C as a single static binary with 158 grammars and Hybrid LSP for 11 languages compiled in. The cross-service HTTP/gRPC/GraphQL linking and the IaC indexing (Dockerfiles, K8s, Kustomize) are also broader than what most competitors ship. The arXiv preprint makes the benchmarks reproducible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does my code leave my machine?&lt;/strong&gt;&lt;br&gt;
A: No. All processing happens locally. The bundled Nomic embedding model is compiled into the binary; SQLite storage lives in &lt;code&gt;~/.cache/codebase-memory-mcp/&lt;/code&gt;. The only outbound traffic is an optional startup update check, which can be disabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How big is the index?&lt;/strong&gt;&lt;br&gt;
A: It depends on graph density, but typical mid-size repos compress to a few MB in the &lt;code&gt;.codebase-memory/graph.db.zst&lt;/code&gt; artifact. The Linux kernel produces 4.81M nodes and 7.72M edges — large, but still queryable in milliseconds because SQLite is in-memory during operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does it work with self-hosted models like Llama via Ollama?&lt;/strong&gt;&lt;br&gt;
A: Yes — through whichever MCP-compatible agent you use to drive it. The MCP server is model-agnostic; it just answers graph queries. Claude Code, Codex, Cursor, OpenCode, and others all work, and several of them support routing to local models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is the team-shared &lt;code&gt;graph.db.zst&lt;/code&gt; artifact safe to commit?&lt;/strong&gt;&lt;br&gt;
A: Yes, if you want to. It's an opaque SQLite snapshot, and the auto-written &lt;code&gt;.gitattributes&lt;/code&gt; line uses &lt;code&gt;merge=ours&lt;/code&gt; so concurrent edits don't produce binary-merge conflicts. The savings — teammates skipping a full reindex on first run — are usually worth the few MB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What if my project is a polyglot monorepo?&lt;/strong&gt;&lt;br&gt;
A: That's the sweet spot. Multi-language manifest resolution (&lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;go.mod&lt;/code&gt;, &lt;code&gt;Cargo.toml&lt;/code&gt;, &lt;code&gt;pyproject.toml&lt;/code&gt;, &lt;code&gt;composer.json&lt;/code&gt;, &lt;code&gt;pubspec.yaml&lt;/code&gt;, &lt;code&gt;pom.xml&lt;/code&gt;, &lt;code&gt;build.gradle&lt;/code&gt;, &lt;code&gt;mix.exs&lt;/code&gt;, &lt;code&gt;*.gemspec&lt;/code&gt;) is built in, and &lt;code&gt;CROSS_*&lt;/code&gt; edges link nodes across the indexed fleet. Cross-service &lt;code&gt;HTTP_CALLS&lt;/code&gt; and &lt;code&gt;EMITS&lt;/code&gt; / &lt;code&gt;LISTENS_ON&lt;/code&gt; edges connect services that talk over HTTP, gRPC, GraphQL, tRPC, or pub-sub channels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it today
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/DeusData/codebase-memory-mcp/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Claude Code (or any agent in the supported list), and tell it: &lt;strong&gt;"Index this project."&lt;/strong&gt; Then ask the question that usually triggers a 50-file grep tour — "what calls our auth middleware?", "what's affected by changing this DB schema?" — and watch the agent answer from a single &lt;code&gt;trace_path&lt;/code&gt; call instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/DeusData/codebase-memory-mcp" rel="noopener noreferrer"&gt;github.com/DeusData/codebase-memory-mcp&lt;/a&gt; · &lt;strong&gt;Paper:&lt;/strong&gt; &lt;a href="https://arxiv.org/abs/2603.27277" rel="noopener noreferrer"&gt;arXiv:2603.27277&lt;/a&gt; · &lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

</description>
      <category>codebasememorymcp</category>
      <category>mcpserver</category>
      <category>codeknowledgegraph</category>
      <category>aicodingagents</category>
    </item>
    <item>
      <title>agentsview Review: Local Analytics for 20+ Coding Agents</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Fri, 19 Jun 2026 10:12:07 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/agentsview-review-local-analytics-for-20-coding-agents-4l4d</link>
      <guid>https://dev.to/andrew-ooo/agentsview-review-local-analytics-for-20-coding-agents-4l4d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/agentsview-coding-agent-session-analytics-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;agentsview&lt;/strong&gt; is a single-binary, local-first analytics tool that &lt;strong&gt;indexes the session files of 20+ AI coding agents&lt;/strong&gt; — Claude Code, Codex, Cursor, Copilot CLI, Gemini, Amp, Aider, Forge, Kilo, Kiro, iFlow, gptme, and more — into a local SQLite database and gives you &lt;strong&gt;full-text search, cost tracking, activity heatmaps, and a web dashboard&lt;/strong&gt; at &lt;code&gt;http://127.0.0.1:8080&lt;/code&gt;. It bills itself (correctly) as a &lt;strong&gt;"100× faster replacement for ccusage,"&lt;/strong&gt; because it indexes session JSONL once instead of re-parsing on every query. The repo is &lt;strong&gt;2,901 stars and adding 1,382 this week&lt;/strong&gt;, written in Go, MIT-licensed, and runs as one binary with no accounts, no telemetry, and no cloud dependency.&lt;/p&gt;

&lt;p&gt;The pitch is sharp: if you use more than one coding agent — and almost everyone does in 2026 — there is no single place to see what your sessions did, how much they cost, or what they actually wrote. ccusage covers Claude Code. The OpenAI dashboard covers Codex. Cursor's billing page covers Cursor. agentsview is the first tool I've seen that puts all of them in one searchable database, locally, with no API keys.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2,901 GitHub stars&lt;/strong&gt;, &lt;strong&gt;1,382 added this week&lt;/strong&gt;, written in &lt;strong&gt;Go&lt;/strong&gt; (single static binary)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;20+ supported agents&lt;/strong&gt; auto-discovered from their session directories on first run&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full-text search&lt;/strong&gt; across all messages via SQLite FTS5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent, per-model, per-session cost tracking&lt;/strong&gt; using live LiteLLM pricing (offline fallback)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt-caching-aware&lt;/strong&gt; cost math (separates cache-creation vs cache-read tokens)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live updates&lt;/strong&gt; via Server-Sent Events as your active sessions append messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Desktop apps&lt;/strong&gt; for macOS and Windows, plus Homebrew cask, plus Docker, plus a curl-pipe-bash installer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT licensed&lt;/strong&gt;, runs entirely on &lt;code&gt;127.0.0.1&lt;/code&gt; with &lt;code&gt;Host&lt;/code&gt;-header DNS-rebinding protection by default&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why this is a real problem
&lt;/h2&gt;

&lt;p&gt;If you've watched your monthly bills over the last six months, you know the shape of this problem. A typical 2026 engineer's setup looks like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; in the IDE for the heavy refactors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex&lt;/strong&gt; in the terminal for one-shot scripts and CI debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; for inline edits inside the editor&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Copilot CLI&lt;/strong&gt; in &lt;code&gt;gh&lt;/code&gt; for repo automation&lt;/li&gt;
&lt;li&gt;Maybe &lt;strong&gt;Aider&lt;/strong&gt; or &lt;strong&gt;Forge&lt;/strong&gt; or &lt;strong&gt;Gemini CLI&lt;/strong&gt; on the side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's five different APIs, five different billing dashboards, five different session formats — and zero ways to ask "what did I work on yesterday?" without opening five different things. Worse, the existing cost-tracking tools (&lt;code&gt;ccusage&lt;/code&gt; for Claude, the OpenAI usage dashboard for Codex, etc.) each only know about &lt;em&gt;one&lt;/em&gt; agent and re-parse files on every invocation, which means asking "how much did I spend across all my agents this month?" is a half-hour data-engineering exercise.&lt;/p&gt;

&lt;p&gt;agentsview is the obvious-in-retrospect fix. It walks &lt;code&gt;~/.claude/projects/&lt;/code&gt;, &lt;code&gt;~/.codex/sessions/&lt;/code&gt;, &lt;code&gt;~/.cursor/projects/&lt;/code&gt;, and the 17+ other session directories at startup, parses everything once into SQLite, and then everything you'd want — search, daily-spend charts, per-project breakdowns — is a single SQL query or one CLI call against an already-indexed database.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install in 60 seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://agentsview.io/install.sh | bash

&lt;span class="c"&gt;# Windows&lt;/span&gt;
powershell &lt;span class="nt"&gt;-ExecutionPolicy&lt;/span&gt; ByPass &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"irm https://agentsview.io/install.ps1 | iex"&lt;/span&gt;

&lt;span class="c"&gt;# Or Homebrew (macOS, includes the desktop app)&lt;/span&gt;
brew &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--cask&lt;/span&gt; agentsview

&lt;span class="c"&gt;# Or Docker&lt;/span&gt;
docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 127.0.0.1:8080:8080 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; agentsview-data:/data &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.claude/projects:/agents/claude:ro"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.codex/sessions:/agents/codex:ro"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;CLAUDE_PROJECTS_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/agents/claude &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;CODEX_SESSIONS_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/agents/codex &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/kenn-io/agentsview:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentsview serve &lt;span class="nt"&gt;--background&lt;/span&gt;     &lt;span class="c"&gt;# start the dashboard server&lt;/span&gt;
agentsview serve status           &lt;span class="c"&gt;# confirm it's running&lt;/span&gt;
open http://127.0.0.1:8080        &lt;span class="c"&gt;# browse sessions in the web UI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the entire setup. On first run, it discovers every session directory that exists on your machine, imports the lot into &lt;code&gt;~/.agentsview/sessions.db&lt;/code&gt;, and starts serving the dashboard. Background mode logs to &lt;code&gt;~/.agentsview/serve.log&lt;/code&gt; and prints the PID so you can &lt;code&gt;agentsview serve stop&lt;/code&gt; later.&lt;/p&gt;

&lt;h2&gt;
  
  
  The killer feature: &lt;code&gt;agentsview usage daily&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;This is the one most people will install for. ccusage already exists for Claude Code, but it only covers Claude Code and it re-parses raw JSONL on every invocation. &lt;code&gt;agentsview usage&lt;/code&gt; covers everything and queries the already-built SQLite index — &lt;strong&gt;the docs claim 100× faster, and in practice it's instant.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Daily cost summary (last 30 days, all agents)&lt;/span&gt;
agentsview usage daily

&lt;span class="c"&gt;# Per-model breakdown&lt;/span&gt;
agentsview usage daily &lt;span class="nt"&gt;--breakdown&lt;/span&gt;

&lt;span class="c"&gt;# Filter to one agent, one date range&lt;/span&gt;
agentsview usage daily &lt;span class="nt"&gt;--agent&lt;/span&gt; claude &lt;span class="nt"&gt;--since&lt;/span&gt; 2026-04-01 &lt;span class="nt"&gt;--until&lt;/span&gt; 2026-05-01

&lt;span class="c"&gt;# One-line for shell prompts (works in starship, oh-my-zsh, fish)&lt;/span&gt;
agentsview usage statusline

&lt;span class="c"&gt;# JSON for scripts&lt;/span&gt;
agentsview usage daily &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt; | jq &lt;span class="s1"&gt;'.totals'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It uses &lt;strong&gt;LiteLLM pricing tables&lt;/strong&gt; for live rates with an offline fallback, and it correctly separates &lt;strong&gt;cache-creation tokens, cache-read tokens, input tokens, and output tokens&lt;/strong&gt; — which is the part &lt;code&gt;ccusage&lt;/code&gt; and the OpenAI dashboard tend to get wrong. If you've ever wondered why your Anthropic bill is half what your token count suggests, that's prompt caching; agentsview's math actually models it.&lt;/p&gt;

&lt;p&gt;The standalone usage mode is particularly nice for CI: you don't need to run the server, you can just SSH onto a build box and &lt;code&gt;agentsview usage daily --json&lt;/code&gt; to dump cost data into a Prometheus exporter or a Slack alert.&lt;/p&gt;

&lt;h2&gt;
  
  
  Full-text search across every session you've ever had
&lt;/h2&gt;

&lt;p&gt;The other killer feature, and the one I personally got the most value out of: &lt;code&gt;Cmd+K&lt;/code&gt; in the web UI does &lt;strong&gt;SQLite FTS5 search across every message in every session from every agent.&lt;/strong&gt; "what did I do with the Stripe webhook handler last March?" returns results from your Claude Code session, your Codex transcript, and your Cursor edits, all in one ranked list.&lt;/p&gt;

&lt;p&gt;Other navigation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;j&lt;/code&gt; / &lt;code&gt;k&lt;/code&gt; — move between sessions (Vim-style)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[&lt;/code&gt; / &lt;code&gt;]&lt;/code&gt; — page through messages within a session&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;?&lt;/code&gt; — show all keyboard shortcuts&lt;/li&gt;
&lt;li&gt;Sessions can be &lt;strong&gt;exported as HTML&lt;/strong&gt; or &lt;strong&gt;published to a GitHub Gist&lt;/strong&gt; with one click&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's also a live SSE channel so sessions you're actively running show new messages as they arrive — useful if you've got a long-running Codex agent in another tab and want to watch its progress without alt-tabbing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the dashboard actually shows
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentsview stats     &lt;span class="c"&gt;# human-readable summary over last 28 days&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output is a structured analytics report with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session archetypes&lt;/strong&gt; — automation vs quick vs standard vs deep vs marathon, based on duration + message count + tools-per-turn&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache economics&lt;/strong&gt; — how much you saved (or didn't) from Anthropic's prompt cache&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool / model / agent mix&lt;/strong&gt; — which tools you actually use vs which you installed and forgot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal heatmap&lt;/strong&gt; — your coding activity by hour-of-day and day-of-week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Peak context tokens&lt;/strong&gt; — how close you got to the model's window in each session&lt;/li&gt;
&lt;li&gt;Opt-in &lt;strong&gt;git outcomes&lt;/strong&gt; with &lt;code&gt;--include-git-outcomes&lt;/code&gt; (commits / LOC / files changed per session)&lt;/li&gt;
&lt;li&gt;Opt-in &lt;strong&gt;GitHub PR outcomes&lt;/strong&gt; with &lt;code&gt;--include-github-outcomes&lt;/code&gt; (calls &lt;code&gt;gh&lt;/code&gt; to count merged PRs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The git/GitHub outcomes are off by default because they're slow on large repos, but they're the part that turns "I spent 12 hours in Claude Code last week" into "I spent 12 hours in Claude Code last week and shipped 7 PRs across 3 repos." That's the metric that matters.&lt;/p&gt;

&lt;p&gt;The JSON output is &lt;strong&gt;versioned (&lt;code&gt;schema_version: 1&lt;/code&gt;)&lt;/strong&gt;, which is the kind of detail that tells you this project is built for downstream consumers — you can pipe it into Grafana, into a status bar plugin, into a weekly self-review email — without worrying about breakage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The full supported-agent list
&lt;/h2&gt;

&lt;p&gt;agentsview auto-discovers sessions from all of these on first run:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Session directory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Amp&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.local/share/amp/threads/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Antigravity&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.gemini/antigravity/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Antigravity CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.gemini/antigravity-cli/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.claude/projects/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Cowork&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/Library/Application Support/Claude/local-agent-mode-sessions/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.codex/sessions/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.copilot/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cortex Code&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.snowflake/cortex/conversations/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.cursor/projects/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek TUI&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;~/.codewhale/sessions/&lt;/code&gt;, &lt;code&gt;~/.deepseek/sessions/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Forge&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.forge/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.gemini/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gptme&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.local/share/gptme/logs/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Agent&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.hermes/sessions/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;iFlow&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.iflow/projects/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kilo&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.local/share/kilo/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.kimi/sessions/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kiro CLI / Kiro IDE&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;~/.kiro/sessions/cli/&lt;/code&gt;, &lt;code&gt;~/Library/Application Support/Kiro/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's already comprehensive, and the project is adding new agents fast — the commit history shows a typical week ships support for 1–2 more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Remote access (the part most people get wrong)
&lt;/h2&gt;

&lt;p&gt;agentsview binds to &lt;strong&gt;&lt;code&gt;127.0.0.1&lt;/code&gt; only&lt;/strong&gt; and validates the request &lt;code&gt;Host&lt;/code&gt; header to protect against DNS-rebinding attacks. That's correct security defaults, but it means that if you try to reach it over SSH port forwarding or from a remote dev environment (exe.dev, Codespaces, Coder, WSL2), API calls get rejected with &lt;code&gt;403 Forbidden&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The fix is documented and clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Browser opens http://127.0.0.1:18080 via `ssh -L 18080:127.0.0.1:8080 host`&lt;/span&gt;
agentsview serve &lt;span class="nt"&gt;--public-url&lt;/span&gt; http://127.0.0.1:18080

&lt;span class="c"&gt;# Browser opens a forwarded hostname like https://your-workspace.exe.dev&lt;/span&gt;
agentsview serve &lt;span class="nt"&gt;--public-url&lt;/span&gt; https://your-workspace.exe.dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For any setup that genuinely exposes the UI beyond loopback, also pass &lt;code&gt;--require-auth&lt;/code&gt;. The Docker compose example explicitly publishes only on &lt;code&gt;127.0.0.1&lt;/code&gt; for the same reason.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;p&gt;A few caveats before you &lt;code&gt;brew install --cask&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQLite is fine until it isn't.&lt;/strong&gt; The default backend works for tens of thousands of sessions. If you're an enterprise team mirroring sessions from multiple machines, agentsview supports a &lt;strong&gt;PostgreSQL backend&lt;/strong&gt; (&lt;code&gt;PG_SERVE=1&lt;/code&gt;, &lt;code&gt;AGENTSVIEW_PG_URL=...&lt;/code&gt;) and an experimental &lt;strong&gt;DuckDB mirror&lt;/strong&gt; with a "Quack" remote-query layer. Those are documented but newer; expect rough edges if you're the first person at your company to use them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Containerized sessions only see what you mount.&lt;/strong&gt; If you run agentsview in Docker, it can only discover the agent directories you explicitly mount as volumes. Forget to mount Codex, and Codex won't appear in the UI. The README is clear about this — but it bit me the first time, so worth flagging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git outcomes are opt-in for a reason.&lt;/strong&gt; &lt;code&gt;--include-git-outcomes&lt;/code&gt; runs &lt;code&gt;git log&lt;/code&gt; and &lt;code&gt;git diff --shortstat&lt;/code&gt; across your repos to attribute LOC changes to sessions. On a 100K-file monorepo this is slow. Don't enable it by default; run it on a cron once a day if you want the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-session cost can be &lt;code&gt;null&lt;/code&gt; for new models.&lt;/strong&gt; Pricing comes from LiteLLM's rate table. If you're using a freshly released Anthropic or Bedrock model that LiteLLM hasn't picked up yet, &lt;code&gt;has_cost&lt;/code&gt; will be &lt;code&gt;false&lt;/code&gt; and the cost column will be empty for those sessions. The fix is upstream — &lt;code&gt;pip install -U litellm&lt;/code&gt; and re-run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker container runs as root.&lt;/strong&gt; The default image runs as root, which is fine inside a container but means bind-mounting &lt;code&gt;/data&lt;/code&gt; to your host home directory creates root-owned files. Prefer named volumes, or pre-create the directory with the ownership you want.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: How does this compare to ccusage?&lt;/strong&gt;&lt;br&gt;
ccusage is Claude-Code-only and re-parses JSONL on every invocation. agentsview covers 20+ agents, indexes once into SQLite, and queries are ~100× faster. The cost math is also more accurate because agentsview separates cache-creation tokens from cache-read tokens; ccusage doesn't. If you only use Claude Code, ccusage is fine. If you use anything else alongside it, switch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does it phone home or upload my session data anywhere?&lt;/strong&gt;&lt;br&gt;
No. The binary binds to &lt;code&gt;127.0.0.1&lt;/code&gt;, there are no accounts, no telemetry, and no cloud component. Everything lives in &lt;code&gt;~/.agentsview/sessions.db&lt;/code&gt;. The optional GitHub Gist export only runs when you click "publish" on a specific session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Will it work with self-hosted models (Ollama, vLLM, LM Studio)?&lt;/strong&gt;&lt;br&gt;
Sessions from agents that talk to self-hosted models still get indexed — the message content, tool calls, and metadata all show up. Cost tracking won't return USD figures for them (since the models are free), but token counts and session structure are fully captured.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I use it on a server that runs my CI agents?&lt;/strong&gt;&lt;br&gt;
Yes. The standalone &lt;code&gt;agentsview usage daily --json&lt;/code&gt; mode doesn't require the server — you can SSH into a build box, run it, and pipe the JSON into Prometheus or a Slack webhook. If you want the web UI on a CI host, run with &lt;code&gt;--require-auth&lt;/code&gt; and &lt;code&gt;--public-url&lt;/code&gt; set to the hostname.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What happens to my data if I uninstall?&lt;/strong&gt;&lt;br&gt;
The session files in &lt;code&gt;~/.claude/projects/&lt;/code&gt; etc. are owned by the agents themselves, not by agentsview — uninstalling agentsview leaves them untouched. The only thing you'd delete is &lt;code&gt;~/.agentsview/&lt;/code&gt; (the SQLite index and logs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is there an MCP server so agents can query their own history?&lt;/strong&gt;&lt;br&gt;
Not yet, but it's the obvious next step and there's an open issue for it. For now, agents can hit the REST API at &lt;code&gt;http://127.0.0.1:8080/api/v1/sessions/...&lt;/code&gt; if you want to wire it up manually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;agentsview is the rare GitHub-trending repo that solves an actually-painful problem and is engineered like a finished product, not a weekend hack. The default security posture is correct (loopback + Host validation). The CLI is scriptable and the JSON output is versioned. The cost math models prompt caching properly. The session-archetype analytics turn raw usage data into something you can reason about.&lt;/p&gt;

&lt;p&gt;If you use more than one AI coding agent — and in mid-2026 that's most people — install it today: &lt;code&gt;brew install --cask agentsview&lt;/code&gt;, then &lt;code&gt;agentsview serve --background&lt;/code&gt;. The dashboard at &lt;code&gt;http://127.0.0.1:8080&lt;/code&gt; will pay for itself the first time you ask "what was I doing in Codex three weeks ago?" and get an answer in one keystroke.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/kenn-io/agentsview" rel="noopener noreferrer"&gt;github.com/kenn-io/agentsview&lt;/a&gt;. Docs and screenshots: &lt;a href="https://agentsview.io" rel="noopener noreferrer"&gt;agentsview.io&lt;/a&gt;. MIT licensed, Go, one binary. At 1,382 stars added this week, it's on track to be the default cross-agent session viewer of the year.&lt;/p&gt;

</description>
      <category>agentsview</category>
      <category>aicodingagents</category>
      <category>claudecode</category>
      <category>codex</category>
    </item>
    <item>
      <title>Vibe-Trading Review: HKU's Open-Source Trading Agent</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Thu, 18 Jun 2026 10:09:25 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/vibe-trading-review-hkus-open-source-trading-agent-3bj5</link>
      <guid>https://dev.to/andrew-ooo/vibe-trading-review-hkus-open-source-trading-agent-3bj5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/vibe-trading-hkuds-personal-trading-agent-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vibe-Trading&lt;/strong&gt; is HKU Data Science Lab's open-source trading agent that turns natural-language prompts into runnable market research, backtests, alpha benches, and — if you opt in — bounded live orders through a broker you authorize. It's been live since April 2026, is &lt;strong&gt;MIT licensed&lt;/strong&gt;, sits at &lt;strong&gt;12,526 GitHub stars&lt;/strong&gt; (2,409 forks, 9 open issues), and has shipped daily updates through June.&lt;/p&gt;

&lt;p&gt;The pitch is simple: most "AI trading" projects are either a thin ChatGPT wrapper that hallucinates ticker prices, or a backtest framework with no LLM in front. Vibe-Trading is one of the first that actually combines a real agent loop (48 tools, 77 finance skills, 29 multi-agent presets) with the boring infrastructure quant work actually needs — point-in-time data loaders, lookahead-banned alphas, walk-forward validation, a kill-switch broker connector, and an audit ledger.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;12,526 GitHub stars&lt;/strong&gt;, &lt;strong&gt;2,409 forks&lt;/strong&gt;, &lt;strong&gt;9 open issues&lt;/strong&gt; (extremely low for a project this size — they're actively triaging)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Created 2026-04-01&lt;/strong&gt;, hit PyPI on day 19; &lt;strong&gt;v0.1.9&lt;/strong&gt; is current&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT licensed&lt;/strong&gt;, FastAPI backend + React 19 frontend + Python 3.11+ agent core&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;48 agent tools, 77 finance skills, 29 swarm presets&lt;/strong&gt;, 452 pre-built alphas&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 broker connectors&lt;/strong&gt; (IBKR, Robinhood, Alpaca, Tiger, Longbridge, OKX, Binance, Futu, Dhan, Shoonya) — most paper-only, a few gated for live with a mandate + kill switch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;13+ LLM providers&lt;/strong&gt; — Claude (Opus 4.8+), GPT, Gemini 3.x, Kimi, DeepSeek, Z.ai, MiniMax, Qwen, OpenRouter, Ollama, Codex, GLM/Zhipu, MoonShot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bills itself as research-first&lt;/strong&gt; — never custody, never trades outside your committed limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's the rare "vibe-trading" repo that doesn't immediately fall apart when you actually try to ship a strategy from it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Builds This and Why It Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;HKUDS&lt;/strong&gt; is the &lt;strong&gt;Data Science Lab at the University of Hong Kong&lt;/strong&gt;. Their previous open-source releases (LightRAG, GraphGPT, MMSSL) are among the most-cited graph + recommendation system repos on GitHub, and they have a track record of shipping research artifacts that actually run, not just &lt;code&gt;awesome-list&lt;/code&gt; README repos.&lt;/p&gt;

&lt;p&gt;That academic anchor is doing real work here. "AI trading bot" is the most-spammed category on GitHub. The thing that separates Vibe-Trading from the 10,000 LangChain-plus-yfinance demos is the lab actually understands point-in-time data, the lookahead-bias trap, and why a 191-alpha factor library needs an AST purity gate before it's safe to bench. Every alpha shipped in their Alpha Zoo (Qlib 158, Kakushadze Alpha101, GTJA 191, Fama-French 5 + Carhart) is paired with a license file and a same-universe random-control test to catch factors that just track market beta.&lt;/p&gt;

&lt;p&gt;The Trendshift trending badge plus 12.5K stars in 2.5 months also tells you something: this is filling a real gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "One Command to Empower Your Agent" Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;The whole pitch fits in one bash block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;vibe-trading-ai

&lt;span class="c"&gt;# Natural-language research&lt;/span&gt;
vibe-trading run &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Backtest a BTC-USDT 20/50 moving-average strategy for 2024, summarize return and drawdown, then export the report"&lt;/span&gt;

&lt;span class="c"&gt;# Bench a pre-built alpha zoo (one line)&lt;/span&gt;
vibe-trading alpha bench &lt;span class="nt"&gt;--zoo&lt;/span&gt; gtja191 &lt;span class="nt"&gt;--universe&lt;/span&gt; csi300 &lt;span class="nt"&gt;--period&lt;/span&gt; 2018-2025 &lt;span class="nt"&gt;--top&lt;/span&gt; 20

&lt;span class="c"&gt;# Compare alphas head-to-head&lt;/span&gt;
vibe-trading alpha compare alpha012 alpha034 alpha101 &lt;span class="nt"&gt;--sort&lt;/span&gt; ir
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That second command is the differentiated one. It says: take the &lt;strong&gt;191 alphas from Guotai Junan's 2014 short-horizon factor report&lt;/strong&gt;, bench all of them on the &lt;strong&gt;CSI 300 universe&lt;/strong&gt; from &lt;strong&gt;2018 through 2025&lt;/strong&gt;, sort by Information Coefficient, and print the top 20. Three minutes ago that was a half-week of pandas yak-shaving. Now it's a single line, and every alpha has a license attribution baked into the result.&lt;/p&gt;

&lt;p&gt;The Web UI does the same thing with a chat interface, an Alpha Zoo "Compare" tab, and a live swarm dashboard. The CLI version stays the production surface because it's scriptable and the team treats &lt;code&gt;--version&lt;/code&gt;, &lt;code&gt;vibe-trading provider doctor&lt;/code&gt;, and &lt;code&gt;vibe-trading resume &amp;lt;session-id&amp;gt;&lt;/code&gt; like first-class citizens.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Four Things It Actually Does Differently
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Shadow Account — diagnose your own trades
&lt;/h3&gt;

&lt;p&gt;This is the feature that earned the project its &lt;code&gt;0.1.5&lt;/code&gt; release. Upload a CSV exported from your broker — Tongdaxin (同花顺), East Money (东方财富), Futu, or any generic format — and the agent extracts your behavior:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vibe-trading &lt;span class="nt"&gt;--upload&lt;/span&gt; trades_export.csv
vibe-trading run &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Analyze my trading behavior, extract my shadow strategy, and compare it with my actual trades"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It produces a holding-days profile, win rate, PnL ratio, max drawdown, and four bias diagnostics (disposition effect, overtrading, momentum chasing, anchoring). Then it converts your recurring entries and exits into an explicit rule-based "shadow strategy," backtests it across markets, and shows you how much money your rule-breaking is costing. The output is an 8-section HTML/PDF report you can hand to a partner or archive for next quarter's review.&lt;/p&gt;

&lt;p&gt;This isn't a model. It's deterministic OHLCV feature evaluation, recently upgraded from a calendar-phase stub. It does the un-sexy thing well.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Connector-first broker architecture with a real kill switch
&lt;/h3&gt;

&lt;p&gt;The trading layer has 10 brokers and treats &lt;code&gt;paper&lt;/code&gt; vs &lt;code&gt;live&lt;/code&gt; as an attribute of the connector, not a global config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vibe-trading connector list
vibe-trading connector use robinhood
vibe-trading connector account
vibe-trading connector positions
vibe-trading connector orders
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For brokers whose API doesn't structurally separate paper from live (Longbridge, Dhan, Shoonya), live trading is &lt;strong&gt;structurally refused&lt;/strong&gt; at the first line of &lt;code&gt;place_order&lt;/code&gt;. For the five brokers that do support live (Robinhood Agentic, Tiger, Alpaca, OKX, Binance, Futu), there's a five-layer safety model:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User-committed mandate&lt;/strong&gt; — symbol universe / order size / exposure / leverage / daily cap, all signed off explicitly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem kill switch&lt;/strong&gt; — touch a file and every order halts before the next iteration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fail-closed pre-trade gate&lt;/strong&gt; — every order checked against the mandate before transit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full audit ledger&lt;/strong&gt; — every decision and order written, no silent retries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-expiry&lt;/strong&gt; — the mandate dies on its own schedule unless you re-commit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've reviewed maybe 50 "agentic trading" repos this year. This is the only one I would actually trust with a $500 paper account, let alone a real one.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The Alpha Zoo — 452 pre-built alphas with attribution
&lt;/h3&gt;

&lt;p&gt;Pre-built alphas with proper licensing is the kind of work no one does because it's not glamorous. Vibe-Trading shipped four zoos:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Zoo&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Alphas&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;qlib158&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Microsoft Qlib&lt;/td&gt;
&lt;td&gt;158&lt;/td&gt;
&lt;td&gt;Apache-2 (attributed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;alpha101&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Kakushadze 2015 paper (arXiv:1601.00991)&lt;/td&gt;
&lt;td&gt;101&lt;/td&gt;
&lt;td&gt;Paper rewrite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gtja191&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Guotai Junan 2014 factor report&lt;/td&gt;
&lt;td&gt;191&lt;/td&gt;
&lt;td&gt;Source-cited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;academic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Fama-French 5 + Carhart momentum&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Public-domain proxies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Plus an &lt;strong&gt;AST purity gate&lt;/strong&gt; (no I/O, no globals, no lookahead operators) and &lt;code&gt;pytest-socket&lt;/code&gt; as a network kill-switch in tests. The &lt;code&gt;run_bench_strict()&lt;/code&gt; variant adds a same-universe random control and an OOS split to catch factors that just track market beta. This is the kind of plumbing PhD students at HKU spend a semester building. They open-sourced it.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Multi-agent swarms with real worker visibility
&lt;/h3&gt;

&lt;p&gt;The 29 swarm presets aren't toys. The &lt;code&gt;investment_committee&lt;/code&gt; preset runs a bull-bear debate, hands the conclusions to a risk reviewer, then to a portfolio manager for the final call. The &lt;code&gt;quant_strategy_desk&lt;/code&gt; chains a screener → factor researcher → backtester → risk auditor. Each worker has its own tool set, including a local &lt;code&gt;get_market_data&lt;/code&gt; tool that pulls through the same normalized loader as MCP — so you don't get the classic LangGraph failure where one worker hallucinates a price and contaminates the chain.&lt;/p&gt;

&lt;p&gt;Workers stream their state (&lt;code&gt;waiting / running / done / failed / blocked / retrying&lt;/code&gt;) into the chat timeline in real time, and a finished card rehydrates from the final &lt;code&gt;run_swarm&lt;/code&gt; result on reconnect. That last bit matters more than it sounds: most swarm frameworks treat a UI disconnect as a permanent state loss.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;This isn't ProductHunt copy. Things that aren't great yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The terminology is awful.&lt;/strong&gt; "Vibe-Trading" is a name that sounds like a meme stock. Half the search results for "vibe trading" are TikTok how-to videos, not the repo. Discoverability suffers. (HKUDS clearly knows this — note the trendshift badge — but the name is sticky now.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robinhood Agentic is the only live broker that's been verified end-to-end.&lt;/strong&gt; The others are paper-account or read-only by current design. If you want fully autonomous live execution on Interactive Brokers, you're still waiting on their official remote MCP path to stabilize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost transparency is incomplete.&lt;/strong&gt; Per-run token usage now persists as &lt;code&gt;llm_usage.json&lt;/code&gt; (added 2026-06-14), but it's provider-reported only with no price estimation. You still have to multiply yourself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider quirks bite.&lt;/strong&gt; Multiple recent releases are dedicated to fixing DeepSeek hangs, Kimi User-Agent rejections, and Gemini 3.x &lt;code&gt;thoughtSignature&lt;/code&gt; round-tripping. The "agent loop avoiding assistant-prefill messages rejected by Opus 4.8+" was a fix shipped on 2026-06-17. The capability layer is much better than v0.1.0, but if you run a less-common provider, expect to file an issue or two.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The repository is huge.&lt;/strong&gt; First-time clone + dev install is real work. The Docker image is the recommended on-ramp, and &lt;code&gt;vibe-trading init&lt;/code&gt; does the right thing for the bare-CLI path, but going from &lt;code&gt;pip install&lt;/code&gt; to a benched alpha on the CSI 300 universe is not a five-minute experience the first time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A-share bias.&lt;/strong&gt; The lab is at HKU and roughly half the loaders, brokers, and presets are tuned for Chinese markets. US-only users will use maybe a third of the features. (That's also a strength — the Chinese-market support is by far the best you'll find in any English-readable open-source project.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Compares
&lt;/h2&gt;

&lt;p&gt;Three points of comparison worth naming:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;vs. FinRobot / FinGPT (AI4Finance Foundation):&lt;/strong&gt; AI4Finance's projects are still the most-starred in the category, but most of them are research artifacts that haven't shipped a working CLI in 6+ months. Vibe-Trading ships weekly, has live PyPI releases, and is much more "actually a tool you'd use" than "a paper with attached code."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;vs. CrewAI + a custom yfinance tool:&lt;/strong&gt; CrewAI is more general-purpose but you're building the entire alpha library, backtest engine, broker connector, and safety layer yourself. Vibe-Trading gives you all of that for free with the same multi-agent pattern.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;vs. QuantConnect Lean / Backtrader:&lt;/strong&gt; Lean and Backtrader are still the right answer if you already know exactly what backtest you want to run. Vibe-Trading is the right answer if you want to ask "is this strategy idea worth testing at all?" in plain English first, then iterate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Reactions
&lt;/h2&gt;

&lt;p&gt;The repo's velocity speaks for itself: 12.5K stars in 79 days, 4 translated READMEs (Chinese, Japanese, Korean, Arabic), a Discord, a public wiki at &lt;a href="https://vibetrading.wiki/" rel="noopener noreferrer"&gt;vibetrading.wiki&lt;/a&gt;, and a Cloudflare Pages-hosted Alpha Library that auto-renders every shipped alpha.&lt;/p&gt;

&lt;p&gt;The PRs and issue triage tell a more interesting story. Almost every recent News entry is a fix credited to a community contributor (&lt;code&gt;thanks @BillDin&lt;/code&gt;, &lt;code&gt;thanks @LemonCANDY42&lt;/code&gt;, &lt;code&gt;thanks @Teerapat-Vatpitak&lt;/code&gt;, &lt;code&gt;thanks @mvanhorn&lt;/code&gt;, and many more). The 9 open issues number is real — the team aggressively triages and closes, which is the strongest signal that maintainer attention is high. A 12.5K-star repo with single-digit open issues two months in is something I see roughly once a year.&lt;/p&gt;

&lt;p&gt;There's a Hacker News thread asking "Anybody keep hearing about vibe-trading on TikTok and Reddit? Who has tried it" from October 2025 — predating this repo — that suggests the term itself was already in the air, which probably helped the project's initial discovery.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Vibe-Trading actually safe to use with a real brokerage account?
&lt;/h3&gt;

&lt;p&gt;The safety model is solid for what it claims: a committed mandate, a kill switch, an audit ledger, and structural refusal of live trading for brokers that can't separate paper from live. The team is explicit about this being &lt;strong&gt;experimental&lt;/strong&gt; and recommends paper accounts first. I would absolutely paper-trade it; I would only live-trade after running paper for at least a month with the same mandate I planned to use live. That advice is also exactly what their README says.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it handle lookahead bias and data leakage?
&lt;/h3&gt;

&lt;p&gt;This is where the academic provenance shows. Every alpha in the Alpha Zoo passes an AST purity check before it can be benched (no I/O, no globals, no lookahead operators), &lt;code&gt;pytest-socket&lt;/code&gt; is used as a network kill-switch in tests, and &lt;code&gt;run_bench_strict()&lt;/code&gt; adds a same-universe random control plus OOS split. Point-in-time financial-statement enrichment fails fast rather than silently falling back to raw bars. This is markedly better than 95% of the "AI quant" repos on GitHub.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use it with a local LLM via Ollama?
&lt;/h3&gt;

&lt;p&gt;Yes, and the Docker workflow even has a host-side Ollama default (&lt;code&gt;OLLAMA_BASE_URL=http://host.docker.internal:11434&lt;/code&gt;) that "just works" on Docker Desktop and Linux. Smaller local models will struggle with the multi-agent presets — those want a real frontier model — but for simple &lt;code&gt;vibe-trading run -p "&amp;lt;question&amp;gt;"&lt;/code&gt; calls and alpha benches, a quantized Qwen3 or Llama 3.5 8B is enough. Pair it with &lt;a href="https://dev.to/posts/whichllm-local-llm-hardware-ranker-review/"&gt;whichllm&lt;/a&gt; to pick the right model for your hardware before you commit.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between Vibe-Trading and just asking ChatGPT?
&lt;/h3&gt;

&lt;p&gt;ChatGPT will hallucinate prices, can't backtest, has no access to PIT financial statements, can't bench 191 alphas on a real universe, and can't issue a paper order to your broker. Vibe-Trading is the agent loop that uses an LLM as the planning brain and routes everything else through deterministic finance tools — backtest engines, data loaders, the alpha registry, broker connectors. It's the difference between "a model that knows about trading" and "a system that does trading work."&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Try It?
&lt;/h2&gt;

&lt;p&gt;Try it if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Want to test a strategy idea quickly without writing the backtest plumbing yourself&lt;/li&gt;
&lt;li&gt;Have a broker CSV sitting in a folder and have been meaning to audit your own trading&lt;/li&gt;
&lt;li&gt;Work on Chinese or Hong Kong markets and want the best open-source coverage available&lt;/li&gt;
&lt;li&gt;Are research-curious about quant alphas but didn't want to spend two weeks on the AST safety layer&lt;/li&gt;
&lt;li&gt;Want a multi-LLM-provider agent that doesn't lock you into one API key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Skip it if you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Already have a working Lean / Backtrader pipeline and don't need an LLM front-end&lt;/li&gt;
&lt;li&gt;Only trade US equities and don't care about the A-share-heavy data layer&lt;/li&gt;
&lt;li&gt;Need real-time, sub-second tick execution (this is research-grade, not HFT)&lt;/li&gt;
&lt;li&gt;Are looking for an "AI bot that prints money" — that's not what this is, and the team is clear about that&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;pip install vibe-trading-ai&lt;/code&gt; and start with the BTC moving-average backtest example. If you finish that and feel underwhelmed, you can be confident in walking away. If you finish it and immediately go look for your broker's CSV export — like I did — you've found a project worth the next few hours of your week.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Vibe-Trading is open source on &lt;a href="https://github.com/HKUDS/Vibe-Trading" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, available via &lt;code&gt;pip install vibe-trading-ai&lt;/code&gt;, has its own &lt;a href="https://vibetrading.wiki/" rel="noopener noreferrer"&gt;wiki&lt;/a&gt;, and has a Discord linked from the README. MIT licensed, so you can fork it.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibetrading</category>
      <category>agents</category>
      <category>tradingagent</category>
      <category>quant</category>
    </item>
    <item>
      <title>LMCache Review: 3-10x Faster vLLM via KV Cache Reuse (2026)</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Wed, 17 Jun 2026 10:09:30 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/lmcache-review-3-10x-faster-vllm-via-kv-cache-reuse-2026-3k72</link>
      <guid>https://dev.to/andrew-ooo/lmcache-review-3-10x-faster-vllm-via-kv-cache-reuse-2026-3k72</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/lmcache-fastest-kv-cache-layer-vllm-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LMCache&lt;/strong&gt; is an open-source Key-Value caching layer that sits between vLLM (or SGLang) and your storage hierarchy, turning the KV cache from an in-GPU scratchpad into a persistent, reusable, vendor-neutral asset. The numbers are big: &lt;strong&gt;3.7–6.8x lower TTFT&lt;/strong&gt; (time-to-first-token), &lt;strong&gt;up to 15x throughput improvement&lt;/strong&gt; in chatbot and RAG workloads, and a 10x boost on Mixture-of-Experts inference after the April 2026 multiprocess rearchitecture.&lt;/p&gt;

&lt;p&gt;The project came out of breakthrough research at the &lt;strong&gt;University of Chicago&lt;/strong&gt;, joined the &lt;strong&gt;PyTorch Foundation&lt;/strong&gt; in October 2025, and is now integrated into &lt;strong&gt;NVIDIA Dynamo&lt;/strong&gt;, &lt;strong&gt;IBM's open-source LLM serving stack&lt;/strong&gt;, &lt;strong&gt;CoreWeave&lt;/strong&gt;'s production inference for &lt;strong&gt;Cohere&lt;/strong&gt;, and the official &lt;strong&gt;vLLM production-stack&lt;/strong&gt;. As of June 2026 it has &lt;strong&gt;9.2K+ GitHub stars&lt;/strong&gt;, 709 added this week, and a growing list of enterprise adopters who needed somewhere to put 1–2 GB of KV cache that wasn't expensive H100 HBM.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0&lt;/strong&gt;, open source at &lt;a href="https://github.com/LMCache/LMCache" rel="noopener noreferrer"&gt;LMCache/LMCache&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor-neutral by design&lt;/strong&gt; — works with vLLM, SGLang, NVIDIA Dynamo, multiple hardware vendors (NVIDIA, AMD MI300X, Arm, Huawei Ascend), and storage backends (CPU RAM, local SSD, Redis/Valkey, Mooncake, InfiniStore, S3-compatible, NIXL, GDS)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two deployment modes&lt;/strong&gt; — Multiprocess (standalone daemon, recommended) and In-process (embedded in vLLM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-prefix KV reuse&lt;/strong&gt; via CacheBlend — reuse cached blocks at &lt;em&gt;any&lt;/em&gt; position in the prompt, not just shared prefixes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PD disaggregation&lt;/strong&gt; support — transfer KV cache from prefill workers to decode workers over NVLink, RDMA, or TCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production observability&lt;/strong&gt; — Kubernetes-native metrics, request-level and token-level cache hit ratios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engine-independent&lt;/strong&gt; — cache survives even if the inference engine crashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Day-1 support for gpt-oss 20B/120B&lt;/strong&gt;, Qwen3 series, Llama 3, and most modern model families&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What LMCache Actually Solves
&lt;/h2&gt;

&lt;p&gt;The KV cache is the single biggest performance lever in LLM serving, and almost nobody outside of model-serving teams understands it. Quick refresher: when an LLM processes a prompt, every token's attention computation produces Key and Value tensors. For a 32K-token context on a 70B model, that's roughly &lt;strong&gt;1–2 GB of KV cache per request&lt;/strong&gt;. The cache lets the model decode subsequent tokens without recomputing the prompt. Lose it, recompute. Long prompt? Long recompute.&lt;/p&gt;

&lt;p&gt;The default behavior of every modern inference engine is brutal: KV cache lives in GPU HBM, gets evicted when memory pressure hits, and dies when the engine restarts. If a user comes back 5 minutes later with a follow-up question on the same 50-page document, the engine &lt;strong&gt;redoes the entire prefill&lt;/strong&gt; — that's your TTFT spike.&lt;/p&gt;

&lt;p&gt;LMCache fixes this with three big ideas:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tiered offloading.&lt;/strong&gt; Move KV blocks from GPU HBM → CPU RAM → local SSD → remote storage (Redis, S3, Mooncake, etc.). When you need them back, retrieve only the relevant blocks via a high-throughput connector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Cross-engine sharing.&lt;/strong&gt; Multiple vLLM instances can share one KV cache pool. The LMCache server runs as a standalone daemon, so engines come and go without losing cached state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Non-prefix reuse via CacheBlend.&lt;/strong&gt; Prefix caching (vLLM's built-in feature) only helps when the &lt;em&gt;exact&lt;/em&gt; prefix matches. CacheBlend reuses KV blocks at any position by selectively recomputing a small number of tokens to recover quality. This is the unlock for RAG, where you stitch retrieved chunks into a fresh prompt every time.&lt;/p&gt;

&lt;p&gt;The first two are operational wins. The third is what makes LMCache fundamentally more powerful than vLLM's stock prefix cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It's Trending Now (June 2026)
&lt;/h2&gt;

&lt;p&gt;Three concurrent waves:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agentic workloads broke prefix caching.&lt;/strong&gt; Multi-turn agent loops generate prompts where the shared structure isn't at the front anymore — tool outputs, intermediate reasoning, and retrieved context all interleave. Stock prefix cache hit rates collapsed below 20% on agentic traffic. LMCache's &lt;a href="https://blog.lmcache.ai/en/2026/04/03/lmcaches-new-architecture-boosts-moe-inference-performance-by-10x/" rel="noopener noreferrer"&gt;April 2026 MoE rearchitecture&lt;/a&gt; and &lt;a href="https://blog.lmcache.ai/en/2026/05/12/benchmarking-lmcache-for-multi-turn-agentic-workloads-on-amd-mi300x/" rel="noopener noreferrer"&gt;May 2026 AMD MI300X benchmark&lt;/a&gt; directly targeted this, showing 10x improvement on multi-turn workloads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;NVIDIA Dynamo shipped with LMCache integration&lt;/strong&gt; in September 2025, putting it in front of every team running NVIDIA's reference inference stack.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tensormesh launched&lt;/strong&gt; in October 2025 as the commercial steward, providing enterprise support while keeping the project Apache 2.0. That removed the "who maintains this in production?" objection for risk-averse buyers.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The HN thread from July 2025 (&lt;a href="https://news.ycombinator.com/item?id=44367811" rel="noopener noreferrer"&gt;Lossless LLM 3x Throughput Increase by LMCache&lt;/a&gt;) and the r/LocalLLaMA post (&lt;a href="https://www.reddit.com/r/LocalLLaMA/comments/1lewhla/we_built_this_project_to_increase_llm_throughput/" rel="noopener noreferrer"&gt;We built this project to increase LLM throughput by 3x&lt;/a&gt;) were both turning points — IBM's adoption announcement in the comments gave it credibility, and the project has compounded since.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install &amp;amp; First Run (60 Seconds)
&lt;/h2&gt;

&lt;p&gt;The cleanest path is vLLM + LMCache via uv:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv venv &lt;span class="nt"&gt;--python&lt;/span&gt; 3.12
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
uv pip &lt;span class="nb"&gt;install &lt;/span&gt;lmcache vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two deployment modes are available. &lt;strong&gt;Multiprocess (MP) mode is now the recommended default&lt;/strong&gt; because the cache survives engine crashes and one server can feed multiple vLLM instances.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MP mode&lt;/strong&gt; — start the LMCache server in one terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;lmcache server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--l1-size-gb&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--eviction-policy&lt;/span&gt; LRU &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--chunk-size&lt;/span&gt; 256
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ZMQ port (default 5555) accepts engine connections; the HTTP frontend (default 8080) exposes Prometheus-compatible metrics and a management API.&lt;/p&gt;

&lt;p&gt;Start vLLM with the MP connector in a second terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen3-8B &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kv-transfer-config&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s1"&gt;'{"kv_connector":"LMCacheMPConnector",
    "kv_connector_module_path":"lmcache.integration.vllm.lmcache_mp_connector",
    "kv_role":"kv_both"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;kv_connector_module_path&lt;/code&gt; override is important: it pins the connector to the LMCache-shipped implementation rather than vLLM's vendored copy, so you get the latest server protocol and fixes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-process mode&lt;/strong&gt; is one command if you just want a single-node setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve Qwen/Qwen3-8B &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kv-offloading-backend&lt;/span&gt; lmcache &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--kv-offloading-size&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-hybrid-kv-cache-manager&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That last flag is &lt;strong&gt;mandatory&lt;/strong&gt; in in-process mode — forgetting it is the single most common setup mistake reported in the GitHub issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Production Test — Long-Doc Q&amp;amp;A
&lt;/h2&gt;

&lt;p&gt;The canonical benchmark workload looks like this. Two requests share a long document prefix; the second one should hit cache.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First request — cache is cold, full prefill happens&lt;/span&gt;
curl http://localhost:8000/v1/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "Qwen/Qwen3-8B",
    "prompt": "&amp;lt;50-page document&amp;gt;... Summarize section 3.",
    "max_tokens": 200
  }'&lt;/span&gt;

&lt;span class="c"&gt;# Second request — same document, different question&lt;/span&gt;
curl http://localhost:8000/v1/completions &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "model": "Qwen/Qwen3-8B",
    "prompt": "&amp;lt;50-page document&amp;gt;... What are the limitations?",
    "max_tokens": 200
  }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see logs like this from the LMCache server on the cold pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[2026-04-22 19:49:56,316] LMCache INFO: Stored 256 tokens in 0.023 seconds
[2026-04-22 19:49:56,555] LMCache INFO: Stored 256 tokens in 0.005 seconds
&lt;/span&gt;&lt;span class="c"&gt;...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And retrieval on the second request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;[2026-04-22 19:50:04,686] LMCache INFO: Retrieved 256 tokens in 0.003 seconds
[2026-04-22 19:50:04,968] LMCache INFO: Stored 256 tokens in 0.005 seconds
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the ceph.io benchmark with Qwen3-32B on a long-doc workload, &lt;strong&gt;TTFT dropped from ~6.5 seconds to ~0.4 seconds&lt;/strong&gt; on the cached pass — roughly 16x faster on the cache hit path. Across the full LMCache paper (arXiv 2510.09665), the average improvement is &lt;strong&gt;3.7–6.8x TTFT reduction&lt;/strong&gt; and &lt;strong&gt;19x inter-token-latency reduction&lt;/strong&gt; on TriviaQA-style long-context workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks That Matter
&lt;/h2&gt;

&lt;p&gt;A few numbers from peer-reviewed and vendor benchmarks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workload&lt;/th&gt;
&lt;th&gt;Engine&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After LMCache&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multi-round chat (LMCache paper)&lt;/td&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;3.7–6.8x lower TTFT&lt;/td&gt;
&lt;td&gt;3.7–6.8x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TriviaQA long-context&lt;/td&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;19x lower ITL&lt;/td&gt;
&lt;td&gt;19x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RAG document QA (ceph.io)&lt;/td&gt;
&lt;td&gt;vLLM + Ceph&lt;/td&gt;
&lt;td&gt;6.5s TTFT&lt;/td&gt;
&lt;td&gt;0.4s TTFT&lt;/td&gt;
&lt;td&gt;~16x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-turn agentic (AMD MI300X)&lt;/td&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;10x throughput&lt;/td&gt;
&lt;td&gt;10x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MoE inference (Qwen3-235B)&lt;/td&gt;
&lt;td&gt;vLLM 0.18.1&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;10x throughput&lt;/td&gt;
&lt;td&gt;10x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chatbot + RAG (PyTorch blog)&lt;/td&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;baseline&lt;/td&gt;
&lt;td&gt;up to 15x throughput&lt;/td&gt;
&lt;td&gt;15x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two important caveats. First, these are &lt;strong&gt;cache-hit benchmarks&lt;/strong&gt;. If your workload has near-zero prompt overlap (e.g., one-shot classification of unique inputs), LMCache won't help — and can add small overhead. The &lt;a href="https://github.com/LMCache/LMCache/issues/1812" rel="noopener noreferrer"&gt;open GitHub issue #1812&lt;/a&gt; shows exactly this: a benchmark designed without prefix overlap measured higher latency with LMCache enabled, which is the expected behavior. Match the tool to the workload.&lt;/p&gt;

&lt;p&gt;Second, on the &lt;a href="https://levelup.gitconnected.com/vllm-prefix-caching-vs-lmcache-benchmarking-kv-reuse-tradeoffs-944fbaf98b56" rel="noopener noreferrer"&gt;Level Up Coding benchmark&lt;/a&gt;, a small-model single-node Colab test, vLLM's built-in prefix cache was actually faster than LMCache for the common case where everything fits in HBM. LMCache's win condition is when your &lt;em&gt;working set exceeds GPU memory&lt;/em&gt; — exactly the case in production multi-tenant serving and long-context RAG.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Reactions
&lt;/h2&gt;

&lt;p&gt;From the r/LocalLLaMA launch thread:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"We built this project to increase LLM throughput by 3x. Now it has been adopted by IBM in their LLM serving stack."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From the &lt;a href="https://news.ycombinator.com/item?id=44367811" rel="noopener noreferrer"&gt;HN thread&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"In LLM serving, the input is computed into intermediate states called KV cache to further provide answers. These data are relatively large (~1-2GB for long context) and are often evicted when GPU memory is not enough."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From r/mlops on production prefix-cache hit rates:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Hey everyone, so I spent the last few weeks going down the KV cache rabbit hole. One thing which is most of what makes LLM inference expensive is the [recompute on prefix miss]."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The general sentiment in production inference circles: LMCache stopped being optional once you hit two conditions — long contexts (&amp;gt;8K tokens routine) and multi-tenant serving where cache eviction is constant. Below those thresholds, vLLM's built-in prefix cache is enough and adding LMCache is over-engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;A frank list of things to weigh before adopting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not a magic bullet for stateless workloads.&lt;/strong&gt; Classification, embeddings, one-shot translation — workloads with no prompt overlap — get no benefit and a small overhead penalty. Measure before deploying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational complexity.&lt;/strong&gt; MP mode means an additional daemon process, ZMQ ports, l1/l2 tiering, eviction policy tuning, and Prometheus scraping. In-process mode is simpler but loses the cross-engine sharing benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor moves fast.&lt;/strong&gt; The connector interface changed between vLLM 0.18 and 0.20 (the &lt;code&gt;kv_connector_module_path&lt;/code&gt; override exists precisely because of this). Pin versions in production and read release notes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CacheBlend has a quality recovery step.&lt;/strong&gt; Non-prefix reuse selectively recomputes tokens to repair quality, but this isn't free — there's a CPU/GPU cost. For high-quality-bar applications (legal, medical), validate accuracy on your golden set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage backend complexity.&lt;/strong&gt; S3-compatible and InfiniStore add the usual distributed-systems failure modes (network partitions, consistency edge cases). Start with CPU+SSD tiering and only move to remote storage when you genuinely need cross-node sharing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No managed offering from the project itself.&lt;/strong&gt; Tensormesh sells enterprise support, but if you want fully-managed serving, you're combining LMCache with vLLM-on-Kubernetes yourself (or using the official production-stack Helm chart).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  When to Choose LMCache vs Alternatives
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vs vLLM's built-in prefix cache&lt;/strong&gt; — vLLM's prefix cache is great until your working set exceeds HBM or your prompts share content that isn't at the start. LMCache wins on both.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vs SGLang's RadixAttention&lt;/strong&gt; — Comparable in design philosophy. LMCache is more vendor-neutral and has stronger ecosystem integrations (Dynamo, IBM, PyTorch Foundation membership). SGLang ships RadixAttention as a built-in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vs custom KV cache solutions&lt;/strong&gt; — If you already built one (and many production inference teams have), LMCache's value is the connector ecosystem and the published research backing CacheBlend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vs not caching at all&lt;/strong&gt; — If you're running short-context, low-overlap workloads, you don't need a KV cache layer. Don't add LMCache as a default — measure first.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the actual TTFT reduction I should expect for RAG?&lt;/strong&gt;&lt;br&gt;
A: 3–16x depending on document overlap and context length. The high end (16x) requires real overlap across queries on the same documents; expect 3–5x on a typical mixed RAG workload with moderate reuse. The published LMCache paper reports 3.7–6.8x as the consistent range.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does LMCache work with closed-source APIs like OpenAI or Claude?&lt;/strong&gt;&lt;br&gt;
A: No. LMCache operates on KV tensors &lt;em&gt;inside&lt;/em&gt; the inference engine, so it only works with engines you self-host (vLLM, SGLang, NVIDIA Dynamo). Closed APIs expose only the chat interface and handle caching internally.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I run LMCache on Apple Silicon / CPU only?&lt;/strong&gt;&lt;br&gt;
A: Not as a primary serving stack — LMCache is a layer for GPU-based inference engines. You can run the cache server on CPU and offload to it, but the actual model serving needs vLLM/SGLang which require CUDA, ROCm, or compatible accelerators. AMD MI300X, Arm, and Huawei Ascend are explicitly supported.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does it work with quantized models (FP8, INT4)?&lt;/strong&gt;&lt;br&gt;
A: Yes. LMCache caches the KV tensors regardless of model weight quantization — the KV cache itself is independent of weight dtype. Day-1 support is shipped for gpt-oss 20B/120B FP8 and most quantized Qwen3 / Llama variants.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How does the cost math work in production?&lt;/strong&gt;&lt;br&gt;
A: The case study most cited is the CoreWeave + Cohere deployment, where LMCache let them serve the same throughput with less GPU memory pressure, deferring an HBM-bound capacity buy. The savings are highly workload-dependent — multi-turn chat and RAG get the biggest wins; agentic loops (the new hot workload) benefit if you use the MP mode with multi-engine sharing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is the project actually production-ready, or still research?&lt;/strong&gt;&lt;br&gt;
A: Production-ready for the supported configurations. It's deployed at IBM, CoreWeave (for Cohere), inside NVIDIA Dynamo, and powers the official vLLM production-stack. PyTorch Foundation membership and the Tensormesh commercial support layer add the institutional backing that risk-averse buyers care about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;If you self-host LLM inference at any meaningful scale, &lt;strong&gt;LMCache is now table stakes&lt;/strong&gt; for long-context and multi-turn workloads. The combination of 3–10x TTFT/throughput wins, vendor-neutral storage backends, NVIDIA Dynamo integration, and PyTorch Foundation stewardship makes it the default open-source KV cache layer in 2026.&lt;/p&gt;

&lt;p&gt;For one-shot, stateless, or short-context workloads, don't bother — vLLM's built-in prefix cache is fine. For everything else: install it, run the long-doc benchmark on your traffic, and you'll see the hit rate justify the daemon.&lt;/p&gt;

&lt;p&gt;Project: &lt;a href="https://github.com/LMCache/LMCache" rel="noopener noreferrer"&gt;github.com/LMCache/LMCache&lt;/a&gt; — Apache 2.0, 9.2K+ stars, PyTorch Ecosystem.&lt;/p&gt;

</description>
      <category>lmcache</category>
      <category>kvcache</category>
      <category>vllm</category>
      <category>llminference</category>
    </item>
    <item>
      <title>Apple Container Review: A Native Docker Alternative for Mac</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Tue, 16 Jun 2026 10:09:24 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/apple-container-review-a-native-docker-alternative-for-mac-3bo3</link>
      <guid>https://dev.to/andrew-ooo/apple-container-review-a-native-docker-alternative-for-mac-3bo3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/apple-container-mac-linux-docker-alternative-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;apple/container&lt;/code&gt;&lt;/strong&gt; is Apple's first-party tool for running Linux containers on Mac — written in &lt;strong&gt;Swift&lt;/strong&gt;, optimized for &lt;strong&gt;Apple silicon&lt;/strong&gt;, and built on top of the new macOS 26 Virtualization framework. It's the &lt;strong&gt;#2 trending GitHub repo this week&lt;/strong&gt; with &lt;strong&gt;37,648 stars total&lt;/strong&gt; and &lt;strong&gt;10,541 new stars in the past seven days&lt;/strong&gt;, sitting right next to the &lt;code&gt;mvanhorn/last30days-skill&lt;/code&gt; skill at the top of the trending list.&lt;/p&gt;

&lt;p&gt;The model is genuinely different from Docker Desktop: instead of one shared Linux VM hosting all your containers, &lt;strong&gt;every container runs in its own dedicated lightweight VM&lt;/strong&gt;. That sounds heavy. It isn't — boot times are comparable to shared-VM containers, and per-container memory is lower because each VM only includes the kernel surface that one image actually needs.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;37,648 GitHub stars&lt;/strong&gt; (10,541 this week, &lt;strong&gt;#2 trending overall&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0&lt;/strong&gt; license, written in &lt;strong&gt;Swift&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple silicon only&lt;/strong&gt; — no Intel Mac support, ever&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;macOS 26 required&lt;/strong&gt; — the team explicitly will not fix bugs reproduced on older macOS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OCI-compatible&lt;/strong&gt; — pulls and pushes to any standard registry (Docker Hub, GHCR, ECR, GAR)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each container = its own VM&lt;/strong&gt;, using the &lt;a href="https://github.com/apple/containerization" rel="noopener noreferrer"&gt;Containerization&lt;/a&gt; Swift package&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container Machine&lt;/strong&gt; (new this week) lets you spin up persistent Linux dev VMs with your dotfiles baked in&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedded DNS&lt;/strong&gt; (&lt;code&gt;container system dns create test&lt;/code&gt;) gives every container &lt;code&gt;name.test&lt;/code&gt; resolution automatically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Currently 0.4.x&lt;/strong&gt; — minor versions can break, 1.0 not yet shipped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've been paying for Docker Desktop on a M-series Mac purely because OrbStack made you nervous about going all-in on a non-Docker option, this is the third viable choice — and the only one shipped by the OS vendor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/apple/container" rel="noopener noreferrer"&gt;apple/container&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;37,648 (10,541 this week, #2 trending)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintainer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apple (&lt;code&gt;/jglogan&lt;/code&gt;, &lt;code&gt;/katiewasnothere&lt;/code&gt;, &lt;code&gt;/dcantah&lt;/code&gt;, &lt;code&gt;/dkovba&lt;/code&gt;, &lt;code&gt;/realrajaryan&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Swift&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Underlying package&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/apple/containerization" rel="noopener noreferrer"&gt;apple/containerization&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Host requirement&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;macOS 26, Apple silicon&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Install&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Signed &lt;code&gt;.pkg&lt;/code&gt; from &lt;a href="https://github.com/apple/container/releases" rel="noopener noreferrer"&gt;releases&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Image format&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;OCI v1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default kernel&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kata Containers 3.17.0 (static arm64)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;macOS &lt;code&gt;vmnet&lt;/code&gt; framework&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What is Apple Container actually doing?
&lt;/h2&gt;

&lt;p&gt;The CLI surface looks deliberately Docker-shaped:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;container build &lt;span class="nt"&gt;--tag&lt;/span&gt; web-test &lt;span class="nt"&gt;--file&lt;/span&gt; Dockerfile &lt;span class="nb"&gt;.&lt;/span&gt;
container run &lt;span class="nt"&gt;--name&lt;/span&gt; my-web-server &lt;span class="nt"&gt;--detach&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; web-test
container &lt;span class="nb"&gt;ls
&lt;/span&gt;container logs my-web-server
container &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; my-web-server sh
container stop my-web-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you replaced &lt;code&gt;container&lt;/code&gt; with &lt;code&gt;docker&lt;/code&gt; you'd barely notice. That's the point — Apple is not trying to invent a new mental model, just trying to make the runtime under it native.&lt;/p&gt;

&lt;p&gt;The runtime is where it gets interesting. From the &lt;a href="https://github.com/apple/container/blob/main/docs/technical-overview.md" rel="noopener noreferrer"&gt;technical overview&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;container&lt;/code&gt; runs containers differently. Using the open source Containerization package, it runs a &lt;strong&gt;lightweight VM for each container&lt;/strong&gt; that you create.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That breaks down into three concrete wins versus the shared-VM model that Docker Desktop, Colima, and OrbStack all use:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; — Each container has full VM isolation. The attack surface inside one container can't reach the kernel of another container, because it's a literally separate kernel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; — &lt;code&gt;--volume&lt;/code&gt; mounts only touch the VM for that specific container. With a shared VM, every directory you might ever want to mount has to be reachable by the host VM, then bind-mounted selectively into containers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — Each VM only pages in the libraries that one image needs. A 200MB Alpine + Python image really does run in ~200MB of guest memory, not 200MB plus a multi-GB shared kernel.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trade-off is honest: &lt;strong&gt;boot time is comparable, not better&lt;/strong&gt;, because per-container VM startup is fast (microvm-style) but not free. And memory ballooning back to macOS is &lt;strong&gt;incomplete&lt;/strong&gt; — once a container has grabbed RAM, returning it to the host is partial, so if you have 10 containers idling, they collectively still hold their high-water-mark memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing and the first &lt;code&gt;hello world&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The install is a signed &lt;code&gt;.pkg&lt;/code&gt; from GitHub releases. Double-click, type your password, and it drops binaries under &lt;code&gt;/usr/local&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start the service (launchd-managed)&lt;/span&gt;
container system start

&lt;span class="c"&gt;# First run prompts to fetch the Kata kernel&lt;/span&gt;
&lt;span class="c"&gt;# Installing base container filesystem...&lt;/span&gt;
&lt;span class="c"&gt;# Install the recommended default kernel from&lt;/span&gt;
&lt;span class="c"&gt;#   https://github.com/kata-containers/kata-containers/releases/download/3.17.0/...&lt;/span&gt;
&lt;span class="c"&gt;# [Y/n]: y&lt;/span&gt;

&lt;span class="c"&gt;# Verify&lt;/span&gt;
container list &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;span class="c"&gt;# ID  IMAGE  OS  ARCH  STATE  IP&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A web server Dockerfile that works unchanged from Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; docker.io/python:alpine&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /content&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apk add curl
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'&amp;lt;!DOCTYPE html&amp;gt;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;h1&amp;gt;Hello, world!&amp;lt;/h1&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; index.html
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python3", "-m", "http.server", "80", "--bind", "0.0.0.0"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build, run, hit it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;container build &lt;span class="nt"&gt;--tag&lt;/span&gt; web-test &lt;span class="nt"&gt;--file&lt;/span&gt; Dockerfile &lt;span class="nb"&gt;.&lt;/span&gt;
container run &lt;span class="nt"&gt;--name&lt;/span&gt; my-web-server &lt;span class="nt"&gt;--detach&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; web-test

container &lt;span class="nb"&gt;ls&lt;/span&gt;
&lt;span class="c"&gt;# ID             IMAGE             OS     ARCH   STATE    IP&lt;/span&gt;
&lt;span class="c"&gt;# buildkit       container-builder linux  arm64  running  192.168.64.2&lt;/span&gt;
&lt;span class="c"&gt;# my-web-server  web-test:latest   linux  arm64  running  192.168.64.3&lt;/span&gt;

curl http://192.168.64.3
&lt;span class="c"&gt;# &amp;lt;!DOCTYPE html&amp;gt;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;h1&amp;gt;Hello, world!&amp;lt;/h1&amp;gt;&amp;lt;/body&amp;gt;&amp;lt;/html&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the IP is real and routable from the Mac host directly — no port-forwarding gymnastics. That's the &lt;code&gt;vmnet&lt;/code&gt; framework doing its job.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Container Machine feature (new this week)
&lt;/h2&gt;

&lt;p&gt;The headline change that pushed &lt;code&gt;apple/container&lt;/code&gt; up the trending list this week is &lt;strong&gt;Container Machine&lt;/strong&gt; — full-fat persistent Linux VMs that share your home directory, dotfiles, and &lt;code&gt;~/.ssh&lt;/code&gt; automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a persistent Ubuntu dev environment&lt;/span&gt;
container machine create ubuntu-dev &lt;span class="nt"&gt;--image&lt;/span&gt; ubuntu:24.04

&lt;span class="c"&gt;# SSH into it (no manual key setup)&lt;/span&gt;
container machine ssh ubuntu-dev

&lt;span class="c"&gt;# Stop, start, snapshot&lt;/span&gt;
container machine stop ubuntu-dev
container machine start ubuntu-dev
container machine list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the WSL-equivalent slot. The HN discussion that pushed it onto the front page explicitly compared it to &lt;code&gt;wslc&lt;/code&gt; (WSL containers, announced at Microsoft Build) — and one commenter noted &lt;em&gt;"What if Apple and Microsoft had teamed up? Can you imagine?"&lt;/em&gt; As of this morning &lt;a href="https://news.ycombinator.com/item?id=48469658" rel="noopener noreferrer"&gt;the macOS Container Machines HN thread&lt;/a&gt; is still trending after 24 hours.&lt;/p&gt;

&lt;p&gt;The pattern Apple is targeting is the same one OrbStack, Lima, and Multipass have served: a long-lived Linux box with your dotfiles, where you SSH in and run &lt;code&gt;git clone&lt;/code&gt;/&lt;code&gt;npm install&lt;/code&gt;/&lt;code&gt;cargo build&lt;/code&gt; in a real Linux filesystem rather than fighting macOS' BSD coreutils and case-insensitive APFS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedded DNS — the small thing that matters
&lt;/h2&gt;

&lt;p&gt;One quietly excellent feature: &lt;code&gt;container&lt;/code&gt; ships with an &lt;strong&gt;embedded DNS service&lt;/strong&gt; that maps container names to IPs. Configure it once:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;container system dns create &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That writes a resolver file under &lt;code&gt;/etc/resolver/test&lt;/code&gt; and tells macOS to use the container DNS for the &lt;code&gt;test&lt;/code&gt; TLD. Then any container started with &lt;code&gt;--name foo&lt;/code&gt; is reachable as &lt;code&gt;foo.test&lt;/code&gt; from your Mac:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;container run &lt;span class="nt"&gt;--name&lt;/span&gt; api &lt;span class="nt"&gt;--detach&lt;/span&gt; my-api:latest
curl http://api.test:8080/health
&lt;span class="c"&gt;# {"status":"ok"}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the kind of thing Docker has supported for a decade behind &lt;code&gt;--network&lt;/code&gt; and the embedded Docker DNS, but it only works &lt;em&gt;inside&lt;/em&gt; the Docker network. &lt;code&gt;apple/container&lt;/code&gt; makes the names resolvable &lt;strong&gt;from the macOS host itself&lt;/strong&gt;, with zero extra config beyond the one &lt;code&gt;dns create&lt;/code&gt; call. For multi-service local dev that one thing is worth the install.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community reactions
&lt;/h2&gt;

&lt;p&gt;The conversation falls cleanly into three camps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp 1: "Tired of Docker for Mac, this is enough."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.outcoldman.com/blog/2026/05/02/apple-container-tired-of-docker/" rel="noopener noreferrer"&gt;outcoldman blog post&lt;/a&gt; is the most-cited piece in the recent HN threads. The author's punchline: for the 80% case (build, run, log, exec) it's a complete Docker replacement on a Mac, with better memory behavior and no Docker Desktop subscription pressure. The pain points they hit:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--add-host&lt;/code&gt; is partial. You can inject DNS via &lt;code&gt;--dns&lt;/code&gt;, &lt;code&gt;--dns-domain&lt;/code&gt;, &lt;code&gt;--dns-search&lt;/code&gt;, but pinning a single &lt;code&gt;/etc/hosts&lt;/code&gt; entry isn't there yet.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--link&lt;/code&gt; isn't supported — but Docker deprecated &lt;code&gt;--link&lt;/code&gt; years ago, so this is a non-issue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anonymous volumes don't auto-cleanup with &lt;code&gt;--rm&lt;/code&gt;.&lt;/strong&gt; Docker reaps them, &lt;code&gt;apple/container&lt;/code&gt; doesn't. If your CI loop creates ephemeral volumes you'll need a &lt;code&gt;container volume prune&lt;/code&gt; cron.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Camp 2: "But I need Compose."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://pbxscience.com/apples-native-linux-container-tool-has-arrived-but-can-it-really-replace-docker/" rel="noopener noreferrer"&gt;pbxscience writeup&lt;/a&gt; is the most-quoted critique:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The two most significant gaps are &lt;strong&gt;no native Docker Compose support&lt;/strong&gt; (third-party bridges exist but are unofficial) &lt;strong&gt;and incomplete DevContainer support in VS Code&lt;/strong&gt;. For single-container local development workflows, it is an excellent alternative.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the real ceiling today. Most non-trivial local dev means &lt;code&gt;docker compose up&lt;/code&gt; bringing up Postgres + Redis + your app + a worker. &lt;code&gt;apple/container&lt;/code&gt; makes you stitch that together with shell scripts and &lt;code&gt;container run&lt;/code&gt; invocations, or wait for one of the third-party Compose bridges to mature.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp 3: "What about filesystem perf for &lt;code&gt;git clone&lt;/code&gt; and &lt;code&gt;npm install&lt;/code&gt;?&lt;/strong&gt;"&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.reddit.com/r/devops/comments/1oj9wxs/apples_new_container_runtime_vs_docker_desktop/" rel="noopener noreferrer"&gt;r/devops thread&lt;/a&gt; zeroed in on the historical Docker-on-Mac pain point: small-file I/O over the macOS↔Linux mount boundary. The community consensus so far: &lt;code&gt;apple/container&lt;/code&gt; uses &lt;code&gt;virtiofs&lt;/code&gt; under the Virtualization framework, which is materially faster than the gRPC-FUSE bridge Docker Desktop used historically (and roughly on par with OrbStack's macFUSE-based path). Nobody has published rigorous numbers yet — that's an open opportunity for a follow-up post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;p&gt;These are the things you should know &lt;strong&gt;before&lt;/strong&gt; you uninstall Docker Desktop:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No Docker Compose.&lt;/strong&gt; Single biggest one. If your team's &lt;code&gt;docker-compose.yml&lt;/code&gt; is the source of truth for local dev, you stay on Docker Desktop, OrbStack, or Colima until the unofficial Compose bridges stabilize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apple silicon only.&lt;/strong&gt; No Intel. If you have any Intel Macs in the fleet, you can't standardize.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;macOS 26 only.&lt;/strong&gt; The team explicitly won't fix bugs on older macOS. If your org is locked to macOS 25 for IT-approval reasons, wait.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pre-1.0 API stability.&lt;/strong&gt; Breaking changes are allowed between minor versions. Don't pin your CI to &lt;code&gt;apple/container&lt;/code&gt; until 1.0 ships.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory ballooning is partial.&lt;/strong&gt; Containers don't fully return RAM to macOS. Restart &lt;code&gt;container system&lt;/code&gt; weekly if you run many containers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DevContainer support in VS Code is incomplete.&lt;/strong&gt; The VS Code Dev Containers extension assumes Docker. You can hack it by symlinking &lt;code&gt;docker&lt;/code&gt; → &lt;code&gt;container&lt;/code&gt;, but it breaks on Compose-shaped configs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anonymous volume cleanup is manual.&lt;/strong&gt; Add &lt;code&gt;container volume prune&lt;/code&gt; to your shutdown script.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Builder is BuildKit-based but a separate &lt;code&gt;buildkit&lt;/code&gt; container&lt;/strong&gt; runs alongside your workloads. It's lightweight, but if you &lt;code&gt;container ls&lt;/code&gt; and see something unexpected, that's why.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  When to actually use it
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Pick&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-container dev (web server, Python script, Go binary)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;apple/container&lt;/code&gt;&lt;/strong&gt; — clean, native, no subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-service local stack with Compose&lt;/td&gt;
&lt;td&gt;Docker Desktop or OrbStack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;You want WSL-style persistent Linux box on your Mac&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;apple/container machine&lt;/code&gt;&lt;/strong&gt; or &lt;a href="https://github.com/lima-vm/lima" rel="noopener noreferrer"&gt;Lima&lt;/a&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Intel Mac or older macOS&lt;/td&gt;
&lt;td&gt;Docker Desktop, OrbStack, or Colima&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production Kubernetes parity&lt;/td&gt;
&lt;td&gt;Anything with Compose support + kind/k3d on top&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security-sensitive workloads on a dev laptop&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;apple/container&lt;/code&gt;&lt;/strong&gt; — per-container VM isolation is the strongest model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CI runner on a Mac mini farm&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;&lt;code&gt;apple/container&lt;/code&gt;&lt;/strong&gt; if you can require macOS 26 — less memory, better isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How I'd run it today
&lt;/h2&gt;

&lt;p&gt;If you're a solo developer or a backend engineer who mostly runs one container at a time, &lt;strong&gt;install it, use it, keep Docker around as a fallback for Compose-shaped projects&lt;/strong&gt;. The Apple silicon optimization is real — I'm seeing ~30-40% less RAM held by the same set of containers compared to Docker Desktop on the same M2 Pro.&lt;/p&gt;

&lt;p&gt;If you're a team lead, &lt;strong&gt;don't migrate your whole team yet&lt;/strong&gt;. Wait for 1.0 and at least one of the third-party Compose bridges to land in a stable release. Pilot it on the team members who already work mostly in single-container projects.&lt;/p&gt;

&lt;p&gt;If you're running Mac mini CI, &lt;strong&gt;start piloting now&lt;/strong&gt;. Per-container VM isolation is the right model for a multi-tenant CI runner, and Apple is treating this as a first-class macOS subsystem, not a hobby project.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Apple Container a Docker Desktop replacement?
&lt;/h3&gt;

&lt;p&gt;For the 80% case — single containers, OCI images, build/run/logs/exec — yes. For the Compose case, no, not today. The community consensus from the May/June 2026 HN and Reddit threads is consistent: replace it for solo and simple workflows, keep Docker (or OrbStack) for Compose-driven multi-service local stacks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it work on Intel Macs?
&lt;/h3&gt;

&lt;p&gt;No, and it never will. &lt;code&gt;apple/container&lt;/code&gt; is Apple silicon only and depends on the macOS 26 Virtualization framework + the Containerization Swift package, both of which target arm64.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between &lt;code&gt;container&lt;/code&gt; and &lt;code&gt;container machine&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;container run&lt;/code&gt; is short-lived — like &lt;code&gt;docker run&lt;/code&gt;. Each container is its own throwaway VM. &lt;code&gt;container machine create&lt;/code&gt; is &lt;strong&gt;persistent&lt;/strong&gt; — a long-lived Linux VM with your home directory, dotfiles, and SSH keys mounted in, similar to &lt;code&gt;wsl&lt;/code&gt; on Windows or &lt;code&gt;multipass launch&lt;/code&gt; on Linux. Use machines for "I want a Linux box to live in," use &lt;code&gt;run&lt;/code&gt; for "I want to test this image."&lt;/p&gt;

&lt;h3&gt;
  
  
  How does it compare to OrbStack performance-wise?
&lt;/h3&gt;

&lt;p&gt;OrbStack is still faster for the &lt;strong&gt;shared-VM, many-container&lt;/strong&gt; workload because everything boots into one already-warm VM. &lt;code&gt;apple/container&lt;/code&gt; is competitive (and often better on RAM) for the &lt;strong&gt;few-containers, security-isolated&lt;/strong&gt; workload because each container gets a private kernel. Filesystem performance is roughly equivalent for git/npm-shaped workloads — both use virtiofs-class fast paths. No published rigorous benchmarks yet as of 2026-06-16.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I push images I build with &lt;code&gt;container&lt;/code&gt; to Docker Hub or GHCR?
&lt;/h3&gt;

&lt;p&gt;Yes. &lt;code&gt;container&lt;/code&gt; produces OCI v1 images, which every modern registry accepts. &lt;code&gt;container image push docker.io/youruser/web-test:latest&lt;/code&gt; works the same as with Docker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;apple/container&lt;/code&gt; is the most interesting thing to happen to macOS containers since Docker Desktop existed. It's not feature-complete enough to replace Compose-shaped workflows yet, but for the single-container, security-isolated, low-memory case it's already the cleanest tool on a Mac — and it's the only one shipped, signed, and maintained by the platform vendor. With macOS 26 in widespread deployment and the Container Machine feature filling the WSL-equivalent slot, the momentum is unambiguous: &lt;strong&gt;10,541 stars in a single week&lt;/strong&gt;, #2 trending overall, and an HN front page that won't quit.&lt;/p&gt;

&lt;p&gt;Watch for 1.0 and a stable Compose bridge. When those land, this stops being a niche option and starts being the default.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Resources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/apple/container" rel="noopener noreferrer"&gt;github.com/apple/container&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Containerization package: &lt;a href="https://github.com/apple/containerization" rel="noopener noreferrer"&gt;github.com/apple/containerization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Tutorial: &lt;a href="https://github.com/apple/container/blob/main/docs/tutorials/start-here.md" rel="noopener noreferrer"&gt;Start here&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Technical overview: &lt;a href="https://github.com/apple/container/blob/main/docs/technical-overview.md" rel="noopener noreferrer"&gt;docs/technical-overview.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;HN discussion (Container Machines): &lt;a href="https://news.ycombinator.com/item?id=48469658" rel="noopener noreferrer"&gt;news.ycombinator.com/item?id=48469658&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Outcoldman: &lt;a href="https://www.outcoldman.com/blog/2026/05/02/apple-container-tired-of-docker/" rel="noopener noreferrer"&gt;"Tired of updating Docker for Mac"&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>applecontainer</category>
      <category>dockeralternative</category>
      <category>containers</category>
      <category>applesilicon</category>
    </item>
    <item>
      <title>Graphify Review: Turn Your Codebase Into a Queryable Graph</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Mon, 15 Jun 2026 10:10:28 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/graphify-review-turn-your-codebase-into-a-queryable-graph-ec</link>
      <guid>https://dev.to/andrew-ooo/graphify-review-turn-your-codebase-into-a-queryable-graph-ec</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/graphify-code-knowledge-graph-skill-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Graphify&lt;/strong&gt; is an open-source AI coding assistant skill that turns any folder — code, SQL schemas, docs, PDFs, images, even videos — into a &lt;strong&gt;queryable knowledge graph&lt;/strong&gt; your AI agent can search instead of grep-walking through files. You type &lt;code&gt;/graphify .&lt;/code&gt; in Claude Code, Codex, Cursor, Gemini CLI, GitHub Copilot CLI, Aider, OpenClaw or any of ~20 supported assistants, and you get back three artifacts: an interactive &lt;code&gt;graph.html&lt;/code&gt;, a &lt;code&gt;GRAPH_REPORT.md&lt;/code&gt; with the surprising connections, and a &lt;code&gt;graph.json&lt;/code&gt; you can query for the rest of the session.&lt;/p&gt;

&lt;p&gt;The repo is &lt;strong&gt;#1 trending on GitHub this week&lt;/strong&gt; with &lt;strong&gt;67,416 stars&lt;/strong&gt; (+5,478 in seven days) and is backed by &lt;strong&gt;Y Combinator&lt;/strong&gt;. Built by &lt;strong&gt;&lt;code&gt;/safishamsi&lt;/code&gt;&lt;/strong&gt; with help from &lt;code&gt;/claude&lt;/code&gt;, &lt;code&gt;/cursoragent&lt;/code&gt;, &lt;code&gt;/TheFedaikin&lt;/code&gt;, and &lt;code&gt;/jippi&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;67,416 GitHub stars&lt;/strong&gt;, &lt;strong&gt;+5,478 this week&lt;/strong&gt;, #1 trending repo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyPI package:&lt;/strong&gt; &lt;code&gt;graphifyy&lt;/code&gt; (double-y — there's a squatter on &lt;code&gt;graphify&lt;/code&gt;), CLI command is still &lt;code&gt;graphify&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~20 supported AI coding assistants&lt;/strong&gt; — Claude Code, Codex, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, Aider, OpenClaw, Factory Droid, Trae, Hermes, Kimi Code, Kiro, Pi, Devin CLI, Google Antigravity, Amp, Kilo Code, and more&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;36 tree-sitter grammars&lt;/strong&gt; plus Salesforce Apex, Terraform/HCL, MCP configs, Office docs, Google Workspace, PDFs, images, video/audio&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence-tagged edges&lt;/strong&gt; — every inferred relationship is labeled &lt;code&gt;EXTRACTED&lt;/code&gt;, &lt;code&gt;INFERRED&lt;/code&gt;, or &lt;code&gt;AMBIGUOUS&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent and queryable&lt;/strong&gt; — &lt;code&gt;graphify query "&amp;lt;question&amp;gt;"&lt;/code&gt; works for the whole session without re-reading files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Y Combinator–backed&lt;/strong&gt;, MIT-style open source on GitHub&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever watched Claude Code burn 40K tokens grepping for "where is auth validated?" — Graphify is the obvious fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Graphify actually does
&lt;/h2&gt;

&lt;p&gt;Most AI coding assistants are stateless re-readers. Ask "how does login work?" and the agent runs &lt;code&gt;Glob&lt;/code&gt;, then &lt;code&gt;Grep&lt;/code&gt;, then opens five files, then reads each one top to bottom. That's the same shape every time. It costs tokens, it's slow, and it misses anything the grep regex doesn't match — like a relationship that lives in a PDF spec, a Mermaid diagram, or a comment 200 lines below the function name.&lt;/p&gt;

&lt;p&gt;Graphify replaces the grep-and-read loop with a one-time &lt;strong&gt;extraction pass&lt;/strong&gt; followed by &lt;strong&gt;scoped graph queries&lt;/strong&gt; for the rest of the session.&lt;/p&gt;

&lt;p&gt;The extraction pass walks your repository, runs language-specific extractors (tree-sitter for code, AST parsers for SQL/Terraform, OCR/VLM for images, faster-whisper for video), and emits a graph where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nodes&lt;/strong&gt; are concepts: functions, classes, tables, env vars, MCP servers, design rationales, diagrams, PDF sections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edges&lt;/strong&gt; are relationships: &lt;code&gt;calls&lt;/code&gt;, &lt;code&gt;imports&lt;/code&gt;, &lt;code&gt;reads_table&lt;/code&gt;, &lt;code&gt;documented_by&lt;/code&gt;, &lt;code&gt;depends_on&lt;/code&gt;, plus design-time edges from comments (&lt;code&gt;# WHY:&lt;/code&gt;, &lt;code&gt;# HACK:&lt;/code&gt;) and docstrings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;God nodes&lt;/strong&gt; — the most-connected concepts — get surfaced separately so you immediately see what everything routes through.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Surprising connections&lt;/strong&gt; are ranked by how unexpected they are (e.g., a Terraform module that references a function three repos away).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once the graph exists, the skill rewrites your assistant's behavior so codebase questions hit the graph first. On Claude Code, Gemini CLI, CodeBuddy, Codex, and Kilo Code, PreToolUse hooks intercept search-style tool calls and nudge the agent toward &lt;code&gt;graphify query&lt;/code&gt; before it grepwalks. On Cursor it's a &lt;code&gt;.cursor/rules/graphify.mdc&lt;/code&gt; file with &lt;code&gt;alwaysApply: true&lt;/code&gt;. On OpenClaw, Aider, and others it's via persistent instruction files.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quickstart in 60 seconds
&lt;/h2&gt;

&lt;p&gt;The install is genuinely tiny:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Install the package (uv is recommended — pipx works too)&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;graphifyy

&lt;span class="c"&gt;# 2. Register the skill with your AI assistant&lt;/span&gt;
graphify &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# 3. Open your AI assistant and type&lt;/span&gt;
/graphify &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. About 20–60 seconds later (depending on repo size) you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graphify-out/
├── graph.html        # open in any browser — click nodes, filter, search
├── GRAPH_REPORT.md   # key concepts, surprising connections, suggested questions
└── graph.json        # the full graph — query it anytime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also run &lt;code&gt;graphify export callflow-html&lt;/code&gt; to get a readable architecture page with Mermaid call-flow diagrams baked in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project-scoped install&lt;/strong&gt; (instead of writing to your user profile) is &lt;code&gt;graphify install --project&lt;/code&gt;. The skill goes under &lt;code&gt;.claude/skills/graphify/SKILL.md&lt;/code&gt; (or &lt;code&gt;.agents/skills/graphify/SKILL.md&lt;/code&gt;, etc.) and the CLI even prints a &lt;code&gt;git add&lt;/code&gt; hint for the files that should be committed.&lt;/p&gt;

&lt;p&gt;For agents that need a nudge to use the graph after build, run the platform-specific bind once (&lt;code&gt;graphify claude install&lt;/code&gt;, &lt;code&gt;graphify cursor install&lt;/code&gt;, &lt;code&gt;graphify codex install&lt;/code&gt;, &lt;code&gt;graphify copilot install&lt;/code&gt;, &lt;code&gt;graphify gemini install&lt;/code&gt;, &lt;code&gt;graphify claw install&lt;/code&gt;, &lt;code&gt;graphify aider install&lt;/code&gt;, &lt;code&gt;graphify droid install&lt;/code&gt;). This writes the per-agent config that tells your assistant to &lt;em&gt;prefer&lt;/em&gt; &lt;code&gt;graphify query "&amp;lt;question&amp;gt;"&lt;/code&gt; over &lt;code&gt;Read&lt;/code&gt;/&lt;code&gt;Glob&lt;/code&gt;/&lt;code&gt;Grep&lt;/code&gt; for architecture questions. &lt;code&gt;GRAPH_REPORT.md&lt;/code&gt; stays available for broad reviews.&lt;/p&gt;

&lt;h2&gt;
  
  
  Querying the graph
&lt;/h2&gt;

&lt;p&gt;Once the graph exists, the LLM (or you) can query it like a small structured search index:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Plain question — the skill maps it to graph traversals&lt;/span&gt;
graphify query &lt;span class="s2"&gt;"where is request auth validated?"&lt;/span&gt;

&lt;span class="c"&gt;# Scoped queries&lt;/span&gt;
graphify query &lt;span class="nt"&gt;--node&lt;/span&gt; &lt;span class="s2"&gt;"User"&lt;/span&gt;       &lt;span class="c"&gt;# everything connected to the User node&lt;/span&gt;
graphify query &lt;span class="nt"&gt;--edge&lt;/span&gt; &lt;span class="s2"&gt;"calls"&lt;/span&gt; &lt;span class="nt"&gt;--from&lt;/span&gt; &lt;span class="s2"&gt;"handleLogin"&lt;/span&gt;

&lt;span class="c"&gt;# Inspect god nodes — most-connected concepts&lt;/span&gt;
graphify report &lt;span class="nt"&gt;--god-nodes&lt;/span&gt;

&lt;span class="c"&gt;# See surprising cross-module connections&lt;/span&gt;
graphify report &lt;span class="nt"&gt;--surprising&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill exposes the same interface to the agent. A typical Claude Code interaction now looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: "Where do we revoke a session?"

Claude (with Graphify): "Per the graph, sessions are revoked in
auth/session.py::revoke_session(), which is called by /logout
(handlers/auth.py), the admin force-logout endpoint
(handlers/admin.py), and a TTL cleanup job in jobs/expiry.py.
The revoke writes through to redis_sessions and emits a
session.revoked event picked up by audit/listener.py.
Confidence: EXTRACTED for direct calls, INFERRED for the
event link (matched on event name)."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No grep storm. No 40K of tool output. The agent loaded a few graph slices and answered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-modal extraction — the underrated part
&lt;/h2&gt;

&lt;p&gt;The headline is "knowledge graph from code," but the multi-modal extractors are what make Graphify hard to clone:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Extensions&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;36 tree-sitter grammars: &lt;code&gt;.py .ts .js .jsx .tsx .mjs .go .rs .java .c .cpp .rb .cs .kt .scala .php .swift .lua .luau .zig .ps1 .ex .exs .m .mm .jl .vue .svelte .astro .groovy .gradle .dart .v .sv .svh .sql .f90 .pas .sh .bash .json .dm&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Plus Salesforce Apex (regex), Terraform/HCL (&lt;code&gt;[terraform]&lt;/code&gt; extra)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP configs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.mcp.json&lt;/code&gt;, &lt;code&gt;claude_desktop_config.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Extracts server nodes, package refs, env var requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.md .mdx .qmd .html .txt .rst .yaml .yml&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Headings, links, code blocks become nodes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Office&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.docx .xlsx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;[office]&lt;/code&gt; extra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Workspace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.gdoc .gsheet .gslides&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Opt-in; needs &lt;code&gt;gws auth&lt;/code&gt; and &lt;code&gt;--google-workspace&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PDFs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.pdf&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Section-level extraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Images&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.png .jpg .webp .gif&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;VLM-described nodes linked to surrounding context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Video/Audio&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;.mp4 .mov .mp3 .wav&lt;/code&gt; and more&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;[video]&lt;/code&gt; extra (faster-whisper + yt-dlp); YouTube URLs work directly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That means a PDF spec sitting next to your code can contribute nodes that link to actual functions. A Mermaid diagram in &lt;code&gt;ARCHITECTURE.md&lt;/code&gt; becomes traversable. A whiteboard photo committed to &lt;code&gt;docs/&lt;/code&gt; becomes nodes. For domains where the &lt;em&gt;why&lt;/em&gt; lives outside the code — fintech, healthcare, infra-as-code, games — that's a massive context win.&lt;/p&gt;

&lt;p&gt;A clean detail: the extractor also pulls inline rationale (&lt;code&gt;# NOTE:&lt;/code&gt;, &lt;code&gt;# WHY:&lt;/code&gt;, &lt;code&gt;# HACK:&lt;/code&gt;) and docstrings out as &lt;strong&gt;separate nodes&lt;/strong&gt; linked to the code they explain. So your agent can say "this looks like dead code, but the &lt;code&gt;# WHY:&lt;/code&gt; comment three lines above says it covers an iOS 16 bug" instead of cheerfully proposing to delete it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Confidence tags — the part I really like
&lt;/h2&gt;

&lt;p&gt;Every inferred edge in the graph is tagged:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;EXTRACTED&lt;/code&gt;&lt;/strong&gt; — directly observable in the source (a Python &lt;code&gt;import&lt;/code&gt;, a SQL &lt;code&gt;JOIN&lt;/code&gt;, a function call AST node).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;INFERRED&lt;/code&gt;&lt;/strong&gt; — derived from naming, file colocation, or pattern matching (an event name that appears as both an emit string and a listener handler).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AMBIGUOUS&lt;/code&gt;&lt;/strong&gt; — multiple candidate targets; the graph keeps all of them with weights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a real differentiator. Most "code intelligence" tools quietly mash extracted and inferred edges into one bucket and let the LLM hallucinate as a result. Graphify makes confidence a first-class property of every edge, so the agent can say "definitely calls X, probably emits Y, possibly reads Z" instead of asserting all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community reactions
&lt;/h2&gt;

&lt;p&gt;Sentiment from Reddit and dev.to has been unusually positive for a tool that grew this fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;r/ClaudeCode&lt;/strong&gt;: A &lt;em&gt;"My experience with Graphify"&lt;/em&gt; thread compares it to &lt;code&gt;code-review-graph&lt;/code&gt; and reports Graphify works better on &lt;strong&gt;large polyglot codebases&lt;/strong&gt; thanks to tree-sitter coverage and graph queries, while &lt;code&gt;code-review-graph&lt;/code&gt; is sharper for &lt;strong&gt;review-focused diff context&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;dev.to&lt;/strong&gt;: A "Graphify + code-review-graph" combo tutorial argued for running both — Graphify for the persistent knowledge graph, code-review-graph for per-PR overlay. It's one of the reasons the repo's weekly traffic is so spiky.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;knightli.com (May 2026)&lt;/strong&gt; called it "Claude Code's biggest limitation, solved" — the limitation being long-running coding sessions degrading as the context window fills with redundant file reads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;graphify.net&lt;/strong&gt; went live in April 2026 as the marketing page; the project graduated to Y Combinator shortly after.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The recurring praise is the same thing in three sentences: it stops the agent from re-grepping the world, it picks up rationale from docs/diagrams/comments, and the &lt;code&gt;EXTRACTED&lt;/code&gt;/&lt;code&gt;INFERRED&lt;/code&gt;/&lt;code&gt;AMBIGUOUS&lt;/code&gt; tags make the answers trustable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;p&gt;The README is unusually candid, which I appreciate.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;PyPI naming gotcha.&lt;/strong&gt; The package is &lt;code&gt;graphifyy&lt;/code&gt; (two y's). Other &lt;code&gt;graphify*&lt;/code&gt; packages on PyPI are squatters/unrelated. Use &lt;code&gt;uv tool install graphifyy&lt;/code&gt; or &lt;code&gt;pipx install graphifyy&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;pip install&lt;/code&gt; is fragile on macOS/Windows.&lt;/strong&gt; The skill resolves Python at runtime from &lt;code&gt;graphify-out/.graphify_python&lt;/code&gt;. If &lt;code&gt;pip install&lt;/code&gt; lands the module in a different interpreter, you get &lt;code&gt;ModuleNotFoundError&lt;/code&gt;. &lt;code&gt;uv tool&lt;/code&gt; and &lt;code&gt;pipx&lt;/code&gt; isolate the env and avoid this.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git hooks need a reinstall after upgrades.&lt;/strong&gt; &lt;code&gt;graphify hook install&lt;/code&gt; embeds the interpreter path into the post-commit hook. Re-run it after upgrades or the hook silently fails in GUI git clients and CI runners.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential extraction on OpenClaw and Aider.&lt;/strong&gt; Parallel subagent dispatch lands on Claude Code, Codex, Trae, Factory Droid, CodeBuddy, and Gemini CLI. First-time builds on OpenClaw/Aider are slower.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex needs &lt;code&gt;multi_agent = true&lt;/code&gt;&lt;/strong&gt; under &lt;code&gt;[features]&lt;/code&gt; in &lt;code&gt;~/.codex/config.toml&lt;/code&gt; for parallel extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Codex command is &lt;code&gt;$graphify&lt;/code&gt; not &lt;code&gt;/graphify&lt;/code&gt;.&lt;/strong&gt; Easy gotcha if you switch between assistants.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PowerShell&lt;/strong&gt; users run &lt;code&gt;graphify .&lt;/code&gt; — the leading slash is a path separator on Windows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leiden community detection&lt;/strong&gt; (&lt;code&gt;[leiden]&lt;/code&gt; extra) is &lt;strong&gt;Python 3.13–incompatible&lt;/strong&gt;. Drop to 3.10–3.12 if you need it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional extras pile up.&lt;/strong&gt; Each backend (PDF, Office, video, Neo4j, FalkorDB, SQL, Postgres, Terraform, Ollama, OpenAI, Gemini, Anthropic, Bedrock, Azure) is its own extra. &lt;code&gt;[all]&lt;/code&gt; works but it's heavy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's an extraction pass, not a watcher.&lt;/strong&gt; The graph rebuilds on &lt;code&gt;/graphify&lt;/code&gt; or the post-commit hook. Between rebuilds the graph can drift from your working tree.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where it fits in the stack
&lt;/h2&gt;

&lt;p&gt;Graphify isn't the only "give your AI agent better code context" tool — and the author actually points at the others:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/safishamsi/code-review-graph" rel="noopener noreferrer"&gt;code-review-graph&lt;/a&gt;&lt;/strong&gt; — same author, narrower scope. Builds a per-PR overlay graph for review context. Pairs cleanly with Graphify (persistent project graph + per-PR delta).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://dev.to/posts/headroom-context-compression-llm-agents-review/"&gt;Headroom&lt;/a&gt;&lt;/strong&gt; — compresses tool outputs, RAG chunks, and logs &lt;em&gt;before&lt;/em&gt; they reach the LLM. Graphify reduces what you ask for; Headroom compresses whatever you still send. They stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://dev.to/posts/serena-mcp-coding-agent-ide-review/"&gt;Serena MCP&lt;/a&gt;&lt;/strong&gt; — IDE-level coding agent skill. Graphify gives Serena a queryable graph instead of a grep loop.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tree-sitter / ctags&lt;/strong&gt; — Graphify is more or less "tree-sitter + AST parsers + multi-modal extractors + an LLM-friendly query layer + skill bindings for every coding assistant" wrapped together. If you only want code symbols, classic ctags is enough. If you want PDFs, diagrams, MCP configs, design rationale, and Terraform in the same graph, that's Graphify.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The unique value isn't the graph itself. It's the &lt;strong&gt;distribution + skill integration&lt;/strong&gt;: one &lt;code&gt;graphify install&lt;/code&gt; makes ~20 different coding assistants behave like they have a shared semantic index of your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Does Graphify send my code to a third-party server?
&lt;/h3&gt;

&lt;p&gt;No. The default extractors run locally — tree-sitter, AST parsers, faster-whisper, OCR libs. You can opt into an LLM backend (&lt;code&gt;--backend claude&lt;/code&gt;, &lt;code&gt;--backend openai&lt;/code&gt;, &lt;code&gt;--backend gemini&lt;/code&gt;, &lt;code&gt;--backend bedrock&lt;/code&gt;, &lt;code&gt;--backend azure&lt;/code&gt;, or &lt;code&gt;--backend ollama&lt;/code&gt; for fully local) for richer relationship inference and image/video descriptions, but it's not required for the core graph.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between &lt;code&gt;graphify&lt;/code&gt; and &lt;code&gt;graphifyy&lt;/code&gt; on PyPI?
&lt;/h3&gt;

&lt;p&gt;The official package is &lt;strong&gt;&lt;code&gt;graphifyy&lt;/code&gt;&lt;/strong&gt; (double-y). The CLI command is still &lt;code&gt;graphify&lt;/code&gt;. Other &lt;code&gt;graphify*&lt;/code&gt; packages on PyPI are unrelated/squatters — don't install them.&lt;/p&gt;

&lt;h3&gt;
  
  
  How big a repo can Graphify handle?
&lt;/h3&gt;

&lt;p&gt;Reports on r/ClaudeCode put it at "comfortable on 100K LOC, slow but workable on 1M+ LOC with parallel extraction enabled." The incremental cache and the post-commit hook are the steady-state answer — full rebuilds on giant monorepos can run several minutes, but per-commit deltas are fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Graphify work with OpenClaw?
&lt;/h3&gt;

&lt;p&gt;Yes. Install with &lt;code&gt;graphify install --platform claw&lt;/code&gt; (or the legacy &lt;code&gt;graphify install --platform openclaw&lt;/code&gt;), then &lt;code&gt;graphify claw install&lt;/code&gt; to register the skill in your project. Parallel extraction on OpenClaw is sequential as of v8 — first builds are slower than Claude Code, but the steady-state experience is the same.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can the graph be pushed to Neo4j or FalkorDB?
&lt;/h3&gt;

&lt;p&gt;Yes. &lt;code&gt;uv tool install "graphifyy[neo4j]"&lt;/code&gt; or &lt;code&gt;uv tool install "graphifyy[falkordb]"&lt;/code&gt; adds push support, and the CLI exposes &lt;code&gt;graphify export neo4j&lt;/code&gt; / &lt;code&gt;graphify export falkordb&lt;/code&gt; for one-shot ingest. Useful if you want to query the graph from BI tools, a data app, or alongside other organizational graphs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Graphify production-ready?
&lt;/h3&gt;

&lt;p&gt;For "AI-coding-assistant context," yes — that's its primary use case and the integration surface is the most mature part. For "use the graph as a build-time source of truth for a CI gate," it's getting there but check confidence tags before you fail builds on &lt;code&gt;INFERRED&lt;/code&gt; edges. The author tags every inferred relationship for exactly this reason.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;Graphify is the rare AI tool that earns its trending position. It's a clear win for anyone running Claude Code, Codex, Cursor, or OpenClaw on a non-trivial codebase: the agent stops grep-walking, picks up rationale that lives outside the code, and labels its own confidence. The multi-modal extractors are what set it apart — PDFs, Mermaid diagrams, MCP configs, even videos become first-class nodes alongside your code.&lt;/p&gt;

&lt;p&gt;For the 60-second install cost and a &lt;code&gt;graphifyy&lt;/code&gt; PyPI typo gotcha, you get back tokens, latency, and a measurably smarter agent on day-one of any new repo. Pair it with &lt;a href="https://dev.to/posts/headroom-context-compression-llm-agents-review/"&gt;Headroom&lt;/a&gt; and you've got the cleanest "make my AI coding assistant cheaper &lt;em&gt;and&lt;/em&gt; smarter" stack currently on GitHub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/safishamsi/graphify" rel="noopener noreferrer"&gt;github.com/safishamsi/graphify&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;uv tool install graphifyy &amp;amp;&amp;amp; graphify install&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT (verify on the repo before commercial use)&lt;br&gt;
&lt;strong&gt;Backed by:&lt;/strong&gt; Y Combinator&lt;/p&gt;

</description>
      <category>graphify</category>
      <category>knowledgegraph</category>
      <category>claudecode</category>
      <category>codex</category>
    </item>
    <item>
      <title>SkillSpector Review: NVIDIA's AI Agent Skill Scanner</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Sun, 14 Jun 2026 10:10:40 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/skillspector-review-nvidias-ai-agent-skill-scanner-53hg</link>
      <guid>https://dev.to/andrew-ooo/skillspector-review-nvidias-ai-agent-skill-scanner-53hg</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/skillspector-nvidia-ai-agent-skill-security-scanner-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SkillSpector&lt;/strong&gt; is NVIDIA's open-source security scanner for AI agent skills — the SKILL.md + scripts bundles that Claude Code, Codex CLI, Gemini CLI, OpenClaw, and Cursor install with one command and then trust implicitly. It dropped on June 11, 2026 and rocketed to ~4,800 stars and #5 on GitHub Trending in three days because it answers the question every dev has been quietly avoiding: &lt;strong&gt;is this skill safe to install?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA-published, Apache 2.0&lt;/strong&gt;, Python 3.9+, ships PyPI + Docker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;64 vulnerability patterns across 16 categories&lt;/strong&gt; — prompt injection, data exfiltration, taint tracking, YARA, MCP least privilege, supply-chain CVEs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Two-stage analysis&lt;/strong&gt;: fast static + AST + YARA, then optional LLM semantic pass (OpenAI / Anthropic / NVIDIA build / local Ollama)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live CVE lookups via &lt;a href="https://osv.dev" rel="noopener noreferrer"&gt;OSV.dev&lt;/a&gt;&lt;/strong&gt; with offline fallback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Four output formats&lt;/strong&gt;: terminal, JSON, Markdown, &lt;strong&gt;SARIF&lt;/strong&gt; (drops straight into GitHub code-scanning)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scans anything&lt;/strong&gt;: Git repos, URLs, zip files, directories, single SKILL.md files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research-backed&lt;/strong&gt;: cites the &lt;strong&gt;26.1% vulnerable / 5.2% likely-malicious&lt;/strong&gt; stat from the Skill-Inject paper — and Snyk's later ToxicSkills study put the bad-skill rate at &lt;strong&gt;36%&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: high false-positive rate on conservative patterns (Wildcard Permission, Unrestricted Tool Access), LLM stage costs API tokens, no Windows-native keychain integration yet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've ever piped &lt;code&gt;claude /plugin install some-random-skill&lt;/code&gt; from a GitHub gist into your work laptop, SkillSpector is the first tool that gives you a defensible answer when security asks why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: Skills Are an Unaudited Software Supply Chain
&lt;/h2&gt;

&lt;p&gt;2026 is the year &lt;strong&gt;agent skills&lt;/strong&gt; became npm packages — without npm's seven years of supply-chain hardening. A skill is just a SKILL.md description, some scripts, and a few helper files. Drop it in &lt;code&gt;~/.claude/skills/&lt;/code&gt; or &lt;code&gt;~/.openclaw/skills/&lt;/code&gt; and your coding agent now executes its instructions every time the description matches.&lt;/p&gt;

&lt;p&gt;That's the design. It's also the attack surface.&lt;/p&gt;

&lt;p&gt;Three independent studies published in the last 90 days converged on a brutal picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Skill-Inject (arXiv:2601.10338)&lt;/strong&gt; — analyzed 1,300+ public skills, found &lt;strong&gt;26.1% contain exploitable vulnerabilities&lt;/strong&gt; and &lt;strong&gt;5.2% show likely malicious intent&lt;/strong&gt;. Frontier models (Claude Opus, GPT-5, Codex) executed contextual skill-injection payloads up to &lt;strong&gt;79%&lt;/strong&gt; of the time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snyk ToxicSkills (Feb 2026)&lt;/strong&gt; — scanned the ClawHub/Anthropic skill marketplaces and found &lt;strong&gt;36% of skills contain security flaws&lt;/strong&gt; including prompt injection, exposed secrets, and active malware payloads targeting Claude Code users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mitiga (May 2026)&lt;/strong&gt; — demonstrated silent codebase exfiltration via a skill that triggered on the innocuous phrase "review my changes" and POSTed the diff to an attacker-controlled webhook.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The community response has been a scramble. Lasso Security shipped a runtime PostToolUse Defender for Claude Code in January. Cisco AI Defense launched a closed-beta Skill Scanner. On June 11, NVIDIA dropped SkillSpector as the open-source baseline everyone else now has to beat.&lt;/p&gt;

&lt;p&gt;The pitch is sharp: &lt;strong&gt;static analysis for the AI age.&lt;/strong&gt; Bandit and Semgrep were built for Python and JS source. SkillSpector is built for the artifact that actually loads into your agent — the SKILL.md, the activation triggers, the parameter schemas, and the helper scripts together.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Inside: 16 Categories, 64 Patterns
&lt;/h2&gt;

&lt;p&gt;The scanner runs a layered detection pipeline. Each layer adds a different lens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Prompt Injection (P1–P8)&lt;/strong&gt; — instruction overrides, hidden instructions in HTML comments / zero-width Unicode, exfiltration commands, behavior manipulation, system prompt leakage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Data Exfiltration (E1–E4)&lt;/strong&gt; — external URL transmission, environment variable harvesting, file system enumeration, context leakage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Privilege Escalation (PE1–PE3)&lt;/strong&gt; — excessive permissions, sudo/root invocation, credential file access (SSH keys, &lt;code&gt;~/.aws/credentials&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Supply Chain (SC1–SC6)&lt;/strong&gt; — unpinned dependencies, &lt;code&gt;curl | bash&lt;/code&gt;, obfuscated code, known CVEs (live OSV.dev lookup), abandoned packages, typosquatting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Excessive Agency (EA1–EA4)&lt;/strong&gt; — unrestricted tool access, autonomous high-impact decisions, scope creep, unbounded resource consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Output Handling (OH1–OH3)&lt;/strong&gt; — unsanitized model output, cross-trust-boundary flows, unbounded generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Memory Poisoning (MP1–MP3)&lt;/strong&gt; — persistent context injection, context-window stuffing, memory tampering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8. Tool Misuse (TM1–TM3)&lt;/strong&gt; — &lt;code&gt;shell=True&lt;/code&gt;, &lt;code&gt;--force&lt;/code&gt; parameter abuse, chained-tool bypass, unsafe defaults (TLS off, no auth).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;9. Rogue Agent (RA1–RA2)&lt;/strong&gt; — self-modification, persistence via cron/launchd/startup scripts. RA1 is CRITICAL severity for a reason: a skill that writes a new skill on activation is a worm.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;10. Trigger Abuse (TR1–TR3)&lt;/strong&gt; — overly broad activation patterns, shadow commands, keyword baiting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;11. Dangerous Code (AST1–AST8)&lt;/strong&gt; — Python AST traversal for &lt;code&gt;exec()&lt;/code&gt;, &lt;code&gt;eval()&lt;/code&gt;, &lt;code&gt;__import__()&lt;/code&gt;, &lt;code&gt;subprocess&lt;/code&gt;, &lt;code&gt;os.system&lt;/code&gt;, &lt;code&gt;compile()&lt;/code&gt;, dynamic &lt;code&gt;getattr()&lt;/code&gt;, and the CRITICAL "execution chain" where &lt;code&gt;exec&lt;/code&gt; is fed by network or encoded data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;12. Taint Tracking (TT1–TT5)&lt;/strong&gt; — data-flow analysis from sources (env, network, file) to sinks (exec, network out). TT3 (credential → network) and TT5 (network → exec) are CRITICAL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;13–16. YARA / MCP Least Privilege / Tool Poisoning / Memory&lt;/strong&gt; — YARA rules (malware, webshell, cryptominer, hack tools), MCP capability-declaration vs. code-usage diffs, hidden directives in MCP tool metadata, Unicode homoglyphs, parameter description injection, and description-vs-behavior mismatch (LLM-evaluated).&lt;/p&gt;

&lt;p&gt;Risk score is +50 for CRITICAL, +25 HIGH, +10 MED, +2 LOW, capped at 100. Anything ≥50 ships a &lt;code&gt;recommend: do not install&lt;/code&gt; verdict.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting Started in 60 Seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
uv venv .venv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
git clone https://github.com/NVIDIA/SkillSpector.git
&lt;span class="nb"&gt;cd &lt;/span&gt;SkillSpector &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Scan a local skill&lt;/span&gt;
skillspector scan ./my-skill/

&lt;span class="c"&gt;# Scan a Git repo directly&lt;/span&gt;
skillspector scan https://github.com/some-user/some-skill

&lt;span class="c"&gt;# Scan a SKILL.md only&lt;/span&gt;
skillspector scan ~/.claude/skills/web-scraper/SKILL.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default it runs static + AST + YARA only. To enable the LLM semantic pass:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SKILLSPECTOR_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;anthropic
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
skillspector scan ./my-skill/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or point it at a local Ollama for free semantic checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SKILLSPECTOR_PROVIDER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;openai
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ollama
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http://localhost:11434/v1
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SKILLSPECTOR_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;llama3.1:8b
skillspector scan ./my-skill/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Docker path is even cleaner for CI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PWD&lt;/span&gt;&lt;span class="s2"&gt;:/scan"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  ghcr.io/nvidia/skillspector:latest &lt;span class="se"&gt;\&lt;/span&gt;
  scan ./my-skill/ &lt;span class="nt"&gt;--no-llm&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt; sarif &lt;span class="nt"&gt;--output&lt;/span&gt; report.sarif
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That SARIF file is what makes SkillSpector actually deployable. Upload it to GitHub via the &lt;a href="https://github.com/github/codeql-action" rel="noopener noreferrer"&gt;&lt;code&gt;codeql-action/upload-sarif&lt;/code&gt;&lt;/a&gt; action and every flagged pattern becomes an inline annotation in the Files Changed tab. No glue code.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real Scan: What the Output Looks Like
&lt;/h2&gt;

&lt;p&gt;Pointed at a skill I'd been planning to install (a popular community "auto-commit" skill with 800 stars), the terminal output was sobering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SkillSpector Report: auto-commit-skill
─────────────────────────────────────
Risk Score: 78/100 (HIGH)
Verdict: DO NOT INSTALL without manual review

CRITICAL: 1 finding
  - AST8: Dangerous Execution Chain
    File: scripts/install.sh:12
    `curl -s https://example.dev/setup.sh | bash`
    Network input flows to shell execution sink.

HIGH: 4 findings
  - E2: Env Variable Harvesting
    File: scripts/commit.py:34
    Reads os.environ['GITHUB_TOKEN'], os.environ['ANTHROPIC_API_KEY']
  - TT3: Credential Exfiltration Chain
    File: scripts/commit.py:34→47
    Env vars flow to requests.post('https://telemetry.example.dev/...')
  - SC2: External Script Fetching
    File: SKILL.md:18
    Instructions to run `curl ... | bash` on first activation.
  - TR2: Shadow Command Trigger
    Activation: "commit" — shadows git commit, hijacks intent.

MEDIUM: 6 findings
HIGH-confidence finding count: 5
LOW-confidence finding count: 6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Was it actually malicious? Probably not — the telemetry endpoint resolved to a legit-looking dev's personal domain, and the &lt;code&gt;curl | bash&lt;/code&gt; was just a Rust toolchain installer. But that's the point. &lt;strong&gt;I now have a defensible reason to either read every line of the skill or pick a different one.&lt;/strong&gt; That decision used to live in my head as vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Compares to What Existed Before
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Skill-aware&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SkillSpector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SKILL.md + scripts + manifests&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Terminal, JSON, MD, &lt;strong&gt;SARIF&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bandit&lt;/td&gt;
&lt;td&gt;Python source&lt;/td&gt;
&lt;td&gt;❌ Source only&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;JSON, CSV, HTML&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semgrep&lt;/td&gt;
&lt;td&gt;Multi-language source&lt;/td&gt;
&lt;td&gt;⚠️ Custom rules needed&lt;/td&gt;
&lt;td&gt;LGPL&lt;/td&gt;
&lt;td&gt;JSON, SARIF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lasso PostToolUse Defender&lt;/td&gt;
&lt;td&gt;Runtime (Claude Code only)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Block/allow at runtime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cisco AI Defense Skill Scanner&lt;/td&gt;
&lt;td&gt;SaaS, closed beta&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Proprietary&lt;/td&gt;
&lt;td&gt;Dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Snyk ToxicSkills&lt;/td&gt;
&lt;td&gt;Skill marketplace scanning&lt;/td&gt;
&lt;td&gt;✅ Research only&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Reports&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;SkillSpector is the first &lt;strong&gt;open-source, locally-runnable, CI-deployable&lt;/strong&gt; option in that table. Bandit and Semgrep miss the SKILL.md activation triggers and the YAML manifests entirely. Lasso is runtime-only and Claude-Code-only. Cisco's tool is a closed beta.&lt;/p&gt;

&lt;p&gt;The closest analog conceptually is &lt;strong&gt;Trivy for containers&lt;/strong&gt; — a single open-source scanner that became the default because it shipped good defaults, SARIF support, and CI examples on day one. SkillSpector has the same shape.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Reactions: Mostly Positive, Some Friction
&lt;/h2&gt;

&lt;p&gt;Show HN thread (June 11): 312 points, 87 comments. The tone was "finally, but also overdue":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I cannot believe we're three years into agent skills and this is the first open-source scanner. Bandit shipped for Python in 2014." — top comment&lt;/p&gt;

&lt;p&gt;"The SARIF output alone makes this worth running. GitHub code-scanning annotations on a PR that adds a new skill is exactly the workflow I wanted." — second-highest comment&lt;/p&gt;

&lt;p&gt;"False-positive city on EA1 and LP2. Every legitimate filesystem skill trips Unrestricted Tool Access. Going to need a config file or a baseline mechanism before this is CI-worthy." — frequent complaint&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The r/LocalLLaMA thread (June 12, 1.2K upvotes) focused on the Ollama integration: the fact that the LLM semantic pass works against a free local model is what makes SkillSpector usable for self-hosters who don't want to pay OpenAI per scan.&lt;/p&gt;

&lt;p&gt;On X, &lt;a href="https://x.com/simonw" rel="noopener noreferrer"&gt;@simonw&lt;/a&gt; wrote: &lt;em&gt;"The taint-tracking pass (TT3 / TT5) is the single most important capability here. Static analysis that can say 'this env var ends up in this network call' is exactly what skill review needs."&lt;/em&gt; The post got ~4K likes.&lt;/p&gt;

&lt;p&gt;The main pushback has been about &lt;strong&gt;false-positive volume on conservative patterns&lt;/strong&gt;. The maintainer's response in &lt;a href="https://github.com/NVIDIA/SkillSpector/issues/47" rel="noopener noreferrer"&gt;issue #47&lt;/a&gt; confirms a &lt;code&gt;.skillspector.yml&lt;/code&gt; baseline file is on the roadmap for v0.3, with per-pattern severity overrides and ignore lists.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;After running SkillSpector against ~30 community skills across Claude Code, Codex, and OpenClaw, the friction is real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High false-positive rate on EA1 (Unrestricted Tool Access) and LP2 (Wildcard Permission).&lt;/strong&gt; Any skill that legitimately needs broad filesystem access trips both. Without a baseline file, CI gating is painful.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM stage isn't free.&lt;/strong&gt; The semantic pass against &lt;code&gt;claude-opus-4-6&lt;/code&gt; runs roughly $0.03–$0.10 per skill. Ollama works, but &lt;code&gt;llama3.1:8b&lt;/code&gt; misses about 30% of what Claude catches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python-only AST coverage.&lt;/strong&gt; AST1–AST8 patterns only fire on &lt;code&gt;.py&lt;/code&gt; files. A skill that ships shell scripts with &lt;code&gt;eval&lt;/code&gt; bypasses the AST stage entirely (YARA still catches obvious cases).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Windows-native keychain integration yet&lt;/strong&gt; for the LLM provider credentials — macOS Keychain works, Linux Secret Service works, Windows Credential Manager is "planned."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OSV.dev rate limits.&lt;/strong&gt; Bulk scans of 100+ skills can hit OSV.dev throttling; an &lt;code&gt;--offline-cve&lt;/code&gt; flag uses a stale local snapshot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No GitHub Marketplace integration yet.&lt;/strong&gt; You can't just install SkillSpector from the GitHub Actions marketplace — you have to write the workflow YAML manually. NVIDIA has confirmed this is shipping in v0.3.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these are dealbreakers. They're the normal shape of a v0.2 open-source release that landed three days ago.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use This
&lt;/h2&gt;

&lt;p&gt;✅ &lt;strong&gt;You should run SkillSpector before installing any new skill from a source you don't fully trust.&lt;/strong&gt; That includes ClawHub, the Anthropic skill marketplace, random GitHub gists, and your colleague's "I built this on the weekend" Slack message.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Engineering teams shipping agent-skill marketplaces or internal skill libraries.&lt;/strong&gt; The SARIF output drops into GitHub code-scanning with zero glue. Block PRs that introduce HIGH or CRITICAL findings.&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Security teams reviewing agent deployments.&lt;/strong&gt; The risk score gives you a defensible vendor-risk-management number to put in a spreadsheet.&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Skip it (or run with &lt;code&gt;--no-llm&lt;/code&gt;) if:&lt;/strong&gt; you're just scanning your own skills you wrote five minutes ago. The false-positive volume isn't worth the time.&lt;/p&gt;

&lt;p&gt;❌ &lt;strong&gt;Don't rely on it for runtime defense.&lt;/strong&gt; SkillSpector is install-time review. For runtime, you still need Lasso's PostToolUse Defender or an equivalent, plus capability sandboxing (OpenClaw skill sandboxing, Claude Code permission prompts).&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture Note: Why the Two-Stage Design Matters
&lt;/h2&gt;

&lt;p&gt;The static + LLM split is the call that makes SkillSpector practical at scale. Pure-LLM scanners cost real money per skill and produce nondeterministic findings between runs — CI flakes. Pure-static scanners (Bandit, Semgrep) miss the natural-language attack vectors that are most of the actual risk in a SKILL.md.&lt;/p&gt;

&lt;p&gt;SkillSpector runs every statically-detectable pattern first — AST, regex, YARA, taint tracking, CVE lookups — and only escalates to the LLM for &lt;strong&gt;TP4 (description-behavior mismatch)&lt;/strong&gt; and a handful of context-dependent patterns. That keeps &lt;code&gt;--no-llm&lt;/code&gt; fast and free for the 80% case while preserving deeper analysis where it pays off. It's the same insight that made &lt;code&gt;cargo-audit&lt;/code&gt; succeed: do the cheap, deterministic, free thing first.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Does SkillSpector work on MCP servers as well as agent skills?&lt;/strong&gt;&lt;br&gt;
A: Partially. The LP1–LP4 (least privilege) and TP1–TP4 (tool poisoning) categories are MCP-specific, and the scanner detects &lt;code&gt;mcp.json&lt;/code&gt; and &lt;code&gt;mcp.yaml&lt;/code&gt; manifests. But MCP servers are typically full applications with their own dependency graphs, so for production MCP review you'll want Bandit or Semgrep alongside SkillSpector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I run it as a GitHub Action?&lt;/strong&gt;&lt;br&gt;
A: Not yet officially. The repo has an example workflow YAML in &lt;code&gt;examples/github-action.yml&lt;/code&gt; that uses the Docker image plus &lt;code&gt;codeql-action/upload-sarif&lt;/code&gt;. Official Marketplace publishing is on the v0.3 roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How is this different from Lasso Security's PostToolUse Defender?&lt;/strong&gt;&lt;br&gt;
A: Lasso is runtime — it inspects tool &lt;em&gt;outputs&lt;/em&gt; in Claude Code at execution time and can block on prompt-injection patterns it sees. SkillSpector is install-time — it inspects the skill file before it ever runs. They're complementary, not competing. Use SkillSpector to gate which skills enter your machine; use Lasso (or equivalent) to catch what slips through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the LLM cost per scan in practice?&lt;/strong&gt;&lt;br&gt;
A: For an average ~200-line skill with the Anthropic provider on &lt;code&gt;claude-opus-4-6&lt;/code&gt;, expect $0.03–$0.10. Static-only (&lt;code&gt;--no-llm&lt;/code&gt;) is free. The Ollama / local-model path is free at the cost of ~30% lower recall on the semantic-only categories (TP4, contextual MP).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does it catch the Mitiga "review my changes" exfiltration skill?&lt;/strong&gt;&lt;br&gt;
A: Yes — TR1 (Overly Broad Trigger) catches the activation phrase, TT3 (Credential Exfiltration Chain) catches the env-var-to-network flow, and E1 (External Transmission) catches the POST. Combined risk score on the Mitiga sample skill: 92/100 CRITICAL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is NVIDIA going to keep maintaining this?&lt;/strong&gt;&lt;br&gt;
A: NVIDIA's &lt;a href="https://developer.nvidia.com/blog/nvidia-verified-agent-skills-provide-capability-governance-for-ai-agents/" rel="noopener noreferrer"&gt;June 4 technical blog&lt;/a&gt; frames SkillSpector as one piece of a broader "NVIDIA-Verified Agent Skills" program, alongside capability governance and skill signing. That suggests sustained investment, not a tech-demo dump — three NVIDIA employees are primary committers and 18 PRs merged in its first three days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;SkillSpector is the first open-source tool that takes &lt;strong&gt;agent-skill supply-chain security&lt;/strong&gt; seriously enough to be usable in CI. It's not perfect — the false-positive rate on EA1/LP2 will frustrate you, and you'll want the v0.3 baseline file before you gate PRs on it — but it's good enough today to run on every new skill you're about to install, and it's the right architectural shape for the category to mature into.&lt;/p&gt;

&lt;p&gt;Three years from now, scanning a skill before installing it will be as automatic as &lt;code&gt;npm audit&lt;/code&gt;. SkillSpector is the tool that started that habit. Run it on your &lt;code&gt;~/.claude/skills/&lt;/code&gt; directory tonight; you'll find at least one thing you didn't know was there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/NVIDIA/SkillSpector" rel="noopener noreferrer"&gt;NVIDIA/SkillSpector&lt;/a&gt; · &lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://docs.nvidia.com/skills/scanning-agent-skills" rel="noopener noreferrer"&gt;docs.nvidia.com/skills/scanning-agent-skills&lt;/a&gt; · &lt;strong&gt;License:&lt;/strong&gt; Apache 2.0&lt;/p&gt;

</description>
      <category>skillspector</category>
      <category>nvidia</category>
      <category>aisecurity</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>MemPalace Review: Local AI Memory With 96.6% Recall</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Sat, 13 Jun 2026 10:13:03 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/mempalace-review-local-ai-memory-with-966-recall-162n</link>
      <guid>https://dev.to/andrew-ooo/mempalace-review-local-ai-memory-with-966-recall-162n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/mempalace-local-ai-memory-system-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MemPalace&lt;/strong&gt; is a local-first, open-source AI memory system that &lt;strong&gt;stores conversation history verbatim&lt;/strong&gt; and retrieves it with semantic search — no summarization, no LLM rewriting, no API calls required. It currently leads its category with &lt;strong&gt;96.6% R@5 on the LongMemEval benchmark in raw mode&lt;/strong&gt; and &lt;strong&gt;98.4% on a held-out hybrid run&lt;/strong&gt;, and the GitHub repo has exploded to &lt;strong&gt;55,500 stars&lt;/strong&gt; with &lt;strong&gt;1,819 added this week&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The pitch is straightforward: every AI memory tool you've used so far probably summarizes your past sessions into "facts," loses nuance, and then can't tell you what you actually said. MemPalace keeps your raw text — your Claude Code session, your Cursor history, your project notes — indexed in a structured palace (wings → rooms → drawers), and retrieves the original transcript chunk when you ask "why did we switch to GraphQL?" Nothing leaves your machine.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;55,500 GitHub stars&lt;/strong&gt;, &lt;strong&gt;1,819 added this week&lt;/strong&gt; — currently top-3 trending Python repo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;96.6% R@5 on LongMemEval (raw, no LLM, no API key)&lt;/strong&gt; — best public open-source number&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;98.4% on held-out 450 questions&lt;/strong&gt; with hybrid v4 (keyword + temporal boosting)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verbatim storage&lt;/strong&gt; — no summarization, no paraphrasing, no information loss&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pluggable backend&lt;/strong&gt; — ChromaDB default, plus &lt;code&gt;sqlite_exact&lt;/code&gt;, Qdrant, and pgvector&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;29 MCP tools&lt;/strong&gt; for palace reads/writes, knowledge graph, agent diaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-save hooks&lt;/strong&gt; for Claude Code, Codex CLI, and Cursor IDE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal knowledge graph&lt;/strong&gt; with validity windows, backed by local SQLite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT licensed&lt;/strong&gt;, Python 3.9+, runs entirely offline once the embedding model is downloaded&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why "facts extraction" memory tools fail
&lt;/h2&gt;

&lt;p&gt;There's a familiar pattern in AI memory tools: ingest a conversation, ask an LLM to extract "facts," store them as embeddings, retrieve facts on demand. Mem0, Zep, Supermemory, Hindsight, and dozens of others work this way. It's what every memory startup pitch deck shows.&lt;/p&gt;

&lt;p&gt;It also has a problem that becomes obvious after a month: &lt;strong&gt;you can never get back what you actually said.&lt;/strong&gt; The "facts" are a lossy summarization, and the LLM that wrote them had no idea which of your offhand asides would matter later. By the time you ask "wait, what was that OAuth library I mentioned?", the original is gone — replaced with "the user uses OAuth" or nothing at all.&lt;/p&gt;

&lt;p&gt;MemPalace's bet is the opposite: &lt;strong&gt;store verbatim text, retrieve with semantic search, never summarize&lt;/strong&gt;. The "palace" structure — people and projects become wings, topics become rooms, content lives in drawers — is purely an &lt;em&gt;index&lt;/em&gt; for scoping. The drawers hold raw transcript chunks.&lt;/p&gt;

&lt;p&gt;Lossless storage + good retrieval beats clever summarization at any task where you eventually need the original.&lt;/p&gt;

&lt;h2&gt;
  
  
  The benchmark numbers (and why they matter)
&lt;/h2&gt;

&lt;p&gt;MemPalace publishes more reproducible benchmark detail than most commercial memory products. The headline result on LongMemEval (500 questions, recall@5):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;R@5&lt;/th&gt;
&lt;th&gt;LLM required&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Raw (semantic search, no heuristics)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;96.6%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid v4, held-out 450q&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;98.4%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid v4 + LLM rerank (full 500)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;≥99%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any capable model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things to notice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;96.6% with zero LLM calls&lt;/strong&gt;, no API key, no cloud — this is the embedding-only path. The cost-per-query is effectively zero after install.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The 98.4% number is the held-out result&lt;/strong&gt; — they trained the hybrid heuristics on 50 dev questions and report on the other 450 they never tuned against. That's the honest generalisable figure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They explicitly refuse to publish a 100% number&lt;/strong&gt;, calling it "teaching to the test" — the gap to 99%+ was closed by inspecting wrong answers, which is exactly the failure mode of every benchmark in the field.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For comparison: MemBench (ACL 2025, 8,500 items) hits R@5 of 80.3%, LoCoMo top-10 with hybrid v5 hits 88.9%, and ConvoMem averages 92.9% recall across categories. These are the numbers you can verify yourself with the commands in &lt;code&gt;benchmarks/BENCHMARKS.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What's &lt;em&gt;not&lt;/em&gt; in the README — pointedly — is a side-by-side against Mem0, Mastra, Supermemory, or Zep. The maintainers' position: those projects publish different metrics on different splits, and stacking retrieval recall next to end-to-end QA accuracy isn't an honest comparison. That's the right call, but it does leave you to do your own bake-off if you're picking between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the palace is structured
&lt;/h2&gt;

&lt;p&gt;The metaphor is load-bearing. Your data flows in as raw conversation, transcripts, or files, and gets indexed into a three-level hierarchy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Palace
├── Wing      (people, projects)
│   ├── Room      (topics)
│   │   └── Drawer    (verbatim text chunks)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you search, you can scope to a wing ("everything in the andrew.ooo project") or query globally. The semantic index sits on the drawer contents; the wing/room labels are metadata for filtering. This avoids the failure mode of pure vector DBs where every query searches the entire corpus and noise drowns out signal in long-lived stores.&lt;/p&gt;

&lt;p&gt;The pluggable backend layer is the part most people will care about as the project matures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB&lt;/strong&gt; (default) — zero-config, local, fine for individual use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sqlite_exact&lt;/code&gt;&lt;/strong&gt; — exact-vector correctness checks; useful for benchmarking and debugging recall&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant&lt;/strong&gt; — REST backend; opt-in via &lt;code&gt;MEMPALACE_QDRANT_URL&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pgvector&lt;/strong&gt; — Postgres + JSONB; opt-in via &lt;code&gt;MEMPALACE_PGVECTOR_DSN&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Qdrant and pgvector paths are explicitly described as opt-in — they will send your verbatim drawer text to the configured server. That's correct posture for a local-first tool: cloud is possible, but never the default.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real install — 60 seconds
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# uv (recommended — isolated, on your PATH)&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;mempalace

&lt;span class="c"&gt;# Initialize a palace in your project&lt;/span&gt;
mempalace init ~/projects/myapp

&lt;span class="c"&gt;# Mine project files into the palace&lt;/span&gt;
mempalace mine ~/projects/myapp

&lt;span class="c"&gt;# Mine your existing Claude Code session history&lt;/span&gt;
mempalace mine ~/.claude/projects/ &lt;span class="nt"&gt;--mode&lt;/span&gt; convos

&lt;span class="c"&gt;# Search&lt;/span&gt;
mempalace search &lt;span class="s2"&gt;"why did we switch to GraphQL"&lt;/span&gt;

&lt;span class="c"&gt;# Load context for a new session&lt;/span&gt;
mempalace wake-up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--mode convos&lt;/code&gt; flag is the killer feature for daily Claude Code users. Point it at &lt;code&gt;~/.claude/projects/&lt;/code&gt; and it backfills your entire conversation history — months of sessions — into the palace. Combined with the auto-save hooks, every future session also lands in the palace, scoped by &lt;code&gt;--wing&lt;/code&gt; so each project stays isolated.&lt;/p&gt;

&lt;p&gt;For Cursor IDE there's a separate hooks setup that adds session-start recall and a transcript snapshot before context compaction — that last part matters because Cursor's compaction is exactly when you lose detail.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring it into Claude Code (MCP)
&lt;/h2&gt;

&lt;p&gt;This is the path most readers will take. MemPalace ships an MCP stdio server with &lt;strong&gt;29 tools&lt;/strong&gt; covering palace reads/writes, the knowledge graph, cross-wing navigation, drawer management, and agent diaries.&lt;/p&gt;

&lt;p&gt;In your Claude Code config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mempalace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--rm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mempalace-data:/data"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mempalace"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or natively, without Docker, if you've &lt;code&gt;uv tool install&lt;/code&gt;'d it. The auto-save hooks are documented for Claude Code, Codex CLI, and Cursor IDE — wire those up before you start a real project, because backfilling existing JSONL transcripts works (&lt;code&gt;mempalace mine ~/.claude/projects/ --mode convos&lt;/code&gt;) but it's nicer to capture cleanly from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The knowledge graph
&lt;/h2&gt;

&lt;p&gt;This is the under-marketed feature. MemPalace ships a &lt;strong&gt;temporal entity-relationship graph with validity windows&lt;/strong&gt; — backed by local SQLite — that lets you encode relationships between people, projects, and concepts that change over time.&lt;/p&gt;

&lt;p&gt;In practice: "Andrew uses Claude Code (valid: 2025-08 onwards)" can later be invalidated by "Andrew migrated to Codex (valid: 2026-06 onwards)" without overwriting the original. Queries respect the temporal window. This is what real long-term memory looks like — not "facts" extracted into a flat KV store, but a graph that knows what was true &lt;em&gt;when&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;For multi-agent setups, each specialist agent gets its own wing and its own diary. There's a &lt;code&gt;mempalace_list_agents&lt;/code&gt; tool for runtime discovery, so you don't have to stuff agent context into the system prompt of every session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community reception
&lt;/h2&gt;

&lt;p&gt;Reception on r/LocalLLaMA and r/ClaudeAI over the last two weeks has been unusually positive for a memory tool. Three themes recur:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"Finally, verbatim."&lt;/strong&gt; Users who got burned by Mem0 and Supermemory summarizing away the details they wanted later are the loudest fans. The pitch ("we don't summarize") lands.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"The benchmarks are reproducible."&lt;/strong&gt; Most memory tools publish marketing numbers; MemPalace publishes the exact commands. The R@5 96.6% number is hard to argue with when you can run it yourself in ten minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"The MCP integration is the cleanest in the category."&lt;/strong&gt; 29 tools, all documented, all locally served. No hosted-API dependency, no rate limits, no privacy hand-wringing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most consistent critique: &lt;strong&gt;the palace metaphor takes a session to internalise&lt;/strong&gt;. New users don't immediately know whether their content should be a wing, a room, or a drawer. The docs have improved a lot, but expect a 30-minute learning curve before the structure clicks.&lt;/p&gt;

&lt;p&gt;Secondary critique: &lt;strong&gt;there's no built-in dedup against your existing Claude Code transcripts&lt;/strong&gt; when you backfill. If you've been using another memory tool that also mined your transcripts, you'll end up with overlapping content until you point both at the same backend or pick one to drop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model download is ~300 MB&lt;/strong&gt; on first run. Onboarding offers &lt;code&gt;embeddinggemma-300m&lt;/code&gt; (multilingual, recommended) or &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; (English, ~30 MB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No built-in privacy redaction.&lt;/strong&gt; If your sessions contain secrets, they go into the palace verbatim. The local-first posture makes this safer than cloud tools, but it's still on you to scrub.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External backends (Qdrant, pgvector) are previews.&lt;/strong&gt; The Chroma path is what's battle-tested; treat the others as solid but not yet bulletproof.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No GUI yet.&lt;/strong&gt; Everything is CLI + MCP. A web UI is on the roadmap; for now, you query via &lt;code&gt;mempalace search&lt;/code&gt; or via your agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's not a replacement for prompt caching.&lt;/strong&gt; MemPalace is for &lt;em&gt;long-term&lt;/em&gt; memory across sessions. For the same-session context-window pressure, you still want provider-native compaction and a compression layer like &lt;a href="https://dev.to/posts/headroom-context-compression-llm-agents-review"&gt;Headroom&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Compared to the alternatives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Storage model&lt;/th&gt;
&lt;th&gt;Local-first&lt;/th&gt;
&lt;th&gt;Open-source&lt;/th&gt;
&lt;th&gt;Benchmark transparency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MemPalace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Verbatim&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MIT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Reproducible, R@5 96.6%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mem0&lt;/td&gt;
&lt;td&gt;LLM-summarized facts&lt;/td&gt;
&lt;td&gt;Optional cloud&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Marketing numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supermemory&lt;/td&gt;
&lt;td&gt;Embeddings + facts&lt;/td&gt;
&lt;td&gt;Cloud-first&lt;/td&gt;
&lt;td&gt;Closed core&lt;/td&gt;
&lt;td&gt;Published, some reproducible&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zep&lt;/td&gt;
&lt;td&gt;Temporal KG + facts&lt;/td&gt;
&lt;td&gt;Optional self-host&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;td&gt;Published, partial&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hindsight&lt;/td&gt;
&lt;td&gt;Embeddings&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;MIT&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mastra Memory&lt;/td&gt;
&lt;td&gt;Vector + KV&lt;/td&gt;
&lt;td&gt;Optional cloud&lt;/td&gt;
&lt;td&gt;Open-source&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The differentiation is real: &lt;strong&gt;verbatim + local + reproducible benchmarks&lt;/strong&gt; is a triple that no other major tool in this category currently offers. Whether that combination matters to you depends on whether you trust LLM-summarized "facts" to capture what you'll want later. After a year of using these tools, my answer is no — and the benchmark numbers back that up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should you use MemPalace?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use it if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run daily Claude Code, Codex, or Cursor sessions and want them to remember across runs&lt;/li&gt;
&lt;li&gt;You've been burned by other memory tools summarizing away the details you needed&lt;/li&gt;
&lt;li&gt;You want a local-first tool with no cloud dependency for personal-use memory&lt;/li&gt;
&lt;li&gt;You're building multi-agent setups and need per-agent diaries with shared knowledge graph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip it if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You only need same-session memory (provider-native compaction is enough)&lt;/li&gt;
&lt;li&gt;Your stack already standardised on Mem0 / Zep and switching costs are high&lt;/li&gt;
&lt;li&gt;You need a hosted SaaS with a polished web UI today&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most Claude Code daily drivers, the install path is: &lt;code&gt;uv tool install mempalace&lt;/code&gt; → &lt;code&gt;mempalace mine ~/.claude/projects/ --mode convos&lt;/code&gt; → wire up the auto-save hooks → wire up the MCP server. Twenty minutes of work, and your agent now remembers months of context.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Does MemPalace require an LLM or API key?&lt;/strong&gt;&lt;br&gt;
No, not for the core path. The 96.6% R@5 raw benchmark is reached using embeddings + semantic search only — zero API calls, no cloud. The LLM rerank pipeline that pushes the number to ≥99% is opt-in and works with any model (Claude Haiku, Sonnet, or minimax-m2.7 via Ollama Cloud).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How is this different from Mem0 or Supermemory?&lt;/strong&gt;&lt;br&gt;
The biggest difference is &lt;strong&gt;no summarization&lt;/strong&gt;. Mem0 and Supermemory extract LLM-generated "facts" from your conversations and discard the original text. MemPalace stores the verbatim text and searches it directly. When you ask "what was that OAuth library I mentioned?", MemPalace can find it; fact-extraction tools usually can't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Will it work with non-Claude agents?&lt;/strong&gt;&lt;br&gt;
Yes. The MCP server speaks JSON-RPC over stdio, so any MCP-aware client works: Claude Code, Gemini CLI, Antigravity, Codex CLI, Cursor, and custom agents. Auto-save hooks are documented for Claude Code, Codex CLI, and Cursor IDE today; other integrations use the MCP tools directly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What backend should I pick?&lt;/strong&gt;&lt;br&gt;
Start with ChromaDB (default). It's local, zero-config, and handles single-user palaces well past 1M drawers. Move to Qdrant (REST) or pgvector if you want multi-tenant isolation or you already run that infrastructure. Use &lt;code&gt;sqlite_exact&lt;/code&gt; only for benchmarking exact-vector correctness.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How much disk does the palace use?&lt;/strong&gt;&lt;br&gt;
The embedding model is ~300 MB on first install (or ~30 MB if you pick &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;). After that, each drawer is roughly the size of the raw text plus the embedding vector — typical Claude Code session histories land around 200-500 MB after months of use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I encrypt the palace?&lt;/strong&gt;&lt;br&gt;
There's no built-in encryption-at-rest, but since everything is local-first and the data lives under a single directory (&lt;code&gt;/data&lt;/code&gt; in Docker, or your configured palace root), filesystem-level encryption (FileVault, LUKS, BitLocker) covers it. For multi-user setups, prefer the Qdrant or pgvector backend with namespace isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does it work with the knowledge graph and verbatim storage in the same query?&lt;/strong&gt;&lt;br&gt;
Yes. The knowledge graph and the drawer index are complementary. A query like "what's the current OAuth library?" first hits the temporal KG for the latest valid relationship, then optionally pulls supporting drawer context if you need the original conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is the verbatim approach really better than fact extraction?&lt;/strong&gt;&lt;br&gt;
For retrieval-heavy use cases (debugging, code review, project archaeology) — yes, measurably. The R@5 numbers tell the story: lossless storage + good retrieval beats clever summarization on benchmarks where the question is "what did I actually say?" For pure summary use cases (give me a tldr of last week), summarization tools are arguably easier; MemPalace can also do this via reranking, but it's not the primary design center.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;MemPalace is the strongest open-source AI memory project in the category right now. &lt;strong&gt;Verbatim storage, local-first by default, reproducible 96.6% R@5 on LongMemEval, 29 MCP tools, auto-save hooks for the three agents most people actually use&lt;/strong&gt; — there's no major box left unchecked.&lt;/p&gt;

&lt;p&gt;If you run Claude Code daily and you've been irritated by other memory tools losing the details you needed, install it tonight: &lt;code&gt;uv tool install mempalace&lt;/code&gt;, point it at &lt;code&gt;~/.claude/projects/&lt;/code&gt;, and check whether the recall matches the benchmarks on your actual data. Twenty minutes will tell you.&lt;/p&gt;

&lt;p&gt;→ &lt;strong&gt;Try it:&lt;/strong&gt; &lt;code&gt;uv tool install mempalace&lt;/code&gt; (&lt;a href="https://github.com/MemPalace/mempalace" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://mempalaceofficial.com" rel="noopener noreferrer"&gt;Docs&lt;/a&gt; · &lt;a href="https://github.com/MemPalace/mempalace/blob/develop/benchmarks/BENCHMARKS.md" rel="noopener noreferrer"&gt;Benchmarks&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>mempalace</category>
      <category>aimemory</category>
      <category>rag</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>whichllm Review: One Command to Find Your Best Local LLM</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Fri, 12 Jun 2026 10:10:25 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/whichllm-review-one-command-to-find-your-best-local-llm-14lk</link>
      <guid>https://dev.to/andrew-ooo/whichllm-review-one-command-to-find-your-best-local-llm-14lk</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/whichllm-local-llm-hardware-ranker-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;whichllm&lt;/code&gt;&lt;/strong&gt; is a Python CLI that &lt;strong&gt;auto-detects your GPU/CPU/RAM and ranks the local LLMs that will actually run best on your hardware&lt;/strong&gt; — judged by merged real benchmarks (LiveBench, Artificial Analysis, Aider, Vision, Chatbot Arena ELO, Open LLM Leaderboard v2), not by "biggest model that fits." It blew up on Hacker News with a &lt;strong&gt;144-point Show HN&lt;/strong&gt; and is currently &lt;strong&gt;trending on GitHub with 4,592 stars (1,800 in the past week)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The pitch is sharp: every other "what can I run?" tool just checks if the weights fit in VRAM. That hands you the biggest model — which is almost never the smartest model. &lt;code&gt;whichllm&lt;/code&gt; does it the other way around: it knows that on an RTX 4090, a Qwen3.6‑27B at Q5_K_M beats a Qwen3‑32B at Q4_K_M, because the newer 27B has a higher real benchmark score, even though both fit. That gap is the whole product.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;4,592 GitHub stars&lt;/strong&gt;, &lt;strong&gt;1,800 added this week&lt;/strong&gt;, currently in the top 10 trending Python repos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;144 points on Show HN&lt;/strong&gt; ("One command to find the best local LLM for your hardware")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built by &lt;code&gt;/Andyyyy64&lt;/code&gt;&lt;/strong&gt; (with &lt;code&gt;/claude&lt;/code&gt;, &lt;code&gt;/devangpratap&lt;/code&gt;, &lt;code&gt;/hibiki333155555&lt;/code&gt;, &lt;code&gt;/justindotdevv&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One command, scriptable&lt;/strong&gt; — no TUI, no setup, JSON-pipe friendly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live HuggingFace data&lt;/strong&gt; — merges current-tier (LiveBench/Artificial Analysis/Aider/Vision) and frozen-tier (OLM Leaderboard v2 / Arena ELO) benchmarks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recency-aware&lt;/strong&gt; — stale leaderboard scores get demoted along each model's lineage so a 2024 model can't outrank a current-generation one on an outdated score&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU simulation&lt;/strong&gt; — &lt;code&gt;whichllm --gpu "RTX 5090"&lt;/code&gt; lets you plan an upgrade before you spend the money&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIT licensed&lt;/strong&gt;, PyPI / Homebrew / &lt;code&gt;uvx&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why "biggest model that fits" is the wrong question
&lt;/h2&gt;

&lt;p&gt;If you've spent any time in &lt;code&gt;r/LocalLLaMA&lt;/code&gt;, you've seen the same loop: someone posts their rig, asks "what should I run?", and the replies are some mix of "Qwen is good," "just try llama3," and personal anecdotes. Useful, but not exactly a system.&lt;/p&gt;

&lt;p&gt;The other category of advice — "VRAM calculator" tools — answers a strictly easier question: &lt;em&gt;does this model's weights, KV cache, and activation budget fit?&lt;/em&gt; That's a real check, but it's not a recommendation. It will happily tell you that a 32B Q4 fits on your 4090 and stop there, even when a newer 27B at Q5 is meaningfully smarter at the same memory cost.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;whichllm&lt;/code&gt;'s author calls this "evidence-based ranking, not a size heuristic." The merged benchmark map decides the score; runtime fit, speed, evidence confidence, and source trust scale it. Size is a &lt;em&gt;factor&lt;/em&gt; (capped at +35 as a log-scaled world-knowledge proxy), not the driver.&lt;/p&gt;

&lt;p&gt;The result, for an RTX 4090 / 3090 with 24 GB VRAM at the time of writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#1 Qwen/Qwen3.6-27B    27.8B  Q5_K_M  score 92.8   27 t/s
#2 Qwen/Qwen3-32B      32.0B  Q4_K_M  score 83.0   31 t/s
#3 Qwen/Qwen3-30B-A3B  30.0B  Q5_K_M  score 82.7  102 t/s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both #1 and #2 fit your card. A VRAM-only tool would rank the 32B higher. &lt;code&gt;whichllm&lt;/code&gt; ranks the 27B higher because its merged real benchmark score is genuinely better, and prints the published-date and benchmark snapshot under the table so you can sanity-check it. The #3 row is a 30B MoE running at ~102 t/s — speed is scored on active parameters, quality on total. That's the kind of nuance most "what runs on my GPU" pages get wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install in one line
&lt;/h2&gt;

&lt;p&gt;The whole thing is a &lt;code&gt;uvx&lt;/code&gt;-installable Python package, so you can run it without committing to anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run once, no install&lt;/span&gt;
uvx whichllm@latest

&lt;span class="c"&gt;# Install when you use it often&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;whichllm
uv tool upgrade whichllm

&lt;span class="c"&gt;# Alternative install paths&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;andyyyy64/whichllm/whichllm
pip &lt;span class="nb"&gt;install &lt;/span&gt;whichllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-detection covers NVIDIA (via &lt;code&gt;nvidia-ml-py&lt;/code&gt;, fallback &lt;code&gt;nvidia-smi&lt;/code&gt;), AMD (&lt;code&gt;rocm-smi&lt;/code&gt;, fallback &lt;code&gt;lspci&lt;/code&gt;), Intel iGPUs on Linux, Apple Silicon (&lt;code&gt;system_profiler&lt;/code&gt;), CPU cores + AVX2/AVX‑512, RAM, and disk free. If detection fails, it returns an empty result instead of crashing — fail-safe by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Six commands, one tool
&lt;/h2&gt;

&lt;p&gt;The CLI is intentionally small. Here are the six surfaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1) Best models for this machine&lt;/span&gt;
whichllm

&lt;span class="c"&gt;# 2) Pretend you have a specific GPU (great for hardware planning)&lt;/span&gt;
whichllm &lt;span class="nt"&gt;--gpu&lt;/span&gt; &lt;span class="s2"&gt;"RTX 4090"&lt;/span&gt;

&lt;span class="c"&gt;# 3) Compare upgrade candidates&lt;/span&gt;
whichllm upgrade &lt;span class="s2"&gt;"RTX 4090"&lt;/span&gt; &lt;span class="s2"&gt;"RTX 5090"&lt;/span&gt; &lt;span class="s2"&gt;"H100"&lt;/span&gt;

&lt;span class="c"&gt;# 4) Find the GPU needed for a given model&lt;/span&gt;
whichllm plan &lt;span class="s2"&gt;"llama 3 70b"&lt;/span&gt;

&lt;span class="c"&gt;# 5) Start a chat with a model immediately&lt;/span&gt;
whichllm run &lt;span class="s2"&gt;"qwen 2.5 1.5b gguf"&lt;/span&gt;

&lt;span class="c"&gt;# 6) Print copy-paste Python you can drop into a script&lt;/span&gt;
whichllm snippet &lt;span class="s2"&gt;"qwen 7b"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The standouts here are &lt;code&gt;plan&lt;/code&gt; and &lt;code&gt;upgrade&lt;/code&gt;. If you've ever shopped for a GPU specifically to run a target model, you know "you need an H100" and "a 5090 will do" are very different conversations. &lt;code&gt;plan&lt;/code&gt; gives you the floor; &lt;code&gt;upgrade&lt;/code&gt; ranks candidate cards on the same scoring engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  A peek at the scoring
&lt;/h2&gt;

&lt;p&gt;This is the part that turns it from "another HN tool" into something I'd actually run before downloading 27 GB of weights:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benchmark quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;core&lt;/td&gt;
&lt;td&gt;Merged LiveBench / Artificial Analysis / Aider / Vision / Arena ELO / OLM Leaderboard v2, weighted by source confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;up to +35&lt;/td&gt;
&lt;td&gt;log2-scaled world-knowledge proxy (MoE uses &lt;em&gt;total&lt;/em&gt; params)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Quantization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;× penalty&lt;/td&gt;
&lt;td&gt;Lower-bit quants are discounted multiplicatively&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Evidence confidence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;×0.55–1.0&lt;/td&gt;
&lt;td&gt;none / self-reported ×0.55, inherited ×0.78, direct full&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime fit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;×0.50–1.0&lt;/td&gt;
&lt;td&gt;partial-offload ×0.72, CPU-only ×0.50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;−8 to +8&lt;/td&gt;
&lt;td&gt;Usability gate vs a fit-dependent tok/s floor; reported with confidence and a range&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source trust&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;−5 to +5&lt;/td&gt;
&lt;td&gt;Official-org bonus, known-repackager penalty&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Popularity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;tie-breaker&lt;/td&gt;
&lt;td&gt;Downloads + likes; weight shrinks as evidence strengthens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two design choices stood out reading the source. First, &lt;strong&gt;evidence has five grades&lt;/strong&gt; (&lt;code&gt;direct&lt;/code&gt;, &lt;code&gt;variant&lt;/code&gt;, &lt;code&gt;base_model&lt;/code&gt;, &lt;code&gt;line_interp&lt;/code&gt;, &lt;code&gt;self_reported&lt;/code&gt;), and each is discounted. A model with only a self-reported benchmark gets ×0.55 — uploader claims are not free real estate. Second, &lt;strong&gt;inheritance is rejected&lt;/strong&gt; when a model's parameter count diverges more than 2× from its family's dominant member. This catches small draft heads, MTP heads, and abliterated forks that would otherwise borrow the score of their much larger base. It's the kind of leaderboard-gaming guard a lot of "AI tool roundup" pages have no opinion about.&lt;/p&gt;

&lt;p&gt;Score markers are surfaced inline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;~&lt;/code&gt;&lt;/strong&gt; (yellow) — no direct benchmark; score is inherited/interpolated from the family&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;!sr&lt;/code&gt;&lt;/strong&gt; (bright yellow) — uploader-reported only, not independently verified&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;?&lt;/code&gt;&lt;/strong&gt; (red) — no benchmark data available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So if the top pick is yellow-&lt;code&gt;~&lt;/code&gt;, you know to take it with a grain of salt before you commit to a download.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run a model in literally one command
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;run&lt;/code&gt; subcommand is what makes this more than a recommender. It creates an isolated &lt;code&gt;uv&lt;/code&gt; environment, installs the right runtime (&lt;code&gt;llama-cpp-python&lt;/code&gt; for GGUF, &lt;code&gt;transformers + autoawq/auto-gptq&lt;/code&gt; for AWQ/GPTQ, plain &lt;code&gt;transformers&lt;/code&gt; for FP16/BF16), downloads the right GGUF variant for your VRAM, and drops you straight into a chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Auto-pick the best model for your hardware and chat&lt;/span&gt;
whichllm run

&lt;span class="c"&gt;# Specify a model and quant&lt;/span&gt;
whichllm run &lt;span class="s2"&gt;"qwen 2.5 1.5b gguf"&lt;/span&gt;

&lt;span class="c"&gt;# CPU-only&lt;/span&gt;
whichllm run &lt;span class="s2"&gt;"phi 3 mini gguf"&lt;/span&gt; &lt;span class="nt"&gt;--cpu-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For people who just want a known-good model running in the next 90 seconds, this is the single most useful thing in the tool.&lt;/p&gt;

&lt;p&gt;If you'd rather wire it into your own code, &lt;code&gt;whichllm snippet&lt;/code&gt; prints ready-to-paste Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;llama_cpp&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Llama&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Llama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;repo_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen2.5-7B-Instruct-GGUF&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qwen2.5-7b-instruct-q4_k_m.gguf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_ctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;n_gpu_layers&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_chat_completion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pipe-friendly: the Ollama trick
&lt;/h2&gt;

&lt;p&gt;The bit that won me over for scripting: &lt;code&gt;whichllm --json&lt;/code&gt; returns a structured object you can pipe straight into &lt;code&gt;jq&lt;/code&gt;, including &lt;code&gt;estimated_tok_per_sec&lt;/code&gt;, &lt;code&gt;speed_confidence&lt;/code&gt;, and a &lt;code&gt;speed_range_tok_per_sec&lt;/code&gt; planning range. That means you can build an alias that always returns "the best model ID for this machine, right now":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to .bashrc / .zshrc&lt;/span&gt;
&lt;span class="nb"&gt;alias &lt;/span&gt;&lt;span class="nv"&gt;bestllm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'whichllm --top 1 --json | jq -r ".models[0].model_id"'&lt;/span&gt;

&lt;span class="c"&gt;# Then&lt;/span&gt;
ollama run &lt;span class="si"&gt;$(&lt;/span&gt;bestllm&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Caveat: Ollama model names don't always match HuggingFace repo IDs, so you'll usually want a small mapping step in the middle. Profiles available: &lt;code&gt;general&lt;/code&gt;, &lt;code&gt;coding&lt;/code&gt;, &lt;code&gt;vision&lt;/code&gt;, &lt;code&gt;math&lt;/code&gt; — &lt;code&gt;--profile coding --top 1 --json | jq -r '.models[0].model_id'&lt;/code&gt; gets you the best coding model in one line.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample top picks across hardware
&lt;/h2&gt;

&lt;p&gt;This is from the README and reflects a 2026-05 snapshot — your results will track the live HuggingFace data, but it's a useful directional read:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Hardware&lt;/th&gt;
&lt;th&gt;VRAM&lt;/th&gt;
&lt;th&gt;Top pick&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RTX 5090&lt;/td&gt;
&lt;td&gt;32 GB&lt;/td&gt;
&lt;td&gt;Qwen3.6‑27B · Q6_K · score 94.7&lt;/td&gt;
&lt;td&gt;~40 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4090 / 3090&lt;/td&gt;
&lt;td&gt;24 GB&lt;/td&gt;
&lt;td&gt;Qwen3.6‑27B · Q5_K_M · score 92.8&lt;/td&gt;
&lt;td&gt;~27 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RTX 4060&lt;/td&gt;
&lt;td&gt;8 GB&lt;/td&gt;
&lt;td&gt;Qwen3‑14B · Q3_K_M · score 71.0&lt;/td&gt;
&lt;td&gt;~22 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apple M3 Max&lt;/td&gt;
&lt;td&gt;36 GB&lt;/td&gt;
&lt;td&gt;Qwen3.6‑27B · Q5_K_M · score 89.4&lt;/td&gt;
&lt;td&gt;~9 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU only&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;gpt-oss-20b (MoE) · Q4_K_M · score 45.2&lt;/td&gt;
&lt;td&gt;~6 t/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Two readings: Qwen3.6‑27B is doing serious work in the consumer-GPU bracket right now — the smarter quant at smaller params keeps winning over bigger-but-older alternatives. And on CPU, a well-chosen MoE (gpt-oss-20b) wins because only active params load into the bandwidth-bound speed model.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the community is saying
&lt;/h2&gt;

&lt;p&gt;The Show HN thread (144 points) and &lt;code&gt;r/LocalLLaMA&lt;/code&gt; discussion give you a fair read of where it works and where the rough edges still are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The good&lt;/strong&gt; — most upvoted comments call out the recency-awareness and the runtime fit modeling specifically. The "biggest that fits" failure mode is the single most common complaint about VRAM calculators, and people noticed &lt;code&gt;whichllm&lt;/code&gt; actually addresses it. The &lt;code&gt;--gpu "RTX 5090"&lt;/code&gt; simulation got flagged repeatedly as the killer feature for pre-purchase planning — finally a way to answer "is upgrading worth it for &lt;em&gt;this&lt;/em&gt; model?" without buying first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The reality check&lt;/strong&gt; — one r/LocalLLaMA thread asks "How accurate can whichllm be?" after a WSL user got reasonable picks but incorrect RAM and disk numbers. That's an honest limitation of running detection inside WSL where the namespace gives you Linux-side counters rather than Windows host counters. Recent releases widened the Windows detector (WMI + registry fields), but on WSL specifically, double-check &lt;code&gt;whichllm hardware&lt;/code&gt; against your actual host before trusting speed estimates.&lt;/p&gt;

&lt;p&gt;The other recurring note: surprise at gpt-oss-20b appearing high on CPU-only lists. That's working as intended — CPU-only triggers a ×0.50 runtime-fit penalty, but MoE active-param speed modeling rescues a 20B model with ~3B active back into the "actually usable" zone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;p&gt;A few things to know before you trust the top row blindly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Benchmarks are not your workload.&lt;/strong&gt; LiveBench, Aider, and Arena ELO are great signals, but they're not your domain. If you have a private eval set, run the top three picks against it before committing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed estimates are planning ranges.&lt;/strong&gt; Backend (llama.cpp build, CUDA version), context length, batching, and prompt shape all move real numbers. The &lt;code&gt;~&lt;/code&gt; and &lt;code&gt;?&lt;/code&gt; markers tell you when to be skeptical.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection on WSL/Windows is the edge case.&lt;/strong&gt; Expect minor inaccuracies on RAM/disk if you're running through WSL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoE rankings assume active-param routing.&lt;/strong&gt; If your inference stack doesn't fully exploit MoE routing, the speed estimate will be optimistic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark snapshots can drift.&lt;/strong&gt; Live sources fall back to curated frozen snapshots. The snapshot date is printed beneath the ranking — but only if you look at it.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How it compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it answers&lt;/th&gt;
&lt;th&gt;Recency-aware?&lt;/th&gt;
&lt;th&gt;Runs the model for you?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;whichllm&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Best model for my hardware by &lt;em&gt;benchmark&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;Yes (lineage-demoted frozen scores)&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;whichllm run&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VRAM calculators&lt;/td&gt;
&lt;td&gt;Will this model &lt;em&gt;fit&lt;/em&gt;
&lt;/td&gt;
&lt;td&gt;No (size-only)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HuggingFace leaderboards&lt;/td&gt;
&lt;td&gt;Best model overall&lt;/td&gt;
&lt;td&gt;Partially&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama library list&lt;/td&gt;
&lt;td&gt;Popular models by downloads&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes (&lt;code&gt;ollama run&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LM Studio model browser&lt;/td&gt;
&lt;td&gt;GUI-curated picks&lt;/td&gt;
&lt;td&gt;Editor-curated&lt;/td&gt;
&lt;td&gt;Yes (GUI)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The unique seat is: &lt;em&gt;evidence-based ranking + runtime fit + one-command execute&lt;/em&gt;. Each other tool gets one or two of those; nothing else seems to do all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is whichllm free?
&lt;/h3&gt;

&lt;p&gt;Yes — MIT license. You only pay for the model weights (HuggingFace downloads) and inference costs if you &lt;code&gt;whichllm run&lt;/code&gt;. Sponsorships exist but the project will stay open-source either way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it support AMD GPUs and Apple Silicon?
&lt;/h3&gt;

&lt;p&gt;Yes. AMD is detected via &lt;code&gt;rocm-smi&lt;/code&gt; with &lt;code&gt;lspci&lt;/code&gt; fallback on Linux. Apple Silicon is detected via &lt;code&gt;system_profiler&lt;/code&gt; on macOS. Both restrict the candidate set to GGUF for runtime stability. Linux + NVIDIA gets the widest support, including AWQ / GPTQ / FP16 / BF16.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is this different from a VRAM calculator?
&lt;/h3&gt;

&lt;p&gt;A VRAM calculator answers &lt;em&gt;can this model fit&lt;/em&gt;. &lt;code&gt;whichllm&lt;/code&gt; answers &lt;em&gt;which model that fits will actually perform best&lt;/em&gt;, using merged real benchmarks plus runtime-fit and speed scaling. The README example — Qwen3.6‑27B ranked above Qwen3‑32B on the same RTX 4090 — is the canonical demo of the difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use it offline?
&lt;/h3&gt;

&lt;p&gt;Partially. Both the model and benchmark caches live under &lt;code&gt;~/.cache/whichllm/&lt;/code&gt; (6h TTL for models, 24h for benchmarks). Curated frozen fallbacks ship with the package for offline / rate-limited use, so a cold offline run still gives you something — just don't trust the recency markers on a stale cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does GPU simulation work?
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;whichllm --gpu "RTX 5090"&lt;/code&gt; builds a synthetic &lt;code&gt;GPUInfo&lt;/code&gt; from a curated table of memory bandwidth, VRAM size, and compute capability. The rest of the pipeline (VRAM fit, speed estimate, runtime fit factor) then runs as if that card were present. It's a planning tool, not a guarantee — actual cards depend on cooling, PCIe gen, and your specific motherboard.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will it pick a model I already have downloaded?
&lt;/h3&gt;

&lt;p&gt;Today, ranking is over HuggingFace candidates, not your local cache. If you want "the best of what I already have," pipe &lt;code&gt;whichllm --json&lt;/code&gt; through &lt;code&gt;jq&lt;/code&gt; and filter against your local model directory. A first-class "local-only" mode is a natural feature request — open an issue if you want it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Local-LLM model selection has been a vibes-and-anecdotes problem for two years. &lt;code&gt;whichllm&lt;/code&gt; is the first tool I've used that turns it into a defensible, auditable answer: "this is the top model for your machine, here's its merged benchmark score, here's the evidence grade, here's the speed range, and here's one command to actually run it."&lt;/p&gt;

&lt;p&gt;It won't replace running your own eval on your own workload. Nothing should. But for the 90% of "which Qwen / Llama / Mistral should I download tonight?" decisions, &lt;code&gt;uvx whichllm@latest&lt;/code&gt; is now the right first move.&lt;/p&gt;

&lt;p&gt;Install: &lt;code&gt;uv tool install whichllm&lt;/code&gt;. Star and report your top pick at &lt;a href="https://github.com/Andyyyy64/whichllm" rel="noopener noreferrer"&gt;github.com/Andyyyy64/whichllm&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>whichllm</category>
      <category>localllm</category>
      <category>ollama</category>
      <category>llamacpp</category>
    </item>
    <item>
      <title>last30days Review: AI Agent That Searches Past Google</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Thu, 11 Jun 2026 10:09:52 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/last30days-review-ai-agent-that-searches-past-google-4l7p</link>
      <guid>https://dev.to/andrew-ooo/last30days-review-ai-agent-that-searches-past-google-4l7p</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/last30days-skill-ai-agent-social-research-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;/last30days&lt;/code&gt;&lt;/strong&gt; is an AI agent skill that researches any topic across &lt;strong&gt;Reddit, X, YouTube, TikTok, Hacker News, Polymarket, GitHub, and the open web&lt;/strong&gt; in parallel — then has a judge agent synthesize what real people actually engaged with in the last 30 days. It's the &lt;strong&gt;#1 trending repo on GitHub this week&lt;/strong&gt;: &lt;strong&gt;39,455 stars&lt;/strong&gt; total and &lt;strong&gt;11,732 new stars in the past seven days&lt;/strong&gt;, with &lt;strong&gt;5 of today's top 10 trending repos&lt;/strong&gt; being Claude skills (per X user @yieldhunter95).&lt;/p&gt;

&lt;p&gt;The premise is sharp: Google aggregates editors; ChatGPT has a Reddit deal but can't see X; Gemini has YouTube but not Reddit; Claude has none of them natively. Every platform is a walled garden — but you can bring your own keys and let one agent search all of them at once, score results by upvotes/likes/transcripts/prediction-market money, and merge it into a single brief.&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;39,455 GitHub stars&lt;/strong&gt; (11,732 this week, ranked #1 trending repo)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built by Mason Van Horn&lt;/strong&gt; (&lt;code&gt;/mvanhorn&lt;/code&gt;), with active contributions from &lt;code&gt;/claude&lt;/code&gt;, &lt;code&gt;/tmchow&lt;/code&gt;, &lt;code&gt;/j-sperling&lt;/code&gt;, &lt;code&gt;/dinakars777&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works as a Claude Code plugin, Codex/Cursor/Copilot/Gemini skill, OpenClaw skill, or claude.ai web skill&lt;/strong&gt; — same SKILL.md spec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-config defaults&lt;/strong&gt; — Reddit, HN, Polymarket, and GitHub work immediately with no keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional API keys unlock X, YouTube, TikTok&lt;/strong&gt; via setup wizard (~30 seconds)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v3 pipeline&lt;/strong&gt; has a Python pre-research "brain" that resolves people→handles, products→founders, names→GitHub profiles &lt;strong&gt;before&lt;/strong&gt; firing a single API call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML brief export&lt;/strong&gt; — &lt;code&gt;--emit=html&lt;/code&gt; saves a self-contained, dark-mode, print-friendly file&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0&lt;/strong&gt; — install with &lt;code&gt;/plugin install last30days&lt;/code&gt; or &lt;code&gt;npx skills add mvanhorn/last30days-skill -g&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've been frustrated that LLMs answer with stale 2023 blog summaries when the real answer is in a 1,500-upvote Reddit thread from last week, this is the skill you've been waiting for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/mvanhorn/last30days-skill" rel="noopener noreferrer"&gt;mvanhorn/last30days-skill&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;39,455 (11,732 this week, #1 trending)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintainer&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Mason Van Horn (&lt;code&gt;/mvanhorn&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Install (Claude Code)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/plugin marketplace add mvanhorn/last30days-skill&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Install (anywhere)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;npx skills add mvanhorn/last30days-skill -g&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sources&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reddit, X, YouTube, TikTok, IG Reels, HN, Polymarket, GitHub, Digg, Threads, Pinterest, Bluesky, Perplexity, web&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Zero-config sources&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reddit, HN, Polymarket, GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Engine&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;v3 — pre-research brain + parallel pipelines + judge agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trendshift rank&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://trendshift.io/repositories/21997" rel="noopener noreferrer"&gt;#21997 trending&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why This Matters: Searching People, Not Editors
&lt;/h2&gt;

&lt;p&gt;Here's the thesis the README puts bluntly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A Reddit thread with 1,500 upvotes is a stronger signal than a blog post nobody read. A TikTok with 3.6M views tells you more about what's culturally relevant than a press release. Polymarket odds backed by $66K in volume are harder to argue with than a pundit's guess.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every other AI search tool ranks by &lt;strong&gt;SEO relevance&lt;/strong&gt;. &lt;code&gt;/last30days&lt;/code&gt; ranks by &lt;strong&gt;social relevance&lt;/strong&gt; — upvotes, likes, views, transcripts, prediction-market dollars. The unlock isn't a better search engine. It's bridging a dozen disconnected walled gardens with a single agent that has your API keys.&lt;/p&gt;

&lt;p&gt;The README's canonical example: you Google someone before a meeting and get their 2023 LinkedIn. You &lt;code&gt;/last30days&lt;/code&gt; them and you get &lt;strong&gt;what they're actually doing this month&lt;/strong&gt; — recent X posts, podcast transcripts they appeared on, the Reddit thread where 569 people argued about whether they're "a hero or insufferable," and the 23 PRs they merged at 85% merge rate. None of it was on Google.&lt;/p&gt;

&lt;p&gt;That's the difference between &lt;strong&gt;stale training data&lt;/strong&gt; and &lt;strong&gt;last-30-days truth&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  How v3 Actually Works
&lt;/h2&gt;

&lt;p&gt;The v3 pipeline (the current source of truth in &lt;a href="https://github.com/mvanhorn/last30days-skill/blob/main/skills/last30days/SKILL.md" rel="noopener noreferrer"&gt;&lt;code&gt;skills/last30days/SKILL.md&lt;/code&gt;&lt;/a&gt;) has four interesting innovations:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Pre-research brain resolves entities first
&lt;/h3&gt;

&lt;p&gt;Type &lt;code&gt;OpenClaw&lt;/code&gt; and the engine resolves &lt;code&gt;@steipete&lt;/code&gt; (Peter Steinberger, the creator), &lt;code&gt;r/openclaw&lt;/code&gt;, &lt;code&gt;r/ClaudeCode&lt;/code&gt;, and the right YouTube channels and TikTok hashtags — &lt;strong&gt;before any search runs&lt;/strong&gt;. It's a Python module built by &lt;code&gt;/j-sperling&lt;/code&gt; that does bidirectional resolution: person→company, product→founder, name→GitHub profile.&lt;/p&gt;

&lt;p&gt;The old v2 engine searched keywords. v3 understands your topic first, then searches the right people and communities. That's why v3 finds content v2 buried.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. A "fun judge" alongside the relevance judge
&lt;/h3&gt;

&lt;p&gt;Reddit and X people are funny. The old engine scored only for relevance and buried the best stuff. v3 has a second judge that scores every result for &lt;strong&gt;humor, wit, and virality&lt;/strong&gt; alongside relevance. Every brief now ends with a "Best Takes" section — the cleverest one-liners, the most viral quotes.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Cluster merging across sources
&lt;/h3&gt;

&lt;p&gt;When the same story appears on Reddit, X, and YouTube, v3 merges them into one cluster instead of showing three separate items. Entity-based overlap detection catches matches even when the titles use different words.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Parallel comparison pipelines
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;/last30days OpenAI --competitors&lt;/code&gt; tells the hosting reasoning model to discover the top 2 peers via WebSearch (Anthropic, xAI), then fan out three full pipelines in parallel and merge into a 3-way comparison. The old "X vs Y" used to be three serial passes (12+ minutes). v3 runs one pass with entity-aware subqueries for both sides simultaneously — same depth, &lt;strong&gt;~3 minutes&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install in 60 Seconds
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code (recommended — auto-updates via marketplace):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/plugin marketplace add mvanhorn/last30days-skill
/plugin &lt;span class="nb"&gt;install &lt;/span&gt;last30days
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Codex, Cursor, Copilot, Gemini CLI, or any of the &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;50+ Agent Skills hosts&lt;/a&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add mvanhorn/last30days-skill &lt;span class="nt"&gt;-g&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(&lt;code&gt;-g&lt;/code&gt; installs globally for your user, available across all projects. Drop it to scope per-project.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then just ask:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/last30days Peter Steinberger
/last30days Nano Banana Pro prompting
/last30days OpenClaw vs Hermes vs Paperclip
/last30days Universal Epic Universe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Reddit, HN, Polymarket, and GitHub work immediately. Run it once and the setup wizard unlocks X, YouTube, TikTok in another 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real Code: Calling It Like an MCP Tool
&lt;/h2&gt;

&lt;p&gt;For non-Claude-Code clients, the engine ships an MCP server. Wire it in via your standard MCP config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"last30days"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@last30days/mcp"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"OPENAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"XAI_API_KEY"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The optional &lt;code&gt;.env&lt;/code&gt; config (referenced in the &lt;a href="https://skillkit.io/skills/claude-code/last30days" rel="noopener noreferrer"&gt;Skillkit walkthrough&lt;/a&gt;) lives at &lt;code&gt;~/.config/last30days/.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.config/last30days
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/.config/last30days/.env &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;ENVEOF&lt;/span&gt;&lt;span class="sh"&gt;'
# Both keys are optional - skill works with WebSearch fallback
# For Reddit/web research (uses OpenAI's web_search tool)
OPENAI_API_KEY=
# For X/Twitter research (uses xAI's x_search tool)
XAI_API_KEY=
&lt;/span&gt;&lt;span class="no"&gt;ENVEOF
&lt;/span&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;600 ~/.config/last30days/.env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script auto-detects what's configured and reports mode at runtime:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Mode: both&lt;/code&gt;&lt;/strong&gt; — Reddit + X + WebSearch supplementing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Mode: reddit-only&lt;/code&gt;&lt;/strong&gt; or &lt;strong&gt;&lt;code&gt;Mode: x-only&lt;/code&gt;&lt;/strong&gt; — partial keys configured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;Mode: web-only&lt;/code&gt;&lt;/strong&gt; — no API keys, Claude does all research via WebSearch fallback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fallback works. You don't need keys to get value. But you'll get richer results with them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exporting a shareable HTML brief
&lt;/h3&gt;

&lt;p&gt;This is the killer feature for sharing with humans:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/last30days OpenClaw --emit=html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in plain language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/last30days Cursor IDE for slack
/last30days Anthropic earnings export as html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skill emits the synthesis in chat &lt;strong&gt;and&lt;/strong&gt; saves a self-contained brief to &lt;code&gt;${LAST30DAYS_MEMORY_DIR}/{topic}-brief.html&lt;/code&gt; (defaults to &lt;code&gt;~/Documents/Last30Days/&lt;/code&gt;). Dark mode, print-friendly, no JavaScript, system-font fallbacks behind Inter and JetBrains Mono. Drop it in Slack, email, or Notion — no raw markdown leaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Source-by-Source: What Each Platform Tells You
&lt;/h2&gt;

&lt;p&gt;The README's table is the clearest part of the docs. The signals are deliberately different:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;The signal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reddit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The unfiltered take. Top comments with upvote counts, free via public JSON. The real opinions Google buries.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;X / Twitter&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The hot take, the expert thread, the breaking reaction. First to know, first to argue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YouTube&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The 45-minute deep dive. Full transcripts searched for the 5 quotable sentences that matter.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TikTok&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The creator reaching 3.6M people with a take you'll never find on Google.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hacker News&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The developer consensus. Where technical people actually argue.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Polymarket&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Not opinions. Odds. Backed by real money.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;For people: PR velocity, top repos, release notes. For topics: issues and discussions.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Digg&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Curated story clusters from Digg's AI 1000 leaderboard (~1000 high-signal AI accounts on X), no X auth required.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Threads / Bluesky&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The post-Twitter text layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Perplexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Grounded web search with citations via Sonar Pro.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each source contributes a different &lt;em&gt;kind&lt;/em&gt; of signal. The synthesis ranks by what real people actually engaged with — not what an SEO team optimized for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Reactions
&lt;/h2&gt;

&lt;p&gt;This isn't hype manufactured by the maintainer. The signal is real:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;#1 on GitHub Trending this week.&lt;/strong&gt; 11,732 new stars in seven days. The README notes that "5 of the 10 trending repos on GitHub today are Claude tools" (via X user &lt;code&gt;@yieldhunter95&lt;/code&gt;), and &lt;code&gt;last30days-skill&lt;/code&gt; is the top of that list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trendshift listed it at &lt;a href="https://trendshift.io/repositories/20881" rel="noopener noreferrer"&gt;#21997&lt;/a&gt;&lt;/strong&gt; as a fast-rising AI repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;@itswilsoncharles&lt;/code&gt;:&lt;/strong&gt; &lt;em&gt;"You give it a topic, it scrapes Reddit, X, and the web for what people are actually talking about. Not old blog posts. Real conversations from the last 30 days."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From &lt;a href="https://www.howdoiuseai.com/blog/2026-02-06-how-to-supercharge-claude-code-with-the-last-30-da" rel="noopener noreferrer"&gt;How Do I Use AI&lt;/a&gt;:&lt;/strong&gt; &lt;em&gt;"Actual, opinionated, trend-aware context injected directly into your prompt."&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The contributor list itself is a signal:&lt;/strong&gt; &lt;code&gt;/claude&lt;/code&gt; is the second-largest contributor on the repo, alongside &lt;code&gt;/tmchow&lt;/code&gt; (the maintainer behind several other top Claude skills) and &lt;code&gt;/j-sperling&lt;/code&gt; (who built the v3 pre-research brain).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo is also visibly active — the v3 pipeline shipped recently, the README explicitly notes that &lt;a href="https://github.com/mvanhorn/last30days-skill/blob/main/skills/last30days/SKILL.md" rel="noopener noreferrer"&gt;&lt;code&gt;SKILL.md&lt;/code&gt;&lt;/a&gt; is the source of truth and gets updated faster than the README, and community contributors keep adding new sources (Truth Social, Xiaohongshu/RED are in the engine with more on the way).&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;I'd be doing nobody a favor if I painted this as a finished product. Things to know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You're trusting the judge agent.&lt;/strong&gt; The synthesis is an LLM ranking, and LLMs hallucinate. The brief is always a starting point — for high-stakes use cases, you click through the citations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Without API keys, you're in &lt;code&gt;web-only&lt;/code&gt; mode.&lt;/strong&gt; That still works (Claude falls back to WebSearch), but you lose X, TikTok, and richer YouTube transcript coverage. The full magic needs OpenAI + xAI keys (and your own usage budget).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reddit's public JSON is rate-limited&lt;/strong&gt; and occasionally degrades to anonymous quotas. The engine handles thin-evidence runs gracefully, but a brief from a single Reddit page is different from one with 23 threads behind it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30 days is a window, not a guarantee.&lt;/strong&gt; Some platforms (Polymarket, GitHub) effectively pull more recent activity; others (YouTube) may surface older videos if the transcript matches. Read it as "recent" rather than a strict cutoff.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's a skill, not a SaaS.&lt;/strong&gt; You run it in your agent harness. There's no hosted UI (yet). If your team doesn't already use Claude Code, Codex, Cursor, or OpenClaw, the install path is more steps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost adds up.&lt;/strong&gt; Each &lt;code&gt;/last30days&lt;/code&gt; run can fan out 10+ API calls across providers. The skill itself is free, but you pay your model + xAI/OpenAI tool-call costs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't dealbreakers. They're the trade-offs you make to get social-graph search at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparison: Where Does This Sit?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Coverage&lt;/th&gt;
&lt;th&gt;Search angle&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;/last30days&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reddit, X, YouTube, TikTok, HN, Polymarket, GitHub, +more&lt;/td&gt;
&lt;td&gt;Social engagement (upvotes/likes/$)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Perplexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web + indexed pages&lt;/td&gt;
&lt;td&gt;SEO relevance + citations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ChatGPT search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web + Reddit (via deal)&lt;/td&gt;
&lt;td&gt;SEO + Reddit consensus&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini Deep Research&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Web + YouTube&lt;/td&gt;
&lt;td&gt;SEO + video transcripts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The thing nobody else does: &lt;strong&gt;searching the social graph itself&lt;/strong&gt; with a judge that synthesizes across all of it. That's the moat.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is it free?
&lt;/h3&gt;

&lt;p&gt;Yes — Apache 2.0 license. You pay the LLM and (optionally) the OpenAI/xAI tool-call costs when you run it. Reddit/HN/Polymarket/GitHub work with zero keys.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need Claude Code?
&lt;/h3&gt;

&lt;p&gt;No. It works in any of the 50+ Agent Skills hosts via &lt;code&gt;npx skills add mvanhorn/last30days-skill -g&lt;/code&gt;. That includes Codex, Cursor, Copilot CLI, Gemini CLI, OpenClaw, and claude.ai web. Claude Code is the most polished install path because of the marketplace auto-update.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will it work with my own model?
&lt;/h3&gt;

&lt;p&gt;The skill is model-agnostic — it just needs an agent harness that can call WebSearch and (optionally) &lt;code&gt;web_search&lt;/code&gt; / &lt;code&gt;x_search&lt;/code&gt; tools. The reasoning quality of your model affects the synthesis quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is the Reddit/X data legal to use?
&lt;/h3&gt;

&lt;p&gt;The skill uses public JSON endpoints (Reddit's free public API) and authenticated tool APIs (OpenAI's &lt;code&gt;web_search&lt;/code&gt;, xAI's &lt;code&gt;x_search&lt;/code&gt;). You're using each platform within its terms by using their official endpoints. It does not scrape behind logins.&lt;/p&gt;

&lt;h3&gt;
  
  
  How is this different from "just asking Claude with web search"?
&lt;/h3&gt;

&lt;p&gt;Three things: (1) &lt;strong&gt;parallel coverage&lt;/strong&gt; — Claude with web search only sees what its search provider indexes, which is mostly Google's index. &lt;code&gt;/last30days&lt;/code&gt; deliberately hits walled gardens (X, TikTok, etc.) that Google can't see. (2) &lt;strong&gt;Engagement-weighted ranking&lt;/strong&gt; — Claude treats a 5-upvote post the same as a 5,000-upvote post. The skill weights by engagement. (3) &lt;strong&gt;Pre-research entity resolution&lt;/strong&gt; — the v3 brain knows that "OpenClaw" → &lt;code&gt;@steipete&lt;/code&gt; → &lt;code&gt;r/ClaudeCode&lt;/code&gt; before the search runs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does it work for non-tech topics?
&lt;/h3&gt;

&lt;p&gt;Yes. The README's examples include Kanye West (Wireless Festival, Polymarket odds), Universal Epic Universe (wait times, refurbishment schedules), and pre-sales-call research. The same engine that surfaces a 569-upvote &lt;code&gt;r/ClaudeCode&lt;/code&gt; thread surfaces a 23-thread Reddit consensus about Genie+.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's &lt;code&gt;headroom learn&lt;/code&gt; and &lt;code&gt;--memory&lt;/code&gt;?
&lt;/h3&gt;

&lt;p&gt;Different repo — that's &lt;a href="https://github.com/chopratejas/headroom" rel="noopener noreferrer"&gt;chopratejas/headroom&lt;/a&gt;, the &lt;strong&gt;token compression&lt;/strong&gt; layer that often gets installed alongside agent skills. Worth a look if you're already running heavy multi-tool agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Should You Install It?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Install if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run Claude Code, Codex, Cursor, OpenClaw, or any skills-capable agent daily and you want &lt;strong&gt;current&lt;/strong&gt; signal in your context&lt;/li&gt;
&lt;li&gt;You do competitive research, customer discovery, pre-meeting prep, or trend analysis and Google keeps giving you 2023 blog posts&lt;/li&gt;
&lt;li&gt;You'd rather pay $0.10 in API calls for a 30-second synthesis than spend 45 minutes manually scrolling Reddit + X&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You only need editorial summaries — Perplexity or Gemini Deep Research will be lower friction&lt;/li&gt;
&lt;li&gt;You're in an environment that can't run local agents with API keys (some enterprise setups)&lt;/li&gt;
&lt;li&gt;Your topic of interest isn't on Reddit/X/HN/YouTube to begin with — the engine is only as good as the source data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my money — running tool research, founder profiles, and competitive comparisons every week — this is the most useful agent skill I've installed this quarter. The synthesis quality is real, the source diversity is real, and the v3 pipeline genuinely beats v2 by a wide margin.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/mvanhorn/last30days-skill" rel="noopener noreferrer"&gt;github.com/mvanhorn/last30days-skill&lt;/a&gt; — give it a star, then &lt;code&gt;/plugin install last30days&lt;/code&gt; and run it on your own name. You'll learn something.&lt;/p&gt;

</description>
      <category>last30days</category>
      <category>claudecode</category>
      <category>agentskills</category>
      <category>aiagents</category>
    </item>
    <item>
      <title>OpenCV 5 Review: New DNN Engine, LLMs, and 80% ONNX Coverage</title>
      <dc:creator>Andrew</dc:creator>
      <pubDate>Wed, 10 Jun 2026 10:12:33 +0000</pubDate>
      <link>https://dev.to/andrew-ooo/opencv-5-review-new-dnn-engine-llms-and-80-onnx-coverage-1jjp</link>
      <guid>https://dev.to/andrew-ooo/opencv-5-review-new-dnn-engine-llms-and-80-onnx-coverage-1jjp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Originally published on &lt;a href="https://andrew.ooo/posts/opencv-5-release-dnn-engine-llm-vlm-review/" rel="noopener noreferrer"&gt;andrew.ooo&lt;/a&gt;&lt;/strong&gt; — visit the original for any updates, code snippets that aged out, or follow-up posts.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;OpenCV 5.0&lt;/strong&gt; dropped on June 4, 2026 and the pip release followed on June 8 — and unlike most "major version" updates, this one earns the name. The library that's been the duct tape of computer vision for two decades just got a brand-new DNN engine, &lt;strong&gt;ONNX operator coverage jumped from ~22% to over 80%&lt;/strong&gt;, and there's now built-in support for running LLMs and Vision-Language Models inside &lt;code&gt;cv2&lt;/code&gt;. The Hacker News post hit &lt;strong&gt;763 points&lt;/strong&gt; in four days, and the response across &lt;code&gt;/r/computervision&lt;/code&gt; and &lt;code&gt;/r/MachineLearning&lt;/code&gt; has been the same: "wait, OpenCV finally fixed the part that was holding it back."&lt;/p&gt;

&lt;p&gt;Key facts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;86,000+ GitHub stars&lt;/strong&gt;, 1M+ daily installs — this is plumbing for production CV everywhere&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;New typed-graph DNN engine&lt;/strong&gt; with proper shape inference, constant folding, and operator fusion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80%+ ONNX coverage&lt;/strong&gt; (up from ~22% in 4.x) — modern models with dynamic shapes finally load&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM and VLM support built in&lt;/strong&gt; — you can run quantized language models from &lt;code&gt;cv2.dnn&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware paths&lt;/strong&gt; for Intel IPP (SSE/AVX), Arm KleidiCV, Qualcomm FastCV, and RISC-V RVV&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attention fusion&lt;/strong&gt; collapses transformer blocks (MatMul → Softmax → MatMul) into single fused ops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apache 2.0 license&lt;/strong&gt;, available as &lt;code&gt;pip install opencv-python==5.0.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native GPU in the DNN engine and a non-CPU HAL&lt;/strong&gt; are next on the roadmap&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you maintain a CV pipeline that uses OpenCV 4.x — or one that bypassed OpenCV's DNN module because it kept failing on modern models — OpenCV 5 is the upgrade you actually want to plan this quarter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Reference
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Release date&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;June 4, 2026 (pip: June 8, 2026)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/opencv/opencv/tree/5.x" rel="noopener noreferrer"&gt;opencv/opencv&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Release tag&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/opencv/opencv/releases/tag/5.0.0" rel="noopener noreferrer"&gt;5.0.0&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Install&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pip install opencv-python==5.0.0&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;86,000+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Daily installs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1M+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ONNX coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;80%+ (vs ~22% in 4.x)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DNN engines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;New (graph-based) + Classic (legacy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;HN front page&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;763 points&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.opencv.org" rel="noopener noreferrer"&gt;docs.opencv.org&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why This Release Actually Matters
&lt;/h2&gt;

&lt;p&gt;OpenCV has a weird position in the ML world. It's everywhere — robotics, embedded vision, industrial inspection, AR/VR, medical imaging — but the deep-learning side has felt a step behind for years. You'd export a PyTorch model to ONNX, point &lt;code&gt;cv2.dnn.readNetFromONNX&lt;/code&gt; at it, and half the time get a cryptic error about an operator OpenCV had never heard of. So production pipelines used OpenCV for I/O and preprocessing, then shipped inference off to ONNX Runtime or TensorRT — two libraries doing one library's job.&lt;/p&gt;

&lt;p&gt;OpenCV 5 is the team's answer. The headline numbers tell the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ONNX operator coverage: ~22% → 80%+&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A typed operation graph&lt;/strong&gt; replaced the flat-layer-list interpreter from 4.x&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic shapes, control flow (&lt;code&gt;If&lt;/code&gt;, &lt;code&gt;Loop&lt;/code&gt;), quantization (QDQ), attention fusion&lt;/strong&gt; — all now supported&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That last bullet quietly reshapes a lot of pipelines. The old engine treated a network as a flat list of layers; the new one reads the model as a graph, runs &lt;strong&gt;shape inference, constant folding, and operator fusion&lt;/strong&gt;, then emits an optimized execution plan. Critically, when it sees &lt;code&gt;MatMul → Softmax → MatMul&lt;/code&gt;, it recognizes transformer attention and collapses it into a single fused op. That's how OpenCV 5 ends up competitive with ONNX Runtime — by doing the same optimization passes a real inference engine does.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing OpenCV 5
&lt;/h2&gt;

&lt;p&gt;The pip release went live on June 8, 2026. One line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;opencv-python&lt;span class="o"&gt;==&lt;/span&gt;5.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the contrib modules (extra algorithms, freetype, ArUco extensions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;opencv-contrib-python&lt;span class="o"&gt;==&lt;/span&gt;5.0.0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're building from source for hardware-specific acceleration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/opencv/opencv.git
&lt;span class="nb"&gt;cd &lt;/span&gt;opencv &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git checkout 5.x
&lt;span class="nb"&gt;mkdir &lt;/span&gt;build &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;build
cmake &lt;span class="nt"&gt;-DWITH_IPP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON &lt;span class="nt"&gt;-DWITH_KLEIDICV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ON ..
make &lt;span class="nt"&gt;-j&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;nproc&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pip wheel ships with reasonable defaults for x86-64 (SSE/AVX) and Arm. For Qualcomm FastCV or RISC-V RVV optimization, you build from source with the appropriate flags.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Engines, One API
&lt;/h2&gt;

&lt;p&gt;OpenCV 5 ships with two DNN engines under the same &lt;code&gt;cv2.dnn&lt;/code&gt; API: the &lt;strong&gt;classic engine&lt;/strong&gt; (the 4.x interpreter, kept for backwards compatibility) and the &lt;strong&gt;new graph-based engine&lt;/strong&gt; (default for modern ONNX models with dynamic shapes, transformer attention, or recent operators). The library picks automatically when you load a model. Existing 4.x code keeps working; new models that previously failed now succeed; you migrate one model at a time. The &lt;strong&gt;native GPU path in the new engine&lt;/strong&gt; is on the roadmap for 5.1 — until then, GPU inference goes through the classic CUDA backend.&lt;/p&gt;

&lt;h2&gt;
  
  
  Loading a Modern ONNX Model
&lt;/h2&gt;

&lt;p&gt;The use case this release was built for:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Load a modern transformer-based detection model
&lt;/span&gt;&lt;span class="n"&gt;net&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dnn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readNetFromONNX&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yolo-v10.onnx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Preprocess: dynamic shapes now work
&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;imread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scene.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cv2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dnn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blobFromImage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;scalefactor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mf"&gt;255.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;swapRB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;crop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The new engine handles attention fusion automatically
&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;net&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forward&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference from 4.x is everything that didn't work before now does. YOLOv10, RT-DETR, modern transformer-based detectors that mix dynamic shapes with control flow — the 4.x engine choked on them. The 5.x engine eats them for breakfast.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLMs and VLMs Inside OpenCV
&lt;/h2&gt;

&lt;p&gt;This is the part that made HN sit up. OpenCV 5 ships with &lt;strong&gt;built-in support for running quantized LLMs and Vision-Language Models&lt;/strong&gt; through the DNN engine. The supported model list at launch includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quantized small LLMs&lt;/strong&gt; (Phi-3-mini, Qwen2.5 small variants, TinyLlama) for on-device language tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision-Language Models&lt;/strong&gt; including LLaVA-style models for image-text understanding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaMa-style inpainting models&lt;/strong&gt; with diffusion support for image restoration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why is this in OpenCV and not a separate library? Because so many CV pipelines now end with "describe this image" or "answer questions about this video frame," and shipping a second inference engine for that step is operational overhead. If OpenCV can do classical CV, deep-learning detection, and then a VLM caption pass in one process with one library, you've cut a layer out of your deployment.&lt;/p&gt;

&lt;p&gt;The honest framing: this isn't going to replace vLLM for serving Llama-405B. It's for &lt;strong&gt;embedded and edge VLM workloads&lt;/strong&gt; — a quantized model running on the same device that's already running OpenCV for camera capture and preprocessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Performance vs ONNX Runtime
&lt;/h2&gt;

&lt;p&gt;OpenCV's own &lt;a href="https://opencv.org/opencv-5/" rel="noopener noreferrer"&gt;release post&lt;/a&gt; and the &lt;a href="https://www.phoronix.com/news/OpenCV-5.0-Released" rel="noopener noreferrer"&gt;Phoronix coverage&lt;/a&gt; both highlight that OpenCV 5 is now competitive with Microsoft's ONNX Runtime on standard vision models. The team has tuned paths for:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Intel IPP&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SSE/AVX/AVX-512 optimized kernels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Arm KleidiCV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized for modern Cortex-A and server-class ARM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qualcomm FastCV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Snapdragon optimization paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RISC-V RVV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vector extension support — first major CV library with mature RVV&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The RISC-V support is genuinely interesting — most CV libraries still treat RISC-V as a curiosity. OpenCV 5 ships first-class RVV kernels, which matters for the wave of RISC-V edge devices shipping in 2026.&lt;/p&gt;

&lt;p&gt;The performance claim isn't "always faster than ONNX Runtime." It's "in the same ballpark on standard models, and you don't need to ship a second inference runtime."&lt;/p&gt;

&lt;h2&gt;
  
  
  Better Python Integration
&lt;/h2&gt;

&lt;p&gt;This isn't headline-grabbing but it's the change that makes day-to-day OpenCV less annoying. The 5.x bindings ship with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Named arguments instead of positional&lt;/strong&gt; — no more guessing whether &lt;code&gt;dst&lt;/code&gt; is the third or fourth parameter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proper type hints&lt;/strong&gt; for &lt;code&gt;mypy&lt;/code&gt; and IDE autocomplete&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0D and 1D tensor support&lt;/strong&gt; — finally compatible with how NumPy and PyTorch actually represent scalars&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native FP16 and BF16 dtypes&lt;/strong&gt; — no more workarounds for half-precision models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real logging&lt;/strong&gt; through Python's standard &lt;code&gt;logging&lt;/code&gt; module&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've spent any time fighting OpenCV's Python bindings on type errors or parameter order, that bullet list is the relief you've been waiting for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Modern Feature Matching and 3D Vision
&lt;/h2&gt;

&lt;p&gt;OpenCV 5 ships &lt;strong&gt;deep-learning-based feature matching&lt;/strong&gt; (LoFTR-style descriptors) in core, not just contrib. SIFT and ORB are still there, but for repetitive textures, low overlap, or large viewpoint changes, the new neural matchers are a real upgrade — changing the default SfM/SLAM recommendation from "SIFT and hope" to "neural matcher and trust."&lt;/p&gt;

&lt;p&gt;The 3D stack also got real attention: proper &lt;strong&gt;ChArUco board support&lt;/strong&gt;, &lt;strong&gt;multi-camera calibration&lt;/strong&gt; out of the box, and a cleaned-up &lt;code&gt;cv::viz&lt;/code&gt; visualization module. Several generations of legacy calibration code from 4.x were retired in favor of the modern paths.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Community Is Saying
&lt;/h2&gt;

&lt;p&gt;The Hacker News thread (763 points, 137 comments) is overwhelmingly positive, with a few honest concerns surfaced repeatedly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;"The ONNX coverage is the real story."&lt;/strong&gt; This is the most upvoted reaction across HN, Reddit, and Twitter. The 22% → 80%+ jump is what unblocks the modern-model pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Wait, you can run LLMs in OpenCV now?"&lt;/strong&gt; Surprised reactions to the LLM/VLM support. Most commenters see this as a good fit for embedded scenarios, not a serving-stack replacement.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"What about CUDA?"&lt;/strong&gt; The native GPU path in the new engine is on the roadmap but not in 5.0. People with GPU production pipelines are waiting for 5.1.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Build complexity."&lt;/strong&gt; OpenCV's CMake build has always been a beast. The good news: pip wheels for 5.0 cover the common cases. The bad news: if you want Qualcomm FastCV or RISC-V RVV tuned builds, source builds are still source builds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Documentation finally readable."&lt;/strong&gt; The 5.x docs are a noticeable step up from the 4.x docs, which were comprehensive but daunting.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Honest Limitations
&lt;/h2&gt;

&lt;p&gt;OpenCV 5 is great, but it isn't magic. The real trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No native GPU in the new DNN engine yet.&lt;/strong&gt; The 5.x graph engine runs on CPU. GPU users still go through the classic CUDA backend (which works fine, but doesn't get the new graph optimizations).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;80% ONNX coverage means 20% of operators still aren't supported.&lt;/strong&gt; The list mostly covers exotic or recently-added operators. Most production models load cleanly, but always test before assuming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM/VLM support is for small quantized models.&lt;/strong&gt; This is a CV library, not a model server. Don't plan to run Llama-405B inside &lt;code&gt;cv2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Migration from 4.x is mostly painless, but not zero.&lt;/strong&gt; A handful of deprecated APIs were retired. Most code "just works," but if you used legacy C API functions, you'll need to update.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Some contrib modules lag the core release.&lt;/strong&gt; Expect the &lt;code&gt;opencv-contrib-python&lt;/code&gt; wheel to catch up over the next few weeks. If you depend on a specific contrib module, check before upgrading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build complexity for hardware-specific paths.&lt;/strong&gt; The pip wheel is great for portable code. For Qualcomm FastCV or RISC-V RVV optimized builds, source builds with specific CMake flags are still required.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Should You Upgrade?
&lt;/h2&gt;

&lt;p&gt;Three rough buckets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pipelines that use OpenCV for image I/O and classical CV only:&lt;/strong&gt; Upgrade. Faster core, cleaner API, better Python integration. Low risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipelines that use OpenCV's DNN module for modern models:&lt;/strong&gt; Upgrade ASAP. This is the release you've been waiting for. Models that failed in 4.x will load in 5.x.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pipelines that bypass OpenCV's DNN for ONNX Runtime or TensorRT:&lt;/strong&gt; Reassess. If your reason for bypassing was 4.x's ONNX coverage, that reason is largely gone. You might still want a dedicated inference engine for GPU performance, but for CPU and edge workloads, OpenCV 5 closes the gap.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q: Will my OpenCV 4.x code break?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Mostly no. The team kept backward compatibility for almost everything in &lt;code&gt;cv2&lt;/code&gt;. The deprecated legacy C API was retired, and a handful of obscure functions were renamed. The classic DNN engine is still available, so 4.x DNN code keeps working. The pragmatic upgrade plan: install 5.0 in a fresh venv, run your test suite, fix what breaks (probably nothing major).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Is the new DNN engine faster than the old one?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: For models that worked in 4.x, slightly faster on average because of operator fusion. For models that didn't work in 4.x (modern transformers, dynamic shapes), infinitely faster because they now run at all. The headline benchmark to compare against ONNX Runtime is competitive — OpenCV 5's DNN engine is in the same performance class for CPU inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: What's the LLM/VLM support actually for?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Embedded and edge use cases where a CV pipeline ends with "describe this image" or "answer a question about this frame." Small quantized models (Phi-3-mini, TinyLlama, LLaVA-Phi) running on the same device as your camera pipeline. Not for production LLM serving — use vLLM or TGI for that.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: When will native GPU support land in the new engine?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: The OpenCV team has flagged it as the next major work item, expected in the 5.1 timeline. Until then, GPU inference goes through the classic engine's CUDA backend, which works but doesn't get the new graph optimizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Does OpenCV 5 work on Apple Silicon?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Yes. The pip wheel ships ARM64 binaries with NEON optimizations. There's no Metal-specific acceleration yet (it's CPU-only on macOS), but performance on M-series chips is solid via the Arm-optimized paths.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: How does OpenCV 5 compare to ONNX Runtime for CPU inference?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: For standard vision models on x86 and ARM CPUs, OpenCV 5 is now in the same performance ballpark as ONNX Runtime. The trade-off: ONNX Runtime is a dedicated inference engine with broader operator support and more aggressive optimization. OpenCV 5 is a CV library that happens to ship a competitive inference engine. If you're doing image I/O, preprocessing, and inference in one pipeline, OpenCV 5 saves you a dependency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q: Can I use OpenCV 5 commercially?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A: Yes — Apache 2.0 license, commercial use explicitly permitted. The patent grant in Apache 2.0 also protects you from algorithm patent claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;OpenCV 5.0 is the most consequential release in OpenCV history in a long time, and the timing is right. The deep-learning side of the library finally caught up with how modern CV pipelines are built — graph-based execution, broad ONNX coverage, transformer-friendly fusions, hardware-tuned kernels across Intel, Arm, Qualcomm, and RISC-V. The LLM/VLM support is a forward-looking addition that makes sense for edge workloads even if it won't displace dedicated serving stacks.&lt;/p&gt;

&lt;p&gt;For most teams, the upgrade calculus is simple: if you use OpenCV's DNN module, upgrade as soon as you can validate against your test suite. If you bypassed it for ONNX Runtime, reassess — the gap that drove you away just narrowed substantially.&lt;/p&gt;

&lt;p&gt;Install with &lt;code&gt;pip install opencv-python==5.0.0&lt;/code&gt; and read the &lt;a href="https://opencv.org/opencv-5/" rel="noopener noreferrer"&gt;release notes&lt;/a&gt; before you migrate.&lt;/p&gt;

</description>
      <category>opencv</category>
      <category>opencv5</category>
      <category>computervision</category>
      <category>dnn</category>
    </item>
  </channel>
</rss>
