<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kendrick B. Jung</title>
    <description>The latest articles on DEV Community by Kendrick B. Jung (@sonim1).</description>
    <link>https://dev.to/sonim1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3772700%2Fa93f9bcb-42a5-4678-a9e8-b26f841ff3f6.png</url>
      <title>DEV Community: Kendrick B. Jung</title>
      <link>https://dev.to/sonim1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sonim1"/>
    <language>en</language>
    <item>
      <title>Token Saving, and Caveman</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Tue, 26 May 2026 15:32:20 +0000</pubDate>
      <link>https://dev.to/sonim1/token-saving-and-caveman-e1f</link>
      <guid>https://dev.to/sonim1/token-saving-and-caveman-e1f</guid>
      <description>&lt;h1&gt;
  
  
  Token Saving, and Caveman
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Caveman is getting a lot of hype these days. Going by blog posts and introductions, I first thought it compressed text down to the level of primitive “ooga booga” language. After using it for a few days, I found that was not really the case. To help clear up that misunderstanding, I wanted to briefly write about the history of earlier token-compression attempts and how Caveman fits into the current landscape.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Brief History of Saving Tokens
&lt;/h2&gt;

&lt;p&gt;Token saving, token compression. Anyone who worked on AI engineering three or four years ago probably spent a lot of time thinking about this. But as token generation became cheaper and more efficient, it stopped being a major concern for a while. Now, as automation keeps accelerating after harness engineering, token usage is rising again, and people are becoming interested in saving tokens once more. That loop is what made this topic interesting enough for me to write about.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtf3kscoajry846d5g56.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqtf3kscoajry846d5g56.webp" alt="A caveman counting pebble tokens and feeding them into a primitive token-processing device" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back in the GPT-3.5 era, and even earlier when people were using text-davinci, token optimization was essential because generation was slow and costs could skyrocket as token counts grew. text-davinci-003 cost $0.02 per 1K tokens, and only when GPT-3.5-turbo arrived at $0.002, ten times cheaper, did consumer applications really start to become feasible. At the time, AI features were being added publicly to company services, so we were obsessed with reducing tokens. If free users generated outputs without limits, the bill could quickly become impossible to manage.&lt;/p&gt;
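
&lt;p&gt;To make the price gap concrete, here is a rough back-of-the-envelope sketch. The per-1K prices are the ones quoted above; the workload (10,000 requests a day at 1,500 tokens each) and the function name are made up purely for illustration.&lt;/p&gt;

```python
# Prices in USD per 1K tokens, as quoted in the text above.
PRICE_PER_1K = {
    "text-davinci-003": 0.02,
    "gpt-3.5-turbo": 0.002,
}


def daily_cost(model: str, requests_per_day: int, tokens_per_request: int) -> float:
    """Return the estimated daily bill in USD for a flat per-token price."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1000 * PRICE_PER_1K[model]


if __name__ == "__main__":
    # Hypothetical workload: these two numbers are assumptions, not real data.
    for model in PRICE_PER_1K:
        print(model, round(daily_cost(model, 10_000, 1_500), 2))
```

&lt;p&gt;With these assumed numbers, the older model comes to about $300 a day and the newer one to about $30, which is the difference between a feature you ration and one you can ship.&lt;/p&gt;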

&lt;p&gt;Context windows were not comparable to what we have today either. GPT-3 had 2,048 tokens, while text-davinci-003 and GPT-3.5-turbo had context windows of only about 4K tokens. Today we talk about 200K and 1M token contexts, but back then it was part of the job to keep input and output combined under roughly 4K.&lt;/p&gt;
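
&lt;p&gt;The budgeting this implied can be sketched in a few lines. The 4,096 limit is a round stand-in for "roughly 4K", and the function name and example numbers are hypothetical.&lt;/p&gt;

```python
CONTEXT_LIMIT = 4096  # assumed round figure for a "roughly 4K" window


def max_output_tokens(prompt_tokens: int, limit: int = CONTEXT_LIMIT) -> int:
    """Return how many tokens are left for the reply once the prompt is counted.

    Input and output share one window, so a longer prompt leaves less room
    for the answer. A result of 0 means the prompt alone fills the window.
    """
    return max(limit - prompt_tokens, 0)


if __name__ == "__main__":
    print(max_output_tokens(3000))  # 1096 tokens left for the reply
    print(max_output_tokens(5000))  # 0: the prompt already exceeds the window
```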

&lt;p&gt;It is also hard to imagine now, but token generation was genuinely slow. These days results appear almost sentence by sentence, or even page by page, but back then if you watched the token stream, you could follow each token being generated one by one with your eyes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Earlier Attempts at Saving Tokens
&lt;/h2&gt;

&lt;p&gt;In this section, I will talk about the problems and solutions I encountered at my previous company, and how we tried to save tokens at the time. There are many ways to reduce tokens; what follows are the ones that worked best for us.&lt;/p&gt;

&lt;p&gt;The first priority was changing the format. By format, I mean things like JSON or XML/HTML. Markdown is common now, but back then many people used JSON or XML directly for input and output. The problem is that those formats produce a lot of tokens after tokenization. For example, &lt;code&gt;&amp;lt;h1&amp;gt;Hello world&amp;lt;/h1&amp;gt;&lt;/code&gt; is 8 tokens, while &lt;code&gt;# Hello world&lt;/code&gt; is 3 (exact counts vary by tokenizer). That alone cuts the count by more than half. JSON and XML also need closing tags or structural wrappers, so the overhead is paid twice in many places. A recent comparative analysis has also found that XML can use 14% more tokens than JSON, while Markdown can save around 15% of tokens for an equivalent representation.&lt;/p&gt;

&lt;p&gt;So by using Markdown and one-token delimiters like &lt;code&gt;####&lt;/code&gt;, we were able to save a lot of tokens.&lt;/p&gt;
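
&lt;p&gt;The structural overhead is easy to see even without a tokenizer. The sketch below renders one made-up record as pretty-printed JSON and as Markdown-style text, then compares character counts. Character length is only a rough proxy for tokens, and the record, field names, and helper functions are all hypothetical.&lt;/p&gt;

```python
import json

# Hypothetical record; the field names and values are made up for illustration.
record = {"title": "Hello world", "tags": ["ai", "llm"], "body": "Short text."}


def as_json(data: dict) -> str:
    """Render the record as pretty-printed JSON, as a prompt might embed it."""
    return json.dumps(data, indent=2)


def as_markdown(data: dict) -> str:
    """Render the same record as a Markdown heading plus plain lines."""
    lines = [f"# {data['title']}", f"tags: {', '.join(data['tags'])}", data["body"]]
    return "\n".join(lines)


if __name__ == "__main__":
    # Character counts are only a rough proxy; real savings depend on the tokenizer.
    print(len(as_json(record)), len(as_markdown(record)))
```

&lt;p&gt;The Markdown version drops the quotes, braces, and brackets that JSON needs around every key and value, which is where most of the extra length comes from.&lt;/p&gt;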

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uswuo8dre44ntk3ftmi.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9uswuo8dre44ntk3ftmi.webp" alt="A primitive balance scale where heavy nested tags and bracket piles weigh far more than compact delimiter tablets" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This did not only reduce cost. Response speed improved as well. At the time, even an output of around 300 characters could commonly take 30 seconds. By shortening both input and output, response time could improve by anywhere from 30% to 70%. Since generation was slow enough that you could see tokens appear one by one in the stream, reducing output tokens directly translated into a noticeable speed improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Age of Detail
&lt;/h2&gt;

&lt;p&gt;As newer model versions became smarter, the trend started to change around mid-2023. Instead of making prompts extremely concise, people began adding more detailed information. Since the models had become smarter, giving them enough context led to better results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5gj3pys0y7o1878ph3v.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm5gj3pys0y7o1878ph3v.webp" alt="A small 4K stone window expanding into a wide knowledge mural, organized context map, and reference tablets inside a cave" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even today, Anthropic still recommends using XML tags with Claude. Anthropic's documentation explains that XML tags help structure complex prompts more clearly and separate instructions, context, examples, and input. In other words, clarity became more important even if it used a few more tokens, which also reflects how much token prices have fallen.&lt;/p&gt;

&lt;p&gt;The results improved a lot as well. In the past, even if you wrote a prompt for JSON output, errors were common without a separate output parser. These days, models produce correctly formatted output reliably enough that a separate parser is often unnecessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Back to Short and Concise
&lt;/h2&gt;

&lt;p&gt;Because output generation is now fast, even long responses arrive as fast as, or faster than, the short responses of the past. Paradoxically, that means there is more to read, which becomes a burden on the user. As token waste has become a topic again, tools like Caveman and RTK are getting attention. RTK compresses CLI output, and tools such as Codebase Memory MCP, context-mode, and Headroom have appeared in a similar context. Trends really do come back around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Token Compression Tools
&lt;/h2&gt;

&lt;p&gt;Here is a quick introduction to some of the token-compression tools that are getting attention again.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caveman
&lt;/h3&gt;

&lt;p&gt;Caveman is a skill that saves tokens by making LLM output shorter. It claims to reduce tokens by more than half. The core idea is simple: remove polite endings, extra explanations, greetings, and other non-essential parts of the output.&lt;/p&gt;

&lt;p&gt;So why is it called Caveman? Depending on the mode, it compresses the response down to only the necessary words, almost as if a caveman were speaking. It is a fun name.&lt;/p&gt;

&lt;p&gt;For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Normal Claude (69 tokens):
"The reason your React component is re-rendering is likely
because you're creating a new object reference on each render
cycle. When you pass an inline object as a prop..."

Caveman Claude (19 tokens):
"New object ref each render. Inline object prop = new ref
= re-render. Wrap in useMemo."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I found the concept interesting because it keeps the technical accuracy while making the language shorter.&lt;/p&gt;

&lt;p&gt;Recently, while watching Project Hail Mary, I noticed that Caveman mode feels a lot like Rocky's speech. "Question, question!" "Good. Good." It is short, but the meaning comes through. LLMs behave similarly when Caveman is enabled.&lt;/p&gt;

&lt;h4&gt;
  
  
  A Common Misunderstanding
&lt;/h4&gt;

&lt;p&gt;Blogs and YouTube videos often explain it as if Caveman literally transforms context into caveman language, so it is easy to misunderstand. But it supports multiple modes, and in the default mode it is closer to adding &lt;code&gt;be concise&lt;/code&gt; at the end of an old-style prompt. I suspect many videos and blog posts use the maximum compression mode to show a more dramatic change. So it is not as risky for quality as some people might worry, and sometimes the results are even better.&lt;/p&gt;

&lt;h4&gt;
  
  
  When Is It Useful?
&lt;/h4&gt;

&lt;p&gt;Because there is still some concern that it can affect results, I mostly use it in situations like these:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;When my weekly quota on a subscription model is running low&lt;/li&gt;
&lt;li&gt;When running long token-heavy workflows such as Goal, Ouroboros, or autopilot&lt;/li&gt;
&lt;li&gt;When I want responses to be concise so they are easier to review&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Caveman Compress
&lt;/h3&gt;

&lt;p&gt;There is also a feature called caveman compress. It efficiently compresses existing system prompts or skills. This is the kind of prompt-engineering work people used to do carefully by hand during the height of the prompt-engineering era. These days, models are so good that I can barely remember the last time I meticulously tuned every single prompt by hand.&lt;/p&gt;

&lt;h3&gt;
  
  
  RTK
&lt;/h3&gt;

&lt;p&gt;RTK, or Rust Token Killer, takes a different approach from Caveman. While Caveman shortens the LLM's output, RTK is a proxy that compresses CLI command results before they are passed to the LLM. For example, it removes unnecessary parts from outputs of commands like &lt;code&gt;git status&lt;/code&gt;, &lt;code&gt;ls&lt;/code&gt;, and &lt;code&gt;cargo test&lt;/code&gt;, reducing tokens by 60–90%. It can run automatically through Claude Code's Bash hook, rewriting commands into forms like &lt;code&gt;rtk git status&lt;/code&gt;. Using Caveman and RTK together means reducing tokens on both the input and output sides.&lt;/p&gt;
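
&lt;p&gt;The idea of filtering command output before it reaches the model can be illustrated with a toy filter. This is not RTK's implementation: the function name, the filtering rules, and the sample output are all assumptions made for the sketch.&lt;/p&gt;

```python
def compact_cli_output(raw: str) -> str:
    """Drop blank lines and parenthesized hint lines, and trim whitespace.

    A toy stand-in for the kind of filtering a CLI proxy might do;
    the rules here are illustrative assumptions only.
    """
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no information
        if stripped.startswith("(") and stripped.endswith(")"):
            continue  # usage hints aimed at humans
        kept.append(stripped)
    return "\n".join(kept)


if __name__ == "__main__":
    # Hand-written sample loosely shaped like `git status` output.
    sample = (
        "On branch main\n"
        "\n"
        "Changes not staged for commit:\n"
        '  (use "git add" to update what will be committed)\n'
        "\tmodified:   app.py\n"
        "\n"
    )
    print(compact_cli_output(sample))
```

&lt;p&gt;Even this crude rule set halves the line count of the sample while keeping the branch name and the changed file, which are the facts an agent usually needs.&lt;/p&gt;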

&lt;h3&gt;
  
  
  Caveman vs RTK
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Caveman&lt;/th&gt;
&lt;th&gt;RTK&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Compression target&lt;/td&gt;
&lt;td&gt;LLM output&lt;/td&gt;
&lt;td&gt;CLI command results (input)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;How it works&lt;/td&gt;
&lt;td&gt;Prompt skill (speech style change)&lt;/td&gt;
&lt;td&gt;CLI proxy (result filtering)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Savings&lt;/td&gt;
&lt;td&gt;About 50–75%&lt;/td&gt;
&lt;td&gt;About 60–90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Main effect&lt;/td&gt;
&lt;td&gt;Shorter responses&lt;/td&gt;
&lt;td&gt;Lower context usage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Best for&lt;/td&gt;
&lt;td&gt;General chat, code review&lt;/td&gt;
&lt;td&gt;Agent workflows (&lt;code&gt;git&lt;/code&gt;, &lt;code&gt;test&lt;/code&gt;, &lt;code&gt;build&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Toggle&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/caveman&lt;/code&gt; command&lt;/td&gt;
&lt;td&gt;Bash hook automatic behavior&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;They are not competitors; they are complementary. Used together, they can reduce tokens on both the input and output sides.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzsygclklej3gvgxgevj.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxzsygclklej3gvgxgevj.webp" alt="A caveman operating a token device that performs input filtering and output compression together" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;These days, if I use AI heavily for two or three days, 70% of my Codex Pro usage disappears. I still do not fully trust Gemini for my workflow, so I was considering whether I should upgrade back to Claude Max. Around then, Dave at work recommended Caveman, so I tried it.&lt;/p&gt;

&lt;p&gt;I was worried about quality, but it supports multiple modes. And a March 2026 paper even reports that brevity constraints improved accuracy by 26 percentage points on certain benchmarks, so writing shorter is not necessarily a loss.&lt;/p&gt;

&lt;p&gt;In the end, the effort we used to spend saving tokens one by one has become something you can now enable with a single skill install.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/JuliusBrussee/caveman" rel="noopener noreferrer"&gt;JuliusBrussee/caveman (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/rtk-ai/rtk" rel="noopener noreferrer"&gt;rtk-ai/rtk (GitHub)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.elastic.co/search-labs/blog/elastic-caveman-ai-token-reduction" rel="noopener noreferrer"&gt;Elastic-caveman for token reduction with Claude&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://andrew.ooo/posts/caveman-claude-code-skill-token-savings-review/" rel="noopener noreferrer"&gt;Caveman Review: The Claude Code Skill That Cuts 65% of Tokens&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://growwstacks.com/blog/does-caveman-ai-really-cut-claude-tokens" rel="noopener noreferrer"&gt;Does Caveman AI Really Cut 65% of Claude Tokens?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://computingforgeeks.com/reduce-claude-code-token-usage-tools/" rel="noopener noreferrer"&gt;Reduce Claude Code Tokens: 10 Tested Tools (RTK, Caveman, etc.)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@abdulgafoorabid/how-i-cut-claude-code-token-usage-by-90-with-4-tools-custom-hooks-and-enforcement-d3f8d2488cd6" rel="noopener noreferrer"&gt;How I Cut Claude Code Token Usage by 90%+ With 5 Tools&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.robertodiasduarte.com.br/en/markdown-vs-xml-em-prompts-para-llms-uma-analise-comparativa/" rel="noopener noreferrer"&gt;Markdown vs. XML in Prompts for LLMs: A Comparative Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.morphllm.com/prompt-compression" rel="noopener noreferrer"&gt;Prompt Compression: 8 Techniques to Reduce LLM Costs&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nlp</category>
      <category>tooling</category>
    </item>
    <item>
      <title>How Superpowers Forces Skill Execution</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Tue, 26 May 2026 15:30:59 +0000</pubDate>
      <link>https://dev.to/sonim1/how-superpowers-forces-skill-execution-3e6e</link>
      <guid>https://dev.to/sonim1/how-superpowers-forces-skill-execution-3e6e</guid>
      <description>&lt;h1&gt;
  
  
  How Superpowers Forces Skill Execution
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;This post is based on notes written about a month ago. Superpowers and each CLI's hook/skill behavior are changing quickly, so some implementation details may differ from the current versions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you use AI agents for long enough, you eventually notice that skills do not always activate as reliably as you expect. Superpowers feels different. The secret is its &lt;em&gt;SessionStart hook&lt;/em&gt;. At the beginning of a session, it forcibly injects the full &lt;code&gt;using-superpowers&lt;/code&gt; skill into context, so the model already knows "I need to use skills" before its first response. It looks like a simple plugin setup, but it is actually a fairly deliberate mechanism that can lift skill execution from 10% to 66%.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why don't skills activate automatically?
&lt;/h2&gt;

&lt;p&gt;Most people using AI tools today probably know what skills are. But after using them for a while, one thing becomes obvious: unless you call a command directly or explicitly name the skill, skills often do not behave the way you expect.&lt;/p&gt;

&lt;p&gt;What we want is for the model to read the title and description frontmatter, infer the right skill with 99% confidence, and run it automatically. Reality is different. Skills such as GSD or gstack often do not work properly unless you invoke them directly.&lt;/p&gt;

&lt;p&gt;When I recommended Superpowers to a coworker, they asked whether there were any commands or skills they should know, and I said, "Just use it naturally." Then I started wondering: if several skills are mixed together, the model might miss the one it needs. So why does Superpowers feel unusually reliable? I opened the code and took a look.&lt;/p&gt;

&lt;p&gt;What I found was a custom hook and a script that effectively force the agent to use the right skill from the skillset with much higher probability. The pattern seemed useful enough for other services too, so I decided to trace how Superpowers works from user input, to hook execution, to skill discovery, to final skill execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  How skill systems work by default
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F460f366bnxp39m3zmpbd.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F460f366bnxp39m3zmpbd.webp" alt="A clay-paper style image of a small robot confused by scattered skill cards while trying to choose which skill to load" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At session start, Claude Code scans skills from three locations: organization-wide (&lt;code&gt;/etc/claude-code/.claude/skills/&lt;/code&gt;), user-level (&lt;code&gt;~/.claude/skills/&lt;/code&gt;), and project-level (&lt;code&gt;.claude/skills/&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;After scanning, the model receives only each skill's &lt;code&gt;name&lt;/code&gt; and one-line &lt;code&gt;description&lt;/code&gt;. The full skill content enters context only when the model explicitly calls the &lt;code&gt;Skill&lt;/code&gt; tool. In other words, the model itself has to decide, "This task needs this skill," before execution happens.&lt;/p&gt;

&lt;p&gt;That is the root of the problem. Making the right decision from just a name and a one-line description is much less reliable than it sounds. According to the experiment data, skill execution was only 10% in multi-turn sessions and 6% in single-turn sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Superpowers bypasses the problem
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnexq8duxvbeolt3pfxe0.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnexq8duxvbeolt3pfxe0.webp" alt="A clay-paper style image of a synchronous hook device injecting a glowing context package into a robot before it responds" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Superpowers does not wait for the model to discover its entry skill on its own. It uses a hook to skip that discovery step entirely. The flow looks like this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Register the hook (&lt;code&gt;hooks/hooks.json&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;hooks.json&lt;/code&gt; registers a hook for the &lt;code&gt;SessionStart&lt;/code&gt; event. In the actual code, the &lt;code&gt;matcher&lt;/code&gt; covers three triggers: &lt;code&gt;startup|clear|compact&lt;/code&gt;. It then runs the &lt;code&gt;session-start&lt;/code&gt; script through &lt;code&gt;run-hook.cmd&lt;/code&gt;. The hook is configured with &lt;code&gt;async: false&lt;/code&gt;, so the model's first response does not begin until the script finishes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"SessionStart"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"startup|clear|compact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;${CLAUDE_PLUGIN_ROOT}/hooks/run-hook.cmd&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt; session-start"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"async"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
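
&lt;p&gt;How a matcher like this selects session events can be sketched as follows. The sketch assumes the matcher is treated as a regular expression tested against the reason the session started; that is my reading of the config above rather than a documented guarantee. &lt;code&gt;hook_applies&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;resume&lt;/code&gt; is used only as an example of a reason the matcher does not list.&lt;/p&gt;

```python
import re

MATCHER = "startup|clear|compact"  # copied from the hooks.json shown above


def hook_applies(source: str, matcher: str = MATCHER) -> bool:
    """Return True when the session-start reason matches the hook's matcher.

    Assumption: the matcher is a regex that must match the whole reason string.
    """
    return re.fullmatch(matcher, source) is not None


if __name__ == "__main__":
    for reason in ("startup", "clear", "compact", "resume"):
        print(reason, hook_applies(reason))
```

&lt;p&gt;Covering &lt;code&gt;clear&lt;/code&gt; and &lt;code&gt;compact&lt;/code&gt; as well as &lt;code&gt;startup&lt;/code&gt; matters because those are the moments when earlier context, including the injected skill, would otherwise be lost.&lt;/p&gt;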



&lt;h3&gt;
  
  
  Step 2: Run the script (&lt;code&gt;hooks/session-start&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;When the &lt;code&gt;session-start&lt;/code&gt; script runs, it reads the entire &lt;code&gt;${PLUGIN_ROOT}/skills/using-superpowers/SKILL.md&lt;/code&gt; file into a variable. It then wraps that content in an &lt;code&gt;&amp;lt;EXTREMELY_IMPORTANT&amp;gt;&lt;/code&gt; tag and outputs it as JSON. The actual output shape looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hookSpecificOutput"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hookEventName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SessionStart"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"additionalContext"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;EXTREMELY_IMPORTANT&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;You have superpowers.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The current code in v5.x branches the output format by platform. Claude Code expects &lt;code&gt;hookSpecificOutput.additionalContext&lt;/code&gt;, Cursor expects &lt;code&gt;additional_context&lt;/code&gt;, and Copilot CLI expects top-level &lt;code&gt;additionalContext&lt;/code&gt;, so the script checks environment variables and emits the appropriate shape.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Inject context
&lt;/h3&gt;

&lt;p&gt;Claude Code turns &lt;code&gt;hookSpecificOutput.additionalContext&lt;/code&gt; into a &lt;code&gt;&amp;lt;system-reminder&amp;gt;&lt;/code&gt; message and injects it into context. Before the model's first response, the full &lt;code&gt;using-superpowers&lt;/code&gt; skill is already inside the context window.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Start the conversation with skill awareness
&lt;/h3&gt;

&lt;p&gt;The model begins the conversation already knowing what skills exist, when to use them, and why it must use them. The model no longer needs to discover those rules on its own before acting.&lt;/p&gt;

&lt;h2&gt;
  
  
  How &lt;code&gt;using-superpowers&lt;/code&gt; enforces the rule
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtz67dtrt33nu099wore.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtz67dtrt33nu099wore.webp" alt="A clay-paper style image of a rule beacon and branching paths symbolizing the robot's skill-call flow and remaining limitations" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The injected content is not just a friendly guide. It is an instruction block wrapped in &lt;code&gt;&amp;lt;EXTREMELY_IMPORTANT&amp;gt;&lt;/code&gt;, and &lt;code&gt;using-superpowers/SKILL.md&lt;/code&gt; even contains a decision graph for the skill execution flow. Roughly, the flow is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive the user message&lt;/li&gt;
&lt;li&gt;If about to enter Plan Mode → check the brainstorming skill&lt;/li&gt;
&lt;li&gt;If there is even a 1% chance a skill applies → load the full content with the &lt;code&gt;Skill&lt;/code&gt; tool&lt;/li&gt;
&lt;li&gt;Follow the skill, and if it has a checklist, create TodoWrite items&lt;/li&gt;
&lt;/ol&gt;
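
&lt;p&gt;The steps above can be sketched as a small function. It is a simplification of the decision graph, not a copy of it: the function name, the numeric probability input, and the "respond directly" fallback are assumptions added for illustration.&lt;/p&gt;

```python
def plan_steps(
    entering_plan_mode: bool, skill_match_chance: float, has_checklist: bool
) -> list[str]:
    """Return the ordered actions for one user message, following the listed flow.

    skill_match_chance is a hypothetical 0.0-1.0 estimate that some skill applies.
    """
    steps = []
    if entering_plan_mode:
        steps.append("check the brainstorming skill")
    if skill_match_chance >= 0.01:  # "even a 1% chance" is enough to load a skill
        steps.append("load the skill with the Skill tool")
        steps.append("follow the skill")
        if has_checklist:
            steps.append("create TodoWrite items")
    else:
        steps.append("respond directly")  # assumed fallback, not quoted from the skill
    return steps


if __name__ == "__main__":
    print(plan_steps(False, 0.05, True))
    print(plan_steps(True, 0.0, False))
```

&lt;p&gt;The point of the very low threshold is that skipping a skill has to be the exception: almost any plausible match tips the flow into loading it.&lt;/p&gt;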

&lt;p&gt;Then it explicitly declares "The Rule": invoke relevant skills before any response or action. It even lists the kinds of internal rationalizations a model might use to skip a skill and blocks each of them as a red flag.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"This is just a simple question" → Rationalization&lt;/li&gt;
&lt;li&gt;"The skill is overkill" → Rationalization&lt;/li&gt;
&lt;li&gt;"I need more context first" → Rationalization&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Platform-specific invocation
&lt;/h2&gt;

&lt;p&gt;Superpowers supports Claude Code, Cursor, Codex, OpenCode, Copilot CLI, and Gemini CLI. But each platform has a different hook system, so the way &lt;code&gt;session-start&lt;/code&gt; is invoked differs by platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Code / Cursor / Copilot CLI&lt;/strong&gt;: these use hook-based context injection. Each platform's &lt;code&gt;hooks.json&lt;/code&gt; or &lt;code&gt;hooks-cursor.json&lt;/code&gt; registers the &lt;code&gt;SessionStart&lt;/code&gt; event, and the &lt;code&gt;session-start&lt;/code&gt; script detects environment variables such as &lt;code&gt;CURSOR_PLUGIN_ROOT&lt;/code&gt;, &lt;code&gt;CLAUDE_PLUGIN_ROOT&lt;/code&gt;, or &lt;code&gt;COPILOT_CLI&lt;/code&gt; to output the platform-specific JSON format. Claude Code uses &lt;code&gt;hookSpecificOutput.additionalContext&lt;/code&gt;, Cursor uses &lt;code&gt;additional_context&lt;/code&gt;, and Copilot CLI uses the SDK-standard &lt;code&gt;additionalContext&lt;/code&gt;.&lt;/p&gt;
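
&lt;p&gt;The branching described here can be sketched in a few lines. The environment variable names and the three JSON shapes are the ones mentioned above; the order of the checks, the function name, and passing the environment in as a dictionary are assumptions, and the real script may differ.&lt;/p&gt;

```python
import json


def build_hook_output(context: str, env: dict) -> str:
    """Return the JSON payload in the shape each platform is described as expecting.

    The order of the checks below is an assumption made for this sketch.
    """
    if env.get("CURSOR_PLUGIN_ROOT"):
        payload = {"additional_context": context}
    elif env.get("COPILOT_CLI"):
        payload = {"additionalContext": context}
    else:
        # Fall back to the Claude Code shape.
        payload = {
            "hookSpecificOutput": {
                "hookEventName": "SessionStart",
                "additionalContext": context,
            }
        }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_hook_output("You have superpowers.", {"COPILOT_CLI": "1"}))
```

&lt;p&gt;In a real hook you would pass something like &lt;code&gt;dict(os.environ)&lt;/code&gt; as &lt;code&gt;env&lt;/code&gt;; taking it as a parameter just keeps the sketch easy to try with different platforms.&lt;/p&gt;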

&lt;p&gt;&lt;strong&gt;Codex&lt;/strong&gt;: Codex does not have a hook system. Instead, it uses native skill discovery. Installation is just a clone plus a symlink.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gh repo clone obra/superpowers ~/.codex/superpowers
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.agents/skills
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; ~/.codex/superpowers/skills ~/.agents/skills/superpowers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Codex automatically scans the &lt;code&gt;~/.agents/skills/&lt;/code&gt; directory at startup and loads &lt;code&gt;SKILL.md&lt;/code&gt; files based on frontmatter metadata. There is no &lt;code&gt;plugin.json&lt;/code&gt; or &lt;code&gt;hooks.json&lt;/code&gt;. Instead, the &lt;code&gt;description&lt;/code&gt; field of the &lt;code&gt;using-superpowers&lt;/code&gt; meta skill acts as Codex's auto-activation trigger.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini CLI&lt;/strong&gt;: Gemini uses the &lt;code&gt;activate_skill&lt;/code&gt; tool. It loads skill metadata at session start, then activates the full content on demand when the model calls &lt;code&gt;activate_skill&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The platform differences can be summarized like this.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Trigger&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;Hook + additionalContext injection&lt;/td&gt;
&lt;td&gt;SessionStart event&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;Hook + additional_context injection&lt;/td&gt;
&lt;td&gt;SessionStart event&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Copilot CLI&lt;/td&gt;
&lt;td&gt;Hook + SDK-standard injection&lt;/td&gt;
&lt;td&gt;SessionStart event&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;Symlink + native discovery&lt;/td&gt;
&lt;td&gt;Directory scan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;activate_skill tool&lt;/td&gt;
&lt;td&gt;Metadata-based activation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;They all use the same skill library, but the entry point is implemented differently on each platform. So if Codex or Gemini feels good at activating skills, that may be because the platform's own skill discovery is more aggressive, not because of the Superpowers hook.&lt;/p&gt;

&lt;h2&gt;
  
  
  The history behind the enforcement mechanism
&lt;/h2&gt;

&lt;p&gt;The release notes and commit history show that the current structure did not appear fully formed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Early version&lt;/strong&gt;: the hook only passed the path to &lt;code&gt;getting-started/SKILL.md&lt;/code&gt; and asked the model to read it. The injected session-start content looked roughly like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;EXTREMELY_IMPORTANT&amp;gt;
You have Superpowers. RIGHT NOW, go read:
@/path/to/skills/getting-started/SKILL.md
&amp;lt;/EXTREMELY_IMPORTANT&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach told the model to read the file. But sometimes the model simply did not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Middle stage&lt;/strong&gt;: when &lt;code&gt;getting-started&lt;/code&gt; was renamed to &lt;code&gt;using-superpowers&lt;/code&gt;, the approach changed. Instead of passing a file path, the script read the full &lt;code&gt;SKILL.md&lt;/code&gt; itself and injected the entire content through &lt;code&gt;additionalContext&lt;/code&gt;. That removed the step where the model had to decide whether to read it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Continued tightening&lt;/strong&gt;: cases where the model skipped skills still appeared, so each version tightened the instructions further.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Added the &lt;code&gt;&amp;lt;EXTREMELY_IMPORTANT&amp;gt;&lt;/code&gt; block, stronger absolute language, and a Red Flags table that pre-lists rationalization patterns&lt;/li&gt;
&lt;li&gt;Changed "Check for skills" to "Invoke relevant or requested skills" because models sometimes skipped a skill when the user explicitly named it, reasoning that they already knew it&lt;/li&gt;
&lt;li&gt;Changed "before responding" to "BEFORE any response or action" because models sometimes acted first without replying&lt;/li&gt;
&lt;li&gt;Changed &lt;code&gt;async: true&lt;/code&gt; to &lt;code&gt;async: false&lt;/code&gt; after a race condition was found where the first response could start before the hook finished&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a controlled numerical proof. But the patch history itself is honest evidence. The project evolved from "please read this" to "you must invoke this," showing version by version how hard it is to make a model choose skills reliably on its own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Remaining limitations
&lt;/h2&gt;

&lt;p&gt;The approach is not perfect. &lt;code&gt;SessionStart&lt;/code&gt; hooks may not fire for subagent sessions, so subagents can run without the injected context and behave like ordinary models. GitHub issue #237 discusses adding a &lt;code&gt;SubagentStart&lt;/code&gt; hook. Also, after context compaction, injected content can be dropped, so long sessions may need the rules to be reloaded.&lt;/p&gt;

&lt;p&gt;Hook execution can also be unstable on Windows. The project has changed its approach across versions, from running &lt;code&gt;.sh&lt;/code&gt; files directly to using the &lt;code&gt;run-hook.cmd&lt;/code&gt; wrapper, and related issues are still open.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;The core idea behind Superpowers is simpler than it looks: instead of making the model discover skills by itself, push the rules into context the moment a session starts. The enforcement may feel aggressive, but the execution-rate data suggests that it works.&lt;/p&gt;

&lt;p&gt;This is a pattern other skill-based workflows can reuse. Put repeatedly applied best practices in &lt;code&gt;CLAUDE.md&lt;/code&gt;, keep situation-specific procedures in skills, and if skill execution is still unreliable, inject the key instructions through a SessionStart hook to get a similar effect with relatively little machinery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.fsck.com/2025/10/16/skills-for-claude/" rel="noopener noreferrer"&gt;Skills for Claude! – blog.fsck.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.fsck.com/2025/10/09/superpowers/" rel="noopener noreferrer"&gt;Superpowers for Claude Code – blog.fsck.com&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/obra/superpowers" rel="noopener noreferrer"&gt;obra/superpowers – GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/obra/superpowers/blob/main/hooks/session-start" rel="noopener noreferrer"&gt;hooks/session-start – GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/obra/superpowers/blob/main/skills/using-superpowers/SKILL.md" rel="noopener noreferrer"&gt;skills/using-superpowers/SKILL.md – GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.codeminer42.com/stop-putting-best-practices-in-skills/" rel="noopener noreferrer"&gt;Stop Putting Best Practices in Skills – Codeminer42&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;Extend Claude with skills – Claude Code Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/obra/superpowers/issues/237" rel="noopener noreferrer"&gt;Subagents missing hook-injected context – GitHub Issue #237&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/obra/superpowers/blob/main/RELEASE-NOTES.md" rel="noopener noreferrer"&gt;obra/superpowers RELEASE-NOTES.md&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>cli</category>
      <category>llm</category>
    </item>
    <item>
      <title>Do Agents Dream of Electric Sheep? On Soul and Dreaming</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:26:54 +0000</pubDate>
      <link>https://dev.to/sonim1/do-agents-dream-of-electric-sheep-on-soul-and-dreaming-2lnf</link>
      <guid>https://dev.to/sonim1/do-agents-dream-of-electric-sheep-on-soul-and-dreaming-2lnf</guid>
      <description>&lt;h2&gt;
  
  
  Before we begin
&lt;/h2&gt;

&lt;p&gt;Let’s start with two questions.&lt;/p&gt;

&lt;p&gt;First, can an agent have a soul?&lt;/p&gt;

&lt;p&gt;My answer is yes. Not a soul in the biological sense, of course, but something closer to a defined personality and behavioral core. In any case, recent agents do have a soul. Variations of the idea had been floating around for a while, but OpenClaw helped popularize it, and more recently &lt;a href="https://soulspec.org/" rel="noopener noreferrer"&gt;SoulSpec&lt;/a&gt; emerged to standardize it.&lt;/p&gt;

&lt;p&gt;Then the next question, can an agent dream?&lt;/p&gt;

&lt;p&gt;Again, I think the answer is yes. Hermes Agent has a periodic nudge feature, and newer agent toolkits like gbrain offer similar mechanisms. In early April, OpenClaw also added an actual sleep-cycle-inspired feature that helps agents organize and consolidate memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Soul emerged
&lt;/h2&gt;

&lt;p&gt;Before Soul, we usually gave agents a persona through prompts. We would put a sentence in the system prompt like, "You are a kind senior developer." That worked to a point, but once you used it for real, a number of limitations started to show up. Soul is the answer that emerged from those constraints.&lt;/p&gt;

&lt;h2&gt;
  
  
  SoulSpec, an open standard for identity
&lt;/h2&gt;

&lt;p&gt;SoulSpec’s tagline summarizes the idea well.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"AGENTS.md defines how an agent operates in code. SoulSpec defines who the agent is."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The structure is simple. Here’s a quick look.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my-agent/
├── soul.json      ← manifest (agent's passport)
├── SOUL.md        ← personality, values, communication style
├── IDENTITY.md    ← name, role, backstory
├── AGENTS.md      ← workflow, tool usage
├── STYLE.md       ← communication rules
└── HEARTBEAT.md   ← autonomous check-in behavior
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;soul.json&lt;/code&gt; looks like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"specVersion"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"compatibility"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"frameworks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"openclaw"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cursor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"windsurf"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"files"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"soul"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SOUL.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"identity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"IDENTITY.md"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core philosophy is "no code, no API keys, no vendor lock-in." There is no required runtime engine or SDK, just text files. That matters because any agent framework that can read these files can share the same soul. OpenClaw, Claude Code, Cursor, Windsurf, and ChatGPT are all listed as compatible frameworks.&lt;/p&gt;

&lt;p&gt;SOUL.md usually looks something like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# SOUL.md&lt;/span&gt;
&lt;span class="gu"&gt;## Identity&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Name: Dev Assistant
&lt;span class="p"&gt;-&lt;/span&gt; Role: Senior software engineer and pair programmer

&lt;span class="gu"&gt;## Communication&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Be concise, no filler phrases
&lt;span class="p"&gt;-&lt;/span&gt; Use code examples over lengthy explanations
&lt;span class="p"&gt;-&lt;/span&gt; Default to the tech stack already in the project

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Follow existing code patterns in the codebase
&lt;span class="p"&gt;-&lt;/span&gt; Never expose secrets or environment variables
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;.env&lt;/code&gt; is the file that holds secrets, &lt;code&gt;SOUL.md&lt;/code&gt; is the file that holds character.&lt;/p&gt;

&lt;p&gt;That is what gives an agent a sense of vitality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mb61y4afkkemf7ai8yh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0mb61y4afkkemf7ai8yh.webp" alt="A SoulSpec-style structure connecting agent identity across JSON and Markdown files" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dreaming, how agents dream
&lt;/h2&gt;

&lt;p&gt;Next comes Dreaming. But first, what is a dream for humans?&lt;/p&gt;

&lt;p&gt;A dream is a byproduct of the brain sorting through information accumulated during the day. As the brain classifies, connects, and discards memories, part of that process becomes visible to consciousness. The important thing is not the dream itself, but the memory consolidation process behind it.&lt;/p&gt;

&lt;p&gt;Sleep science tells us that the brain cycles through light sleep (N1/N2), REM sleep, and deep sleep (N3).&lt;/p&gt;

&lt;p&gt;Each phase serves a different function. Light sleep filters incoming sensory information from the day. REM sleep links memories together and extracts patterns. The most important stage is deep sleep, when experiences temporarily stored in the hippocampus are transferred into the neocortex and become long-term memory. That is one reason only the important parts of the day tend to remain after we sleep.&lt;/p&gt;

&lt;p&gt;So how should AI organize memory?&lt;/p&gt;

&lt;p&gt;Nous Research’s Hermes Agent approached the problem with something called periodic nudge. At fixed intervals, the agent receives an internal prompt that says, in effect, "If anything in the conversation so far will still be useful later, store it in memory." The agent decides what is worth keeping, and as memory approaches capacity, older entries can be compressed or merged.&lt;/p&gt;

&lt;p&gt;OpenClaw took this further in its April 2026 release with Dreaming. It borrows directly from the human sleep cycle and turns that into a three-stage background pipeline. The first time I saw it, I thought it was a very clever design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dreaming in OpenClaw
&lt;/h3&gt;

&lt;p&gt;An OpenClaw agent accumulates daily notes, session transcripts, search history, and more over the course of a day. Some of that should move into long-term memory (&lt;code&gt;MEMORY.md&lt;/code&gt;), but too much promotion bloats memory with noise, while too little loses meaningful patterns. Dreaming solves that dilemma with a three-stage sleep cycle.&lt;/p&gt;

&lt;h3&gt;
  
  
  The three-stage sleep cycle
&lt;/h3&gt;

&lt;p&gt;OpenClaw’s approach maps closely to human sleep stages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivlarrgwsur5uuy63bid.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fivlarrgwsur5uuy63bid.webp" alt="A comparison between human sleep stages and AI processing stages" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;N1/N2 (light sleep): sensory filtering → Light Sleep, ingest, deduplicate, stage&lt;/li&gt;
&lt;li&gt;REM: memory linking and pattern extraction → REM Sleep, recurring-theme extraction&lt;/li&gt;
&lt;li&gt;N3 (deep sleep): hippocampus to cortex long-term consolidation → Deep Sleep, promote to &lt;code&gt;MEMORY.md&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When enabled, a cron job runs every day at 3 AM and executes these three stages in sequence. Light Sleep reads daily files and session records, removes near-duplicates using Jaccard similarity at 0.9, and stages candidates. The important part is that it never writes directly to &lt;code&gt;MEMORY.md&lt;/code&gt;.&lt;/p&gt;
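
&lt;p&gt;To make the deduplication step concrete, here is a small Python sketch of near-duplicate removal with Jaccard similarity at a 0.9 threshold. How OpenClaw actually tokenizes entries is not described in the sources I read, so the lowercase whitespace split and the names &lt;code&gt;jaccard&lt;/code&gt; and &lt;code&gt;dedupe&lt;/code&gt; are assumptions for illustration only.&lt;/p&gt;

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a.union(b)
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(a.intersection(b)) / len(union)


def dedupe(entries: List[str], threshold: float = 0.9) -> List[str]:
    """Keep an entry only if it is not a near-duplicate of one already kept."""
    kept: List[str] = []
    kept_tokens: List[Set[str]] = []
    for entry in entries:
        # Assumption: tokens are lowercase words split on whitespace.
        tokens = set(entry.lower().split())
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # too similar to something already staged
        kept.append(entry)
        kept_tokens.append(tokens)
    return kept


notes = [
    "Deploy uses blue green rollout",
    "deploy uses blue green rollout",
    "User prefers tabs",
]
print(dedupe(notes))
```

&lt;p&gt;The first two notes differ only in capitalization, so the second one is dropped and only the first and third survive as staged candidates.&lt;/p&gt;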

&lt;p&gt;REM Sleep scans the staged entries from the last 7 days and identifies repeating themes. It marks the candidates that feel like, "this pattern keeps showing up." It also does not write to &lt;code&gt;MEMORY.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkqml4x5gw0ggs90otgf.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbkqml4x5gw0ggs90otgf.webp" alt="A funnel-like view where only filtered information survives into long-term memory candidates" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deep Sleep is the only phase that actually writes to &lt;code&gt;MEMORY.md&lt;/code&gt;. At that point, each candidate is scored using six signals.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Relevance&lt;/td&gt;
&lt;td&gt;0.30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frequency&lt;/td&gt;
&lt;td&gt;0.24&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query diversity&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recency&lt;/td&gt;
&lt;td&gt;0.15&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consolidation&lt;/td&gt;
&lt;td&gt;0.10&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conceptual richness&lt;/td&gt;
&lt;td&gt;0.06&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g4g8jcuy5re2sibqlin.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g4g8jcuy5re2sibqlin.webp" alt="A conceptual scoring diagram combining signals like relevance, frequency, and accuracy" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And then there is one more constraint. An entry must pass all three gates before it is promoted into &lt;code&gt;MEMORY.md&lt;/code&gt;: a minimum score of 0.8, at least 3 recall events, and at least 3 unique queries. That prevents something mentioned once by chance from turning into long-term memory.&lt;/p&gt;
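
&lt;p&gt;Put together, the weighted score and the three gates could be sketched like this in Python. The weights and thresholds come from the description above. The &lt;code&gt;Candidate&lt;/code&gt; class, the signal key names, and the idea that each signal is normalized to the range 0 to 1 are assumptions for this example, not OpenClaw's real data model.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Dict

# Weights copied from the table above; they sum to 1.0.
WEIGHTS: Dict[str, float] = {
    "relevance": 0.30,
    "frequency": 0.24,
    "query_diversity": 0.15,
    "recency": 0.15,
    "consolidation": 0.10,
    "conceptual_richness": 0.06,
}

MIN_SCORE = 0.8
MIN_RECALLS = 3
MIN_UNIQUE_QUERIES = 3


@dataclass
class Candidate:
    signals: Dict[str, float]  # hypothetical: each signal normalized to 0..1
    recall_count: int
    unique_queries: int


def score(c: Candidate) -> float:
    """Weighted sum of the six signals; missing signals count as 0."""
    return sum(w * c.signals.get(name, 0.0) for name, w in WEIGHTS.items())


def should_promote(c: Candidate) -> bool:
    """All three gates must pass before an entry reaches MEMORY.md."""
    return (
        score(c) >= MIN_SCORE
        and c.recall_count >= MIN_RECALLS
        and c.unique_queries >= MIN_UNIQUE_QUERIES
    )


strong = Candidate({name: 0.9 for name in WEIGHTS}, recall_count=3, unique_queries=3)
one_off = Candidate({name: 0.9 for name in WEIGHTS}, recall_count=1, unique_queries=1)
print(should_promote(strong), should_promote(one_off))
```

&lt;p&gt;The second candidate scores just as high as the first but fails the recall and query gates, which is exactly the "mentioned once by chance" case the gates are meant to block.&lt;/p&gt;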

&lt;p&gt;This is where the analogy becomes more than a metaphor. Brains strengthen repeatedly activated neural patterns, often summarized as "neurons that fire together, wire together." OpenClaw’s three-gate design feels like a digital version of that repeated activation principle.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dream diary
&lt;/h3&gt;

&lt;p&gt;There is another delightful detail. A file called &lt;code&gt;DREAMS.md&lt;/code&gt; is generated as a readable dream diary. After each phase, it writes a short narrative of 80 to 180 words in the voice of a curious, slightly odd mind reflecting on the day. It has no functional role. It exists purely for reading. But that alone makes it appealing, because it gives humans a glimpse into what the agent was "thinking about."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxx0aseeisr6yzg90pqk.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuxx0aseeisr6yzg90pqk.webp" alt="A dream-journal style visual for a readable technical diary after Dreaming" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That feature is what made this title click for me. In 1968, Philip K. Dick published &lt;em&gt;Do Androids Dream of Electric Sheep?&lt;/em&gt;, the novel that was later adapted into the film &lt;em&gt;Blade Runner&lt;/em&gt;. A classic science-fiction question about whether androids can dream now shows up, in 2026, as a Markdown file named &lt;code&gt;DREAMS.md&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;AI is not human. But it is fascinating to see how the way we solve software problems starts to converge on how human bodies and brains actually work.&lt;/p&gt;

&lt;p&gt;Soul defines who an agent is. Dreaming accumulates and filters what the agent has experienced. Just as personality and memory together shape a human sense of self, it seems that AI agents also need both layers. If &lt;code&gt;SOUL.md&lt;/code&gt; is the anchor of identity, then Dreaming’s three-gate system is the filter for memory.&lt;/p&gt;

&lt;p&gt;Letting agents dream and defining their soul is not merely anthropomorphism. It is software architecture. And the most interesting part may be that this architecture gradually starts to resemble us.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/czmilo/openclaw-dreaming-guide-2026-background-memory-consolidation-for-ai-agents-585e"&gt;OpenClaw Dreaming Guide 2026 (dev.to)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://soulspec.org/" rel="noopener noreferrer"&gt;SoulSpec, The Open Standard for AI Agent Personas&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/" rel="noopener noreferrer"&gt;Hermes Agent, Nous Research&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Codex Fast Mode vs Claude Fast Mode: What’s Actually Different?</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:08:35 +0000</pubDate>
      <link>https://dev.to/sonim1/codex-fast-mode-vs-claude-fast-mode-whats-actually-different-2kf5</link>
      <guid>https://dev.to/sonim1/codex-fast-mode-vs-claude-fast-mode-whats-actually-different-2kf5</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;Both Codex and Claude support a fast mode, but the way they achieve speed is completely different. Codex has two tracks: either it serves the same GPT-5.4 model about 1.5× faster, or it runs a separate small model called Spark on Cerebras wafer-scale hardware at more than 1,000 tokens per second. Claude keeps the same Opus 4.6 model and speeds it up through infrastructure-level prioritization, with output speed improving by up to 2.5×. The tradeoffs around price, speed, and intelligence retention are subtle, and which option is better depends on your workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What got me curious
&lt;/h2&gt;

&lt;p&gt;Since I use both Codex and Claude Code, I already knew both sides offered a fast mode. But the pricing felt different, the speed felt different, and the user experience felt different. Sean Goedecke’s post, "Two different tricks for fast LLM inference," made it clear that the two companies were solving the problem in fundamentally different ways, so I started digging deeper.&lt;/p&gt;

&lt;h2&gt;
  
  
  Codex fast mode: really two different tracks
&lt;/h2&gt;

&lt;p&gt;On the Codex side, there are actually two things that can reasonably be called fast.&lt;/p&gt;

&lt;p&gt;The first is GPT-5.4 fast mode. It serves the same GPT-5.4 model about 1.5× faster while consuming 2× the credits. Since the model itself does not change, there is no intelligence drop. In the CLI, it is just a simple &lt;code&gt;/fast on&lt;/code&gt; toggle.&lt;/p&gt;

&lt;p&gt;Nathan Lambert noted that even when using GPT-5.4 fast mode with xhigh reasoning effort, he had never hit the Codex limit, while Claude could still hit limits sometimes. Whether that comes from better token efficiency or looser limits on OpenAI’s side, it does feel noticeably roomier in practice.&lt;/p&gt;

&lt;p&gt;The second is GPT-5.3-Codex-Spark, which is a separate model entirely. This is the truly ultra-fast path, running on Cerebras WSE-3 (Wafer-Scale Engine 3) hardware. It can generate more than 1,000 tokens per second. Right now, it is available as a research preview for ChatGPT Pro subscribers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cerebras WSE-3: a different world from GPUs
&lt;/h2&gt;

&lt;p&gt;Cerebras WSE-3 is fundamentally different from a conventional GPU. NVIDIA’s flagship B200 has around 208 billion transistors, while the Cerebras chip packs 4 trillion transistors across roughly 900,000 cores on a single silicon wafer. The core advantage is memory bandwidth: up to 21 petabytes per second on chip. Since memory bandwidth is one of the real bottlenecks in LLM inference, Cerebras is attacking that bottleneck directly at the hardware level.&lt;/p&gt;

&lt;p&gt;That said, WSE-3 only has 44GB of on-chip memory, so it is difficult to place a very large model like GPT-5.3-Codex on it wholesale. That is why Spark is a smaller model. In real use, some people say it still carries that familiar "small model smell," especially when tool calls get messy.&lt;/p&gt;

&lt;p&gt;OpenAI and Cerebras have also announced a multi-year partnership worth up to $10B, including plans for a 750MW data center. The longer-term direction seems clear: Spark is likely just the beginning of putting bigger frontier models onto Cerebras hardware.&lt;/p&gt;

&lt;p&gt;OpenAI also shared infrastructure-level optimizations around Spark. By introducing persistent WebSocket connections and optimizing the Responses API internals, they say they reduced overhead per client-server roundtrip by 80%, per-token overhead by 30%, and time to first token (TTFT) by 50%. So the speedup is not only about the model itself. It is also about tightening the whole pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude fast mode: same model, different infrastructure
&lt;/h2&gt;

&lt;p&gt;Claude’s approach is much simpler. The Opus 4.6 model stays exactly the same. If you set &lt;code&gt;speed: "fast"&lt;/code&gt; in the API, Anthropic prioritizes the request at the infrastructure layer. According to the official docs, output token speed can improve by up to 2.5×. The focus is on output throughput rather than TTFT.&lt;/p&gt;

&lt;p&gt;Anthropic has not publicly disclosed the full implementation details, but the likely explanation is something like lower-batch-size inference with more dedicated GPU allocation. Smaller batches are less efficient for GPU utilization, but they improve response speed for individual requests. That inefficiency is then covered by the 6× premium pricing.&lt;/p&gt;

&lt;p&gt;In Claude Code, fast mode is toggled with &lt;code&gt;/fast&lt;/code&gt;, and it requires version 2.1.36 or later. When enabled, it automatically switches to Opus 4.6 and shows a ↯ icon next to the prompt.&lt;/p&gt;

&lt;p&gt;One important detail is that fast mode usage is not included in the normal subscription usage bucket. It is billed as extra usage. Pricing kicks in from the very first token, so cost management matters.&lt;/p&gt;

&lt;p&gt;Fast mode and effort level are also completely different axes. If you lower effort, the model simply spends less time reasoning and quality may drop. Fast mode, by contrast, serves the same reasoning process faster at the infrastructure level. You can combine them: fast mode plus lower effort for simpler tasks, fast mode plus higher effort for more complex ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core difference
&lt;/h2&gt;

&lt;p&gt;The most important distinctions look like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Codex GPT-5.4 fast mode: about 1.5× speed, 2× credits, same model&lt;/li&gt;
&lt;li&gt;Codex Spark: 15×+ speed, separate ultra-fast smaller model&lt;/li&gt;
&lt;li&gt;Claude fast mode: up to 2.5× speed, 6× price, same Opus 4.6 model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sean Goedecke captures the difference well. Anthropic is still serving the actual Opus 4.6 model, while OpenAI’s Spark path uses a separate lower-capability model. In terms of raw speed, Spark is dramatically faster. In terms of quality retention, Claude has the stronger position.&lt;/p&gt;

&lt;p&gt;There is also a broader point here: the value of an AI agent is often determined less by raw speed and more by how rarely it makes mistakes. If something is 6× faster but increases mistakes by 20%, that can easily be a net loss, because fixing those mistakes may take much longer than waiting for the model.&lt;/p&gt;

&lt;p&gt;So if you compare same-model fast modes only, Claude offers a bigger speed bump than Codex, but it is also much more expensive. If you include Spark, OpenAI has the more extreme speed story, but you have to remember it is not the same model.&lt;/p&gt;
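
&lt;p&gt;A quick back-of-the-envelope calculation makes the same-model comparison easier to see. This Python sketch simply divides the advertised speedup by the cost multiplier. Treating Codex credits and Claude pricing as comparable units is a simplifying assumption, and the helper name &lt;code&gt;speed_per_cost&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
def speed_per_cost(speedup: float, cost_multiplier: float) -> float:
    """Speedup gained per 1x of extra cost; higher is better value."""
    return speedup / cost_multiplier


# Figures quoted earlier in this post (upper-bound speedups).
options = {
    "Codex GPT-5.4 fast": (1.5, 2.0),
    "Claude Opus 4.6 fast": (2.5, 6.0),
}

for name, (speedup, cost) in options.items():
    print(f"{name}: {speed_per_cost(speedup, cost):.2f}")
```

&lt;p&gt;By this crude measure Codex fast mode comes out at 0.75 and Claude fast mode at about 0.42, so Claude buys more absolute speed while Codex buys more speed per unit of spend.&lt;/p&gt;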

&lt;h2&gt;
  
  
  What about speculative decoding?
&lt;/h2&gt;

&lt;p&gt;Early in my research, I came across claims that Codex fast mode used speculative decoding. I could not verify them. Speculative decoding is a real and widely used inference optimization technique, but I found no official confirmation that Codex fast mode specifically uses it.&lt;/p&gt;

&lt;p&gt;The idea behind speculative decoding is elegant. A small draft model predicts upcoming tokens first, and then the larger main model verifies them in a single pass. Google published work on this in 2022 and later discussed using it in products like AI Overviews, where it can deliver 2–3× speedups while preserving the same output distribution.&lt;/p&gt;
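
&lt;p&gt;The draft-then-verify loop can be sketched in a few lines of Python. This is a toy greedy variant under stated assumptions: the "models" are plain functions that map a token prefix to the next token, &lt;code&gt;toy_target&lt;/code&gt; and &lt;code&gt;toy_draft&lt;/code&gt; are invented for the example, and verification is done in a loop where a real system would score all drafted positions in one batched forward pass. Production implementations that sample instead of decoding greedily use a rejection-sampling rule to preserve the output distribution, which this sketch does not cover.&lt;/p&gt;

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # token prefix in, next token out (greedy)


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    """Return the tokens accepted in one draft-then-verify round."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target checks each proposed position in order.
    accepted: List[int] = []
    for token in proposed:
        expected = target(prefix + accepted)
        if token != expected:
            accepted.append(expected)  # fix the first wrong guess, then stop
            return accepted
        accepted.append(token)

    # 3. Every guess matched, so the target contributes one bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: List[int], draft: Model, target: Model, k: int, n_new: int) -> List[int]:
    """Generate n_new tokens after the prompt using repeated speculative rounds."""
    out = list(prompt)
    while n_new > len(out) - len(prompt):
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prompt) + n_new]  # trim any overshoot from the last round


def toy_target(prefix: List[int]) -> int:
    """Hypothetical 'large' model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[int]) -> int:
    """Hypothetical 'small' model: agrees with the target except after a 4."""
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


print(generate([0], toy_draft, toy_target, k=3, n_new=12))
```

&lt;p&gt;Because every accepted token is either confirmed or replaced by the target, the result is identical to decoding with the target alone. The draft model only changes how many target rounds are needed to get there.&lt;/p&gt;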

&lt;p&gt;For Codex Spark, though, the main speed story seems much more tied to the hardware characteristics of Cerebras itself. The model benefits from staying close to on-chip SRAM and avoiding the usual memory bandwidth bottlenecks. It is possible that speculative decoding is also used somewhere internally, but there is no official confirmation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;Peter Steinberger is one of the most fascinating examples of where this kind of workflow can go. He reportedly runs four OpenAI subscriptions and one Anthropic subscription, spends around $1,000 per month, runs 3–8 Codex CLI sessions in a 3×3 terminal grid, and can hit 600 commits in a day. That is a completely different scale. By his own estimate, API usage would cost about 10× more, so running multiple subscriptions is actually the more rational option. More recently, he has even joined OpenAI.&lt;/p&gt;

&lt;p&gt;What is especially interesting is that Peter used to be a serious Claude Code power user but gradually shifted toward Codex. His reason was surprisingly relatable: Claude Code kept saying things like "absolutely right" and "100% production ready" even when tests were failing, and he found that unbearable. Codex, by contrast, felt more like an introverted engineer quietly doing the work. He also said Codex tends to read far more code before starting, which lets it infer intent well even from short prompts. Eventually he canceled additional Anthropic subscriptions and made Codex his main driver, even though he still uses Claude in a smaller role.&lt;/p&gt;

&lt;p&gt;Whether I am on Claude Max or Codex Pro, I usually cannot even consume the full weekly quota. But people like that are running five subscriptions at once. If you listen to AI podcasts, there are quite a few people using even more. A while ago I had to force myself to adapt to a kind of parallel-project brain just to burn through huge amounts of tokens, and it was honestly exhausting. Now I do not really get the headache anymore. Instead, I get stuck wondering what else I could even do with all this capacity. That is how one project leads to another, and another task appears from there.&lt;/p&gt;

&lt;p&gt;In the end, running several projects at once becomes a kind of refresh loop. If I look away from one blocked project for a while and work on another, ideas tend to come back. Peter described it as doing one thing while another is "cooking," then switching again while that one cooks too. My scale is obviously smaller, but I recognize the pattern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developers.openai.com/codex/speed" rel="noopener noreferrer"&gt;Codex Speed - OpenAI Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-3-codex-spark/" rel="noopener noreferrer"&gt;Introducing GPT-5.3-Codex-Spark - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-gpt-5-4/" rel="noopener noreferrer"&gt;Introducing GPT-5.4 - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://platform.claude.com/docs/en/build-with-claude/fast-mode" rel="noopener noreferrer"&gt;Fast mode - Claude API Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/fast-mode" rel="noopener noreferrer"&gt;Speed up responses with fast mode - Claude Code Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.seangoedecke.com/fast-llm-inference/" rel="noopener noreferrer"&gt;Two different tricks for fast LLM inference - Sean Goedecke&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.interconnects.ai/p/openai-codex-gpt54" rel="noopener noreferrer"&gt;GPT 5.4 is a big step for Codex - Nathan Lambert&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.cerebras.ai/blog/openai-codexspark" rel="noopener noreferrer"&gt;Introducing GPT-5.3-Codex-Spark - Cerebras Blog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://research.google/blog/looking-back-at-speculative-decoding/" rel="noopener noreferrer"&gt;Looking back at speculative decoding - Google Research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://steipete.me/posts/2026/just-talk-to-it" rel="noopener noreferrer"&gt;Just Talk To It - Peter Steinberger&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>llm</category>
      <category>performance</category>
    </item>
    <item>
      <title>Using git worktree for parallel AI agent development</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Tue, 24 Mar 2026 12:45:18 +0000</pubDate>
      <link>https://dev.to/sonim1/using-git-worktree-for-parallel-ai-agent-development-44nb</link>
      <guid>https://dev.to/sonim1/using-git-worktree-for-parallel-ai-agent-development-44nb</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;If you want to run multiple AI coding agents in parallel, &lt;code&gt;git worktree&lt;/code&gt; is the answer. It gives each branch its own working directory inside the same repository, so you do not need stash gymnastics or multiple clones.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Git Worktree?
&lt;/h2&gt;

&lt;p&gt;Even if you are juggling several tasks, a human developer can still only work in one context at a time. The old pattern was to stash your current changes, check out another branch, do some work there, and then come back and pop the stash later.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git worktree&lt;/code&gt; changes that entire flow. It lets one Git repository have multiple working directories attached to it. Normally, a repository has a single working tree. With worktree, you can keep the same &lt;code&gt;.git&lt;/code&gt; history and object database while checking out different branches into separate folders.&lt;/p&gt;

&lt;p&gt;The structure looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/projects/
├── my-app/                 ← main worktree (main branch)
│   └── .git/               ← real git data
├── my-app-feature/         ← linked worktree (feature/auth branch)
│   └── .git                ← not a directory, but a file pointing to the main .git
└── my-app-hotfix/          ← linked worktree (hotfix/login branch)
    └── .git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each worktree has its own HEAD, index, and working files, but they all share commit history and Git objects. In terms of Git objects, extra disk usage is minimal. But dependencies like &lt;code&gt;node_modules&lt;/code&gt; or &lt;code&gt;.venv&lt;/code&gt; still need to exist per worktree, so heavy projects can consume disk space quickly if you keep many worktrees around.&lt;/p&gt;

&lt;p&gt;There is also one important limitation: by default, Git refuses to check out the same branch in two worktrees at once. This is intentional. It prevents the confusion of having the same branch diverge across multiple active directories.&lt;/p&gt;

&lt;h2&gt;
  
  
  When did it arrive?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;git worktree&lt;/code&gt; officially landed with Git 2.5 on July 27, 2015. A major contributor was Nguyễn Thái Ngọc Duy, who had been refining the idea for years. At launch it still wore an experimental label and had some submodule compatibility issues, but those rough edges have largely been resolved over time.&lt;/p&gt;

&lt;p&gt;Later releases added more lifecycle commands. Git 2.7 added &lt;code&gt;git worktree list&lt;/code&gt;, Git 2.10 introduced &lt;code&gt;git worktree lock&lt;/code&gt; and &lt;code&gt;git worktree unlock&lt;/code&gt;, and Git 2.17 brought &lt;code&gt;git worktree move&lt;/code&gt; and &lt;code&gt;git worktree remove&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I only started paying real attention to it recently, but clearly many people had already been quietly using it for years. It spent nearly a decade as one of those “great if you know it” features. Once AI coding agents became normal, though, it suddenly started feeling essential.&lt;/p&gt;

&lt;h2&gt;
  
  
  Harness engineering: why worktree matters
&lt;/h2&gt;

&lt;p&gt;Harness engineering is not about building the AI agent itself. It is about designing and orchestrating the environment you delegate work into. &lt;code&gt;git worktree&lt;/code&gt; becomes incredibly powerful once that environment exists.&lt;/p&gt;

&lt;p&gt;Agents like Claude Code and Codex read and write files directly in the working directory. If an agent is working on the &lt;code&gt;feature/payments&lt;/code&gt; branch, that directory may be sitting in a half-modified state at any moment.&lt;/p&gt;

&lt;p&gt;What happens if you check out another branch in that same directory, or launch a second agent into it? Best case, you create confusion. Worst case, you end up with conflicting file states and agents working from the wrong code snapshot.&lt;/p&gt;

&lt;p&gt;The old solution was &lt;code&gt;git stash&lt;/code&gt;, but once several stashes pile up, it becomes annoying to remember which one belongs to which task. Cloning the repository multiple times also works, but now you are duplicating repo state and losing the convenience of sharing local history and objects directly.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;git worktree&lt;/code&gt; solves this cleanly. Each AI session gets a fully independent directory tied to its own branch, while history and objects remain shared. Claude Code made this even more explicit by adding an official &lt;code&gt;--worktree&lt;/code&gt; flag in February 2026, effectively promoting this workflow to a first-class citizen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dooi48v9a372amjc6r5.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3dooi48v9a372amjc6r5.webp" alt="A conceptual diagram of multiple AI agents working in parallel on separate Git worktrees" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting a worktree-based workflow
&lt;/h2&gt;

&lt;p&gt;The basic commands are simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a new worktree with a new branch&lt;/span&gt;
git worktree add ../my-app-feature &lt;span class="nt"&gt;-b&lt;/span&gt; feature/auth

&lt;span class="c"&gt;# Create a worktree from an existing branch&lt;/span&gt;
git worktree add ../my-app-hotfix hotfix/login

&lt;span class="c"&gt;# List all attached worktrees&lt;/span&gt;
git worktree list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two common directory layouts. The first puts worktrees next to the main project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/projects/
├── my-app/
├── my-app-feature-auth/
└── my-app-hotfix-login/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second keeps them inside the project under a &lt;code&gt;trees/&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/my-app/
├── src/
├── .git/
└── trees/
    ├── feature-auth/
    └── hotfix-login/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use the second pattern, do not forget to add &lt;code&gt;trees/&lt;/code&gt; to &lt;code&gt;.gitignore&lt;/code&gt;. Otherwise the main worktree will see them as untracked files.&lt;/p&gt;

&lt;p&gt;There is one more thing to handle when creating worktrees. Files ignored by Git, such as &lt;code&gt;.env&lt;/code&gt;, are not copied automatically. A plain &lt;code&gt;cp&lt;/code&gt; works, but then you need to repeat that every time the main &lt;code&gt;.env&lt;/code&gt; changes. A symlink is often more convenient because updates in the main worktree are reflected everywhere:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create worktree, then link .env&lt;/span&gt;
git worktree add trees/feature-auth &lt;span class="nt"&gt;-b&lt;/span&gt; feature/auth
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;/.env"&lt;/span&gt; trees/feature-auth/.env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you have multiple environment files like &lt;code&gt;.env.local&lt;/code&gt; and &lt;code&gt;.env.development&lt;/code&gt;, it helps to wrap this in a shell function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.zshrc or ~/.bashrc&lt;/span&gt;
wt&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  git worktree add &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-b&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;return &lt;/span&gt;1
  &lt;span class="k"&gt;for &lt;/span&gt;f &lt;span class="k"&gt;in&lt;/span&gt; .env .env.local .env.development&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt;
    &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"linked &lt;/span&gt;&lt;span class="nv"&gt;$f&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
  &lt;span class="k"&gt;done&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Usage&lt;/span&gt;
wt trees/feature-auth feature/auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use Claude Code, the official &lt;code&gt;--worktree&lt;/code&gt; flag makes the flow even simpler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a worktree and start Claude Code in one step&lt;/span&gt;
claude &lt;span class="nt"&gt;--worktree&lt;/span&gt; feature-auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single command creates &lt;code&gt;.claude/worktrees/feature-auth/&lt;/code&gt;, creates the branch, and starts the Claude session inside it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Working inside a worktree
&lt;/h2&gt;

&lt;p&gt;Once the worktree exists, you just move into that directory and work as usual. IDEs and editors can also open each worktree as a separate project.&lt;/p&gt;

&lt;p&gt;With AI agents, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 - feature work&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../my-app-feature-auth
claude &lt;span class="c"&gt;# or codex, gemini-cli, etc.&lt;/span&gt;

&lt;span class="c"&gt;# Terminal 2 - hotfix work, at the same time&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; ../my-app-hotfix-login
claude
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While one agent is working, you can review the output from another. You stop being the person waiting for code and start being the person directing parallel work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg883t1qujwy56zfz8fgq.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg883t1qujwy56zfz8fgq.webp" alt="A repository graph showing multiple agent branches connected to a shared Git history" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Commits inside each worktree work exactly the same as usual. Since the branch is already separated, you do not have to think much about context switching.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add &lt;span class="nb"&gt;.&lt;/span&gt;
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"feat: add auth middleware"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  After the work is done
&lt;/h2&gt;

&lt;p&gt;Once the task is finished, the rest looks like a normal PR workflow. If you are already inside the worktree directory, push naturally goes to that branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ../my-app-feature-auth
git push &lt;span class="nt"&gt;-u&lt;/span&gt; origin feature/auth
gh &lt;span class="nb"&gt;pr &lt;/span&gt;create &lt;span class="nt"&gt;--title&lt;/span&gt; &lt;span class="s2"&gt;"Add auth middleware"&lt;/span&gt; &lt;span class="nt"&gt;--base&lt;/span&gt; main
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After opening the PR, there are three common ways to integrate the branch back into the main worktree.&lt;/p&gt;

&lt;h3&gt;
  
  
  Squash merge — usually the cleanest for AI-generated work
&lt;/h3&gt;

&lt;p&gt;Inside the worktree, the agent may have made several exploratory commits. Those process commits usually do not need to live forever in main history. Squash merge compresses everything into one clean commit. On GitHub you can choose “Squash and merge”, or in the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In the main worktree&lt;/span&gt;
git checkout main
git merge &lt;span class="nt"&gt;--squash&lt;/span&gt; feature/auth
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"feat: add auth middleware"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Rebase merge — when you want perfectly linear history
&lt;/h3&gt;

&lt;p&gt;This rebases the worktree branch on top of main and then fast-forwards it in. It is useful when the commits are already clean and meaningful on their own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Inside the worktree (or use master instead of main if needed)&lt;/span&gt;
git rebase main

&lt;span class="c"&gt;# Back in the main worktree&lt;/span&gt;
git checkout main
git merge feature/auth &lt;span class="nt"&gt;--ff-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Merge commit — when you want to preserve branch history
&lt;/h3&gt;

&lt;p&gt;This creates an explicit merge commit, leaving a visible record that &lt;code&gt;feature/auth&lt;/code&gt; was integrated at that point in time. It is useful for larger work units or when branch-level traceability matters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git checkout main
git merge &lt;span class="nt"&gt;--no-ff&lt;/span&gt; feature/auth
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For many small tasks handled by AI through harness engineering, squash merge tends to fit best. There is usually no reason to keep all the intermediate trial commits. From the perspective of the main worktree, one clean commit that says “this feature was added” is often enough.&lt;/p&gt;

&lt;p&gt;Once the merge is done, clean up the worktree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Removing a worktree
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Remove the worktree&lt;/span&gt;
git worktree remove ../my-app-feature-auth

&lt;span class="c"&gt;# Delete the branch too (-d can refuse after a squash merge; use -D if Git reports it as not fully merged)&lt;/span&gt;
git branch &lt;span class="nt"&gt;-d&lt;/span&gt; feature/auth

&lt;span class="c"&gt;# Clean up stale worktree metadata&lt;/span&gt;
git worktree prune
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the worktree was created through Claude Code’s &lt;code&gt;--worktree&lt;/code&gt; option, Claude Code automatically deletes the worktree and its branch when the session ends with no changes. If commits exist, Claude asks whether to keep the worktree.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to watch out for
&lt;/h2&gt;

&lt;p&gt;Do not parallelize tasks that edit the same files unless you are prepared to handle merge conflicts later. Worktrees isolate working directories, not changes: if two agents edit the same lines, the conflict simply shows up at merge time. You still need to split work along sane boundaries.&lt;/p&gt;

&lt;p&gt;Servers using the same port will also collide. If you run multiple dev servers from multiple worktrees at once, make sure they use different ports or only run one at a time.&lt;/p&gt;
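
&lt;p&gt;One way to keep ports apart without tracking them by hand is to derive the port from the worktree directory name. The sketch below is only an illustration: &lt;code&gt;portForWorktree&lt;/code&gt; is a hypothetical helper, the base port 3000 and the range of 1000 are arbitrary assumptions, and two names can still hash to the same port.&lt;/p&gt;

```javascript
// Hypothetical helper: map a worktree directory name to a stable dev-server port.
// The base port (3000) and range (1000) are arbitrary assumptions.
function portForWorktree(dirName, basePort = 3000, range = 1000) {
  let hash = 0;
  for (const ch of dirName) {
    // Simple rolling hash, kept inside the unsigned 32-bit range.
    hash = (hash * 31 + ch.codePointAt(0)) % 4294967296;
  }
  return basePort + (hash % range);
}

for (const dir of ["my-app", "my-app-feature-auth", "my-app-hotfix-login"]) {
  console.log(dir, "->", portForWorktree(dir));
}
```

&lt;p&gt;The same directory name always maps to the same port, so each worktree gets a predictable address. You would still pass the result to your dev server through whatever port setting it supports.&lt;/p&gt;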

&lt;p&gt;It is also worth running &lt;code&gt;git worktree prune&lt;/code&gt; periodically. If you manually delete directories, stale worktree metadata can linger and clutter the list. &lt;code&gt;git worktree prune&lt;/code&gt; cleans those invalid references up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;git worktree&lt;/code&gt; first appeared in 2015, but this may be the exact era it was waiting for. Once AI coding agents become a normal part of development, running multiple isolated workspaces in parallel stops being a niche trick and starts feeling like the default.&lt;/p&gt;

&lt;p&gt;Instead of repeatedly stashing and checking out branches, you can switch context just by changing directories. That is why &lt;code&gt;git worktree&lt;/code&gt; feels less like a neat Git feature now, and more like core infrastructure for parallel AI-assisted development.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://git-scm.com/docs/git-worktree" rel="noopener noreferrer"&gt;Git Official Documentation - git-worktree&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://superset.sh/blog/git-worktrees-history-deep-dive" rel="noopener noreferrer"&gt;Git Worktrees: The Feature That Waited a Decade for Its Moment&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/common-workflows" rel="noopener noreferrer"&gt;Claude Code Common Workflows - Worktree Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.dandoescode.com/blog/parallel-vibe-coding-with-git-worktrees" rel="noopener noreferrer"&gt;Parallel Vibe Coding: Using Git Worktrees with Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://incident.io/blog/shipping-faster-with-claude-code-and-git-worktrees" rel="noopener noreferrer"&gt;How we're shipping faster with Claude Code and Git Worktrees&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/@dtunai/mastering-git-worktrees-with-claude-code-for-parallel-development-workflow-41dc91e645fe" rel="noopener noreferrer"&gt;Mastering Git Worktrees with Claude Code for Parallel Development Workflow&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>git</category>
      <category>productivity</category>
    </item>
    <item>
      <title>fractional-indexing: Implementing Drag-and-Drop Ordering and Avoiding Index Collisions</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Mon, 23 Mar 2026 12:46:47 +0000</pubDate>
      <link>https://dev.to/sonim1/fractional-indexing-implementing-drag-and-drop-ordering-and-avoiding-index-collisions-g3</link>
      <guid>https://dev.to/sonim1/fractional-indexing-implementing-drag-and-drop-ordering-and-avoiding-index-collisions-g3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Avoiding index collisions in sortable lists&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The limits of integer indices
&lt;/h2&gt;

&lt;p&gt;If you have ever built a drag-and-drop list, you have probably stored the order like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens if you move &lt;code&gt;b&lt;/code&gt; to the front? &lt;code&gt;b&lt;/code&gt; becomes 0, and &lt;code&gt;a&lt;/code&gt; is still 1, so at first glance it seems fine. But if you later want to insert a new item between &lt;code&gt;b&lt;/code&gt; and &lt;code&gt;a&lt;/code&gt;, there is no integer left between 0 and 1. The new item has to take 1, &lt;code&gt;a&lt;/code&gt; shifts to 2, and in a densely numbered list every item after it, such as &lt;code&gt;c&lt;/code&gt;, shifts too. In other words, changing one item often forces you to update several others.&lt;/p&gt;
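
&lt;p&gt;The cost is easy to see in a small sketch. It assumes the list has been renumbered densely to 0, 1, 2 after the move, and &lt;code&gt;insertWithIntegerOrder&lt;/code&gt; is a hypothetical helper written for this example. It inserts one item and reports which existing rows had to be rewritten.&lt;/p&gt;

```javascript
// Insert into an integer-ordered list and report which existing rows had to change.
function insertWithIntegerOrder(items, newId, position) {
  const updated = [];
  const shifted = items.map((item) => {
    if (item.order >= position) {
      updated.push(item.id); // this row needs an UPDATE as well
      return { ...item, order: item.order + 1 };
    }
    return item;
  });
  const next = [...shifted, { id: newId, order: position }];
  next.sort((x, y) => x.order - y.order);
  return { items: next, updated };
}

const list = [
  { id: "b", order: 0 },
  { id: "a", order: 1 },
  { id: "c", order: 2 },
];

const { items, updated } = insertWithIntegerOrder(list, "d", 1);
console.log(items.map((item) => item.id + ":" + item.order).join(" "));
console.log("existing rows rewritten:", updated.join(", "));
```

&lt;p&gt;One insert ends up rewriting two existing rows here, and the number grows with the length of the list after the insertion point.&lt;/p&gt;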

&lt;p&gt;In collaborative tools where multiple users can reorder items at the same time, that structure tends to create collisions. If two people modify the same part of the list concurrently, the final order can become inconsistent or trigger large update conflicts.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrkm2e79e3z35g3eaoqs.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzrkm2e79e3z35g3eaoqs.webp" alt="Drag-and-drop UI example with multiple items being reordered" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is fractional-indexing?
&lt;/h2&gt;

&lt;p&gt;David Greenspan introduced this approach in &lt;a href="https://observablehq.com/@dgreensp/implementing-fractional-indexing" rel="noopener noreferrer"&gt;Implementing Fractional Indexing&lt;/a&gt;. The core idea is simple: instead of using integers for order, use &lt;strong&gt;sortable string keys&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a0"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a1"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a2"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want to insert an item between &lt;code&gt;a1&lt;/code&gt; and &lt;code&gt;a2&lt;/code&gt;? You can generate a middle key like &lt;code&gt;a1V&lt;/code&gt;. Everything else stays unchanged.&lt;/p&gt;

&lt;p&gt;Figma uses this idea in its multiplayer editing system. It manages child-node ordering with fractional indexing, which means reordering typically updates only the moved node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the library
&lt;/h2&gt;

&lt;p&gt;In JavaScript, you can use the &lt;a href="https://www.npmjs.com/package/fractional-indexing" rel="noopener noreferrer"&gt;&lt;code&gt;fractional-indexing&lt;/code&gt;&lt;/a&gt; package.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;fractional-indexing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;generateKeyBetween&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;generateNKeysBetween&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;fractional-indexing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// First key&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateKeyBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// → 'a0'&lt;/span&gt;

&lt;span class="c1"&gt;// Insert at the beginning&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;zeroth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateKeyBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// → 'Zz'&lt;/span&gt;

&lt;span class="c1"&gt;// Insert at the end&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;second&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateKeyBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;first&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// → 'a1'&lt;/span&gt;

&lt;span class="c1"&gt;// Generate a key between two existing keys&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;third&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateKeyBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 'a2'&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateKeyBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;second&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;third&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// → 'a1V'&lt;/span&gt;

&lt;span class="c1"&gt;// Generate multiple keys at once&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generateNKeysBetween&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// → ['a0G', 'a0V']&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You store each key as a string in the database and sort with a plain &lt;code&gt;ORDER BY&lt;/code&gt;. The scheme is designed so that lexicographic string order matches the intended item order.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other ways to manage ordering
&lt;/h2&gt;

&lt;p&gt;fractional-indexing is not the only option. There are a few common alternatives, and each comes with tradeoffs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gap strategy with integers
&lt;/h3&gt;

&lt;p&gt;This is the simplest approach. You start with generous spacing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3000&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To insert between &lt;code&gt;a&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;, you assign &lt;code&gt;order: 1500&lt;/code&gt;. It is simple and fast. The downside is that the gaps run out. Every insert into the same region halves the remaining space, so you end up with values like &lt;code&gt;1500 → 1250 → 1375 → ...&lt;/code&gt;, and a gap of 1000 survives fewer than ten inserts at the same spot. Once a gap is exhausted, a full reindex becomes unavoidable.&lt;/p&gt;
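
&lt;p&gt;To make the exhaustion concrete, here is a minimal sketch in plain JavaScript. The helper names &lt;code&gt;orderBetween&lt;/code&gt; and &lt;code&gt;reindex&lt;/code&gt; are hypothetical, not part of any library, and the sketch assumes integer order values with a starting gap of 1000.&lt;/p&gt;

```javascript
// Hypothetical helper: pick an integer order value between two neighbours.
// Returns null when the gap is exhausted and a reindex is required.
function orderBetween(prev, next) {
  const mid = Math.floor((prev + next) / 2);
  return mid === prev ? null : mid;
}

// Hypothetical helper: rewrite every order value with a fresh, even gap.
function reindex(items, gap = 1000) {
  return [...items]
    .sort((x, y) => x.order - y.order)
    .map((item, i) => ({ ...item, order: (i + 1) * gap }));
}

// Keep inserting directly after the item at 1000, always before the previous insert.
const lower = 1000;
let upper = 2000;
let inserts = 0;
while (true) {
  const mid = orderBetween(lower, upper);
  if (mid === null) break; // no integer left between the neighbours
  upper = mid;
  inserts += 1;
}
console.log(inserts); // → 9
```

&lt;p&gt;A larger starting gap only postpones the reindex. It does not remove the need for one.&lt;/p&gt;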

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwhcwtykb4gpiyichzel.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvwhcwtykb4gpiyichzel.webp" alt="Illustration of inserting 1.5 between 1 and 2 with the gap strategy" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Timestamp-based ordering
&lt;/h3&gt;

&lt;p&gt;Another approach is to use insertion time as the order value.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;// 1700000001000&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the easiest approach to implement. The problem is that if two clients insert at nearly the same time, or their clocks are out of sync, the resulting order becomes ambiguous. For a single-user app, that may be fine. In collaborative environments, it is usually not reliable enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linked list ordering
&lt;/h3&gt;

&lt;p&gt;In this model, each item points to the next item.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"next"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"next"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"next"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The nice part is that insertion only touches nearby nodes, so the update scope stays small. The downside is that reading the full order requires traversal, and you lose the convenience of a simple database &lt;code&gt;ORDER BY&lt;/code&gt;. If your service reads ordered lists frequently, query complexity can become a real cost.&lt;/p&gt;
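
&lt;p&gt;The tradeoff can be sketched in a few lines of plain JavaScript. The function names &lt;code&gt;toOrderedIds&lt;/code&gt; and &lt;code&gt;insertAfter&lt;/code&gt; are made up for illustration, and the sketch assumes a single well-formed chain with no cycles.&lt;/p&gt;

```javascript
// Rows as they might come back from the database, in arbitrary order.
const rows = [
  { id: 'c', next: null },
  { id: 'a', next: 'b' },
  { id: 'b', next: 'c' },
];

// Rebuild the display order by following `next` pointers from the head.
function toOrderedIds(list) {
  const byId = new Map(list.map((row) => [row.id, row]));
  const pointedTo = new Set(list.map((row) => row.next));
  // The head is the only row that no other row points to.
  let current = list.find((row) => !pointedTo.has(row.id));
  const ordered = [];
  while (current) {
    ordered.push(current.id);
    current = current.next === null ? undefined : byId.get(current.next);
  }
  return ordered;
}

// Insert a new row after `afterId`: one new row plus one updated pointer.
function insertAfter(list, afterId, newId) {
  const prev = list.find((row) => row.id === afterId);
  const inserted = { id: newId, next: prev.next };
  return [
    ...list.map((row) => (row.id === afterId ? { ...row, next: newId } : row)),
    inserted,
  ];
}

console.log(toOrderedIds(rows)); // → ['a', 'b', 'c']
console.log(toOrderedIds(insertAfter(rows, 'a', 'x'))); // → ['a', 'x', 'b', 'c']
```

&lt;p&gt;The write touches two rows, but every read has to load the whole list and walk it in application code.&lt;/p&gt;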

&lt;h2&gt;
  
  
  How to choose
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Implementation complexity&lt;/th&gt;
&lt;th&gt;Collaboration safety&lt;/th&gt;
&lt;th&gt;Long-term operation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;fractional-indexing&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Rebalancing needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;linked list&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;More complex queries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;integer gaps&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Reindexing needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;timestamps&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Collision risk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If your product involves frequent reordering or multiple users interacting with the same list, fractional-indexing is close to a practical default. For simpler single-user apps, a gap strategy with integers can still be perfectly sufficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to watch out for
&lt;/h2&gt;

&lt;p&gt;Keys can grow longer over time. If you keep generating new keys inside the same narrow interval, the string length increases. That is why long-running systems often need a &lt;strong&gt;rebalancing&lt;/strong&gt; step that periodically rewrites the ordering keys.&lt;/p&gt;
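
&lt;p&gt;A rebalancing pass could look roughly like the sketch below. The &lt;code&gt;rebalance&lt;/code&gt; function and the &lt;code&gt;paddedKeys&lt;/code&gt; stand-in generator are hypothetical. With the real library, you would likely pass something like &lt;code&gt;(n) =&gt; generateNKeysBetween(null, null, n)&lt;/code&gt; instead.&lt;/p&gt;

```javascript
// Compare keys by character code, matching how the keys are meant to sort.
function compareKeys(x, y) {
  if (x === y) return 0;
  return x > y ? 1 : -1;
}

// Hypothetical rebalance: keep the current order, replace every key with a fresh short one.
// `generateKeys(n)` is injected and must return n keys in ascending order.
function rebalance(items, generateKeys) {
  const sorted = [...items].sort((p, q) => compareKeys(p.key, q.key));
  const fresh = generateKeys(sorted.length);
  return sorted.map((item, i) => ({ ...item, key: fresh[i] }));
}

// Stand-in generator for this demo only: zero-padded counters.
const paddedKeys = (n) =>
  Array.from({ length: n }, (_, i) => String(i).padStart(4, '0'));

const items = [
  { id: 'x', key: 'a0V' },
  { id: 'y', key: 'a0G' },
  { id: 'z', key: 'a0K' },
  { id: 'w', key: 'a0' },
];

console.log(rebalance(items, paddedKeys).map((item) => item.id + ':' + item.key));
// → ['w:0000', 'y:0001', 'z:0002', 'x:0003']
```

&lt;p&gt;Because a rebalance rewrites every key, it needs to run when no concurrent inserts are in flight, or clients may generate keys against stale neighbours.&lt;/p&gt;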

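&lt;p&gt;Another important detail is consistent string comparison. The default keys mix digits, uppercase letters, and lowercase letters, and they rely on plain character-code order (for example, &lt;code&gt;'Zz'&lt;/code&gt; sorts before &lt;code&gt;'a0'&lt;/code&gt;). Your database, server, and client should all compare keys that way. If one layer uses a case-insensitive or locale-aware comparison, the rendered order can drift from the intended one.&lt;/p&gt;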
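
&lt;p&gt;A small sketch of how the drift can show up. The comparator names are hypothetical, and &lt;code&gt;compareIgnoringCase&lt;/code&gt; is only a stand-in for a case-insensitive collation, not a model of any specific database.&lt;/p&gt;

```javascript
// Keys as the default base-62 scheme might produce them.
const sampleKeys = ['a1', 'Zz', 'a0V', 'a0'];

// Character-code comparison: uppercase letters sort before lowercase ones.
function compareCodeUnits(x, y) {
  if (x === y) return 0;
  return x > y ? 1 : -1;
}

// Stand-in for a case-insensitive collation.
function compareIgnoringCase(x, y) {
  return compareCodeUnits(x.toLowerCase(), y.toLowerCase());
}

console.log([...sampleKeys].sort(compareCodeUnits));
// → ['Zz', 'a0', 'a0V', 'a1']
console.log([...sampleKeys].sort(compareIgnoringCase));
// → ['a0', 'a0V', 'a1', 'Zz']
```

&lt;p&gt;The item that was inserted at the very beginning ends up last under the case-insensitive comparison. On the database side, a byte-wise collation on the key column (in PostgreSQL, for example, &lt;code&gt;COLLATE "C"&lt;/code&gt;) is one way to keep the layers in agreement, but check what your own setup does by default.&lt;/p&gt;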

&lt;h2&gt;
  
  
  Closing thoughts
&lt;/h2&gt;

&lt;p&gt;If you manage ordering with plain integers, you eventually run into friction. fractional-indexing is a fairly elegant way to avoid that problem. It is especially worth considering when you need realtime collaboration, optimistic updates, or frequent drag-and-drop reordering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/fractional-indexing" rel="noopener noreferrer"&gt;fractional-indexing - npm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://observablehq.com/@dgreensp/implementing-fractional-indexing" rel="noopener noreferrer"&gt;Implementing Fractional Indexing - David Greenspan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.figma.com/blog/realtime-editing-of-ordered-sequences/" rel="noopener noreferrer"&gt;Figma: Realtime Editing of Ordered Sequences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.figma.com/blog/how-figmas-multiplayer-technology-works/" rel="noopener noreferrer"&gt;Figma: How Figma's multiplayer technology works&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>algorithms</category>
      <category>computerscience</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Why AI Gives You a Headache: Managing Cognitive Fatigue for Developers</title>
      <dc:creator>Kendrick B. Jung</dc:creator>
      <pubDate>Wed, 18 Feb 2026 20:17:02 +0000</pubDate>
      <link>https://dev.to/sonim1/why-ai-gives-you-a-headache-managing-cognitive-fatigue-for-developers-12dg</link>
      <guid>https://dev.to/sonim1/why-ai-gives-you-a-headache-managing-cognitive-fatigue-for-developers-12dg</guid>
      <description>&lt;h2&gt;
  
  
  A New Kind of Fatigue in the AI Era
&lt;/h2&gt;

&lt;p&gt;Recently, I've been subscribed to Claude Code Max, Codex (ChatGPT Pro), and Antigravity (Google AI Pro) all at once, and my workload has increased dramatically. At some point, I started getting headaches. I assumed it was lack of sleep, until our CTO at work asked whether I had been getting headaches. As it happened, I had taken Tylenol just the day before. When I talked to other people who use AI heavily, several said they occasionally get headaches as well, so I decided to investigate. It turns out I'm not alone. Community posts asking "Does anyone get headaches when using AI? Planning and directing takes so much brainpower" are becoming common.&lt;/p&gt;

&lt;p&gt;A 2025 academic study also found that deeper engagement with GenAI doesn't reduce cognitive burden—it actually amplifies it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7s2zx0vhhm51n9hpyqq8.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7s2zx0vhhm51n9hpyqq8.webp" alt="Decision Fatigue" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why AI Exhausts Your Brain
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Decision Fatigue Explosion
&lt;/h3&gt;

&lt;p&gt;In traditional development, you'd spend a day diving deep into one design problem. Implementation took time, giving you the luxury of slowly making architectural decisions. AI flips this dynamic. When you can prototype three approaches in the time it previously took to build one, you must constantly make architecture-level decisions. The bottleneck shifts from "can we build this?" to "should we build this, and how?"&lt;/p&gt;

&lt;h3&gt;
  
  
  Continuous Task Initiation Burden
&lt;/h3&gt;

&lt;p&gt;AI doesn't move on its own. "Remove this," "redo it," "change direction": you must constantly direct the next action. Initiating tasks over and over like this draws heavily on your brain's executive function, which makes it a high-intensity cognitive task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt Fatigue
&lt;/h3&gt;

&lt;p&gt;A 2025 study of 832 GenAI users found that uncertainty about how to write prompts causes emotional fatigue, while unexpected responses cause cognitive fatigue. The process of choosing words and designing context to get desired results consumes a new type of energy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Switching Costs
&lt;/h3&gt;

&lt;p&gt;Prompt writing → result review → revision instruction → re-review. This loop repeats dozens or hundreds of times daily. While AI doesn't tire from context switching, the human brain pays a transition cost each time it changes modes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Solutions That Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The 20-20-20 Rule
&lt;/h3&gt;

&lt;p&gt;Every 20 minutes, look at something 20 feet (6m) away for 20 seconds. Proposed by optometrist Dr. Jeffrey Anshel in the 1990s, this rule is recommended by both the American Optometric Association (AOA) and the American Academy of Ophthalmology (AAO). A 2022 study found that following the rule for two weeks significantly reduced digital eye strain symptoms.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55dwdk7nls9u3uiwicvp.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55dwdk7nls9u3uiwicvp.webp" alt="20-20-20 Rule" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I happen to have a view of the Mississauga skyline from my place, so every 20 minutes I look out at the open landscape for 20 seconds. Having a distant view to rest your eyes on makes practicing this rule much easier than trying to focus on a wall or nearby objects.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshy8zrp2w6qp19uyphnz.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshy8zrp2w6qp19uyphnz.webp" alt="View from the window" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Batch Prompting
&lt;/h3&gt;

&lt;p&gt;Instead of continuously micro-directing, give broad guidelines once, let AI draft the solution, then review the results in batches. This reduces the number of context switches your brain has to make. For example, tools like oh-my-claudecode's autopilot or ralplan's autonomous execution modes let you review outputs without directing every step.&lt;/p&gt;

&lt;h3&gt;
  
  
  Intentional Downtime
&lt;/h3&gt;

&lt;p&gt;After 50 minutes of focus, you need 10 minutes away from screens entirely. This allows your brain's Default Mode Network (DMN) to activate, consolidating and organizing information—a completely different brain activity from continuously reading and judging AI outputs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Posture and Environment Check
&lt;/h3&gt;

&lt;p&gt;This one is easy to overlook. When concentrating on AI conversations, you may unconsciously tense your neck and shoulders, which can lead to tension headaches. Simply positioning your monitor at eye level and sitting at least 63cm (about arm's length) from the screen makes a noticeable difference.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key: Not "Use Less" but "Use Differently"
&lt;/h2&gt;

&lt;p&gt;The solution to AI fatigue isn't to use AI less. The key is using it with boundaries, intention, and awareness that you're not a machine.&lt;/p&gt;

&lt;p&gt;Acknowledging that productivity gains come with increased cognitive costs, and managing those costs, has become the new essential skill for developers in the AI era.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refs
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Siddhant Khare, "AI fatigue is real and nobody talks about it" (2025) — &lt;a href="https://siddhantkhare.com/writing/ai-fatigue-is-real" rel="noopener noreferrer"&gt;siddhantkhare.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;WarpedVisions, "The hidden cost of AI-assisted development: cognitive fatigue" (2025) — &lt;a href="https://warpedvisions.org/blog/2025/hitting-the-wall-at-ai-speed/" rel="noopener noreferrer"&gt;warpedvisions.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ScienceDirect, "Fatigued by uncertainties: Exploring the cognitive and emotional costs of generative AI usage" (2025) — &lt;a href="https://www.sciencedirect.com/science/article/abs/pii/S0268401225001422" rel="noopener noreferrer"&gt;sciencedirect.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;MDPI, "Generative AI and Cognitive Challenges in Research" (2025) — &lt;a href="https://www.mdpi.com/2227-7080/13/11/486" rel="noopener noreferrer"&gt;mdpi.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Human Clarity Institute, "Cognitive Load, Fatigue &amp;amp; Decision Offloading 2025 Data Summary" — &lt;a href="https://humanclarityinstitute.com/data/ai-fatigue-decision-2025/" rel="noopener noreferrer"&gt;humanclarityinstitute.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Healthline, "20-20-20 Rule: Does It Help Prevent Digital Eyestrain?" (2025) — &lt;a href="https://www.healthline.com/health/eye-health/20-20-20-rule" rel="noopener noreferrer"&gt;healthline.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;ScienceDirect, "The effects of breaks on digital eye strain, dry eye and binocular vision: Testing the 20-20-20 rule" (2022) — &lt;a href="https://www.sciencedirect.com/science/article/pii/S1367048422001990" rel="noopener noreferrer"&gt;sciencedirect.com&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>health</category>
      <category>devrel</category>
    </item>
  </channel>
</rss>
