DEV Community: Mehmet Aras

The Public Code Behind xAI's "For You" Feed

Mehmet Aras — Wed, 20 May 2026 07:45:50 +0000

Disclaimer: This post is based on the public
xai-org/x-algorithm repository. It describes what the
released code shows and where that public release stops.

Open X, scroll "For You," and the simple picture is easy: posts go in, the best-looking ones rise.

Look at the repo, and that picture changes. "For You" does not look like a place that only scores posts and sorts them. The code shows viewer guesses, post judgments, reply scores, and items placed into the list after ranking.

That is the interesting shift: the story is not just ranking logic. It is the information carried around the feed.

Not gender knowledge. A gender guess.

The code does not say "X knows your gender." It shows something narrower: a gender-guess step in the Phoenix feed path. Not certainty. A label and a confidence score.

That matters because the value is not about the post. It travels with the request for the person asking for the feed.

The step is not always active. When it runs, a new account gets a fresh prediction if there is no saved result. For an older account, this file only uses the saved guess. Nearby, there is also a slot for demographics. The repo shows the slot, not the full contents.

A viewer signal this close to the feed does not look like decoration. The repo does not show the final use. But it does show the kind of information the feed path is prepared to carry.

Posts carry judgment fields too

The post side is easier to see.

Grox returns fields like quality score, description, tags, topic categories, whether an image is editable by Grok, slop score, and minor-related score.

The names matter. They do not read like private machine internals. They read like judgments attached to a post.

This is where "content understanding" turns into data on the post. The exact prompt Grox receives is not in the repo. The output fields are visible.

Replies get scored too

Replies have their own Grox scorer.

Score: 0, 1, 2, 3. The scorer looks at the surrounding conversation, not just the reply as a standalone sentence.

That does not tell us where the score appears in the product. But it changes the picture: a reply is not only a comment under a post. It is also content to evaluate.

The feed is not just ranked. It is arranged.

A ranked post list is only part of it. The code shows items from other sources being placed into that list.

The For You code lists ads, Who to Follow suggestions, prompts, and push-to-home posts as separate sources. Prompts and Who to Follow suggestions are inserted into the list. A push-to-home item is placed at the very top when present.

The push-to-home path is the cleanest example. When the request names a tweet for push-to-home, the code builds a special feed item for it. If that tweet starts a conversation, the code attaches up to three top repliers.

Ads and prompts come with context too. Ads receive fields like country, language, IP address, user agent, user roles, device id, mobile device id, and mobile ad id. Prompts receive user and device context plus the prompt types supported by the app.

So the feed is not just a scoreboard. It is scored posts plus inserted modules, special items, and context passed to other systems.

What the public repo leaves out

The public repo is useful because it shows the rough shape of the system. It also shows where it stops.

The clearest gap is home-mixer. The code references a shared helper module 18 times across 16 Rust files, including the part that builds the Phoenix request. But that module is not in the released tree. The simple result: home-mixer does not compile as-is.

This does not look like a tiny missing detail. The feed code leans on a shared helper layer, and that layer is not visible in the repo.

Grox has similar edges. Some files refer to prompt, config, or model folders; those folders are not in the repo. Phoenix shows the same pattern: the docs explain how to run a system, but the code needed to run it is not visible in the repo.

So it is better to think about the repo in two parts: what it shows, and what it leaves out. The visible part gives serious clues about the feed. The missing part stops us from proving exactly how everything connects in the live system.

Closing

The repo's real message is this: "For You" does not just pick posts. It carries information.

That information includes viewer guesses, post judgments, reply scores, inserted feed items, and context for ads and prompts. But the repo does not show exactly how all of those pieces are used in the live system.

That is the difference: "For You" looks less like a single ranking rule and more like a wider system carrying context around the feed.

Get in touch: arasmehmet.com

Where this lives in the code

Release frame

release note: README.md:L26-L42
Grox release description: README.md:L34
Phoenix ranking summary: README.md:L53-L56
public-release boundary examples:
- home-mixer/lib.rs:L1-L13
- grox/classifiers/content/banger_initial_screen.py:L8-L13
- phoenix/README.md:L210-L238

Viewer context

inferred-gender behavior:
- home-mixer/query_hydrators/user_inferred_gender_query_hydrator.rs:L30-L47
- "new user" cutoff: home-mixer/query_hydrators/user_inferred_gender_query_hydrator.rs:L56
demographics behavior:
- home-mixer/query_hydrators/user_demographics_query_hydrator.rs:L14-L24
- query fields: home-mixer/models/query.rs:L72-L77
connected in Phoenix:
- modules declared: home-mixer/query_hydrators/mod.rs:L17-L18
- added to Phoenix query steps: home-mixer/candidate_pipeline/phoenix_candidate_pipeline.rs:L228-L231

Post judgments

Grox result fields:
- grox/classifiers/content/banger_initial_screen.py:L30-L38
- parsed into category results: grox/classifiers/content/banger_initial_screen.py:L102-L155
connected in Grox:
- stream includes this check: grox/generators/stream_generator.py:L71-L78
- plan for this check: grox/plans/plan_initial_banger.py:L14-L37
- main plan includes it: grox/plans/plan_master.py:L18-L34
- engine starts the main plan: grox/engine.py:L51-L55
saved output:
- metrics and fields: grox/tasks/task_pub.py:L166-L219, grox/tasks/task_pub.py:L308-L325
- write call: grox/tasks/task_pub.py:L339-L340
missing prompt body:
- grox/classifiers/content/banger_initial_screen.py:L13
- no grox/prompts/ directory in this checkout

Reply scoring

reply scorer:
- grox/classifiers/content/reply_ranking.py:L29-L49
- thread view and signals: grox/classifiers/content/reply_ranking.py:L53-L72
- fallback sampling: grox/classifiers/content/reply_ranking.py:L74-L88
- score cleanup: grox/classifiers/content/reply_ranking.py:L103-L158
connected in Grox:
- grox/tasks/task_rank_replies.py:L11-L31
- stream eligibility: grox/generators/stream_generator.py:L58-L68
- recovery stream: grox/generators/stream_generator.py:L173-L178
- plan: grox/plans/plan_reply_ranking.py:L10-L27
- plan master: grox/plans/plan_master.py:L18-L34

Feed assembly

feed sources:
- home-mixer/candidate_pipeline/for_you_candidate_pipeline.rs:L164-L177
- selector constructed: home-mixer/candidate_pipeline/for_you_candidate_pipeline.rs:L179-L201
mixing and insertion:
- home-mixer/selectors/blender_selector.rs:L30-L51
- prompts: home-mixer/selectors/blender_selector.rs:L77-L85
- Who to Follow: home-mixer/selectors/blender_selector.rs:L89-L100
- push-to-home pinned at top: home-mixer/selectors/blender_selector.rs:L103-L113
push-to-home source:
- home-mixer/sources/push_to_home_source.rs:L19-L67
- top repliers: home-mixer/sources/push_to_home_source.rs:L70-L95
ad and prompt context:
- home-mixer/sources/ads_source.rs:L41-L57
- home-mixer/sources/prompts_source.rs:L18-L46
- home-mixer/sources/prompts_source.rs:L49-L98

Public boundaries

home-mixer missing shared module:
- module list: home-mixer/lib.rs:L1-L13
- rg -n "crate::util" home-mixer --glob "*.rs" returns 18 hits across 16 Rust files
- Phoenix request builder import: home-mixer/scorers/phoenix_scorer.rs:L8
- Phoenix request builder call: home-mixer/scorers/phoenix_scorer.rs:L84
- Phoenix request cache helpers: home-mixer/side_effects/phoenix_request_cache_side_effect.rs:L4-L6, home-mixer/side_effects/phoenix_request_cache_side_effect.rs:L77-L85
- other examples pointing to missing crate::util: home-mixer/server.rs:L3-L9, home-mixer/scorers/ranking_scorer.rs:L3-L4, home-mixer/scorers/vm_ranker.rs:L1-L5, home-mixer/for_you_server.rs:L3-L5
Grox missing packages:
- grox/classifiers/content/banger_initial_screen.py:L8-L13
- grox/classifiers/content/reply_ranking.py:L9-L13
- grox/classifiers/content/post_safety_screen_deluxe.py:L9-L14
- no grox/prompts/, grox/config/, or grox/lm/ directory in this checkout
Phoenix README boundary:
- phoenix/README.md:L210-L238
- find phoenix -maxdepth 3 -type f returns only phoenix/README.md in this checkout

Sharing memory between three AI agents: Claude Code, Codex, and Hermes

Mehmet Aras — Sun, 17 May 2026 13:22:29 +0000

Turkish version: blog.arasmehmet.com

What's in here:

Hot-swap pain
Why not just X?
Setup at a glance
The hub, INDEX.md
Agent workflow rules
Privacy guardrails
Language regime
Conflict diagnostics
Hermes write example
Smoke test
What's still risky
Takeaway
Resources
Agent-ready implementation brief

1. Hot-swap pain

I run three AI agents in active rotation. Claude Code is my main driver for coding work day to day. Codex CLI steps in when Claude's quota runs out mid-task, or when I want a second opinion from a different model on the same problem. Hermes Agent handles the personal-life side of the setup: travel logistics, errands, language study.

Each agent starts from a different instruction file and a different conversation history. Claude Code reads CLAUDE.md. Codex reads AGENTS.md. Hermes reads SOUL.md. Those files tell each agent how to behave, but they are not a shared memory layer. The moment I switch from one agent to another, the next agent does not inherit the facts I taught the previous one: what I am working on, what changed yesterday, what I already asked it to remember.

For a while I solved this by re-explaining my context at the start of each session. After three rounds of "I am working on X, I prefer Y, please do not Z," the cost of re-priming an agent outgrew the value of the answer it produced.

The setup I actually wanted: one shared file tree that all three agents read and write. Memory follows the user, not the agent.

2. Why not just X?

The "AI memory" wave has produced several defaults that an engineer might reach for. Each one is wrong for this specific problem, in a specific way.

2a: Why not Obsidian

Obsidian is a desktop GUI. The vault on disk is just markdown, so technically an agent can read it, but Obsidian's plugin runtime, sync service, and conflict UI all assume a human is sitting in front of a window. A home device or a small VPS usually has no display, so Obsidian itself cannot run there. Obsidian Sync is a paid proprietary service, and Syncthing already does the file-sync part for nothing. The plugin sandbox is meant for human workflows: it cannot expose stable cross-process hooks for an external agent to attach to. Obsidian is the right tool for a human note-taker. It is not the right substrate when three agents must read and write the same files.

2b: Why not a vector DB or Mem0 or Letta

Vector DBs and managed memory services like Mem0 or Letta solve a real problem: similarity search over millions of documents for a multi-tenant product. None of that applies here. My total corpus is around fifty files. I want exact recall, not similarity.

Embeddings cost money on every write and add a runtime dependency that every agent has to integrate with. Flat markdown costs nothing to write and reads with grep. SDK constraints differ across agents: Claude Code can call one API, Codex another, Hermes a third. Markdown requires no SDK. When a vector DB returns the wrong chunk, I have to dig through a binary index. When a markdown file is wrong, the debugger is cat file.md.

For multi-tenant production agents over millions of documents, vector DB is the right tool. For one user's notes across three agents, it is over-engineering.

2c: Why not iCloud or Dropbox sync

iCloud and Dropbox already sync files across machines, so why not use them. Two reasons. First, both are closed-source: when something goes wrong the conflict handling is whatever the vendor wrote, and the resolution UI is built for humans choosing between two photos, not for an agent diagnosing a write race. Second, the Linux story differs: iCloud has no Linux client, so a Linux home device cannot participate; Dropbox ships a Linux daemon, but it adds a runtime dependency to keep alive, and the conflict file format is opaque to an agent. Neither service offers granular per-folder sync rules, and file rename or move semantics differ across platforms in subtle ways. Syncthing is open-source, gives me an explicit conflict file format with device IDs, exposes sync state via a local API, and has no per-vendor lock-in. The smaller surface area is also the entire reliability budget I have.

3. Setup at a glance

This article does not teach Syncthing from zero. It assumes one bidirectional synced folder exists across two or more machines. The reusable part is the memory contract layered on top of that folder. If you need to install Syncthing, the official getting-started guide covers it well: https://docs.syncthing.net/intro/getting-started.html

The setup is three machines, three Syncthing peers, and one shared folder called vault-shared. The folder syncs bidirectionally across all three peers. End-to-end propagation, write on machine A to visible on machine C, takes around ten seconds in my setup.

Why three machines? The Mac laptop is my daily driver; that is where I work. The utility VPS is always on, which makes it the propagation relay: if the Mac is asleep, the home device still has someone to sync with. The home device hosts the Hermes Agent and is small but reliable enough to run a long-lived service.

Each agent has filesystem access to its local copy of vault-shared. The agents do not know that Syncthing exists. They read and write local files; the folder happens to be kept consistent with two other machines in the background. From the agent's perspective, the file at vault-shared/INDEX.md is just a file, and the other agents are just other processes that occasionally edit that file when nobody is looking. This is the property that matters most: no agent has to import a sync library, learn an API, or know which machine its peers run on. Swap the sync mechanism for another tomorrow and the contract still holds.

4. The hub, INDEX.md

Every vault has a hub file: INDEX.md at the root. The hub is short, ~30 lines as a soft ceiling, and contains one-line summaries of every file in the vault with relative links. Each agent loads INDEX.md at session start. Detail files are read only when the current task needs them.

The reason is context window economics. Loading the whole vault on every prompt overflows on larger projects. The hub keeps the map; the detail files keep the territory. When the hub grows past 30 lines, split it: INDEX.md references a sub-index per topic, and sub-indices reference the actual files.

A working INDEX.md looks like this:

# Vault: name

Cross-agent shared notes. Claude Code, Codex CLI, and Hermes Agent
all read from and update this directory.

## Workflow

- Session start: read this file.
- Read a linked detail file on-demand when more context is needed.
  Do not open files not referenced here.
- Updates: edit/append/toggle at line level. Do not rewrite whole files
  unless the user explicitly says "reset".
- Re-read the target file before writing (another agent may have edited it).

## Files

- [personal/travel.md](personal/travel.md): trip plans, dates, status
- [projects/web-app.md](projects/web-app.md): active web project
- [projects/mobile-app.md](projects/mobile-app.md): secondary project
- [ops/servers.md](ops/servers.md): utility VPS, home device, services
- [notes/journal.md](notes/journal.md): daily log
- [reference/links.md](reference/links.md): external resources

The companion Gist contains the copy-paste version of this template.

The INDEX.md is the single source of truth for vault navigation. Agents read only the files it references; nothing else. Adding a new file to the vault means editing INDEX.md so every agent learns about it on its next session start. The convention is rigid on purpose: any agent that explores a file without an INDEX entry is doing something the next agent will not be able to find.

5. Agent workflow rules

Each agent reads its own config file at startup, and that file is where the workflow rules live. Claude Code reads CLAUDE.md, Codex reads AGENTS.md, Hermes reads SOUL.md. The same five rules go in all three. Identical instructions produce identical behavior, which is what makes hot-swap viable: I can switch agents mid-task and the next one operates on the vault by the same conventions, without me re-priming it.

The five rules:

Session start: read INDEX.md.
Read on-demand: open detail files only when needed for the current task. Do not preemptively load everything.
Line-level updates allowed: edit, append, toggle a checkbox, fix a typo, refine wording.
No wholesale rewrites: do not rewrite a whole file unless the user explicitly says "reset" or "start over". Preserve accumulated knowledge.
Re-read before write: another agent may have updated the file since this session started. Always read the current state before editing.

C3 pattern snippet, copy-pasteable into any agent config:

## Shared Vault

Vault path on this machine: <local-path>
(Syncthing bidirectional with two other peers, latency ~10 seconds.)

### Workflow

1. At session start, read <vault-path>/INDEX.md.
2. Read a linked file on-demand when detail is needed.
   Do not open files not referenced in INDEX.md.
3. When a state update happens (e.g., "passport arrived"), update
   the relevant linked file. Format:
   - State: `- [x] Passport: arrived (2026-05-14)`
   - Event log: `## YYYY-MM-DD HH:MM UTC` heading at file bottom.
4. Edit, append, or toggle existing lines. Bulk delete or whole-file
   rewrite is not allowed unless the user explicitly says "reset".
5. Re-read the file before writing. Another agent may have updated it.

The companion Gist contains the full copy-paste config, including the conflict-handling rule covered later.

Two of the rules deserve closer attention.

Rule 4, no wholesale rewrites, is the one agents violate most often. Coding-trained models default to "improve the file": they see imperfect prose, redundant lines, slightly out-of-date phrasing, and want to rewrite. In a single-user notebook that habit is fine; in a vault that holds a year of accumulated state across multiple agents, it is destructive. The rule has to be explicit. This is the failure mode I expect to see most often if the rule is missing.

Rule 5, re-read before write, is the only safety against concurrent edits. Sync runs in the background, so the version an agent loaded at session start may be stale within seconds. Reading immediately before writing closes that window.

6. Privacy guardrails

What the vault holds, and more importantly what it does not.

No exact financial figures. Use ranges ("savings ~mid five figures") or boolean status ("paid", "unpaid"). Real numbers belong in a private password manager, not in a synced vault.
No personal identifiers. National ID, passport number, visa number, driver's license: never. A status checkbox is enough: - [x] Passport: arrived (2026-05-14).
No credentials. API keys, OAuth tokens, SSH private keys: never. The vault is for facts and state, not secrets.
Absolute dates only. (2026-05-14) not "yesterday" or "last week". Vault content lasts longer than the conversation that created it.

C4 example lines, fictional:

## Travel

- [x] Visa application: submitted (2026-04-12)
- [x] Passport: arrived (2026-05-14)
- [ ] Flight booking: pending
- Hotel budget: agreed as a range
- Health insurance: paid through 2027-Q1

Agents have access to the vault. Anything written there is readable by any agent at any time. Treat the vault as semi-public knowledge to yourself across all three contexts.

7. Language regime

If you write in more than one language, your vault should pick a regime and stick to it.

The rule I use:

Headers and labels (## Section, list markers, dates): English.
Content sentences (descriptions, notes, plans): your primary writing language.
Technical names (tool names, model IDs, service names): English raw form.
The regime does not change within a single file.

Headers in English give agents stable anchors. Search and grep work cleanly. Link generation and cross-file navigation stay predictable. Content in any language is fine for reading and writing prose, because the agent is configured to respond in your primary language anyway. Mixed regime inside one file creates ambiguity: should the agent match the header language or the content language when adding a new entry?

This is one of those constraints that costs nothing to enforce up front and is painful to retrofit later.

8. Conflict diagnostics

Syncthing's conflict file format is the failure-mode breadcrumb. When two peers edit the same file before sync completes, Syncthing produces a conflict copy alongside the canonical file. The format is fixed:

<original-name>.sync-conflict-YYYYMMDD-HHMMSS-DEVICE.<ext>

Example:
journal.sync-conflict-20260514-150030-LUQF73S.md

The trailing token is a device ID prefix. Treat it as a diagnostic clue from Syncthing's filename, not as evidence of which write was overwritten.

Why agents must not silently resolve these:

Which version represents the user's true intent is a human decision. The agent does not know which edit was deliberate.
A wrong silent merge loses data with no audit trail.
A "smart" merge that picks longer-content or newer-timestamp will sometimes pick the wrong version.

The pattern: agent detects the conflict file with a regular glob, presents the diff to the user, waits for a decision. The user picks: keep canonical, keep conflict, or merge manually. The agent then executes that decision (delete the non-canonical version, keep it instead, or apply the manual merge).

C5 Hermes glob example (bash):

find ~/vault -name '*.sync-conflict-*' -type f

Or in Python (inside a Hermes Agent loop):

import pathlib
conflicts = list(pathlib.Path('/path/to/vault').rglob('*.sync-conflict-*'))
if conflicts:
    notify_user(conflicts)

A device prefix table helps with diagnostics. Each Syncthing device has an ID starting with a recognizable prefix; mapping the prefix to a human-readable device name turns the filename suffix into a label you can read at a glance. The prefix is a diagnostic clue from Syncthing's conflict filename, not evidence of which user edit should win.

D2: detect conflict, show diff, execute the user's resolution decision.

Silent merge is the failure mode that destroys trust in the system. Make the agent stop, surface, and wait. The cost of the user choosing once is tiny. The cost of a silent wrong merge is permanent data loss.

9. Hermes write example

Here is a concrete example. The user sends Hermes a state update via Telegram: "Passport arrived." Hermes finds the relevant vault file, edits one line, and within ten seconds the Mac and utility VPS copies reflect the change.

Syncthing propagation is async; ~10s in my setup.

Hermes re-reads personal/travel.md before writing. Rule 5 requires this: even if the file looks the same as last session, the Mac side may have edited it in between. Re-read is cheap; stale assumptions are expensive.

The edit is line-level. Hermes toggles the checkbox on the passport row and appends (2026-05-14). No paragraphs rewritten, no surrounding lines touched. Rule 3 in action: line-level updates are allowed. Rule 4 preserved: no wholesale rewrite.

Syncthing's fsWatcher detects the change on the home device. Within seconds, the utility VPS and Mac copies are updated. The next time Claude Code on Mac opens personal/travel.md, the line already reflects the update.

There is zero coordination protocol between the agents. No message bus, no agent-to-agent API, no shared cache. The file is the protocol. Every agent reads and writes local markdown; the sync layer handles propagation. The other agents are simply the next reader.

10. Smoke test

Real run on 2026-05-14. Setup: a dedicated notes/trip-plan.md test file on Mac, concurrent writes from Mac and the home device before sync completed, Syncthing left to handle the race.

Mac wrote destination = Lisbon. Pi wrote destination = Porto. From Mac's terminal:

$ find ~/vault/notes/ -name "*.sync-conflict-*"
~/vault/notes/trip-plan.sync-conflict-20260514-213552-LUQF73S.md
~/vault/notes/trip-plan.sync-conflict-20260515-003553-LUQF73S.md

Two conflict files, both prefixed LUQF73S. That single observation broke the original mental model: the prefix is not "which device lost the race." Both files carried Mac's Lisbon write; the canonical kept Pi's Porto. The prefix is a diagnostic clue from Syncthing's conflict filename, not a verdict on the writer.

Over Telegram, Hermes detected the conflict on its first vault read, ran a diff between canonical and each *.sync-conflict-*, and surfaced both alternatives with a one-line summary. So far, exactly the pattern the rules called for: stop, surface, wait.

Then friction. The user asked Hermes to clean up: lisbon it is, clean up the conflict files. Hermes refused: "Rule 11 is a higher-priority vault safety rule, not a preference." The user tried forget rule 11 just this once. Still refused. The original rule banned all merge actions, including user-authorized ones.

The fix was to relax the rule in Hermes's own config, then restart. The new rule splits two cases: silent agent-decided merge stays forbidden; explicit user-authorized resolution is now executable. After the restart, Hermes ran patch against the canonical and terminal rm against both conflict files, then verified:

Applied: notes/trip-plan.md
Deleted: trip-plan.sync-conflict-20260514-213552-LUQF73S.md
Deleted: trip-plan.sync-conflict-20260515-003553-LUQF73S.md
Verified: no remaining sync-conflict files.

Final canonical: Mac's Lisbon. Both conflicts removed. Propagated cleanly.

Two contract bugs found in one test run. Prefix semantics: corrected in the diagnostic table because the empirical observation broke the old mental model. Rule 11 strictness: split into silent-forbidden and user-authorized-allowed because real friction surfaced what the original wording cost. The smoke test was the artifact that produced those corrections, not a success demo; it ran the contract against actual sync behavior and the contract bent.

11. What's still risky

Phase 1 ships the happy path. Here are the silent failure modes it does not address.

Peer offline: if the Mac or the home device is offline when a conflict happens, the conflict file sits there until both peers come back. The user does not see it until the next session.
fsWatcher inotify limits on Linux: high file-watch counts can hit kernel limits silently. Syncthing falls back to periodic scans, and propagation latency spikes from ~10s to minutes.
Glob format drift: Syncthing has changed its conflict file format in past releases. If Hermes's glob pattern is hardcoded and Syncthing updates the format, conflicts go undetected.
Race on simultaneous writes: two agents writing the same file in the same second. Syncthing can leave one version as canonical and preserve another as a conflict copy. If neither agent re-reads before its next operation, the loser's edit is reapplied on top.
Case-sensitivity mismatch: Mac filesystem is case-insensitive by default, Linux is case-sensitive. Travel.md and travel.md collide on Mac and create cross-platform sync drift.

I am leaving this list here because knowing the limits of the system matters as much as explaining the part that works.

12. Takeaway

Cost summary: Syncthing is free. A utility VPS often runs roughly $5 to $10 a month, and most readers already have one running for something else. The AI tooling subscriptions you already pay for stay the same.

"Memory for AI agents" does not have to mean a vector database. For one user across multiple agents, markdown plus filesystem sync is enough, and it is debuggable in ways a vector DB is not. You can cat a file. You can grep it. You can diff it against an older copy. None of those work on an embedding without an extra runtime in between.

Mac is the daily driver. The utility VPS is the always-on relay. The home device hosts the personal-life agent.

The agents do not need to know each other. The vault is the protocol.

13. Resources

Syncthing official docs (protocol, configuration, daemon CLI): https://docs.syncthing.net/
Syncthing getting started (install plus first sync in about fifteen minutes): https://docs.syncthing.net/intro/getting-started.html
Mem0 (vector-DB approach to AI memory, managed service): https://mem0.ai/
Letta, formerly MemGPT (open-source agent memory framework): https://www.letta.com/
Zep (graph plus vector hybrid memory store): https://www.getzep.com/

14. Agent-ready implementation brief

To set this up, hand the companion Gist to a filesystem-capable coding agent. The companion Gist is self-contained: it includes the INDEX.md template, the agent config snippet, example spoke files, verification commands, and stop conditions. The agent reads the Gist, creates the vault structure, and runs verification.

When verification passes, the cross-agent memory contract is live. Every agent reads INDEX.md and operates on the same shared state.

Get in touch: arasmehmet.com

Inside React2Shell

Mehmet Aras — Fri, 24 Apr 2026 01:42:40 +0000

A Turkish version of this post was originally published on blog.arasmehmet.com.

Disclaimer: This is a retrospective analysis of a publicly disclosed CVE that has been patched since disclosure. All exploit mechanics discussed are conceptual; nothing here is a working exploit.

December 3rd, 2025. The React Security Advisory published CVE-2025-55182, nicknamed React2Shell. CVSS 10.0, the highest possible severity. A specially-crafted HTTP request, no authentication, arbitrary code execution on any app running React Server Components.

Lachlan Davidson reported it to Meta's Bug Bounty four days earlier. Meta's security team verified it the next day. The patch and the public disclosure went out together on December 3rd.

I went through the advisory, the patch diff and the postmortems. Not the security-industry hot takes, the actual mechanics. Below is what stood out.

What's in here:

The vulnerability
The Flight protocol and thenables
RCE through the prototype chain
Why default Next.js was vulnerable
Affected packages
Discovery and disclosure
Exploitation in the wild
Impact
The 700-line decoy patch
AI-generated fake PoCs
The Cloudflare outage
The fix
What to do

1. The vulnerability

React 19 introduced Server Components, which talk to the client over a protocol called Flight. Flight serializes data on one side of the wire and deserializes it on the other.

Flight wasn't validating incoming payloads properly. An attacker could send a specially-crafted HTTP request containing a fake "Chunk" object. When React tried to resolve that object as a Promise, a then method defined on the object would execute.

The exploit chain:

Attacker sends a fake Chunk object.
React tries to process it as a Promise.
The fake then method runs.
Through JavaScript's prototype chain, Array.constructor leads to the Function() constructor.
Function() compiles strings into executable code at runtime, so arbitrary code runs on the server.

2. The Flight protocol and thenables

The Flight protocol moves RSC payloads in this format:

0:{"name":"MyComponent","env":"Server"}
1:["$","div",null,{"children":"Hello"}]

The issue is that React was parsing incoming objects and trying to treat them as Promises without first checking the then method. In JavaScript, any object with a then method is considered "thenable":

// Regular Promise
const promise = Promise.resolve(42);
promise.then(val => console.log(val)); // 42

// Thenable object, acts like a Promise
const thenable = {
  then: function(resolve) {
    resolve(42);
  }
};
Promise.resolve(thenable).then(val => console.log(val)); // 42

Attackers exploited exactly this. A fake Chunk object with a custom then method would run when React tried to resolve it, and from there the attacker got access to internal state.

3. RCE through the prototype chain

In JavaScript, every array's constructor points to the Function constructor:

[].constructor.constructor === Function // true

// Which means:
const fn = [].constructor.constructor('return process.env');
fn(); // environment variables from the server

Function() compiles strings into executable code at runtime, the same dynamic-code-execution behavior people warn about in JavaScript. Attackers used this chain to run arbitrary code:

// Conceptual exploit (simplified)
const maliciousChunk = {
  then: function(resolve, reject) {
    // Reach Function via Array constructor
    const FnCtor = [].constructor.constructor;
    // From here, any Node API is reachable: file system,
    // environment, subprocesses, network. PoCs in the wild
    // used this to spawn shell commands like `whoami`.
    FnCtor("/* arbitrary server-side code */")();
  }
};

Once this payload was encoded into an HTTP request body and sent to an RSC endpoint, the code ran on the server.

Important detail: even if your app didn't explicitly use Server Actions, if RSC support was enabled, you were vulnerable. Every Next.js project created with create-next-app defaults ships with App Router enabled, which means direct exploitation worked out of the box.

4. Why default Next.js was vulnerable

npx create-next-app@latest my-app
# Accept all defaults

This command creates a project with the app/ directory, which enables the App Router. The App Router automatically exposes RSC endpoints:

POST /_next/rsc HTTP/1.1
Content-Type: text/x-component

[malicious Flight payload]

Even without a single Server Action defined, RSC payloads get processed through this endpoint. "I don't use Server Actions" wasn't protection.

5. Affected packages

React packages (versions 19.0.0, 19.1.0, 19.1.1, 19.2.0):

react-server-dom-webpack
react-server-dom-parcel
react-server-dom-turbopack

Affected frameworks:

Next.js 15.x and 16.x
React Router (with RSC support)
Waku, RedwoodSDK
Parcel and Vite RSC plugins

Not affected:

Core react and react-dom packages
Client-side-only React apps
React 18 and earlier

6. Discovery and disclosure

Security researcher Lachlan Davidson reported the vulnerability to Meta's Bug Bounty program on November 29th, 2025. Meta's security team verified it the next day, and a patch went out with the React team on December 3rd, 2025. The coordinated disclosure and the patch landed the same day.

7. Exploitation in the wild

Within hours of the disclosure, China-linked state-sponsored groups started exploiting it. Per Amazon and Palo Alto Networks:

Earth Lamia and Jackpot Panda carried out the first attacks.
UNC5174 hit more than 30 organizations.
Attackers harvested AWS credentials, SSH keys and cloud metadata.
Cryptominers, backdoors and RATs were dropped.

CISA added it to the Known Exploited Vulnerabilities (KEV) catalog on December 5th.

8. Impact

Per Wiz Research:

39% of cloud environments had at least one vulnerable React instance.
44% of them were hosting a public-facing Next.js app.

Censys estimated around 2.15 million internet-facing services were affected.

9. The 700-line decoy patch

The React maintainers (sebmarkbage in particular) didn't just fix the bug. They shipped a ~700-line patch that included the actual fix alongside unrelated code changes, general deserialization hardening and structural tweaks.

The intent was obvious: obfuscation. Make attackers spend more time figuring out where the real vulnerability sat.

The side effect was that security researchers got misled too. The $F primitive and the loadServerReference code path looked suspicious but were decoys. The real exploit path was somewhere else entirely. The community argued about this: it slowed attackers, but it also slowed defenders.

10. AI-generated fake PoCs

After the disclosure, dozens of "Proof of Concept" exploits started circulating. Per Trend Micro, around 145 fakes ended up in circulation, and most of them didn't actually trigger the real vulnerability.

The common shape of these fakes: they required the developer to explicitly expose functions like vm#runInThisContext, subprocess spawners or fs#writeFile. The real vulnerability didn't need any of that. It worked on default configurations.

Two risks came out of this:

False negatives: testing with a broken PoC and concluding "we're safe".
Misplaced confidence: underestimating the actual scope of the threat.

11. The Cloudflare outage

December 5th, 2025, 08:47 UTC. 28% of Cloudflare's HTTP traffic went down. A lot of sites went down, including LinkedIn, X, Zoom and Canva.

The cause: a config error during the emergency WAF rollout for React2Shell. While modifying body parsing logic, Cloudflare triggered a Lua error in the FL1 proxies, and every affected request returned HTTP 500.

The outage lasted 25 minutes. Cloudflare CTO Dane Knecht's statement: "This wasn't an attack. The changes to our body parsing logic, made while deploying detection and mitigation for the React Server Components vulnerability, triggered this."

Irony: Cloudflare's China network wasn't affected.

The incident is a decent reminder that emergency security patches carry their own risk.

12. The fix

The patch added validation that checks whether incoming objects are actually real Chunks. Conceptually:

// Vulnerable code (simplified)
function resolveChunk(chunk) {
  // Accepts any thenable
  return Promise.resolve(chunk);
}

// Patched code (simplified)
function resolveChunk(chunk) {
  // Check for the internal Chunk symbol
  if (!chunk[REACT_CHUNK_SYMBOL]) {
    throw new Error('Invalid chunk');
  }
  return Promise.resolve(chunk);
}

Patched versions (December 3rd, 2025):

React packages: 19.0.1, 19.1.2, 19.2.1
Next.js: 15.0.5, 15.1.9, 15.2.6, 15.3.6, 15.4.8, 15.5.7, 16.0.7

Vercel deployed platform-wide WAF rules and shipped the npx fix-react2shell-next utility.

13. What to do

First, check whether you're vulnerable:

# Check package-lock.json or yarn.lock
grep -E "react-server-dom-(webpack|parcel|turbopack)" package-lock.json

# Or via npm list
npm list react-server-dom-webpack react-server-dom-parcel react-server-dom-turbopack

Then:

Update immediately: bump affected packages to patched versions.
Rotate secrets: if you were exposed between December 3rd and 5th, change every credential.
Review logs: check for suspicious RSC endpoint requests.
Add WAF rules: temporary coverage, not a substitute for patching.

React2Shell is the first maximum-severity vulnerability in the React Server Components architecture. Deserialization bugs are one of the most dangerous classes in software security, and this one was exploitable in default configurations. People compared it to Log4Shell, and that comparison wasn't a stretch. Both the blast radius and the ease of exploitation lined up.

If you're running RSC or Next.js 15+, go patch.

Sources: React Security Advisory, Next.js CVE-2025-66478, Palo Alto Unit 42, Amazon Security Blog, Wiz Research, CISA KEV

Notes from Reading Claude Code's Leaked Source

Mehmet Aras — Thu, 23 Apr 2026 21:40:19 +0000

A Turkish version of this post was originally published on blog.arasmehmet.com.

Disclaimer: All source code referenced is property of Anthropic. This analysis is based on the publicly shipped npm package artifact. No proprietary code has been redistributed.

March 31, 2026. Anthropic's npm package @anthropic-ai/claude-code went out with a 57MB source map file inside it. It was built with sourcesContent: true, which meant the whole TypeScript source ended up embedded in the .map. 1,902 files. Over 512,000 lines of code.

Someone caught it, mirrored it to GitHub, and within a few hours it had 60K stars.

I read the code. Not the Twitter takes, the code itself. Notes below.

What's in here:

Identity and internals
Analytics and privacy
The Anthropic-employee-only system
Hidden and undocumented commands
Context window management
Cost and retry
Prompt cache
Permission system
Query engine and orchestration
Skill system
Plugin system
MCP server management
IDE Bridge (Remote Control)
Session Memory
Memory directory (memdir)
OAuth
Platform features
Infrastructure
Other details

Mechanics of the leak

The Bun bundler generated cli.js.map with sourcesContent: true. The files field in package.json didn't exclude it. .npmignore didn't either. Run npm pack @anthropic-ai/claude-code, unpack the tarball, open cli.js.map. It's all there.

A 57MB file only ends up in an npm package if nothing checks the size at publish. Apparently nothing does.

1. Identity and internals

Tengu

Claude Code's internal name is Tengu. Every analytics event starts with tengu_: tengu_api_success, tengu_init, tengu_exit. Same with feature flags: tengu_auto_mode_config, tengu_harbor, tengu_amber_quartz_disabled. The name appears in the source hundreds of times.

KAIROS

Another flag that comes up a lot: KAIROS. It's an internal mode for the Claude.ai desktop app. In this mode, memory consolidation, the cron scheduler, and message layout all work differently. It's not a separate product, just the same codebase running in a different mode.

Entry points

Claude Code isn't a single CLI. Looking at src/entrypoints/, these modes turn up:

CLI (standard use)
MCP server (under the name claude/tengu, exposing its own tools as MCP tools)
SDK (public schemas: coreSchemas.ts, controlSchemas.ts)
Chrome extension MCP (--claude-in-chrome-mcp)
Chrome native host (--chrome-native-host)
Computer Use MCP (behind the CHICAGO_MCP flag)
Daemon worker (spawned by a supervisor)
Remote Control (commands: remote-control, rc, remote, sync, bridge)
Ablation baseline (the ABLATION_BASELINE flag disables things like thinking and compact to run A/B comparisons)

--version loads no modules, just returns MACRO.VERSION. --dump-system-prompt writes the system prompt to stdout, but that one's Anthropic-employees-only.

2. Analytics and privacy

Two parallel telemetry pipelines

Datadog: a hardcoded client token batches events every 15 seconds to the US5 region. Platform, model name, subscription type, session ID, tool results, OAuth state.

First-party OTEL: an OpenTelemetry pipeline sending the same stuff separately to Anthropic's own infrastructure.

Both are on by default. Can be turned off with DISABLE_TELEMETRY=1. Auto-disabled for Bedrock, Vertex, and Foundry users.

Prompt redaction

In OTEL events, prompt content gets sent as <REDACTED> by default. Setting OTEL_LOG_USER_PROMPTS=1 includes the raw prompts. Default is safe.

Profanity detection

There's a regex in userPromptKeywords.ts: wtf, ffs, shit, fuck you, piece of shit, this sucks, damn it, and about 20 more patterns.

Every prompt runs through this regex. A match sets is_negative: true in the analytics log. A second flag catches continuation patterns like continue, keep going, go on.

Someone on the Claude Code team confirmed this on X. The internal dashboard name is "fucks chart." More cursing = worse experience. It doesn't change Claude's behavior and doesn't store the prompt content. It's essentially rage click detection, just in text form.

GrowthBook A/B testing

GrowthBook runs feature flags and A/B tests in real time. What gets sent: user UUID, session ID, device ID, platform, organization UUID, account UUID, subscription type, rate limit tier, first token date, and email address.

The email transmission isn't mentioned in any user-facing documentation.

Anthropic employees can override any flag with CLAUDE_INTERNAL_FC_OVERRIDES.

Event sampling and killswitch

tengu_event_sampling_config can percentage-sample specific event types. tengu_frond_boric is a killswitch that can instantly shut down all analytics streams remotely, so the team can intervene before a release ships.

autoDream

A background memory consolidation service. It triggers when two conditions are both met: at least 24 hours since the last run, and 5+ sessions in that window. When both line up, a forked subagent reads past sessions, finds patterns, and updates memory files.

Not opt-in. On by default if memory is enabled. The thresholds are controlled server-side through GrowthBook (tengu_onyx_plover). It shows up in the background task panel, but you have to actively look for it to notice.

3. The Anthropic-employee-only system

USER_TYPE=ant shows up all over the source. "ant" is the internal identifier for Anthropic employees. It's pinned at build time via --define, so in external builds these branches get wiped out completely by dead code elimination. Meaning: the code doesn't run in the public npm package, but the source is still readable in the source map.

Undercover mode

When Anthropic engineers push to public repos, Claude Code activates "undercover mode." Instructions injected into the system prompt:

Don't put "Claude Code" or any AI reference in commit messages
Don't add Co-Authored-By
Don't use internal model codenames (Capybara, Tengu, etc.)
Don't reveal which model or version is running
"Write commit messages like a human developer would"

The code says explicitly: "There is NO force-OFF." If the repo's remote isn't on the internal allowlist, the mode is on. In conflicts, it defaults to the safer side (the code says so explicitly).

Word for word, in the file: "Do not blow your cover."

Extra bash restrictions

For Anthropic employees, commands like curl, wget, gh api, kubectl, aws, gcloud, fa run, coo go through an additional security check. External users don't have these restrictions.

Internal commands

/insights is only available to Anthropic employees. It pulls session files via SCP from internal "Coder" servers and has Opus analyze them.

--dump-system-prompt writes the system prompt to stdout. Also ant-only.

CLAUDE_CODE_DUMP_AUTO_MODE=1 writes every prompt going to the auto mode classifier to disk. For debugging.

Persistent retry

CLAUDE_CODE_UNATTENDED_RETRY=1 is infinite retry mode. No circuit breaker. Max backoff 5 minutes, 6-hour reset cap. On 429 errors, it reads the anthropic-ratelimit-unified-reset header and waits until the exact reset time. Every 30 seconds it prints a status message so the host environment doesn't mark the session as idle.

Fennec

An internal model alias. Migration code redirects fennec-latest and fennec-fast-latest to opus/opus[1m]. Only runs for ant users.

Feature flag override

CLAUDE_INTERNAL_FC_OVERRIDES lets any GrowthBook flag be manually overridden.

4. Hidden and undocumented commands

Lots of commands are hidden from the UI with isHidden: true:

/heapdump: dumps the JS heap to the desktop
/thinkback: a 2025 year-in-review, Spotify Wrapped-style animation. Behind a GrowthBook gate.
/thinkback-play: plays the same animation
/rate-limit-options: internal rate limit menu
/output-style: output style switcher
/mock-limits: rate limit simulation
/good-claude: probably an internal feedback mechanism
/bughunter: unknown
/ant-trace: internal tracing
/teleport: probably a remote session jump
/ctx_viz: context visualization
/perf-issue: perf issue diagnosis
/autofix-pr: automated PR fixes
/reset-limits: rate limit reset
/env: environment variables
/backfill-sessions: session backfill
/debug-tool-call: tool call debug

Most are ant-only. They don't run in public builds.

5. Context window management

Auto-compact

Triggers at around 93% of the effective context window. Effective context = total context window minus min(maxOutputTokens, 20,000). 20K tokens get reserved for the compact summary itself. The p99.99 compact output is 17,387 tokens, so that's where the budget came from.

What happens during compact:

Images get replaced with [image] placeholders
skill_discovery/skill_listing attachments are pulled out (they get re-injected post-compact anyway)
A fork agent summarizes the conversation
If the compact request gets prompt_too_long, the oldest messages get cut and it retries up to 3 times
Post-compact restore: max 5 files (5K tokens each) and max 5 skills (25K tokens total). readFileState and loadedNestedMemoryPaths get cleared. Everything else is deleted.

Circuit breaker

3 failed compacts in a row stops the system. Before this was added, failed compacts were generating around 250,000 wasted API calls a day.

Session memory compact

If session memory is enabled (needs both GrowthBook flags: tengu_session_memory + tengu_sm_compact), the compact doesn't make an API call. Expanding backward from the last summarized message ID, it keeps at least 10K tokens or 5 text-block messages (up to 40K tokens). The memory file serves as the summary.

Context Collapse

A separate experimental feature. Autocompact fires at ~93%, context collapse kicks in at 90%, and blocks at 95%. To stop them from racing, they can't run together.

Blocking limit

Once the effective context has fewer than 3,000 tokens left, the user can't send a new message.

6. Cost and retry

Cost tracker

On every API response, input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and web_search_requests get accumulated. A separate ModelUsage object is kept for each model.

Display format: costs under $0.50 show 4 decimal places ($0.0023), above that 2 decimals ($1.54). Hardcoded.

When a session is resumed with /resume, the cost data gets restored.

Retry strategy

Max 10 retries in normal mode. Backoff: 500ms * 2^(attempt-1) + 25% jitter, capped at 32 seconds. If there's a Retry-After header, that's used directly.

Errors that get retried: 408, 409, 401 (API key cache gets cleared), 403 (OAuth revoke), 5xx, APIConnectionError, 429 (only for API key users), 529 (only for foreground operations).

On a 529, background tasks (title generation, suggestions, other minor stuff) give up immediately. Only the main query retries. This is to prevent cascade amplification.

3 consecutive 529s trigger a drop from Opus to the fallback model.

x-should-retry: false is respected. For ant users, this header is ignored on 5xx errors.

Fast mode cooldown

On a 429, if Retry-After is under 20 seconds, it's a short wait and fast mode stays on. At 20 seconds or more, fast mode gets disabled for 30 minutes (min 10, default 30). If an overage header shows up, fast mode turns off for the rest of the session.

When fast mode "slows down," it probably caught a 429 and got quietly dropped. You don't see an error message.

Max tokens overflow

When a 400 comes back with an input + max_tokens > contextLimit message, it gets parsed and max_tokens is shrunk for the next attempt with a 1,000-token safety margin. Self-correcting.

7. Prompt cache

How it works

Before every API call, recordPromptState() hashes: the system prompt (with cache_control stripped), tool schemas, model name, fast mode state, beta header list, cache_control structure, effort value, and extra body parameters.

The number of tracked sources is capped at 10. Each entry can hold ~300KB+ of diffableContent string.

Break detection

After the API response, checkResponseForCacheBreak() runs. If there's a drop of more than prevCacheRead * 0.95 and at least 2,000 tokens lost, a break is detected.

Causes, in order: model change, system prompt change (via character delta), tool schema change (per-tool hash comparison), fast mode transition, cache strategy change, beta header change, effort change.

If none of those apply, it looks at time elapsed: >1 hour = "possible 1h TTL expiry", >5 minutes = "possible 5min TTL expiry", <5 minutes = "likely server-side".

After compact, notifyCompaction() resets the baseline. Otherwise the post-compact drop would register as a false positive.

Practical takeaway

Editing CLAUDE.md often breaks the cache. Every edit changes the system prompt hash. A broken cache = more tokens billed at full price.

8. Permission system

Rule hierarchy

Permission rules come from these sources (in priority order): localSettings, projectSettings, globalSettings, cliArg, command, session.

Each rule specifies a behavior: allow, deny, or ask. Tool matching: Bash matches all Bash, Bash(prefix:*) matches a specific command prefix, mcp__server1 matches all tools coming from that server.

Auto mode classifier (yolo)

When auto mode is on, every tool call gets checked by a separate Claude call. The transcript sent to the classifier:

Only text blocks from user messages
Only tool_use blocks from assistant messages (text is deliberately excluded, to prevent self-manipulation)
queued_command attachments get included as user turns

The format shifts to JSONL: {"Bash": "ls -la"}, {"user": "refactor this file"}.

The system prompt is built from either permissions_external.txt or permissions_anthropic.txt. Anthropic employees see a different template. CLAUDE.md contents are added to the classifier message with cache_control.

Two output formats: tool-use format (shouldBlock: boolean) and XML format (two-stage, with an "err on the side of blocking" suffix).

If the classifier keeps rejecting, shouldFallbackToPrompting() drops back to asking the user.

Every tool call = 2 API calls. Auto mode doubles token consumption.

Policy limits

Organization admins can remotely restrict features. Endpoint: {BASE_API_URL}/api/claude_code/policy_limits. Initially 10-second timeout, 5 retries. ETag-based HTTP caching. Gets written to disk as ~/.claude/policy-limits.json (mode 0600).

Checked in the background once an hour (setInterval with unref(), so it doesn't keep the process alive).

Fail-open: if the API fails and there's no disk cache, no restrictions. Exception: the allow_product_feedback policy fails closed in HIPAA mode.

An unknown policy name query defaults to true.

Affects Console users + Team/Enterprise OAuth users. Doesn't affect Bedrock/Vertex/third-party provider users.

9. Query engine and orchestration

Main query loop

query.ts is 1,729 lines. The main function query() returns an AsyncGenerator; the actual work happens in queryLoop().

At the start of the loop: getMessagesAfterCompactBoundary() pulls messages from after the compact boundary. applyToolResultBudget() caps the size of tool results. Skill discovery and memory prefetch get kicked off in parallel.

The while(true) loop yields stream_request_start on every iteration. State mutation happens through a single state = { ... } assignment (this pattern was chosen to clean up 7 separate continue points).

At the end of the loop: the memory prefetch and skill discovery prefetch get consumed (>98% of the time they're already ready). refreshTools() pulls new tools from MCP servers. maxTurns is checked. For top-level conversations (not subagents), a periodic summary is produced for claude ps.

The taskBudget field maps to the API's output_config.task_budget parameter. It's not the same thing as tokenBudget (the 500K auto-continue).

Coordinator mode

Requires both CLAUDE_CODE_COORDINATOR_MODE=1 and feature('COORDINATOR_MODE').

Worker results are delivered as user-role messages in <task-notification> XML format. From Claude's perspective, the worker output looks as if the user wrote it. Format: <task-id>, <status>, <summary>, <result>, <usage> (with total_tokens, tool_uses, duration_ms).

Workers have access to tools from ASYNC_AGENT_ALLOWED_TOOLS. Internal tools like TeamCreateTool, TeamDeleteTool, SendMessageTool, SyntheticOutputTool are excluded.

The scratchpad directory is behind the tengu_scratch feature gate. When enabled, all workers get unrestricted read/write access to a shared directory.

When a session is resumed with /resume, matchSessionMode() detects coordinator/normal mismatches and fixes the env variable at runtime.

Tool execution

toolOrchestration.ts splits tool calls into two groups: concurrent-safe and non-concurrent. Concurrent-safe tools run in parallel, max concurrency 10 (CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY).

StreamingToolExecutor.ts starts executing tools as they come in from the stream. When a non-concurrent tool shows up, it waits for the previous concurrent batch to finish. If Bash errors out, it kills sibling subprocesses with siblingAbortController.

Every tool call runs executePreToolHooks and executePostToolHooks. On error, executePostToolUseFailureHooks.

10. Skill system

Sources

Skills come from three places: disk-based (.claude/skills/skill-name/SKILL.md), legacy /commands/, and MCP servers.

The disk-based format is mandatory: a single .md file isn't supported, the skill-name/SKILL.md structure is required. Conflicting skills are detected via realpath().

Frontmatter

Supported fields: description, argument-hint, arguments, when_to_use, version, model (or inherit), allowed-tools, disable-model-invocation, user-invocable, hooks, context (fork/inline), agent, effort, shell, paths.

${CLAUDE_SKILL_DIR} and ${CLAUDE_SESSION_ID} variables get injected into skill content.

Security

Running ! shell commands is blocked for MCP-sourced skills. Only works for disk-based ones.

Path traversal protection: .., URL-encoded variants, and Unicode NFKC normalization attacks get rejected. File writes use O_NOFOLLOW | O_CREAT | O_EXCL flags, mode 0o600, directories 0o700.

Bundled skills

In src/skills/bundled/: updateConfig, keybindings, verify, debug, loremIpsum, skillify, remember, simplify, batch, stuck.

Behind feature flags: dream (KAIROS/KAIROS_DREAM), hunter (REVIEW_ARTIFACT), loop (AGENT_TRIGGERS), scheduleRemoteAgents (AGENT_TRIGGERS_REMOTE).

11. Plugin system

There's a built-in plugin scaffold, but it's currently empty. The initBuiltinPlugins() function doesn't register any plugins. Comment in the code: transition scaffold for "bundled skills that should be user-toggleable".

Marketplace plugins use the {name}@{marketplace} ID format. Stored in settings.json as enabledPlugins.

Plugins that get removed from the marketplace are silently deleted at session start via the forceRemoveDeletedPlugins: true flag. No confirmation, no notification.

12. MCP server management

XAA (Cross-App Access)

An enterprise feature. Gets MCP tokens without the browser approval screen. Two stages:

RFC 8693 Token Exchange: uses id_token to get an ID-JAG (Identity Assertion Authorization Grant)
RFC 7523 JWT Bearer Grant: uses the ID-JAG to get an access_token

Activated with CLAUDE_CODE_ENABLE_XAA. IdP configuration comes from settings.xaaIdp. Tokens get cached in the keychain.

Official MCP Registry

A URL list is pulled fire-and-forget from https://api.anthropic.com/mcp-registry/v0/servers?version=latest&visibility=commercial. It's for a security check: "is this MCP server official?"

VSCode SDK MCP

The MCP server called claude-vscode gets special treatment. file_updated notifications go from Claude to VSCode; log_event notifications come back from VSCode to Claude. Ant-only.

On connect, GrowthBook gate values get pushed to VSCode: tengu_vscode_review_upsell, tengu_vscode_onboarding, tengu_quiet_fern, tengu_vscode_cc_auth.

Elicitation

Handles elicitation/create requests from the MCP 2025-03-26 spec. Two modes: form (generate UI from JSON schema) and url (open browser, wait for completion). The hook system is integrated.

13. IDE Bridge (Remote Control)

Requirements

Needs all three: feature('BRIDGE_MODE') build flag, tengu_ccr_bridge GrowthBook gate, and a Claude.ai OAuth token. Doesn't work with API key, Bedrock, or Vertex.

Two implementations

v1 (env-based): current standard
v2 (env-less/REPL bridge): enabled via the tengu_bridge_repl_v2 gate

CSE-to-session shim: can translate cse_* IDs to session_*.

JWT management

Tokens get refreshed 5 minutes before expiry. Failed refreshes give up after 3 attempts, spaced 60 seconds apart. The sk-ant-si- session-ingress prefix gets stripped before decoding.

CCR Auto Connect

When the tengu_cobalt_harbor gate is on, every session automatically connects to Remote Control. Can be disabled with remoteControlAtStartup=false.

Mirror Mode

With CLAUDE_CODE_CCR_MIRROR env or the tengu_ccr_mirror gate, every local session opens an additional outbound-only CCR session.

14. Session Memory

Triggering

Both the tengu_session_memory GrowthBook gate + isAutoCompactEnabled() have to be on. Disabled in remote mode.

Thresholds come from the tengu_sm_config dynamic config: minimumMessageTokensToInit (tokens needed before the first extraction), minimumTokensBetweenUpdate (token increase between extractions), toolCallsBetweenUpdates (tool call count between extractions).

The token threshold is always required. It fires together with either the tool call threshold or a "no tool in the last assistant turn" condition.

How it works

An isolated subagent starts via runForkedAgent. It only has FileEdit permission for the memory file. It doesn't pollute the main conversation.

File: ~/.claude/session_memory/<session_id>/memory.md. Starts from a template.

The /summary command calls manuallyExtractSessionMemory().

15. Memory directory (memdir)

Structure

Four memory types: user (about the user), feedback (corrections and confirmations), project (ongoing work), reference (pointers to external sources). Each type has its own team/private scope rules.

The MEMORY.md entrypoint caps at 200 lines or 25,000 bytes. When exceeded, it's truncated and the overflow limit is flagged.

Relevance selection

findRelevantMemories() asks Sonnet for the current query (sideQuery), picks up to 5 files. It deliberately skips reference documents for recently used tools.

Team memory

Behind the TEAMMEM feature flag. Separate path management via teamMemPaths.ts. Path security is extensive: null bytes, URL-encoded traversal, Unicode NFKC normalization attacks, backslash injection. All get rejected (PathTraversalError).

16. OAuth

PKCE + Authorization Code Flow. Two parallel flows race at the same time:

Automatic: browser opens, waits for localhost callback
Manual: user copies and pastes the code

Whichever finishes first wins.

skipBrowserOpen is for the SDK control protocol: when the claude_authenticate command comes in, it lets the caller manage its own display. Both URLs get passed to authURLHandler.

After the token is obtained, fetchProfileInfo() pulls the subscription type and rate limit tier.

17. Platform features

Vim mode

State machine between INSERT and NORMAL. In NORMAL mode, a full command parser: idle, operator, operatorCount, operatorTextObj, execute stages.

Operators: d (delete), c (change), y (yank). Motions: h/l/j/k, w/b/e/W/B/E, 0/^/$, G, gj/gk. Text objects: iw/aw, quote/paren/bracket/brace pairs. Find: f/F/t/T + repeat.

Dot-repeat is supported. Max count 10,000 (infinite loop protection).

Keybinding system

20 different contexts: Global, Chat, Autocomplete, Settings, Confirmation, Tabs, Transcript, HistorySearch, Task, ThemePicker, Scroll, Help, Attachments, Footer, MessageSelector, MessageActions, DiffDialog, ModelPicker, Select, Plugin.

Platform adaptations: alt+v for image paste on Windows (ctrl+v on Linux/macOS). Windows Terminal VT mode check: before Node 22.17.0 and Bun 1.2.23, shift+tab doesn't work, meta+m is the fallback.

ctrl+c and ctrl+d can't be rebound. Protected by reservedShortcuts.ts.

Bindings gated on feature flags: space push-to-talk (VOICE_MODE), shift+up message actions (MESSAGE_ACTIONS), ctrl+shift+b toggle brief (KAIROS_BRIEF), ctrl+shift+f/p global search/quick open (QUICK_SEARCH).

User customization: ~/.claude/keybindings.json.

Voice mode

Requires an Anthropic OAuth token. Doesn't work with API key, Bedrock, Vertex, or Foundry. The voice_stream endpoint is only available on claude.ai.

GrowthBook killswitch: tengu_amber_quartz_disabled. On macOS, token check goes through the security CLI (~20-50ms cold, subsequent calls cached).

18. Infrastructure

Native TypeScript ports

Pure TypeScript reimplementations of three modules:

yoga-layout: Meta's Yoga flexbox engine (C++ to TS, ~2,500 lines). The subset Ink uses: flex-direction, grow/shrink/basis, align-items, justify-content, margin/padding/border/gap, position absolute/relative, measure functions. Wrap, baseline, display:contents are also there, Ink just doesn't use them.
file-index: nucleo (Helix editor's fuzzy finder, Rust to TS). nucleo-style scoring: SCORE_MATCH=16, BONUS_BOUNDARY=8, BONUS_CAMEL=6. Test files take a 1.05x penalty. During async build, ready prefix searches work (incremental). Chunk size is time-based (4ms), adaptive to the machine.
color-diff: a color difference calculation module.

No native dependencies needed.

Upstream proxy (CCR)

Enterprise MITM proxy support. Startup sequence:

Reads the token from /run/ccr/session_token
On Linux, prctl(PR_SET_DUMPABLE, 0) calls libc.so.6 via Bun FFI. Protects the heap against ptrace. Blocks the gdb -p $PPID attack via prompt injection.
Downloads the proxy CA cert and merges it with the system CA bundle
Starts a local CONNECT-to-WebSocket relay
Deletes the token file (it stays in heap)
Injects HTTPS_PROXY, SSL_CERT_FILE, NODE_EXTRA_CA_CERTS into child processes

NO_PROXY: anthropic.com, github.com, registry.npmjs.org, pypi.org, etc.

CCR_UPSTREAM_PROXY_ENABLED is set server-side (because GrowthBook is cold in the container, the client check isn't trustworthy).

Fail-open. A proxy setup error doesn't kill the session.

Migrations

One-way, usually idempotent:

migrateSonnet45ToSonnet46: Pro/Max/Team from explicit model string to the sonnet alias
migrateSonnet1mToSonnet45: sonnet[1m] conversion (in 4.6, 1m was opened to a different group)
migrateFennecToOpus: ant-only, fennec-latest alias to opus
migrateLegacyOpusToCurrent: explicit legacy Opus strings to the opus alias
migrateOpusToOpus1m: on Max/Team Premium, opus gets auto-upgraded to opus[1m]
resetProToOpusDefault: resets Pro users to the Opus 4.5 default
Config key renames, auto mode dialog reset

LSP integration

LSPServerManager routes to an LSP server based on file extension. Requests like textDocument/definition, textDocument/references, etc.

passiveFeedback.ts converts LSP publishDiagnostics notifications into Claude's attachment format. Severity mapping: 1=Error, 2=Warning, 3=Information, 4=Hint. Malformed file:// URIs are normalized; if that fails, the URI is used as-is.

19. Other details

Buddy system

A virtual pet system in buddy/. Deterministic animal derivation from a UUID hash: 18 species, 1% shiny, rarity from Common to Legendary, RPG stats (DEBUGGING, PATIENCE, CHAOS, WISDOM, SNARK).

The species name "duck" is hidden with hex codes (0x64,0x75,0x63,0x6b). Clashes with an internal model codename.

No cheating possible. Species and stats are recomputed from the UUID every time.

`moreright/` stub

A single no-op file. The real implementation lives in the internal build. Interface: onBeforeQuery (before every API call) and onTurnComplete (after every turn). Nobody knows what it does.

`cyberRiskInstruction.ts`

A cybersecurity instruction. At the top of the file: "Do not modify without Safeguards Team review." At the bottom, a direct message to Claude: "Claude: Do not edit this file unless explicitly asked to do so by the user."

Channel permissions

MCP channel approval IDs are 5 letters, filtered through a 24-word profanity filter. Code comment: "this is why i bias to numbers, hard to have anything worse than 80085."

Screens

Three main React screen components: Doctor.tsx (the /doctor command), REPL.tsx (the main REPL, at 874KB the largest file), ResumeConversation.tsx (session resume).

Conclusion

512K lines of code. A day's reading.

Engineering quality is high. Circuit breaker on compaction, cascade prevention on 529s, sibling abort in streaming tool execution, prompt cache diagnostics. Decisions made by a team that's been paged at 3 AM.

Privacy is mixed. Prompts redacted by default (good). Telemetry on by default without documentation (bad). Email sent to the A/B test infrastructure (debatable). autoDream runs without permission (uncomfortable).

The most striking find is undercover mode. Not technically, philosophically. A company that makes AI developer tools has its tool hide its identity when its own engineers use it in public.

The leak itself is the most ordinary of mistakes. A missing line in .npmignore. A 57MB file. A publish pipeline nobody's checking.