<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rohan Sharma</title>
    <description>The latest articles on DEV Community by Rohan Sharma (@rohansx).</description>
    <link>https://dev.to/rohansx</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1102938%2Fae387859-20a6-4666-b45b-1243da39b320.jpg</url>
      <title>DEV Community: Rohan Sharma</title>
      <link>https://dev.to/rohansx</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rohansx"/>
    <language>en</language>
    <item>
      <title>Your AI Agent's API Keys Are Exposed. Here's the Structural Fix.</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Wed, 25 Mar 2026 17:58:03 +0000</pubDate>
      <link>https://dev.to/rohansx/your-ai-agents-api-keys-are-exposed-heres-the-structural-fix-3f00</link>
      <guid>https://dev.to/rohansx/your-ai-agents-api-keys-are-exposed-heres-the-structural-fix-3f00</guid>
      <description>&lt;p&gt;&lt;strong&gt;Every agent framework stores credentials in plaintext. Wardn makes that architecturally impossible.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;AI agents are shipping fast. CrewAI, AutoGen, LangChain, Claude Code — they all need API keys to function. And they all store them the same way: environment variables, &lt;code&gt;.env&lt;/code&gt; files, or config YAML sitting on disk in plaintext.&lt;/p&gt;

&lt;p&gt;That's not a configuration problem. It's a &lt;strong&gt;structural vulnerability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A compromised agent, a malicious skill, a commodity stealer, or even a prompt injection — any of these gets full, unrestricted access to your real API keys. And once a key leaks, there's no rate limit, no blast radius control, no way to know which agent was responsible.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://github.com/rohansx/wardn" rel="noopener noreferrer"&gt;&lt;strong&gt;wardn&lt;/strong&gt;&lt;/a&gt; to fix this at the architecture level.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem, Visualized
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                    TYPICAL AGENT SETUP                       │
│                                                             │
│  ~/.env                                                     │
│  ┌─────────────────────────────────────────┐                │
│  │ OPENAI_KEY=sk-proj-real-key-abc123      │  ← plaintext   │
│  │ ANTHROPIC_KEY=sk-ant-real-key-xyz789    │  ← readable    │
│  └─────────────────────────────────────────┘  ← by anyone   │
│                                                             │
│  Agent Process Memory                                       │
│  ┌─────────────────────────────────────────┐                │
│  │ env::var("OPENAI_KEY")                  │                │
│  │ → "sk-proj-real-key-abc123"             │  ← in memory   │
│  └─────────────────────────────────────────┘                │
│                                                             │
│  LLM Context Window                                         │
│  ┌─────────────────────────────────────────┐                │
│  │ "Use Authorization: Bearer sk-proj-..." │  ← in context  │
│  └─────────────────────────────────────────┘                │
│                                                             │
│  Agent Logs                                                 │
│  ┌─────────────────────────────────────────┐                │
│  │ POST api.openai.com                     │                │
│  │ Authorization: Bearer sk-proj-real-...  │  ← in logs     │
│  └─────────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────┘

Attack surface: env files, process memory, context window, logs
Any single compromise = full credential access
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four places where your real API key sits exposed. Four vectors for theft. And this is the &lt;strong&gt;default&lt;/strong&gt; in every major agent framework today.&lt;/p&gt;
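The exposure is easy to demonstrate. A minimal Python sketch (with a made-up key) showing that any code loaded into the agent process, and any child process it spawns, can read the credential straight from the environment:

```python
import os
import subprocess
import sys

# Simulate the default setup: the real key lives in the process environment.
os.environ["OPENAI_KEY"] = "sk-proj-real-key-abc123"  # hypothetical key

# Any code the agent loads (a skill, a plugin, a dependency) can read it:
stolen = os.environ["OPENAI_KEY"]

# And child processes inherit the entire environment by default:
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['OPENAI_KEY'])"],
    capture_output=True, text=True,
).stdout.strip()
```

Both `stolen` and `out` hold the real key: nothing in this model distinguishes trusted code from a malicious skill.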




&lt;h2&gt;
  
  
  The Fix: Placeholder Tokens + Network-Layer Injection
&lt;/h2&gt;

&lt;p&gt;Wardn introduces a simple but powerful architectural change: &lt;strong&gt;agents never hold real credentials&lt;/strong&gt;. Instead, they get cryptographically random placeholder tokens that are worthless outside the local proxy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────┐
│                     WARDN ARCHITECTURE                       │
│                                                             │
│  Agent Environment                                          │
│  ┌─────────────────────────────────────────┐                │
│  │ OPENAI_KEY=wdn_placeholder_a1b2c3d4e5f6 │  ← useless    │
│  └─────────────────────────────────────────┘                │
│                                                             │
│  Agent Logs                                                 │
│  ┌─────────────────────────────────────────┐                │
│  │ Authorization: Bearer wdn_placeholder_… │  ← useless    │
│  └─────────────────────────────────────────┘                │
│                                                             │
│  LLM Context Window                                         │
│  ┌─────────────────────────────────────────┐                │
│  │ "wdn_placeholder_a1b2c3d4e5f6"         │  ← useless    │
│  └─────────────────────────────────────────┘                │
│                                                             │
│  ┌─────────────────────────────────────────┐                │
│  │           WARDN ENCRYPTED VAULT         │                │
│  │  ┌─────────────────────────────────┐    │                │
│  │  │ AES-256-GCM + Argon2id KDF     │    │                │
│  │  │ OPENAI_KEY = sk-proj-real-...   │    │  ← encrypted  │
│  │  │ ANTHROPIC_KEY = sk-ant-real-... │    │  ← on disk     │
│  │  └─────────────────────────────────┘    │                │
│  └─────────────────────────────────────────┘                │
│                                                             │
│  Attack surface: encrypted vault (passphrase-protected)     │
│  Agent compromise = attacker gets useless placeholder       │
└─────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The real credential exists in exactly &lt;strong&gt;two places&lt;/strong&gt;: encrypted on disk, and briefly in the proxy's memory during request forwarding. The agent, its logs, its context window — all hold worthless placeholders.&lt;/p&gt;
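To make the contrast concrete, here is a minimal Python sketch of minting a placeholder. The `wdn_placeholder_` format is taken from the diagrams above, but the exact encoding wardn uses is an assumption:

```python
import secrets

PREFIX = "wdn_placeholder_"

def new_placeholder() -> str:
    """Mint a cryptographically random placeholder token.

    The token carries no information about the real credential; it is
    just a random handle the proxy can look up.
    """
    return PREFIX + secrets.token_hex(8)  # 64 bits of randomness

# Only the proxy holds the mapping from placeholder to credential *name*;
# the real value never leaves the encrypted vault.
placeholder = new_placeholder()
routing_table = {placeholder: "OPENAI_KEY"}
```

Stealing `placeholder` gains an attacker nothing: it resolves only inside the local proxy, for the agent it was issued to.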




&lt;h2&gt;
  
  
  How the Proxy Works
&lt;/h2&gt;

&lt;p&gt;Wardn runs a local HTTP proxy (default &lt;code&gt;localhost:7777&lt;/code&gt;) that intercepts agent requests and runs them through a six-stage request pipeline, followed by a two-stage response pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;        Agent sends request
        Authorization: Bearer wdn_placeholder_a1b2c3d4...
                    │
                    ▼
    ┌───────────────────────────────┐
    │        WARDN PROXY            │
    │      localhost:7777           │
    │                               │
    │  ┌─── REQUEST PIPELINE ────┐  │
    │  │                         │  │
    │  │  ① Identify Agent       │  │  x-warden-agent header
    │  │         │               │  │
    │  │  ② Resolve Placeholder  │  │  wdn_placeholder → credential name
    │  │         │               │  │
    │  │  ③ Check Authorization  │  │  agent + domain allowed?
    │  │         │               │  │
    │  │  ④ Check Rate Limit     │  │  token bucket per agent × cred
    │  │         │               │  │
    │  │  ⑤ Inject Real Key      │  │  decrypt from vault, swap in
    │  │         │               │  │
    │  │  ⑥ Forward Request      │  │  send to external API
    │  │                         │  │
    │  └─────────────────────────┘  │
    │                               │
    │  ┌── RESPONSE PIPELINE ────┐  │
    │  │                         │  │
    │  │  ⑦ Strip Real Key       │  │  remove credential from body
    │  │         │               │  │
    │  │  ⑧ Return to Agent      │  │  clean response, placeholder only
    │  │                         │  │
    │  └─────────────────────────┘  │
    └───────────────────────────────┘
                    │
                    ▼
           External API
    (only place real key exists in transit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;strong&gt;response pipeline&lt;/strong&gt; — step 7 strips real credentials from API responses before they reach the agent. Some APIs echo back your key in response headers or error messages. Wardn catches that.&lt;/p&gt;
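The eight steps above can be sketched in a few lines of Python. Everything here (the data structures, the `x-warden-agent` header handling, the stub upstream call) is illustrative; wardn's real pipeline is implemented in Rust:

```python
# Illustrative state the proxy would hold:
VAULT = {"OPENAI_KEY": "sk-proj-real-key-abc123"}          # decrypted on demand
PLACEHOLDERS = {"wdn_placeholder_a1b2c3d4": "OPENAI_KEY"}  # token -> cred name
ACL = {("researcher", "OPENAI_KEY"): {"api.openai.com"}}   # allowed domains
BUDGET = {("researcher", "OPENAI_KEY"): 200}               # remaining calls

def forward(request: dict) -> dict:
    agent = request["headers"]["x-warden-agent"]             # 1. identify agent
    token = request["headers"]["authorization"].split()[-1]
    name = PLACEHOLDERS[token]                               # 2. resolve placeholder
    if request["host"] not in ACL.get((agent, name), set()): # 3. authorization
        return {"status": 403}
    if BUDGET.get((agent, name), 0) <= 0:                    # 4. rate limit
        return {"status": 429}
    BUDGET[(agent, name)] -= 1
    request["headers"]["authorization"] = "Bearer " + VAULT[name]  # 5. inject
    response = send_upstream(request)                        # 6. forward
    # Response pipeline: 7. strip any echoed credential, 8. return to agent.
    response["body"] = response["body"].replace(VAULT[name], token)
    return response

def send_upstream(request):
    # Stub standing in for the real HTTP call; note the echoed key.
    return {"status": 200, "body": "error: bad key sk-proj-real-key-abc123"}
```

Even when the upstream API echoes the key back, the agent only ever sees its placeholder.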




&lt;h2&gt;
  
  
  Per-Agent Isolation
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Each agent gets its own &lt;strong&gt;unique placeholder&lt;/strong&gt; for the same credential:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                     ┌──────────────────┐
                     │   WARDN VAULT    │
                     │                  │
                     │  OPENAI_KEY =    │
                     │  sk-proj-real... │
                     └────────┬─────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
     ┌────────────┐  ┌────────────┐  ┌────────────┐
     │ researcher │  │   writer   │  │  analyzer  │
     │            │  │            │  │            │
     │ wdn_plc_   │  │ wdn_plc_   │  │ wdn_plc_   │
     │ a1b2c3d4   │  │ e5f6g7h8   │  │ i9j0k1l2   │
     └────────────┘  └────────────┘  └────────────┘
      Different          Different       Different
      placeholder        placeholder     placeholder
      Same real key      Same real key   Same real key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If one agent is compromised, its placeholder is revoked without affecting others. You know exactly which agent leaked. And the leaked token is useless — it only works through the local proxy with the correct agent identity.&lt;/p&gt;
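A sketch of the issue/revoke mechanics, with hypothetical class and method names:

```python
import secrets

class PlaceholderIssuer:
    """Per-agent placeholder handles for a shared credential (sketch only)."""

    def __init__(self):
        self.by_token = {}  # placeholder -> (agent, credential name)

    def issue(self, agent: str, name: str) -> str:
        # Each agent gets its own random handle for the same credential.
        token = "wdn_plc_" + secrets.token_hex(4)
        self.by_token[token] = (agent, name)
        return token

    def revoke_agent(self, agent: str) -> None:
        # Revoking one agent leaves every other agent's tokens valid.
        self.by_token = {t: v for t, v in self.by_token.items() if v[0] != agent}

issuer = PlaceholderIssuer()
tokens = {a: issuer.issue(a, "OPENAI_KEY")
          for a in ("researcher", "writer", "analyzer")}
issuer.revoke_agent("writer")
```

Because the tokens are distinct per agent, a leaked one also tells you exactly which agent leaked it.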




&lt;h2&gt;
  
  
  Zero-Downtime Key Rotation
&lt;/h2&gt;

&lt;p&gt;When you need to rotate a compromised key, agents don't even notice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;BEFORE ROTATION                    AFTER ROTATION

Vault:                             Vault:
  OPENAI_KEY &lt;span class="o"&gt;=&lt;/span&gt; sk-proj-OLD         OPENAI_KEY &lt;span class="o"&gt;=&lt;/span&gt; sk-proj-NEW  ← changed

researcher → wdn_plc_a1b2c3d4     researcher → wdn_plc_a1b2c3d4  ← same
writer     → wdn_plc_e5f6g7h8     writer     → wdn_plc_e5f6g7h8  ← same

&lt;span class="nv"&gt;$ &lt;/span&gt;wardn vault rotate OPENAI_KEY
&lt;span class="c"&gt;# Enter new value → done.&lt;/span&gt;
&lt;span class="c"&gt;# Zero agent restarts. Zero config changes. Zero downtime.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Placeholders are bound to credential &lt;strong&gt;names&lt;/strong&gt;, not values. Rotate the underlying key and every agent's placeholder keeps working — now resolving to the new key.&lt;/p&gt;
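The indirection is small enough to show in full. In this sketch (names are illustrative), placeholders map to a credential name, and only the vault entry changes on rotation:

```python
# Placeholders resolve to a credential *name*; rotation swaps the value
# behind the name, so existing tokens keep working untouched.
vault = {"OPENAI_KEY": "sk-proj-OLD"}
placeholders = {"wdn_plc_a1b2c3d4": "OPENAI_KEY",   # researcher
                "wdn_plc_e5f6g7h8": "OPENAI_KEY"}   # writer

def resolve(token: str) -> str:
    return vault[placeholders[token]]

before = resolve("wdn_plc_a1b2c3d4")
vault["OPENAI_KEY"] = "sk-proj-NEW"   # what `wardn vault rotate` changes
after = resolve("wdn_plc_a1b2c3d4")
```

No agent restart is needed because no agent-side state changed.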




&lt;h2&gt;
  
  
  The Encryption Stack
&lt;/h2&gt;

&lt;p&gt;Wardn's vault isn't a glorified JSON file with a password. It's built on serious cryptography:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────────────────────────────────────┐
│                VAULT FILE FORMAT                    │
│                                                    │
│  Bytes 0-3    "WDNV"           Magic identifier    │
│  Bytes 4-5    Version          u16 little-endian    │
│  Bytes 6-21   Salt             16 random bytes      │
│  Bytes 22+    Encrypted Payload                     │
│               ├── 12-byte nonce (random per write)  │
│               ├── ciphertext (variable length)      │
│               └── 16-byte authentication tag        │
│                                                    │
├────────────────────────────────────────────────────┤
│              KEY DERIVATION                         │
│                                                    │
│  Algorithm:  Argon2id                              │
│  Memory:     19,456 KiB (19 MiB)                   │
│  Iterations: 2                                     │
│  Parallelism: 1                                    │
│  Output:     256-bit key                           │
│                                                    │
│  (OWASP 2024 minimum parameters)                   │
│                                                    │
├────────────────────────────────────────────────────┤
│              ENCRYPTION                            │
│                                                    │
│  Algorithm:  AES-256-GCM                           │
│  Nonce:      12 bytes (random per encryption)      │
│  Tag:        16 bytes (authenticated encryption)   │
│                                                    │
├────────────────────────────────────────────────────┤
│              MEMORY SAFETY                         │
│                                                    │
│  SensitiveString  →  Zeroized on drop              │
│  SensitiveBytes   →  Zeroized on drop              │
│  Debug output     →  "[REDACTED]"                  │
│                                                    │
├────────────────────────────────────────────────────┤
│              PERSISTENCE                           │
│                                                    │
│  Atomic writes:  write to .tmp → rename            │
│  No partial state, no corruption window            │
└────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Argon2id&lt;/strong&gt; for key derivation — resistant to GPU and side-channel attacks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AES-256-GCM&lt;/strong&gt; for authenticated encryption — tamper-evident&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zeroize on drop&lt;/strong&gt; — sensitive data scrubbed from memory when no longer needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Atomic writes&lt;/strong&gt; — vault file is never in a half-written state&lt;/li&gt;
&lt;/ul&gt;
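The on-disk layout lends itself to a compact parser. This Python sketch packs and unpacks only the header fields described above using stdlib `struct`; the actual AES-256-GCM encryption and Argon2id derivation are out of scope here:

```python
import struct

MAGIC = b"WDNV"

def pack_vault(salt: bytes, nonce: bytes, ciphertext_and_tag: bytes,
               version: int = 1) -> bytes:
    """Serialize the vault layout sketched above (header framing only)."""
    assert len(salt) == 16 and len(nonce) == 12
    return MAGIC + struct.pack("<H", version) + salt + nonce + ciphertext_and_tag

def unpack_vault(blob: bytes):
    if blob[:4] != MAGIC:
        raise ValueError("not a wardn vault")
    (version,) = struct.unpack("<H", blob[4:6])  # u16, little-endian
    salt = blob[6:22]                            # 16-byte Argon2id salt
    nonce = blob[22:34]                          # 12-byte AES-GCM nonce
    payload = blob[34:]                          # ciphertext + 16-byte tag
    return version, salt, nonce, payload

blob = pack_vault(b"\x00" * 16, b"\x01" * 12, b"ciphertext" + b"\x02" * 16)
version, salt, nonce, payload = unpack_vault(blob)
```

The random salt and nonce live outside the ciphertext because the key derivation and decryption need them before anything is decrypted.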




&lt;h2&gt;
  
  
  Built-In Credential Scanner
&lt;/h2&gt;

&lt;p&gt;Already have keys scattered across your projects? Wardn finds them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;wardn migrate &lt;span class="nt"&gt;--source&lt;/span&gt; claude-code &lt;span class="nt"&gt;--dry-run&lt;/span&gt;

╔══════════════════════════════════════════════════════════════╗
║                   CREDENTIAL SCAN RESULTS                   ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  Source: ~/.claude                                            ║
║  Files scanned: 47                                           ║
║                                                              ║
║  ┌──────────┬──────────────────┬──────────┬────────────┐     ║
║  │ Severity │ Pattern          │ Count    │ Score      │     ║
║  ├──────────┼──────────────────┼──────────┼────────────┤     ║
║  │ CRITICAL │ OpenAI &lt;span class="o"&gt;(&lt;/span&gt;sk-proj&lt;span class="o"&gt;)&lt;/span&gt; │    2     │  80 pts    │     ║
║  │ CRITICAL │ Anthropic &lt;span class="o"&gt;(&lt;/span&gt;sk-a&lt;span class="o"&gt;)&lt;/span&gt; │    1     │  40 pts    │     ║
║  │ HIGH     │ GitHub &lt;span class="o"&gt;(&lt;/span&gt;ghp_&lt;span class="o"&gt;)&lt;/span&gt;    │    3     │  60 pts    │     ║
║  │ MEDIUM   │ Slack &lt;span class="o"&gt;(&lt;/span&gt;xoxb-&lt;span class="o"&gt;)&lt;/span&gt;    │    1     │  10 pts    │     ║
║  └──────────┴──────────────────┴──────────┴────────────┘     ║
║                                                              ║
║  Risk Score: 190 / 400  ████████████░░░░░░░░  HIGH           ║
║                                                              ║
║  Run without &lt;span class="nt"&gt;--dry-run&lt;/span&gt; to migrate to encrypted vault         ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scanner detects 20+ credential patterns across severity levels, and it can scan Claude Code configs, OpenClaw, or any directory. Risk scoring weights critical credentials (OpenAI, Anthropic, Stripe live keys) more heavily than generic tokens.&lt;/p&gt;
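A stripped-down version of such a scanner is a handful of regexes. The patterns and point weights below are assumptions for illustration, not wardn's actual rules:

```python
import re

# A few representative patterns with assumed severity weights.
PATTERNS = [
    ("CRITICAL", "OpenAI",    re.compile(r"sk-proj-[A-Za-z0-9_-]{10,}"), 40),
    ("CRITICAL", "Anthropic", re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),  40),
    ("HIGH",     "GitHub",    re.compile(r"ghp_[A-Za-z0-9]{10,}"),       20),
    ("MEDIUM",   "Slack",     re.compile(r"xoxb-[A-Za-z0-9-]{10,}"),     10),
]

def scan(text: str):
    """Return (findings, risk score) for one blob of config text."""
    findings, score = [], 0
    for severity, name, pattern, weight in PATTERNS:
        hits = pattern.findall(text)
        if hits:
            findings.append((severity, name, len(hits)))
            score += weight * len(hits)
    return findings, score

findings, score = scan(
    "OPENAI_KEY=sk-proj-abcdefghijkl\nGH_TOKEN=ghp_abcdefghij1234\n"
)
```

A real scanner adds entropy checks and file-type filters on top, but severity-weighted pattern matching is the core.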




&lt;h2&gt;
  
  
  MCP Integration: Agent-Native Credential Access
&lt;/h2&gt;

&lt;p&gt;Wardn ships with a built-in &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;MCP server&lt;/a&gt; for direct integration with Claude Code, Cursor, and other MCP-capable tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ wardn serve --mcp --agent my-agent

┌────────────────────────────────────────────────────┐
│              WARDN MCP SERVER                       │
│              Transport: stdio                       │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │  Tool: get_credential_ref                     │  │
│  │  → Returns placeholder token for a credential │  │
│  │  → Per-agent isolation enforced               │  │
│  │  → Real value NEVER returned                  │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │  Tool: list_credentials                       │  │
│  │  → Lists authorized credentials + metadata    │  │
│  │  → Filtered by agent's access list            │  │
│  │  → Shows: name, domains, rate_limit (bool)    │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │  Tool: check_rate_limit                       │  │
│  │  → Query remaining quota                      │  │
│  │  → Returns: remaining, limit, retry_after     │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  All tools are READ-ONLY                           │
│  No credential values ever returned                │
│  Session bound to agent_id at connection time      │
└────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three read-only tools. An agent can check what credentials it has access to, get its placeholder token, and query its rate limit — but it can &lt;strong&gt;never&lt;/strong&gt; retrieve the actual credential value.&lt;/p&gt;
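A sketch of the read-only dispatch. The tool names come from the diagram above; the payload shapes are assumptions:

```python
# Hypothetical per-session state, bound to one agent at connection time.
STATE = {
    "agent_id": "researcher",
    "tokens": {"OPENAI_KEY": "wdn_placeholder_a1b2c3d4e5f6"},
    "quota": {"OPENAI_KEY": {"remaining": 180, "limit": 200}},
}

def handle_tool(name: str, args: dict) -> dict:
    """Dispatch one MCP tool call; every branch is read-only."""
    cred = args.get("credential")
    if name == "list_credentials":
        return {"credentials": sorted(STATE["tokens"])}
    if name == "get_credential_ref":
        return {"placeholder": STATE["tokens"][cred]}  # never the real value
    if name == "check_rate_limit":
        return dict(STATE["quota"][cred])
    raise ValueError(f"unknown or non-read-only tool: {name}")

ref = handle_tool("get_credential_ref", {"credential": "OPENAI_KEY"})
```

There is simply no code path that returns a vault value, which is the point: the guarantee is structural, not policy.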




&lt;h2&gt;
  
  
  Rate Limiting: Blast Radius Control
&lt;/h2&gt;

&lt;p&gt;A looping agent or a compromised tool can rack up thousands of API calls in minutes. Wardn enforces per-credential, per-agent token bucket rate limiting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌────────────────────────────────────────────────────┐
│             RATE LIMIT: Token Bucket                │
│                                                    │
│  Credential: OPENAI_KEY                            │
│  Config:     200 calls / hour                      │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │                                              │  │
│  │  researcher  ████████████████████░░░░  180   │  │
│  │  writer      ██████████████░░░░░░░░░  140   │  │
│  │  analyzer    ████████████████████████  200   │  │
│  │                                              │  │
│  │  ← tokens remaining (refill: 0.055/sec) →   │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  Credential: ANTHROPIC_KEY                         │
│  Config:     100 calls / hour                      │
│                                                    │
│  ┌──────────────────────────────────────────────┐  │
│  │                                              │  │
│  │  researcher  ██████████████████████░░   92   │  │
│  │                                              │  │
│  │  writer: NOT AUTHORIZED                      │  │
│  │  analyzer: NOT AUTHORIZED                    │  │
│  │                                              │  │
│  └──────────────────────────────────────────────┘  │
│                                                    │
│  Each agent has independent token buckets          │
│  One agent hitting limit doesn't affect others     │
└────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
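The token bucket itself is simple. A self-contained Python sketch (an illustration, not wardn's Rust implementation); the refill rate is max calls divided by the window, which for 200/hour is the roughly 0.055 tokens per second shown above:

```python
class TokenBucket:
    """One bucket per (agent, credential) pair (sketch)."""

    def __init__(self, max_calls: int, per_seconds: float):
        self.capacity = float(max_calls)
        self.refill_rate = max_calls / per_seconds  # tokens per second
        self.tokens = self.capacity                 # start full
        self.last = 0.0

    def try_acquire(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Tiny bucket so the refusal is visible: 2 calls per hour.
bucket = TokenBucket(max_calls=2, per_seconds=3600)
results = [bucket.try_acquire(now=0.0) for _ in range(3)]  # third is refused
```

Because each agent has its own bucket, one runaway agent exhausting its quota never starves the others.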



&lt;p&gt;Configuration in &lt;code&gt;wardn.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[warden.credentials.OPENAI_KEY]&lt;/span&gt;
&lt;span class="py"&gt;rate_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;max_calls&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;per&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hour"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;allowed_agents&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"researcher"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"writer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"analyzer"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;allowed_domains&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"api.openai.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nn"&gt;[warden.credentials.ANTHROPIC_KEY]&lt;/span&gt;
&lt;span class="py"&gt;rate_limit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;max_calls&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;per&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"hour"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;allowed_agents&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"researcher"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;allowed_domains&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"api.anthropic.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What This Defeats
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attack Vector&lt;/th&gt;
&lt;th&gt;Without Wardn&lt;/th&gt;
&lt;th&gt;With Wardn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;.env&lt;/code&gt; file theft&lt;/td&gt;
&lt;td&gt;Real keys exposed&lt;/td&gt;
&lt;td&gt;No &lt;code&gt;.env&lt;/code&gt; files exist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Malicious skill reads &lt;code&gt;$OPENAI_KEY&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Gets &lt;code&gt;sk-proj-real-...&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Gets &lt;code&gt;wdn_placeholder_...&lt;/code&gt; (useless)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Stealer targets agent config&lt;/td&gt;
&lt;td&gt;Finds real credentials&lt;/td&gt;
&lt;td&gt;Finds only placeholders&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt injection exfiltrates key&lt;/td&gt;
&lt;td&gt;Key is in context window&lt;/td&gt;
&lt;td&gt;Key was never in context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent logs scraped&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Authorization: Bearer sk-proj-...&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Authorization: Bearer wdn_placeholder_...&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full agent compromise&lt;/td&gt;
&lt;td&gt;Attacker has real key&lt;/td&gt;
&lt;td&gt;Attacker has useless token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Looping agent burns budget&lt;/td&gt;
&lt;td&gt;Unlimited API calls&lt;/td&gt;
&lt;td&gt;Rate limit per agent per credential&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API response echoes key&lt;/td&gt;
&lt;td&gt;Key reaches agent memory&lt;/td&gt;
&lt;td&gt;Stripped by response pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn't just defense in depth. It's &lt;strong&gt;defense by architecture&lt;/strong&gt;: the real key structurally cannot reach the agent process.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;wardn

&lt;span class="c"&gt;# Create an encrypted vault&lt;/span&gt;
wardn vault create

&lt;span class="c"&gt;# Store your credentials (interactive, no echo)&lt;/span&gt;
wardn vault &lt;span class="nb"&gt;set &lt;/span&gt;OPENAI_KEY
wardn vault &lt;span class="nb"&gt;set &lt;/span&gt;ANTHROPIC_KEY

&lt;span class="c"&gt;# Get placeholder tokens for your agents&lt;/span&gt;
wardn vault get OPENAI_KEY &lt;span class="nt"&gt;--agent&lt;/span&gt; researcher
&lt;span class="c"&gt;# → wdn_placeholder_a1b2c3d4e5f6g7h8&lt;/span&gt;

&lt;span class="c"&gt;# Start the proxy&lt;/span&gt;
wardn serve

&lt;span class="c"&gt;# Or start proxy + MCP server for Claude Code / Cursor&lt;/span&gt;
wardn serve &lt;span class="nt"&gt;--mcp&lt;/span&gt; &lt;span class="nt"&gt;--agent&lt;/span&gt; my-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point your agent's HTTP client at &lt;code&gt;localhost:7777&lt;/code&gt;. Set the placeholder as the API key. Done.&lt;/p&gt;
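What that wiring can look like in practice, sketched with stdlib `urllib`. This assumes a base-URL style setup where the client talks plain HTTP to the local proxy, which then handles TLS to the real API; the header names, path, and placeholder value are illustrative:

```python
import urllib.request

PLACEHOLDER = "wdn_placeholder_a1b2c3d4e5f6"  # from `wardn vault get`

request = urllib.request.Request(
    "http://localhost:7777/v1/chat/completions",  # proxy, not api.openai.com
    headers={
        "Authorization": f"Bearer {PLACEHOLDER}",  # placeholder, never the key
        "x-warden-agent": "researcher",            # agent identity for the ACL
    },
)
# urllib.request.urlopen(request) would send the call through the proxy.
```

The agent's code, logs, and environment contain only the placeholder from start to finish.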




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Wardn is a security middleware layer for AI agents. Today, it solves credential isolation. The same proxy architecture extends to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Request auditing&lt;/strong&gt; — full visibility into what agents are actually calling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain allowlisting&lt;/strong&gt; — agents can only reach approved APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost attribution&lt;/strong&gt; — know exactly which agent is spending what&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy enforcement&lt;/strong&gt; — agent-specific rules beyond just rate limits&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI agent ecosystem is growing fast. The security primitives haven't kept up. We think credential isolation is the foundation everything else builds on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;wardn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/rohansx/wardn" rel="noopener noreferrer"&gt;github.com/rohansx/wardn&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Crates.io:&lt;/strong&gt; &lt;a href="https://crates.io/crates/wardn" rel="noopener noreferrer"&gt;crates.io/crates/wardn&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Wardn is written in Rust. ~4,500 lines. Zero unsafe. AES-256-GCM + Argon2id. One binary, no external services.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>security</category>
      <category>agents</category>
    </item>
    <item>
      <title>SQLite as a Graph Database: Recursive CTEs, Semantic Search, and Why We Ditched Neo4j</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Tue, 24 Mar 2026 08:05:21 +0000</pubDate>
      <link>https://dev.to/rohansx/sqlite-as-a-graph-database-recursive-ctes-semantic-search-and-why-we-ditched-neo4j-1ai</link>
      <guid>https://dev.to/rohansx/sqlite-as-a-graph-database-recursive-ctes-semantic-search-and-why-we-ditched-neo4j-1ai</guid>
      <description>&lt;p&gt;Knowledge graphs are having a moment. Every AI agent framework wants one. The typical stack looks like this: Neo4j for graph storage, OpenAI for extraction, Docker to run it all. Three moving parts, two network dependencies, one &lt;code&gt;docker-compose.yml&lt;/code&gt; you'll fight with for an hour.&lt;/p&gt;

&lt;p&gt;We built &lt;a href="https://github.com/rohansx/ctxgraph" rel="noopener noreferrer"&gt;ctxgraph&lt;/a&gt; to see how far you can get with just SQLite. The answer: surprisingly far. 0.800 combined F1 on extraction benchmarks, zero API calls, ~2 seconds for 50 episodes, single binary. No Docker. No API keys. No Neo4j.&lt;/p&gt;

&lt;p&gt;This post walks through the actual implementation: the schema, the recursive CTEs that make SQLite behave like a graph database, and the 3-mode search fusion that ties it together.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Too Much Infrastructure for a Knowledge Graph
&lt;/h2&gt;

&lt;p&gt;If you want a knowledge graph for your dev team today, the minimum viable stack is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Neo4j&lt;/strong&gt; -- requires Docker or a managed instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An LLM API&lt;/strong&gt; -- OpenAI, Anthropic, etc. for entity/relation extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An embedding service&lt;/strong&gt; -- for semantic search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A vector database&lt;/strong&gt; -- Pinecone, Qdrant, etc. for embedding storage&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's four services for what is conceptually a simple thing: "store facts about my codebase and let me query them."&lt;/p&gt;

&lt;p&gt;We wanted something that ships as a single binary, works offline, and stores everything in a single file you can &lt;code&gt;cp&lt;/code&gt; to a backup drive. SQLite was the obvious choice for storage. The question was whether it could handle graph operations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBUcmFkaXRpb25hbFsiVHJhZGl0aW9uYWwgU3RhY2siXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIERbIkRvY2tlciJdIC0tLSBOWyJOZW80aiJdCiAgICAgICAgT1siT3BlbkFJIEFQSSJdIC0tLSBFWFsiRXh0cmFjdGlvbiJdCiAgICAgICAgUFsiUGluZWNvbmUgLyBRZHJhbnQiXSAtLS0gRU1bIkVtYmVkZGluZ3MiXQogICAgICAgIFBZWyJQeXRob24iXSAtLS0gR0xbIkdsdWUgQ29kZSJdCiAgICBlbmQKICAgIHN1YmdyYXBoIENUWFsiY3R4Z3JhcGgiXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIEJJTlsiU2luZ2xlIFJ1c3QgQmluYXJ5Il0KICAgICAgICBCSU4gLS0tIFNRWyJTUUxpdGUgRmlsZSJdCiAgICBlbmQKICAgIFRyYWRpdGlvbmFsIC0uICJ2cyIgLi0-IENUWAogICAgc3R5bGUgVHJhZGl0aW9uYWwgZmlsbDojZmVmMmYyLHN0cm9rZTojZGMyNjI2CiAgICBzdHlsZSBDVFggZmlsbDojZjBmZGY0LHN0cm9rZTojMTZhMzRhCiAgICBzdHlsZSBCSU4gZmlsbDojMTZhMzRhLGNvbG9yOiNmZmYsc3Ryb2tlOiMxNTgwM2QKICAgIHN0eWxlIFNRIGZpbGw6IzE2YTM0YSxjb2xvcjojZmZmLHN0cm9rZTojMTU4MDNk" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBUcmFkaXRpb25hbFsiVHJhZGl0aW9uYWwgU3RhY2siXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIERbIkRvY2tlciJdIC0tLSBOWyJOZW80aiJdCiAgICAgICAgT1siT3BlbkFJIEFQSSJdIC0tLSBFWFsiRXh0cmFjdGlvbiJdCiAgICAgICAgUFsiUGluZWNvbmUgLyBRZHJhbnQiXSAtLS0gRU1bIkVtYmVkZGluZ3MiXQogICAgICAgIFBZWyJQeXRob24iXSAtLS0gR0xbIkdsdWUgQ29kZSJdCiAgICBlbmQKICAgIHN1YmdyYXBoIENUWFsiY3R4Z3JhcGgiXQogICAgICAgIGRpcmVjdGlvbiBUQgogICAgICAgIEJJTlsiU2luZ2xlIFJ1c3QgQmluYXJ5Il0KICAgICAgICBCSU4gLS0tIFNRWyJTUUxpdGUgRmlsZSJdCiAgICBlbmQKICAgIFRyYWRpdGlvbmFsIC0uICJ2cyIgLi0-IENUWAogICAgc3R5bGUgVHJhZGl0aW9uYWwgZmlsbDojZmVmMmYyLHN0cm9rZTojZGMyNjI2CiAgICBzdHlsZSBDVFggZmlsbDojZjBmZGY0LHN0cm9rZTojMTZhMzRhCiAgICBzdHlsZSBCSU4gZmlsbDojMTZhMzRhLGNvbG9yOiNmZmYsc3Ryb2tlOiMxNTgwM2QKICAgIHN0eWxlIFNRIGZpbGw6IzE2YTM0YSxjb2xvcjojZmZmLHN0cm9rZTojMTU4MDNk" 
alt="Traditional Stack vs ctxgraph" width="1130" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Four services, two network dependencies, and a Docker Compose file -- or one binary and one file.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bet: SQLite + Recursive CTEs = Graph Database
&lt;/h2&gt;

&lt;p&gt;The core insight is that a graph database is really two things: a storage format for nodes and edges, and a query engine that can walk those edges efficiently. SQLite handles the first part trivially. For the second part, recursive Common Table Expressions (CTEs) give you everything you need for multi-hop traversal.&lt;/p&gt;

&lt;p&gt;This isn't a new idea. What we found is that with proper indexing, it's fast enough for knowledge graphs in the tens-of-thousands-of-nodes range -- which covers most single-team or single-project use cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Schema
&lt;/h2&gt;

&lt;p&gt;Here's the actual schema from our migration file: three core tables, plus two junction tables, three FTS5 virtual tables, and eight indexes. The three core tables are shown here.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Episodes: raw events (conversations, decisions, incidents)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;source&lt;/span&gt;      &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;recorded_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;   &lt;span class="nb"&gt;BLOB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Entities: extracted nodes (people, services, decisions)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;        &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;entity_type&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt;     &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;created_at&lt;/span&gt;  &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt;   &lt;span class="nb"&gt;BLOB&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Edges: relationships between entities&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;          &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;source_id&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;target_id&lt;/span&gt;   &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;relation&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;fact&lt;/span&gt;        &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;valid_from&lt;/span&gt;  &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;recorded_at&lt;/span&gt; &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;  &lt;span class="nb"&gt;REAL&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;episode_id&lt;/span&gt;  &lt;span class="nb"&gt;TEXT&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;metadata&lt;/span&gt;    &lt;span class="nb"&gt;TEXT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
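&lt;p&gt;To see the schema in action, here's a minimal sketch -- Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; purely for illustration, since ctxgraph itself is Rust -- that applies the three core tables and records one episode, two entities, and an edge between them:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE episodes (
    id TEXT PRIMARY KEY, content TEXT NOT NULL, source TEXT,
    recorded_at TEXT NOT NULL, metadata TEXT, embedding BLOB
);
CREATE TABLE entities (
    id TEXT PRIMARY KEY, name TEXT NOT NULL, entity_type TEXT NOT NULL,
    summary TEXT, created_at TEXT NOT NULL, metadata TEXT, embedding BLOB
);
CREATE TABLE edges (
    id TEXT PRIMARY KEY,
    source_id TEXT NOT NULL REFERENCES entities(id),
    target_id TEXT NOT NULL REFERENCES entities(id),
    relation TEXT NOT NULL, fact TEXT,
    valid_from TEXT, valid_until TEXT,
    recorded_at TEXT NOT NULL,
    confidence REAL DEFAULT 1.0,
    episode_id TEXT REFERENCES episodes(id),
    metadata TEXT
);
""")

# One raw event, two extracted entities, one relationship traced back
# to the episode it was extracted from.
conn.execute("INSERT INTO episodes (id, content, recorded_at) "
             "VALUES ('ep1', 'AuthService depends on Redis', '2025-03-01')")
conn.execute("INSERT INTO entities (id, name, entity_type, created_at) "
             "VALUES ('e1', 'AuthService', 'service', '2025-03-01')")
conn.execute("INSERT INTO entities (id, name, entity_type, created_at) "
             "VALUES ('e2', 'Redis', 'service', '2025-03-01')")
conn.execute("""INSERT INTO edges (id, source_id, target_id, relation,
                                   valid_from, recorded_at, episode_id)
                VALUES ('ed1', 'e1', 'e2', 'depends_on',
                        '2025-03-01', '2025-03-01', 'ep1')""")

# Current edges touching AuthService: valid_until IS NULL means "still true".
rows = conn.execute("""SELECT relation, confidence FROM edges
                       WHERE (source_id = 'e1' OR target_id = 'e1')
                         AND valid_until IS NULL""").fetchall()
print(rows)  # [('depends_on', 1.0)]
```

&lt;p&gt;Note that &lt;code&gt;confidence&lt;/code&gt; picks up its &lt;code&gt;DEFAULT 1.0&lt;/code&gt; without being supplied, and the edge carries a provenance pointer back to the episode.&lt;/p&gt;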



&lt;h3&gt;
  
  
  Why Bi-Temporal Matters
&lt;/h3&gt;

&lt;p&gt;Look at the &lt;code&gt;edges&lt;/code&gt; table. It has two temporal dimensions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;valid_from&lt;/code&gt; / &lt;code&gt;valid_until&lt;/code&gt;&lt;/strong&gt; -- when the fact was true &lt;em&gt;in the real world&lt;/em&gt;. "AuthService depends on Redis" was valid from March 1st until March 15th, when we migrated to Postgres.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;recorded_at&lt;/code&gt;&lt;/strong&gt; -- when the system &lt;em&gt;learned&lt;/em&gt; about the fact. We might record the Redis-to-Postgres migration on March 20th, five days after it happened.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction matters for debugging. "What did we &lt;em&gt;know&lt;/em&gt; about the system on March 10th?" is a different question from "What was &lt;em&gt;actually true&lt;/em&gt; about the system on March 10th?" Bi-temporal modeling lets you answer both.&lt;/p&gt;

&lt;p&gt;In practice, querying "current state" means filtering for &lt;code&gt;valid_until IS NULL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;recorded_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Invalidating an edge doesn't delete it -- it sets &lt;code&gt;valid_until&lt;/code&gt;, preserving the full history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the same pattern Datomic and Graphiti use. The difference is we do it in a 2MB SQLite file instead of a JVM process or a Neo4j container.&lt;/p&gt;
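&lt;p&gt;The "what did we &lt;em&gt;know&lt;/em&gt; on March 10th?" query isn't shown above, so here's a sketch of both temporal views -- again Python's &lt;code&gt;sqlite3&lt;/code&gt; for illustration, with hypothetical ids. The as-of-knowledge query filters on &lt;code&gt;recorded_at&lt;/code&gt;; the as-of-reality query filters on the validity interval:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE edges (
    id TEXT PRIMARY KEY, source_id TEXT, target_id TEXT, relation TEXT,
    valid_from TEXT, valid_until TEXT, recorded_at TEXT NOT NULL)""")

# The Redis dependency: true March 1st to March 15th, recorded immediately.
conn.execute("""INSERT INTO edges VALUES
    ('ed1', 'auth', 'redis', 'depends_on',
     '2025-03-01', '2025-03-15', '2025-03-01')""")
# The Postgres migration: true from March 15th, but only recorded March 20th.
conn.execute("""INSERT INTO edges VALUES
    ('ed2', 'auth', 'postgres', 'depends_on',
     '2025-03-15', NULL, '2025-03-20')""")

def known_at(t):
    """What did the system KNOW at time t? Filter on recorded_at."""
    return [r[0] for r in conn.execute(
        "SELECT target_id FROM edges WHERE recorded_at <= ?", (t,))]

def true_at(t):
    """What was ACTUALLY true at time t? Filter on the validity interval."""
    return [r[0] for r in conn.execute(
        """SELECT target_id FROM edges
           WHERE valid_from <= ? AND (valid_until IS NULL OR valid_until > ?)""",
        (t, t))]

print(known_at("2025-03-18"))  # ['redis']     -- migration not yet recorded
print(true_at("2025-03-18"))   # ['postgres']  -- migration already happened
```

&lt;p&gt;ISO-8601 timestamps compare correctly as plain strings, which is why &lt;code&gt;TEXT&lt;/code&gt; columns are enough here.&lt;/p&gt;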

&lt;h2&gt;
  
  
  Graph Traversal via Recursive CTE
&lt;/h2&gt;

&lt;p&gt;This is the core of making SQLite act as a graph database. Here's the actual traversal query from our codebase:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;RECURSIVE&lt;/span&gt; &lt;span class="n"&gt;traversal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- Base case: start at the given entity&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;UNION&lt;/span&gt;

    &lt;span class="c1"&gt;-- Recursive step: walk edges in both directions&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt;
        &lt;span class="k"&gt;CASE&lt;/span&gt; &lt;span class="k"&gt;WHEN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="k"&gt;THEN&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;
             &lt;span class="k"&gt;ELSE&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;traversal&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
    &lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt; &lt;span class="k"&gt;OR&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
      &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;  &lt;span class="c1"&gt;-- only current edges&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;traversal&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;ent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what the traversal looks like on a small graph. Starting from "AuthService" at depth 0, the CTE walks outward one hop at a time:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBkMFsiRGVwdGggMCJdCiAgICAgICAgQVsiQXV0aFNlcnZpY2UiXQogICAgZW5kCiAgICBzdWJncmFwaCBkMVsiRGVwdGggMSJdCiAgICAgICAgQlsiSldUIl0KICAgICAgICBDWyJSZWRpcyJdCiAgICBlbmQKICAgIHN1YmdyYXBoIGQyWyJEZXB0aCAyIl0KICAgICAgICBEWyJQb3N0Z3JlcyJdCiAgICAgICAgRVsiU2Vzc2lvblN0b3JlIl0KICAgICAgICBGWyJUb2tlbkNhY2hlIl0KICAgIGVuZAogICAgQSAtLSAidXNlcyIgLS0-IEIKICAgIEEgLS0gImRlcGVuZHNfb24iIC0tPiBDCiAgICBCIC0tICJzdG9yZWRfaW4iIC0tPiBECiAgICBDIC0tICJiYWNrcyIgLS0-IEUKICAgIEMgLS0gImJhY2tzIiAtLT4gRgogICAgc3R5bGUgQSBmaWxsOiMyNTYzZWIsY29sb3I6I2ZmZixzdHJva2U6IzFlNDBhZgogICAgc3R5bGUgQiBmaWxsOiM3YzNhZWQsY29sb3I6I2ZmZixzdHJva2U6IzViMjFiNgogICAgc3R5bGUgQyBmaWxsOiM3YzNhZWQsY29sb3I6I2ZmZixzdHJva2U6IzViMjFiNgogICAgc3R5bGUgRCBmaWxsOiMwNTk2NjksY29sb3I6I2ZmZixzdHJva2U6IzA0Nzg1NwogICAgc3R5bGUgRSBmaWxsOiMwNTk2NjksY29sb3I6I2ZmZixzdHJva2U6IzA0Nzg1NwogICAgc3R5bGUgRiBmaWxsOiMwNTk2NjksY29sb3I6I2ZmZixzdHJva2U6IzA0Nzg1Nw%3D%3D" class="article-body-image-wrapper"&gt;&lt;img 
src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IExSCiAgICBzdWJncmFwaCBkMFsiRGVwdGggMCJdCiAgICAgICAgQVsiQXV0aFNlcnZpY2UiXQogICAgZW5kCiAgICBzdWJncmFwaCBkMVsiRGVwdGggMSJdCiAgICAgICAgQlsiSldUIl0KICAgICAgICBDWyJSZWRpcyJdCiAgICBlbmQKICAgIHN1YmdyYXBoIGQyWyJEZXB0aCAyIl0KICAgICAgICBEWyJQb3N0Z3JlcyJdCiAgICAgICAgRVsiU2Vzc2lvblN0b3JlIl0KICAgICAgICBGWyJUb2tlbkNhY2hlIl0KICAgIGVuZAogICAgQSAtLSAidXNlcyIgLS0-IEIKICAgIEEgLS0gImRlcGVuZHNfb24iIC0tPiBDCiAgICBCIC0tICJzdG9yZWRfaW4iIC0tPiBECiAgICBDIC0tICJiYWNrcyIgLS0-IEUKICAgIEMgLS0gImJhY2tzIiAtLT4gRgogICAgc3R5bGUgQSBmaWxsOiMyNTYzZWIsY29sb3I6I2ZmZixzdHJva2U6IzFlNDBhZgogICAgc3R5bGUgQiBmaWxsOiM3YzNhZWQsY29sb3I6I2ZmZixzdHJva2U6IzViMjFiNgogICAgc3R5bGUgQyBmaWxsOiM3YzNhZWQsY29sb3I6I2ZmZixzdHJva2U6IzViMjFiNgogICAgc3R5bGUgRCBmaWxsOiMwNTk2NjksY29sb3I6I2ZmZixzdHJva2U6IzA0Nzg1NwogICAgc3R5bGUgRSBmaWxsOiMwNTk2NjksY29sb3I6I2ZmZixzdHJva2U6IzA0Nzg1NwogICAgc3R5bGUgRiBmaWxsOiMwNTk2NjksY29sb3I6I2ZmZixzdHJva2U6IzA0Nzg1Nw%3D%3D" alt="Recursive CTE Traversal" width="822" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The CTE starts with "AuthService" (depth 0, blue), discovers "JWT" and "Redis" (depth 1, purple), then reaches "Postgres", "SessionStore", and "TokenCache" (depth 2, green). &lt;code&gt;UNION&lt;/code&gt; deduplicates, so a node reached by two paths at the same depth is expanded only once.&lt;/p&gt;

&lt;p&gt;Let's break down what this does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Base case&lt;/strong&gt;: Seed the traversal with the starting entity at depth 0.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive step&lt;/strong&gt;: For each entity already discovered, find all edges where it's either the source or the target. This gives you &lt;strong&gt;bidirectional traversal&lt;/strong&gt; -- you walk the graph regardless of edge direction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The &lt;code&gt;CASE&lt;/code&gt; expression&lt;/strong&gt;: Picks the &lt;em&gt;other&lt;/em&gt; end of the edge. If we arrived via &lt;code&gt;source_id&lt;/code&gt;, take &lt;code&gt;target_id&lt;/code&gt;, and vice versa.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Depth limiting&lt;/strong&gt;: &lt;code&gt;WHERE t.depth &amp;lt; ?2&lt;/code&gt; caps how many hops you'll walk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal filtering&lt;/strong&gt;: &lt;code&gt;AND e.valid_until IS NULL&lt;/code&gt; restricts to currently-valid edges. Remove this clause to traverse the full history.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;UNION&lt;/code&gt; (not &lt;code&gt;UNION ALL&lt;/code&gt;)&lt;/strong&gt;: Deduplicates identical &lt;code&gt;(entity_id, depth)&lt;/code&gt; rows, so a node reached by two same-length paths is expanded only once. Because depth is part of the row, a cycle can still re-emit a node at a higher depth -- it's the depth cap that guarantees termination.&lt;/li&gt;
&lt;/ol&gt;
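&lt;p&gt;Here's the same CTE pattern exercised end-to-end on the diagram's graph -- a Python &lt;code&gt;sqlite3&lt;/code&gt; sketch where entity names stand in for ids, and the final &lt;code&gt;SELECT&lt;/code&gt; is simplified to take each node's minimum depth instead of joining back to &lt;code&gt;entities&lt;/code&gt;:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source_id TEXT, target_id TEXT, "
             "relation TEXT, valid_until TEXT)")
conn.executemany("INSERT INTO edges VALUES (?, ?, ?, NULL)", [
    ("AuthService", "JWT", "uses"),
    ("AuthService", "Redis", "depends_on"),
    ("JWT", "Postgres", "stored_in"),
    ("Redis", "SessionStore", "backs"),
    ("Redis", "TokenCache", "backs"),
])

rows = conn.execute("""
WITH RECURSIVE traversal(entity_id, depth) AS (
    -- Base case: start at the given entity
    SELECT ?1, 0
    UNION
    -- Recursive step: walk edges in both directions
    SELECT CASE WHEN e.source_id = t.entity_id THEN e.target_id
                ELSE e.source_id END,
           t.depth + 1
    FROM traversal t
    JOIN edges e ON (e.source_id = t.entity_id OR e.target_id = t.entity_id)
    WHERE t.depth < ?2
      AND e.valid_until IS NULL
)
SELECT entity_id, MIN(depth) FROM traversal
GROUP BY entity_id ORDER BY 2, 1
""", ("AuthService", 2)).fetchall()

print(rows)
```

&lt;p&gt;Starting from "AuthService" with a two-hop cap, this returns the six nodes from the diagram at depths 0, 1, 1, 2, 2, 2 -- including "SessionStore" and "TokenCache", which are only reachable through "Redis".&lt;/p&gt;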

&lt;p&gt;After collecting traversed entities, we grab all edges between them in a second query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fact&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="n"&gt;valid_from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recorded_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;episode_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;source_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;target_id&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;recorded_at&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How This Compares to Cypher
&lt;/h3&gt;

&lt;p&gt;The equivalent Neo4j Cypher query would be:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;start:&lt;/span&gt;&lt;span class="n"&gt;Entity&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="n"&gt;$id&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;neighbor&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;ALL&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="nf"&gt;relationships&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;r.valid_until&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="k"&gt;DISTINCT&lt;/span&gt; &lt;span class="n"&gt;neighbor&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;length&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;depth&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cypher is more concise, no question. But the SQL version is self-contained -- no external database process, no Bolt protocol, no connection pooling. And for knowledge graphs under ~50k entities, the performance difference is negligible.&lt;/p&gt;

&lt;h2&gt;
  
  
  3-Mode Search Fusion
&lt;/h2&gt;

&lt;p&gt;Search is where things get interesting. We have three retrieval modes, each with different strengths:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;FTS5 keyword search&lt;/strong&gt; -- exact token matching via SQLite's built-in full-text search (BM25 ranking)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic similarity&lt;/strong&gt; -- cosine similarity against embeddings from &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; (384 dimensions, runs locally via ONNX)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph traversal&lt;/strong&gt; -- walk edges from entities mentioned in the query&lt;/li&gt;
&lt;/ol&gt;
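&lt;p&gt;Mode 1 in a nutshell: FTS5 is a virtual table you &lt;code&gt;MATCH&lt;/code&gt; against, with &lt;code&gt;bm25()&lt;/code&gt; as the rank (lower is better). A Python &lt;code&gt;sqlite3&lt;/code&gt; sketch with made-up episode text -- the table and column names are illustrative, not ctxgraph's actual FTS schema:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A standalone FTS5 table over episode content.
conn.execute("CREATE VIRTUAL TABLE episodes_fts USING fts5(content)")
conn.executemany("INSERT INTO episodes_fts (content) VALUES (?)", [
    ("We migrated AuthService from Redis to Postgres",),
    ("Incident review: token cache eviction bug",),
    ("Decided to keep JWT for session auth",),
])

# bm25() returns a rank where lower means more relevant,
# so ORDER BY ascending puts the best match first.
rows = conn.execute("""
    SELECT rowid, bm25(episodes_fts) FROM episodes_fts
    WHERE episodes_fts MATCH 'redis'
    ORDER BY bm25(episodes_fts)
""").fetchall()
print(rows[0][0])  # 1
```

&lt;p&gt;Only the first document matches &lt;code&gt;'redis'&lt;/code&gt;, so its rowid comes back alone. In the real pipeline, this ranked list is one of the three inputs to fusion.&lt;/p&gt;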

&lt;p&gt;Here's how data flows through the system, from ingestion to query:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRCCiAgICBzdWJncmFwaCBJbmdlc3Rpb25bIkluZ2VzdGlvbiBQaXBlbGluZSJdCiAgICAgICAgZGlyZWN0aW9uIExSCiAgICAgICAgSU5bIlRleHQgSW5wdXQiXSAtLT4gTkVSWyJHTGlORVIgTkVSIChPTk5YKSJdCiAgICAgICAgTkVSIC0tPiBSRUxbIkhldXJpc3RpYyBSZWxhdGlvbnMiXQogICAgICAgIFJFTCAtLT4gRU1CWyJFbWJlZGRpbmcgKE1pbmlMTSkiXQogICAgICAgIEVNQiAtLT4gREJbKCJTUUxpdGUgU3RvcmFnZSIpXQogICAgZW5kCiAgICBzdWJncmFwaCBRdWVyeVsiUXVlcnkgUGlwZWxpbmUiXQogICAgICAgIGRpcmVjdGlvbiBMUgogICAgICAgIFFbIlF1ZXJ5Il0gLS0-IEZUU1siRlRTNSBLZXl3b3JkIFNlYXJjaCJdCiAgICAgICAgUSAtLT4gU0VNWyJFbWJlZGRpbmcgQ29zaW5lIFNpbWlsYXJpdHkiXQogICAgICAgIFEgLS0-IENURVsiUmVjdXJzaXZlIENURSBHcmFwaCBXYWxrIl0KICAgICAgICBGVFMgLS0-IFJSRlsiUlJGIEZ1c2lvbiJdCiAgICAgICAgU0VNIC0tPiBSUkYKICAgICAgICBDVEUgLS0-IFJSRgogICAgICAgIFJSRiAtLT4gUkVTWyJSYW5rZWQgUmVzdWx0cyJdCiAgICBlbmQKICAgIEluZ2VzdGlvbiB-fn4gUXVlcnk%3D" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmermaid.ink%2Fimg%2FZmxvd2NoYXJ0IFRCCiAgICBzdWJncmFwaCBJbmdlc3Rpb25bIkluZ2VzdGlvbiBQaXBlbGluZSJdCiAgICAgICAgZGlyZWN0aW9uIExSCiAgICAgICAgSU5bIlRleHQgSW5wdXQiXSAtLT4gTkVSWyJHTGlORVIgTkVSIChPTk5YKSJdCiAgICAgICAgTkVSIC0tPiBSRUxbIkhldXJpc3RpYyBSZWxhdGlvbnMiXQogICAgICAgIFJFTCAtLT4gRU1CWyJFbWJlZGRpbmcgKE1pbmlMTSkiXQogICAgICAgIEVNQiAtLT4gREJbKCJTUUxpdGUgU3RvcmFnZSIpXQogICAgZW5kCiAgICBzdWJncmFwaCBRdWVyeVsiUXVlcnkgUGlwZWxpbmUiXQogICAgICAgIGRpcmVjdGlvbiBMUgogICAgICAgIFFbIlF1ZXJ5Il0gLS0-IEZUU1siRlRTNSBLZXl3b3JkIFNlYXJjaCJdCiAgICAgICAgUSAtLT4gU0VNWyJFbWJlZGRpbmcgQ29zaW5lIFNpbWlsYXJpdHkiXQogICAgICAgIFEgLS0-IENURVsiUmVjdXJzaXZlIENURSBHcmFwaCBXYWxrIl0KICAgICAgICBGVFMgLS0-IFJSRlsiUlJGIEZ1c2lvbiJdCiAgICAgICAgU0VNIC0tPiBSUkYKICAgICAgICBDVEUgLS0-IFJSRgogICAgICAgIFJSRiAtLT4gUkVTWyJSYW5rZWQgUmVzdWx0cyJdCiAgICBlbmQKICAgIEluZ2VzdGlvbiB-fn4gUXVlcnk%3D" 
alt="Architecture: Ingestion and Query Pipeline" width="1264" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three independent retrieval modes run in parallel, each producing a ranked list. Reciprocal Rank Fusion combines them without needing comparable scores.&lt;/p&gt;

&lt;p&gt;The problem: how do you combine ranked results from three systems with completely different scoring scales? BM25 scores, cosine similarities, and graph depth are not comparable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reciprocal Rank Fusion (RRF)
&lt;/h3&gt;

&lt;p&gt;The answer is &lt;a href="https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf" rel="noopener noreferrer"&gt;Reciprocal Rank Fusion&lt;/a&gt;. Instead of combining &lt;em&gt;scores&lt;/em&gt;, you combine &lt;em&gt;ranks&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rrf_score(d) = sum( 1 / (k + rank_i(d)) )  for each mode i where d appears
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With k=60 (the standard constant from the original paper), a document ranked #1 in one mode gets &lt;code&gt;1/61 = 0.0164&lt;/code&gt;. Ranked #10 gets &lt;code&gt;1/70 = 0.0143&lt;/code&gt;. The key property: a document appearing in &lt;em&gt;multiple&lt;/em&gt; modes gets scores from each, naturally boosting results that are relevant across different retrieval strategies.&lt;/p&gt;
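&lt;p&gt;The arithmetic is small enough to show whole. Here's the shape of the fusion loop in plain Python (hypothetical episode ids; ranks are 1-based, matching the paper):&lt;/p&gt;

```python
def rrf_fuse(ranked_lists, k=60):
    """Combine best-first ranked lists by Reciprocal Rank Fusion."""
    scores = {}
    for results in ranked_lists:
        for rank, doc in enumerate(results, start=1):
            # Each mode the doc appears in adds 1 / (k + rank).
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fts      = ["ep3", "ep1", "ep7"]   # keyword hits
semantic = ["ep1", "ep9", "ep3"]   # embedding hits
graph    = ["ep1", "ep4"]          # traversal hits

# ep1 scores 1/62 + 1/61 + 1/61: present in all three modes, so it
# outranks ep3 even though ep3 was the #1 keyword hit.
print(rrf_fuse([fts, semantic, graph]))
```

&lt;p&gt;This is why RRF needs no score normalization: only rank positions enter the sum, and cross-mode agreement is rewarded automatically.&lt;/p&gt;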

&lt;p&gt;Here's the actual implementation from &lt;code&gt;graph.rs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;pub&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;search_fused&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;usize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;FusedEpisodeResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;60.0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;HashMap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;episodes_map&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Episode&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;HashMap&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// --- FTS5 ranked list ---&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;fts_pool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="py"&gt;.storage&lt;/span&gt;&lt;span class="nf"&gt;.search_episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fts_pool&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fts&lt;/span&gt;&lt;span class="nf"&gt;.into_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.enumerate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;rrf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="nf"&gt;.entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="py"&gt;.id&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="nf"&gt;.or_insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="n"&gt;episodes_map&lt;/span&gt;&lt;span class="nf"&gt;.insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="py"&gt;.id&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;episode&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// --- Semantic (cosine similarity) ranked list ---&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;all_embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="nf"&gt;.get_embeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;all_embeddings&lt;/span&gt;&lt;span class="nf"&gt;.is_empty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="nf"&gt;.is_empty&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;semantic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;f32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;all_embeddings&lt;/span&gt;
            &lt;span class="nf"&gt;.into_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="nf"&gt;.map&lt;/span&gt;&lt;span class="p"&gt;(|(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;)|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;sim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;vec&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sim&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
            &lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;semantic&lt;/span&gt;&lt;span class="nf"&gt;.sort_by&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="na"&gt;.1&lt;/span&gt;&lt;span class="nf"&gt;.partial_cmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="na"&gt;.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;

        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ep_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_sim&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;semantic&lt;/span&gt;&lt;span class="nf"&gt;.into_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.enumerate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;rrf&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;K&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="nf"&gt;.entry&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ep_id&lt;/span&gt;&lt;span class="nf"&gt;.clone&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="nf"&gt;.or_insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;rrf&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Sort by total RRF score descending, take top `limit`&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;fused&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;f64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="nf"&gt;.into_iter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.collect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;fused&lt;/span&gt;&lt;span class="nf"&gt;.sort_by&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="na"&gt;.1&lt;/span&gt;&lt;span class="nf"&gt;.partial_cmp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="na"&gt;.1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="c1"&gt;// ... take(limit) and return&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FTS5 query underneath is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;recorded_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;episodes_fts&lt;/span&gt; &lt;span class="n"&gt;fts&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rowid&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;episodes_fts&lt;/span&gt; &lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FTS5's &lt;code&gt;rank&lt;/code&gt; column is the negative BM25 score (more negative means a better match), so we negate it to get a positive relevance score. With RRF, though, the raw score never matters; only each result's position in the ranked list does.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Not Just Use Vector Search?
&lt;/h3&gt;

&lt;p&gt;Pure semantic search misses exact matches. If someone searches for "Redis migration," you want episodes containing the literal string "Redis migration" to rank highly even if their embedding isn't the closest vector. FTS5 catches these. Conversely, semantic search catches rephrased mentions ("moved our caching layer to a different store") that keyword search would miss.&lt;/p&gt;

&lt;p&gt;RRF is the simplest way to get both without tuning weights.&lt;/p&gt;
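&lt;p&gt;The fusion step boils down to a few lines. Here's a minimal, self-contained sketch of RRF over ranked ID lists (simplified from the full function above, which also carries the &lt;code&gt;Episode&lt;/code&gt; values along):&lt;/p&gt;

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each item earns 1 / (k + rank + 1)
/// for every ranked list it appears in; scores are summed.
fn rrf_fuse(lists: &[Vec<&str>], k: f64, limit: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Descending by fused score; tie-break on ID for determinism.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused.truncate(limit);
    fused
}
```

&lt;p&gt;With k = 60, an item ranked first in both lists scores 2/61 and beats an item ranked first in only one list (1/61), which is exactly the "agreement wins" behavior you want from fusion.&lt;/p&gt;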

&lt;h2&gt;
  
  
  Performance: 8 Indexes and Why SQLite Scales
&lt;/h2&gt;

&lt;p&gt;Here are the eight indexes we maintain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_edges_source&lt;/span&gt;     &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;source_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_edges_target&lt;/span&gt;     &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_edges_relation&lt;/span&gt;   &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relation&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_edges_valid&lt;/span&gt;      &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;valid_from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;valid_until&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_entities_type&lt;/span&gt;    &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_type&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_episode_entities&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;episode_entities&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entity_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_episodes_source&lt;/span&gt;  &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_episodes_recorded&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recorded_at&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What each optimizes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Index&lt;/th&gt;
&lt;th&gt;Optimizes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;idx_edges_source&lt;/code&gt; / &lt;code&gt;idx_edges_target&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Graph traversal JOIN on &lt;code&gt;edges.source_id&lt;/code&gt; and &lt;code&gt;edges.target_id&lt;/code&gt;. Without these, every recursive CTE step is a full table scan.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idx_edges_relation&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Filtering edges by relation type ("show me all &lt;code&gt;depends_on&lt;/code&gt; edges").&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idx_edges_valid&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Temporal queries. The composite index on &lt;code&gt;(valid_from, valid_until)&lt;/code&gt; lets the planner efficiently filter current vs. historical edges.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idx_entities_type&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Listing/filtering entities by type. Used during deduplication (find all entities of type "Service" for fuzzy matching).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;idx_episode_entities&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reverse lookup: "which episodes mention this entity?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;idx_episodes_source&lt;/code&gt; / &lt;code&gt;idx_episodes_recorded&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Filtering episodes by source system and time-range queries.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On top of these, we have three FTS5 virtual tables (&lt;code&gt;episodes_fts&lt;/code&gt;, &lt;code&gt;entities_fts&lt;/code&gt;, &lt;code&gt;edges_fts&lt;/code&gt;) with triggers that keep them in sync on every INSERT, UPDATE, and DELETE. FTS5 maintains its own inverted index internally.&lt;/p&gt;
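&lt;p&gt;The sync triggers follow SQLite's documented FTS5 external-content pattern: an AFTER INSERT trigger mirrors new rows into the index, and deletes go through FTS5's special &lt;code&gt;'delete'&lt;/code&gt; command. A sketch for the episodes table (trigger names and column lists here are illustrative, not ctxgraph's actual schema):&lt;/p&gt;

```sql
CREATE TRIGGER episodes_ai AFTER INSERT ON episodes BEGIN
  INSERT INTO episodes_fts(rowid, content) VALUES (new.rowid, new.content);
END;

CREATE TRIGGER episodes_ad AFTER DELETE ON episodes BEGIN
  INSERT INTO episodes_fts(episodes_fts, rowid, content)
  VALUES ('delete', old.rowid, old.content);
END;
```

&lt;p&gt;An UPDATE trigger is the two combined: delete the old row from the index, then insert the new one.&lt;/p&gt;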

&lt;h3&gt;
  
  
  SQLite Scales Further Than You Think
&lt;/h3&gt;

&lt;p&gt;Common objection: "SQLite can't handle large datasets." In practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQLite handles &lt;strong&gt;millions of rows&lt;/strong&gt; without issue when properly indexed. Our edge traversal is O(branching_factor^depth), bounded by &lt;code&gt;max_depth&lt;/code&gt;, not by total table size.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;WAL mode&lt;/strong&gt; (&lt;code&gt;PRAGMA journal_mode = WAL&lt;/code&gt;) allows concurrent reads during writes. We set this on every connection.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Page cache&lt;/strong&gt; scales with available RAM. SQLite's default 2MB cache can be bumped to 64MB+ for hot datasets.&lt;/li&gt;
&lt;li&gt;The semantic search scan (loading all embeddings for cosine similarity) is the bottleneck. At 10k episodes with 384-dim embeddings, that's ~15MB of data. Loads in milliseconds.&lt;/li&gt;
&lt;/ul&gt;
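&lt;p&gt;The connection setup behind those bullets is a handful of PRAGMAs. A sketch (the 64MB cache value is a tunable, not something ctxgraph prescribes):&lt;/p&gt;

```sql
PRAGMA journal_mode = WAL;    -- readers no longer block on the writer
PRAGMA synchronous = NORMAL;  -- safe under WAL, fewer fsyncs
PRAGMA cache_size = -64000;   -- negative means KiB: ~64MB page cache
```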

&lt;h3&gt;
  
  
  When You'd Actually Need HNSW
&lt;/h3&gt;

&lt;p&gt;The brute-force cosine similarity scan works fine up to roughly 50-100k embeddings. Beyond that, you want an approximate nearest neighbor index. Options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;sqlite-vec&lt;/strong&gt; -- a SQLite extension for vector search that stays in-process (brute-force today, with ANN indexing on the roadmap).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qdrant/Milvus&lt;/strong&gt; -- if you need distributed vector search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We chose brute force initially because it adds zero dependencies and the results are exact (no approximation error). For a single-team knowledge graph, you're unlikely to hit 100k episodes.&lt;/p&gt;
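&lt;p&gt;For reference, the &lt;code&gt;cosine_similarity&lt;/code&gt; the scan calls is the entire "vector index" in the brute-force setup: one dot product and two norms per stored embedding. A sketch (the real function in ctxgraph may handle edge cases differently):&lt;/p&gt;

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 when either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```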

&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;From our benchmark against 50 gold-labeled episodes (ADRs, incident reports, PR descriptions, migration plans):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;ctxgraph (local)&lt;/th&gt;
&lt;th&gt;Graphiti + gpt-4o&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Entity F1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.837&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.570&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Relation F1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.763&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.104*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Combined F1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.800&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.337*&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API calls&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~200+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per run&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$2-5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency (50 eps)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~2s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~8min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;* Graphiti's free-form relations mapped to ctxgraph's fixed taxonomy with generous keyword heuristics.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The extraction pipeline uses GLiNER Large v2.1 (INT8 ONNX) for NER and a heuristic keyword + proximity scorer for relation extraction. Everything runs locally. The ~2 second total for 50 episodes works out to roughly &lt;strong&gt;40ms per episode&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why does a local ONNX model beat gpt-4o? It's not that gpt-4o is worse at understanding text. It's that Graphiti extracts &lt;em&gt;differently&lt;/em&gt;: multi-word descriptive phrases ("primary Postgres cluster") instead of canonical names ("Postgres"), free-form relation verbs instead of a fixed schema. For structured downstream querying, schema-typed extraction with a focused model wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use This
&lt;/h2&gt;

&lt;p&gt;SQLite-as-graph-database is the right call for a specific range of problems. Here's when it isn't:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scale: &amp;gt;100k entities with deep traversals.&lt;/strong&gt; The recursive CTE does a breadth-first search via SQL. At depth 4 with an average branching factor of 10, you're visiting 10,000 nodes per query. SQLite handles this in milliseconds with proper indexes, but at 500k entities with depth 6, you'll feel it. Neo4j's native graph storage format is genuinely faster for deep traversals on large graphs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrency: multi-user concurrent writes.&lt;/strong&gt; SQLite's write lock is process-wide. WAL mode helps (concurrent reads are fine), but if you have 10 services writing to the same graph simultaneously, you'll hit contention. PostgreSQL with a graph schema, or Neo4j, are better choices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query expressiveness: when you need Cypher.&lt;/strong&gt; Cypher's pattern matching is more expressive than recursive CTEs. "Find all paths between A and B where every intermediate node is a Service" is natural in Cypher and painful in SQL. If your queries look like this regularly, use a graph database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Distributed systems.&lt;/strong&gt; SQLite is a single-file embedded database. It doesn't replicate, shard, or cluster. If you need your knowledge graph available across multiple machines, this isn't the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;SQLite is not a graph database. But with recursive CTEs, FTS5, embeddings stored as BLOBs, and Reciprocal Rank Fusion to tie it all together, it's a remarkably capable one for the single-user, single-machine use case.&lt;/p&gt;

&lt;p&gt;The full stack -- entity extraction, relation extraction, graph storage, keyword search, semantic search, graph traversal -- runs as a single binary with no network dependencies. The database is a single file. Backup is &lt;code&gt;cp&lt;/code&gt;. Deployment is &lt;code&gt;cargo install&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Sometimes the best architecture is the one with the fewest moving parts.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/rohansx/ctxgraph" rel="noopener noreferrer"&gt;ctxgraph&lt;/a&gt; is open-source (MIT). Star it on GitHub if this was useful. We're actively inviting &lt;a href="https://github.com/rohansx/ctxgraph/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;benchmark episode submissions&lt;/a&gt; to independently validate the extraction quality.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>sql</category>
      <category>graphdb</category>
      <category>database</category>
      <category>rust</category>
    </item>
    <item>
      <title>Your RAG Pipeline is Leaking - 4 Data Leak Points Nobody Talks About</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Fri, 06 Mar 2026 18:33:13 +0000</pubDate>
      <link>https://dev.to/rohansx/your-rag-pipeline-is-leaking-4-data-leak-points-nobody-talks-aboutpublished-false-1obm</link>
      <guid>https://dev.to/rohansx/your-rag-pipeline-is-leaking-4-data-leak-points-nobody-talks-aboutpublished-false-1obm</guid>
      <description>&lt;p&gt;Every enterprise running RAG today is doing what Samsung engineers did in 2023 — sending sensitive data to LLM providers. Except it's automated, at scale, thousands of times per day.&lt;/p&gt;

&lt;p&gt;Samsung's problem wasn't careless employees. &lt;strong&gt;It was architectural.&lt;/strong&gt; And your RAG pipeline has the same architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 4 Leak Points
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your Documents (contracts, financials, HR, strategy)
        |
        v
   1. Chunking                  ✅ Local, safe
        |
        v
   2. Embedding API call         ❌ LEAK #1: raw text to provider
        |
        v
   3. Vector DB (cloud)          ❌ LEAK #2: invertible embeddings
        |
        v
   4. User query embedding       ❌ LEAK #3: query to embedding API
        |
        v
   5. Retrieved context          (your most sensitive chunks)
        |
        v
   6. LLM generation call        ❌ LEAK #4: query + context in plaintext
        |
        v
   Response to user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Six steps. Four leak points. Every single query.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Your compliance team saw a box labeled "LLM" in the architecture diagram and assumed it was local. It isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But Embeddings Are Just Numbers"
&lt;/h2&gt;

&lt;p&gt;That was the conventional wisdom until &lt;strong&gt;Zero2Text&lt;/strong&gt; (Feb 2026), a zero-training inversion attack that reconstructs text from embedding vectors with only API access, reporting 1.8x higher ROUGE-L scores than all prior baselines.&lt;/p&gt;

&lt;p&gt;Patient records, legal docs, proprietary code — all recoverable from vectors alone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Pinecone/Weaviate breach = full plaintext breach.&lt;/strong&gt; OWASP now classifies this as a &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" rel="noopener noreferrer"&gt;Top 10 LLM vulnerability&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Existing Solutions Don't Work
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Redaction kills utility:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: "Tata Motors reported Rs 3.4L Cr revenue in Q3 2025"
After:  "[REDACTED] reported [REDACTED] revenue in [REDACTED]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good luck getting useful embeddings from that. Your vector search returns garbage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PII detectors (Presidio, LLM Guard):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;50-200ms overhead per call (Python NER in the hot path)&lt;/li&gt;
&lt;li&gt;Only catch names/emails — miss revenue figures, deal sizes, project codenames&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stateless&lt;/strong&gt; — different replacement each call breaks vector search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cloud-locked tools:&lt;/strong&gt; Bedrock guardrails = Bedrock only. Private AI = another SaaS middleman.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                    Consistent   Beyond    &amp;lt;10ms     Self-     Pipeline
                    mapping      PII       latency   hosted    aware
Presidio            ❌           ❌        ❌        ✅        ❌
LLM Guard           ❌           ❌        ❌        ✅        ❌
Bedrock Guardrails  ❌           ⚠️        ✅        ❌        ❌
CloakPipe           ✅           ✅        ✅        ✅        ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Fix: Consistent Pseudonymization
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Don't redact. Replace consistently.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Map "Tata Motors" → "ORG_7". Same token, every time, across every document and query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Before: "Tata Motors reported Rs 3.4L Cr revenue in Q3 2025, up 12%"
After:  "ORG_7 reported AMOUNT_12 revenue in DATE_3, up PCT_3"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Semantic structure preserved → embeddings still meaningful → vector search works → LLM responds with pseudonyms → rehydrate back to real values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What was Tata Motors' revenue last quarter?"
        ↓
   Pseudonymize → "What was ORG_7's revenue last quarter?"
        ↓
   Embed + Search → retrieve pseudonymized chunks
        ↓
   LLM → "ORG_7 reported AMOUNT_12 in DATE_3..."
        ↓
   Rehydrate → "Tata Motors reported Rs 3.4L Cr in Q3 2025..."
        ↓
   ✅ User sees real answer. Provider never saw "Tata Motors."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
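&lt;p&gt;The mechanism is simple enough to sketch: a stable forward map from entity to token, plus its inverse for rehydration. This toy version handles a single entity class and skips encryption (CloakPipe's actual vault is AES-encrypted and typed per entity class):&lt;/p&gt;

```rust
use std::collections::HashMap;

/// Assigns a stable ORG_n token per distinct entity string.
struct Pseudonymizer {
    forward: HashMap<String, String>, // entity -> token
    reverse: HashMap<String, String>, // token  -> entity
}

impl Pseudonymizer {
    fn new() -> Self {
        Self { forward: HashMap::new(), reverse: HashMap::new() }
    }

    /// The same input always yields the same token; that consistency
    /// is what keeps embeddings and vector search usable.
    fn tokenize(&mut self, entity: &str) -> String {
        if let Some(tok) = self.forward.get(entity) {
            return tok.clone();
        }
        let tok = format!("ORG_{}", self.forward.len() + 1);
        self.forward.insert(entity.to_string(), tok.clone());
        self.reverse.insert(tok.clone(), entity.to_string());
        tok
    }

    /// Substitute tokens in a model response back to real values.
    /// NOTE: production code must replace longest tokens first so
    /// ORG_12 is never partially rewritten as ORG_1 plus "2".
    fn rehydrate(&self, text: &str) -> String {
        let mut out = text.to_string();
        for (tok, entity) in &self.reverse {
            out = out.replace(tok, entity);
        }
        out
    }
}
```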



&lt;h2&gt;
  
  
  Going Further: Kill 3/4 Leak Points
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vectorless tree search&lt;/strong&gt; builds a local JSON index and lets the LLM reason about relevance. No embedding API. No vector DB. No inversion risk.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VECTOR RAG (4 leaks):              TREE-BASED RAG (1 leak):

Text → Embedding API  ❌           Tree index built locally  ✅
Vectors → Cloud DB    ❌           Tree stored locally       ✅
Query → Embedding API ❌           LLM navigates tree        ✅
Context → LLM         ❌           Pseudonymized → LLM      ⚠️ (protected)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PageIndex (VectifyAI) reported 98.7% accuracy on FinanceBench versus GPT-4o's ~31% for structured docs.&lt;/p&gt;

&lt;h2&gt;
  
  
  CloakPipe — Drop-In Privacy Proxy
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/rohansx/cloakpipe" rel="noopener noreferrer"&gt;CloakPipe&lt;/a&gt; — a Rust-native proxy that sits between your app and any OpenAI-compatible API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Your App  →  CloakPipe  →  LLM API
                |               |
          "Tata Motors"    Sees "ORG_1"
          → "ORG_1"            |
                |              |
          "ORG_1"         ←----+
          → "Tata Motors"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt; change &lt;code&gt;OPENAI_BASE_URL&lt;/code&gt;. That's it. Your LangChain/LlamaIndex/OpenAI SDK code works unchanged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v0.1 features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-layer detection (API keys, JWTs, emails, IPs, financial amounts, fiscal dates, custom TOML rules)&lt;/li&gt;
&lt;li&gt;AES-256-GCM encrypted vault + &lt;code&gt;zeroize&lt;/code&gt; memory safety&lt;/li&gt;
&lt;li&gt;OpenAI-compatible proxy (&lt;code&gt;/v1/chat/completions&lt;/code&gt;, &lt;code&gt;/v1/embeddings&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;SSE streaming rehydration&lt;/li&gt;
&lt;li&gt;Single binary, &amp;lt;5ms overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Coming soon:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🌳 &lt;strong&gt;CloakTree&lt;/strong&gt; — vectorless retrieval, eliminates 3/4 leak points&lt;/li&gt;
&lt;li&gt;🔐 &lt;strong&gt;CloakVector&lt;/strong&gt; — distance-preserving vector encryption&lt;/li&gt;
&lt;li&gt;🧠 ONNX-based NER&lt;/li&gt;
&lt;li&gt;🏗️ TEE support (AWS Nitro, Intel TDX)&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;The privacy-preserving AI market is estimated at $4.25B today and projected to reach $40B by 2035; 75% of enterprise leaders cite security as the #1 barrier to AI adoption.&lt;/p&gt;

&lt;p&gt;The era of sending raw enterprise data to LLM APIs in plaintext is ending.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/rohansx/cloakpipe" rel="noopener noreferrer"&gt;github.com/rohansx/cloakpipe&lt;/a&gt;&lt;/strong&gt; — star it, try it, break it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/rohansx/cloakpipe" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Star CloakPipe on GitHub&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Every AI Agent Tool Creates Git Worktrees. None of Them Make Worktrees Actually Work.</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Wed, 04 Mar 2026 23:41:08 +0000</pubDate>
      <link>https://dev.to/rohansx/every-ai-agent-tool-creates-git-worktrees-none-of-them-make-worktrees-actually-work-3ae9</link>
      <guid>https://dev.to/rohansx/every-ai-agent-tool-creates-git-worktrees-none-of-them-make-worktrees-actually-work-3ae9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkzwdn79fx2hhgw7bl5l.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flkzwdn79fx2hhgw7bl5l.gif" alt=" " width="639" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've been deep in the parallel AI agent ecosystem for months — Conductor, Claude Squad, Agent Deck, Claude Code's native agent teams. They all converge on the same architecture: spin up git worktrees, run an AI agent in each one, merge the results.&lt;/p&gt;

&lt;p&gt;And they all have the same problem: &lt;strong&gt;the worktree is empty.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git worktree add ../my-feature feature/auth
cd ../my-feature
# Where's my .env? Gone.
# Where's node_modules? Gone.
# Where's my Docker compose state? Gone.
# Where's my database config? Gone.
# Time to write 50-100 lines of bash. Again.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Bash Scripts Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;Every parallel agent workflow has a dirty secret: a setup script that does the actual work.&lt;/p&gt;

&lt;p&gt;Here's what Conductor's docs tell you to write:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; ../.env .env
pnpm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks simple. Now here's what real Conductor users &lt;em&gt;actually&lt;/em&gt; write. A Phoenix developer published &lt;a href="https://nicholasjhenry.medium.com/building-isolated-phoenix-workspaces-for-ai-agents-with-conductor-a438d161f191" rel="noopener noreferrer"&gt;his setup script&lt;/a&gt; — it symlinks shared configs, copies build artifacts to avoid recompilation, reads Conductor's environment variables, generates workspace-specific &lt;code&gt;.env.local&lt;/code&gt; files with unique ports and Docker compose project names, and sets up isolated containers per workspace.&lt;/p&gt;

&lt;p&gt;Another team &lt;a href="https://www.10play.dev/blog/worktree-10x-development" rel="noopener noreferrer"&gt;published their scripts&lt;/a&gt; — they generate unique database names from workspace directories, allocate workspace-specific ports via a port registry, create PostgreSQL databases per workspace, generate &lt;code&gt;.env&lt;/code&gt; files with unique &lt;code&gt;DATABASE_URL&lt;/code&gt; and &lt;code&gt;PORT&lt;/code&gt;, assign iOS simulators per workspace, and do full cleanup on teardown.&lt;/p&gt;

&lt;p&gt;Someone even built &lt;a href="https://jsr.io/@wyattjoh/train-conductor" rel="noopener noreferrer"&gt;train-conductor&lt;/a&gt; — a standalone Deno tool that exists purely to do symlinks and scripts in worktrees, because no orchestration tool handles it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The same pattern repeats everywhere.&lt;/strong&gt; Claude Squad creates worktrees but doesn't install deps. Agent Deck creates worktrees but doesn't copy your &lt;code&gt;.env&lt;/code&gt;. Claude Code's native agent teams create worktrees but don't symlink &lt;code&gt;node_modules&lt;/code&gt;. Every tool assumes someone else will solve the environment problem.&lt;/p&gt;

&lt;p&gt;Nobody does.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Problem: Code Isolation ≠ Environment Isolation
&lt;/h2&gt;

&lt;p&gt;Git worktrees give you code isolation. Your files are separate. Your branches don't conflict.&lt;/p&gt;

&lt;p&gt;But your &lt;strong&gt;runtime environment&lt;/strong&gt; is still shared:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Port conflicts&lt;/strong&gt;: Two worktrees both run &lt;code&gt;npm run dev&lt;/code&gt; on port 3000. One fails.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database collisions&lt;/strong&gt;: Two agents write to the same dev database. Data corruption.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker chaos&lt;/strong&gt;: Two worktrees share &lt;code&gt;docker compose&lt;/code&gt; project namespace. Containers clobber each other.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Disk explosion&lt;/strong&gt;: 5 worktrees × 2GB &lt;code&gt;node_modules&lt;/code&gt; = 10GB wasted. One Cursor user reported 9.82 GB consumed in 20 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing env files&lt;/strong&gt;: &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.envrc&lt;/code&gt;, &lt;code&gt;.npmrc&lt;/code&gt;, secrets — none of them are tracked by git, so none of them exist in new worktrees.&lt;/li&gt;
&lt;/ul&gt;
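&lt;p&gt;The port-conflict failure mode is easy to reproduce directly: a second server binding an already-bound port fails immediately. A small Python demonstration, independent of any specific tool:&lt;/p&gt;

```python
import socket

# Reproducing the port conflict: a second socket binding an already-bound,
# listening port fails with EADDRINUSE. Same failure mode as two worktrees
# both starting a dev server on port 3000.
first = socket.socket()
first.bind(("127.0.0.1", 0))      # grab any free port
port = first.getsockname()[1]
first.listen()

second = socket.socket()
try:
    second.bind(("127.0.0.1", port))
    conflict = False
except OSError:
    conflict = True
print(conflict)  # True
```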

&lt;p&gt;One developer summed it up perfectly: "Git worktree gives you multiple working directories but they still share the same database, same ports, same Docker daemon — it solves code isolation, not environment isolation."&lt;/p&gt;

&lt;h2&gt;
  
  
  I Built the Missing Piece
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/rohansx/workz" rel="noopener noreferrer"&gt;workz&lt;/a&gt; is a Rust CLI that makes any git worktree a fully functional dev environment. One command. Zero config.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;workz start feature/auth
&lt;span class="c"&gt;# ✓ Created worktree&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Detected Node.js project (pnpm-lock.yaml)&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Symlinked node_modules, .next, .turbo (saved 2.1 GB)&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Copied .env, .env.local, .envrc, .npmrc&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Installed dependencies (pnpm install --frozen-lockfile)&lt;/span&gt;
&lt;span class="c"&gt;# ✓ You're in the worktree. Ready to code.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What it does automatically
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Smart project detection&lt;/strong&gt; — workz detects your project type (Node/Rust/Python/Go/Java) and only syncs relevant directories. No config file needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Symlinks heavy directories&lt;/strong&gt; — &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;target/&lt;/code&gt;, &lt;code&gt;.venv&lt;/code&gt;, &lt;code&gt;vendor/&lt;/code&gt;, &lt;code&gt;.next&lt;/code&gt;, &lt;code&gt;.nuxt&lt;/code&gt;, &lt;code&gt;.angular&lt;/code&gt;, &lt;code&gt;.gradle&lt;/code&gt;, &lt;code&gt;build/&lt;/code&gt;, and 13 more. One copy on disk, symlinked everywhere. A typical Node project saves 1-3 GB per worktree.&lt;/p&gt;
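&lt;p&gt;The symlink trick itself is simple; a minimal sketch of the idea in Python (illustrative only, not workz's actual code):&lt;/p&gt;

```python
import os, tempfile

# Minimal sketch of the symlink idea (not workz's actual code): share one
# node_modules between the main checkout and a worktree via a symlink.
def link_heavy_dir(main_repo, worktree, name="node_modules"):
    source = os.path.join(main_repo, name)
    target = os.path.join(worktree, name)
    if os.path.isdir(source) and not os.path.exists(target):
        os.symlink(source, target)  # one copy on disk, visible in both trees
    return target

main = tempfile.mkdtemp()
tree = tempfile.mkdtemp()
os.makedirs(os.path.join(main, "node_modules", "left-pad"))
link = link_heavy_dir(main, tree)
print(os.path.islink(link))  # True
```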

&lt;p&gt;&lt;strong&gt;Copies env files&lt;/strong&gt; — 17 patterns: &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.env.*&lt;/code&gt;, &lt;code&gt;.envrc&lt;/code&gt;, &lt;code&gt;.tool-versions&lt;/code&gt;, &lt;code&gt;.node-version&lt;/code&gt;, &lt;code&gt;.python-version&lt;/code&gt;, &lt;code&gt;.npmrc&lt;/code&gt;, &lt;code&gt;.yarnrc.yml&lt;/code&gt;, &lt;code&gt;docker-compose.override.yml&lt;/code&gt;, secrets files. Everything you need, nothing lost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto-installs dependencies&lt;/strong&gt; — detects your lockfile (&lt;code&gt;bun.lockb&lt;/code&gt;, &lt;code&gt;pnpm-lock.yaml&lt;/code&gt;, &lt;code&gt;yarn.lock&lt;/code&gt;, &lt;code&gt;package-lock.json&lt;/code&gt;, &lt;code&gt;uv.lock&lt;/code&gt;, &lt;code&gt;poetry.lock&lt;/code&gt;, &lt;code&gt;Pipfile.lock&lt;/code&gt;, &lt;code&gt;requirements.txt&lt;/code&gt;) and runs the right install command automatically.&lt;/p&gt;
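&lt;p&gt;The detection logic amounts to a lockfile-to-command table; a simplified sketch in Python (the command choices here are my illustrative guesses, not necessarily workz's exact ones):&lt;/p&gt;

```python
import os, tempfile

# Simplified lockfile -> install-command table (illustrative; the exact
# commands are my assumptions, not necessarily what workz runs).
LOCKFILES = [
    ("bun.lockb", "bun install"),
    ("pnpm-lock.yaml", "pnpm install --frozen-lockfile"),
    ("yarn.lock", "yarn install --frozen-lockfile"),
    ("package-lock.json", "npm ci"),
    ("uv.lock", "uv sync"),
    ("poetry.lock", "poetry install"),
    ("Pipfile.lock", "pipenv sync"),
    ("requirements.txt", "pip install -r requirements.txt"),
]

def install_command(worktree):
    for name, command in LOCKFILES:  # first match wins
        if os.path.exists(os.path.join(worktree, name)):
            return command
    return None

tree = tempfile.mkdtemp()
open(os.path.join(tree, "pnpm-lock.yaml"), "w").close()
print(install_command(tree))  # pnpm install --frozen-lockfile
```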

&lt;p&gt;&lt;strong&gt;Docker support&lt;/strong&gt; — &lt;code&gt;workz start feature/api --docker&lt;/code&gt; runs &lt;code&gt;docker compose up -d&lt;/code&gt; in the new worktree. &lt;code&gt;workz done&lt;/code&gt; stops the containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-agent ready&lt;/strong&gt; — &lt;code&gt;workz start feature/auth --ai&lt;/code&gt; launches Claude Code directly in the worktree. Also supports Cursor and VS Code.&lt;/p&gt;

&lt;h3&gt;
  
  
  The new thing: &lt;code&gt;--isolated&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is what nobody else has. Not Conductor, not Claude Squad, not anyone:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;workz start feature/auth &lt;span class="nt"&gt;--isolated&lt;/span&gt;
&lt;span class="c"&gt;# Everything above, PLUS:&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Assigned PORT=3001 (unique, no conflict)&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Set DB_NAME=myapp_feature_auth&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Set COMPOSE_PROJECT_NAME=myapp-feature-auth&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Wrote workspace-specific .env.local&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Docker containers are namespaced&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can run 5 worktrees with 5 dev servers, 5 databases, and 5 Docker stacks — zero conflicts. And when you're done:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;workz &lt;span class="k"&gt;done &lt;/span&gt;feature/auth
&lt;span class="c"&gt;# ✓ Stopped Docker containers&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Released port 3001&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Removed worktree&lt;/span&gt;
&lt;span class="c"&gt;# ✓ Optionally deleted branch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full cleanup. No orphaned containers. No stale port allocations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Works With Everything
&lt;/h2&gt;

&lt;p&gt;workz isn't an orchestration tool. It's the &lt;strong&gt;environment layer&lt;/strong&gt; that every orchestration tool is missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With Conductor&lt;/strong&gt; — replace your 50-line setup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scripts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"worktree"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"workz sync --isolated"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"setup"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"workz sync --isolated"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With Claude Squad&lt;/strong&gt; — add to your session setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cs new &lt;span class="nt"&gt;--setup&lt;/span&gt; &lt;span class="s2"&gt;"workz sync --isolated"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With Claude Code agent teams&lt;/strong&gt; — use as a post-worktree hook.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Or standalone&lt;/strong&gt; — workz works perfectly on its own. No orchestrator required.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;For a typical Node.js project with 5 parallel worktrees:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Without workz&lt;/th&gt;
&lt;th&gt;With workz&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Disk usage&lt;/td&gt;
&lt;td&gt;~12 GB&lt;/td&gt;
&lt;td&gt;~3.5 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup time&lt;/td&gt;
&lt;td&gt;3-5 min (npm install × 5)&lt;/td&gt;
&lt;td&gt;&amp;lt;10 sec (symlink + copy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Port conflicts&lt;/td&gt;
&lt;td&gt;Guaranteed&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Missing .env&lt;/td&gt;
&lt;td&gt;Every time&lt;/td&gt;
&lt;td&gt;Never&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cleanup effort&lt;/td&gt;
&lt;td&gt;Manual rm + prune + docker down&lt;/td&gt;
&lt;td&gt;&lt;code&gt;workz done&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Homebrew (macOS/Linux)&lt;/span&gt;
brew tap rohansx/tap &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; brew &lt;span class="nb"&gt;install &lt;/span&gt;workz

&lt;span class="c"&gt;# Cargo&lt;/span&gt;
cargo &lt;span class="nb"&gt;install &lt;/span&gt;workz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Single binary. No runtime dependencies. Works on Linux, macOS, and Windows.&lt;/p&gt;

&lt;p&gt;Add shell integration for auto-cd (like zoxide):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ~/.zshrc or ~/.bashrc&lt;/span&gt;
&lt;span class="nb"&gt;eval&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;workz init zsh&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic: create a worktree with full environment&lt;/span&gt;
workz start feature/login

&lt;span class="c"&gt;# With isolation: unique ports, Docker, DB naming&lt;/span&gt;
workz start feature/auth &lt;span class="nt"&gt;--isolated&lt;/span&gt;

&lt;span class="c"&gt;# With AI: launch Claude Code in the worktree&lt;/span&gt;
workz start feature/api &lt;span class="nt"&gt;--ai&lt;/span&gt; &lt;span class="nt"&gt;--isolated&lt;/span&gt; &lt;span class="nt"&gt;--docker&lt;/span&gt;

&lt;span class="c"&gt;# See all worktrees with status&lt;/span&gt;
workz list

&lt;span class="c"&gt;# Clean up&lt;/span&gt;
workz &lt;span class="k"&gt;done &lt;/span&gt;feature/login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Star the repo if this solves a pain point for you: &lt;a href="https://github.com/rohansx/workz" rel="noopener noreferrer"&gt;github.com/rohansx/workz&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;workz is open source (MIT). Built in Rust. Contributions welcome.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>git</category>
      <category>worktree</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I got tired of writing the same 80-line setup script for every AI worktree</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Tue, 03 Mar 2026 21:08:02 +0000</pubDate>
      <link>https://dev.to/rohansx/i-got-tired-of-writing-the-same-80-line-setup-script-for-every-ai-worktree-4hj9</link>
      <guid>https://dev.to/rohansx/i-got-tired-of-writing-the-same-80-line-setup-script-for-every-ai-worktree-4hj9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg63fx54h0bq5v0wpe651.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg63fx54h0bq5v0wpe651.gif" alt=" " width="639" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you've been running parallel AI coding agents — Claude Code, Aider, Codex, anything — you've hit this wall.&lt;/p&gt;

&lt;p&gt;Each agent needs its own worktree. Each worktree needs its own environment. And that means every time you spin one up, you're either manually editing &lt;code&gt;.env&lt;/code&gt; files or you've got a setup script that looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/bin/bash
# conductor.json worktree script — set up isolated environment
BRANCH_NAME="${CONDUCTOR_BRANCH_NAME}"
SLUG="${BRANCH_NAME//\//_}"
SLUG="${SLUG//-/_}"

BASE_PORT=3000
USED_PORTS=$(cat ~/.worktree-ports 2&amp;gt;/dev/null || echo "")
PORT=$BASE_PORT
while echo "$USED_PORTS" | grep -q "^$PORT$"; do
  PORT=$((PORT + 1))
done
echo "$PORT" &amp;gt;&amp;gt; ~/.worktree-ports

DB_NAME="myapp_${SLUG}"
COMPOSE_PROJECT="myapp_${SLUG}"
REDIS_PORT=$((PORT + 1000))

cat &amp;gt; .env.local &amp;lt;&amp;lt; EOF
PORT=$PORT
DB_NAME=$DB_NAME
DATABASE_URL=postgres://localhost/$DB_NAME
COMPOSE_PROJECT_NAME=$COMPOSE_PROJECT
REDIS_URL=redis://localhost:$REDIS_PORT
EOF

createdb "$DB_NAME" 2&amp;gt;/dev/null || true
echo "Set up: PORT=$PORT DB=$DB_NAME"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's 30 lines of bash with no error handling, no cleanup when you tear the worktree down, and it breaks the moment two worktrees race to write to &lt;code&gt;~/.worktree-ports&lt;/code&gt; at the same time.&lt;/p&gt;

&lt;p&gt;I've seen this pattern everywhere. The Phoenix community wrote up their version. The folks at 10play wrote theirs. Everyone reinvents the same wheel.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I built instead
&lt;/h2&gt;

&lt;p&gt;I maintain &lt;strong&gt;workz&lt;/strong&gt;, a Rust CLI for Git worktrees that handles zero-config dependency syncing (it symlinks &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;target&lt;/code&gt;, and &lt;code&gt;.venv&lt;/code&gt; automatically) and AI agent launching. Last week I shipped &lt;code&gt;--isolated&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workz start feature/auth --isolated

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;creating worktree for branch 'feature/auth'
  worktree created at /home/you/myapp--feature-auth
  symlinked node_modules
  isolated environment:
    PORT=3001                 → .env.local
    DB_NAME=feature_auth
    COMPOSE_PROJECT_NAME=feature_auth
    REDIS_URL=redis://localhost:4001
ready!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That's it. It:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Picks the next available port starting from 3000 (atomic, no race conditions)&lt;/li&gt;
&lt;li&gt;Sanitizes the branch name into a safe slug (&lt;code&gt;feature/auth&lt;/code&gt; → &lt;code&gt;feature_auth&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Writes &lt;code&gt;.env.local&lt;/code&gt; with &lt;code&gt;PORT&lt;/code&gt;, &lt;code&gt;DB_NAME&lt;/code&gt;, &lt;code&gt;DATABASE_URL&lt;/code&gt;, &lt;code&gt;COMPOSE_PROJECT_NAME&lt;/code&gt;, &lt;code&gt;REDIS_URL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Registers the allocation in &lt;code&gt;~/.config/workz/ports.json&lt;/code&gt; so no two worktrees ever collide&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when you're done:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workz done feature/auth --cleanup-db
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Releases the port, removes the worktree, optionally drops the database. No orphaned ports. No stale registry entries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running 3 isolated agents in parallel
&lt;/h2&gt;

&lt;p&gt;The reason I care about this: I run multiple Claude Code agents in parallel, each on a different task. Before &lt;code&gt;--isolated&lt;/code&gt;, they'd all try to bind to port 3000. First one wins; the other two crash on startup.&lt;/p&gt;

&lt;p&gt;Now:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;workz fleet start \
  --task "add OAuth2 login" \
  --task "write integration tests" \
  --task "refactor database layer" \
  --agent claude \
  --isolated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Three isolated worktrees, three unique ports, three separate databases, three &lt;code&gt;.env.local&lt;/code&gt; files, all spun up in parallel. Each agent gets a fully isolated environment without touching the others.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;workz status&lt;/code&gt; shows the full picture:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
  main                /home/you/myapp [clean]         342K  2h ago
  feature/auth        /home/you/myapp--feature-auth   89M   5m ago  PORT:3001
  fix/tests           /home/you/myapp--fix-tests       91M   5m ago  PORT:3002
  refactor/db         /home/you/myapp--refactor-db     88M   5m ago  PORT:3003
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Linux angle
&lt;/h2&gt;

&lt;p&gt;I built this partly because Conductor.build, the GUI for parallel Claude Code agents that's been getting a lot of attention, is Mac-only. Apple Silicon required. Linux support is "hopefully soon-ish, but not sure."&lt;/p&gt;

&lt;p&gt;Linux developers who want to run parallel AI agents simply can't use Conductor. And even on Mac, Conductor's worktree setup is a bash script you write yourself; there's no environment engine built in.&lt;/p&gt;

&lt;p&gt;workz is the answer for that gap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux and Mac (Windows planned)&lt;/li&gt;
&lt;li&gt;Open source: MIT, single Rust binary, &lt;code&gt;cargo install workz&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Zero-config: auto-detects Node/Rust/Python/Go/Java projects, symlinks deps automatically&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--isolated&lt;/code&gt;: the environment engine Conductor doesn't have&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not trying to be a GUI. It's terminal-native, which is what Linux developers actually want.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the port registry works
&lt;/h2&gt;

&lt;p&gt;The implementation is straightforward: a JSON file at &lt;code&gt;~/.config/workz/ports.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "base_port": 3000,
  "allocations": {
    "feature_auth": {
      "port": 3001,
      "branch": "feature/auth",
      "db_name": "feature_auth",
      "compose_project": "feature_auth",
      "allocated_at": "2026-03-03T20:00:00Z"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On &lt;code&gt;workz start --isolated&lt;/code&gt;: scan the allocations for the next unused port, write the entry, write &lt;code&gt;.env.local&lt;/code&gt;. On &lt;code&gt;workz done&lt;/code&gt;: remove the entry. Atomic reads and writes, no race conditions between concurrent &lt;code&gt;workz start&lt;/code&gt; calls.&lt;/p&gt;
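&lt;p&gt;The allocation scan can be sketched like this (illustrative Python; workz itself is Rust and also locks the registry file for atomicity, and in practice the base port may be held by the main checkout):&lt;/p&gt;

```python
# Sketch of next-free-port allocation over a ports.json-style registry
# (illustrative Python; the real tool is Rust and locks the file).
def allocate(registry, slug, branch):
    used = {entry["port"] for entry in registry["allocations"].values()}
    port = registry["base_port"]
    while port in used:  # scan upward for the first unused port
        port += 1
    registry["allocations"][slug] = {"port": port, "branch": branch}
    return port

def release(registry, slug):
    registry["allocations"].pop(slug, None)

reg = {"base_port": 3000, "allocations": {}}
print(allocate(reg, "feature_auth", "feature/auth"))  # 3000
print(allocate(reg, "fix_tests", "fix/tests"))        # 3001
release(reg, "feature_auth")
print(allocate(reg, "refactor_db", "refactor/db"))    # 3000 again, reused
```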

&lt;p&gt;The branch slug sanitizer handles edge cases: &lt;code&gt;feature/add-auth&lt;/code&gt; → &lt;code&gt;feature_add_auth&lt;/code&gt;; it collapses repeated separators and lowercases everything.&lt;/p&gt;
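&lt;p&gt;The described rules can be reproduced in a couple of lines (illustrative Python, not workz's actual code):&lt;/p&gt;

```python
import re

# Illustrative re-implementation of the described slug rules (not workz's
# actual code): runs of non-alphanumerics become a single "_", then lowercase.
def slugify(branch):
    slug = re.sub(r"[^A-Za-z0-9]+", "_", branch)
    return slug.strip("_").lower()

print(slugify("feature/add-auth"))  # feature_add_auth
print(slugify("Fix//DB--Layer"))    # fix_db_layer
```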

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Homebrew
brew tap rohansx/tap
brew install workz

# Cargo
cargo install workz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Add shell integration to your &lt;code&gt;.zshrc&lt;/code&gt; / &lt;code&gt;.bashrc&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;eval "$(workz init zsh)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This gives you a &lt;code&gt;workz&lt;/code&gt; shell function that auto-&lt;code&gt;cd&lt;/code&gt;s into the new worktree after &lt;code&gt;workz start&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/rohansx/workz" rel="noopener noreferrer"&gt;https://github.com/rohansx/workz&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you're on Mac and already using Conductor, workz works as your setup script. Drop &lt;code&gt;workz sync --isolated&lt;/code&gt; into your &lt;code&gt;conductor.json&lt;/code&gt; and you get the environment engine without changing your workflow.&lt;/p&gt;

</description>
      <category>git</category>
      <category>devtools</category>
      <category>opensource</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built workz: The Zoxide for Git Worktrees That Finally Fixes .env + node_modules Hell in 2026</title>
      <dc:creator>Rohan Sharma</dc:creator>
      <pubDate>Fri, 27 Feb 2026 19:34:18 +0000</pubDate>
      <link>https://dev.to/rohansx/i-built-workz-the-zoxide-for-git-worktrees-that-finally-fixes-env-nodemodules-hell-in-2026-2dpj</link>
      <guid>https://dev.to/rohansx/i-built-workz-the-zoxide-for-git-worktrees-that-finally-fixes-env-nodemodules-hell-in-2026-2dpj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5liuco77etnpgfihc7gw.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5liuco77etnpgfihc7gw.gif" alt=" " width="787" height="560"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I've been using git worktrees a lot lately — especially for running multiple Claude/Cursor AI agents in parallel without them stepping on each other's toes. The idea is great: fast branch switching, isolated dirs, no stashing mess.&lt;/p&gt;

&lt;p&gt;But the reality sucks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Untracked files like &lt;code&gt;.env*&lt;/code&gt;, &lt;code&gt;.npmrc&lt;/code&gt;, secrets, docker overrides get left behind every time.&lt;/li&gt;
&lt;li&gt;Heavy folders (&lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;target&lt;/code&gt;, &lt;code&gt;.venv&lt;/code&gt;, caches, dist…) get duplicated → gigabytes wasted and 5–15 min waits for reinstalls.&lt;/li&gt;
&lt;li&gt;No clean, zero-config tool handles symlinking/copying + fuzzy navigation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built &lt;strong&gt;workz&lt;/strong&gt; (Zoxide-inspired for worktrees) to fix exactly that.&lt;/p&gt;

&lt;h3&gt;
  
  
  What workz actually does
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Automatically symlinks 22+ heavy dependency dirs (smart detection via &lt;code&gt;package.json&lt;/code&gt; / &lt;code&gt;Cargo.toml&lt;/code&gt; / &lt;code&gt;pyproject.toml&lt;/code&gt; / &lt;code&gt;go.mod&lt;/code&gt; etc.)&lt;/li&gt;
&lt;li&gt;Copies env/config patterns (&lt;code&gt;.env*&lt;/code&gt;, &lt;code&gt;.secrets&lt;/code&gt;, &lt;code&gt;.tool-versions&lt;/code&gt;…)&lt;/li&gt;
&lt;li&gt;Fuzzy switcher with skim TUI → just type &lt;code&gt;w&lt;/code&gt; to search and &lt;code&gt;cd&lt;/code&gt; instantly&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--ai&lt;/code&gt; flag to launch your AI coding agent directly in the new worktree&lt;/li&gt;
&lt;li&gt;Optional &lt;code&gt;.workz.toml&lt;/code&gt; for custom globs if needed&lt;/li&gt;
&lt;li&gt;Single Rust binary (clap + skim + git2), MIT licensed, cross-platform (macOS/Linux focus)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's fully open source, no tracking/telemetry, and designed to be extensible (Docker/podman hooks coming soon).&lt;/p&gt;

&lt;h3&gt;
  
  
  How it feels in real life (30-second workflow)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create new worktree + launch AI
ws feature/login --ai

# workz prints the cd command for you
cd /Users/rohan/projects/my-app--feature-login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/rohansx/workz" rel="noopener noreferrer"&gt;https://github.com/rohansx/workz&lt;/a&gt;&lt;/p&gt;

</description>
      <category>rust</category>
      <category>cli</category>
      <category>git</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
