<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chris Yao</title>
    <description>The latest articles on DEV Community by Chris Yao (@chrishohoho).</description>
    <link>https://dev.to/chrishohoho</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842007%2F7f466cad-bc1a-4fcd-96bb-2fa749a9f8ff.png</url>
      <title>DEV Community: Chris Yao</title>
      <link>https://dev.to/chrishohoho</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chrishohoho"/>
    <language>en</language>
    <item>
      <title>I Built a CLI That X-Rays Your AI Coding Sessions — No LLM, &lt;5ms (Open Source)</title>
      <dc:creator>Chris Yao</dc:creator>
      <pubDate>Wed, 08 Apr 2026 14:01:00 +0000</pubDate>
      <link>https://dev.to/chrishohoho/i-built-a-cli-that-x-rays-your-ai-coding-sessions-no-llm-5ms-open-source-4l01</link>
      <guid>https://dev.to/chrishohoho/i-built-a-cli-that-x-rays-your-ai-coding-sessions-no-llm-5ms-open-source-4l01</guid>
      <description>&lt;p&gt;I score every prompt I send to AI coding tools. My average across 3,140 prompts over ten weeks: 38 out of 100.&lt;/p&gt;

&lt;p&gt;Not because I'm bad at prompting. Because at 2am debugging an auth bug, I type "fix the auth bug" and hit enter. Same intent as a well-structured prompt, completely different quality.&lt;/p&gt;

&lt;p&gt;So I built &lt;a href="https://github.com/ctxray/ctxray" rel="noopener noreferrer"&gt;ctxray&lt;/a&gt; — a CLI that analyzes how you actually use AI coding tools. Not what the AI outputs. What you type into it. Rule-based, local-only, under 5ms per prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before / After
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ctxray check &lt;span class="s2"&gt;"fix the auth bug"&lt;/span&gt;

  DRAFT · 29

  Clarity     ███████░░░░░░░░░░░░░  9/25
  Context     ░░░░░░░░░░░░░░░░░░░░  0/25
  Position    ████████████████████ 20/20

  Auto-rewrite &lt;span class="o"&gt;(&lt;/span&gt;+24 pts&lt;span class="o"&gt;)&lt;/span&gt;
  ✓ Added debug prompt structure

  Rewritten:
    fix the auth bug
    Error: &amp;lt;&lt;span class="nb"&gt;paste &lt;/span&gt;the error message or stack trace&amp;gt;
    File: &amp;lt;which file and &lt;span class="k"&gt;function&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
    Expected: &amp;lt;what should happen vs what actually happens&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It detected "fix the auth bug" as a debug task and added the slots that debug prompts need. Implement prompts get I/O specs + edge cases. Refactor gets scope + constraints. Five task types, each with different structural scaffolding.&lt;/p&gt;
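&lt;p&gt;For a feel of what rule-based detection like this can look like, here is a minimal Python sketch. The keyword patterns and slot templates are illustrative, not ctxray's actual rule set:&lt;/p&gt;

```python
import re

# Hypothetical sketch of rule-based task-type detection: keyword patterns
# map a prompt to a task type, and each type carries the context slots a
# good prompt of that type should fill.
TASK_PATTERNS = {
    "debug":     re.compile(r"\b(fix|debug|error|crash|broken|bug)\b", re.I),
    "implement": re.compile(r"\b(implement|add|create|build|write)\b", re.I),
    "refactor":  re.compile(r"\b(refactor|clean up|restructure|rename)\b", re.I),
}

TASK_SLOTS = {
    "debug":     ["Error: <paste the error message or stack trace>",
                  "File: <which file and function>",
                  "Expected: <what should happen vs what actually happens>"],
    "implement": ["Input/Output: <expected signature and examples>",
                  "Edge cases: <what must not break>"],
    "refactor":  ["Scope: <which files are in and out of bounds>",
                  "Constraints: <e.g. do not change public APIs>"],
}

def detect_task(prompt: str) -> str:
    """Return the first task type whose pattern matches, else 'general'."""
    for task, pattern in TASK_PATTERNS.items():
        if pattern.search(prompt):
            return task
    return "general"

def scaffold(prompt: str) -> str:
    """Append the unfilled slots for the detected task type."""
    slots = TASK_SLOTS.get(detect_task(prompt), [])
    return "\n".join([prompt, *slots])
```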

&lt;p&gt;The same prompt with actual context scores 58:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ctxray check &lt;span class="s2"&gt;"Fix the NPE in auth.service.ts:47 when session expires,
  expected AuthException not HTTP 200"&lt;/span&gt;

  GOOD · 58

  Clarity     █████████████████░░░ 22/25
  Context     ████████████░░░░░░░░ 16/25
  Position    ████████████████████ 20/20
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same intent. Twice the score. The difference is file path, line number, error message — context that the model needs but I keep forgetting to include.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it actually does
&lt;/h2&gt;

&lt;p&gt;ctxray scans session files from 9 AI coding tools on your machine: Claude Code, Cursor, Aider, Gemini CLI, Cline, OpenClaw, and Codex CLI, plus ChatGPT and Claude.ai web exports.&lt;br&gt;

&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ctxray

ctxray scan                    &lt;span class="c"&gt;# auto-discover sessions&lt;/span&gt;
ctxray check &lt;span class="s2"&gt;"your prompt"&lt;/span&gt;     &lt;span class="c"&gt;# score + lint + rewrite&lt;/span&gt;
ctxray insights                &lt;span class="c"&gt;# personal patterns vs research benchmarks&lt;/span&gt;
ctxray sessions                &lt;span class="c"&gt;# session quality + frustration signals&lt;/span&gt;
ctxray agent                   &lt;span class="c"&gt;# agent workflow efficiency&lt;/span&gt;
ctxray privacy &lt;span class="nt"&gt;--deep&lt;/span&gt;          &lt;span class="c"&gt;# find leaked API keys in sessions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ctxray insights&lt;/code&gt; is the one that surprised me most. It told me 32% of my prompts were near-duplicates — same structure, different variable names. I'm asking the same thing across sessions without remembering I figured it out last week.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ctxray sessions&lt;/code&gt; scores entire sessions and detects frustration signals — error loops where the same fix gets tried 3+ times, repetitive prompts that signal the model isn't understanding, and sessions where &amp;gt;60% of turns are filler.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compression engine
&lt;/h2&gt;

&lt;p&gt;Your prompts probably contain more filler than you think:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ctxray compress &lt;span class="s2"&gt;"I was wondering if you could please help me refactor
  the authentication middleware to use JWT tokens instead of session
  cookies. Basically what I need is for the current implementation in
  src/auth/middleware.ts to be updated."&lt;/span&gt;

  Tokens: 50 → 33 &lt;span class="o"&gt;(&lt;/span&gt;34% saved&lt;span class="o"&gt;)&lt;/span&gt;
  Research: Moderate compression improves LLM output &lt;span class="o"&gt;(&lt;/span&gt;Zhang+ 2505.00019&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four layers: character normalization, phrase simplification, filler deletion, structure cleanup. All regex. Works for English and Chinese.&lt;/p&gt;
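&lt;p&gt;The layered pipeline can be sketched as an ordered list of regex substitutions. The rules below are simplified examples for English only, not ctxray's actual rule tables:&lt;/p&gt;

```python
import re

# Illustrative sketch of a four-layer regex compressor:
# (1) character normalization, (2) phrase simplification,
# (3) filler deletion, (4) structure cleanup.
LAYERS = [
    # 1. character normalization: straighten curly quotes, cap ellipses
    (re.compile(r"[\u201c\u201d]"), '"'),
    (re.compile(r"\.{3,}"), "..."),
    # 2. phrase simplification
    (re.compile(r"\bin order to\b", re.I), "to"),
    (re.compile(r"\bis able to\b", re.I), "can"),
    # 3. filler deletion
    (re.compile(r"\b(I was wondering if you could|please help me|basically)\s*", re.I), ""),
    # 4. structure cleanup: collapse runs of whitespace
    (re.compile(r"\s{2,}"), " "),
]

def compress(prompt: str) -> str:
    """Apply each layer in order; later layers see earlier layers' output."""
    for pattern, repl in LAYERS:
        prompt = pattern.sub(repl, prompt)
    return prompt.strip()
```

&lt;p&gt;Ordering matters: filler deletion runs before whitespace cleanup so any gaps it leaves get collapsed.&lt;/p&gt;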

&lt;h2&gt;
  
  
  Why rule-based in 2026?
&lt;/h2&gt;

&lt;p&gt;Everyone's using LLMs to analyze prompts. I went the other way. Three reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed.&lt;/strong&gt; Under 5ms per prompt. Runs in a pre-commit hook and nobody notices. An LLM call takes 2-5 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Determinism.&lt;/strong&gt; Same input, same output, every time. I track my scores weekly — if the scoring function shifts with model version, the trend is meaningless. I use &lt;code&gt;ctxray lint --score-threshold 50&lt;/code&gt; in CI. Random failures from a creative LLM would not be fun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Privacy.&lt;/strong&gt; My prompts contain file paths, function names, error messages with stack traces, sometimes credentials I forgot to redact. That's a map of my codebase. Sending it to another LLM for "improvement" defeats the purpose.&lt;/p&gt;

&lt;p&gt;The tradeoff is real. Structural signals miss semantic intent. An LLM would understand "this is where I changed my whole approach." My heuristics just see a long turn with high vocabulary shift. For daily feedback loops, structural analysis catches 80-90% of what matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research behind the scoring
&lt;/h2&gt;

&lt;p&gt;The scoring engine uses 30+ features calibrated against 10 NLP papers. Not as decoration — each paper maps to a specific dimension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Position bias is architectural&lt;/strong&gt; (Stanford 2307.03172, confirmed by Chowdhury 2603.10123): models weight beginnings and ends of prompts more than the middle. Front-load your instructions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moderate repetition helps&lt;/strong&gt; (Google 2512.14982): repeating key requirements at the end improves recall by up to 76%. But excessive repetition hurts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specificity &amp;gt; length&lt;/strong&gt; (Zi+ 2508.03678): file paths, line numbers, error messages improve output more than verbose explanations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Works in CI
&lt;/h2&gt;

&lt;p&gt;The part that makes this sticky: &lt;code&gt;ctxray lint&lt;/code&gt; runs as a CI quality gate.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GitHub Action&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ctxray/ctxray-action@v1&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;score-threshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's a &lt;code&gt;.ctxray.toml&lt;/code&gt; config for team rules, a &lt;code&gt;--format github&lt;/code&gt; flag that posts score breakdowns as PR comments, and a pre-commit hook. Think ESLint for AI prompts — configurable, deterministic, fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why now
&lt;/h2&gt;

&lt;p&gt;Every prompt analysis tool is getting acquired. Promptfoo joined OpenAI (March 2026). Humanloop was acqui-hired by Anthropic. PromptPerfect got absorbed into Jina. The tools that measured model behavior now belong to the labs that make models.&lt;/p&gt;

&lt;p&gt;ctxray stays independent and model-agnostic. It's the only tool that sees your Claude Code sessions and your Cursor sessions and your ChatGPT history together, locally, without sending anything anywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I haven't figured out
&lt;/h2&gt;

&lt;p&gt;The scoring handles maybe 30% of what makes a good prompt. The other 70% is stuff only you know — the error message on your screen, the file you just edited, the approach you already tried. No tool can add that for you.&lt;/p&gt;

&lt;p&gt;I also don't think the scoring is "right" yet. A 3-word prompt from someone deep in a debugging session can be more effective than a 200-word structured request from someone who doesn't understand the codebase. Context that lives in your head doesn't show up in a score.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ctxray
ctxray demo              &lt;span class="c"&gt;# try with built-in sample data&lt;/span&gt;
ctxray scan              &lt;span class="c"&gt;# discover your sessions&lt;/span&gt;
ctxray check &lt;span class="s2"&gt;"your prompt here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;1,941 tests, strict mypy, MIT licensed. Everything local, no account, no telemetry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/ctxray/ctxray" class="crayons-btn crayons-btn--primary" rel="noopener noreferrer"&gt;Star on GitHub&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;What do your prompts look like when you actually measure them? I'm genuinely curious whether people who use CLAUDE.md files or cursor rules have noticeably different patterns.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>opensource</category>
      <category>ai</category>
      <category>productivity</category>
    </item>
    <item>
      <title>This CLI Rewrites Your AI Prompts — No LLM, No API, 50ms (Open Source)</title>
      <dc:creator>Chris Yao</dc:creator>
      <pubDate>Wed, 01 Apr 2026 11:29:23 +0000</pubDate>
      <link>https://dev.to/chrishohoho/this-cli-rewrites-your-ai-prompts-no-llm-no-api-50ms-open-source-30p6</link>
      <guid>https://dev.to/chrishohoho/this-cli-rewrites-your-ai-prompts-no-llm-no-api-50ms-open-source-30p6</guid>
      <description>&lt;p&gt;I score every prompt I send to Claude Code. My average is 38 out of 100.&lt;/p&gt;

&lt;p&gt;Not because I'm bad at prompting — because I'm human. At 2am debugging an auth bug, I don't carefully structure my request. I type "fix the auth bug" and hit enter.&lt;/p&gt;

&lt;p&gt;I built a scoring engine. Then a compression engine. They told me &lt;em&gt;what was wrong&lt;/em&gt; but didn't fix anything. So I built the part I actually wanted: a rewrite engine that takes a lazy prompt and makes it better. No LLM. No API call. Just rules extracted from NLP papers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before / After
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;reprompt rewrite &lt;span class="s2"&gt;"I was wondering if you could maybe help me fix the authentication bug that seems to be kind of broken"&lt;/span&gt;

  34 → 56 &lt;span class="o"&gt;(&lt;/span&gt;+22&lt;span class="o"&gt;)&lt;/span&gt;

  ╭─ Rewritten ────────────────────────────────────────╮
  │ Help me fix the authentication bug that seems to   │
  │ be broken.                                         │
  ╰────────────────────────────────────────────────────╯

  Changes
  ✓ Removed filler &lt;span class="o"&gt;(&lt;/span&gt;18% shorter&lt;span class="o"&gt;)&lt;/span&gt;
  ✓ Removed hedging language

  You should also
  → Add actual code snippets or error messages &lt;span class="k"&gt;for &lt;/span&gt;context
  → Reference specific files or functions by name
  → Add constraints &lt;span class="o"&gt;(&lt;/span&gt;e.g., &lt;span class="s2"&gt;"Do not modify existing tests"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "You should also" section is honestly the most useful part. The machine handles what it can — filler removal, restructuring — and tells you what only a human can add.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Rewriter Does
&lt;/h2&gt;

&lt;p&gt;Four transformations, applied in order:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Strip filler.&lt;/strong&gt; "Please help me with", "basically what I need is", "I would like you to" — these add tokens without adding information. 40+ English rules, 40+ Chinese rules (reuses the compression engine).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Front-load instructions.&lt;/strong&gt; If your key ask is buried in the middle, the rewriter moves it to the front. This matters: Stanford's "Lost in the Middle" paper found models recall instructions at the start 2-3x better than instructions in the middle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Echo key requirements.&lt;/strong&gt; For long prompts (40+ words) with low repetition, the main instruction gets repeated at the end. Google Research (arXiv:2512.14982) found moderate repetition improves recall by up to 76%. This only fires when the prompt is long enough that the model might lose the thread.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Remove hedging.&lt;/strong&gt; "Maybe", "perhaps", "I was wondering", "kind of", "sort of". These weaken the instruction signal without adding information. 12 regex patterns.&lt;/p&gt;
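&lt;p&gt;Two of these transformations are easy to sketch in a few lines of Python. The patterns and the 40-word threshold below are illustrative examples, not reprompt's actual rules:&lt;/p&gt;

```python
import re

# Sketch of hedge removal and requirement echoing as described above.
HEDGES = re.compile(
    r"\b(maybe|perhaps|kind of|sort of|I was wondering if|possibly)\b\s*",
    re.I,
)

def remove_hedging(prompt: str) -> str:
    """Strip hedge phrases, then collapse any whitespace they leave behind."""
    return re.sub(r"\s{2,}", " ", HEDGES.sub("", prompt)).strip()

def echo_requirement(prompt: str, min_words: int = 40) -> str:
    """For long prompts, repeat the first sentence (taken as the key ask)
    at the end so the model sees it in a high-recall position."""
    if len(prompt.split()) < min_words:
        return prompt
    first_sentence = prompt.split(".")[0].strip()
    return f"{prompt}\n\nKey requirement: {first_sentence}."
```

&lt;p&gt;The echo rule is gated on length, matching the idea that short prompts never need it.&lt;/p&gt;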

&lt;h2&gt;
  
  
  Why Not Use an LLM to Rewrite?
&lt;/h2&gt;

&lt;p&gt;I thought about it. Three reasons I went rule-based:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's fast.&lt;/strong&gt; Under 50ms. You can run it in a pre-commit hook or CI pipeline and nobody notices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's deterministic.&lt;/strong&gt; Same input, same output. I actually use &lt;code&gt;reprompt lint&lt;/code&gt; in CI with a score threshold — if I used an LLM rewriter, my CI would randomly fail on Tuesdays because GPT was feeling creative.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's private.&lt;/strong&gt; My prompts contain production error messages, internal file paths, sometimes API keys I forgot to redact. That's exactly the kind of thing I don't want to send to another LLM for "improvement."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Broader Toolkit
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;rewrite&lt;/code&gt; is one command. Here's what else is in the box:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;reprompt check &lt;span class="s2"&gt;"your prompt"&lt;/span&gt;          &lt;span class="c"&gt;# full diagnostic: score + lint + rewrite&lt;/span&gt;
reprompt build &lt;span class="s2"&gt;"task"&lt;/span&gt; &lt;span class="nt"&gt;--file&lt;/span&gt; auth.ts  &lt;span class="c"&gt;# assemble a prompt from components&lt;/span&gt;
reprompt compress &lt;span class="s2"&gt;"your prompt"&lt;/span&gt;       &lt;span class="c"&gt;# save 40-60% tokens&lt;/span&gt;
reprompt scan                         &lt;span class="c"&gt;# discover sessions from 9 AI tools&lt;/span&gt;
reprompt privacy &lt;span class="nt"&gt;--deep&lt;/span&gt;               &lt;span class="c"&gt;# find leaked API keys in sessions&lt;/span&gt;
reprompt lint &lt;span class="nt"&gt;--score-threshold&lt;/span&gt; 50    &lt;span class="c"&gt;# CI quality gate (GitHub Action included)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-discovers sessions from Claude Code, Cursor, Aider, Codex CLI, Gemini CLI, Cline, and OpenClaw. ChatGPT and Claude.ai via export. A browser extension shows a live score badge as you type; click it for inline suggestions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I still haven't figured out
&lt;/h2&gt;

&lt;p&gt;The rewriter handles maybe 30% of what makes a good prompt. The other 70% is stuff only you know — the error message you're staring at, the file you just edited, the thing you tried that didn't work. No tool can add that for you.&lt;/p&gt;

&lt;p&gt;I also don't think the scoring is "right" yet. A 3-word prompt from someone deep in a debugging session can be more effective than a beautifully structured 200-word request from someone who doesn't understand the codebase. Context that lives in your head doesn't show up in a score.&lt;/p&gt;

&lt;p&gt;The weights are calibrated against 4 NLP papers, but papers study prompts in isolation. Real prompting happens in the middle of a conversation, at 2am, when you've already explained the problem three times. I'm not sure how to score that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;reprompt-cli
reprompt check &lt;span class="s2"&gt;"your worst prompt"&lt;/span&gt;
reprompt rewrite &lt;span class="s2"&gt;"your worst prompt"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MIT, local-only, 1,800+ tests. &lt;a href="https://github.com/reprompt-dev/reprompt" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · &lt;a href="https://pypi.org/project/reprompt-cli/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Honestly curious: do you think about your prompts before sending them, or is it more stream-of-consciousness? I've been tracking mine for months and I still default to lazy prompts when I'm tired. Starting to think that's just how humans work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Audited 1,000+ Prompts I Sent to AI Coding Tools. Here's What I Found.</title>
      <dc:creator>Chris Yao</dc:creator>
      <pubDate>Sun, 29 Mar 2026 05:13:43 +0000</pubDate>
      <link>https://dev.to/chrishohoho/i-audited-1000-prompts-i-sent-to-ai-coding-tools-heres-what-i-found-4bp9</link>
      <guid>https://dev.to/chrishohoho/i-audited-1000-prompts-i-sent-to-ai-coding-tools-heres-what-i-found-4bp9</guid>
      <description>&lt;p&gt;I've been using AI coding tools daily for months. Claude Code, Cursor, Codex CLI, sometimes Aider. By rough estimate, I've sent over a thousand prompts to various AI services.&lt;/p&gt;

&lt;p&gt;Recently I built a tool to answer a simple question: &lt;strong&gt;what exactly did I send?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer was uncomfortable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding 1: Leaked Credentials
&lt;/h3&gt;

&lt;p&gt;Running &lt;code&gt;reprompt privacy --deep&lt;/code&gt; on my prompt history surfaced:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 API keys&lt;/strong&gt; (OpenAI, GitHub, one internal service)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1 JWT token&lt;/strong&gt; (from a debugging session)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;12 email addresses&lt;/strong&gt; (from log outputs I pasted)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;47 internal file paths&lt;/strong&gt; (including home directory paths)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these were pasted intentionally. They were in error messages, stack traces, and log outputs that I copy-pasted when asking the AI for help debugging. The typical pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Fix this error: AuthenticationError: Invalid API key 'sk-proj-...' for model gpt-4"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That prompt just sent my API key to whatever service processes it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding 2: Agent Error Loops
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;reprompt agent&lt;/code&gt; analyzes Claude Code and Codex CLI sessions for workflow efficiency. It fingerprints each tool call (tool name + target file + error flag) and detects when the agent gets stuck in a loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My error loop rate: 35%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That means in over a third of my agent sessions, the AI got stuck retrying the same failing approach three or more times. The most common pattern: &lt;code&gt;Bash(test.py):error -&amp;gt; Edit(auth.py) -&amp;gt; Bash(test.py):error&lt;/code&gt; — edit a file, run the test, fail, edit, test, fail.&lt;/p&gt;

&lt;p&gt;The agent burned tokens and time on approaches that clearly weren't working. Knowing this changed how I intervene in agent sessions.&lt;/p&gt;
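&lt;p&gt;The fingerprint-and-detect idea is simple enough to sketch. Field names and the threshold here are illustrative, not reprompt's actual session schema:&lt;/p&gt;

```python
from collections import Counter

# Sketch of fingerprint-based loop detection: each tool call becomes a
# (tool, target, errored) fingerprint, and a session is flagged when any
# consecutive fingerprint pair repeats 3+ times.
def fingerprint(call: dict) -> tuple:
    return (call["tool"], call.get("file", ""), call.get("error", False))

def error_loop_detected(calls: list[dict], threshold: int = 3) -> bool:
    prints = [fingerprint(c) for c in calls]
    # Count repeated adjacent pairs, e.g. Bash(test.py):error -> Edit(auth.py)
    pairs = Counter(zip(prints, prints[1:]))
    return any(n >= threshold for n in pairs.values())
```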

&lt;h3&gt;
  
  
  Finding 3: Most Conversation Turns Are Filler
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;reprompt distill&lt;/code&gt; scores every conversation turn using 6 signals (position, length, tool trigger, error recovery, topic shift, vocabulary uniqueness).&lt;/p&gt;

&lt;p&gt;Result: &lt;strong&gt;50-70% of my turns carry near-zero information.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"ok try that", "continue", "looks good", "hmm interesting" — these are the prompting equivalent of "um" and "uh." They don't guide the AI in any useful direction. The actually productive turns — the ones that specify files, constraints, and context — typically make up only 15-20 turns out of a 100-turn session.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Privacy Angle
&lt;/h3&gt;

&lt;p&gt;The EU AI Act took effect in August 2025. Organizations are increasingly required to understand what data flows to AI services. But most developers have no visibility into what they've actually sent.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;reprompt privacy&lt;/code&gt; shows a per-tool breakdown: which adapter (Claude Code, Cursor, ChatGPT) received which types of content. &lt;code&gt;reprompt privacy --deep&lt;/code&gt; goes further and scans for 12 categories of sensitive content, including API keys (OpenAI, AWS, GitHub, Anthropic, Stripe), JWT tokens, emails, IP addresses, password assignments, environment secrets, and home directory paths.&lt;/p&gt;

&lt;p&gt;All detection is regex-based. Zero network calls. Your prompts never leave your machine.&lt;/p&gt;
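&lt;p&gt;A regex-only scanner of this kind is a dictionary of patterns and a loop. The patterns below are deliberately simplified examples; a real scanner needs many more categories and tighter anchors:&lt;/p&gt;

```python
import re

# Minimal sketch of local, regex-only secret scanning. Zero network calls.
SECRET_PATTERNS = {
    "openai_key":   re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "github_token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "jwt":          re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "email":        re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def scan(text: str) -> dict[str, list[str]]:
    """Return every category with at least one match in the text."""
    hits = {}
    for name, pattern in SECRET_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```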

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;reprompt reads session files that AI tools already store locally:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Location&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Code&lt;/td&gt;
&lt;td&gt;JSONL&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.claude/projects/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex CLI&lt;/td&gt;
&lt;td&gt;JSONL&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.codex/sessions/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;td&gt;SQLite&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.cursor/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;td&gt;Markdown&lt;/td&gt;
&lt;td&gt;&lt;code&gt;.aider.chat.history.md&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini CLI&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.gemini/tmp/&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;No instrumentation required. No code changes. Just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;reprompt-cli
reprompt scan
reprompt privacy &lt;span class="nt"&gt;--deep&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The scoring engine is calibrated against 4 NLP research papers. The agent analyzer builds tool call fingerprints and detects repetition patterns. The distiller uses TF-IDF cosine similarity for topic shift detection. Everything runs in &amp;lt;50ms for a typical session.&lt;/p&gt;
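&lt;p&gt;Topic-shift detection via cosine similarity boils down to comparing term vectors of adjacent turns. This toy version uses plain term counts rather than TF-IDF weights, and the threshold is an arbitrary example:&lt;/p&gt;

```python
import math
from collections import Counter

# Toy sketch of topic-shift detection: a turn whose vocabulary barely
# overlaps the previous turn's is flagged as a topic shift.
def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def topic_shift(prev_turn: str, turn: str, threshold: float = 0.2) -> bool:
    return cosine(Counter(prev_turn.lower().split()),
                  Counter(turn.lower().split())) < threshold
```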

&lt;h3&gt;
  
  
  What I Changed
&lt;/h3&gt;

&lt;p&gt;After running reprompt on my history:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I stopped copy-pasting full error messages with credentials. Instead, I redact API keys before pasting.&lt;/li&gt;
&lt;li&gt;I intervene earlier in agent sessions when I see the same test failing twice.&lt;/li&gt;
&lt;li&gt;My debug prompts went from averaging 31/100 to 52/100 — not from trying harder, just from seeing the score.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Try It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;reprompt-cli
reprompt scan                     &lt;span class="c"&gt;# discover sessions from installed AI tools&lt;/span&gt;
reprompt                          &lt;span class="c"&gt;# see your dashboard&lt;/span&gt;
reprompt privacy &lt;span class="nt"&gt;--deep&lt;/span&gt;           &lt;span class="c"&gt;# scan for leaked credentials&lt;/span&gt;
reprompt agent &lt;span class="nt"&gt;--last&lt;/span&gt; 5           &lt;span class="c"&gt;# analyze recent agent sessions&lt;/span&gt;
reprompt distill &lt;span class="nt"&gt;--last&lt;/span&gt; 3         &lt;span class="c"&gt;# extract important turns&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;1,529 tests. MIT license. Zero network calls. Supports 9 AI tools.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/reprompt-dev/reprompt" rel="noopener noreferrer"&gt;reprompt-dev/reprompt&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What would your numbers look like?&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>python</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
