DEV Community

Chris Yao

This CLI Rewrites Your AI Prompts — No LLM, No API, 50ms (Open Source)

I score every prompt I send to Claude Code. My average is 38 out of 100.

Not because I'm bad at prompting — because I'm human. At 2am debugging an auth bug, I don't carefully structure my request. I type "fix the auth bug" and hit enter.

I built a scoring engine. Then a compression engine. They told me what was wrong but didn't fix anything. So I built the part I actually wanted: a rewrite engine that takes a lazy prompt and makes it better. No LLM. No API call. Just rules extracted from NLP papers.

Before / After

$ reprompt rewrite "I was wondering if you could maybe help me fix the authentication bug that seems to be kind of broken"

  34 → 56 (+22)

  ╭─ Rewritten ────────────────────────────────────────╮
  │ Help me fix the authentication bug that seems to   │
  │ be broken.                                         │
  ╰────────────────────────────────────────────────────╯

  Changes
  ✓ Removed filler (18% shorter)
  ✓ Removed hedging language

  You should also
  → Add actual code snippets or error messages for context
  → Reference specific files or functions by name
  → Add constraints (e.g., "Do not modify existing tests")

The "You should also" section is honestly the most useful part. The machine handles what it can — filler removal, restructuring — and tells you what only a human can add.

What the Rewriter Does

Four transformations, applied in order:

1. Strip filler. "Please help me with", "basically what I need is", "I would like you to" — these add tokens without adding information. 40+ English rules and 40+ Chinese rules, reused from the compression engine.

2. Front-load instructions. If your key ask is buried in the middle, the rewriter moves it to the front. This matters: Stanford's "Lost in the Middle" paper found models recall instructions at the start 2-3x better than instructions in the middle.

3. Echo key requirements. For long prompts (40+ words) with low repetition, the main instruction gets repeated at the end. Google Research (arXiv:2512.14982) found moderate repetition improves recall by up to 76%. This only fires when the prompt is long enough that the model might lose the thread.

4. Remove hedging. "Maybe", "perhaps", "I was wondering", "kind of", "sort of". These weaken the instruction signal without adding information. 12 regex patterns.
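The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not reprompt's actual implementation: the rule tables are tiny hand-picked subsets (the real tool ships 40+ filler rules and 12 hedging patterns), front-loading is omitted for brevity, and all function names here are hypothetical.

```python
import re

# Illustrative subsets only -- the real tool ships far larger rule tables.
FILLER = [
    r"\bplease help me with\b",
    r"\bbasically what i need is\b",
    r"\bi would like you to\b",
    r"\bi was wondering if you could\b",
]
HEDGING = [r"\bmaybe\b", r"\bperhaps\b", r"\bkind of\b", r"\bsort of\b"]

def strip_patterns(text: str, patterns: list[str]) -> str:
    """Delete every match, then collapse the leftover whitespace."""
    for pat in patterns:
        text = re.sub(pat, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()

def echo_requirement(text: str, min_words: int = 40) -> str:
    """For long prompts, repeat the first sentence at the end --
    unless the prompt already repeats it."""
    if len(text.split()) < min_words:
        return text
    first_sentence = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]
    if first_sentence.lower() in text.lower()[len(first_sentence):]:
        return text  # already echoed, skip
    return f"{text}\n\nKey requirement: {first_sentence}"

def rewrite(prompt: str) -> str:
    out = strip_patterns(prompt, FILLER)   # 1. strip filler
    out = strip_patterns(out, HEDGING)     # 4. remove hedging
    out = echo_requirement(out)            # 3. echo key requirements
    return out[0].upper() + out[1:] if out else out

print(rewrite("I was wondering if you could maybe help me fix the auth bug"))
# → Help me fix the auth bug
```

The point of the sketch is that nothing here needs a model: each transformation is a regex pass plus a word-count heuristic, which is why the whole thing runs in milliseconds and gives the same output every time.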

Why Not Use an LLM to Rewrite?

I thought about it. Three reasons I went rule-based:

It's fast. Under 50ms. You can run it in a pre-commit hook or CI pipeline and nobody notices.

It's deterministic. Same input, same output. I actually use reprompt lint in CI with a score threshold — if I used an LLM rewriter, my CI would randomly fail on Tuesdays because GPT was feeling creative.

It's private. My prompts contain production error messages, internal file paths, sometimes API keys I forgot to redact. That's exactly the kind of thing I don't want sent to another LLM for "improvement."

The Broader Toolkit

rewrite is one command. Here's what else is in the box:

reprompt check "your prompt"          # full diagnostic: score + lint + rewrite
reprompt build "task" --file auth.ts  # assemble a prompt from components
reprompt compress "your prompt"       # save 40-60% tokens
reprompt scan                         # discover sessions from 9 AI tools
reprompt privacy --deep               # find leaked API keys in sessions
reprompt lint --score-threshold 50    # CI quality gate (GitHub Action included)

Sessions are auto-discovered from Claude Code, Cursor, Aider, Codex CLI, Gemini CLI, Cline, and OpenClaw; ChatGPT and Claude.ai are supported via export. A browser extension shows a live score badge as you type — click it for inline suggestions.

What I still haven't figured out

The rewriter handles maybe 30% of what makes a good prompt. The other 70% is stuff only you know — the error message you're staring at, the file you just edited, the thing you tried that didn't work. No tool can add that for you.

I also don't think the scoring is "right" yet. A 3-word prompt from someone deep in a debugging session can be more effective than a beautifully structured 200-word request from someone who doesn't understand the codebase. Context that lives in your head doesn't show up in a score.

The weights are calibrated against 4 NLP papers, but papers study prompts in isolation. Real prompting happens in the middle of a conversation, at 2am, when you've already explained the problem three times. I'm not sure how to score that.

Try it

pip install reprompt-cli
reprompt check "your worst prompt"
reprompt rewrite "your worst prompt"

MIT, local-only, 1,800+ tests. GitHub · PyPI


Honestly curious: do you think about your prompts before sending them, or is it more stream-of-consciousness? I've been tracking mine for months and I still default to lazy prompts when I'm tired. Starting to think that's just how humans work.
