DEV Community

Chris Yao

This CLI Rewrites Your AI Prompts — No LLM, No API, 50ms (Open Source)

I score every prompt I send to Claude Code. My average is 38 out of 100.

Not because I'm bad at prompting — because I'm human. At 2am debugging an auth bug, I don't carefully structure my request. I type "fix the auth bug" and hit enter.

I built a scoring engine. Then a compression engine. They told me what was wrong but didn't fix anything. So I built the part I actually wanted: a rewrite engine that takes a lazy prompt and makes it better. No LLM. No API call. Just rules extracted from NLP papers.

Before / After

$ reprompt rewrite "I was wondering if you could maybe help me fix the authentication bug that seems to be kind of broken"

  34 → 56 (+22)

  ╭─ Rewritten ────────────────────────────────────────╮
  │ Help me fix the authentication bug that seems to   │
  │ be broken.                                         │
  ╰────────────────────────────────────────────────────╯

  Changes
  ✓ Removed filler (18% shorter)
  ✓ Removed hedging language

  You should also
  → Add actual code snippets or error messages for context
  → Reference specific files or functions by name
  → Add constraints (e.g., "Do not modify existing tests")

The "You should also" section is honestly the most useful part. The machine handles what it can — filler removal, restructuring — and tells you what only a human can add.

What the Rewriter Does

Four transformations, applied in order:

1. Strip filler. "Please help me with", "basically what I need is", "I would like you to" — these add tokens without adding information. 40+ English rules and 40+ Chinese rules, reused from the compression engine.

2. Front-load instructions. If your key ask is buried in the middle, the rewriter moves it to the front. This matters: Stanford's "Lost in the Middle" paper found models recall instructions at the start 2-3x better than instructions in the middle.

3. Echo key requirements. For long prompts (40+ words) with low repetition, the main instruction gets repeated at the end. Google Research (arXiv:2512.14982) found moderate repetition improves recall by up to 76%. This only fires when the prompt is long enough that the model might lose the thread.

4. Remove hedging. "Maybe", "perhaps", "I was wondering", "kind of", "sort of". These weaken the instruction signal without adding information. 12 regex patterns.
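The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not reprompt's actual implementation: the rule tables are tiny hand-picked subsets (the real tool ships 40+ filler rules and 12 hedging patterns), front-loading is omitted for brevity, and all function names here are hypothetical.

```python
import re

# Illustrative subsets only -- the real tool ships far larger rule tables.
FILLER = [
    r"\bplease help me with\b",
    r"\bbasically what i need is\b",
    r"\bi would like you to\b",
    r"\bi was wondering if you could\b",
]
HEDGING = [r"\bmaybe\b", r"\bperhaps\b", r"\bkind of\b", r"\bsort of\b"]

def strip_patterns(text: str, patterns: list[str]) -> str:
    """Delete every match, then collapse the leftover whitespace."""
    for pat in patterns:
        text = re.sub(pat, "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", text).strip()

def echo_requirement(text: str, min_words: int = 40) -> str:
    """For long prompts, repeat the first sentence at the end --
    unless the prompt already repeats it."""
    if len(text.split()) < min_words:
        return text
    first_sentence = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]
    if first_sentence.lower() in text.lower()[len(first_sentence):]:
        return text  # already echoed, skip
    return f"{text}\n\nKey requirement: {first_sentence}"

def rewrite(prompt: str) -> str:
    out = strip_patterns(prompt, FILLER)   # 1. strip filler
    out = strip_patterns(out, HEDGING)     # 4. remove hedging
    out = echo_requirement(out)            # 3. echo key requirements
    return out[0].upper() + out[1:] if out else out

print(rewrite("I was wondering if you could maybe help me fix the auth bug"))
# → Help me fix the auth bug
```

The point of the sketch is that nothing here needs a model: each transformation is a regex pass plus a word-count heuristic, which is why the whole thing runs in milliseconds and gives the same output every time.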

Why Not Use an LLM to Rewrite?

I thought about it. Three reasons I went rule-based:

It's fast. Under 50ms. You can run it in a pre-commit hook or CI pipeline and nobody notices.

It's deterministic. Same input, same output. I actually use reprompt lint in CI with a score threshold — if I used an LLM rewriter, my CI would randomly fail on Tuesdays because GPT was feeling creative.

It's private. My prompts contain production error messages, internal file paths, sometimes API keys I forgot to redact. That's exactly the kind of thing I don't want sent to another LLM for "improvement."

The Broader Toolkit

rewrite is one command. Here's what else is in the box:

reprompt check "your prompt"          # full diagnostic: score + lint + rewrite
reprompt build "task" --file auth.ts  # assemble a prompt from components
reprompt compress "your prompt"       # save 40-60% tokens
reprompt scan                         # discover sessions from 9 AI tools
reprompt privacy --deep               # find leaked API keys in sessions
reprompt lint --score-threshold 50    # CI quality gate (GitHub Action included)

Sessions are auto-discovered from Claude Code, Cursor, Aider, Codex CLI, Gemini CLI, Cline, and OpenClaw; ChatGPT and Claude.ai are supported via export. A browser extension shows a live score badge as you type — click it for inline suggestions.

What I still haven't figured out

The rewriter handles maybe 30% of what makes a good prompt. The other 70% is stuff only you know — the error message you're staring at, the file you just edited, the thing you tried that didn't work. No tool can add that for you.

I also don't think the scoring is "right" yet. A 3-word prompt from someone deep in a debugging session can be more effective than a beautifully structured 200-word request from someone who doesn't understand the codebase. Context that lives in your head doesn't show up in a score.

The weights are calibrated against 4 NLP papers, but papers study prompts in isolation. Real prompting happens in the middle of a conversation, at 2am, when you've already explained the problem three times. I'm not sure how to score that.

Try it

pip install reprompt-cli
reprompt check "your worst prompt"
reprompt rewrite "your worst prompt"

MIT, local-only, 1,800+ tests. GitHub · PyPI


Honestly curious: do you think about your prompts before sending them, or is it more stream-of-consciousness? I've been tracking mine for months and I still default to lazy prompts when I'm tired. Starting to think that's just how humans work.
