DEV Community

Shinsuke KAGAWA


Stop Guessing If Your Prompt Is Better

You rewrote your prompt. The output looks different. But is it actually better?

Most of us have been there: reading prompt engineering best practices, tweaking instructions, and hoping the changes help. But without a side-by-side comparison, you're just guessing.

The Problem

When you improve a prompt, you typically:

  1. Run the new version
  2. Look at the output
  3. Think "yeah, this seems better"

But you're comparing against your memory of the old output, and different runs of the same prompt produce different results anyway. How do you know the improvement came from your changes and not just from LLM variance?
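
One common way to answer that question is sketched below. It is a minimal illustration, not rashomon's own method. Run each prompt several times and score every output with whatever check matters to you. Then call the change an improvement only when the gap between the average scores is clearly larger than the run-to-run spread. The function names, the per-run scores, and the threshold `k = 2` (roughly two standard errors) are all illustrative assumptions.

```typescript
// Per-run scores are assumed to come from some check you define (tests passing,
// a rubric, a reviewer's grade); this sketch only compares the numbers.
type Verdict = "improvement" | "regression" | "within variance";

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator); needs at least two runs.
function stdDev(xs: number[]): number {
  if (xs.length < 2) throw new Error("need at least two runs per prompt");
  const m = mean(xs);
  const squared = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return Math.sqrt(squared / (xs.length - 1));
}

// Treat the change as real only if the difference in mean scores exceeds
// k standard errors of that difference; otherwise it is indistinguishable
// from run-to-run variance.
function compareRuns(oldScores: number[], newScores: number[], k = 2): Verdict {
  const diff = mean(newScores) - mean(oldScores);
  const stdErr = Math.sqrt(
    stdDev(oldScores) ** 2 / oldScores.length +
      stdDev(newScores) ** 2 / newScores.length,
  );
  if (Math.abs(diff) <= k * stdErr) return "within variance";
  return diff > 0 ? "improvement" : "regression";
}
```

Consider three runs scored 0.4, 0.5, 0.6 with the old prompt and 0.8, 0.9, 0.85 with the new one: that comes out as an improvement. Scores of 0.4, 0.9, 0.5 against 0.6, 0.5, 0.7 stay within variance. With only a handful of runs the check is itself noisy, so more runs give a more trustworthy verdict.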

What I Built

rashomon is a Claude Code plugin that focuses on one practical question: "Did my instruction change actually affect the result?"

It analyzes your prompt, generates an optimized version, runs both in isolated environments, and compares the actual results.
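
Those four steps can be outlined as a small pipeline. This is a hypothetical outline, not rashomon's actual code, because the post doesn't describe the plugin's internals. `PipelineSteps`, `evaluatePrompt`, and every step function are invented names. The runs are also synchronous for brevity, whereas a real run would be a separate, asynchronous agent session.

```typescript
// Hypothetical step signatures; the real plugin's internals are not described.
interface PipelineSteps {
  analyze: (prompt: string) => string[]; // detected issues
  optimize: (prompt: string, issues: string[]) => string; // rewritten prompt
  runIsolated: (prompt: string) => string; // result of one isolated run
  compare: (originalResult: string, optimizedResult: string) => string; // verdict
}

interface Evaluation {
  issues: string[];
  optimizedPrompt: string;
  originalResult: string;
  optimizedResult: string;
  verdict: string;
}

function evaluatePrompt(prompt: string, steps: PipelineSteps): Evaluation {
  const issues = steps.analyze(prompt);
  const optimizedPrompt = steps.optimize(prompt, issues);
  // Each prompt gets its own run, so neither result can see the other's changes.
  const originalResult = steps.runIsolated(prompt);
  const optimizedResult = steps.runIsolated(optimizedPrompt);
  return {
    issues,
    optimizedPrompt,
    originalResult,
    optimizedResult,
    verdict: steps.compare(originalResult, optimizedResult),
  };
}
```

The point of this structure is that both prompts start from the same state. The comparison then reflects the prompts themselves rather than one run building on the other's edits.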

Real Example

I ran this prompt through rashomon:

```
Add logging to track function calling usage
```

A reasonable instruction. But vague.

What rashomon detected

| Issue | Detail |
|---|---|
| Vague instructions | What, where, and why to log are unclear |
| No output format | Log structure not specified |
| Missing context | No project architecture information |
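
The post doesn't say how rashomon detects these issues, and a model reading the prompt will catch far more than keywords can. Purely to make the three categories concrete, here is a crude, hypothetical keyword check. `detectIssues`, the 15-word threshold, and the regular expressions are all assumptions.

```typescript
interface Issue {
  issue: string;
  detail: string;
}

// Crude keyword heuristics (hypothetical); a real review would reason about the prompt.
function detectIssues(prompt: string): Issue[] {
  const text = prompt.toLowerCase();
  const wordCount = text.split(/\s+/).filter((w) => w.length > 0).length;
  const issues: Issue[] = [];

  // Very short prompts rarely say what, where, and why.
  if (wordCount < 15) {
    issues.push({ issue: "Vague instructions", detail: "Prompt is too short to say what, where, and why" });
  }
  // No hint about the shape of the expected output.
  if (!/format|structure|schema|json|example/.test(text)) {
    issues.push({ issue: "No output format", detail: "Expected output structure not specified" });
  }
  // Nothing about the project the agent is working in.
  if (!/context|project|architecture|this is a/.test(text)) {
    issues.push({ issue: "Missing context", detail: "No project architecture information" });
  }
  return issues;
}
```

On the one-line prompt above, all three checks fire. A prompt with a context section and an output-format section passes them, although passing keyword checks says nothing about whether the content is actually right.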

The optimized prompt

```
## Context

This is a Slack bot using Google Gemini API with function calling.
The project uses a shared `logger` utility with structured logging.
Function calling flows through:
1. `GeminiService.executeWithRetry()` - detects function calls
2. `FunctionHandler.handleFunctionCall()` - executes them

## Task

Add logging to track function calling usage for analytics and debugging.

## Requirements

At Function Call Detection (GeminiService):
- Function name(s) detected
- Number of function calls in response

At Function Execution (FunctionHandler):
- Parameters passed (sanitized - exclude sensitive data)
- Execution duration
- Result status (success/failure)

## Output Format

logger.info('Function call detected', {
  functionName: 'executeWithRetry',
  detectedFunctions: ['searchNotionPages'],
  functionCallCount: 1
})
```
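
To give a sense of what the optimized prompt asks the agent to produce, here is a hypothetical TypeScript sketch of the two logging stages. The post doesn't show the project's real `logger` utility, `GeminiService`, or `FunctionHandler` code. The in-memory `logger` stand-in, `sanitize`, `logDetection`, `executeWithLogging`, and the pattern of sensitive key names are therefore assumptions for illustration.

```typescript
type Fields = Record<string, unknown>;

interface LogEntry {
  level: "info" | "error";
  message: string;
  fields: Fields;
}

// In-memory stand-in for the project's shared structured `logger` (assumption).
const logEntries: LogEntry[] = [];
const logger = {
  info(message: string, fields: Fields): void {
    logEntries.push({ level: "info", message, fields });
  },
  error(message: string, fields: Fields): void {
    logEntries.push({ level: "error", message, fields });
  },
};

// Key names treated as sensitive; a crude, assumed pattern.
const SENSITIVE_KEY = /pass(word)?|token|secret|api[_-]?key/i;

// Replace values of sensitive-looking keys before they reach the logs.
function sanitize(params: Fields): Fields {
  const out: Fields = {};
  for (const [key, value] of Object.entries(params)) {
    out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : value;
  }
  return out;
}

// Stage 1 (detection): which functions the model asked for, and how many.
function logDetection(calls: { name: string }[]): void {
  logger.info("Function call detected", {
    functionName: "executeWithRetry",
    detectedFunctions: calls.map((c) => c.name),
    functionCallCount: calls.length,
  });
}

// Stage 2 (execution): sanitized parameters, duration, and success/failure.
async function executeWithLogging<T>(
  name: string,
  params: Fields,
  run: (params: Fields) => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    const result = await run(params);
    logger.info("Function executed", {
      functionName: name,
      params: sanitize(params),
      durationMs: Date.now() - start,
      status: "success",
    });
    return result;
  } catch (err) {
    logger.error("Function execution failed", {
      functionName: name,
      params: sanitize(params),
      durationMs: Date.now() - start,
      status: "failure",
      error: String(err),
    });
    throw err;
  }
}
```

Redacting by key name is a simple default, and the pattern should be treated only as a starting point. A secret stored under an innocuous key would still slip through.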

What changed

| Aspect | Original | Optimized |
|---|---|---|
| Logging Scope | 1 stage (execution only) | 2 stages (detection + execution) |
| Parameter Sanitization | None | Passwords, tokens, secrets redacted |
| Files Modified | 2 | 2 |

The original prompt looked reasonable, but it led the agent to log at only one point. The optimized version covered both detection and execution. It also redacted sensitive parameters, a security consideration the original didn't address.

Classification: Structural Improvement

About Variance

Not every difference is an improvement. rashomon distinguishes between structural gains and mere variance.

I tried to create a Variance example: a prompt so clear that optimization wouldn't matter. I couldn't. In practice, the same vague prompt sometimes works beautifully and sometimes completely misses the point.

rashomon just makes that inconsistency visible.

Try It

Requires Claude Code.

```
claude
/plugin marketplace add shinpr/rashomon
/plugin install rashomon@rashomon
# Restart session
/rashomon Your prompt here
```

shinpr / rashomon

Compare, improve, and verify prompt changes with evidence — not vibes.

Rashomon


See what actually changes when you improve your prompts — not just different wording.

Why rashomon?

Inspired by the Rashomon effect (the idea that the same event can produce different outcomes depending on perspective), rashomon makes those differences explicit and comparable.

  • Spending too much time on trial-and-error with prompts?
  • Read best practices but not sure how they apply to your case?
  • Want proof that your changes actually made things better?

rashomon analyzes, improves, and compares prompts, so you can see what actually changed and whether it matters.

Who Is This For?

rashomon is designed for:

  • Developers using Claude Code daily
  • Teams iterating on complex prompts (coding, analysis, writing)
  • Anyone who wants evidence, not vibes, when improving prompts

Not ideal if:

  • You don't use git
  • You want one-shot prompt rewriting without comparison

Quick Example

```
/rashomon Write a function to sort an array
```

What You Get

1. Detected Issues

- BP-002
