You rewrote your prompt. The output looks different. But is it actually better?
Most of us have been there — reading prompt engineering best practices, tweaking instructions, and hoping the changes help. But without comparison, you're just guessing.
## The Problem
When you improve a prompt, you typically:
- Run the new version
- Look at the output
- Think "yeah, this seems better"
But you're comparing against your memory of the old output. Different runs produce different results anyway. How do you know the improvement came from your changes and not just LLM variance?
## What I Built
rashomon is a Claude Code plugin that focuses on one practical question: "Did my instruction change actually affect the result?"
It analyzes your prompt, generates an optimized version, runs both in isolated environments, and compares the actual results.
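The mechanics are easy to picture. Here is a minimal TypeScript sketch of the isolated-run-and-compare idea — it is not rashomon's implementation; the worktree layout, the headless `claude -p` invocation, and the placeholder prompt are assumptions for illustration:

```typescript
// Minimal sketch (not rashomon's internals): run a prompt against an isolated
// git worktree, capture what the agent changed, and compare two runs.
// The worktree paths and the headless `claude -p` call are illustrative assumptions.
import { execSync } from "node:child_process";

function runInIsolation(label: string, prompt: string): string {
  const dir = `.rashomon-runs/${label}`;
  execSync(`git worktree add --detach ${dir}`, { stdio: "inherit" });
  try {
    // Let the agent act on the isolated copy of the repository.
    execSync(`claude -p "${prompt.replace(/"/g, '\\"')}"`, { cwd: dir, stdio: "inherit" });
    // The diff is the "actual result" we care about.
    return execSync("git diff", { cwd: dir }).toString();
  } finally {
    execSync(`git worktree remove --force ${dir}`);
  }
}

const originalDiff = runInIsolation("original", "Add logging to track function calling usage");
const optimizedDiff = runInIsolation("optimized", "<your optimized prompt>");
console.log(originalDiff === optimizedDiff ? "No meaningful difference" : "The runs diverged");
```

Because both runs start from the same commit, anything that differs in the resulting diffs comes from either the prompt change or run-to-run variance, which is exactly the distinction the comparison is meant to surface.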
## Real Example
I ran this prompt through rashomon:
```
Add logging to track function calling usage
```
A reasonable instruction. But vague.
### What rashomon detected
| Issue | Detail |
|---|---|
| Vague instructions | What, where, and why to log are unclear |
| No output format | Log structure not specified |
| Missing context | No project architecture information |
### The optimized prompt
```
## Context

This is a Slack bot using Google Gemini API with function calling.
The project uses a shared `logger` utility with structured logging.

Function calling flows through:
1. `GeminiService.executeWithRetry()` - detects function calls
2. `FunctionHandler.handleFunctionCall()` - executes them

## Task

Add logging to track function calling usage for analytics and debugging.

## Requirements

At Function Call Detection (GeminiService):
- Function name(s) detected
- Number of function calls in response

At Function Execution (FunctionHandler):
- Parameters passed (sanitized - exclude sensitive data)
- Execution duration
- Result status (success/failure)

## Output Format

logger.info('Function call detected', {
  functionName: 'executeWithRetry',
  detectedFunctions: ['searchNotionPages'],
  functionCallCount: 1
})
```
### What changed
| Aspect | Original | Optimized |
|---|---|---|
| Logging Scope | 1 stage (execution only) | 2 stages (detection + execution) |
| Parameter Sanitization | None | Passwords, tokens, secrets redacted |
| Files Modified | 2 | 2 |
The original prompt looked reasonable, but led the agent to log at only one point. The optimized version covered both detection and execution — with security considerations the original didn't address.
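To make that security difference concrete, the execution-stage logging the optimized prompt asks for might end up looking something like this. This is a hypothetical sketch: the logger import path, redaction list, and field names are my assumptions, not rashomon's output.

```typescript
// Hypothetical sketch of the execution-stage log with parameter sanitization.
// The import path and redaction rules are illustrative assumptions.
import { logger } from "./utils/logger"; // the project's shared structured logger

const SENSITIVE_KEYS = ["password", "token", "secret", "apikey"];

function sanitize(params: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(params).map(([key, value]) =>
      SENSITIVE_KEYS.some((k) => key.toLowerCase().includes(k))
        ? [key, "[REDACTED]"]
        : [key, value]
    )
  );
}

const start = Date.now();
// ... FunctionHandler.handleFunctionCall() executes the detected function here ...
logger.info("Function call executed", {
  functionName: "searchNotionPages",
  parameters: sanitize({ query: "Q3 roadmap", notionToken: "secret_abc123" }),
  durationMs: Date.now() - start,
  status: "success",
});
```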
**Classification: Structural Improvement**
## About Variance
Not every difference is an improvement. rashomon distinguishes between structural gains and mere variance.
I tried to create a Variance example — a prompt so clear that optimization wouldn't matter. I couldn't. In practice, the same vague prompt sometimes works beautifully, sometimes completely misses the point.
rashomon just makes that inconsistency visible.
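One way to reason about it: before crediting the prompt change, check how much two runs of the *same* prompt already differ. A rough illustration, reusing `runInIsolation` from the earlier sketch — the distance metric here is a deliberately crude assumption, not how rashomon classifies results:

```typescript
// Rough variance baseline (illustrative only, not rashomon's classifier).
// Reuses runInIsolation() from the earlier sketch.

// Crude line-level distance between two diffs.
function diffDistance(a: string, b: string): number {
  const linesA = new Set(a.split("\n"));
  const linesB = new Set(b.split("\n"));
  let changed = 0;
  for (const line of linesA) if (!linesB.has(line)) changed++;
  for (const line of linesB) if (!linesA.has(line)) changed++;
  return changed;
}

const original = "Add logging to track function calling usage";
const runA = runInIsolation("baseline-a", original);
const runB = runInIsolation("baseline-b", original);
const optimized = runInIsolation("optimized", "<your optimized prompt>");

// If the optimized run differs from baseline by no more than the two baseline
// runs differ from each other, the "improvement" is probably just variance.
const variance = diffDistance(runA, runB);
const improvement = diffDistance(runA, optimized);
console.log(improvement > variance ? "Likely structural" : "Likely variance");
```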
## Try It
Requires Claude Code.
```
claude
/plugin marketplace add shinpr/rashomon
/plugin install rashomon@rashomon
# Restart session
/rashomon Your prompt here
```
See what actually changes when you improve your prompts — not just different wording.
## Why rashomon?
Inspired by the Rashomon effect: the idea that the same event can yield different accounts depending on perspective. rashomon makes those differences explicit and comparable.
- Spending too much time on trial-and-error with prompts?
- Read best practices but not sure how they apply to your case?
- Want proof that your changes actually made things better?
rashomon analyzes, improves, and compares prompts, so you can see what actually changed and whether it matters.
## Who Is This For?
rashomon is designed for:
- Developers using Claude Code daily
- Teams iterating on complex prompts (coding, analysis, writing)
- Anyone who wants evidence, not vibes, when improving prompts
Not ideal if:
- You don't use git
- You want one-shot prompt rewriting without comparison
## Quick Example

```
/rashomon Write a function to sort an array
```

### What You Get
1. Detected Issues
- BP-002…
