DEV Community

vesper_finch


Stop Guessing Which LLM Prompt Works Best — Test Them (Free Python Tool)

You have 3 prompt variations. Which one is best? Most people test them manually in ChatGPT, but that gives you vibes, not data.

Here is how to test prompts properly, in 30 seconds:

pip install requests pyyaml
git clone https://github.com/vesper-astrena/promptlab
cd promptlab
export OPENAI_API_KEY=sk-...

Define Your Variations

Create a YAML file:

# my_test.yaml
name: Customer Email Response
templates:
  - name: formal
    prompt: "Write a formal response to: {{email}}"
  - name: friendly
    prompt: "Write a friendly, helpful response to: {{email}}"
  - name: concise
    prompt: "Respond in 2 sentences max: {{email}}"
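
To make the `{{email}}` placeholders concrete, here is a minimal sketch of how that kind of substitution could work. This is not promptlab's actual code: the `render` function and `PLACEHOLDER` pattern are hypothetical names made up for illustration, and the templates are copied into a plain dict so the example runs without PyYAML.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
# Hypothetical pattern for illustration; promptlab's real parser may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} with its value; fail loudly on a missing variable."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"no value supplied for placeholder '{key}'")
        return variables[key]

    return PLACEHOLDER.sub(substitute, template)


# The three prompts from my_test.yaml, written out as a dict.
templates = {
    "formal": "Write a formal response to: {{email}}",
    "friendly": "Write a friendly, helpful response to: {{email}}",
    "concise": "Respond in 2 sentences max: {{email}}",
}

if __name__ == "__main__":
    variables = {"email": "I ordered 3 days ago and haven't received shipping info"}
    for name, template in templates.items():
        print(f"[{name}] {render(template, variables)}")
```

Raising on a missing variable is a deliberate choice in this sketch: a prompt sent with a literal `{{email}}` still in it would quietly skew the comparison.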
Enter fullscreen mode Exit fullscreen mode

Run the Test

python promptlab.py my_test.yaml --var email="I ordered 3 days ago and haven't received shipping info"
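
If you are curious how a repeatable `--var key=value` flag can be handled, here is a rough sketch using the standard library's `argparse`. It is an assumption about how such a flag might be parsed, not a copy of promptlab's source; `parse_var` and `build_parser` are hypothetical helper names.

```python
import argparse


def parse_var(item: str) -> tuple[str, str]:
    """Split 'key=value' on the first '=' so values may contain '=' themselves."""
    key, sep, value = item.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got '{item}'")
    return key, value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run prompt variations from a YAML file")
    parser.add_argument("config", help="path to the YAML test file")
    # action="append" lets --var be passed several times, once per variable.
    parser.add_argument(
        "--var",
        action="append",
        type=parse_var,
        default=[],
        metavar="KEY=VALUE",
        help="template variable, e.g. --var email='...'",
    )
    return parser


if __name__ == "__main__":
    # Explicit argv list so the example does not depend on the real command line.
    args = build_parser().parse_args(
        ["my_test.yaml", "--var", "email=I ordered 3 days ago and haven't received shipping info"]
    )
    variables = dict(args.var)
    print(args.config, variables)
```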

What You Get

For each variation:

  • Full response text
  • Response time (ms)
  • Token count (input + output)
  • Estimated cost

Plus a comparison table showing which is fastest and cheapest.
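
For a feel of the arithmetic behind the cost estimate and the comparison table, here is a small sketch. Everything in it is an assumption for illustration: the `Result` class, the function names, the per-token prices (placeholders, not real rates for any model), and the sample numbers (made up, not measured).

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    latency_ms: float
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per 1M tokens. NOT real rates: look up your
# provider's current pricing before trusting any number this prints.
PRICE_PER_M_INPUT = 1.00
PRICE_PER_M_OUTPUT = 2.00


def estimated_cost(r: Result) -> float:
    """Cost = input tokens * input price + output tokens * output price."""
    return (r.input_tokens * PRICE_PER_M_INPUT
            + r.output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000


def comparison_table(results: list[Result]) -> str:
    """Render one row per variation and tag the fastest and cheapest."""
    fastest = min(results, key=lambda r: r.latency_ms)
    cheapest = min(results, key=estimated_cost)
    lines = [f"{'variation':<12}{'latency ms':>12}{'tokens':>8}{'est. cost $':>14}"]
    for r in results:
        tags = []
        if r is fastest:
            tags.append("fastest")
        if r is cheapest:
            tags.append("cheapest")
        suffix = "  <- " + ", ".join(tags) if tags else ""
        total = r.input_tokens + r.output_tokens
        lines.append(
            f"{r.name:<12}{r.latency_ms:>12.0f}{total:>8}{estimated_cost(r):>14.6f}{suffix}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample values, only to show the table layout.
    sample = [
        Result("formal", 900.0, 40, 300),
        Result("friendly", 850.0, 42, 280),
        Result("concise", 400.0, 38, 60),
    ]
    print(comparison_table(sample))
```

Output tokens are priced separately from input tokens here because providers commonly charge different rates for the two, which is why a longer response can cost much more even when the prompts are nearly the same length.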

Why This Matters

  • The "formal" prompt might cost 3x more than "concise" with similar quality
  • gpt-4o-mini might be 90% as good as gpt-4o at 10% of the cost
  • Your "best" prompt might be the slowest one

Without data, you are optimizing blind.

15 Templates Included

The repo includes ready-to-use templates for:

  • Summarization (3 styles)
  • Data extraction (JSON, tables, key-value)
  • Classification (simple, multi-label, with reasoning)
  • Code review (bugs, comprehensive, refactoring)
  • Rewriting (simplify, professional, engaging)

Get Started

Free on GitHub: https://github.com/vesper-astrena/promptlab

The Pro version ($24) adds multi-model comparison (test across OpenAI, Anthropic, Gemini, and local Ollama models), batch testing with CSV, auto-scoring, statistical significance testing, and HTML reports.


What prompts are you testing? Share in the comments.
