You have 3 prompt variations. Which one is best? Most people test them manually in ChatGPT, but that gives you vibes, not data.
Here is how to test prompts properly, in 30 seconds:
git clone https://github.com/vesper-astrena/promptlab
cd promptlab
pip install requests pyyaml
export OPENAI_API_KEY=sk-...
Define Your Variations
Create a YAML file:
# my_test.yaml
name: Customer Email Response
templates:
  - name: formal
    prompt: "Write a formal response to: {{email}}"
  - name: friendly
    prompt: "Write a friendly, helpful response to: {{email}}"
  - name: concise
    prompt: "Respond in 2 sentences max: {{email}}"
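Under the hood, `{{email}}`-style placeholders are just string substitution. A minimal sketch of the idea (the `render` helper below is illustrative, not promptlab's actual API):

```python
import yaml

spec = yaml.safe_load("""
name: Customer Email Response
templates:
  - name: formal
    prompt: "Write a formal response to: {{email}}"
  - name: concise
    prompt: "Respond in 2 sentences max: {{email}}"
""")

def render(template: str, variables: dict) -> str:
    # Replace each {{name}} placeholder with its value.
    for key, value in variables.items():
        template = template.replace("{{" + key + "}}", value)
    return template

for t in spec["templates"]:
    print(t["name"], "->", render(t["prompt"], {"email": "Where is my order?"}))
```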
Run the Test
python promptlab.py my_test.yaml --var email="I ordered 3 days ago and haven't received shipping info"
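The per-variation timing you get back is essentially a `perf_counter` wrapped around each API call. A sketch with a stand-in for the real model call (the function names here are illustrative):

```python
import time

def call_model(prompt: str) -> str:
    # Stand-in for the real API request; promptlab uses your
    # OPENAI_API_KEY to make this call for each template.
    time.sleep(0.01)
    return "stub response"

def timed_call(prompt: str) -> tuple[str, float]:
    start = time.perf_counter()
    text = call_model(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return text, elapsed_ms

text, ms = timed_call("Write a formal response to: ...")
print(f"{ms:.1f} ms -> {text}")
```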
What You Get
For each variation:
- Full response text
- Response time (ms)
- Token count (input + output)
- Estimated cost
Plus a comparison table showing which is fastest and cheapest.
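Cost estimation from token counts is simple arithmetic. A sketch with illustrative per-million-token prices (real prices change; check your provider's pricing page):

```python
# Hypothetical per-million-token prices in USD, for illustration only.
PRICING = {
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost = tokens * price-per-token, summed over input and output.
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a 500-token prompt with a 300-token reply:
print(f"gpt-4o:      ${estimate_cost('gpt-4o', 500, 300):.6f}")       # $0.004250
print(f"gpt-4o-mini: ${estimate_cost('gpt-4o-mini', 500, 300):.6f}")  # $0.000255
```

At these example rates the mini model is over 16x cheaper for the same exchange, which is exactly the kind of gap the comparison table surfaces.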
Why This Matters
- The "formal" prompt might cost 3x more than "concise" with similar quality
- gpt-4o-mini might be 90% as good as gpt-4o at 10% the cost
- Your "best" prompt might be the slowest one
Without data, you are optimizing blind.
15 Templates Included
The repo includes ready-to-use templates for:
- Summarization (3 styles)
- Data extraction (JSON, tables, key-value)
- Classification (simple, multi-label, with reasoning)
- Code review (bugs, comprehensive, refactoring)
- Rewriting (simplify, professional, engaging)
Get Started
Free on GitHub: vesper-astrena/promptlab
The Pro version ($24) adds multi-model comparison (test across OpenAI, Anthropic, Gemini, and local Ollama models), batch testing with CSV, auto-scoring, statistical significance testing, and HTML reports.
What prompts are you testing? Share in the comments.