
Louis Dupont

Evaluate your LLM! Ok, but what's next? πŸ€·β€β™‚οΈ

Everyone says you need to evaluate your LLM. So you just did. Now what? 🤷‍♂️

You got a score. Great. Now, here’s the trap:

You either:

  • Trust it. ("Nice, let's ship!")
  • Chase a better one. ("Tweak some stuff and re-run!")

Both are horrible ideas.

Step 1: Stop staring at numbers.

Numbers feel scientific, but they lie all the time.

Before doing anything, look at actual examples. What’s failing? (There’s a quick triage sketch right after this list.)

  • Bad output? Fix the model.
  • Good output but bad score? Fix the eval.
  • Both wrong? You’ve got bigger problems.

Step 2: Solve the right problem.

If your model sucks, tweak:

  • Prompts
  • Data retrieval
  • Edge cases

If your eval sucks, rethink:

  • Your scoring function
  • What β€œgood” even means

Step 3: Iterate like a maniac.

Change something β†’ Run eval β†’ Learn β†’ Repeat.
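As a sketch of what a disciplined version of that loop looks like (`run_eval` is a hypothetical stand-in for your own harness):

```python
import json
from datetime import datetime, timezone

def run_eval(prompt_template: str) -> dict:
    # Hypothetical stand-in for your real harness. The important part:
    # return the individual failures, not just an aggregate score, so
    # every run feeds the next round of error analysis.
    return {"score": 0.0, "failures": []}

# One change per run, and every run gets written down.
for change in ["baseline", "add a few-shot example", "fix the scorer"]:
    result = run_eval(prompt_template="...")  # your tweak goes here
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "change": change,
        "score": result["score"],
        "n_failures": len(result["failures"]),
    }
    with open("eval_log.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")
    # The "Learn" step happens here: read result["failures"], not just the score.
```

The log is what turns “tweak some stuff and re-run” into an actual experiment history.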

Basically, do Error Analysis on your Evals (instead of on your LLM)!

Chasing numbers isn’t progress. Chasing the right insights is.
