Louis Dupont

Evaluate your LLM! Ok, but what's next? 🤔

Everyone says you need to evaluate your LLM. You just did it. Now what? 🤷‍♂️

You got a score. Great. Now, here’s the trap:

You either:

  • Trust it. ("Nice, let's ship!")
  • Chase a better one. ("Tweak some stuff and re-run!")

Both are horrible ideas.

Step 1: Stop staring at numbers.

Numbers feel scientific, but they lie all the time.

Before doing anything, look at actual examples. What’s failing? (Sketch below.)

  • Bad output? Fix the model.
  • Good output but bad score? Fix the eval.
  • Both wrong? You’ve got bigger problems.
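
In practice, that triage can be this dumb: run the eval, sort by score, and read the worst cases yourself. A minimal sketch, where `run_model`, `score`, and the dataset shape are all hypothetical placeholders for your own pipeline:

```python
# A minimal triage sketch. `run_model`, `score`, and the dataset
# shape are hypothetical placeholders for your own pipeline.
from typing import Callable

def triage(
    dataset: list[dict],
    run_model: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 20,
) -> None:
    """Print the n worst-scoring examples so a human actually reads them."""
    results = []
    for ex in dataset:
        output = run_model(ex["input"])
        results.append((score(output, ex["expected"]), ex, output))

    # Worst cases first -- that's where the insight lives.
    results.sort(key=lambda r: r[0])
    for s, ex, output in results[:n]:
        print(f"score={s:.2f}")
        print(f"  input:    {ex['input']}")
        print(f"  expected: {ex['expected']}")
        print(f"  got:      {output}")
        # Ask: bad output (model problem)? Good output, bad score
        # (eval problem)? Or both?
```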

Step 2: Solve the right problem.

If your model sucks, tweak:

  • Prompts
  • Data retrieval
  • Edge cases

If your eval sucks, rethink:

  • Your scoring function (sketch below)
  • What “good” even means
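
A classic way a scoring function lies: exact match punishes perfectly good paraphrases. Here’s a hedged sketch of one possible fix; what counts as “good” is still yours to define:

```python
def exact_match(output: str, expected: str) -> float:
    # The brittle version: "Paris." vs "paris" scores 0.
    return float(output == expected)

def normalized_match(output: str, expected: str) -> float:
    # One possible fix: ignore case, whitespace, and trailing
    # punctuation. Your definition of "good" will differ.
    clean = lambda s: s.strip().strip(".!?").lower()
    return float(clean(expected) in clean(output))

print(exact_match("Paris.", "paris"))       # 0.0 -- the eval lies
print(normalized_match("Paris.", "paris"))  # 1.0 -- closer to the truth
```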

Step 3: Iterate like a maniac.

Change something → Run eval → Learn → Repeat.

Basically, do Error Analysis on your Evals (instead of on your LLM)!
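
One way to keep that loop honest: log per-example results for every run, then diff runs against each other instead of comparing averages. A rough sketch (the file naming and result shape are assumptions, not a standard):

```python
import json
from datetime import datetime, timezone

def log_run(tag: str, results: list[dict]) -> str:
    # Persist per-example results, not just the aggregate score,
    # so the next run can be diffed against this one.
    path = f"eval_{tag}_{datetime.now(timezone.utc):%Y%m%d_%H%M%S}.jsonl"
    with open(path, "w") as f:
        for r in results:
            f.write(json.dumps(r) + "\n")
    return path

def diff_runs(old: list[dict], new: list[dict]) -> None:
    # The interesting examples are the ones that flipped,
    # not the ones that nudged the average.
    for before, after in zip(old, new):
        if before["score"] != after["score"]:
            print(f"{before['input'][:40]!r}: "
                  f"{before['score']:.2f} -> {after['score']:.2f}")
```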

Chasing numbers isn’t progress. Chasing the right insights is.
