Louis Dupont

Evaluate your LLM! Ok, but what's next? 🤔

Everyone says you need to evaluate your LLM. You just did it. Now what? 🤷‍♂️

You got a score. Great. Now, here’s the trap:

You either:

  • Trust it. ("Nice, let's ship!")
  • Chase a better one. ("Tweak some stuff and re-run!")

Both are horrible ideas.

Step 1: Stop staring at numbers.

Numbers feel scientific, but they lie all the time.

Before doing anything, look at actual examples. What’s failing? (Sketch below.)

  • Bad output? Fix the model.
  • Good output but bad score? Fix the eval.
  • Both wrong? You’ve got bigger problems.
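
In practice, that triage can be this dumb: run the eval, sort by score, and read the worst cases yourself. A minimal sketch, where `run_model`, `score`, and the dataset shape are all hypothetical placeholders for your own pipeline:

```python
# A minimal triage sketch. `run_model`, `score`, and the dataset
# shape are hypothetical placeholders for your own pipeline.
from typing import Callable

def triage(
    dataset: list[dict],
    run_model: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 20,
) -> None:
    """Print the n worst-scoring examples so a human actually reads them."""
    results = []
    for ex in dataset:
        output = run_model(ex["input"])
        results.append((score(output, ex["expected"]), ex, output))

    # Worst cases first -- that's where the insight lives.
    results.sort(key=lambda r: r[0])
    for s, ex, output in results[:n]:
        print(f"score={s:.2f}")
        print(f"  input:    {ex['input']}")
        print(f"  expected: {ex['expected']}")
        print(f"  got:      {output}")
        # Ask: bad output (model problem)? Good output, bad score
        # (eval problem)? Or both?
```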

Step 2: Solve the right problem.

If your model sucks, tweak:

  • Prompts
  • Data retrieval
  • Edge cases

If your eval sucks, rethink:

  • Your scoring function (sketch below)
  • What “good” even means
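
A classic way a scoring function lies: exact match punishes perfectly good paraphrases. Here’s a hedged sketch of one possible fix; what counts as “good” is still yours to define:

```python
def exact_match(output: str, expected: str) -> float:
    # The brittle version: "Paris." vs "paris" scores 0.
    return float(output == expected)

def normalized_match(output: str, expected: str) -> float:
    # One possible fix: ignore case, whitespace, and trailing
    # punctuation. Your definition of "good" will differ.
    clean = lambda s: s.strip().strip(".!?").lower()
    return float(clean(expected) in clean(output))

print(exact_match("Paris.", "paris"))       # 0.0 -- the eval lies
print(normalized_match("Paris.", "paris"))  # 1.0 -- closer to the truth
```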

Step 3: Iterate like a maniac.

Change something → Run eval → Learn → Repeat.

Basically, do Error Analysis on your Evals (instead of on your LLM)!
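
One way to keep that loop honest: log per-example results for every run, then diff runs against each other instead of comparing averages. A rough sketch (the file naming and result shape are assumptions, not a standard):

```python
import json
from datetime import datetime, timezone

def log_run(tag: str, results: list[dict]) -> str:
    # Persist per-example results, not just the aggregate score,
    # so the next run can be diffed against this one.
    path = f"eval_{tag}_{datetime.now(timezone.utc):%Y%m%d_%H%M%S}.jsonl"
    with open(path, "w") as f:
        for r in results:
            f.write(json.dumps(r) + "\n")
    return path

def diff_runs(old: list[dict], new: list[dict]) -> None:
    # The interesting examples are the ones that flipped,
    # not the ones that nudged the average.
    for before, after in zip(old, new):
        if before["score"] != after["score"]:
            print(f"{before['input'][:40]!r}: "
                  f"{before['score']:.2f} -> {after['score']:.2f}")
```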

Chasing numbers isn’t progress. Chasing the right insights is.
