DEV Community

Louis Dupont

DO NOT use these LLM Metrics ⛔ And what to do instead!

In short: generalist LLM metrics are more of a danger than an opportunity.

  • NEVER start with them.
  • Use them only as a last resort—and even then, with strict guidelines!

So what are these vague, generic metrics?

  • Helpfulness
  • Conciseness
  • Tone
  • Personalisation
  • … and more!

But what’s so wrong with them?

These Metrics Lack Real Meaning

The biggest problem? They’re designed to evaluate an LLM in general, not a specific use case.

By definition, they apply broadly—but do they truly matter? More often than not, they have weak correlations with user satisfaction and even weaker ties to actual ROI.

And what do they really measure?

  • Conciseness? What does "concise" even mean? It depends on your use case - and your definition.
  • Helpfulness? How do you objectively assess that?

At best, these metrics provide vague direction. At worst, they create the illusion that we’re measuring something meaningful when we’re not.

Start with the Problem, Not the Solution

In the startup world, everyone preaches this - but few apply it when developing AI.

Every metric should start with a strong "why." The best way to get this right?
👉 Do error analysis on your data.

Let real-world failures guide you to the right metrics - not the other way around.
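
What could that look like in practice? Here is a minimal sketch, assuming you have already read through a sample of bad outputs and tagged each one by hand with a short failure label. The records, the label names, and the `rank_failure_modes` helper are all hypothetical, made up for illustration. The idea is simply to count which failures show up most, so the first metric you build targets a problem you have actually observed.

```python
from collections import Counter

# Hypothetical hand-labelled review of failed outputs. The labels and
# records below are illustrative assumptions, not real data.
reviewed_failures = [
    {"id": 1, "failure": "wrong_refund_policy"},
    {"id": 2, "failure": "missing_order_lookup"},
    {"id": 3, "failure": "wrong_refund_policy"},
    {"id": 4, "failure": "too_verbose"},
    {"id": 5, "failure": "wrong_refund_policy"},
    {"id": 6, "failure": "missing_order_lookup"},
]


def rank_failure_modes(records):
    """Return (label, count, share) tuples, most frequent failure first."""
    counts = Counter(record["failure"] for record in records)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


if __name__ == "__main__":
    for label, n, share in rank_failure_modes(reviewed_failures):
        print(f"{label}: {n} cases ({share:.0%})")
```

In this made-up sample, a wrong refund policy accounts for half of the failures, so a targeted check such as "does the answer match the current refund policy?" would be a better first metric than a generic helpfulness score.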
